Test Time Compute (also known as Inference Time Compute) refers to the amount of computational resources used by a model during the inference stage (when it is generating an answer).
Key Concept
- A model is first built during the “Training Stage”.
- Using the trained model to generate answers is the “Inference Stage”.
- Test Time Compute specifically measures the resources (tokens, time, processing power) consumed during this inference stage.
Example
- Low Test Time Compute: A regular LLM answers “11” using 3 tokens.
- High Test Time Compute: A reasoning model explains the steps (“Roger started with five balls…”, “5 + 6 = 11”) using 23 tokens.
In the second case, the model spends roughly 7–8 times as many tokens (23 vs. 3) to generate the answer, which corresponds to “thinking more” in order to arrive at a better result.
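The comparison above can be sketched in code. This is a minimal illustration, not a real measurement: token counts are approximated by whitespace splitting (an actual tokenizer would give different numbers), and the answer strings are invented for the example.

```python
# Sketch: comparing test-time compute of two answers via output length.
# Assumption: one "token" per whitespace-separated word. Real tokenizers
# (BPE-based, for instance) split text differently, but the contrast
# between a terse answer and a step-by-step answer still holds.

def count_tokens(text: str) -> int:
    """Rough token count: one token per whitespace-separated word."""
    return len(text.split())

# Hypothetical outputs for the same question.
short_answer = "The answer is 11."
reasoning_answer = (
    "Roger started with 5 balls. He bought 2 cans of 3 tennis balls, "
    "which is 6 more balls. 5 + 6 = 11. So the answer is 11."
)

short_tokens = count_tokens(short_answer)
long_tokens = count_tokens(reasoning_answer)

print(f"Low test-time compute:  {short_tokens} tokens")
print(f"High test-time compute: {long_tokens} tokens")
print(f"Ratio: {long_tokens / short_tokens:.1f}x")
```

The point is not the exact ratio but the trade-off it makes visible: the reasoning answer consumes several times more inference-stage resources in exchange for showing (and checking) its work.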
