Test Time Compute (also known as Inference Time Compute) refers to the amount of computational resources used by a model during the inference stage (when it is generating an answer).
Key Concept
- A model is first built during the “Training Stage”.
- Using the trained model to generate answers is the “Inference Stage”.
- Test Time Compute specifically measures the resources (tokens, time, processing power) consumed during this inference stage.
Example
- Low Test Time Compute: A regular LLM answers “11” using 3 tokens.
- High Test Time Compute: A reasoning model explains the steps (“Roger started with five balls…”, “5 + 6 = 11”) using 23 tokens.
In the second case, the model spends roughly 7–8 times as many tokens (23 vs. 3) to generate the answer, which corresponds to “thinking more” in order to arrive at a better result.
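The comparison above can be sketched in code. This is a minimal illustration, not a real measurement: token counts are approximated by whitespace splitting (an actual tokenizer would give different numbers), and the answer strings are invented for the example.

```python
# Sketch: comparing test-time compute of two answers via output length.
# Assumption: one "token" per whitespace-separated word. Real tokenizers
# (BPE-based, for instance) split text differently, but the contrast
# between a terse answer and a step-by-step answer still holds.

def count_tokens(text: str) -> int:
    """Rough token count: one token per whitespace-separated word."""
    return len(text.split())

# Hypothetical outputs for the same question.
short_answer = "The answer is 11."
reasoning_answer = (
    "Roger started with 5 balls. He bought 2 cans of 3 tennis balls, "
    "which is 6 more balls. 5 + 6 = 11. So the answer is 11."
)

short_tokens = count_tokens(short_answer)
long_tokens = count_tokens(reasoning_answer)

print(f"Low test-time compute:  {short_tokens} tokens")
print(f"High test-time compute: {long_tokens} tokens")
print(f"Ratio: {long_tokens / short_tokens:.1f}x")
```

The point is not the exact ratio but the trade-off it makes visible: the reasoning answer consumes several times more inference-stage resources in exchange for showing (and checking) its work.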
