Batch Size refers to the number of data samples processed by the model in one iteration before updating its internal parameters (weights).
Implementation
In LLM training, data is processed in batches (e.g., 4, 8, or 32 sequences at a time) rather than one sample at a time or the entire dataset at once.
- Batch Size = 1: Parameters are updated after every single sample (pure stochastic gradient descent). High gradient noise, slow wall-clock training.
- Batch Size > 1: More stable gradient estimates and better utilization of Parallel Computing resources (GPUs); see the training-loop sketch after this list.
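
A minimal PyTorch sketch of this loop. The toy linear model, random tensors, and batch_size=8 are illustrative stand-ins for a real model and tokenized data, not a specific recipe:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data standing in for tokenized sequences: 256 samples, 128 features each.
# (Dataset, model, and batch_size=8 are assumptions for illustration.)
inputs = torch.randn(256, 128)
targets = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(inputs, targets), batch_size=8, shuffle=True)

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

for x_batch, y_batch in loader:  # one iteration = one batch of 8 samples
    optimizer.zero_grad()
    loss = loss_fn(model(x_batch), y_batch)
    loss.backward()              # gradient is averaged over the batch
    optimizer.step()             # one parameter update per batch
```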
Trade-offs
- Memory: Larger batch sizes require more VRAM, since activations and gradients for every sample in the batch must be held at once; gradient accumulation (sketched after this list) is a common workaround.
- Speed: Larger batches are generally faster per epoch because the GPU processes the samples in parallel.
- Noise: Smaller batches produce noisier gradient estimates, which can sometimes improve generalization but can also make training less stable.
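
Because of the memory trade-off, a common technique is gradient accumulation: process several small micro-batches and call the optimizer only once per group, approximating a larger batch size within a fixed VRAM budget. A minimal sketch, reusing the model, loader, optimizer, and loss_fn from the sketch above (accum_steps=4 is an illustrative value):

```python
# Simulate an effective batch size of 8 * 4 = 32 using micro-batches of 8.
accum_steps = 4

optimizer.zero_grad()
for step, (x_batch, y_batch) in enumerate(loader):
    loss = loss_fn(model(x_batch), y_batch) / accum_steps  # scale to average
    loss.backward()                  # gradients accumulate across calls
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # one update per 4 micro-batches
        optimizer.zero_grad()
```

Scaling each micro-batch loss by 1/accum_steps keeps the accumulated gradient equal to the average over the full effective batch, so the update matches what a single large batch would produce (up to batch-dependent layers such as batch normalization).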
