Stride is a parameter of the Sliding Window Technique that sets how many token positions the window moves forward to produce the next input sample. It therefore controls how much consecutive training samples overlap.
Function
- Stride = 1: The window moves one token at a time, which creates maximum overlap between consecutive samples. E.g., with a context size of 4 (treating each word as one token), Input 1: “In the heart of”, Input 2: “the heart of the”.
- Stride = Context Size: The window moves by the full length of the input, so consecutive samples do not overlap and each token appears in only one input.
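The two settings above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `sliding_windows` is hypothetical, words split on whitespace stand in for real tokens, and the sentence beyond “In the heart of the” is an assumed continuation used only to have enough text to slide over.

```python
def sliding_windows(tokens, context_size, stride):
    """Return every full window of `context_size` tokens, advancing by `stride`."""
    if context_size <= 0 or stride <= 0:
        raise ValueError("context_size and stride must be positive")
    last_start = len(tokens) - context_size  # last index where a full window still fits
    return [tokens[i:i + context_size] for i in range(0, last_start + 1, stride)]


# Assumption: whitespace-separated words stand in for tokenizer output.
tokens = "In the heart of the city stood the old library".split()

overlapping = sliding_windows(tokens, context_size=4, stride=1)
disjoint = sliding_windows(tokens, context_size=4, stride=4)

print(overlapping[0])    # ['In', 'the', 'heart', 'of']
print(overlapping[1])    # ['the', 'heart', 'of', 'the']
print(len(overlapping))  # 7 windows, each sharing 3 tokens with the previous one
print(disjoint)          # 2 windows with no shared positions
```

In this sketch a trailing fragment shorter than the context size (here “old library” at stride 4) is simply dropped; padding it instead would be an equally reasonable choice.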
Trade-offs
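- Smaller Stride: More training samples from the same text, but high redundancy and a greater risk of overfitting, because each token reappears in many overlapping samples.
- Larger Stride (e.g., equal to Context Size): Fewer samples per pass, so less computation and less redundancy. With stride equal to Context Size the windows cover the dataset without overlap; a stride larger than Context Size would skip tokens between windows.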
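To put rough numbers on these trade-offs, the sketch below counts how many full windows a given stride yields and how many distinct token positions they cover. The helper names `count_windows` and `covered_positions` and the 1,000-token dataset size are assumptions chosen for illustration only.

```python
def count_windows(num_tokens, context_size, stride):
    """Number of full windows that fit when the window advances by `stride`."""
    if num_tokens < context_size:
        return 0
    return (num_tokens - context_size) // stride + 1


def covered_positions(num_tokens, context_size, stride):
    """Number of distinct token positions that fall inside at least one window."""
    covered = set()
    for start in range(0, num_tokens - context_size + 1, stride):
        covered.update(range(start, start + context_size))
    return len(covered)


# Assumption: a toy dataset of 1,000 tokens and a context size of 4.
for stride in (1, 4, 8):
    n = count_windows(1000, 4, stride)
    c = covered_positions(1000, 4, stride)
    print(f"stride={stride}: {n} windows, {c} of 1000 positions covered")
```

Under these assumptions, stride 1 yields 997 windows, stride 4 yields 250 windows that still cover all 1,000 positions, and stride 8 yields 125 windows that cover only 500 positions.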
