Pre-training
Pre-training is a technique in machine learning, specifically a form of Transfer Learning, where a model is first trained on a large dataset to learn general features and patterns before being adapted (or fine-tuned) for a specific downstream task.
The core intuition is that it is easier to solve a specific problem if you already have a general understanding of the domain.
Core Mechanism
The workflow typically consists of two stages:
- Pre-training: The model is trained on a massive amount of generic data (e.g., the entire internet, ImageNet) to learn broad representations. This is often the most computationally expensive phase.
- Fine-tuning: The pre-trained “base model” is then updated using a smaller, task-specific dataset to specialize its performance.
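To make this two-stage workflow concrete, here is a minimal PyTorch sketch: a shared backbone is first trained on a generic (here, toy self-supervised) objective over plentiful data, then reused with a new task-specific head on a smaller labeled dataset. Model sizes, data, and hyperparameters are placeholders, not a prescription.

```python
import torch
import torch.nn as nn

# Shared backbone that learns general representations during pre-training.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))

# Stage 1: Pre-training with a generic head (toy reconstruction objective here).
pretrain_head = nn.Linear(256, 128)
pretrain_model = nn.Sequential(backbone, pretrain_head)
optimizer = torch.optim.AdamW(pretrain_model.parameters(), lr=1e-3)

for _ in range(100):                      # stands in for many steps over massive generic data
    x = torch.randn(32, 128)              # placeholder for a generic-data batch
    loss = nn.functional.mse_loss(pretrain_model(x), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Stage 2: Fine-tuning — reuse the backbone, attach a task-specific head.
task_head = nn.Linear(256, 10)            # e.g., a 10-class downstream task
finetune_model = nn.Sequential(backbone, task_head)
optimizer = torch.optim.AdamW(finetune_model.parameters(), lr=1e-4)  # smaller LR is typical

for _ in range(20):                       # far fewer steps, smaller labeled dataset
    x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(finetune_model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```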
Types of Pre-training
1. Unsupervised / Self-Supervised
This is the dominant paradigm for Large Language Models. The model learns from the internal structure of the data without explicit labels.
- Method: Next Word Prediction (Causal Language Modeling) or Masked Language Modeling (like BERT).
- Goal: To learn the statistical structure of language, including grammar and world knowledge (both objectives are sketched below).
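As a rough illustration of the two self-supervised objectives, the sketch below computes a causal (next-token) loss and a masked-token loss over random placeholder tensors. In practice the logits come from a real model and masked inputs are corrupted with a [MASK] token; the 15% ratio is simply the BERT convention.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 1000, 16, 4
tokens = torch.randint(0, vocab_size, (batch, seq_len))   # placeholder token IDs
logits = torch.randn(batch, seq_len, vocab_size)          # stand-in for model outputs

# Causal language modeling: predict token t+1 from tokens <= t.
# Shift so position t's logits are scored against the token at t+1.
causal_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)

# Masked language modeling (BERT-style): hide ~15% of tokens, predict only those.
mask = torch.rand(batch, seq_len) < 0.15
targets = tokens.clone()
targets[~mask] = -100                                      # ignore unmasked positions
mlm_loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),
    targets.reshape(-1),
    ignore_index=-100,
)
```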
2. Supervised
Common in older Computer Vision workflows (e.g., ResNet).
- Method: Training on a fully labeled dataset like ImageNet (classifying images into 1000 categories).
- Goal: To learn feature extractors (edges, textures, shapes) that can be transferred to other visual tasks.
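In practice this transfer often looks like the following sketch (assuming a recent torchvision release): load an ImageNet-pre-trained ResNet, freeze its feature extractor, and swap the 1000-way classifier for a head matching a hypothetical downstream task.

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 whose backbone was pre-trained on ImageNet (1000 classes).
# (The weights argument assumes torchvision >= 0.13; older versions used pretrained=True.)
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pre-trained feature extractor (edges, textures, shapes).
for param in model.parameters():
    param.requires_grad = False

# Replace the ImageNet classifier with a head for a hypothetical 5-class task.
# Only this new layer will be trained during fine-tuning.
model.fc = nn.Linear(model.fc.in_features, 5)
```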
Role in Large Language Models
In the context of LLMs, Pre-training is the phase where the model gains its “intelligence” or base capabilities.
- It consumes raw text from diverse sources (books, code, web data).
- It is purely probabilistic: The model learns to complete text plausibly, but does not yet follow instructions or act as an assistant (see the sketch below).
- The output of this stage is a Foundation Model (or Base Model), which is then refined via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to become a helpful assistant. (Reasoning Model Blueprint (SFT + RL))
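To observe this base-model behavior directly, one can sample from a checkpoint that has only been pre-trained. The sketch below uses GPT-2 via Hugging Face transformers purely as an example of a model with no SFT or RL applied; it will continue the prompt as text rather than obey it as an instruction.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in for a pre-trained base model with no instruction tuning.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A base model treats the prompt as text to continue, not a request to fulfill.
prompt = "Write a poem about the ocean."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Typical output is a plausible continuation (e.g., more sentences that look like
# a writing-prompt list) rather than an actual poem, illustrating why SFT and RL
# are needed to turn a base model into an assistant.
```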
