Neural Networks

Training phases

In AI work, training happens in three main stages.

The first stage is Pre-training. It takes in raw data at truly massive scale (think petabytes), essentially a snapshot of the public internet. This data is fed through a large neural network, and the result is a base model. This model is genuinely capable, in that it knows language structure and general facts, but it is not yet optimized for our specific problems.

Next, this base model is Fine-Tuned. For this, we use known input data paired with the target output we want. The model makes a prediction, the loss (the error between prediction and target) is calculated, and the weights of the network are adjusted to minimize that loss. This gives us a fine-tuned model that is ready for specific, targeted tasks.
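The predict-measure-adjust loop above can be sketched in a few lines. This is a deliberately minimal illustration, not a real fine-tuning setup: the "model" is a single weight fitting y = w·x, but the loop structure (forward pass, loss, gradient step on the weights) is the same idea.

```python
# Toy illustration of the fine-tuning loop: predict, compute the loss,
# adjust the weights. A real network has billions of weights; here there
# is exactly one, so the mechanics are easy to follow.

def fine_tune(pairs, w=0.0, lr=0.1, epochs=50):
    """Fit y = w * x to (input, target) pairs by minimizing squared loss."""
    for _ in range(epochs):
        for x, y in pairs:
            pred = w * x                     # forward pass: model's prediction
            loss_grad = 2 * (pred - y) * x   # d(loss)/dw for loss = (pred - y)^2
            w -= lr * loss_grad              # adjust the weight to shrink the loss
    return w

# Known inputs paired with the target outputs we want (true rule: y = 3x).
data = [(1, 3), (2, 6), (3, 9)]
w = fine_tune(data)
print(round(w, 3))  # converges to 3.0
```

The gradient step is the only "learning" mechanism here, mirroring the description above: the loss is measured, then the weights move in the direction that reduces it.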

Finally, the model goes through an Alignment phase, often using Reinforcement Learning from Human Feedback (RLHF). As part of RLHF, the model generates several different outputs for a single prompt. Then a human labeler or a Reward Model (RM), which acts like an internal "grader program", ranks or scores these outputs based on how helpful and safe they are. This feedback is used to further train the model, steering it toward the desired target behavior.

Role of Tokenizer and Embeddings

The input text is converted into tokens using a tokenizer. Each token is an ID in the model's vocabulary, and different models use different strategies for splitting text into tokens. These token IDs are then mapped to embeddings.

For an input text, we get multiple tokens, and each token is assigned a single embedding vector. The entire text is therefore converted into a matrix: a sequence of embedding vectors, one per token. This matrix is then processed by the LLM neural network.
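The text-to-matrix pipeline can be sketched with a toy vocabulary and random embeddings. Everything here is a stand-in: real tokenizers use subword schemes such as byte-pair encoding, and real embedding tables are learned during training, not random.

```python
# Minimal sketch: text -> token IDs -> one embedding vector per token ->
# a matrix (sequence of vectors) for the network to process.
import random

vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}   # token text -> token ID
dim = 4                                               # toy embedding size
random.seed(0)
embedding_table = [[random.random() for _ in range(dim)] for _ in vocab]

def tokenize(text):
    """Toy whitespace 'tokenizer'; real models split into subword pieces."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]

token_ids = tokenize("the cat sat")
matrix = [embedding_table[t] for t in token_ids]  # one row per token

print(token_ids)         # [0, 1, 2]
print(len(matrix), dim)  # 3 rows, each of length 4
```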

The neural network processes this matrix and produces an output vector for the next token. This vector is converted back into a token ID, which corresponds to the new token's text (our output).

This new token is then appended to the input sequence, and the process repeats. Generation continues token by token until the network emits a special stop token, at which point the process halts.
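The feed-the-output-back-in loop described above can be sketched as follows. Here `next_token` is a stand-in for the entire network-plus-unembedding step, canned to return a fixed reply; the loop structure and the stop-token check are the part being illustrated.

```python
# Sketch of the autoregressive generation loop: append each new token to
# the sequence and repeat until a special stop token is produced.
STOP = "<|stop|>"

def next_token(sequence):
    """Toy stand-in for the network: emits a canned reply, then stops."""
    reply = ["Hello", "there", STOP]
    n_generated = len(sequence) - 1   # tokens produced so far (1 prompt token)
    return reply[n_generated]

def generate(prompt_tokens):
    sequence = list(prompt_tokens)
    while True:
        tok = next_token(sequence)
        if tok == STOP:               # special stop token ends generation
            break
        sequence.append(tok)          # feed the new token back into the input
    return sequence

print(generate(["Hi"]))  # ['Hi', 'Hello', 'there']
```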

The Three Phases of AI Training

There are three phases of training.

Phase 1: Pre-training

Pre-training produces an "internet document simulator", which is something like a lossy compression of the internet. At this point the model is just a "token tumbler": given a sequence of tokens, it tries to predict the next token.
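The "predict the next token from what came before" objective can be made concrete with counting instead of a neural network. This toy builds a table of which token follows which in a tiny corpus and predicts the most frequent follower; pre-training learns the same kind of distribution, but with a network over a vastly larger context and corpus.

```python
# Toy "internet document simulator": tally which token follows which in the
# training text, then predict the most common follower of the current token.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1           # record: nxt appeared after prev

def predict_next(token):
    """Most likely next token given the previous one."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # 'cat' (seen twice after 'the'; 'mat' only once)
```

This is also a literal "lossy compression": the counts summarize the corpus but cannot reproduce it exactly.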

Phase 2: Supervised Fine-Tuning

But we need more than a token generator; we need an assistant. To give the model this assistant-like persona, its parameters must be tweaked using another set of training data: conversations between a user and an assistant. We need many such questions and answers.

Human labelers are hired by AI companies to produce these sets, and the model is then further trained on these documents. This training is called Supervised Fine-Tuning. Other models can also be used to generate question-answer sets, which human labelers then review and correct.

Phase 3: Reinforcement Learning

The third stage of training is Reinforcement Learning. In this technique, questions are presented to the model, and the model produces various answers to a single question. A human labeler or a judge model then selects the most appropriate answer from the set. The model is trained further on this preference data so that its parameters are optimized toward the preferred behavior.
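The selection step can be sketched as "generate candidates, score them, keep the best". The `reward` function below is a made-up toy grader standing in for a human labeler or a learned reward model; a real reward model is itself a neural network trained on human preference rankings.

```python
# Sketch of the ranking step in reinforcement learning from feedback:
# several candidate answers are scored and the best becomes the training
# signal. `reward` is a toy scorer, not a real reward model.

def reward(answer):
    """Toy grader: prefers answers that are non-empty and polite."""
    score = 0.0
    if answer:
        score += 1.0
    if "please" in answer.lower() or "thanks" in answer.lower():
        score += 1.0
    return score

candidates = [
    "No.",
    "Thanks for asking! The capital of France is Paris.",
    "",
]
best = max(candidates, key=reward)
print(best)  # the polite, complete answer wins
```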

The Compute Challenge

Generally, these models cannot perform a large amount of computation in a single pass. It is therefore better for them to break a problem into multiple stages and statements. This gives the model "ample pauses": the final answer is produced only after several rounds of intermediate token generation.
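The idea of bounded per-pass compute can be illustrated with an analogy: a worker that may only do a small fixed amount of arithmetic per round, writing down an intermediate result each time, the way a model emits intermediate reasoning tokens before its final answer. The `per_step` limit is the artificial constraint here.

```python
# Analogy for "spreading the work over many tokens": each round does a
# small, bounded piece of the computation and records a partial result,
# instead of producing the final answer in one step.

def sum_in_stages(numbers, per_step=2):
    """Each round may only add `per_step` numbers; the recorded partial
    totals mimic intermediate tokens emitted before the final answer."""
    trace, total = [], 0
    for i in range(0, len(numbers), per_step):
        total += sum(numbers[i:i + per_step])   # one small round of compute
        trace.append(total)                     # "pause": emit partial result
    return trace

print(sum_in_stages([3, 5, 2, 7, 4]))  # [8, 17, 21] -- final answer comes last
```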

The Use of External Tools

Models are generally not proficient at complex "mental math" problems because they rely on the lossy, compressed knowledge acquired from the internet during pre-training. Consequently, they often "mess up" mental calculations.

  • Tool Integration: It is recommended to ask the model to use specific tools for these problems.
  • Examples: This is particularly useful for counting characters in a substring or solving math problems.
  • Programming: A model will perform significantly better if it converts a puzzle into a Python program and then runs that program to find the answer.
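The character-counting example above can be made concrete. Here the "model-written program" is simulated as a string that we execute, standing in for a model that emits code and a tool (a Python interpreter) that runs it; the exact answer comes from the program, not from the model's lossy recall.

```python
# Sketch of the tool-use idea: rather than having the model count letters
# "in its head", it emits a small program whose output we trust.

puzzle = "How many times does 'r' appear in 'strawberry'?"

# What a model might emit when told to solve the puzzle with code:
model_written_code = "answer = 'strawberry'.count('r')"

namespace = {}
exec(model_written_code, namespace)   # run the tool (a Python interpreter)
print(namespace["answer"])  # 3 -- exact, no "mental math" involved
```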

The Mechanics of Tokenization and Domain Training

Data as Token Sequences

Any data the network uses in training must be a sequence of tokens. In the first training phase, internet text is converted into tokens, and the network uses these one-dimensional token sequences to predict the next token.

Structuring Conversations in Supervised Fine-Tuning

In the Supervised Fine-Tuning phase, questions and answers are also converted into a single sequence of tokens. To manage this, special “Conversation State” tokens are used—such as user, assistant, and conversation end tokens—so that entire conversations can be converted into a single array of tokens.
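The packing of a conversation into one token sequence can be sketched as below. The special token names are illustrative only; each model family defines its own conversation-state tokens and chat template, and the word-level tokenization here stands in for a real subword tokenizer.

```python
# Sketch: flatten a multi-turn conversation into a single token sequence
# using special "conversation state" tokens (names are made up for the demo).
USER, ASSISTANT, END = "<|user|>", "<|assistant|>", "<|end|>"

def pack(conversation):
    """Flatten (role, text) turns into one flat list of tokens."""
    tokens = []
    for role, text in conversation:
        tokens.append(USER if role == "user" else ASSISTANT)
        tokens.extend(text.split())   # toy word-level tokenization
        tokens.append(END)            # marks the end of this turn
    return tokens

dialog = [("user", "What is 2+2?"), ("assistant", "It is 4.")]
print(pack(dialog))
# ['<|user|>', 'What', 'is', '2+2?', '<|end|>',
#  '<|assistant|>', 'It', 'is', '4.', '<|end|>']
```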

Specialized Tasks and Domain Expertise

Supervised fine-tuning data also includes instructions for the model to perform specific tasks, such as suggesting a web search for a question. There are special tokens designated for these actions as well.

This phase is an important step: if we want a model to behave like an expert in our specific domain, we must create these custom data points and use them to train the model further.