Neural Networks

Training phase

Training a large language model happens in three main stages.

The first stage is Pre-training. This stage takes raw data at truly massive scale (on the order of terabytes to petabytes of text, essentially a broad crawl of the internet). This data is fed through a large neural network, and we get a base model. This model is already capable (it knows language structure and general facts), but it is not yet optimized for our specific problems.

Next, the base model is Fine-Tuned. For this, we use known input data paired with the target output we want. The model makes a prediction, and the loss (the error between prediction and target) is calculated. The weights of the network are then adjusted to minimize this loss. This gives us a fine-tuned model that is ready for specific, targeted tasks.
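The predict-measure-adjust loop above can be sketched in a few lines. This is a minimal illustration, not real fine-tuning: a single stand-in weight replaces the network, and the gradient is computed by hand, but the logic (predict, compute the loss, nudge the weights to reduce it) is the same one backpropagation applies to millions of weights.

```python
# Minimal sketch of the fine-tuning loop: predict, compute the loss,
# adjust the weights to reduce it. A one-weight linear model stands in
# for the neural network; the target relationship (output = 3 * input)
# is invented for illustration.

def fine_tune(pairs, lr=0.01, epochs=200):
    w = 0.0  # single stand-in weight
    for _ in range(epochs):
        for x, target in pairs:
            pred = w * x                         # model prediction
            grad = 2 * (pred - target) * x       # d/dw of squared error
            w -= lr * grad                       # adjust weight to lower loss
    return w

# Known inputs paired with the target outputs we want (target = 3 * input).
pairs = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = fine_tune(pairs)
print(round(w, 2))  # converges toward 3.0
```

Each pass lowers the loss a little; after enough epochs the weight settles at the value that reproduces the target outputs.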

Finally, the model goes through an Alignment phase, often using Reinforcement Learning from Human Feedback (RLHF). As part of RLHF, the model generates several different outputs for a single prompt. Then, a human labeler or a Reward Model (RM), which acts like our internal "grader program", ranks or scores these outputs based on how helpful and safe they are. This feedback is used to further train the model, steering it toward the desired target behavior.
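The ranking step can be sketched as follows. The scoring rule here is entirely invented for illustration (it rewards on-topic answers and penalizes length); a real reward model is itself a neural network trained on human preference data, but the role is the same: score candidate outputs so the best one can be preferred during training.

```python
# Sketch of the RLHF ranking step: a stand-in "grader program" scores
# several candidate outputs for one prompt, and the resulting ranking
# becomes the training signal. The scoring heuristic is invented for
# illustration only.

def reward_model(prompt, candidate):
    score = 0.0
    subject = prompt.split()[-1].rstrip("?")  # crude "topic" of the prompt
    if subject in candidate:
        score += 1.0                  # on-topic: mentions the subject asked about
    score -= 0.01 * len(candidate)    # mild penalty for rambling
    return score

prompt = "What is Paris?"
candidates = [
    "Paris is the capital of France.",
    "I don't know.",
    "Paris is the capital of France, and by the way here is a very long tangent...",
]

# Rank candidates from best to worst by reward.
ranked = sorted(candidates, key=lambda c: reward_model(prompt, c), reverse=True)
print(ranked[0])  # the concise, on-topic answer ranks first
```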

Role of Tokenizer and Embeddings

The input text is converted into tokens using a tokenizer. Each token is an integer ID from the model's vocabulary. Models use different strategies to split text into tokens. Each token ID is then mapped to an embedding vector.
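A toy version of this lookup makes the idea concrete. The vocabulary and the whitespace splitting below are invented for illustration; real tokenizers (for example, byte-pair encoding) work on subword pieces and use vocabularies of tens of thousands of entries.

```python
# Toy illustration of tokenization: the text is split into pieces, and
# each piece is looked up in a fixed vocabulary to get its integer ID.
# Unknown words map to a special <unk> token.

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}

def tokenize(text):
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

ids = tokenize("The cat sat on the mat")
print(ids)  # [0, 1, 2, 3, 0, 4]
```

Note that the word "the" appears twice and gets the same ID both times: token IDs depend only on the vocabulary, not on position.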

For an input text, we’ll have multiple tokens. Each token is assigned a single embedding vector. So the entire text gets converted into a matrix: a sequence of embedding vectors, one row per token. This matrix is then processed by the LLM neural network.
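The token-to-matrix step can be sketched directly. The sizes below are toy values (a real model might use a vocabulary of ~50,000 tokens and embedding dimensions in the thousands), and the random table stands in for learned embeddings.

```python
# Sketch of the embedding step: each token ID selects one row of an
# embedding table, and the rows are stacked into a matrix of shape
# (sequence length, embedding dimension). Table values are random
# stand-ins for learned embeddings.

import random

VOCAB_SIZE, EMBED_DIM = 6, 4
random.seed(0)
embedding_table = [
    [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)
]

token_ids = [0, 1, 2, 3, 0, 4]                    # output of the tokenizer
matrix = [embedding_table[t] for t in token_ids]  # one row per token

print(len(matrix), len(matrix[0]))  # 6 rows (tokens) x 4 columns (dimensions)
```

Because token 0 ("the") appears twice in the sequence, rows 0 and 4 of the matrix are identical: the embedding depends only on the token ID.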

The neural network processes this matrix and produces an output vector for the next position. This vector is scored against the vocabulary to select the next token ID, which corresponds to the new token’s text (our output).

This new token is then appended to the input sequence, and the process repeats. Generation continues, token by token, until the network emits a special STOP token, at which point the process ends.
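The whole generation loop can be sketched with a stand-in network. The lookup-table `toy_network` below is invented for illustration (it predicts the next token from the last one alone); a real LLM computes the next token from the entire embedding matrix, but the append-and-repeat loop and the STOP condition are the same.

```python
# Sketch of the autoregressive loop: the "network" maps the current
# sequence to a next token, the token is appended to the input, and
# generation stops when a special STOP token is produced.

STOP = "<stop>"

def toy_network(sequence):
    # Stand-in for the neural network: picks the next token from the
    # last token in the sequence via a fixed lookup table.
    transitions = {"the": "cat", "cat": "sat", "sat": STOP}
    return transitions.get(sequence[-1], STOP)

tokens = ["the"]              # the prompt
while True:
    nxt = toy_network(tokens)
    if nxt == STOP:           # special STOP token ends generation
        break
    tokens.append(nxt)        # feed the new token back into the input

print(tokens)  # ['the', 'cat', 'sat']
```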