
Understanding Large Language Models (LLMs)

Kartavya Desk Staff

Source: TH

Subject: Science and Technology

Context: Bengaluru-based startup Sarvam AI recently launched two indigenous Large Language Models (35B and 105B parameters) at the AI Impact Summit 2026, marking a major milestone for India’s Sovereign AI ambitions.

About Large Language Models (LLMs):

What is it?

• A Large Language Model (LLM) is a type of Artificial Intelligence trained on vast amounts of text data to understand, generate, and manipulate human language.

• They are called “large” because they contain billions of parameters: internal variables that the model learns during training to make predictions.

How Does it Work?

Breaking text into tokens: An LLM doesn’t read whole words like humans; it splits text into tokens (word pieces/characters) so it can represent rare words, names, spellings, and grammar patterns efficiently.
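As a toy illustration, a tokenizer can split an unfamiliar word into known sub-word pieces. The vocabulary below is made up for the example; real LLM tokenizers learn theirs from data using algorithms such as Byte-Pair Encoding:

```python
# Toy greedy longest-match subword tokenizer (illustrative only; the
# vocabulary here is hand-picked, not learned as in a real model).
VOCAB = {"un", "break", "able", "b", "r", "e", "a", "k", "u", "n", "l"}

def tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest possible piece first, shrinking until one matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return tokens

print(tokenize("unbreakable", VOCAB))  # → ['un', 'break', 'able']
```

Because single characters are also in the vocabulary, even a word never seen during training can still be represented.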

The Transformer “map”: Tokens are turned into vectors (embeddings) in a high-dimensional space, where semantically and syntactically similar tokens sit closer together, which helps the model generalise meaning.
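“Closer together” can be measured with cosine similarity. The three-dimensional vectors below are invented for the sketch; real embeddings have hundreds or thousands of learned dimensions:

```python
import math

# Made-up toy embeddings; real models learn these values during training.
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.12],
    "apple": [0.10, 0.20, 0.95],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, lower for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Related tokens score near 1.0; unrelated tokens score much lower.
print(cosine(embeddings["king"], embeddings["queen"]))
print(cosine(embeddings["king"], embeddings["apple"]))
```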

Self-attention mechanism: For each token, the model assigns attention weights to other tokens to decide what matters most, letting it link references, handle long dependencies, and resolve ambiguity (like what “it” points to).
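The weighting described above can be sketched as scaled dot-product attention. This is a bare-bones, single-head version over tiny hand-made vectors, not a full Transformer layer:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted mix of values."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Score each key by its similarity to this query, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Blend the value vectors in proportion to the attention weights.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Three toy token vectors; each serves as query, key, and value.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

The first token ends up mixed mostly with the second (its near-duplicate) and only slightly with the third, which is exactly the “what matters most” behaviour described above.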

Predicting the next token: The model outputs a probability distribution over possible next tokens; generation is choosing tokens step by step, which is why it can sound fluent without understanding language the way a person does.
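That probability distribution comes from applying a softmax to the model's raw scores (logits). The candidate tokens and scores below are invented, as if the model were completing “The cat sat on the …”:

```python
import math

# Hypothetical logits for candidate next tokens (made up for illustration).
logits = {"mat": 2.1, "dog": 0.3, "moon": -1.0}

def softmax_dict(scores):
    """Convert logits to probabilities that sum to 1."""
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

probs = softmax_dict(logits)

# Greedy decoding picks the single most probable token; samplers instead
# draw randomly in proportion to these probabilities.
next_token = max(probs, key=probs.get)
print(next_token)  # → 'mat'
```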

Layers of refinement: Many stacked transformer layers progressively build richer representations—lower layers catch form/grammar, higher layers capture relationships, intent, and reasoning patterns—then a final layer converts that into the next-token prediction.

Principles Behind Training:

Pre-training: The model is fed enormous volumes of raw text (books, websites, code) and tasked with predicting the next token in a sequence. This is how it learns grammar, facts, and patterns of reasoning.
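The spirit of the objective can be shown with a toy count-based predictor. Real pre-training adjusts billions of parameters by gradient descent on the same next-token task; here “training” is just counting which word follows which in a tiny made-up corpus:

```python
from collections import Counter, defaultdict

# A tiny corpus standing in for web-scale training text.
corpus = "the cat sat on the mat . the cat ran .".split()

# "Training": record how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen during training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # → 'cat'
```

Even this crude model has absorbed a fact about its corpus (“the” is usually followed by “cat”); scaled up enormously, the same objective yields grammar and world knowledge.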

Fine-Tuning: The model is further trained on narrower, high-quality datasets to perform specific tasks, like medical diagnosis or legal drafting.

RLHF (Reinforcement Learning from Human Feedback): Human testers rank the model’s responses, teaching it to be more helpful, accurate, and safe.

Compute Intensity: Training requires massive clusters of GPUs (Graphics Processing Units) and high electricity consumption, often costing millions of dollars.

Key Features:

Generative Capability: Can create original text, code, poems, and summaries.

In-context Learning: Can follow instructions or replicate a style based on a few examples provided in a prompt.
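A few-shot prompt makes this concrete. The task and examples below are invented; the point is that the pattern (antonyms) is supplied entirely inside the prompt, with no retraining:

```python
# A few-shot prompt: two worked examples, then an unfinished one.
# The model infers the task from context, not from a weight update.
prompt = """\
hot -> cold
big -> small
fast ->"""

# Sent to an LLM, the likely completion is " slow".
print(prompt)
```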

Multilingualism: Can translate and understand multiple languages, though performance varies based on the training data.

Zero-shot Reasoning: Ability to solve problems it has never explicitly been trained for by using general logic.

AI-assisted content, editorially reviewed by Kartavya Desk Staff.

