What Are Weights?
the numerical values inside a machine learning model that store what the model has learned during training
Definition
Weights are the numerical values inside a machine learning model that store what the model has learned during training. They determine how the model processes information, recognizes patterns, and generates outputs. In a large language model (LLM), the weights collectively represent the knowledge and behavior the model acquired from analyzing vast amounts of text.
Although people often say that a model “knows” something, that knowledge is not stored as sentences, facts, or rules. Instead, it is encoded within billions of numerical weights distributed throughout the model’s neural network. When a model generates a response, it uses these weights to compute which output is most probable given the input it receives.
Why It Matters
Weights are what distinguish a trained model from an untrained one. Without them, the model is an empty architecture: its structure is defined, but nothing has been learned.
Whenever someone downloads a language model, they are primarily downloading its weights. Likewise, when developers release a new version of a model or convert it into a GGUF file, they are packaging those learned weights so they can be loaded and used for inference.
Understanding weights also makes many other AI concepts easier to grasp. Terms such as parameters, quantization, fine-tuning, and model size all revolve around the weights stored inside a model.
How It Works
A useful analogy is to think of learning to play the piano.
A beginner starts with little understanding of which keys to press. After years of practice, the pianist develops instincts about melodies, timing, and harmony.
The pianist’s knowledge is not stored as a list of every song ever learned. Instead, it exists as countless small adjustments made through experience.
Weights work in a similar way.
During training, a neural network begins with randomly initialized weights. It repeatedly processes examples from its training data and compares its predictions with the correct answers.
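After each batch of examples, a training algorithm (typically gradient descent with backpropagation) measures how far the predictions were from the correct answers and slightly adjusts the weights, millions or even billions of them at once, to reduce that error.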
Each individual adjustment is tiny.
After trillions of these adjustments, the collection of weights gradually captures patterns found in the training data.
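The adjustment loop described above can be sketched with a single weight. This is a minimal illustration, assuming plain gradient descent on a squared-error loss; the `train` function, the data, and the learning rate are hypothetical choices made for the example. Real training updates billions of weights at once and uses more elaborate optimizers.

```python
import random

def train(examples, steps=200, learning_rate=0.05, seed=0):
    """Fit y = w * x by repeatedly nudging a single weight w."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # training starts from a random weight
    for _ in range(steps):
        for x, target in examples:
            prediction = w * x
            error = prediction - target
            # Gradient of the squared error (error ** 2) with respect to w.
            gradient = 2 * error * x
            # Each update is a small step against the gradient.
            w -= learning_rate * gradient
    return w

# Hypothetical training data generated by the rule y = 3 * x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
learned_w = train(examples)
print(round(learned_w, 3))
```

The rule "multiply by 3" is never written into the program. It ends up encoded in the value of `w` purely through repeated small corrections.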
No single weight corresponds to a specific fact such as “Paris is the capital of France.”
Instead, information is distributed across enormous numbers of weights working together.
This distributed representation is one reason modern language models can generalize to situations they have never encountered exactly before.
It is also why removing or changing a single weight usually has little noticeable effect. Meaning emerges from the interactions among billions of weights rather than from individual numbers.
Technically, weights are stored inside structures called tensors, which are multidimensional arrays of numbers.
Each layer of a neural network contains one or more tensors holding the weights for that part of the model.
When text is entered into a language model, information flows through these layers. At every stage, the model performs mathematical operations using its weights to transform the input into increasingly abstract representations.
Eventually, these calculations allow the model to predict the next token—the next piece of text it believes is most likely to follow.
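As a rough sketch of this flow, the example below passes an input vector through two layers of weights and converts the result into next-token probabilities. The three-token vocabulary, the hand-picked weight values, and the function names are all assumptions made for illustration; a real language model has many more layers and uses attention mechanisms that are omitted here.

```python
import math

def matvec(matrix, vector):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and weights (not taken from any real model).
vocab = ["cat", "sat", "mat"]
hidden_weights = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # tensor of shape 3 x 2
output_weights = [[0.2, 0.7, -0.5], [0.9, -0.1, 0.3], [-0.4, 0.6, 0.8]]  # 3 x 3

def next_token_probabilities(input_vector):
    # Layer 1: weighted sums followed by a ReLU (negative values become 0).
    hidden = [max(0.0, h) for h in matvec(hidden_weights, input_vector)]
    # Layer 2: one score per vocabulary entry, normalized by softmax.
    return softmax(matvec(output_weights, hidden))

probs = next_token_probabilities([1.0, 0.5])
predicted = vocab[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Changing any value in `hidden_weights` or `output_weights` changes the probabilities, which is the sense in which the weights determine the model's output.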
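This process is computationally heavy. In a dense model, generating each token involves roughly one multiplication and one addition per weight, so a single response from a multi-billion-parameter model can require trillions of individual calculations.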
Weights are also closely related to the term parameters.
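Strictly speaking, parameters include every trainable value in the network: the weights on connections as well as the bias terms. In most discussions the two words are used interchangeably, because weights make up the vast majority of a model's parameters.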
When someone describes a model as having 7 billion parameters or 70 billion parameters, they are essentially referring to the number of trainable weights inside the network.
The size of a model therefore depends largely on how many weights it contains.
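A parameter count is simply the total number of entries across all of a model's tensors. The sketch below assumes a hypothetical tiny model described only by its tensor shapes; the layer names and dimensions are invented for the example.

```python
import math

def count_parameters(tensor_shapes):
    """Sum the number of entries across all tensors."""
    return sum(math.prod(shape) for shape in tensor_shapes.values())

# Hypothetical tensor shapes for a very small model.
tensor_shapes = {
    "embedding.weight": (1000, 64),
    "hidden.weight": (64, 256),
    "hidden.bias": (256,),
    "output.weight": (256, 1000),
}

total = count_parameters(tensor_shapes)
print(f"{total:,} parameters")
```

The same arithmetic, applied to much larger shapes and many more layers, is what produces figures such as 7 billion or 70 billion.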
However, the storage required for those weights depends on their numerical precision.
For example, each weight might originally be stored as a 16-bit or 32-bit floating-point number, so a 7-billion-parameter model at 16 bits needs roughly 14 GB for its weights alone.
Techniques such as quantization reduce the number of bits used to represent each weight.
This makes the model significantly smaller and usually faster to run while preserving most of its capabilities, although some accuracy is lost as the bit width shrinks.
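The relationship between precision and storage is straightforward arithmetic. The sketch below estimates the space needed for the weights alone at several bit widths; the function name is hypothetical, and the figures ignore metadata and the small per-block overhead that practical quantization formats add.

```python
def weight_storage_gb(num_weights, bits_per_weight):
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_weights * bits_per_weight / 8 / 1e9

num_weights = 7_000_000_000  # a "7 billion parameter" model
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_storage_gb(num_weights, bits):.1f} GB")
```

Halving the number of bits per weight halves the storage, which is why lower-precision versions of the same model are so much easier to fit on consumer hardware.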
Importantly, quantization changes how the weights are stored, not what they fundamentally represent.
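To make this concrete, here is a minimal sketch of one simple scheme: symmetric 8-bit quantization with a single scale shared by all values. The function names and sample weights are assumptions for illustration, and formats used in practice often quantize weights in small blocks with more elaborate encodings.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

# Hypothetical weights from one small tensor.
weights = [0.12, -0.53, 0.98, -0.07, 0.31]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(quantized)
print([round(r, 3) for r in restored])
```

The restored values are close to the originals but not identical. The rounding error per weight is at most half of `scale`, which is the precision given up in exchange for smaller storage.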
A GGUF file is largely a container holding these weights, together with metadata describing how they should be interpreted.
The weights themselves remain the heart of the model.
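The idea of a container that pairs weights with descriptive metadata can be illustrated with a toy format. The layout below is invented for this example and is not the GGUF layout: it assumes a length-prefixed JSON header followed by raw 32-bit floats, and `pack_model` and `unpack_model` are hypothetical names.

```python
import json
import struct

def pack_model(metadata, tensors):
    """Serialize metadata and named float tensors into one bytes object."""
    index = {}
    payload = b""
    for name, values in tensors.items():
        # Record where each tensor starts and how many values it holds.
        index[name] = {"offset": len(payload), "count": len(values)}
        payload += struct.pack(f"<{len(values)}f", *values)
    header = json.dumps({"metadata": metadata, "tensors": index}).encode("utf-8")
    # 4-byte little-endian header length, then the header, then the raw weights.
    return struct.pack("<I", len(header)) + header + payload

def unpack_model(blob):
    """Read the header first, then use it to locate each tensor's weights."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    data_start = 4 + header_len
    tensors = {}
    for name, entry in header["tensors"].items():
        fmt = f"<{entry['count']}f"
        tensors[name] = list(struct.unpack_from(fmt, blob, data_start + entry["offset"]))
    return header["metadata"], tensors

blob = pack_model({"name": "toy-model", "precision": "f32"},
                  {"layer0.weight": [0.5, -1.25, 2.0], "layer0.bias": [0.25]})
metadata, tensors = unpack_model(blob)
print(metadata, tensors)
```

The metadata tells the loader how to interpret the bytes that follow. Without it, the weights would be an undifferentiated run of numbers.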
Common Misconceptions
Misconception: Each weight stores a specific fact.
Weights do not function like entries in a database. Knowledge is distributed across billions of weights, with many contributing to every prediction the model makes.
Misconception: Weights are the same as training data.
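A model does not keep an archive of the documents it was trained on. Instead, the training process adjusts the weights so the model learns statistical patterns from the data, although passages that appear many times in the training data can sometimes be reproduced closely.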
Misconception: Larger weights mean more important knowledge.
The numerical value of a weight does not directly indicate its importance. A model’s behavior emerges from the combined interactions of many weights rather than from individual values.
Misconception: Quantization changes what a model has learned.
Quantization reduces the numerical precision used to store weights. While this may slightly affect performance, it does not fundamentally retrain the model or replace its learned knowledge.
Misconception: Two models with the same architecture have the same abilities.
The architecture defines the structure of a neural network, but the weights determine what it has learned. Two identical architectures with different weights can perform very differently.
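The last point can be shown with a deliberately tiny example. Here the "architecture" is a fixed function and the two weight sets are hypothetical; the structure is identical, yet the behavior differs completely.

```python
def tiny_model(weights, x):
    """A fixed architecture: one weight and one bias applied to the input."""
    return weights["w"] * x + weights["b"]

# Two hypothetical weight sets for the same architecture.
doubler = {"w": 2.0, "b": 0.0}
negator = {"w": -1.0, "b": 0.0}

print(tiny_model(doubler, 3.0))
print(tiny_model(negator, 3.0))
```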
Related Terms
Neural Network
Weights are one of the core building blocks of a neural network. Understanding the overall structure of neural networks makes it easier to see how weights work together across multiple layers.
Parameter
A parameter is a trainable value within a machine learning model. In most neural networks, the terms weight and parameter are closely related, making this one of the most important concepts to understand alongside weights.
Tensor
Weights are stored inside tensors, which are multidimensional arrays of numbers. Learning about tensors explains how billions of weights are organized efficiently inside modern AI models.
Transformer
The Transformer architecture organizes weights into layers, attention mechanisms, and feed-forward networks. Understanding Transformers reveals how these weights are used during inference.
Training
Training is the process that creates and refines a model’s weights. Every improvement the model makes during learning comes from adjusting these numerical values.
Fine-Tuning
Fine-tuning modifies an already trained model by updating some or all of its existing weights. This allows developers to specialize a general-purpose model for new tasks or behaviors.
Quantization
Quantization compresses a model by storing its weights with lower numerical precision. It is one of the key techniques that makes large language models practical on consumer hardware.
GGUF
A GGUF file primarily consists of a model’s weights together with metadata describing how they should be loaded. Understanding weights makes the purpose of the GGUF format much clearer.
Inference
During inference, the model uses its learned weights to process an input and generate predictions. This is the stage where all the knowledge stored in the weights is put to practical use.

