Enterprise RAG on Your Own Infrastructure

Embedding Models Demystified

What embedding models actually do, the MTEB leaderboard, top open-source options for 2025-2026, Matryoshka embeddings, multilingual considerations, and the critical distinction between embedding and generative models.

What embedding models actually do

An embedding model takes a piece of text -- a sentence, a paragraph, a document chunk -- and converts it into a list of floating-point numbers. That list is called a vector, and it typically has 768 to 4096 dimensions.

The mathematical intuition is this: the model learns to place semantically similar texts close together in a high-dimensional space, and semantically different texts far apart. "The quarterly revenue exceeded projections" and "Q3 earnings beat analyst estimates" should produce vectors that are close together (high cosine similarity), while "The quarterly revenue exceeded projections" and "The office kitchen needs new coffee filters" should produce vectors that are far apart.
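That geometric intuition can be sketched with plain numpy. The tiny 4-dimensional vectors below are hand-made stand-ins for real model output (which would be 768 to 4096 dimensions), but the cosine-similarity arithmetic is exactly what a vector store computes:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dim "embeddings" standing in for real 768-4096-dim model outputs.
revenue_a = np.array([0.9, 0.1, 0.8, 0.2])  # "quarterly revenue exceeded projections"
revenue_b = np.array([0.8, 0.2, 0.9, 0.1])  # "Q3 earnings beat analyst estimates"
coffee    = np.array([0.1, 0.9, 0.1, 0.7])  # "office kitchen needs coffee filters"

print(cosine_similarity(revenue_a, revenue_b))  # high: semantically close
print(cosine_similarity(revenue_a, coffee))     # much lower: unrelated topics
```

A real embedding model's job is to produce vectors for which this simple geometric comparison tracks semantic relatedness.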

How does the model learn this? Through contrastive training. The model is shown millions of pairs: (query, relevant document) and (query, irrelevant document). It learns to produce vectors where the relevant pair has high similarity and the irrelevant pair has low similarity. The specific architecture is usually a transformer encoder (similar to BERT, but much larger and better trained) that processes the input text and produces a single fixed-size vector as output.
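The objective behind contrastive training can be sketched as an InfoNCE-style loss -- a minimal numpy illustration of the idea, not the training code of any particular model. Given a query vector, the loss is a cross-entropy over similarities where the "correct class" is the relevant document, so it is low when the positive sits closest to the query and high otherwise:

```python
import numpy as np

def info_nce_loss(query: np.ndarray, docs: np.ndarray, positive_idx: int,
                  temperature: float = 0.05) -> float:
    """Cross-entropy over cosine similarities; the relevant doc is the target."""
    # Normalise so dot products become cosine similarities.
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = d @ q / temperature                       # one similarity per document
    log_probs = sims - np.log(np.sum(np.exp(sims)))  # log-softmax over the batch
    return float(-log_probs[positive_idx])

rng = np.random.default_rng(0)
query = rng.normal(size=8)
positive = query + 0.1 * rng.normal(size=8)  # relevant doc: near the query
negatives = rng.normal(size=(3, 8))          # irrelevant docs: random directions
docs = np.vstack([positive, negatives])

# Loss is small when the positive document is the one closest to the query.
print(info_nce_loss(query, docs, positive_idx=0))
```

During training, gradients of this loss push the encoder to move relevant pairs together and irrelevant pairs apart; across millions of pairs, that pressure is what shapes the geometry of the embedding space.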

Critically, embedding models are encoders, not generators. They process text in one direction -- text in, vector out. They cannot produce text. They cannot answer questions. They cannot reason. They are a mapping function from the space of all possible texts to a point in high-dimensional geometric space. That mapping is what makes similarity search possible: instead of comparing texts linguistically, you compare their geometric positions.
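The "mapping function" property -- any input length in, one fixed-size vector out -- typically comes from pooling the encoder's per-token vectors into a single vector. A sketch with mean pooling, using random arrays as stand-ins for transformer token outputs (the pooling step itself is real; the inputs here are made up):

```python
import numpy as np

DIM = 768  # a common embedding width

def mean_pool(token_vectors: np.ndarray) -> np.ndarray:
    """Collapse (num_tokens, DIM) per-token outputs into one (DIM,) vector."""
    return token_vectors.mean(axis=0)

rng = np.random.default_rng(1)
short_text = rng.normal(size=(5, DIM))   # stand-in for a 5-token sentence
long_text = rng.normal(size=(300, DIM))  # stand-in for a 300-token chunk

# Different input lengths, identical output shape: that fixed size is what
# lets a vector index compare any two texts geometrically.
print(mean_pool(short_text).shape)  # (768,)
print(mean_pool(long_text).shape)   # (768,)
```

Whether a model uses mean pooling, a [CLS]-token vector, or something else varies by model, but the end result is the same: every text, whatever its length, lands at a single point in the same fixed-dimensional space.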

This is fundamentally different from what generative models like Gemma 4 do. A generative model takes text in and produces text out. It can reason, follow instructions, and synthesise information. But it cannot efficiently produce fixed-size vector representations of meaning. These are different architectures solving different problems.


An embedding model produces a 1024-dimensional vector for the text 'Annual revenue was $4.2 billion.' Which of the following is true about this vector?