Enterprise RAG on Your Own Infrastructure

Chunking Strategies That Actually Work

Beyond fixed-size chunking -- semantic chunking, document-structure-aware approaches, multi-granularity indexing, parent-child patterns, synthetic query generation, and the precision-recall tradeoff.

Why fixed-size chunking is terrible

The default chunking strategy in every RAG tutorial is fixed-size: split the document every N tokens (typically 256-512), with some overlap (typically 50-100 tokens). It is simple, fast, and deterministic. It is also the single largest source of retrieval failures in production RAG systems.
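A minimal sketch of that default strategy, operating on an already-tokenized document (the token list and sizes here are illustrative stand-ins, not tied to any particular tokenizer):

```python
def fixed_size_chunks(tokens, chunk_size=512, overlap=50):
    """Split a token sequence into fixed-size windows with overlap.

    Each window starts (chunk_size - overlap) tokens after the
    previous one, so consecutive chunks share `overlap` tokens.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

# Stand-in for a 1,200-token document.
tokens = list(range(1200))
chunks = fixed_size_chunks(tokens)
print(len(chunks))  # 3 windows: tokens 0-511, 462-973, 924-1199
```

Note that nothing in this loop looks at the content of the tokens: the boundaries at 462 and 924 fall wherever the arithmetic puts them, regardless of sentence, clause, or section structure.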

Here is why. A 512-token fixed-size chunk is an arbitrary window that has no relationship to the document's semantic structure. Consider a legal contract:

...The Vendor shall deliver all equipment specified in Exhibit B
within thirty (30) calendar days of the Effective Date, subject to
[--- CHUNK BOUNDARY ---]
force majeure conditions as defined in Section 14.2. Failure to
deliver within the specified timeframe shall result in liquidated
damages of 0.5% of the total contract value per day of delay...

Chunk 1 contains the delivery obligation but not the penalty. Chunk 2 contains the penalty and force majeure reference but not the delivery timeframe. Neither chunk, on its own, answers the question "What happens if the vendor delivers late?" The semantic unit -- the complete delivery clause -- has been split across two chunks.

Overlap (repeating the last 50-100 tokens of each chunk at the start of the next) partially mitigates this, but it is a crude fix. The overlap might capture the tail of the delivery obligation in chunk 2, but it might not. And overlap increases your total vector count (and therefore storage cost and search time) by 10-30% without any guarantee of capturing the right information.
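The 10-30% overhead figure follows directly from the chunking arithmetic. A quick sketch of the calculation (the 10M-token corpus size is an illustrative assumption):

```python
import math

def overlap_overhead(total_tokens, chunk_size, overlap):
    """Fractional increase in chunk count caused by overlap,
    relative to non-overlapping fixed-size chunking."""
    without_overlap = math.ceil(total_tokens / chunk_size)
    with_overlap = math.ceil(total_tokens / (chunk_size - overlap))
    return with_overlap / without_overlap - 1

# Hypothetical 10M-token corpus, 512-token chunks.
print(f"{overlap_overhead(10_000_000, 512, 50):.0%}")   # 11% more vectors
print(f"{overlap_overhead(10_000_000, 512, 100):.0%}")  # 24% more vectors
```

Every one of those extra vectors is embedded, stored, and scanned at query time, which is why the overhead shows up in both storage cost and search latency.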

At enterprise scale, fixed-size chunking compounds these problems. A 50,000-page corpus might produce 500,000 chunks, of which 20-40% have semantic breaks at chunk boundaries. That is 100,000-200,000 chunks that are degraded retrieval units. Every one of them is a potential missed answer.


Your RAG system uses 512-token fixed-size chunks with 50-token overlap. Users report that the system can answer questions about individual facts but fails on questions requiring understanding of complete clauses or procedures. What is the most likely cause?