The Vecta team tested 7 AI document chunking methods across 905,746 tokens, and the conclusion was surprising: the "dumbest" method—recursive splitting by newline—achieved 69% accuracy, far surpassing so-called "intelligent semantic chunking." Enterprise RAG knowledge bases built at great expense are likely failing at the most basic step: chunking.

What This Is

RAG (Retrieval-Augmented Generation, the technique of having an LLM retrieve from internal enterprise documents before answering questions) is currently the mainstream approach for enterprise AI adoption. But an LLM can't process a document dozens of pages long in one pass; the document must first be sliced into smaller pieces, a step called chunking.

We've noticed that many developers spend copious time tuning prompts or switching to more expensive models, yet treat chunking strategy as an afterthought: cut at 512 characters with 50 characters of overlap and call it done. This is the root cause of poor RAG quality. When text is compressed into embeddings (mathematical vectors used for similarity comparison), mixing multiple topics in one chunk dilutes the semantics; if a contract's parties and dates land in different chunks, the structural information is lost entirely. Chunking quality sets the ceiling for every downstream optimization.
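To make the "afterthought" concrete, here is a minimal sketch of the naive fixed-size approach described above; the function name `fixed_size_chunks` is our own illustration, not from any particular library. Note how it cuts purely by character count, oblivious to sentence or clause boundaries:

```python
def fixed_size_chunks(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Naive chunking: slice every `size` characters, stepping back
    `overlap` characters so adjacent chunks share a little context."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

The overlap softens the damage at cut points but cannot prevent it: a contract clause that straddles a 512-character boundary still ends up split across two chunks, with only 50 characters of shared context to reconnect it.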

Industry View

The industry is converging on a consensus: when it comes to chunking, simple often beats fancy. Vecta's benchmark shows that Recursive (splitting recursively at natural boundaries like newlines and periods) holds first place at 69% accuracy, while the theoretically superior Semantic Chunking (splitting at semantic-similarity breakpoints) managed only 54%. The latter requires an embedding API call at every candidate split point: costly, slow, and prone to producing chunks of wildly inconsistent sizes, which drags down real-world performance.
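The recursive idea is simple enough to sketch in a few lines. This is our own simplified illustration (real splitters, such as LangChain's RecursiveCharacterTextSplitter, additionally merge small pieces back up toward the size limit): try the coarsest separator first, and only fall back to finer ones when a piece is still too long.

```python
def recursive_split(text: str, max_len: int = 512,
                    separators=("\n\n", "\n", ". ", " ")) -> list[str]:
    """Recursive splitting: if the text fits, keep it whole; otherwise
    split on the coarsest separator present and recurse on each piece."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            chunks = []
            for piece in text.split(sep):
                chunks.extend(recursive_split(piece, max_len, separators))
            return chunks
    # No separator found: hard-cut as a last resort.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Because paragraph breaks are tried before sentence breaks, and sentence breaks before word breaks, chunks tend to end at natural boundaries, which is exactly the property the benchmark rewards.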

In production environments, the industry increasingly recommends the Parent-Child chunking strategy: use small chunks for precise retrieval, then hand the LLM the full parent chunk as context whenever a child chunk is hit. But we should pay attention to sharp dissenting voices on Reddit, which point out that current chunking optimizations merely cater to the convenience of vectorization and don't align with how humans actually use documents. No form of automated chunking can truly understand the structural logic of specific business documents; over-indexing on chunking algorithms while neglecting the document's own formatting and metadata is putting the cart before the horse.
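A minimal sketch of the parent-child pattern, under our own assumptions: parents are paragraphs, children are fixed-size slices of each parent, and simple word overlap stands in for embedding similarity (a real system would score children with vector search). All names here (`build_index`, `retrieve`, `Child`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Child:
    text: str       # small slice used as the retrieval key
    parent_id: int  # index of the parent chunk it belongs to

def build_index(document: str, child_size: int = 100):
    """Split into parent chunks (paragraphs), then slice each parent
    into small child chunks that point back to their parent."""
    parents = [p for p in document.split("\n\n") if p.strip()]
    children = []
    for pid, parent in enumerate(parents):
        for i in range(0, len(parent), child_size):
            children.append(Child(parent[i:i + child_size], pid))
    return parents, children

def retrieve(query: str, parents: list[str], children: list[Child]) -> str:
    """Score children by word overlap with the query (a crude stand-in
    for embedding similarity), then return the best child's full parent."""
    q = set(query.lower().split())
    best = max(children, key=lambda c: len(q & set(c.text.lower().split())))
    return parents[best.parent_id]
```

The small child chunk gives the matcher a tight, single-topic target, while the returned parent restores the surrounding context the LLM needs to answer accurately.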

Impact on Regular People

For enterprise IT: when procuring AI knowledge bases, don't be swayed by buzzwords like "intelligent semantic chunking." Instead, evaluate whether the vendor lets you inspect and tune chunking based on a document's heading hierarchy and paragraph structure.

For individual professionals: when using AI to process long documents, format your documents well first (use more line breaks and hierarchical headings). The clearer your document structure, the more accurate the AI's chunking—and the more reliable its answers.

For the consumer market: if AI reading assistants frequently "hallucinate," it's often not because the model is dumb—it's because the backend chunked your document at the wrong places, losing critical context.