A common view in AI engineering circles is that about 90% of RAG (Retrieval-Augmented Generation, where an LLM queries enterprise data before answering) projects deliver poor results, and that the bottleneck is not the LLM itself but a retrieval layer that fails to feed it the right data.

What This Is

We have noticed that when an AI knowledge base gives irrelevant answers, most developers' first reaction is to swap in a larger model or tweak the prompt. That treats the wrong problem: in a typical RAG architecture, the component that is actually failing is the retrieval phase.

First, vector similarity (a measure of how semantically close two pieces of text are) is not the same as relevance. When a user asks "how to handle API timeouts," the system retrieves many passages about "what is a timeout," which are semantically similar but useless for solving the problem. Second, stuffing the LLM with large amounts of loosely related content is not providing context; it forces the model to guess at answers amid noise, which raises both cost and latency. Finally, the vast majority of projects chunk documents by a fixed character count, ignoring paragraph logic and breaking context across chunks.
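The chunking problem in the last point can be made concrete. The following is a minimal sketch, not a production splitter: `fixed_chunks` and `paragraph_chunks` are hypothetical helper names, the sample text is invented, and the 60-character limit is an arbitrary value chosen only to make the difference visible.

```python
def fixed_chunks(text: str, size: int) -> list[str]:
    """Split every `size` characters, ignoring paragraph boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str, max_chars: int) -> list[str]:
    """Pack whole paragraphs into chunks of at most `max_chars` characters.

    A paragraph longer than `max_chars` is kept whole rather than cut,
    which is a simplification made for this sketch.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Invented two-paragraph document (an assumption for illustration).
doc = (
    "API timeouts usually mean the upstream service is slow.\n\n"
    "To handle them, retry with exponential backoff and set a deadline."
)

fixed = fixed_chunks(doc, 60)            # cuts the second paragraph mid-sentence
by_paragraph = paragraph_chunks(doc, 60)  # one chunk per paragraph
```

With the fixed splitter, the first chunk ends partway into the second paragraph, so the advice about retries is separated from its opening words. The paragraph-aware version keeps each thought intact, at the price of uneven chunk sizes.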

The industry has already worked out remedies. Hybrid Search (combining semantic search with BM25 keyword matching) addresses exact terms and names that semantic search alone fails to find. Parent-Child Chunking (retrieving with small text chunks and, on a hit, returning the larger parent chunk to supply context) addresses information fragmentation. HyDE (Hypothetical Document Embeddings: having the AI generate a hypothetical answer first, then using it to search for real documents) handles colloquial user queries.
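As a rough illustration of how hybrid search and parent-child chunking can fit together, here is a small sketch under several assumptions. The article does not say how the two rankings are merged; reciprocal rank fusion (RRF) is used here as one common choice, with `k=60` as a conventional constant. The chunk texts, ids, and helper names are all hypothetical. The dense ranking is hard-coded to stand in for an embedding search, and no embedding model is called. HyDE is not shown.

```python
import math
from collections import Counter

# Hypothetical child chunks: child id -> (parent id, text).
CHILDREN = {
    "c1": ("p1", "a timeout means the server did not respond in time"),
    "c2": ("p2", "handle API timeouts with retries and exponential backoff"),
    "c3": ("p2", "set a request deadline so API calls fail fast"),
    "c4": ("p3", "rotate API keys every ninety days"),
}


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_ranking(query: str, docs: dict[str, str],
                 k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank doc ids by BM25 score, best first; zero-score docs are dropped."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    avg_len = sum(len(t) for t in tokens.values()) / len(tokens)
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    scores: dict[str, float] = {}
    for doc_id, terms in tokens.items():
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log((len(tokens) - n + 0.5) / (n + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a doc's score."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def expand_to_parents(child_ids: list[str],
                      children: dict[str, tuple[str, str]]) -> list[str]:
    """Map child hits to their parent ids, keeping first-seen order."""
    parents: list[str] = []
    for child_id in child_ids:
        parent_id = children[child_id][0]
        if parent_id not in parents:
            parents.append(parent_id)
    return parents


query = "handle API timeouts"
keyword_ranking = bm25_ranking(
    query, {cid: text for cid, (_, text) in CHILDREN.items()}
)
# Assumed result of an embedding search (hard-coded): the "what is a timeout"
# chunk ranks first because it is semantically close to the query.
dense_ranking = ["c1", "c2", "c3"]

fused = rrf_fuse([keyword_ranking, dense_ranking])
parents = expand_to_parents(fused[:2], CHILDREN)
```

In this toy setup the definition-style chunk `c1` tops the assumed dense ranking but gets no keyword support, so fusion pushes the chunks that actually mention handling API timeouts ahead of it. The top two fused hits share the same parent, so only that one parent would be passed to the LLM.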

Industry View

We believe the industry is shifting from "LLM worship" to "data engineering worship." Rather than obsessing over model parameters, it is better to spend effort solidifying the retrieval pipeline; this is the true leverage point for reducing costs and increasing efficiency.

However, there are obvious dissenting voices regarding this approach: complicating the retrieval layer will significantly drive up engineering costs. Introducing hybrid search, reranking, and query rewriting means increased system latency, with each step potentially requiring extra model calls, making maintenance far harder than anticipated. Some architects point out that for clearly structured internal documents, a simple keyword search plus a bit of fine-tuning might perform no worse than fancy retrieval strategies; over-engineering can easily become a new trap that drags down project progress.

Impact on Regular People

For enterprise IT: Stop fixating solely on procuring the most expensive LLM. About 80% of whether an AI deployment succeeds comes down to how well your data cleaning and retrieval pipelines are built.

For individual careers: The premium for merely knowing how to prompt LLMs is fading; people who can organize messy enterprise data into "AI-friendly" structures will become more valuable.

For the consumer market: Enterprise knowledge base products will shift from "sounds smart but constantly fails" to "rigid but accurate," and B2B AI software that actually helps users solve specific problems will become more common.

90% of Enterprise AI Knowledge Base Failures Lie in Retrieval, Not LLMs