AI Infrastructure

Retrieval's Reckoning

Seventy-two percent of enterprise RAG projects stalled in 2025. Meanwhile, agentic RAG, graph retrieval, and multimodal search are quietly rewriting the playbook. Is RAG dead—or just growing up?

01

The Brain Gets a Memory: Opus 4.6 Goes All-In on Agentic RAG

Here's the tell that RAG isn't dead: Anthropic just spent considerable engineering effort making its flagship model better at it. Claude Opus 4.6, now generally available on Azure Databricks, is explicitly tuned for "Agentic RAG" workflows—the model doesn't just consume retrieved documents, it reasons about whether the retrieval was good enough, requests more context when it isn't, and cross-validates conflicting sources before answering.

The early numbers are striking: a 40% reduction in reasoning errors when synthesizing conflicting information from retrieved documents. That's not an incremental improvement. That's the difference between a system that confidently parrots the first result and one that actually thinks about what it read.

The shift here is architectural, not cosmetic. We're moving from "chat with your documents" (the 2023 pitch that launched a thousand failed demos) to autonomous agents that treat retrieval as one tool among many. The LLM decides when to retrieve, what to retrieve, and whether to trust what came back. If RAG were truly dead, the most capable model lab in the world wouldn't be optimizing for it. They'd be optimizing around it.
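The retrieve-check-expand loop described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: the retriever is a toy word-overlap ranker, and `is_sufficient` stands in for the model's own judgment about whether the context covers the question. All function names are hypothetical.

```python
# Sketch of an agentic retrieval loop: retrieve, self-check, widen if needed.
# The real system would use a vector store and the model's own sufficiency
# judgment; both are mocked here with simple word-overlap heuristics.

def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_terms & set(d.lower().split())))
    return scored[:k]

def is_sufficient(query, docs):
    """Stand-in for the model's self-check: does the context cover every query term?"""
    terms = set(query.lower().split())
    text = " ".join(docs).lower()
    return all(t in text for t in terms)

def agentic_answer(query, corpus, max_hops=3):
    """Retrieve, check, and widen the search until the context looks adequate."""
    k = 2
    docs = retrieve(query, corpus, k)
    for _ in range(max_hops):
        if is_sufficient(query, docs):
            break          # hand these to the generator
        k += 2             # not enough evidence: request more context
        docs = retrieve(query, corpus, k)
    return docs
```

The point of the loop is the control flow, not the heuristics: the model, not the pipeline, decides when retrieval stops.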

Timeline showing RAG evolution from Naive RAG in 2023 to Multimodal Agentic RAG in 2026
The evolution of RAG: each generation adds a layer of intelligence to the retrieval process, from simple vector search to fully autonomous, multi-hop agent orchestration.
02

Forget Fishing—Retrieval Is Now Cartography

The single biggest knock against RAG has always been multi-hop reasoning. Ask a question that requires connecting facts from three different documents and traditional vector search falls on its face—it finds the most similar chunk, not the most useful path through your knowledge. Tracert-RAG changes the metaphor entirely: retrieval isn't fishing (cast a query, hope for a match), it's navigation.

The research team treats retrieval as a graph traversal problem, using a "direction-aware" algorithm that hops between related documents the way you'd navigate a city map. When the system encounters a question like "Which EU regulations affect the supply chain of the company that acquired the chip manufacturer mentioned in last week's earnings call?"—a question no single document can answer—it traces a path through the knowledge graph, collecting evidence at each stop.

This hit state-of-the-art on multi-hop QA benchmarks, beating standard semantic search by a significant margin. The implication: the "RAG can't reason" crowd was critiquing a specific implementation (naive vector search), not the architecture. Fix the retrieval, fix the reasoning. The authors' framing says it best: "We propose viewing retrieval not as fishing for a match, but as navigating a map of knowledge."
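The "navigating a map" idea reduces, at its simplest, to path-finding over a document graph. The sketch below uses plain breadth-first search over a dictionary of document links; Tracert-RAG's actual direction-aware algorithm is more sophisticated, and the graph here is a hypothetical stand-in for a real knowledge graph.

```python
# Multi-hop retrieval as graph traversal: hop between linked documents,
# collecting the path of evidence rather than a single best-matching chunk.
from collections import deque

def trace_path(graph, start, target, max_hops=4):
    """Breadth-first search from a seed document to one that can answer the question."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path                    # the evidence chain, in hop order
        if len(path) > max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None                            # no chain within the hop budget
```

For the supply-chain question above, the returned path would be the chain earnings call → acquisition filing → chip maker profile → EU regulation: every intermediate document is evidence, not noise.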

03

Open Science Gets an Open RAG Model—and It Beats GPT-4o

If you want proof that RAG is the right architecture for accuracy-critical domains, look at OpenScholar. Built by AI2 and the University of Washington, it's a fully open-source RAG model optimized for scientific literature—and it outperforms GPT-4o and Claude 3.5 Sonnet on scientific QA benchmarks by grounding every claim in specific passages from a 45-million-paper index.

The key insight is that for domains where being wrong is unacceptable—medical research, legal analysis, scientific review—you cannot rely on a model's parametric memory. You need provenance. You need citations. You need to trace every assertion back to a source document. That's exactly what RAG provides and what pure long-context approaches sacrifice when they blur the line between "the model memorized this" and "the model retrieved this."

OpenScholar drastically reduces hallucinations by citing specific passages for every claim. This isn't a nice-to-have feature; in regulated industries, it's the difference between a tool and a liability. The "RAG is dead" argument presumes that context windows will grow forever and accuracy will magically follow. OpenScholar shows the opposite: grounding is the accuracy mechanism.
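The provenance requirement can be made concrete as a data contract: an answer is a set of claims, and each claim must carry the passage that backs it. This is an illustrative sketch of the pattern, not OpenScholar's actual schema; the class and field names are hypothetical.

```python
# A grounded-answer contract: no claim without a source passage.
from dataclasses import dataclass

@dataclass
class Claim:
    text: str        # the assertion made in the answer
    source_id: str   # which indexed passage backs it
    passage: str     # the quoted evidence itself

@dataclass
class GroundedAnswer:
    claims: list

    def is_fully_grounded(self):
        """Reject any answer that contains a claim without a citation."""
        return all(c.source_id and c.passage for c in self.claims)
```

In a regulated pipeline, an answer failing `is_fully_grounded()` would be blocked or sent back for re-retrieval rather than shown to the user; the check is what separates "the model memorized this" from "the model retrieved this."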

04

NVIDIA's Quiet Fix for RAG's Dirtiest Secret

Ask any team that's deployed RAG in production and they'll tell you the same thing: the retrieval part isn't the hard part. The data ingestion part is. Specifically, turning the mess of PDFs, PowerPoints, and scanned documents that constitute "enterprise knowledge" into something a vector database can actually work with. Nemotron Parse, NVIDIA's new document understanding tool, attacks this head-on.

Instead of treating documents as text strings to be OCR'd (the approach that turns a beautifully formatted financial table into gibberish), Nemotron Parse uses vision-language models to understand documents as visual information. Charts get parsed as data relationships. Tables retain their structure. Slide layouts inform context. Early adopter Justt reported a 25% reduction in data extraction errors for financial chargeback analysis.

This matters because the 72% failure rate in enterprise RAG (more on that below) isn't primarily a model problem or a retrieval algorithm problem. It's a garbage-in, garbage-out problem. If the chunks you're searching over are mangled representations of source documents, no amount of sophisticated re-ranking will save you. NVIDIA is building the infrastructure layer that RAG has always needed—and it's betting that the investment is worth it.
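The table-mangling problem is easy to see in miniature. The contrast below is illustrative of the general failure mode, not of Nemotron Parse's internals: naive extraction flattens cells into a string and destroys the row-column relationships, while structure-aware parsing keeps each value attached to its header.

```python
# Why structure-preserving parsing matters for retrieval quality.

def flatten_table(rows):
    """Naive OCR-style extraction: the cell order survives, the relationships don't."""
    return " ".join(cell for row in rows for cell in row)

def parse_table(rows):
    """Structure-aware extraction: each data row keyed by its column header."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

A chunk built from `flatten_table` forces the model to guess which number belongs to which quarter; a chunk built from `parse_table` makes the relationship explicit before anything reaches the vector database.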

05

Cache vs. Retrieve: The Real Debate That "RAG Is Dead" Missed

The "RAG is dead" discourse got a new protagonist this week: Cache-Augmented Generation (CAG). The pitch is seductive—if your dataset fits within a model's context window (roughly under 1 million tokens for current frontier models), why bother with the complexity of retrieval? Just pre-load everything into the cache and let the model figure it out. Faster latency. Simpler architecture. Higher accuracy on contained datasets.

And for certain use cases, the proponents are right. A company FAQ with 500 entries? A product manual that's 200 pages? An internal style guide? Cache it. The overhead of building and maintaining a retrieval pipeline for a static, mid-sized corpus is genuine complexity you don't need.

Horizontal bar chart comparing RAG and CAG across six dimensions: latency, cost, dynamic data, scale, accuracy, and setup complexity
RAG dominates on scale and dynamic data handling; CAG wins on latency and setup simplicity. The right choice depends on your data characteristics, not ideology.

But here's what the "CAG kills RAG" take misses: scale, cost, and freshness. Pre-loading a million tokens into context for every query isn't free—it's expensive per-query compute. Your data changes? You rebuild the entire cache. Your corpus grows past the context window? You're back to retrieval. CAG is a legitimate optimization for a specific niche, not a replacement for RAG at enterprise scale. The real insight isn't "RAG vs. CAG." It's knowing which tool fits which problem—and most interesting problems require both.
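The "which tool fits which problem" decision can be captured as a first-cut heuristic. This is a deliberately crude sketch based on the two criteria the article names, corpus size versus context window and data freshness; real deployments would also weigh per-query cost, latency targets, and access control.

```python
# First-cut router between cache-augmented and retrieval-augmented generation.

def choose_strategy(corpus_tokens, updates_per_day, context_limit=1_000_000):
    """Pick CAG only when the corpus fits in context and rarely changes."""
    if corpus_tokens <= context_limit and updates_per_day == 0:
        return "CAG"   # pre-load everything into the context cache
    return "RAG"       # retrieve per query: scales past the window, handles churn
```

A 200-page product manual routes to CAG; a growing, daily-updated enterprise corpus routes to RAG, and many real systems will cache the stable core while retrieving over the rest.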

06

The 72% Failure Rate and the €20 Million Success Story

Let's reckon with the uncomfortable number: 72% of enterprise RAG pilots initiated in 2025 either stalled or were abandoned before reaching production. The primary failure modes are depressingly familiar—poor data quality, latency that made users switch back to Google, and hallucinations that eroded trust faster than any demo could build it.

Bar chart showing 28% abandoned before pilot, 44% stalled in pilot, 18% reached production, and 10% delivering ROI
Only 10% of enterprise RAG projects from 2025 are delivering measurable ROI—but the winners are seeing transformative returns. The gap between demo and production remains the industry's central challenge.

But that's only half the story. A major European bank using Squirro's RAG platform for compliance automation revealed this week that it saved €20 million over three years and redeployed the equivalent of 36 full-time employees. The system doesn't just find regulatory documents—it reasons about compliance gaps, acting as what the bank called "a tireless junior auditor." That's not a stalled pilot. That's an enterprise transformation.

The lesson isn't that RAG doesn't work. As the Codewave report puts it: "The gap between a working demo and a production RAG system is not code; it's data hygiene and evaluation infrastructure." Add organizational patience to that list. The teams that invested in those foundations are the 10% collecting outsized returns. The easy phase is over. The hard, valuable phase has begun.
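Evaluation infrastructure doesn't have to start big. A sketch of the most basic building block, retrieval recall@k, shows the kind of metric the successful 10% track continuously; the function below is a standard definition, not tied to any particular vendor's tooling.

```python
# Retrieval recall@k: of the documents known to be relevant to a query,
# what fraction appear in the top-k retrieved results?

def recall_at_k(retrieved, relevant, k):
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)
```

Run over a labeled set of a few hundred real user queries, a metric this simple is often enough to catch the data-quality regressions that quietly kill pilots.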

The Verdict: Not Dead, Just Shedding Its Skin

RAG in 2023 was a party trick—dump some vectors in a database, slap a chat interface on it, call it enterprise AI. That version of RAG deserved to die. But what's emerging in its place—agentic orchestration, graph-based multi-hop retrieval, multimodal document understanding, rigorous evaluation frameworks—is something fundamentally more capable. The question was never "RAG or no RAG." It was always "naive RAG or serious RAG." The serious version is just getting started.