Enterprise RAG on Your Own Infrastructure

Knowledge Graphs from Documents

Why vector search alone fails for relationship queries, entity and relationship extraction with Gemma 4, graph storage options, combining graph traversal with vector search, and keeping the graph fresh.

Where vector search fails

Vector search finds documents that are semantically similar to a query. This is powerful for most questions, but it fails systematically for a specific category: relationship queries.

Consider these questions:

  • "Who approved the change order that introduced the $2M liability?"
  • "Which contracts reference the subsidiary that was acquired in 2024?"
  • "Show me all projects managed by people who report to Sarah Chen."
  • "What vendors are connected to the procurement irregularity flagged in the audit?"

These questions are not about finding a document; they are about traversing relationships between entities. The answer is not in any single chunk. It emerges from connecting information scattered across multiple documents.

Vector search cannot traverse relationships. It can find documents that mention "change order" and documents that mention "$2M liability," but it cannot connect them through the approval chain. A query like "Who approved the change order that introduced the $2M liability?" requires:

  1. Finding the change order that introduced the $2M liability (which might be in one document)
  2. Finding who approved that specific change order (which might be in a different document)
  3. Connecting the two through a shared identifier (the change order number)
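
The three steps above can be sketched as a two-hop lookup over extracted facts. This is a minimal illustration, not a production design: the triples, the change order numbers, the names, and the relation labels `INTRODUCED` and `APPROVED` are all hypothetical, and a real system would hold them in a graph store rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical facts, as (subject, relation, object) triples.
# Assume the INTRODUCED facts came from one document (e.g. a risk memo)
# and the APPROVED facts from another (e.g. an approval log).
TRIPLES = [
    ("CO-1042", "INTRODUCED", "liability:$2M"),
    ("CO-1017", "INTRODUCED", "liability:$150K"),
    ("J. Alvarez", "APPROVED", "CO-1042"),
    ("M. Okafor", "APPROVED", "CO-1017"),
]


def build_index(triples):
    """Index subjects by (relation, object) for reverse lookups."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(relation, obj)].append(subject)
    return index


def approvers_for_liability(liability, index):
    """Two-hop traversal: liability -> change order -> approver."""
    approvers = []
    # Step 1: find the change order(s) that introduced the liability.
    for change_order in index.get(("INTRODUCED", liability), []):
        # Steps 2 and 3: the change order number is the shared identifier
        # that links the liability to the approval record.
        approvers.extend(index.get(("APPROVED", change_order), []))
    return approvers


index = build_index(TRIPLES)
print(approvers_for_liability("liability:$2M", index))
```

The join happens on the exact identifier `CO-1042`, not on how similar two passages are. A similarity search could retrieve both source documents and still leave the link between them unmade.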

This is a graph problem, not a similarity problem. And for enterprises with complex organisational structures, contractual relationships, and regulatory obligations, these graph queries are some of the highest-value questions the RAG system needs to answer.

A compliance officer asks: "Which of our active contracts were signed by employees who have since left the company, and do any of those contracts have upcoming renewal dates?" Why can't standard vector search answer this effectively?