GraphRAG: Using the Power of Knowledge Graphs to Improve Retrieval and Generation

Let’s talk about the new graph-based approach to processing complex datasets for context-rich LLM responses

Retrieval-augmented generation (RAG) has been lauded as a technique for getting better answers from LLMs by providing context from a reliable and controlled data source. But as RAG has made its way into more and more production environments, the limitations of baseline RAG have started to show. Its answers are still based on the verbatim text in the underlying data, and it often cannot draw the general, high-level conclusions about a dataset that a human would.

Microsoft recently open-sourced GraphRAG, an approach that extends RAG with knowledge graphs to address these limitations.

By applying graph-based techniques at both indexing and query time, GraphRAG is able to return much more informative and contextually relevant answers than baseline RAG alone.

What's more, GraphRAG automates the construction of knowledge graphs using large language models (LLMs), making the concept more accessible to a broad user base and opening up the possibility of graph-based RAG to teams working on AI products.

Applications and Use Cases of GraphRAG

When given the ability to ask anything about a subject, users often start with general questions. For example, to explore a database of earnings calls (the data we use in the demo linked at the end of this article), users might ask:

  • "What companies are in the dataset?"
  • "What are the top 5 themes in the data?"
  • "Which companies are investing in AI?"

These are all questions that RAG struggles with because the answer is not stated literally anywhere in the data, but must be inferred from the knowledge base as a whole. GraphRAG, on the other hand, is excellent at answering these questions and, more generally, questions about the relationships between entities in the data.

In document collections about a particular domain or topic, many interrelated ideas and entities are often spread across documents. GraphRAG can map complex networks of information, providing a comprehensive view of the subject landscape. This holistic representation helps users find and analyze information more effectively. Some examples of such domains are:

  • Financial analysis and reporting
  • Legal document review and contract analysis
  • Medical research and literature review
  • News aggregation and summarization
  • Product reviews and sentiment analysis

GraphRAG transforms a collection of separate documents into an interconnected web of knowledge, revealing the underlying structure of information for a deeper understanding and more effective analysis. 

This makes it a powerful tool for anyone working with large, complex collections of textual information, unearthing insights that might otherwise remain hidden in the vastness of the data.

How Does GraphRAG Work?

At its core, GraphRAG works by constructing a knowledge graph from a given set of documents. This involves identifying key entities within the texts – such as people, places, concepts, or events – and representing them as nodes in a graph structure.
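
To make the idea of entities as nodes more concrete, here is a minimal sketch of such a graph structure in Python. It is only an illustration and not GraphRAG's actual data model: the KnowledgeGraph class, its method names, and the example entities are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # Hypothetical structure for illustration, not GraphRAG's own schema.
    # node name -> entity type (e.g. "company", "concept")
    nodes: dict = field(default_factory=dict)
    # node name -> list of (neighbor, relationship description)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_entity(self, name, entity_type):
        self.nodes[name] = entity_type

    def add_relationship(self, source, target, description):
        # Store the edge in both directions so either entity can be a starting point.
        self.edges[source].append((target, description))
        self.edges[target].append((source, description))

    def neighbors(self, name):
        return [target for target, _ in self.edges[name]]


# Made-up entities of the kind an earnings call might mention.
graph = KnowledgeGraph()
graph.add_entity("Acme Corp", "company")
graph.add_entity("AI", "concept")
graph.add_entity("Jane Doe", "person")
graph.add_relationship("Acme Corp", "AI", "invests in")
graph.add_relationship("Jane Doe", "Acme Corp", "is CEO of")

print(graph.neighbors("Acme Corp"))
```

Once the entities and relationships are stored this way, a question like "what is connected to Acme Corp?" becomes a lookup rather than a text search.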

In GraphRAG, and this is a major innovation, the LLM plays a crucial role in the creation of graphs. Graphs have long fascinated people because of their superior ability to represent information and complex relationships between data points. But until now, creating graphs from text documents has been difficult. This has now changed as LLMs have become much better at information extraction.

When generating an answer, rather than basing it on a few documents at a time (as normal RAG would), GraphRAG can access different levels of information about interconnections in the data. This allows it to take a bird's-eye view of the entire knowledge base while also zooming in on more granular connections between data points.

Here’s the process step by step:

Pre-processing and indexing

1. Ingest textual data and extract entities and their relationships using an external LLM like GPT-4.

2. Map entities to one another through edges that contain detailed information about their relationships. 

3. Organize relationships and entities into hierarchical "communities" – semantic clusters of related topics at varying levels of abstraction.

4. Summarize semantic concepts revealed by the clustered graph.
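
Steps 1 to 4 can be sketched roughly as follows. This is a simplified illustration under several assumptions: the triples are hard-coded where a real pipeline would have an LLM extract them, connected components stand in for a proper hierarchical community detection algorithm (so there is only one level of communities here), and the "summary" is a plain string join where a real system would ask an LLM. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of step 1: (source, relationship, target) triples.
# In a real system an LLM would extract these from the documents.
TRIPLES = [
    ("Acme Corp", "invests in", "AI"),
    ("Acme Corp", "competes with", "Globex"),
    ("Globex", "invests in", "AI"),
    ("Initech", "reports growth in", "Payments"),
]


def build_graph(triples):
    """Step 2: connect entities through edges that carry a description."""
    graph = defaultdict(dict)
    for source, relation, target in triples:
        graph[source][target] = relation
        graph[target][source] = relation
    return graph


def find_communities(graph):
    """Step 3 (simplified): treat each connected component as a community."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            # Visit neighbors that are not yet part of this community.
            stack.extend(graph[node].keys() - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(community, triples):
    """Step 4 (placeholder): join the facts; a real system would ask an LLM."""
    facts = [f"{s} {r} {t}" for s, r, t in triples if s in community]
    return "; ".join(facts)


graph = build_graph(TRIPLES)
communities = find_communities(graph)
summaries = [summarize(c, TRIPLES) for c in communities]
for members, summary in zip(communities, summaries):
    print(members, "->", summary)
```

The point of the sketch is the shape of the pipeline: the summaries are computed once at indexing time, per cluster, so they are available later without rereading the source documents.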

Retrieval and answer generation

5. Map incoming queries to the top matching context in an iterative process.

6. Pull relevant entities and relationships into the prompt, thus dramatically augmenting the context provided to the LLM with each query.

7. Use an LLM to generate summarization-focused responses based on the enriched context.
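
The query-time steps can be sketched in the same spirit. Again, this is a toy illustration rather than GraphRAG's implementation: word overlap stands in for whatever relevance matching a real system uses, the matching is a single pass rather than an iterative process, the community summaries are made up, and the function names are hypothetical. Step 7 (sending the prompt to an LLM) is left out so the example runs without any external service.

```python
import re

# Hypothetical community summaries produced at indexing time.
COMMUNITY_SUMMARIES = [
    "Acme Corp and Globex both invest in AI and compete with each other.",
    "Initech reports growth for its payments business.",
    "Umbrella Motors is expanding electric vehicle production.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_summaries(query, summaries, top_k=2):
    """Step 5 (simplified): score each summary by word overlap with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(s)), s) for s in summaries]
    # Drop summaries that share no words with the query.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


def build_prompt(query, summaries):
    """Step 6: place the retrieved graph context in front of the question."""
    context = "\n".join(f"- {s}" for s in summaries)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


query = "Which companies invest in AI?"
selected = rank_summaries(query, COMMUNITY_SUMMARIES)
prompt = build_prompt(query, selected)
print(prompt)
```

Because the context consists of summaries about groups of related entities rather than raw text chunks, the model sees an already aggregated view of the data, which is what makes high-level questions answerable.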

Benefits of GraphRAG at a Glance

Products that use GraphRAG can provide users with a more complete understanding of complex datasets at varying levels of detail. This technology enables applications to handle complex queries that traditional RAG systems struggle with, providing users with richer, more contextually relevant answers. By implementing GraphRAG, AI product teams can deliver superior results where traditional approaches fall short. Key benefits for users of GraphRAG-powered systems include:

  • Comprehensive, contextual answers across complex domains.
  • The seamless integration of information from multiple documents.
  • Advanced reasoning about entity relationships and contexts.
  • Uncovering hidden insights and connections within datasets.
  • Effective handling of abstract, high-level queries.
  • Mitigation of misinformation in noisy datasets.

It's clear that a structured, hierarchical approach to interconnected data and the entities within it is superior to a purely semantics-based approach, and that many industries, especially document-heavy ones, are going to benefit from this innovative technique for document processing.

deepset Cloud Demo of GraphRAG on Earnings Call Transcripts 

To experience the power of GraphRAG firsthand, check out our latest demo comparing quarterly earnings call transcripts from a wide array of companies across various sectors, including software, financial services, and automotive. Have a look at the side-by-side comparison of GraphRAG's graph-assisted approach versus a traditional RAG setup – and let us know what you think of the results! 🙂

What’s Next

GraphRAG is a big step forward in information retrieval technology because it bridges the gap between traditional RAG systems and graph-based knowledge structures. As we've described in this article, it handles complex knowledge base queries better than baseline RAG. This makes it useful for AI teams, ML engineers, and product managers working on advanced information retrieval systems, especially those dealing with large volumes of interconnected data.

We'll probably be hearing a lot more about GraphRAG in the coming weeks. We expect to see many advances in minimizing cost and latency in particular. The benefits of this technique for information retrieval and insight mining are immense and waiting to be explored by researchers and industry leaders alike.