Analyze Earnings Calls with AI

Learn how we set up a RAG system to extract insights from live earnings calls of the biggest companies in tech and financial services.

01.02.24

It’s earnings season on Wall Street. Top companies are reporting their Q4 2023 results, and AI is making headlines. On January 24, ServiceNow CEO Bill McDermott was the first tech CEO to announce the significant impact that AI had on their business:

“In Q4, our GenAI products drove the largest net new ACV contribution for our first full quarter of any of our new product family releases ever”, he said.

Here at deepset, we were curious about GenAI’s impact on other companies, so we went ahead and built a RAG solution covering some of the most important companies in tech and financial services. We transcribed the earnings calls using OpenAI’s Whisper model and made them available in an LLM-powered question-answering app. Now you can discover what other industry heavyweights like Google, JPMC, and Microsoft have to say about AI through our RAG app.

Intrigued? Read on to find out how we built it.

RAG: a 1-minute refresher

RAG is short for Retrieval-Augmented Generation. It’s a technique for feeding a large language model (LLM) data it wasn’t trained on. RAG is perfect for our purposes, since none of the LLMs out there have been trained on the earnings reports published in the last three weeks. To answer questions about these reports, we first use a search system to find the relevant sections across all earnings calls. We then insert these sections into the prompt for the LLM, which generates a conversational, human-like answer to your question based on them.
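
To make the pattern concrete, here is a minimal sketch of the three RAG steps: retrieve, augment the prompt, generate. It assumes the OpenAI Python SDK (v1) and a hypothetical retrieve() callable standing in for whatever search system you use; it is not the code of our app.

```python
# Minimal RAG sketch (illustrative, not our production code).
# Assumes the OpenAI Python SDK v1 and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def answer(question: str, retrieve) -> str:
    # 1. Retrieval: fetch the passages most relevant to the question.
    #    retrieve() is a hypothetical stand-in for your search system.
    passages = retrieve(question, top_k=5)

    # 2. Augmentation: paste the retrieved passages into the prompt as context.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generation: the LLM writes an answer grounded in the supplied context.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```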

Data: Transcription, Embedding, and Indexing

As always in an AI project, the first step is to get the data ready. In a search system, the data preparation stage is handled by what we call an indexing pipeline. The indexing pipeline is responsible for all data pre-processing and for storing the pre-processed data in the so-called “index” of a vector database.

Figure: the indexing pipeline. A list of earnings call URLs is passed to PyTube, which extracts the audio for each call; Whisper Large transcribes the audio; a preprocessor splits each transcript into documents; an embedder converts each document into a vector; and the vectors are written to the document store.

These are all the steps in our indexing pipeline (a code sketch follows the list):

  1. Collect URLs to earnings call recordings on YouTube.
  2. Download the audio for each URL using the Python library PyTube.
  3. Transcribe the audio using OpenAI’s Whisper model.
  4. Split the earnings call transcripts into smaller chunks (we call them documents) because that’s better for vector search.
  5. Use an embedding model to embed each document in the vector space.
  6. Write each document with its embedding and metadata (company name and stock ticker) into our document database.
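
Here is a rough sketch of these six steps in code. It uses PyTube for the download, the open-source whisper package for transcription, and open-source Haystack 2.x components for splitting, embedding, and writing; the models, splitting parameters, and the in-memory document store are illustrative assumptions rather than the exact setup behind our app.

```python
# Illustrative indexing sketch: pytube + whisper + Haystack 2.x.
# Model choices, split sizes, and the in-memory store are assumptions.
import whisper
from pytube import YouTube

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Step 1: earnings call recordings on YouTube, with company metadata.
urls = {
    "https://www.youtube.com/watch?v=<earnings-call-id>": {"company": "ServiceNow", "ticker": "NOW"},
}

# Steps 2-3: download the audio with PyTube and transcribe it with Whisper.
whisper_model = whisper.load_model("large")
docs = []
for url, meta in urls.items():
    audio_path = YouTube(url).streams.filter(only_audio=True).first().download()
    transcript = whisper_model.transcribe(audio_path)["text"]
    docs.append(Document(content=transcript, meta=meta))

# Steps 4-6: split the transcripts, embed each chunk, and write it to the store.
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
indexing = Pipeline()
indexing.add_component("splitter", DocumentSplitter(split_by="word", split_length=200, split_overlap=20))
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")

indexing.run({"splitter": {"documents": docs}})
```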

Adding the query pipeline

Now that the data preparation part of our app is ready, we need to add a query pipeline so that users can ask questions about the earnings calls. The query pipeline has two stages: retrieval and generation. The retrieval stage selects the documents to pass on to the LLM, which then generates a response based on them.

We wanted to give you the best results, so we opted for a more sophisticated hybrid retrieval setup that combines vector-based (embedding) retrieval with keyword-based (BM25) retrieval. It first fetches 40 candidate documents and then uses a ranking model to re-rank them, most relevant first. This approach performs better than either retriever on its own. We then use the top 10 documents to prompt GPT-3.5 for an answer to the user’s question. GPT-3.5 sits in a sweet spot for cost, speed, and quality, but you could pick other LLMs like Mistral 7B or Anthropic’s Claude, too.
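
Below is a hedged sketch of such a hybrid query pipeline, again built with open-source Haystack 2.x components and reusing the document_store from the indexing sketch above. The embedding model, the cross-encoder used for re-ranking, the prompt template, and the top_k values (20 + 20 candidates, top 10 after re-ranking) are illustrative assumptions, not our exact production configuration.

```python
# Illustrative hybrid query pipeline: embedding + BM25 retrieval, re-ranking, GPT-3.5.
# Reuses document_store from the indexing sketch; requires OPENAI_API_KEY.
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.joiners import DocumentJoiner
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever

template = """Answer the question using only the earnings call excerpts below.
{% for doc in documents %}
[{{ doc.meta["company"] }}] {{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

query = Pipeline()
query.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
query.add_component("embedding_retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=20))
query.add_component("bm25_retriever", InMemoryBM25Retriever(document_store=document_store, top_k=20))
query.add_component("joiner", DocumentJoiner())  # merges both candidate lists
query.add_component("ranker", TransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2", top_k=10))
query.add_component("prompt_builder", PromptBuilder(template=template))
query.add_component("llm", OpenAIGenerator(model="gpt-3.5-turbo"))

query.connect("text_embedder.embedding", "embedding_retriever.query_embedding")
query.connect("embedding_retriever", "joiner")
query.connect("bm25_retriever", "joiner")
query.connect("joiner", "ranker")
query.connect("ranker", "prompt_builder.documents")
query.connect("prompt_builder", "llm")

question = "What is the impact of AI?"
result = query.run({
    "text_embedder": {"text": question},
    "bm25_retriever": {"query": question},
    "ranker": {"query": question},
    "prompt_builder": {"question": question},
})
print(result["llm"]["replies"][0])
```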

Figure: the query pipeline. The user’s question (“What is the impact of AI?”) is sent to an embedding retriever and a BM25 retriever in parallel; their results are merged and passed to a re-ranker, and the top documents go to the LLM, which generates the final, grounded answer.

This is a rough sketch of the query system we built. The result is a nicely condensed answer generated by GPT-3.5 that is entirely grounded in our data. That’s how you work around the training data cut-off and get more recent data in front of an LLM.

Want to build your own RAG system?

We went through this whole process using our AI platform deepset Cloud. It streamlines the development life cycle by providing a unified environment, ready-made yet customizable components, and robust tooling for evaluation. Learn more about deepset Cloud in this blog post or schedule a demo with our team.