Analyze Earnings Calls with AI
Learn how we set up a RAG system to extract insights from live earnings calls of the biggest companies in tech and financial services.
01.02.24
It’s earnings season on Wall Street. Top companies are reporting their Q4 2023 results, and AI is making headlines. On January 24, ServiceNow CEO Bill McDermott became the first tech CEO to announce the significant impact AI has had on his company’s business:
“In Q4, our GenAI products drove the largest net new ACV contribution for our first full quarter of any of our new product family releases ever,” he said.
Here at deepset, we were curious about GenAI’s impact on other companies, so we went ahead and built a RAG solution covering some of the most important companies in tech and financial services. We transcribed earnings calls using OpenAI’s Whisper model and made them available in an LLM-powered question-answering app. Now you can use our RAG app to discover what industry heavyweights like Google, JPMC, and Microsoft have to say about AI.
Intrigued? Read on to find out how we built it.
RAG: a 1-minute refresher
RAG is short for Retrieval-Augmented Generation. It's a technique for feeding a large language model (LLM) data it wasn't trained on. RAG is perfect for our purposes, since no LLM has been trained on the earnings reports published in the last three weeks. To answer questions about these reports, we first use a search system to find relevant sections across all earnings calls. We then insert these sections into the prompt for the LLM. Based on that context, the LLM generates a conversational, human-like answer to your question.
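To make the idea concrete, here is a minimal illustration of that last step: the retrieved earnings-call sections are pasted into the prompt the LLM sees. The template wording and variable names are ours, not the exact prompt used in the app.

```python
# Illustrative only: how retrieved sections end up inside the LLM prompt.
retrieved_sections = ["...relevant excerpt 1...", "...relevant excerpt 2..."]
question = "What did the CEO say about GenAI revenue?"

prompt = (
    "Answer the question using only the earnings call excerpts below.\n\n"
    + "\n\n".join(retrieved_sections)
    + f"\n\nQuestion: {question}\nAnswer:"
)
# `prompt` is then sent to the LLM, which generates a grounded answer.
```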
Data: Transcription, Embedding, and Indexing
As always in an AI project, the first step is to get the data ready. In search systems, this data preparation stage is handled by what’s known as an indexing pipeline. The indexing pipeline takes care of all data pre-processing and stores the pre-processed data in the so-called “index” of a vector database.
These are all the steps in our indexing pipeline (a rough code sketch follows the list):
- Collect URLs to earnings call recordings on YouTube.
- Download the audio for each URL using the Python library PyTube.
- Transcribe the audio using OpenAI’s Whisper model.
- Split the earnings call transcripts into smaller chunks (we call them documents) because that’s better for vector search.
- Use an embedding model to embed each document in the vector space.
- Write each document with its embedding and metadata (company name and stock ticker) into our document database.
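The sketch below shows what these steps can look like in plain Python. The library and model choices (pytube, the OpenAI Whisper API, sentence-transformers) and the chunk size are illustrative assumptions, not the exact components of our production pipeline, and the records are simply returned as a list instead of being written to a vector database.

```python
# Simplified sketch of the indexing pipeline: download, transcribe, chunk, embed.
from openai import OpenAI
from pytube import YouTube
from sentence_transformers import SentenceTransformer

client = OpenAI()  # reads OPENAI_API_KEY from the environment
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

def index_earnings_call(url: str, company: str, ticker: str) -> list[dict]:
    # 1. Download the audio track of the YouTube recording.
    audio_path = (
        YouTube(url).streams.filter(only_audio=True).first().download(filename=f"{ticker}.mp4")
    )

    # 2. Transcribe it with OpenAI's Whisper model
    #    (very long recordings may need to be split to fit the API's file-size limit).
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        ).text

    # 3. Split the transcript into smaller documents (here: 200-word chunks).
    words = transcript.split()
    chunks = [" ".join(words[i : i + 200]) for i in range(0, len(words), 200)]

    # 4. Embed each chunk and attach metadata; in production, these records are
    #    written to the document database instead of returned as a list.
    embeddings = embedder.encode(chunks)
    return [
        {"content": chunk, "embedding": emb, "meta": {"company": company, "ticker": ticker}}
        for chunk, emb in zip(chunks, embeddings)
    ]
```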
Adding the query pipeline
Now that the data preparation part of our app is ready, we need to add a query pipeline so that users can ask questions about the earnings calls. The query pipeline has two steps: retrieval and generation. The retrieval step selects the documents to pass on to the LLM, which then generates a response based on them.
We wanted to give you the best results, so we opted for a more sophisticated, two-stage retrieval setup that combines vector retrieval with semantic reranking: a retriever first fetches 40 candidate documents, and a ranking model then reorders them so that the most relevant come first. This approach performs better than a single-stage retrieval system. We then use the top 10 documents to prompt GPT-3.5 for an answer to the user’s question. GPT-3.5 sits in a sweet spot for cost, speed, and quality, but you could pick other LLMs, like Mistral 7B or Anthropic’s Claude, too.
That, in rough outline, is the query system we have built. The result is a nicely condensed answer generated by GPT-3.5 that is entirely grounded in our data, which is how you work around the training data cut-off and get more recent information into an LLM.
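The following sketch shows the query pipeline under the same assumptions as the indexing sketch above: it reuses those in-memory document records in place of a real vector database, and the embedding and reranking model names are illustrative stand-ins for the components in our deepset Cloud pipeline.

```python
# Sketch of the query pipeline: retrieve 40 candidates by vector similarity,
# rerank them with a cross-encoder, and prompt GPT-3.5 with the top 10.
import numpy as np
from openai import OpenAI
from sentence_transformers import CrossEncoder, SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")
ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def query(question: str, documents: list[dict]) -> str:
    # Retrieval: rank all documents by cosine similarity to the question embedding.
    q_emb = embedder.encode(question, normalize_embeddings=True)
    doc_embs = np.array([d["embedding"] for d in documents])
    doc_embs = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = doc_embs @ q_emb
    candidates = [documents[i] for i in np.argsort(scores)[::-1][:40]]

    # Reranking: a cross-encoder scores (question, document) pairs more accurately.
    rerank_scores = ranker.predict([(question, d["content"]) for d in candidates])
    top_docs = [candidates[i] for i in np.argsort(rerank_scores)[::-1][:10]]

    # Generation: prompt GPT-3.5 with the top 10 documents as grounding context.
    context = "\n\n".join(d["content"] for d in top_docs)
    prompt = (
        "Answer the question using only the earnings call excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```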
Want to build your own RAG system?
We went through this whole process using our AI platform deepset Cloud. It streamlines the development life cycle by providing a unified environment, ready-made yet customizable components, and robust tooling for evaluation. Learn more about deepset Cloud in this blog post or schedule a demo with our team.