
Building AI Agents with LLMs

Large language models are the reasoning units behind business-ready AI agents. Learn how they drive decisions and actions in a Compound AI system.

By The deepset Team · Published on February 5, 2025 · 12 min read


What AI experts have long anticipated is now a reality: Large language models (LLMs) are mature enough to serve as the brains behind autonomous AI agents that can understand nuanced context, intelligently operate a variety of tools, and perform multi-step tasks with minimal human intervention.

Unlike other established applications of generative AI, an agent doesn't just process information in a predefined order; it actively pursues goals and decides flexibly on the right tools to achieve them. Consider, for example, a personalized coding assistant. Rather than simply suggesting code snippets, it understands the developer's project context, navigates development tools and APIs, breaks down complex programming tasks into actionable steps, and becomes an active, useful assistant that extends the developer's problem-solving capabilities.

At deepset, our work with organizations across industries gives us insight into the immense real-world value of agents. This guide provides a low-barrier introduction to current agent technology, helping business leaders and practitioners understand the possibilities and practicalities of LLM-powered agents.

What is an agent? 

Agency involves understanding a situation, weighing options, and choosing the best path forward. While humans do this naturally, creating machines that can do this has been one of the most challenging and interesting aspects of AI.

Before LLMs, AI agents were often designed for specific tasks. Chess engines, for example, learned to beat grandmasters, and robotic systems could navigate physical spaces. However, these agents could only operate within their highly specialized domains, lacked natural language interfaces, and were inaccessible to a wider user base.

LLMs have changed this landscape. LLM agents can follow natural language instructions and use their general knowledge and domain expertise to devise customized solutions to complex problems. They can break these solutions down into manageable steps using different tools, data sources, and integrations. This not only makes them more likely to achieve their goal, but it also makes them more accountable to their users, because the agent has to provide insight into the steps it takes to achieve its goal, rather than operating as a black box.

Modern AI agents are the result of some or all of the following building blocks and capabilities working together:

  • A reasoning unit (typically an LLM) that can understand context and make decisions.
  • Access to a collection of diverse tools. These can be as simple as a calculator API or as complex as a data analysis pipeline.
  • Memory systems that can maintain context across multiple interactions and handle ongoing conversations.
  • Planning and routing capabilities to create solutions, divide them into subtasks, and execute them.
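To see how these building blocks fit together, here is a minimal sketch in Python. All names are hypothetical stand-ins, not a real framework API: the `llm` callable is the reasoning unit, `tools` is the tool collection, and `memory` is a simple short-term buffer driving a decide-act loop.

```python
# Minimal agent sketch (hypothetical names, not a real framework API):
# an LLM reasoning unit, a tool registry, and short-term memory.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    llm: Callable[[str], str]                        # reasoning unit: prompt -> text
    tools: dict[str, Callable[[str], str]]           # tool name -> callable tool
    memory: list[str] = field(default_factory=list)  # short-term context

    def step(self, goal: str) -> str:
        # Ask the reasoning unit which tool to use, given the goal and history.
        prompt = f"Goal: {goal}\nHistory: {self.memory}\nTools: {list(self.tools)}\nWhich tool next?"
        choice = self.llm(prompt).strip()
        result = self.tools[choice](goal) if choice in self.tools else "no-op"
        self.memory.append(f"{choice} -> {result}")  # record the step for later turns
        return result

# Toy usage: a stand-in "reasoning unit" that always picks the calculator tool.
agent = Agent(llm=lambda p: "calculator", tools={"calculator": lambda g: "42"})
print(agent.step("What is 6 * 7?"))  # -> 42
```

A real system would replace the lambda with an actual LLM call and add planning on top, but the decide-act-record loop is the core shape.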

Zooming in

Let's take a closer look at the characteristics we've identified as contributing to agentic systems, using a hypothetical agent as a case in point.

Tools and function calling 

Tools are the interfaces through which an agent interacts with its environment. Consider a press release agent that writes and distributes company announcements. It might orchestrate the following tools:

  • A content research tool that has access to selected databases to gather product information and market context.
  • A writing assistant, consisting of an API to an LLM-powered editing tool.
  • An SEO analytics platform to optimize visibility and impact.
  • An email integration to manage press release distribution.
  • A performance analytics platform to track press release engagement.

The agent interacts with its tools through carefully defined functions (APIs), each of which performs a specific task. For the writing assistant tool, these might be:

  • check_tone: Ensures content is consistent with brand voice parameters and style guide.
  • format_quotes: Correctly formats quotes from executives or stakeholders.
  • keyword_check: Ensures press release meets SEO requirements.

Well-defined tools and APIs, combined with clear instructions on when and how to use them, give agents reliable ways to interact with their environment and accomplish specific tasks.
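To make this concrete, a tool like `check_tone` might be exposed to the LLM as a function plus a schema describing its parameters. The sketch below is hedged: the schema shape loosely follows common function-calling conventions, and exact field names vary by model provider.

```python
# Hypothetical sketch of one writing-assistant tool and its schema.
# The schema format is illustrative; real providers differ in details.

def check_tone(text: str, style_guide: str = "default") -> dict:
    """Ensure content is consistent with brand voice and style guide."""
    issues = []  # a real implementation would call a style checker here
    if "!!" in text:
        issues.append("excessive punctuation")
    return {"ok": not issues, "issues": issues}

check_tone_schema = {
    "name": "check_tone",
    "description": "Check a draft against brand voice parameters and the style guide.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {"type": "string", "description": "Draft to check."},
            "style_guide": {"type": "string", "description": "Style guide ID."},
        },
        "required": ["text"],
    },
}

print(check_tone("Big launch!!"))  # flags the double exclamation mark
```

The agent sees only the schema; when the LLM decides to call `check_tone`, the surrounding system executes the Python function and feeds the result back.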

Memory systems 

Memory provides context and continuity for agents through two distinct types: short-term memory for maintaining context within a current session, and long-term memory for retaining information across different tasks. Short-term memory acts like a context window or conversation buffer, while long-term memory typically uses vector stores or traditional databases for persistent knowledge storage.

Our press release agent uses short-term memory to keep track of its current execution state. This includes key messages used in the current draft, reviews and instructions given by a human editor, and SEO optimization cycle results. Its long-term memory stores a rich knowledge base of past releases, their performance in terms of clicks and SEO ranking, automatically derived style preferences from the editor, and feedback received between sessions. While short-term memory is already commonly used in agents, general long-term memory is less widespread in production use cases, as it can complicate the explainability of an AI system over time and can often be replaced by more specific solutions (e.g., a RAG tool).
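The two memory types can be sketched as follows. This is a deliberately simplified illustration (all names hypothetical): a bounded buffer stands in for the short-term context window, and a plain dict stands in for what would normally be a vector store or traditional database.

```python
# Hedged sketch of short-term vs. long-term agent memory (hypothetical names).
from collections import deque

class AgentMemory:
    def __init__(self, short_term_size: int = 10):
        self.short_term = deque(maxlen=short_term_size)  # current session only
        self.long_term: dict[str, str] = {}              # persists across sessions

    def remember(self, event: str) -> None:
        self.short_term.append(event)                    # oldest entries fall off

    def persist(self, key: str, value: str) -> None:
        self.long_term[key] = value                      # e.g. learned style preferences

memory = AgentMemory(short_term_size=2)
memory.remember("editor asked for a shorter headline")
memory.remember("SEO check passed")
memory.remember("draft v3 approved")                     # evicts the oldest event
memory.persist("style:headlines", "prefer short, active headlines")
print(list(memory.short_term))  # only the two most recent events remain
```

The eviction behavior mirrors a context window filling up; a production long-term store would add embedding-based retrieval rather than exact-key lookup.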

How agents work: routing, planning, and looping 

Agents combine several key mechanisms to tackle complex tasks effectively: 

  • Planning helps them break big tasks down into small, easy-to-do steps. They use different planning methods, like chain-of-thought (developing steps one at a time) or tree-of-thought planning (exploring and evaluating different paths of action ahead of time).
  • Routing, which can be considered a subset of planning, allows them to evaluate and choose among multiple possible actions based on context and goals, introducing genuine branching into their execution paths. 
  • Reflection allows them to analyze intermediate results during execution and derive next steps for improvement.
  • Looping: Unlike linear systems, agents can also loop back to revisit and refine steps until they achieve the desired results.
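The reflection-and-looping pattern above can be sketched as a simple draft-score-revise loop. Everything here is a stand-in (the scoring rule is a toy placeholder for a real reflection step, and the names are hypothetical), but the control flow is the point: the agent loops until a quality threshold is met or a retry budget runs out.

```python
# Sketch of a reflection loop: draft, score, and revise until good enough.
# The drafting and scoring functions are toy stand-ins for LLM calls.

def run_loop(topic: str, threshold: float = 0.9, max_iters: int = 5) -> tuple[str, float]:
    score, revisions, text = 0.0, 0, ""
    while score < threshold and revisions < max_iters:
        text = f"{topic} (rev {revisions})"   # stand-in for an LLM drafting call
        score = 0.5 + 0.25 * revisions        # stand-in for a reflection/quality score
        revisions += 1                        # loop back and refine if below threshold
    return text, score

final_text, final_score = run_loop("Product launch press release")
print(final_text, final_score)
```

Capping `max_iters` matters in practice: without a budget, a reflection loop on a hard task can run indefinitely.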

Our press release agent demonstrates how these mechanisms work together: It plans the path from brief to distribution, adjusting its approach when competitive research suggests new angles. It makes routing decisions about which content generation approach fits the product type and which distribution channels match the company's target audience. Throughout the process, it runs internal reflection loops: checking against style guides, validating technical claims, and optimizing SEO until quality thresholds are met.

These combined capabilities enable agents to handle complex, open-ended tasks by continuously evaluating progress and adjusting their approach based on interim findings.

Building agents within the Compound AI framework

At deepset, we build Compound AI systems that combine multiple AI components, each designed for a specific task, to solve complex problems. This modular approach is perfect for agents because it

  • Equips agents with a wide variety of tools (ready-made from the vast OSS ecosystem or custom, user-defined).
  • Allows tools to be reused across multiple agents.
  • Provides well-defined, atomic components and tools that set clear limits on the agent's powers and make each of its executions explainable by creating granular traces of its activity.

Plus, the Compound AI framework helps clarify one of the more controversial topics in the agent discussion: when a system qualifies for the term "agent" in the first place:

One of the best known system designs for using generative AI is retrieval augmented generation (RAG), which has a document retrieval component that comes before an LLM component. Such a system can be easily extended to adapt to changing business needs. For example, one could add external API calls to expand the system's ability to query different knowledge bases. Since this introduces a routing and decision component, the question arises whether such a system should be called an agent or not.
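Such a routing extension can be sketched as a decision step in front of the retrieval component. The function names below are hypothetical stand-ins, and a real router would typically ask an LLM or a classifier to decide rather than match keywords; the point is the shape of the branch.

```python
# Hedged sketch of a routing step added in front of a RAG pipeline.
# All functions are hypothetical stand-ins for real components.

def retrieve_internal(query: str) -> str:
    return f"internal docs for: {query}"        # stand-in for the RAG retriever

def query_external_api(query: str) -> str:
    return f"external API result for: {query}"  # stand-in for an external knowledge base

def route(query: str) -> str:
    # A real router might prompt an LLM to choose; a keyword heuristic stands in here.
    if "market" in query.lower():
        return query_external_api(query)
    return retrieve_internal(query)

print(route("market size for AI agents"))
print(route("our product release notes"))
```

Adding this single branch is exactly the kind of change that nudges a linear RAG pipeline toward the agentic end of the spectrum.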

To address this issue, the AI community has begun to use the term "agentic" to describe systems on a spectrum between linearity and autonomy. Systems become more agentic as they gain decision-making capabilities. The modular nature of Compound AI allows AI builders to start simple and incrementally expand system capabilities.

Consider how our press release system might have evolved. It could have started as a RAG-based writing assistant that generates announcements by combining public documentation with corporate style prompts. Adding style checking and SEO tools introduces refinement loops, while distribution capabilities require planning and routing decisions. Each addition moves the release writer further into agentic territory.

This incremental approach helps teams make practical decisions about system architecture: how to structure prompts, what kind of monitoring to implement, how to handle errors, and when human oversight is necessary. Rather than building complex agent systems from scratch, you can start with basic components that solve your immediate problem, and then evolve the system as you gain expertise and a more refined understanding of the problem you're solving.

Evaluating agents 

Agents, with their ability to perform complex processes internally with a dynamic number of steps involved, bring a new level of complexity to the task of evaluation. Because they are more flexible in their execution than other GenAI systems, they also require a high degree of observability to ensure that they're serving their purpose. 

Evaluating agents means tracking both outcomes and process efficiency. In addition to measuring the quality of an agent's output, we also need to understand how well the agent uses its decision-making and tool use capabilities. Compound AI provides a number of approaches to do this:

  • Metrics are quantitative measurements that track the performance and efficiency of an agent system, such as completion times and accuracy of output. They provide concrete data points to evaluate how well the system is meeting its objectives and where improvements may be needed.
  • Logging captures detailed records for each event in the agent system, from LLM calls through tool selection and validation, allowing teams to identify specific points of failure or inefficiencies and make targeted improvements.
  • Tracing visualizes the detailed flow of information through the system for one particular execution of the agent, showing how different decisions and actions affect each other and helping teams understand how data is passed between different components and steps in the process. It’s commonly used as a debugging tool during the development process of an agent.
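A minimal version of such instrumentation might look like the sketch below: every event is stamped with a shared trace ID so that one agent execution can be reconstructed end to end. The structure is hypothetical; production systems would typically use a dedicated tracing library rather than hand-rolled dicts.

```python
# Sketch of lightweight event logging and per-run tracing for an agent
# (hypothetical structure; real systems would use a tracing framework).
import json
import time
import uuid

trace_id = str(uuid.uuid4())      # one trace ID per agent execution
trace: list[dict] = []

def log_event(trace: list, component: str, event: str, **data) -> None:
    trace.append({
        "trace_id": trace_id,     # ties all events of this run together
        "ts": time.time(),
        "component": component,   # e.g. "llm", "tool:check_tone"
        "event": event,           # e.g. "call", "result", "error"
        **data,
    })

log_event(trace, "llm", "call", prompt_tokens=812)
log_event(trace, "tool:keyword_check", "result", passed=True)
print(json.dumps(trace, indent=2))
```

Because each entry carries component, event type, and timestamp, the same records serve both logging (spotting failures) and tracing (visualizing one run's flow).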

For our press release agent, key metrics include: the time for different drafting stages, the number of validation cycles needed to meet quality thresholds, and the correctness of initial routing decisions, like whether technical validation was needed upfront or whether the chosen content approach matched the announcement type.

Real-time logging and tracing of the pipeline reveals process inefficiencies, such as redundant documentation queries or sequential validation steps that could be parallelized.

The modular structure helps isolate and improve specific components. For example, if we notice the agent repeatedly circling back to query the same documentation sources, we can optimize its information gathering and validation steps.

Agentic futures

The next wave of generative AI in the enterprise is all about agents. Whether you're automating editorial workflows, building intelligent knowledge management solutions, or developing the next viral copilot, these powerful, highly specialized, autonomous systems can be customized to solve your specific business problems efficiently and creatively. 

At deepset, we empower you to harness this new class of AI through a modular approach. The deepset AI Platform lets you rapidly prototype agentic solutions, turning ideas into functional systems quickly. It lets you iterate based on real-world feedback, refining behavior and capabilities as you learn what works best for your use case. Most importantly, you can maintain and evolve your agents over time, ensuring they adapt as your business and technology needs change.

Agents hold incredible potential, but the true power lies in making them work for your unique challenges and goals. At deepset, our goal is not just to embrace the future of AI, but to put you in the driver's seat to shape it to your requirements and needs.
