What AI experts have long anticipated is now a reality: Large language models (LLMs) are mature enough to serve as the brains behind autonomous AI agents that can understand nuanced context, intelligently operate a variety of tools, and perform multi-step tasks with minimal human intervention.
Unlike other established applications of generative AI, an agent doesn't just process information in a predefined order; it actively pursues goals and flexibly decides on the right tools to achieve them. Consider, for example, a personalized coding assistant. Rather than simply suggesting code snippets, it actively understands the developer's project context, navigates development tools and APIs, breaks down complex programming tasks into actionable steps, and becomes an active, useful assistant that extends the developer's problem-solving capabilities.
At deepset, our work with organizations across industries gives us insight into the immense real-world value of agents. This guide provides a low-barrier introduction to current agent technology, helping business leaders and practitioners understand the possibilities and practicalities of LLM-powered agents.
Agency involves understanding a situation, weighing options, and choosing the best path forward. While humans do this naturally, creating machines that can do this has been one of the most challenging and interesting aspects of AI.
Before LLMs, AI agents were often designed for specific tasks. Chess engines, for example, learned to beat grandmasters, and robotic systems could navigate physical spaces. However, these agents could only operate within their highly specialized domains, lacked natural language interfaces, and were inaccessible to a wider user base.
LLMs have changed this landscape. LLM agents can follow natural language instructions and use their general knowledge and domain expertise to devise customized solutions to complex problems. They can break these solutions down into manageable steps using different tools, data sources, and integrations. This not only makes them more likely to achieve their goal, but it also makes them more accountable to their users: the agent has to provide insight into the steps it is taking to achieve its goal, rather than operating in a black box.
Modern AI agents are the result of some or all of the following building blocks and capabilities working together:
Let's take a closer look at the characteristics we've identified as contributing to agentic systems, using a hypothetical agent as a case in point.
Tools are the interfaces through which an agent interacts with its environment. Consider a press release agent that writes and distributes company announcements. It might orchestrate the following tools:
The agent interacts with its tools through carefully defined functions (APIs), each of which performs a specific task. For the writing assistant tool, these might be:
Well-defined tools and APIs, combined with clear instructions on when and how to use them, give agents reliable ways to interact with their environment and accomplish specific tasks.
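To make this concrete, here is a minimal sketch of such a tool interface in Python. The tool names and stubbed functions (`generate_draft`, `check_style`) are hypothetical illustrations for the press release agent, not an actual API; in a real system the agent's LLM would use each tool's description to decide when to call it.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """A named function the agent can invoke, with a description
    the LLM reads to decide when and how to use it."""
    name: str
    description: str
    run: Callable[..., str]

# Hypothetical writing-assistant functions, stubbed for illustration.
def generate_draft(brief: str) -> str:
    """Turn a short announcement brief into a first draft."""
    return f"DRAFT: {brief}"

def check_style(draft: str) -> str:
    """Check a draft against the corporate style guide."""
    return "OK" if draft.startswith("DRAFT:") else "REVISE"

# A registry maps tool names to implementations, giving the agent
# one reliable entry point per task.
TOOLS = {
    t.name: t
    for t in [
        Tool("generate_draft", "Create a first press-release draft from a brief.", generate_draft),
        Tool("check_style", "Validate a draft against the style guide.", check_style),
    ]
}

draft = TOOLS["generate_draft"].run("Acme launches Widget 2.0")
verdict = TOOLS["check_style"].run(draft)
```

The registry pattern keeps each tool's contract explicit, which is what lets the agent (and its developers) reason about which function fits which step.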
Memory provides context and continuity for agents through two distinct types: short-term memory for maintaining context within a current session, and long-term memory for retaining information across different tasks. Short-term memory acts like a context window or conversation buffer, while long-term memory typically uses vector stores or traditional databases for persistent knowledge storage.
Our press release agent uses short-term memory to keep track of its current execution state. This includes key messages used in the current draft, reviews and instructions given by a human editor, and SEO optimization cycle results. Its long-term memory stores a rich knowledge base of past releases, their performance in terms of clicks and SEO ranking, and style preferences automatically derived from the editor's feedback between sessions. While short-term memory is already commonly used in agents, general long-term memory is less widespread in production use cases: it can complicate the explainability of an AI system over time, and it can often be replaced by more specific solutions (e.g., a RAG tool).
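The two memory types can be sketched as follows. This is a deliberately simplified illustration: the rolling buffer stands in for a context window, and the key-value store stands in for a vector store or database; the keys and contents are invented for the press release example.

```python
from collections import deque

class ShortTermMemory:
    """Rolling buffer of the current session, capped like a context window:
    the oldest turns fall out when the buffer is full."""
    def __init__(self, max_turns: int = 10):
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, content: str) -> None:
        self.turns.append((role, content))

    def context(self) -> str:
        return "\n".join(f"{role}: {content}" for role, content in self.turns)

class LongTermMemory:
    """Persistent store standing in for a vector store or database,
    surviving across sessions."""
    def __init__(self):
        self._store = {}

    def save(self, key: str, value: str) -> None:
        self._store[key] = value

    def recall(self, key: str, default=None):
        return self._store.get(key, default)

stm = ShortTermMemory(max_turns=2)
stm.add("editor", "Shorten the headline.")
stm.add("agent", "Headline shortened to eight words.")

ltm = LongTermMemory()
ltm.save("style:headline_length", "max 8 words")
```

The cap on the short-term buffer is what distinguishes it from the persistent store: once the session moves on, only what was explicitly saved to long-term memory survives.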
Agents combine several key mechanisms to tackle complex tasks effectively:
Our press release agent demonstrates how these mechanisms work together: It plans the path from brief to distribution, adjusting its approach when competitive research suggests new angles. It makes routing decisions about which content generation approach fits the product type and which distribution channels match the company's target audience. Throughout the process, it runs internal reflection loops: checking against style guides, validating technical claims, and optimizing SEO until quality thresholds are met.
These combined capabilities enable agents to handle complex, open-ended tasks by continuously evaluating progress and adjusting their approach based on interim findings.
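A minimal sketch of one of these mechanisms, the reflection loop, might look like this. The quality check and revision step are toy stand-ins (a hypothetical "headline under 60 characters" style rule); in a real agent both would be LLM calls.

```python
def reflect_and_revise(draft, quality_check, revise, max_iters=3):
    """Reflection loop: evaluate the draft, revise based on feedback,
    and repeat until the check passes or the iteration budget runs out."""
    for i in range(max_iters):
        passed, feedback = quality_check(draft)
        if passed:
            return draft, i  # number of revision cycles needed
        draft = revise(draft, feedback)
    return draft, max_iters

# Toy style rule: the headline (first line) must be under 60 characters.
def headline_check(draft):
    headline = draft.splitlines()[0]
    return len(headline) <= 60, "Headline too long"

def shorten(draft, feedback):
    lines = draft.splitlines()
    lines[0] = lines[0][:57] + "..."
    return "\n".join(lines)

final, cycles = reflect_and_revise("X" * 80 + "\nBody text.", headline_check, shorten)
```

Returning the cycle count alongside the result is a small but useful choice: it feeds directly into the evaluation metrics discussed later, such as how many validation cycles a draft needed.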
At deepset, we build Compound AI systems that combine multiple AI components, each designed for a specific task, to solve complex problems. This modular approach is perfect for agents because it
Plus, the Compound AI framework helps clarify one of the more controversial topics in the agent discussion: when a system qualifies for the term "agent" in the first place:
One of the best-known system designs for using generative AI is retrieval-augmented generation (RAG), which has a document retrieval component that comes before an LLM component. Such a system can be easily extended to adapt to changing business needs. For example, one could add external API calls to expand the system's ability to query different knowledge bases. Since this introduces a routing and decision component, the question arises whether such a system should be called an agent or not.
To address this issue, the AI community has begun to use the term "agentic" to describe systems on a spectrum between linearity and autonomy. Systems become more agentic as they gain decision-making capabilities. The modular nature of compound AI allows AI builders to start simple and incrementally expand system capabilities.
Consider how our press release system might have evolved. It could have started as a RAG-based writing assistant that generates announcements by combining public documentation with corporate style prompts. Adding style checking and SEO tools introduces refinement loops, while distribution capabilities require planning and routing decisions. Each addition moves the release writer further into agentic territory.
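The step from linear RAG to something more agentic can be illustrated with a routing decision in front of retrieval. Everything here is stubbed and hypothetical: the keyword router stands in for an LLM-based decision, and the two sources (`pricing_api`, `docs_retriever`) are invented for illustration.

```python
def route(query: str) -> str:
    """Decide which knowledge source to query. A real system would
    ask an LLM to make this call; here a keyword rule stands in."""
    if "price" in query.lower() or "plan" in query.lower():
        return "pricing_api"
    return "docs_retriever"

def answer(query: str) -> str:
    """A linear RAG step extended with a routing decision in front
    of retrieval. Retrieval is stubbed per source."""
    source = route(query)
    retrieved = {
        "pricing_api": "Plans start at $49/month.",
        "docs_retriever": "See the setup guide.",
    }[source]
    return f"[{source}] {retrieved}"
```

The pipeline is still almost linear, but the single branching decision is what nudges it along the agentic spectrum: the system now chooses its own path instead of following a fixed one.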
This incremental approach helps teams make practical decisions about system architecture: how to structure prompts, what kind of monitoring to implement, how to handle errors, and when human oversight is necessary. Rather than building complex agent systems from scratch, you can start with basic components that solve your immediate problem, and then evolve the system as you gain expertise and a more refined understanding of the problem you're solving.
Agents, with their ability to perform complex processes internally with a dynamic number of steps involved, bring a new level of complexity to the task of evaluation. Because they are more flexible in their execution than other GenAI systems, they also require a high degree of observability to ensure that they're serving their purpose.
Evaluating agents means tracking both outcomes and process efficiency. In addition to measuring the quality of an agent's output, we also need to understand how well the agent uses its decision-making and tool use capabilities. Compound AI provides a number of approaches to do this:
For our press release agent, key metrics include: the time for different drafting stages, the number of validation cycles needed to meet quality thresholds, and the correctness of initial routing decisions, like whether technical validation was needed upfront or if the chosen content approach matched the announcement type.
Real-time logging and tracing of the pipeline reveals process inefficiencies, such as redundant documentation queries or sequential validation steps that could be parallelized.
The modular structure helps isolate and improve specific components. For example, if we notice the agent repeatedly circling back to query the same documentation sources, we can optimize its information gathering and validation steps.
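A minimal tracer along these lines might record each step the agent takes so that redundant calls and slow stages show up in the counts. The step names are invented for the press release example; a production system would use a proper observability stack rather than this sketch.

```python
from collections import Counter

class Trace:
    """Record each pipeline step with its duration, so repeated
    calls and slow stages are visible in the aggregates."""
    def __init__(self):
        self.steps = []          # ordered (step, duration) log
        self.counts = Counter()  # how often each step ran

    def record(self, step: str, duration_s: float) -> None:
        self.steps.append((step, duration_s))
        self.counts[step] += 1

    def redundant(self, threshold: int = 2):
        """Steps repeated more often than the threshold:
        candidates for caching or restructuring."""
        return [s for s, n in self.counts.items() if n > threshold]

trace = Trace()
for step, dur in [("query_docs", 0.4), ("draft", 1.2), ("query_docs", 0.4),
                  ("style_check", 0.3), ("query_docs", 0.4)]:
    trace.record(step, dur)
```

Here the repeated `query_docs` calls would surface immediately, pointing at exactly the kind of redundant documentation queries worth optimizing.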
The next wave of generative AI in the enterprise is all about agents. Whether you're automating editorial workflows, building intelligent knowledge management solutions, or developing the next viral copilot, these powerful, highly specialized, autonomous systems can be customized to solve your specific business problems efficiently and creatively.
At deepset, we empower you to harness this new class of AI through a modular approach. The deepset AI Platform lets you rapidly prototype agentic solutions, turning ideas into functional systems quickly. It lets you iterate based on real-world feedback, refining behavior and capabilities as you learn what works best for your use case. Most importantly, you can maintain and evolve your agents over time, ensuring they adapt as your business and technology needs change.
Agents hold incredible potential, but the true power lies in making them work for your unique challenges and goals. At deepset, our goal is not just embracing the future of AI—but putting you in the driver's seat to shape it to your requirements and needs.