The failure of ReAct agents gives way to a new generation of agents — and possibilities
If 2023 was the year of retrieval augmented generation, 2024 has been the year of agents. Companies all over the world are experimenting with chatbot agents, tools like MultiOn are growing by connecting agents to outside websites, and frameworks like LangGraph and LlamaIndex Workflows are helping developers around the world build structured agents.
However, despite their popularity, agents have yet to make a strong splash outside of the AI ecosystem. Few agents are taking off among either consumer or enterprise users.
How can teams navigate the new frameworks and new agent directions? What tools are available, and which should you use to build your next application? As a team that recently built its own complex agent to act as a copilot within our product, we have some insights on this topic.
Defining Agents
First, it helps to define what we mean by an agent. LLM-based agents are software systems that string together multiple processing steps, including calls to LLMs, in order to achieve a desired end result. Agents typically have some amount of conditional logic or decision-making capabilities, as well as a working memory they can access between steps.
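As a rough illustration of this definition, here is a minimal agent loop in plain Python. The LLM call is stubbed out and every name is hypothetical; a real agent would replace the stub with a model API call.

```python
def call_llm(prompt: str, memory: list) -> str:
    # Stub for a real LLM call: decide the next step from the prompt
    # and the agent's working memory.
    if "order" in prompt.lower() and not memory:
        return "lookup_order"
    return "finish"

def lookup_order(prompt: str) -> str:
    # Stand-in component; a real one might call a backend API.
    return "order 48213 is in transit"

def run_agent(user_input: str, max_steps: int = 5) -> list:
    memory = []                                  # working memory shared between steps
    for _ in range(max_steps):                   # multiple chained processing steps
        decision = call_llm(user_input, memory)  # conditional decision-making
        if decision == "finish":
            break
        memory.append(lookup_order(user_input))  # store the result for later steps
    return memory
```

The loop, the decision point, and the memory are the three ingredients the definition above calls out; everything else in this article is elaboration on how to structure them.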
Let’s dive into how agents are built today, the current problems with modern agents, and some initial solutions.
The Failure of ReAct Agents
Let’s be honest: the idea of an agent isn’t new. Countless agents were launched on AI Twitter over the last year claiming amazing feats of intelligence. This first generation consisted mainly of ReAct (reason, act) agents. They were designed to abstract away as much as possible and promised a wide range of outcomes.
Unfortunately, this first generation of agent architectures really struggled. Their heavy abstraction made them hard to use, and despite their lofty promises, they turned out not to do much of anything.
In reaction to this, many people began to rethink how agents should be structured. In the past year we’ve seen great advances, now leading us into the next generation of agents.
What is the Second Generation of Agents?
This new generation of agents is built on the principle of defining the possible paths an agent can take in a much more rigid fashion, instead of the open-ended nature of ReAct. Whether agents use a framework or not, we have seen a trend towards smaller solution spaces — aka a reduction in the possible things each agent can do. A smaller solution space means an easier-to-define agent, which often leads to a more powerful agent.
This second generation covers many different types of agents. It’s worth noting, however, that most of the agents or assistants we see today are written in code without frameworks, contain an LLM router stage, and process data in iterative loops.
What Makes Up An Agent?
Many agents have a node or component called a router that decides which step the agent should take next. The term router normally refers to an LLM or classifier making an intent decision about which path to take. An agent may return to this router continuously as it progresses through its execution, each time bringing some updated information. The router takes that information, combines it with its existing knowledge of the possible next steps, and chooses the next action to take.
The router itself is sometimes powered by a call to an LLM. Most popular LLMs at this point support function calling, where they can choose a component to call from a JSON dictionary of function definitions. This ability makes the routing step easy to initially set up. As we’ll see later however, the router is often the step that needs the most improvement in an agent, so this ease of setup can belie the complexity under the surface.
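A sketch of what such a function-calling router might look like. The function definitions use the JSON-schema style most function-calling APIs accept; `choose_function` is a stand-in for the model call, and all names here are illustrative rather than any particular vendor’s API.

```python
# Illustrative function definitions in the JSON style that
# function-calling LLM APIs typically accept.
FUNCTIONS = [
    {"name": "handle_greeting",
     "description": "Respond to a simple greeting.",
     "parameters": {"type": "object", "properties": {}}},
    {"name": "check_shipping_status",
     "description": "Look up the shipping status of an order.",
     "parameters": {"type": "object",
                    "properties": {"order_id": {"type": "string"}}}},
]

def choose_function(message: str, functions: list) -> str:
    # Stand-in for the LLM: a real router would send `functions` in the
    # API's tools/functions parameter and read the chosen name back.
    return "handle_greeting" if "hello" in message.lower() else "check_shipping_status"

def handle_greeting(message: str) -> str:
    return "Hello! How can I help?"

def check_shipping_status(message: str) -> str:
    return "Your order is in transit."

DISPATCH = {"handle_greeting": handle_greeting,
            "check_shipping_status": check_shipping_status}

def route(message: str) -> str:
    # The router picks a component, and the agent dispatches to it.
    return DISPATCH[choose_function(message, FUNCTIONS)](message)
```

The dispatch table is trivial here, which is exactly the point: the setup is easy, and the hard part is making `choose_function` reliable as the list of functions grows.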
Each action an agent can take is typically represented by a component. Components are blocks of code that accomplish a specific small task. These could call an LLM, or make multiple LLM calls, make an internal API call, or just run some sort of application code. These go by different names in different frameworks. In LangGraph, these are nodes. In LlamaIndex Workflows, they’re known as steps. Once the component completes its work, it may return to the router, or move to other decision components.
Depending on the complexity of your agent, it can be helpful to group components together as execution branches or skills. Say you have a customer service chatbot agent. One of the things this agent can do is check the shipping status of an order. To do that, the agent needs to extract an order ID from the user’s query, construct an API call to a backend system, make that call, parse the results, and generate a response. Each of those steps may be a component, and they can be grouped into a “Check shipping status” skill.
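Here is a sketch of that skill as a chain of small components. The backend call is mocked, the order-ID format is an assumption, and all names are illustrative.

```python
import re
from typing import Optional

def extract_order_id(query: str) -> Optional[str]:
    # Assume order IDs are runs of five or more digits.
    match = re.search(r"\b(\d{5,})\b", query)
    return match.group(1) if match else None

def call_shipping_api(order_id: str) -> dict:
    # Stand-in for a real backend API call.
    return {"order_id": order_id, "status": "in transit"}

def generate_response(result: dict) -> str:
    # A real agent might make an LLM call here to phrase the answer.
    return f"Order {result['order_id']} is currently {result['status']}."

def check_shipping_status(query: str) -> str:
    # The "Check shipping status" skill: extract, call, parse, respond.
    order_id = extract_order_id(query)
    if order_id is None:
        return "Could you share your order number?"
    return generate_response(call_shipping_api(order_id))
```

From the router’s point of view, the whole chain is a single action; the internal steps stay hidden inside the skill.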
Finally, many agents will track a shared state or memory as they execute. This allows agents to more easily pass context between various components.
Examples of Agent Architectures
There are some common patterns we see across agent deployments today. We’ll walk through an overview of all of those architectures in the following pieces, but the examples below are probably the most common.
In its simplest form, an agent or assistant might just be defined with an LLM router and a tool call. We call this first example a single router with functions. We have a single router (which could be an LLM call, a classifier call, or just plain code) that directs and orchestrates which function to call. The idea is that the router can decide which tool or function call to invoke based on input from the system. The name comes from the fact that there is only one router in this architecture.
A slightly more complicated assistant we see is a single router with skills. In this case, rather than making a simple tool or function call, the router can invoke a more complex workflow or skill set that might include many components and is an overall deeper set of chained actions. These components (LLM, API, tooling, RAG, and code calls) can be looped and chained to form a skill.
This is probably the most common architecture we see from advanced LLM application teams in production today.
The general architecture gets more complicated when you mix branches of LLM calls with tools and state. In this next case, the router decides which of its skills to call to answer the user’s question. It may update the shared state based on this question as well. Each skill may also access the shared state, and could involve one or more LLM calls of its own to retrieve a response to the user.
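A minimal sketch of this pattern, with a keyword heuristic standing in for the LLM router and a dataclass as the shared state. The state fields and skill names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentState:
    # Shared state that the router and every skill can read and update.
    user_query: str = ""
    order_id: Optional[str] = None
    history: list = field(default_factory=list)

def router(state: AgentState) -> str:
    # A real router might be an LLM call; a heuristic stands in here.
    return "shipping" if "order" in state.user_query.lower() else "greeting"

def greeting_skill(state: AgentState) -> str:
    state.history.append("greeting")
    return "Hello! How can I help?"

def shipping_skill(state: AgentState) -> str:
    state.order_id = "48213"            # e.g. extracted from the query
    state.history.append("shipping")
    return f"Order {state.order_id} is in transit."

SKILLS = {"greeting": greeting_skill, "shipping": shipping_skill}

def handle(query: str) -> str:
    state = AgentState(user_query=query)
    return SKILLS[router(state)](state)
```

Because every skill receives the same state object, context accumulated in one branch (like the extracted order ID) is available to any later step without threading it through each call.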
This is still generally straightforward, however, agents are usually far more complex. As agents become more complicated, you start to see frameworks built to try and reduce that complexity.
Agent Architecture Frameworks
LangGraph
LangGraph builds on the pre-existing concept of a Pregel graph, but translates it over to agents. In LangGraph, you define nodes and edges that your agent can travel along. While it is possible to define a router node in LangGraph, it is usually unnecessary unless you’re working with multi-agent applications. Instead, the same conditional logic that could live in the router now lives in the Nodes and Conditional Edges objects that LangGraph introduces.
Consider, for example, a LangGraph agent that can either respond to a user’s greeting or perform a RAG lookup of information.
Here, the routing logic lives within nodes and conditional edges that move the user between different nodes depending on a function response. In this case, is_greeting and check_rag_response are the functions behind conditional edges. Defining one of these edges looks like this:
```python
graph.add_conditional_edges(
    "classify_input",
    is_greeting,
    {True: "handle_greeting", False: "handle_RAG"},
)
```
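To make the mechanics concrete, here is a toy re-implementation of the conditional-edge idea in plain Python. This is not the LangGraph API, just a sketch of the pattern behind it: each node is a function, and a conditional edge maps the result of a predicate to the name of the next node.

```python
def classify_input(text: str) -> str:
    return text  # passthrough in this sketch

def is_greeting(text: str) -> bool:
    return text.lower().startswith(("hi", "hello"))

def handle_greeting(text: str) -> str:
    return "Hello!"

def handle_RAG(text: str) -> str:
    return f"Retrieved documents for: {text}"

NODES = {"classify_input": classify_input,
         "handle_greeting": handle_greeting,
         "handle_RAG": handle_RAG}

# Mirrors add_conditional_edges("classify_input", is_greeting, {...}):
CONDITIONAL_EDGES = {"classify_input": (is_greeting,
                                        {True: "handle_greeting",
                                         False: "handle_RAG"})}

def run(start: str, text: str) -> str:
    out = NODES[start](text)                    # run the current node
    predicate, mapping = CONDITIONAL_EDGES[start]
    return NODES[mapping[predicate(out)]](out)  # follow the conditional edge
```

The routing decision lives entirely in the edge definition, not in any node, which is the structural shift LangGraph makes relative to a central router.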
Instead of collecting all of the routing logic in one node, we instead spread it between the relevant edges. This can be helpful, especially when you need to impose a predefined structure on your agent, and want to keep individual pieces of logic separated.
LlamaIndex Workflows
Other frameworks like LlamaIndex Workflows take a different approach, instead using events and event listeners to move between nodes. Like LangGraph, Workflows don’t necessarily need a routing node to handle the conditional logic of an agent. Instead, Workflows rely on individual nodes, or steps as they call them, to handle incoming events and broadcast outgoing events to be handled by other steps. This results in the majority of a Workflow’s logic being handled within the steps themselves, as opposed to being split between nodes and edges.
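A plain-Python sketch of this event-driven pattern (not the LlamaIndex Workflows API): each step handles one kind of event and emits the event that triggers the next step.

```python
class Event:
    def __init__(self, name: str, payload: str = ""):
        self.name, self.payload = name, payload

def classify_step(event: Event) -> Event:
    # The conditional logic lives inside the step, which decides
    # what kind of event to emit next.
    kind = "greeting" if "hello" in event.payload.lower() else "rag"
    return Event(kind, event.payload)

def greeting_step(event: Event) -> Event:
    return Event("stop", "Hello!")

def rag_step(event: Event) -> Event:
    return Event("stop", f"Retrieved documents for: {event.payload}")

# Each step listens for one event type.
LISTENERS = {"start": classify_step, "greeting": greeting_step, "rag": rag_step}

def run_workflow(query: str) -> str:
    event = Event("start", query)
    while event.name != "stop":          # steps hand off via events
        event = LISTENERS[event.name](event)
    return event.payload
```

Compared with the graph sketch above, there is no edge table at all: the control flow is implicit in which events each step chooses to emit.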
CrewAI, Autogen, Swarm, and Others
There are other frameworks that are intended to make agent development easier, including some that specialize in handling groups of agents working together. This space is rapidly evolving and it’s worth checking out these and other frameworks.
Key Questions When Considering An Agent
Should You Use a Framework To Develop Your Agent?
Regardless of the framework you use, the additional structure provided by these tools can be helpful in building out agent applications. The question of whether using one of these frameworks is beneficial when creating larger, more complicated applications is a bit more challenging.
We have a fairly strong opinion in this area because we built an assistant ourselves. Our assistant uses a multi-layer router architecture with branches and steps that echo some of the abstractions of the current frameworks. We started building our assistant before LangGraph was stable. As a result, we constantly ask ourselves: if we were starting from scratch, would we use the current framework abstractions? Are they up to the task?
The current answer is not yet. There is just too much complexity in the overall system that doesn’t lend itself to a Pregel-based architecture. If you squint, you can map it to nodes and edges, but the software abstraction would likely get in the way. As it stands, our team tends to prefer code over frameworks.
We do, however, see the value in the agent framework approaches. Namely, they force an architecture built around some best practices and good tooling. They are also getting better constantly, expanding where they are useful and what you can do with them. It is very likely that our answer will change in the near future as these frameworks improve.
Do You Actually Need An Agent?
This raises another important question: what types of applications even require an agent? After all, agents cover a broad range of systems, and there is so much hype about what is “agentic” these days.
Here are three criteria to determine whether you might need an agent:
- Does your application follow an iterative flow based on incoming data?
- Does your application need to adapt and follow different flows based on previously taken actions or feedback along the way?
- Is there a state space of actions that can be taken, one that can be traversed in a variety of ways rather than being restricted to linear pathways?
What Are The Common Issues To Expect?
Let’s say that you answer yes to one of these questions and need an agent. Here are several known issues to be aware of as you build.
The first is long-term planning. While agents are powerful, they still struggle to decompose complex tasks into a logical plan. Worse, they can often get stuck in loops that block them from finding a solution. Agents also struggle with malformed tool calls, typically due to the underlying LLMs powering the agent. In each case, human intervention is often needed to course correct.
Another issue to be aware of is inconsistent performance due to the vastness of the solution space. The sheer number of possible actions and paths an agent can take makes it difficult to achieve consistent results and tends to drive up costs. Perhaps this is why the market is tending toward constrained agents that can only choose from a set of possible actions, effectively limiting the solution space.
What Are Some Tactics for Addressing These Challenges?
As noted, one of the most effective strategies is to map or narrow the solution space beforehand. By thoroughly defining the range of possible actions and outcomes, you can reduce ambiguity. Incorporating domain and business heuristics into the agent’s guidance system is also an easy win, giving agents the context they need to make better decisions. Being explicit about action intentions (clearly defining what each action is intended to accomplish) and creating repeatable processes (standardizing the steps and methodologies that agents follow) can also enhance reliability and make it easier to identify and correct errors when they occur.
Finally, orchestrating with code and more reliable methods rather than relying solely on LLM planning can dramatically improve agent performance. This involves swapping your LLM router for a code-based router where possible. By using code-based orchestration, you can implement more deterministic and controllable processes, reducing the unpredictability that often comes with LLM-based planning.
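A sketch of what swapping an LLM router for a code-based router can look like. The intents and patterns here are illustrative; the point is that deterministic rules handle the common cases, and the LLM is reserved as a fallback.

```python
import re

# Deterministic, code-based routing for well-understood intents.
INTENT_PATTERNS = [
    (re.compile(r"\b(ship|deliver|order)\w*", re.I), "check_shipping_status"),
    (re.compile(r"\b(refund|return)\w*", re.I), "process_refund"),
]

def code_router(query: str) -> str:
    for pattern, intent in INTENT_PATTERNS:
        if pattern.search(query):
            return intent
    # Fall back to an LLM router only for queries code cannot classify.
    return "llm_router_fallback"
```

Every query the code router catches is one fewer LLM call to pay for, and one fewer opportunity for the model to pick an unexpected path.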
Conclusion
With so much hype and the proliferation of new frameworks in a frenzied generative AI environment filled with FOMO, it can be easy to lose sight of fundamental questions. Taking the time to think about when and where a modern agent framework might — and might not — make sense for your use case before diving headlong into an MVP is always worthwhile.
Navigating the New Types of LLM Agents and Architectures was originally published in Towards Data Science on Medium.