A tutorial demonstrating how to use Burr, with simple OpenAI client calls to GPT-4 and FastAPI, to create a custom email assistant agent.
In this tutorial, I will demonstrate how to use Burr, an open source framework (disclosure: I helped create it), together with simple OpenAI client calls to GPT-4 and FastAPI, to create a custom email assistant agent. We’ll describe the challenges one faces and then how you can solve them. For the application frontend we provide a reference implementation but won’t dive into its details.
Why are interactive agent applications a challenge?
LLMs rarely achieve complex goals on their own, and almost never on the first try. While it is in vogue to claim that ChatGPT given an internet connection can solve the world’s problems, the majority of high-value tools we’ve encountered use a blend of AI ingenuity and human guidance. This is part of the general move towards building agents, an approach where the AI makes decisions from information it receives: information it queries, information a user provides, or information another LLM gives it.
A simple example of this is a tool to help you draft a response to an email. You put in the email and your response goals, and it writes the response for you. At a minimum, you’ll want to provide feedback so it can adjust the response. Furthermore, you will want to give it a chance to ask clarifying questions (an overly confident yet incorrect chatbot helps no one).
In designing this interaction, your system will, inevitably, become a back-and-forth between user/LLM control. In addition to the standard challenges around AI applications (unreliable APIs, stochastic implementations, etc…), you will face a suite of new problems, including:
- Logically modeling a set of interaction points/flows
- Persisting the state so the user can pick up the interaction/application from where it left off
- Monitoring the decisions the LLM made (e.g. whether to ask the user questions or not)
And so on… In this post we’re going to walk through how to approach solving these. We’ll use the Burr library as well as FastAPI to build a web service that addresses these challenges in an extensible, modular manner, so you can use it as a blueprint for your own agent assistant needs.
The Tools
Burr
Burr is a lightweight python library you use to build applications as state machines. You construct your application out of a series of actions (these can be either decorated functions or objects), which declare inputs from state, as well as inputs from the user. These specify custom logic (delegating to any framework), as well as instructions on how to update state. State is immutable, which allows you to inspect it at any given point. Burr handles orchestration, monitoring and persistence.
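from typing import Tuple
from burr.core import State, action

@action(reads=["counter"], writes=["counter"])
def count(state: State) -> Tuple[dict, State]:
    current = state["counter"] + 1
    result = {"counter": current}
    # state.update returns a new State; pass the incremented value
    return result, state.update(counter=current)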
Note that the action above has two returns — the results (the counter), and the new, modified state (with the counter field incremented).
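To make the immutability point concrete, here is a minimal plain-Python sketch of a state object whose update returns a new object and leaves the original untouched. It is not Burr's implementation: the `ToyState` class is a hypothetical stand-in that exists only for illustration.

```python
from types import MappingProxyType


class ToyState:
    """Hypothetical stand-in for an immutable state container (not Burr's State)."""

    def __init__(self, **fields):
        # MappingProxyType gives a read-only view of the underlying dict
        self._data = MappingProxyType(dict(fields))

    def __getitem__(self, key):
        return self._data[key]

    def update(self, **changes):
        # Build a brand-new object; the original is left untouched
        return ToyState(**{**self._data, **changes})


def count(state):
    """Mirrors the counter action: returns (result, new_state)."""
    current = state["counter"] + 1
    return {"counter": current}, state.update(counter=current)


before = ToyState(counter=0)
result, after = count(before)
print(before["counter"], after["counter"])  # prints: 0 1
```

Because the old object is never modified, you can keep a reference to the state at any step and inspect it later.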
You run your Burr actions as part of an application — this allows you to string them together with a series of (optionally) conditional transitions from action to action.
from burr.core import ApplicationBuilder, default, expr

app = (
    ApplicationBuilder()
    .with_state(counter=0)  # initialize the count to zero
    .with_actions(
        count=count,
        done=done  # implementation left out above
    ).with_transitions(
        ("count", "count", expr("counter < 10")),  # Keep counting if the counter is less than 10
        ("count", "done", default)  # Otherwise, we're done
    ).with_entrypoint("count")  # we have to start somewhere
    .build()
)
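One way to picture how conditional transitions are resolved is the sketch below. It is plain Python rather than Burr code, and it assumes transitions are checked in the order they are declared, with the first matching condition winning. The `next_action` helper and the lambda conditions are hypothetical stand-ins for `expr(...)` and `default`.

```python
from typing import Callable, Dict, List, Optional, Tuple

Condition = Callable[[Dict], bool]
Transition = Tuple[str, str, Condition]

# Stand-ins for expr("counter < 10") and default (assumptions for illustration)
counter_below_ten: Condition = lambda state: state["counter"] < 10
always: Condition = lambda state: True

TRANSITIONS: List[Transition] = [
    ("count", "count", counter_below_ten),
    ("count", "done", always),
]


def next_action(current: str, state: Dict) -> Optional[str]:
    """Return the first target whose condition holds, or None if nothing matches."""
    for source, target, condition in TRANSITIONS:
        if source == current and condition(state):
            return target
    return None


state = {"counter": 0}
action = "count"
while action == "count":
    state = {**state, "counter": state["counter"] + 1}  # run the count action
    action = next_action("count", state)

print(action, state["counter"])  # prints: done 10
```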
Burr comes with a user-interface that enables monitoring/telemetry, as well as hooks to persist state/execute arbitrary code during execution.
You can visualize this as a flow chart, i.e. graph / state machine:
And monitor it using the local telemetry debugger:
While we showed the (very simple) counter example above, Burr is more commonly used for building chatbots/agents (we’ll be going over an example in this post).
FastAPI
FastAPI is a framework that lets you expose Python functions in a REST API. It has a simple interface: you write your functions, decorate them, and run your script, turning it into a server with self-documenting endpoints through OpenAPI.
from typing import Union
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"Hello": "World"}

@app.get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
    """A very simple example of an endpoint that takes in arguments."""
    return {"item_id": item_id, "q": q}
FastAPI is easy to deploy on any cloud provider: it is infrastructure-agnostic and can generally scale horizontally (so long as you account for how state is managed). See this page for more information.
React (or any frontend framework)
You can use any frontend framework you want. React-based tooling, however, has a natural advantage: it models everything as a function of state, which maps 1:1 onto the concept in Burr. In the demo app we use React, react-query, and Tailwind, but we’ll largely skip over this (it is not central to the purpose of the post).
Building
Let’s dig a bit more into the conceptual model. At a high-level, our email assistant will do the following:
- Accept an email + instructions to respond
- Come up with a set of clarifying questions (if the LLM deems it required)
- Generate a draft using the answers to those questions
- Accept feedback on that draft and generate another one, repeating until the user is happy
- Return the final draft (done)
Modeling Control Flow
As Burr requires you to build a control flow from actions and transitions, we can initially model this as a simple flowchart.
We drafted this before actually writing any code — you will see it transforms to code naturally.
The green nodes represent actions (these take state in and modify it), and the blue nodes represent inputs (these are points at which the app has to pause and ask the user for information). Note that there is a loop (formulate_draft ⇔ process_feedback): we iterate on feedback until we’re happy with the results.
This diagram is simply a stylized version of what Burr shows you — the modeling is meant to be close to the actual code. We have not displayed state information (the data the steps take in/return), but we’ll need to track the following (that may or may not be populated at any given point) so we can make decisions about what to do next:
- The initial inputs: {email_to_respond: str, response_instructions: str}
- The questions the LLM asks and the user responses (if any): {clarifications: list[str], response_instructions: list[str]}
- The list of drafts + feedback: {drafts: list[str], feedback_history: list[str]}
- The final result: {final_result: str}
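The fields above could be written down as a single typed structure. The sketch below uses a `TypedDict` with `total=False` to reflect that any field may be unpopulated at a given point. The class name `EmailAssistantFields` and the helper are hypothetical. Since the list reuses `response_instructions` for the user's answers, this sketch calls that field `clarification_answers`, which is an assumption.

```python
from typing import List, TypedDict


class EmailAssistantFields(TypedDict, total=False):
    # total=False: every key is optional, matching "may or may not be populated"
    email_to_respond: str
    response_instructions: str
    clarifications: List[str]
    clarification_answers: List[str]  # assumed name for the user's answers
    drafts: List[str]
    feedback_history: List[str]
    final_result: str


def needs_first_draft(fields: EmailAssistantFields) -> bool:
    """Example decision: no draft has been produced yet."""
    return not fields.get("drafts")


fresh: EmailAssistantFields = {
    "email_to_respond": "Can we meet on Friday?",
    "response_instructions": "Decline politely",
}
print(needs_first_draft(fresh))  # prints: True
```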
Implementing/Testing
Looking at the requirements above, we can build a straightforward Burr application since we can very closely match our code with our diagram above. Let’s take a look at the determine_clarifications step, for example:
@action(
    reads=["response_instructions", "incoming_email"],
    writes=["clarification_questions"]
)
def determine_clarifications(state: State) -> Tuple[dict, State]:
    """Determines if the response instructions require clarification."""
    incoming_email = state["incoming_email"]
    response_instructions = state["response_instructions"]
    client = _get_openai_client()
    result = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": ("You are a chatbot that has the task of "
                            "generating responses to an email on behalf "
                            "of a user. "),
            },
            {
                "role": "user",
                "content": (
                    f"The email you are to respond to is: {incoming_email}."
                    # ... left out, see link above
                    "The questions, joined by newlines, must be the only "
                    "text you return. If you do not need clarification, "
                    "return an empty string."
                ),
            },
        ],
    )
    content = result.choices[0].message.content
    # Split on newlines; an empty response means no clarification is needed
    all_questions = content.split("\n") if content else []
    return {"clarification_questions": all_questions}, state.update(
        clarification_questions=all_questions)
Note that this uses simple OpenAI calls. You can replace this with LangChain, LlamaIndex, Hamilton (or something else) if you prefer more abstraction, and delegate to whatever LLM you like to use. You should also probably use something a little more concrete (e.g. instructor) to guarantee output shape.
To tie these together, we put them into the application builder. This allows us to set conditional transitions (e.g. len(clarification_questions) > 0) and therefore connect actions, recreating the diagram above.
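Because the model returns free text, the split step is worth hardening a little. Here is a minimal sketch, assuming the model follows the prompt and returns one question per line. `parse_questions` is a hypothetical helper, not part of Burr or the OpenAI client.

```python
from typing import List, Optional


def parse_questions(content: Optional[str]) -> List[str]:
    """Split newline-joined questions, dropping blank lines and stray whitespace."""
    if not content:
        return []
    return [line.strip() for line in content.splitlines() if line.strip()]


print(parse_questions("Who is the recipient?\n\n  What tone should I use?  \n"))
print(parse_questions(""))
print(parse_questions(None))
```

Dropping blank lines matters here because the transition into clarify_instructions depends on the list being non-empty.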
application = (
    ApplicationBuilder()
    # define our actions
    .with_actions(
        process_input,
        determine_clarifications,
        clarify_instructions,
        formulate_draft,
        process_feedback,
        final_result,
    )
    # define how our actions connect
    .with_transitions(
        ("process_input", "determine_clarifications"),
        (
            "determine_clarifications",
            "clarify_instructions",
            expr("len(clarification_questions) > 0"),
        ),
        ("determine_clarifications", "formulate_draft"),
        ("clarify_instructions", "formulate_draft"),
        ("formulate_draft", "process_feedback"),
        ("process_feedback", "formulate_draft", expr("len(feedback) > 0")),
        ("process_feedback", "final_result"),
    )
    .with_state(draft_history=[])
    .with_entrypoint("process_input")
    .build()
)
To iterate on this, we used a Jupyter notebook. Running our application is simple: all you do is call the .run() method on the Application, with the right halting conditions. We’ll want it to halt before any action that requires user input (clarify_instructions and process_feedback), and after final_result. We can then run it in a while loop, asking for user input and feeding it back to the state machine:
def request_answers(questions):
    """Requests answers from the user for the questions the LLM has"""
    answers = []
    print("The email assistant wants more information:\n")
    for question in questions:
        answers.append(input(question))
    return answers

def request_feedback(draft):
    """Requests feedback from the user for a draft"""
    print(
        f"here's a draft!: \n {draft} \n \n What feedback do you have?",
    )
    return input("Write feedback or leave blank to continue (if you're happy)")

inputs = {
    "email_to_respond" : EMAIL,
    "response_instructions" : INSTRUCTIONS
}

# in our notebook cell (`app` is the Application object built with ApplicationBuilder):
while True:
    action, result, state = app.run(
        halt_before=["clarify_instructions", "process_feedback"],
        halt_after=["final_result"],
        inputs=inputs
    )
    # Halted before an input step: collect what it needs, then loop again
    if action.name == "clarify_instructions":
        questions = state["clarification_questions"]
        answers = request_answers(questions)
        inputs = {
            "clarification_inputs" : answers
        }
    if action.name == "process_feedback":
        feedback = request_feedback(state["current_draft"])
        inputs = {"feedback" : feedback}
    if action.name == "final_result":
        print("final result is:", state["current_draft"])
        break
You can then use the Burr UI to monitor your application as it runs!
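The halting behaviour is easy to misread, so here is a toy runner in plain Python. It is not Burr's implementation. It assumes that the first action of each call always executes, that `halt_before` stops just before a named action would run next, and that `halt_after` stops right after a named action finishes. The linear `ORDER` list (which ignores the feedback loop) and `toy_run` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical linear flow, standing in for the real transition graph
ORDER = ["process_input", "determine_clarifications", "clarify_instructions",
         "formulate_draft", "process_feedback", "final_result"]


def toy_run(start: str, halt_before: Sequence[str],
            halt_after: Sequence[str]) -> Tuple[str, List[str]]:
    """Execute actions from `start`; return (action halted at, actions executed)."""
    executed: List[str] = []
    index = ORDER.index(start)
    while True:
        name = ORDER[index]
        # Stop before an input step, unless it is the first action of this
        # call (in which case we were just handed its inputs).
        if name in halt_before and executed:
            return name, executed
        executed.append(name)  # "run" the action
        if name in halt_after or index == len(ORDER) - 1:
            return name, executed
        index += 1


halts = dict(halt_before=["clarify_instructions", "process_feedback"],
             halt_after=["final_result"])
print(toy_run("process_input", **halts))
print(toy_run("clarify_instructions", **halts))
print(toy_run("process_feedback", **halts))
```

Each call corresponds to one pass through the while loop: it resumes at the action that was waiting for input and runs until the next pause point.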
Persistence
We’re going to persist our results to a SQLite database (although as you’ll see later on this is customizable). To do this, we need to add a few lines to the ApplicationBuilder.
from burr.core.persistence import SQLLitePersister

state_persister = SQLLitePersister(
    db_path="sqllite.db",
    table_name="email_assistant_table"
)
app = (
    ApplicationBuilder()
    # ... the code we had above
    # NOTE: depending on your Burr version, this builder method may be
    # named `initialize_from` instead
    .initialize(
        initializer=state_persister,
        resume_at_next_action=True,
        default_state={"chat_history" : []},
        default_entrypoint="process_input"
    )
    .with_identifiers(app_id=app_id)
    .build()
)
This ensures that every email draft we create will be saved and can be loaded at every step. When you want to resume a prior draft of an email, all you have to do is rerun the code and it will start where it left off.
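Conceptually, a persister stores a snapshot of state per application ID after each step and loads the latest one on resume. The sketch below shows that idea with the standard-library `sqlite3` module. The table layout and the `save_state` / `load_latest` helpers are hypothetical and are not the schema Burr's persister uses.

```python
import json
import sqlite3
from typing import Optional, Tuple

conn = sqlite3.connect(":memory:")  # in-memory database, just for the sketch
conn.execute(
    "CREATE TABLE snapshots ("
    "app_id TEXT, sequence_id INTEGER, position TEXT, state TEXT, "
    "PRIMARY KEY (app_id, sequence_id))"
)


def save_state(app_id: str, sequence_id: int, position: str, state: dict) -> None:
    """Store one snapshot: which action just ran and the state it produced."""
    conn.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
        (app_id, sequence_id, position, json.dumps(state)),
    )
    conn.commit()


def load_latest(app_id: str) -> Optional[Tuple[str, dict]]:
    """Return (last action, state) for the newest snapshot, or None if unseen."""
    row = conn.execute(
        "SELECT position, state FROM snapshots "
        "WHERE app_id = ? ORDER BY sequence_id DESC LIMIT 1",
        (app_id,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


save_state("draft-1", 0, "process_input", {"incoming_email": "Hi!"})
save_state("draft-1", 1, "formulate_draft",
           {"incoming_email": "Hi!", "current_draft": "Hello,"})
print(load_latest("draft-1"))
print(load_latest("unknown"))
```

Keeping every step, rather than overwriting a single row, is what makes it possible to resume from (or inspect) any point in a draft's history.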
Integrating in a web server
To expose this in a web server we’ll be using FastAPI to create endpoints and Pydantic to represent types. Before we get into the details, we’ll note that Burr naturally provides an application_id (either generated or specified) for every instance of an application. In this case the application_id would correspond to a particular email draft. This allows us to uniquely access it, query it from the db, etc… It also allows for a partition key (e.g. user_id) so you can add additional indexing in your database. We center the API around inputs/outputs.
Endpoints
We will construct the following endpoints:
- POST /create: This will create a new application and return the ID
- PUT /initialize_draft/{id}: This calls out to process_input, passing in the email and instructions
- PUT /clarify_instructions/{id}: This will give answers back to the LLM
- PUT /process_feedback/{id}: This will give feedback back to the LLM
- GET /{id}/state: This will return the current state of the application
The GET endpoint allows us to get the current state of the application — this enables the user to reload if they quit the browser/get distracted. Each of these endpoints will return the full state of the application, which can be rendered on the frontend. Furthermore, it will indicate the next API endpoint we call, which allows the UI to render the appropriate form and submit to the right endpoint.
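One way the "next endpoint" hint can be derived is a small lookup from the next action's name to the route the frontend should call. This is a sketch under assumptions: the route strings follow the endpoint list above, and `NEXT_STEP_ROUTES` and `endpoint_for` are hypothetical names.

```python
from typing import Optional

# Maps the next action (as reported by the state machine) to a route template
NEXT_STEP_ROUTES = {
    "process_input": "/initialize_draft/{id}",
    "clarify_instructions": "/clarify_instructions/{id}",
    "process_feedback": "/process_feedback/{id}",
}


def endpoint_for(next_step: Optional[str], app_id: str) -> Optional[str]:
    """Return the URL the UI should submit to, or None when no input is pending."""
    template = NEXT_STEP_ROUTES.get(next_step) if next_step else None
    return template.format(id=app_id) if template else None


print(endpoint_for("process_feedback", "abc123"))  # prints: /process_feedback/abc123
print(endpoint_for(None, "abc123"))                # prints: None
```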
Using FastAPI + Pydantic, this becomes very simple to implement. First, let’s add a utility to get the application object. This will use a cached version or instantiate it:
import functools

@functools.lru_cache(maxsize=128)
def get_application(app_id: str) -> Application:
    # Cache up to 128 application objects, keyed by app_id
    app = email_assistant_application.application(app_id=app_id)
    return app
All this does is call our function application in email_assistant that recreates the application. We have not included the create function here, but it calls out to the same API.
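The effect of the cache can be seen in isolation with a stand-in factory. In the sketch below, `build_application` is a hypothetical placeholder for the real application constructor; the only real API used is `functools.lru_cache`.

```python
import functools

build_calls = []


@functools.lru_cache(maxsize=128)
def build_application(app_id: str) -> dict:
    """Placeholder for an expensive constructor; records each real invocation."""
    build_calls.append(app_id)
    return {"app_id": app_id}


first = build_application("draft-1")
second = build_application("draft-1")   # served from the cache
other = build_application("draft-2")    # different key, so it is built

print(first is second, len(build_calls))  # prints: True 2
```

Note that the cache lives inside a single process. If you run several server workers, each one builds its own copy, so the persisted state remains the source of truth.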
Data Model
Let’s then define a Pydantic model to represent the state, and the app object in FastAPI:
from typing import List, Literal, Optional

import pydantic
from burr.core import Application

class EmailAssistantState(pydantic.BaseModel):
    app_id: str
    email_to_respond: Optional[str]
    response_instructions: Optional[str]
    questions: Optional[List[str]]
    answers: Optional[List[str]]
    drafts: List[str]
    feedback_history: List[str]
    final_draft: Optional[str]
    # This stores the next step, which tells the frontend which endpoint to call
    next_step: Literal[
        "process_input", "clarify_instructions",
        "process_feedback", None]

    @staticmethod
    def from_app(app: Application):
        # Implementation left out: read app.state and translate it to this
        # pydantic model. We can use `app.get_next_action()` to get the
        # next step and return it to the user.
        ...
Note that every endpoint will return this same pydantic model!
Endpoints
Given that each endpoint returns the same thing (a representation of the current state as well as the next step to execute), they all look the same. We can first implement a generic run_through function, which will progress our state machine forward, and return the state.
def run_through(
    project_id: str,
    app_id: Optional[str],
    inputs: Dict[str, Any]
) -> EmailAssistantState:
    # get_application (defined earlier) is keyed on app_id alone;
    # project_id is accepted here but unused in this simplified version
    email_assistant_app = get_application(app_id)
    email_assistant_app.run(
        halt_before=["clarify_instructions", "process_feedback"],
        halt_after=["final_result"],
        inputs=inputs,
    )
    return EmailAssistantState.from_app(email_assistant_app)
This represents a simple but powerful architecture. We can continue calling these endpoints until we’re at a “terminal” state, at which point we can always ask for the state. If we decide to add more input steps, we can modify the state machine to include them. We are not required to hold state in the app (it is all delegated to Burr’s persistence), so we can easily load up from any given point, allowing the user to wait for seconds, minutes, hours, or even days before continuing.
As the frontend simply renders based on the current state and the next step, it will always be correct, and the user can always pick up where they left off. With Burr’s telemetry capabilities you can debug any state-related issues, ensuring a smooth user experience.
Adding a UI
Now that we have a set of endpoints, the UI is simple. In fact, it mirrors the API almost exactly. We won’t dig into this too much, but the high-level is that you’ll want the following capabilities:
- Render the current state (show the history, latest draft)
- Include a form for the next action’s inputs (provide feedback, answer clarifications)
- Post the results to your FastAPI endpoints, wait for the response, then return to the first step
You can see the UI here. Here’s an example of it in action:
You can play around with it if you download burr (`pip install "burr[start]" && burr`), and navigate to http://localhost:7241/demos/email-assistant.
Note that there are many tools that make this easier/simpler to prototype, including chainlit, streamlit, etc… The backend API we built is amenable to interacting with them as well.
Additional Capabilities
Customizing Persistence
While we used the simple SQLLite persister, you can use any of the others that come with Burr or implement your own to match your schema/db infrastructure. To do this you implement the BaseStatePersister class, and add it in with the ApplicationBuilder, instead of the SQLLite persister we used above.
Additional Monitoring/Visibility
Using the Burr UI to monitor is not the only way. You can integrate your own by leveraging lifecycle hooks, enabling you to log data in a custom format to, say, datadog, langsmith, or langfuse.
Furthermore, you can leverage additional monitoring capabilities to track spans/traces, either logging them directly to the Burr UI or to any of the above providers. See the list of available hooks here.
Async/Streaming
While we kept the APIs we exposed synchronous for simplicity, Burr supports asynchronous execution as well. Burr also supports streaming responses for those who want to provide a more interactive UI/reduce time to first token.
So how does it do in practice?
As with any LLM application, the entire prompt matters. If you can provide the right guidance, the results are going to be better than if you don’t. Much like instructing a human, more guidance generally helps. That said, if you find yourself always correcting some aspect, then changing the base prompt is likely the best course of action. For example, a one-shot or few-shot approach might be a good way to show the LLM what you’d like to see given your specific context.
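As a sketch of the few-shot idea, the helper below prepends example email/reply pairs to the conversation, using the same `role` / `content` message dictionaries as the chat call shown earlier. `build_few_shot_messages` and the sample pair are hypothetical.

```python
from typing import Dict, List, Tuple


def build_few_shot_messages(
    system_prompt: str,
    examples: List[Tuple[str, str]],
    incoming_email: str,
) -> List[Dict[str, str]]:
    """Build chat messages: system prompt, example pairs, then the real email."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_email, example_reply in examples:
        messages.append({"role": "user", "content": example_email})
        messages.append({"role": "assistant", "content": example_reply})
    messages.append({"role": "user", "content": incoming_email})
    return messages


messages = build_few_shot_messages(
    "You draft email replies on behalf of a user.",
    [("Can you send the report?", "Hi, attached is the report. Best, Sam")],
    "Are you free for a call on Tuesday?",
)
print([m["role"] for m in messages])
```

The resulting list can be passed as the `messages` argument of the chat completion call. Which examples to include is something you would tune against the corrections you find yourself making most often.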
Post Summary
In this post we discussed how to address some of the challenges around building human-in-the-loop agentic workflows. We ran through an example of making an email assistant using Burr to build and run it as a state machine, and FastAPI to run Burr in a web service. Finally, we showed how you can extend the tooling we used here for a variety of common production needs, e.g. monitoring and storage.
Additional Resources
- Join Burr’s Discord for help or if you have questions!
- Burr’s Github repository
- FastAPI guide
- Technical deep-dive on building with a web server, on GitHub
- Code for the email assistant
Building an Email Assistant Application with Burr was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.