Bitcoin ETF outflows surpass $300m, analysts warn of key price levels
TradFi, DeFi are like ‘two worlds’ converging: Zignaly co-founder
The sharp price dip saw a quick turnaround, although the recovery was not complete.
The higher timeframe trend since April was bearish, but this uptrend could be the beginning of a rally.
The post WIF’s upside potential – Here’s how high the memecoin can REALLY go appeared first on AMBCrypto.
Massive inflow of ETH into exchanges as ICOs continue to sell.
The Dencun upgrade has seen ETH lose some revenue to L2s.
Ethereum (ETH), the market’s second-largest cryptocurrency after Bitcoin […]
The post $259.2M ETH hits exchanges – Another sign of Ethereum facing price pressure? appeared first on AMBCrypto.
Bitcoin miners exiting the cycle may signal a market bottom, paving the way for fresh interest.
And yet, specific conditions must align for a confirmed bull rally.
A week of bearish downturns […]
The post Bitcoin miners’ exit confirm $61K support – Why this is key for October’s rally appeared first on AMBCrypto.
As the meme coin scene gets hotter, Pepe (PEPE) is drawing attention with its market cap now at $4 billion, sparking discussions about its potential. PEPE isn’t alone in the spotlight—Shiba Inu (SHIB) is also seeing significant gains with a 25% increase. However, BlockDAG is witnessing massive inflows, with a notable $3M raised in just […]
Ripple-affiliated cryptocurrency XRP has become the top-trending digital asset recently due to key factors that have sparked heated discussions.
Why Ripple’s XRP Has Emerged As The Top-Trending Crypto On Social Media Lately
Today, new libraries and low-code platforms are making it easier than ever to build AI agents, also referred to as digital workers. Tool calling is one of the primary abilities driving the “agentic” nature of Generative AI models, extending them beyond conversational tasks. By executing tools (functions), agents can take action on your behalf and solve complex, multi-step problems that require robust decision making and interaction with a variety of external data sources.
This article focuses on how reasoning is expressed through tool calling, explores some of the challenges of tool use, covers common ways to evaluate tool-calling ability, and provides examples of how different models and agents interact with tools.
At the core of successful agents lie two key expressions of reasoning: reasoning through evaluation and planning, and reasoning through tool use.
While both expressions of reasoning are important, they don’t always need to be combined to create powerful solutions. For example, OpenAI’s new o1 model excels at reasoning through evaluation and planning because it was trained to reason using chain of thought. This has significantly improved its ability to think through and solve complex challenges, as reflected in a variety of benchmarks. For instance, the o1 model has been shown to surpass human PhD-level accuracy on the GPQA benchmark covering physics, biology, and chemistry, and scored in the 86th-93rd percentile on Codeforces contests. While o1’s reasoning ability could be used to generate text-based responses that suggest tools based on their descriptions, it lacks explicit tool-calling abilities (at least for now!).
In contrast, many models are fine-tuned specifically for reasoning through tool use, enabling them to generate function calls and interact with APIs very effectively. These models focus on calling the right tool in the right format at the right time, but are typically not designed to evaluate their own results as thoroughly as o1 might. The Berkeley Function Calling Leaderboard (BFCL) is a great resource for comparing how different models perform on function-calling tasks. It also provides an evaluation suite for comparing your own fine-tuned model on various challenging tool-calling tasks. In fact, the latest dataset, BFCL v3, was just released and now includes multi-step, multi-turn function calling, further raising the bar for tool-based reasoning tasks.
Both types of reasoning are powerful independently, and when combined, they have the potential to create agents that can effectively break down complicated tasks and autonomously interact with their environment. For more examples of AI agent architectures for reasoning, planning, and tool calling, check out my team’s survey paper on arXiv.
Building robust and reliable agents requires overcoming many different challenges. When solving complex problems, an agent often needs to balance multiple tasks at once, including planning, interacting with the right tools at the right time, formatting tool calls properly, remembering outputs from previous steps, avoiding repetitive loops, and adhering to guidance that protects the system from jailbreaks, prompt injections, and the like.
Too many demands can easily overwhelm a single agent. This has led to a growing trend in which what appears to an end user as one agent is, behind the scenes, a collection of many agents and prompts working together to divide and conquer the task. This division allows tasks to be broken down and handled in parallel by different models and agents, each tailored to solve its particular piece of the puzzle.
It’s here that models with excellent tool calling capabilities come into play. While tool-calling is a powerful way to enable productive agents, it comes with its own set of challenges. Agents need to understand the available tools, select the right one from a set of potentially similar options, format the inputs accurately, call tools in the right order, and potentially integrate feedback or instructions from other agents or humans. Many models are fine-tuned specifically for tool calling, allowing them to specialize in selecting functions at the right time with high accuracy.
Some of the key considerations when fine-tuning a model for tool calling include:
With the growing importance of tool use in language models, many datasets have emerged to help evaluate and improve model tool-calling capabilities. Two of the most popular benchmarks today are the Berkeley Function Calling Leaderboard and Nexus Function Calling Benchmark, both of which Meta used to evaluate the performance of their Llama 3.1 model series. A recent paper, ToolACE, demonstrates how agents can be used to create a diverse dataset for fine-tuning and evaluating model tool use.
Let’s explore each of these benchmarks in more detail:
Each of these benchmarks helps us evaluate model reasoning expressed through tool calling. Together with the fine-tuned models they measure, they reflect a growing trend toward developing more specialized models for specific tasks and increasing LLM capabilities by extending their ability to interact with the real world.
If you’re interested in exploring tool calling in action, here are some examples to get you started, organized by ease of use and ranging from simple built-in tools to fine-tuned models and agents with tool-calling abilities.
Level 1 — ChatGPT: The best place to start and see tool calling live, without needing to define any tools yourself, is ChatGPT. Here you can use GPT-4o through the chat interface to call and execute tools for web browsing. For example, when asked “what’s the latest AI news this week?” ChatGPT-4o will conduct a web search and return a response based on the information it finds. Remember, the new o1 model does not have tool-calling abilities yet and cannot search the web.
While this built-in web-searching feature is convenient, most use cases will require defining custom tools that can integrate directly into your own model workflows and applications. This brings us to the next level of complexity.
Level 2 — Using a Model with Tool Calling Abilities and Defining Custom Tools:
This level involves using a model with tool-calling abilities to get a sense of how effectively the model selects and uses its tools. It’s important to note that when a model is trained for tool calling, it only generates the text or code for the tool call; it does not actually execute the tool itself. Something external to the model needs to invoke the tool, and it’s at this point — where we’re combining generation with execution — that we transition from language model capabilities to agentic systems.
To get a sense of how models express tool calls, we can turn to the Databricks Playground. For example, we can select the model Llama 3.1 405B and give it access to the sample tools get_distance_between_locations and get_current_weather. When prompted with the user message “I am going on a trip from LA to New York how far are these two cities? And what’s the weather like in New York? I want to be prepared for when I get there”, the model decides which tools to call and what parameters to pass so it can effectively reply to the user.
In this example, the model suggests two tool calls. Since the model cannot execute the tools, the user needs to fill in a sample result to simulate the tool output (e.g., “2500” for the distance and “68” for the weather). The model then uses these simulated outputs to reply to the user.
This approach to using the Databricks Playground lets you observe how the model uses custom-defined tools and is a great way to test your function definitions before implementing them in your tool-calling-enabled applications or agents.
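To make this flow concrete, here is a rough sketch of the same two-step exchange driven through an OpenAI-compatible chat completions client (Databricks model serving exposes such an endpoint). The endpoint placeholders, JSON schemas, and hard-coded simulated tool results below are illustrative assumptions, not the Playground’s actual implementation.

from openai import OpenAI

# Assumption: point the client at any OpenAI-compatible endpoint that serves a
# tool-calling model (e.g., a Databricks serving endpoint for Llama 3.1 405B).
client = OpenAI(base_url="<your-endpoint>", api_key="<your-api-key>")
MODEL = "<your-tool-calling-model>"

tools = [
    {"type": "function", "function": {
        "name": "get_distance_between_locations",
        "description": "Returns the distance in miles between two locations.",
        "parameters": {"type": "object", "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"}},
            "required": ["origin", "destination"]}}},
    {"type": "function", "function": {
        "name": "get_current_weather",
        "description": "Returns the current temperature (F) for a location.",
        "parameters": {"type": "object", "properties": {
            "location": {"type": "string"}},
            "required": ["location"]}}},
]

messages = [{"role": "user", "content": (
    "I am going on a trip from LA to New York how far are these two cities? "
    "And what's the weather like in New York? I want to be prepared for when I get there"
)}]

# Step 1: the model suggests which tools to call and with what parameters.
response = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
assistant_msg = response.choices[0].message
messages.append(assistant_msg)

# Step 2: the model cannot execute anything itself, so we simulate the tool
# outputs by hand, just like typing "2500" and "68" into the Playground.
simulated_results = {"get_distance_between_locations": "2500",
                     "get_current_weather": "68"}
for call in assistant_msg.tool_calls or []:
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": simulated_results[call.function.name]})

# Step 3: the model folds the simulated tool outputs into its final reply.
final = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
print(final.choices[0].message.content)

Printing assistant_msg.tool_calls after the first request shows the same kind of suggested calls the Playground displays before you fill in the results.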
Outside of the Databricks Playground, we can observe and evaluate how effectively different models available on platforms like HuggingFace use tools through code directly. For example, we can load different models like Llama 3.2-3B-Instruct, ToolACE-8B, NexusRaven-V2-13B, and more from HuggingFace, give them the same system prompt, tools, and user message, then observe and compare the tool calls each model returns. This is a great way to understand how well different models reason about using custom-defined tools and can help you determine which tool-calling models are best suited for your applications.
Here is an example demonstrating a tool call generated by Llama-3.2-3B-Instruct based on the following tool definitions and user message; the same steps can be followed for other models to compare the tool calls they generate.
import torch
from transformers import pipeline
function_definitions = """[
{
"name": "search_google",
"description": "Performs a Google search for a given query and returns the top results.",
"parameters": {
"type": "dict",
"required": [
"query"
],
"properties": {
"query": {
"type": "string",
"description": "The search query to be used for the Google search."
},
"num_results": {
"type": "integer",
"description": "The number of search results to return.",
"default": 10
}
}
}
},
{
"name": "send_email",
"description": "Sends an email to a specified recipient.",
"parameters": {
"type": "dict",
"required": [
"recipient_email",
"subject",
"message"
],
"properties": {
"recipient_email": {
"type": "string",
"description": "The email address of the recipient."
},
"subject": {
"type": "string",
"description": "The subject of the email."
},
"message": {
"type": "string",
"description": "The body of the email."
}
}
}
}
]
"""
# This is the suggested system prompt from Meta
system_prompt = """You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the function can be used, point it out. If the given question lacks the parameters required by the function,
also point it out. You should only return the function call in tools call sections.
If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]\n
You SHOULD NOT include any other text in the response.
Here is a list of functions in JSON format that you can invoke.\n\n{functions}\n""".format(functions=function_definitions)
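The snippet above stops at the prompt. Below is a minimal sketch of the remaining step, passing the system prompt and a user message through a transformers text-generation pipeline; the user message, model ID, and generation settings are illustrative assumptions, and the exact output format varies by model and transformers version.

# Example user message (illustrative, not from the original article)
user_message = "Search Google for the top 3 articles on open source AI agents and email them to jane.doe@example.com with the subject 'Agent reading list'."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_message},
]

# Model ID and settings are assumptions; other tool-calling models from
# Hugging Face (e.g., ToolACE-8B) can be swapped in for comparison.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

output = pipe(messages, max_new_tokens=128)
# With chat-style input, generated_text holds the full conversation and the
# last entry is the model's reply, ideally something like:
# [search_google(query="open source AI agents", num_results=3), send_email(...)]
print(output[0]["generated_text"][-1]["content"])

Swapping in a different model ID and rerunning the same messages makes it easy to compare the tool calls each model generates.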
From here, we can move to Level 3, where we define agents that execute the tool calls generated by the language model.
Level 3 — Agents (invoking/executing LLM tool calls): Agents often express reasoning through planning and execution as well as through tool calling, making them an increasingly important aspect of AI-based applications. Using libraries like LangGraph, AutoGen, Semantic Kernel, or LlamaIndex, you can quickly create an agent using models like GPT-4o or Llama 3.1 405B, which support both conversations with the user and tool execution.
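To ground what “invoking/executing LLM tool calls” means beneath those frameworks, here is a minimal hand-rolled agent loop sketched with the OpenAI Python client; the toy weather tool, model choice, and loop structure are illustrative assumptions rather than any particular framework’s implementation.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def get_current_weather(location: str) -> str:
    # Toy implementation; a real agent would call a weather API here.
    return f"It is currently 68F and sunny in {location}."

TOOL_REGISTRY = {"get_current_weather": get_current_weather}

tools = [{"type": "function", "function": {
    "name": "get_current_weather",
    "description": "Returns the current weather for a location.",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}},
                   "required": ["location"]}}}]

messages = [{"role": "user", "content": "What's the weather like in New York right now?"}]

# Agent loop: keep generating until the model stops requesting tools.
while True:
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    msg = response.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)  # final, user-facing answer
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = TOOL_REGISTRY[call.function.name](**args)  # execution happens here, outside the model
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

The frameworks above wrap this same generate, execute, and feed-back loop with state management, memory, and multi-agent orchestration.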
Check out these guides for some exciting examples of agents in action:
The future of agentic systems will be driven by models with strong reasoning abilities enabling them to effectively interact with their environment. As the field evolves, I expect we will continue to see a proliferation of smaller, specialized models focused on specific tasks like tool-calling and planning.
It’s important to consider the current limitations of model sizes when building agents. For example, according to the Llama 3.1 model card, the Llama 3.1 8B model is not reliable for tasks that involve both maintaining a conversation and calling tools; larger models with 70B+ parameters should be used instead for these types of tasks. This, alongside other emerging research on fine-tuning small language models, suggests that smaller models may serve best as specialized tool callers, while larger models may be better suited to more advanced reasoning. By combining these abilities, we can build increasingly effective agents that provide a seamless user experience and allow people to leverage these reasoning abilities in both professional and personal endeavors.
Interested in discussing further or collaborating? Reach out on LinkedIn!
AI Agents: The Intersection of Tool Calling and Reasoning in Generative AI was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
Top Best Buy deals ahead of Prime Day 2024
The most underrated horror sequel of this century is finally streaming again on Max