The Berkeley Function Calling Leaderboard V3 (also called Berkeley Tool Calling Leaderboard V3) evaluates an LLM's ability to call functions (aka tools) accurately. This leaderboard consists of real-world data and will be updated periodically. For more information on the evaluation dataset and methodology, please refer to our blogs: BFCL-v1 introducing AST as an evaluation metric, BFCL-v2 introducing enterprise and OSS-contributed functions, and BFCL-v3 introducing multi-turn interactions. Check out the code and data.
FC = native support for function/tool calling. Prompt = workaround for function calling, using the model's normal text generation capability.
Cost is an estimate of the price of 1,000 function calls, in USD. Latency is measured in seconds.
Overall Accuracy is the unweighted average of all the sub-categories. For details on score composition, please refer to our blog.
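As a minimal sketch of what "unweighted average" means here: every sub-category contributes equally to the overall score, regardless of how many test cases it contains. The sub-category names and scores below are hypothetical placeholders, not actual leaderboard categories or results.

```python
def overall_accuracy(subcategory_scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-sub-category accuracies.

    Each sub-category counts once, no matter how many test cases it has.
    """
    values = list(subcategory_scores.values())
    if not values:
        raise ValueError("need at least one sub-category score")
    return sum(values) / len(values)


# Hypothetical sub-categories and scores, for illustration only.
scores = {"category_a": 90.0, "category_b": 60.0, "category_c": 30.0}
overall = overall_accuracy(scores)
print(overall)
```

A weighted average would instead scale each score by its sub-category's test-case count, so a large sub-category would dominate; the unweighted form avoids that.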
Click on a column header to sort. If you would like to add your model or contribute test cases, please contact us via Discord.
Models are evaluated using commit d7e52e5. All the model responses we obtained are available here. To reproduce the results, please check out our codebase at this checkpoint.
The following chart shows the comparison of the models based on a few metrics. You can select and deselect which models to compare. More information on each metric can be found in the blog.
In this function-calling demo, you can enter a prompt and a function and see the output. There are two outputs, each in its own box: the top one shows the call in the actual code format, and the bottom one shows it in the OpenAI-compatible format. Note that the OpenAI-compatible output is only available if the code output has valid syntax and can be parsed. We also provide a few examples to try out, so you can get a sense of the input format and the output.
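To illustrate why the second output depends on the first being parseable, here is a rough sketch of one way such a conversion could work. It is not the demo's actual implementation: it assumes the code-format output is a single Python-style call with keyword arguments, and the helper name `to_openai_format` and the `get_weather` function are hypothetical. It requires Python 3.9+ for `ast.unparse`.

```python
import ast
import json


def to_openai_format(call_src: str):
    """Convert 'f(a=1, b="x")' into {"name": ..., "arguments": "<JSON string>"}.

    Returns None when the input has invalid syntax or cannot be converted.
    """
    try:
        tree = ast.parse(call_src.strip(), mode="eval")
    except (SyntaxError, ValueError):
        return None  # invalid syntax: no OpenAI-compatible output
    call = tree.body
    # This sketch only handles calls whose arguments are all keyword arguments.
    if not isinstance(call, ast.Call) or call.args:
        return None
    if any(kw.arg is None for kw in call.keywords):
        return None  # **kwargs unpacking is not supported here
    try:
        name = ast.unparse(call.func)
        # literal_eval accepts only literal values (numbers, strings, lists, ...).
        args = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        arguments = json.dumps(args)
    except (ValueError, TypeError):
        return None
    return {"name": name, "arguments": arguments}


# Hypothetical example call, for illustration only.
print(to_openai_format('get_weather(city="Berkeley", unit="celsius")'))
print(to_openai_format('get_weather(city="Berkeley"'))  # unbalanced parenthesis
```

The first call yields a dictionary with a `name` and a JSON-encoded `arguments` string, the shape used by OpenAI-style function calls; the second returns `None` because the input cannot be parsed.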
@misc{berkeley-function-calling-leaderboard,
title={Berkeley Function Calling Leaderboard},
author={Fanjia Yan and Huanzhi Mao and Charlie Cheng-Jie Ji and Tianjun Zhang and Shishir G. Patil and Ion Stoica and Joseph E. Gonzalez},
howpublished={\url{https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html}},
year={2024},
}