Tool Selection Accuracy
Definitions
Tool Selection Accuracy measures how well an LLM selects the expected tools (functions) within a given module.
The tools the LLM actually used are compared with the expected tools, and the metric outputs:
num_correct: the number of tools that were both selected AND called with the correct arguments
score: num_correct / total number of tools in ground_truths
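To make the definition concrete, here is a minimal, self-contained sketch of the computation in plain Python. It is not the continuous_eval implementation: the `ToolCall` TypedDict and the `tool_selection_accuracy` function below are hypothetical stand-ins. The sketch assumes that an expected call counts as correct only when some used call has the same name and identical kwargs, that call order does not matter, that each used call can satisfy at most one expected call, and that an empty ground_truths list scores 0.0; the library may handle order, duplicates, or empty inputs differently.

```python
from typing import Any, Dict, List, TypedDict


class ToolCall(TypedDict):
    # Hypothetical stand-in for continuous_eval's ToolCall: a tool name plus its keyword arguments
    name: str
    kwargs: Dict[str, Any]


def tool_selection_accuracy(tools: List[ToolCall], ground_truths: List[ToolCall]) -> Dict[str, Any]:
    # Assumption: a match requires the same name AND identical kwargs; order is ignored.
    remaining = list(tools)  # copy, so each used call can satisfy at most one expected call
    num_correct = 0
    for expected in ground_truths:
        for i, used in enumerate(remaining):
            if used["name"] == expected["name"] and used["kwargs"] == expected["kwargs"]:
                num_correct += 1
                del remaining[i]  # consume this used call, then move to the next expected call
                break
    # Assumption: an empty ground truth yields 0.0 instead of dividing by zero
    score = num_correct / len(ground_truths) if ground_truths else 0.0
    return {"num_correct": num_correct, "score": score}


tools = [ToolCall(name="useless", kwargs={}), ToolCall(name="multiply", kwargs={"a": 2, "b": 3})]
ground_truths = [ToolCall(name="useless", kwargs={}), ToolCall(name="add", kwargs={"a": 2, "b": 3})]
result = tool_selection_accuracy(tools, ground_truths)
print(result)  # {'num_correct': 1, 'score': 0.5}
```

With the same data as the usage example, `useless` matches exactly while `multiply` does not match the expected `add` (even though the arguments are identical), giving 1 correct call out of 2 expected, i.e. a score of 0.5.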
Example Usage
Required data items: tools, ground_truths
```python
from continuous_eval.metrics.tools.match import ToolSelectionAccuracy
from continuous_eval.eval.types import ToolCall

# Tools the LLM actually called vs. the tools it was expected to call
tools = [ToolCall(name="useless", kwargs={}), ToolCall(name="multiply", kwargs={"a": 2, "b": 3})]
ground_truths = [ToolCall(name="useless", kwargs={}), ToolCall(name="add", kwargs={"a": 2, "b": 3})]
datum = {"tools": tools, "ground_truths": ground_truths}

metric = ToolSelectionAccuracy()
print(metric(**datum))
```
Example Output
```
{"num_correct": 1, "score": 0.5}
```