Token Count

Definitions

Token Count calculates the number of tokens used in the retrieved context.

The metric requires an encoder_name input, which is passed to tiktoken.

For example, for recent OpenAI models, use cl100k_base as the encoder. For other models, look up the specific tokenizer they use; alternatively, pass approx to get an approximate count that assumes one token for every four characters.
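The approx heuristic can be sketched in a few lines; note that the exact rounding continuous-eval applies internally is an assumption here, so treat this as an illustration of the one-token-per-four-characters rule rather than the library's implementation:

```python
import math

def approx_token_count(text: str) -> int:
    # Heuristic: roughly 1 token per 4 characters, rounded up.
    # The exact rounding used by continuous-eval is an assumption here.
    return math.ceil(len(text) / 4)

# A 31-character sentence comes out to ceil(31 / 4) = 8 approximate tokens.
print(approx_token_count("Lyon is a major city in France."))  # → 8
```

This heuristic is tokenizer-agnostic, which is why it is useful when the model's exact tokenizer is unknown, at the cost of accuracy.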

Required data items: retrieved_context

from continuous_eval.metrics.retrieval import TokenCount
datum = {
    "retrieved_context": [
        "Lyon is a major city in France.",
        "Paris is the capital of France and also the largest city in the country.",
    ],
    "ground_truth_context": ["Paris is the capital of France."],
}
metric = TokenCount(encoder_name="cl100k_base")
print(metric(**datum))

Example Output

{'num_tokens': 24}