Token Count
Definitions
Token Count calculates the number of tokens used in the retrieved context.
A required input for the metric is encoder_name for tiktoken. For example, for the most recent OpenAI models, use cl100k_base as the encoder. For other models, look up the specific tokenizer they use, or pass approx to get an approximate token count that assumes one token for every 4 characters.
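For intuition, the sketch below uses tiktoken directly (outside continuous_eval) to compare an exact cl100k_base count with the one-token-per-4-characters approximation; the rounding used here is an assumption, not the library's exact rule.

```python
import tiktoken

text = "Paris is the capital of France and also the largest city in the country."

# Exact count with tiktoken's cl100k_base encoder
encoder = tiktoken.get_encoding("cl100k_base")
exact_count = len(encoder.encode(text))

# Rough approximation: one token per 4 characters (rounding is an assumption)
approx_count = len(text) // 4

print(exact_count, approx_count)
```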
Required data items: retrieved_context
```python
from continuous_eval.metrics.retrieval import TokenCount

datum = {
    "retrieved_context": [
        "Lyon is a major city in France.",
        "Paris is the capital of France and also the largest city in the country.",
    ],
    "ground_truth_context": ["Paris is the capital of France."],
}

metric = TokenCount(encoder_name="cl100k_base")
print(metric(**datum))
```
Example Output
```python
{
    'num_tokens': 24,
}
```
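If the exact tokenizer for your model is unknown, the same metric can be run with the approximate encoder. A minimal sketch, reusing the datum from the example above; the value returned depends on the one-token-per-4-characters rule:

```python
# Approximate count: no tokenizer lookup, roughly one token per 4 characters
approx_metric = TokenCount(encoder_name="approx")
print(approx_metric(**datum))  # e.g. {'num_tokens': ...}
```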