Token Count
Definitions
Token Count calculates the number of tokens used in the retrieved context.
A required input for the metric is encoder_name for tiktoken. For example, for the most recent OpenAI models, use cl100k_base as the encoder. For other models, look up the specific tokenizer they use, or pass approx to get an approximate token count that assumes one token for every 4 characters.
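For intuition, the sketch below uses tiktoken directly (outside continuous_eval) to compare an exact cl100k_base count with the one-token-per-4-characters approximation; the rounding used here is an assumption, not the library's exact rule.

```python
import tiktoken

text = "Paris is the capital of France and also the largest city in the country."

# Exact count with tiktoken's cl100k_base encoder
encoder = tiktoken.get_encoding("cl100k_base")
exact_count = len(encoder.encode(text))

# Rough approximation: one token per 4 characters (rounding is an assumption)
approx_count = len(text) // 4

print(exact_count, approx_count)
```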
Required data items: retrieved_context
```python
from continuous_eval.metrics.retrieval import TokenCount

datum = {
    "retrieved_context": [
        "Lyon is a major city in France.",
        "Paris is the capital of France and also the largest city in the country.",
    ],
    "ground_truth_context": ["Paris is the capital of France."],
}

metric = TokenCount(encoder_name="cl100k_base")
print(metric(**datum))
```
Example Output
```python
{
    'num_tokens': 24,
}
```
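If the exact tokenizer for your model is unknown, the same metric can be run with the approximate encoder. A minimal sketch, reusing the datum from the example above; the value returned depends on the one-token-per-4-characters rule:

```python
# Approximate count: no tokenizer lookup, roughly one token per 4 characters
approx_metric = TokenCount(encoder_name="approx")
print(approx_metric(**datum))  # e.g. {'num_tokens': ...}
```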