# AnswerCorrectness

## Definition
**Answer Correctness** outputs a score between 0.0 and 1.0 assessing the overall quality of the answer, given the question and the ground truth answers.
Scoring rubric in the LLM prompt (a sketch for snapping raw scores to these levels follows the list):
- 0.0 means that the answer is completely irrelevant to the question.
- 0.25 means that the answer is relevant to the question but contains major errors.
- 0.5 means that the answer is relevant to the question and is partially correct.
- 0.75 means that the answer is relevant to the question and is correct.
- 1.0 means that the answer is relevant to the question and is correct and complete.
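In practice the judge returns a continuous score rather than exactly one of these five levels (see the sample output below), so it can be convenient to snap a raw score to the nearest rubric level when reporting results. A minimal sketch using only the Python standard library; `RUBRIC_LEVELS` and `nearest_rubric_level` are hypothetical names, not part of the continuous-eval API:

```python
# Hypothetical helper (not part of continuous-eval): snap the metric's
# continuous score to the nearest rubric level for reporting.
RUBRIC_LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]

def nearest_rubric_level(score: float) -> float:
    """Return the rubric level closest to the raw correctness score."""
    return min(RUBRIC_LEVELS, key=lambda level: abs(level - score))

print(nearest_rubric_level(0.9999867895679586))  # -> 1.0
```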
## Example Usage

Required data items: `question`, `answer`, `ground_truth_answers`
```python
from continuous_eval.metrics.generation.text import AnswerCorrectness

datum = {
    "question": "Who wrote 'Romeo and Juliet'?",
    "answer": "Shakespeare wrote 'Romeo and Juliet'",
    "ground_truth_answers": [
        "William Shakespeare wrote 'Romeo and Juliet'",
        "William Shakespeare",
        "Shakespeare",
        "Shakespeare is the author of 'Romeo and Juliet'",
    ],
}

metric = AnswerCorrectness()
print(metric(**datum))
```
### Sample Output

```json
{
    "correctness": 0.9999867895679586,
    "reasoning": "The generated answer correctly identifies Shakespeare as the author of 'Romeo and Juliet'."
}
```