Context Precision & Recall
Definitions
Context Precision: measures signal vs. noise — what proportion of the retrieved contexts are relevant?
Context Recall: measures completeness — what proportion of all relevant contexts are retrieved?
F1: harmonic mean of precision and recall
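Written out as formulas, the three definitions look like this. The set notation is ours. It assumes that a retrieved component counts as relevant when it matches at least one ground truth component under the chosen matching strategy, and that a ground truth component counts as retrieved when at least one retrieved component matches it. Here R is the set of retrieved components, G is the set of ground truth components, and match(r, g) is the matching strategy.

```latex
\text{Precision} = \frac{\left|\{\, r \in R : \exists\, g \in G,\ \mathrm{match}(r, g) \,\}\right|}{|R|}
\qquad
\text{Recall} = \frac{\left|\{\, g \in G : \exists\, r \in R,\ \mathrm{match}(r, g) \,\}\right|}{|G|}
\qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```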
Matching Strategy
Ground truth contexts can be defined at a different granularity than the retrieved chunks. For example, a ground truth context may be a single sentence that contains the information, while the retrieved contexts are uniform 512-token chunks. The following matching strategies determine whether a retrieved component is relevant:
| Match Type | Component | Retrieved component considered relevant if: |
|---|---|---|
| ExactChunkMatch() | Chunk | Exact match to a Ground Truth Context Chunk. |
| ExactSentenceMatch() | Sentence | Exact match to a Ground Truth Context Sentence. |
| RougeChunkMatch() | Chunk | Match to a Ground Truth Context Chunk with ROUGE-L Recall > ROUGE_CHUNK_MATCH_THRESHOLD (default 0.7). |
| RougeSentenceMatch() | Sentence | Match to a Ground Truth Context Sentence with ROUGE-L Recall > ROUGE_CHUNK_SENTENCE_THRESHOLD (default 0.8). |
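The ROUGE-based strategies compare a retrieved component to a ground truth component using the longest common subsequence (LCS) of their tokens. The following is a minimal, self-contained sketch of that comparison, not the library's implementation. It rests on three assumptions:

- ROUGE-L recall is taken as the LCS length divided by the ground truth token count.
- Tokenization is a simple lowercase whitespace split, which may differ from what the library does.
- The helper names `lcs_length`, `rouge_l_recall`, `rouge_match`, and `exact_match` are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_recall(retrieved: str, ground_truth: str) -> float:
    """ROUGE-L recall: LCS length divided by the number of ground truth tokens.

    Assumption: lowercase whitespace tokenization (the library's tokenizer may differ).
    """
    ref = ground_truth.lower().split()
    hyp = retrieved.lower().split()
    if not ref:
        return 0.0
    return lcs_length(hyp, ref) / len(ref)


def rouge_match(retrieved: str, ground_truth: str, threshold: float = 0.7) -> bool:
    """Hypothetical helper: relevant if ROUGE-L recall is strictly above the threshold."""
    return rouge_l_recall(retrieved, ground_truth) > threshold


def exact_match(retrieved: str, ground_truth: str) -> bool:
    """Hypothetical helper mirroring the exact-match strategies."""
    return retrieved == ground_truth


gt = "the eiffel tower was completed in 1889"
chunk = "The Eiffel Tower is located in Paris and was completed in 1889 for the World Fair"
print(rouge_l_recall(chunk, gt))  # 1.0: all 7 ground truth tokens appear in order in the chunk
print(rouge_match(chunk, gt))     # True
```

Because recall is normalized by the ground truth length, a long retrieved chunk that fully contains a short ground truth sentence still scores 1.0. This fits the case above, where sentence-level ground truth is compared against larger retrieved chunks.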
Example Usage
Required data items: retrieved_context, ground_truth_context
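The following sketch shows how the two required data items combine into the three scores under exact chunk matching. It is not the library's API:

- Only the keys `retrieved_context` and `ground_truth_context` come from the text above. The sample contexts are made up.
- The function `precision_recall_f1`, the matcher `exact_chunk_match`, and the output key names are hypothetical.
- It assumes each key maps to a list of strings, and that an empty denominator yields a score of 0.

```python
from typing import Callable

# Hypothetical datum: only the two keys come from the docs; the contents are illustrative.
datum = {
    "retrieved_context": [
        "Paris is the capital of France.",
        "The Eiffel Tower was completed in 1889.",
        "Berlin is the capital of Germany.",
    ],
    "ground_truth_context": [
        "The Eiffel Tower was completed in 1889.",
        "The tower is 330 metres tall.",
    ],
}


def precision_recall_f1(
    retrieved: list[str],
    ground_truth: list[str],
    match: Callable[[str, str], bool],
) -> dict[str, float]:
    """Sketch of context precision, recall, and F1 under a pluggable matching strategy."""
    # A retrieved item is relevant if it matches at least one ground truth item.
    relevant_retrieved = sum(any(match(r, g) for g in ground_truth) for r in retrieved)
    # A ground truth item is recalled if at least one retrieved item matches it.
    recalled_gt = sum(any(match(r, g) for r in retrieved) for g in ground_truth)
    precision = relevant_retrieved / len(retrieved) if retrieved else 0.0
    recall = recalled_gt / len(ground_truth) if ground_truth else 0.0
    # Harmonic mean; defined as 0 when both precision and recall are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"context_precision": precision, "context_recall": recall, "context_f1": f1}


def exact_chunk_match(retrieved: str, ground_truth: str) -> bool:
    """Hypothetical exact chunk matcher: strings must be identical."""
    return retrieved == ground_truth


scores = precision_recall_f1(
    datum["retrieved_context"], datum["ground_truth_context"], exact_chunk_match
)
print(scores)  # precision = 1/3, recall = 1/2, F1 = 0.4 (up to float rounding)
```

Passing the matcher as a function keeps the precision and recall arithmetic independent of the matching strategy. A ROUGE-based matcher with a threshold could be substituted without touching the scoring code.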