Evaluation Dataset
Dataset Class
The `Dataset` class is a convenience class that represents a dataset that can be used for evaluation.
The dataset class can be initialized with a path to a folder or a file. The folder should contain the following files:
- `dataset.jsonl`, which contains a collection of queries or instructions and the corresponding reference outputs of the modules in the pipeline.
- an optional `manifest.yaml`, which declares the structure and fields of the dataset, the license and other metadata.
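The folder layout described above can be sketched with the standard library alone. This is a minimal illustration that assumes `dataset.jsonl` holds one JSON object per line. The helper name `read_dataset_folder` and the sample record are hypothetical and not part of the library's API, and the manifest is only checked for existence because its schema is not assumed.

```python
import json
import tempfile
from pathlib import Path


def read_dataset_folder(folder):
    """Hypothetical helper: read dataset.jsonl and report whether a manifest exists."""
    folder = Path(folder)
    records = []
    with open(folder / "dataset.jsonl", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    # The manifest is optional, so only its presence is checked.
    has_manifest = (folder / "manifest.yaml").exists()
    return records, has_manifest


# Build a throwaway folder with a single made-up record, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"question": "What is 2 + 2?", "answer": "4"}
    (Path(tmp) / "dataset.jsonl").write_text(json.dumps(sample) + "\n", encoding="utf-8")
    records, has_manifest = read_dataset_folder(tmp)
```

In practice the `Dataset` class handles this loading for you; the sketch only shows what the files are expected to look like on disk.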
Alternatively, you can create a dataset from a list of dictionaries.
To access the raw data, use the `data` attribute.
Dataset fields
To reference a dataset field, use the `DatasetField` class.
When you load the dataset, the `Dataset` class automatically infers the fields from the data. This is particularly useful when defining the inputs and outputs of the modules in the pipeline.
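To make the idea of field inference concrete, here is a rough sketch of how field names and value types could be derived from a list of records. It is not the library's implementation: the function `infer_fields` and the sample records are hypothetical, and the real `Dataset` class may infer fields differently.

```python
def infer_fields(records):
    """Hypothetical sketch: map each field name to the sorted type names seen for it."""
    fields = {}
    for record in records:
        for name, value in record.items():
            fields.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in fields.items()}


# Made-up records, used only to exercise the sketch.
data = [
    {"question": "What is 2 + 2?", "answer": "4", "tool_calls": []},
    {"question": "Capital of France?", "answer": "Paris", "tool_calls": ["search"]},
]
fields = infer_fields(data)
```

A field that appears with more than one type would show up with several type names, which is one way inconsistent records could be spotted before an evaluation run.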
Example Data Folder
Here's an example golden dataset that contains `uid`, `question`, `answer` (ground truth answers), and `tool_calls` (the tools that are supposed to be used).
Dataset File
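As an illustration, a `dataset.jsonl` with these four fields could be produced as follows. The records are made up, and the value shapes (for example, `tool_calls` as a list of tool names) are assumptions rather than the library's required schema.

```python
import json

REQUIRED_FIELDS = {"uid", "question", "answer", "tool_calls"}

# Made-up records for illustration only; the value shapes are assumptions.
records = [
    {"uid": "1", "question": "What is 2 + 2?", "answer": ["4"], "tool_calls": ["calculator"]},
    {"uid": "2", "question": "Capital of France?", "answer": ["Paris"], "tool_calls": ["search"]},
]

# JSONL: one JSON object per line.
jsonl_text = "\n".join(json.dumps(r) for r in records) + "\n"

# Parse the text back and list the uid of any record missing a required field.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
missing = [r["uid"] for r in parsed if not REQUIRED_FIELDS.issubset(r)]
```

Writing `jsonl_text` to `dataset.jsonl` inside the data folder would give a file in the layout described earlier.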
Manifest (optional)
Example Datasets
Below are the example datasets you can use to test your pipeline/code.
Dataset | Description | Data format |
---|---|---|
correctness | 1,200 examples, created from InstructQA | `Dataset` |
retrieval | 300 examples, created from HotpotQA | `Dataset` |
faithfulness | 544 examples, created from InstructQA | `Dataset` |
graham_essays/small/txt | 10 Paul Graham essays, created from graham-essays | Zip of txt |
graham_essays/small/dataset | 55 questions about Paul Graham essays | `Dataset` |
graham_essays/small/results | The results (i.e., answer and retrieved documents) from a simple RAG pipeline | JSON |
Download Datasets
The example datasets can be downloaded with the `example_data_downloader` helper function.