Dataset Class
The Dataset class is a convenience class that represents a dataset used for evaluation.
It can be initialized with a path to a folder or a file.
The folder should contain the following files:
- dataset.jsonl, which contains a collection of queries or instructions and the corresponding reference outputs of the modules in the pipeline.
- an optional manifest.yaml, which declares the structure and fields of the dataset, the license, and other metadata.
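The folder layout described above can be sketched with the Python standard library alone. This is a minimal illustration, not the Dataset class's actual loading code: the helper names load_records and has_manifest are hypothetical, and the two sample rows are placeholders.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Read records from a dataset folder or directly from a .jsonl file.

    Hypothetical helper for illustration; not part of the library.
    """
    path = Path(path)
    # A folder is expected to hold dataset.jsonl; a file path is used as-is.
    data_file = path / "dataset.jsonl" if path.is_dir() else path
    records = []
    with data_file.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def has_manifest(folder):
    """Report whether the optional manifest.yaml is present in the folder."""
    return (Path(folder) / "manifest.yaml").is_file()


# Build a throwaway folder to demonstrate the layout.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    rows = [{"uid": "1", "question": "q1"}, {"uid": "2", "question": "q2"}]
    (folder / "dataset.jsonl").write_text(
        "\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8"
    )
    from_folder = load_records(folder)                  # path to a folder
    from_file = load_records(folder / "dataset.jsonl")  # path to a file
    manifest_present = has_manifest(folder)             # no manifest written
```

Both entry points yield the same list of dictionaries, one per line of dataset.jsonl, which mirrors the "folder or file" choice described above.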
Alternatively, you can create a dataset from a list of dictionaries.
To access the raw data, use the data attribute.
Dataset fields
To reference a dataset field, use the DatasetField class.
When you load the dataset, the Dataset class automatically infers the fields from the data.
This is particularly useful when defining the inputs and outputs of the modules in the pipeline.
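To make the idea of field inference concrete, here is a rough sketch of how field names and value types could be derived from a list of dictionaries. It is not the library's implementation: FieldRef and infer_fields are hypothetical stand-ins (the real DatasetField class may look quite different), and the sample records are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRef:
    """Hypothetical stand-in for a reference to one dataset field."""
    name: str
    type_name: str


def infer_fields(records):
    """Infer field names and value types from a list of dict records.

    Illustrative only; assumes every record is a flat dict.
    """
    types = {}
    for record in records:
        for key, value in record.items():
            type_name = type(value).__name__
            seen = types.get(key)
            if seen is None:
                types[key] = type_name   # first time this field appears
            elif seen != type_name:
                types[key] = "mixed"     # conflicting types across records
    return {name: FieldRef(name, t) for name, t in types.items()}


records = [
    {"uid": "1", "question": "What is 2 + 2?", "answer": ["4"]},
    {"uid": "2", "question": "Capital of France?", "answer": ["Paris"]},
]
fields = infer_fields(records)
```

Having one named object per field gives pipeline definitions something stable to point at when wiring module inputs and outputs, instead of repeating raw string keys.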
Example Data Folder
Here’s an example golden dataset that contains uid, question, answer (ground-truth answers), and tool_calls (the tools that are supposed to be used).
Dataset File
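As a rough illustration of what such a file could hold, the sketch below builds two records with the four fields named above and serializes them as JSON Lines (one JSON object per line). All values, the tool name get_weather, and the shape of each tool_calls entry are assumptions made up for this example; they are not taken from the original dataset.

```python
import json

# Placeholder records; only the four field names come from the text above.
records = [
    {
        "uid": "1",
        "question": "What is the weather in Paris today?",
        "answer": ["<ground-truth answer 1>"],
        "tool_calls": [{"name": "get_weather", "kwargs": {"city": "Paris"}}],
    },
    {
        "uid": "2",
        "question": "What is 12 times 7?",
        "answer": ["84"],
        "tool_calls": [],  # no tool expected for this question
    },
]

# JSON Lines: each record is dumped on its own line.
jsonl_text = "\n".join(json.dumps(r) for r in records)

# Reading it back is a per-line json.loads.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
```

Keeping one record per line lets the file be streamed or appended to without parsing the whole dataset at once.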
Manifest (optional)