Evaluation Runner
Definition
`eval_manager` manages the evaluation process for a pipeline. Add `eval_manager` to your application and log each module's outputs for evaluation.
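A minimal setup sketch is shown below. The import path is an assumption (in continuous-eval, for instance, `eval_manager` is exposed by the manager module); adjust it to match your installation, and `pipeline` is the pipeline object you defined earlier.

```python
# A minimal sketch -- the import path is an assumption, adjust as needed.
from continuous_eval.eval.manager import eval_manager

eval_manager.set_pipeline(pipeline)  # pipeline defined elsewhere in your app
```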
Logging data in evaluation runs
In your application, use the following functions to log data during an evaluation run (a minimal loop combining them is sketched after the list):

- `eval_manager.set_pipeline(pipeline)` to select the previously defined pipeline.
- `eval_manager.start_run()` to begin the evaluation run.
- `eval_manager.log("module_name", data)` to log specific outputs of a given module.
- `eval_manager.is_running()` to check whether the evaluation run is still running.
- `eval_manager.curr_sample` to access the current sample in the dataset.
- `eval_manager.next_sample()` to move to the next sample in the dataset.
- `eval_manager.evaluation.save(Path("results.jsonl"))` to save the run results.
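A minimal logging loop might look like the following. The import path, the dataset field name (`"question"`), the module names (`"retriever"`, `"generator"`), and the `retrieve` / `generate` helpers are all assumptions standing in for your own pipeline code.

```python
from pathlib import Path

from continuous_eval.eval.manager import eval_manager  # assumed import path

eval_manager.set_pipeline(pipeline)  # pipeline defined in your application
eval_manager.start_run()

while eval_manager.is_running():
    if eval_manager.curr_sample is None:
        break
    sample = eval_manager.curr_sample  # current dataset sample

    # Run your pipeline on the sample and log each module's output.
    # "retriever" / "generator" are hypothetical module names; use the
    # names of the modules in your own pipeline.
    docs = retrieve(sample["question"])          # hypothetical helper
    eval_manager.log("retriever", docs)
    answer = generate(sample["question"], docs)  # hypothetical helper
    eval_manager.log("generator", answer)

    eval_manager.next_sample()  # advance to the next dataset sample

eval_manager.evaluation.save(Path("results.jsonl"))  # persist run results
```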
See complete examples in the Examples folder.
An illustrative logged result for one sample is shown below; the actual fields depend on your dataset and on the module names you pass to `eval_manager.log`:
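```json
{
  "question": "What is the capital of France?",
  "retriever": ["Paris is the capital and most populous city of France."],
  "generator": "The capital of France is Paris."
}
```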
Run evaluators and tests on results
- `eval_manager.set_pipeline(pipeline)` to select the previously defined pipeline (and its corresponding metrics / tests).
- `eval_manager.evaluation.load(Path("results.jsonl"))` to load the run results from your application.
- `eval_manager.run_metrics()` to calculate the metrics per module in your pipeline.
- `eval_manager.metrics.save(Path("metrics_results.json"))` to save the metric results.
- `eval_manager.metrics.aggregate()` to calculate the aggregate results across the dataset.
- `eval_manager.run_tests()` to run the tests.
- `eval_manager.tests.save(Path("test_results.json"))` to save the test results.

A sketch combining these steps follows.
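Putting the steps together, a post-run evaluation script might look like this (same assumed import path as above; every `eval_manager` call is taken from the list):

```python
from pathlib import Path

from continuous_eval.eval.manager import eval_manager  # assumed import path

eval_manager.set_pipeline(pipeline)  # same pipeline, with its metrics / tests
eval_manager.evaluation.load(Path("results.jsonl"))  # results from your app

eval_manager.run_metrics()  # calculate per-module metrics
eval_manager.metrics.save(Path("metrics_results.json"))
aggregate = eval_manager.metrics.aggregate()  # dataset-level aggregates
print(aggregate)

eval_manager.run_tests()  # run the tests defined on the pipeline
eval_manager.tests.save(Path("test_results.json"))
```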