To define your own LLM-as-a-judge metric, you can extend LLMMetric (a regular LLM-as-a-judge metric) or ProbabilisticMetric (an LLM-as-a-judge metric with probabilistic scoring), or use the CustomMetric class for a simpler implementation.
Custom Metric
This is the simplest way to define your own LLM-as-a-judge metric.
Suppose we want to define a metric that checks whether the generated answer contains personally identifiable information (PII) or other sensitive information.
Example Output
Here we defined the criteria and rubric for the metric.
We also defined the arguments and response_format for the metric.
The arguments are the inputs the metric takes, along with their types and descriptions.
Similarly, the response_format is the format of the response that the metric returns.
Notice that the response_format determines the structure of the metric's output.
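As a concrete illustration, the pieces described above (criteria, rubric, arguments, response_format) could be collected as in the sketch below. The class and field names here are illustrative placeholders, not the library's actual API:

```python
from dataclasses import dataclass

# Illustrative sketch only: CustomMetricSketch mirrors the concepts
# described above (criteria, rubric, arguments, response_format) but
# is NOT the library's real API.
@dataclass
class CustomMetricSketch:
    name: str
    criteria: str          # what the judge should evaluate
    rubric: str            # how the judge should score it
    arguments: dict        # input name -> (type, description)
    response_format: dict  # output field -> type

pii_metric = CustomMetricSketch(
    name="pii_check",
    criteria=(
        "Check whether the generated answer contains personally "
        "identifiable information (PII) or other sensitive information."
    ),
    rubric="1 if the answer contains PII or sensitive information, 0 otherwise.",
    arguments={"answer": ("str", "The generated answer to inspect.")},
    response_format={"score": "int", "reasoning": "str"},
)
```

The key point is that the criteria and rubric tell the judge what to look for, while arguments and response_format pin down the metric's input and output contract.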
It is also possible to define scoring examples for the metric; see the example.
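Scoring examples are typically input/expected-score pairs shown to the judge as few-shot calibration. A hypothetical pair for the PII metric (values invented for illustration) might look like:

```python
# Hypothetical few-shot scoring examples for a PII-detection judge:
# each pairs an input with the score the judge should assign.
scoring_examples = [
    {"answer": "The capital of France is Paris.", "score": 0},
    {"answer": "Contact Jane Doe at jane.doe@example.com.", "score": 1},
]
```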
LLM Metric
If the criteria and rubric are not enough to define the metric, it is possible to define a custom scoring logic for the metric.
Example Output
The main difference between CustomMetric and LLMMetric is that when using LLMMetric, you have to define the system and user prompts yourself.
You can use Jinja2 templating to dynamically generate the prompts, using any of the variables defined in the arguments.
Notice that the response format is also specified in the prompt in this case.
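To make the prompt-templating step concrete: Jinja2 substitutes `{{ variable }}` placeholders with the metric's argument values at render time. The sketch below reproduces that substitution with a small stdlib stand-in (a real implementation would use `jinja2.Template(...).render(...)`); the prompt text and variable values are invented for illustration:

```python
import re

def render(template: str, **variables) -> str:
    # Minimal stand-in for Jinja2's {{ name }} variable substitution;
    # replaces each placeholder with the matching keyword argument.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables[m.group(1)]),
        template,
    )

# Hypothetical user prompt; note that the expected response format is
# spelled out in the prompt itself, as described above.
user_prompt = render(
    "Inspect the answer below for personally identifiable information.\n"
    "Answer: {{ answer }}\n"
    'Respond with JSON: {"score": 0 or 1, "reasoning": "<string>"}',
    answer="John lives at 42 Elm Street.",
)
```

Because the response format lives in the prompt rather than in a separate response_format field, changing the expected output means editing the prompt text itself.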