Evaluate
Evaluation is at the core of Halluminate: the very reason Halluminate exists is so you can assess the accuracy of your generative models. On this page, we'll dive into the evaluate criteria endpoint. We offer three evaluation methods for testing your AI agents. Once you understand the required parameters for these methods, as well as the response you will receive, you can begin testing.
Required Attributes
- Name: criteria_uuid
- Type: UUIDField
- Description: The criteria's UUID.
- Name: model_output
- Type: string
- Description: The model output you want to evaluate.
Optional Attributes
- Name: prompt
- Type: string
- Description: An optional prompt that can influence the evaluation model and, in turn, the outcome of the evaluation.
- Name: context
- Type: string
- Description: Additional context that gives the evaluation model more information with which to hone its evaluation of the model output.
- Name: hyperparameters
- Type: Dictionary
- Description: Specifies the evaluation model (e.g. llama3-8b-8192 or gemma2-9b-it) and the temperature (a value from 0.00 to 2.00); see the sketch after this list.
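To make the shape concrete, here is a minimal sketch of a hyperparameters dictionary. The model names and the temperature range come from the description above; the specific values chosen are illustrative, not recommendations.

# A sample hyperparameters dictionary. The values below are
# illustrative placeholders drawn from the examples above.
hyperparameters = {
    "model": "llama3-8b-8192",  # or "gemma2-9b-it"
    "temperature": 0.70         # any value from 0.00 to 2.00
}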
Response Metrics
- Name: reasoning
- Type: string
- Description: An explanation of how the model output was evaluated and why it received its score.
- Name: score
- Type: boolean
- Description: The score for the model output, either PASS or FAIL.
Evaluate Basic
This is the basic evaluation. It returns an evaluation score (PASS or FAIL) along with an explanation of that result.
Request
from halluminate import Halluminate

halluminate = Halluminate(api_key='<your_api_key_here>')

response = halluminate.evaluate_basic(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,     # optional: a prompt that can influence the evaluation
    context=None,    # optional: extra context for the evaluation model
    hyperparameters={
        "model": "<customizable_model>",           # e.g. "llama3-8b-8192"
        "temperature": <customizable_temperature>  # a value from 0.00 to 2.00
    }
)
print(response)
Response
{
    "reasoning": <an explanation for the model output's evaluation>,
    "score": <'PASS' or 'FAIL'>
}
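As a minimal sketch of acting on this response, assuming it is returned as a Python dictionary with the reasoning and score fields described above:

if response["score"] == "PASS":
    print("Output passed evaluation.")
else:
    # Surface the model's reasoning when the output fails.
    print(f"Output failed evaluation: {response['reasoning']}")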
Evaluate With Bot Court
This is a modified evaluation that adds a bot court. It returns an evaluation score (PASS or FAIL) along with an explanation of that result.
Request
from halluminate import Halluminate

halluminate = Halluminate(api_key='<your_api_key_here>')

response = halluminate.evaluate_with_bot_court(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,     # optional: a prompt that can influence the evaluation
    context=None,    # optional: extra context for the evaluation model
    hyperparameters={
        "model": "<customizable_model>",           # e.g. "llama3-8b-8192"
        "temperature": <customizable_temperature>  # a value from 0.00 to 2.00
    }
)
print(response)
Response
{
    "reasoning": <an explanation for the model output's evaluation>,
    "score": <'PASS' or 'FAIL'>
}
Evaluate With Reflection
This is a modified evaluation that adds a reflection step. It returns an evaluation score (PASS or FAIL) along with an explanation of that result.
Request
from halluminate import Halluminate

halluminate = Halluminate(api_key='<your_api_key_here>')

response = halluminate.evaluate_with_reflection(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,     # optional: a prompt that can influence the evaluation
    context=None,    # optional: extra context for the evaluation model
    hyperparameters={
        "model": "<customizable_model>",           # e.g. "llama3-8b-8192"
        "temperature": <customizable_temperature>  # a value from 0.00 to 2.00
    }
)
print(response)
Response
{
    "reasoning": <an explanation for the model output's evaluation>,
    "score": <'PASS' or 'FAIL'>
}
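To compare how the three methods judge the same output, you can loop over them. This is a sketch that assumes all three client methods share the signature shown above; the temperature value is illustrative.

# Run the same model output through each evaluation method and compare scores.
methods = [
    halluminate.evaluate_basic,
    halluminate.evaluate_with_bot_court,
    halluminate.evaluate_with_reflection,
]
for evaluate in methods:
    result = evaluate(
        criteria_uuid="<insert_criteria_uuid_here>",
        model_output="<insert_model_output_text_here>",
        prompt=None,
        context=None,
        hyperparameters={
            "model": "<customizable_model>",
            "temperature": 0.70,  # illustrative; any value from 0.00 to 2.00
        },
    )
    print(evaluate.__name__, result["score"])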