Evaluate
Evaluation is at the core of Halluminate: the very reason Halluminate exists is so you can assess the accuracy of your generative models. On this page, we'll dive into the evaluate criteria endpoint. We offer three evaluation methods for testing your AI agents. Once you understand the required parameters for these methods, as well as the response you will receive, you can begin testing.
Required Attributes
- Name: criteria_uuid
- Type: UUIDField
- Description: The criteria's UUID.
- Name: model_output
- Type: string
- Description: The model output you want to evaluate.
Optional Attributes
- Name: prompt
- Type: string
- Description: An optional prompt that can influence the evaluation model and, in turn, the outcome of the evaluation.
- Name: context
- Type: string
- Description: Additional context that gives the evaluation model more information with which to hone its evaluation of the model output.
- Name: hyperparameters
- Type: Dictionary
- Description: Specifies the evaluation model (e.g. llama3-8b-8192 or gemma2-9b-it) and the temperature (a value from 0.00 to 2.00); see the sketch after this list.
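To make the shape concrete, here is a minimal sketch of a hyperparameters dictionary. The model names and the temperature range come from the description above; the specific values chosen are illustrative, not recommendations.

# A sample hyperparameters dictionary. The values below are
# illustrative placeholders drawn from the examples above.
hyperparameters = {
    "model": "llama3-8b-8192",  # or "gemma2-9b-it"
    "temperature": 0.70         # any value from 0.00 to 2.00
}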
Response Metrics
- Name: reasoning
- Type: string
- Description: An explanation of how the model output was evaluated and why it received its score.
- Name: score
- Type: boolean
- Description: The score for the model output, either PASS or FAIL.
Evaluate Basic
This is the basic evaluation. It returns an evaluation score (PASS or FAIL) along with an explanation of that result.
Request
from halluminate import Halluminate

halluminate = Halluminate(api_key='<your_api_key_here>')

response = halluminate.evaluate_basic(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,     # optional: a prompt that can influence the evaluation
    context=None,    # optional: extra context for the evaluation model
    hyperparameters={
        "model": "<customizable_model>",           # e.g. "llama3-8b-8192"
        "temperature": <customizable_temperature>  # a value from 0.00 to 2.00
    }
)
print(response)
Response
{
    "reasoning": <an explanation for the model output's evaluation>,
    "score": <'PASS' or 'FAIL'>
}
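As a minimal sketch of acting on this response, assuming it is returned as a Python dictionary with the reasoning and score fields described above:

if response["score"] == "PASS":
    print("Output passed evaluation.")
else:
    # Surface the model's reasoning when the output fails.
    print(f"Output failed evaluation: {response['reasoning']}")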
Evaluate With Bot Court
This is a modified evaluation that adds a bot court. It returns an evaluation score (PASS or FAIL) along with an explanation of that result.
Request
from halluminate import Halluminate

halluminate = Halluminate(api_key='<your_api_key_here>')

response = halluminate.evaluate_with_bot_court(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,     # optional: a prompt that can influence the evaluation
    context=None,    # optional: extra context for the evaluation model
    hyperparameters={
        "model": "<customizable_model>",           # e.g. "llama3-8b-8192"
        "temperature": <customizable_temperature>  # a value from 0.00 to 2.00
    }
)
print(response)
Response
{
    "reasoning": <an explanation for the model output's evaluation>,
    "score": <'PASS' or 'FAIL'>
}
Evaluate With Reflection
This is a modified evaluation that adds a reflection step. It returns an evaluation score (PASS or FAIL) along with an explanation of that result.
Request
from halluminate import Halluminate

halluminate = Halluminate(api_key='<your_api_key_here>')

response = halluminate.evaluate_with_reflection(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,     # optional: a prompt that can influence the evaluation
    context=None,    # optional: extra context for the evaluation model
    hyperparameters={
        "model": "<customizable_model>",           # e.g. "llama3-8b-8192"
        "temperature": <customizable_temperature>  # a value from 0.00 to 2.00
    }
)
print(response)
Response
{
    "reasoning": <an explanation for the model output's evaluation>,
    "score": <'PASS' or 'FAIL'>
}
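To compare how the three methods judge the same output, you can loop over them. This is a sketch that assumes all three client methods share the signature shown above; the temperature value is illustrative.

# Run the same model output through each evaluation method and compare scores.
methods = [
    halluminate.evaluate_basic,
    halluminate.evaluate_with_bot_court,
    halluminate.evaluate_with_reflection,
]
for evaluate in methods:
    result = evaluate(
        criteria_uuid="<insert_criteria_uuid_here>",
        model_output="<insert_model_output_text_here>",
        prompt=None,
        context=None,
        hyperparameters={
            "model": "<customizable_model>",
            "temperature": 0.70,  # illustrative; any value from 0.00 to 2.00
        },
    )
    print(evaluate.__name__, result["score"])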