Evaluate

Evaluating is a core part of Halluminate: the very reason Halluminate exists is so you can assess the accuracy of your generative models. On this page, we'll dive into the criteria evaluation endpoints. Halluminate offers three evaluation methods for testing your AI agents. Once you understand the parameters each method requires, as well as the response you will receive, you can begin testing.


Required Attributes

  • criteria_uuid (UUIDField)

    The UUID of the criteria to evaluate the model output against.

  • model_output (string)

    The model output you want to evaluate.

Optional Attributes

  • prompt (string)

    A prompt passed along to the evaluation model; it can influence the outcome of the evaluation.

  • context (string)

    Additional context for the evaluation model, used to sharpen its evaluation of the model output.

  • hyperparameters (Dictionary)

    Specifies the evaluation model (e.g. llama3-8b-8192 or gemma2-9b-it) and the temperature (any value from 0.00 to 2.00). See the example below.
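
Putting the attributes together, a request that supplies both optional fields might look like the sketch below. The UUID is a placeholder, and the model output, prompt, and context values are purely illustrative.

from halluminate import Halluminate

halluminate = Halluminate(api_key='<your_api_key_here>')

# Example request with every attribute set; prompt and context may also be left as None.
response = halluminate.evaluate_basic(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="The Eiffel Tower is located in Paris.",   # illustrative output to evaluate
    prompt="Where is the Eiffel Tower located?",            # illustrative prompt (optional)
    context="The answer should name the correct city.",     # illustrative context (optional)
    hyperparameters={
        "model": "llama3-8b-8192",   # or gemma2-9b-it
        "temperature": 0.00          # any value from 0.00 to 2.00
    }
)
print(response)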


Response Metrics

  • reasoning (string)

    An explanation of how the model output was evaluated and why it received its score.

  • score (boolean)

    The score for the model output, returned as PASS or FAIL.
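
As a minimal sketch, assuming the SDK returns the response as a Python dictionary with these two keys, you might act on the result like this:

# Assumes `response` is the dictionary returned by one of the evaluate calls below.
if response["score"] == "PASS":
    print("Output met the criteria.")
else:
    print("Output failed the criteria:", response["reasoning"])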


POST /evaluate_basic

Evaluate Basic

This is the basic evaluation. An evaluation score (PASS or FAIL) is returned along with its explanation.

Request

POST /evaluate_basic
from halluminate import Halluminate
halluminate = Halluminate(api_key='<your_api_key_here>')
response = halluminate.evaluate_basic(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,
    context=None,
    hyperparameters={
        "model": "llama3-8b-8192",   # or gemma2-9b-it
        "temperature": 0.00          # any value from 0.00 to 2.00
    }
)
print(response)

Response

{
    "reasoning": "<an explanation for the model output's evaluation>",
    "score": "<PASS or FAIL>"
}
POST /evaluate_with_bot_court

Evaluate With Bot Court

This is a modified evaluation that adds a bot court. An evaluation score (PASS or FAIL) is returned along with its explanation.

Request

POST /evaluate_with_bot_court
from halluminate import Halluminate
halluminate = Halluminate(api_key='<your_api_key_here>')
response = halluminate.evaluate_with_bot_court(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,
    context=None,
    hyperparameters={
        "model": "llama3-8b-8192",   # or gemma2-9b-it
        "temperature": 0.00          # any value from 0.00 to 2.00
    }
)
print(response)

Response

{
    "reasoning": "<an explanation for the model output's evaluation>",
    "score": "<PASS or FAIL>"
}
POST /evaluate_with_reflection

Evaluate With Reflection

This is a modified evaluation that adds a reflection step. An evaluation score (PASS or FAIL) is returned along with its explanation.

Request

POST /evaluate_with_reflection
from halluminate import Halluminate
halluminate = Halluminate(api_key='<your_api_key_here>')
response = halluminate.evaluate_with_reflection(
    criteria_uuid="<insert_criteria_uuid_here>",
    model_output="<insert_model_output_text_here>",
    prompt=None,
    context=None,
    hyperparameters={
        "model": "llama3-8b-8192",   # or gemma2-9b-it
        "temperature": 0.00          # any value from 0.00 to 2.00
    }
)
print(response)

Response

{
    "reasoning": "<an explanation for the model output's evaluation>",
    "score": "<PASS or FAIL>"
}
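
Because all three methods accept the same parameters, you can compare them on the same output. A minimal sketch, assuming each call returns a dictionary containing "reasoning" and "score":

from halluminate import Halluminate

halluminate = Halluminate(api_key='<your_api_key_here>')

# Run one model output through all three evaluation methods and compare their scores.
methods = [
    halluminate.evaluate_basic,
    halluminate.evaluate_with_bot_court,
    halluminate.evaluate_with_reflection,
]

for evaluate in methods:
    response = evaluate(
        criteria_uuid="<insert_criteria_uuid_here>",
        model_output="<insert_model_output_text_here>",
        prompt=None,
        context=None,
        hyperparameters={
            "model": "llama3-8b-8192",   # or gemma2-9b-it
            "temperature": 0.00          # any value from 0.00 to 2.00
        }
    )
    print(evaluate.__name__, response["score"])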
