Create the structure of an evaluation that can be used to test a model's performance. An evaluation is a set of testing criteria and a data source. After creating an evaluation, you can run it on different models and with different model parameters. We support several types of graders and data sources. For more information, see the Evals guide.

Example request:

curl --request POST \
  --url https://api.openai.com/v1/evals \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "data_source_config": {
    "type": "custom",
    "item_schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"name\": {\"type\": \"string\"},\n    \"age\": {\"type\": \"integer\"}\n  },\n  \"required\": [\"name\", \"age\"]\n}\n",
    "include_sample_schema": false
  },
  "testing_criteria": [
    {
      "type": "label_model",
      "name": "<string>",
      "model": "<string>",
      "input": [
        {
          "role": "<string>",
          "content": "<string>"
        }
      ],
      "labels": [
        "<string>"
      ],
      "passing_labels": [
        "<string>"
      ]
    }
  ],
  "name": "<string>",
  "metadata": {},
  "share_with_openai": false
}
'

Example response (200 OK):

{
  "object": "eval",
  "id": "<string>",
  "name": "Chatbot effectiveness Evaluation",
  "data_source_config": {
    "type": "custom",
    "schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"item\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"label\": {\"type\": \"string\"}\n      },\n      \"required\": [\"label\"]\n    }\n  },\n  \"required\": [\"item\"]\n}\n"
  },
  "testing_criteria": [
    {
      "type": "label_model",
      "name": "<string>",
      "model": "<string>",
      "input": [
        {
          "role": "<string>",
          "content": "<string>"
        }
      ],
      "labels": ["<string>"],
      "passing_labels": ["<string>"]
    }
  ],
  "created_at": 123,
  "metadata": {},
  "share_with_openai": true
}
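As a concrete sketch, here is the same request with the placeholders filled in. All values (eval name, grader name, model, labels, metadata) are illustrative assumptions, and the {{item.name}} and {{sample.output_text}} references assume the template namespaces described in the Evals guide:

curl --request POST \
  --url https://api.openai.com/v1/evals \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '
{
  "name": "age-extraction-quality",
  "data_source_config": {
    "type": "custom",
    "item_schema": "{\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}, \"age\": {\"type\": \"integer\"}}, \"required\": [\"name\", \"age\"]}",
    "include_sample_schema": true
  },
  "testing_criteria": [
    {
      "type": "label_model",
      "name": "age-answer-grader",
      "model": "gpt-4o-mini",
      "input": [
        {
          "role": "developer",
          "content": "Decide whether the sampled answer states the correct age for {{item.name}}. Reply with one label."
        },
        {
          "role": "user",
          "content": "{{sample.output_text}}"
        }
      ],
      "labels": ["correct", "incorrect"],
      "passing_labels": ["correct"]
    }
  ],
  "metadata": {"team": "docs-examples"},
  "share_with_openai": false
}
'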
Authorizations

Authorization
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body
data_source_config
A CustomDataSourceConfig object that defines the schema for the data source used for the evaluation runs. This schema defines the shape of the data that will be used to evaluate your testing criteria and the data that is required when creating a run.
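As an illustration, a datasource row matching the item_schema from the example request might look like the following hypothetical record:

{
  "item": {
    "name": "Ada",
    "age": 31
  }
}

Setting include_sample_schema to true additionally exposes a sample namespace, so graders can reference the model output generated during a run (for example, {{sample.output_text}} in the sketch above).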
testing_criteria
A list of graders for all eval runs in this group. Each entry can be, for example, a LabelModelGrader object, which uses a model to assign labels to each item in the evaluation.
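For instance, a minimal sketch of a label_model grader; the grader name, model choice, and label set are illustrative assumptions:

{
  "type": "label_model",
  "name": "sentiment-grader",
  "model": "gpt-4o-mini",
  "input": [
    {
      "role": "developer",
      "content": "Classify the sentiment of the statement as positive, neutral, or negative."
    },
    {
      "role": "user",
      "content": "Statement: {{sample.output_text}}"
    }
  ],
  "labels": ["positive", "neutral", "negative"],
  "passing_labels": ["positive", "neutral"]
}

An item passes this criterion when the grader model assigns one of the passing_labels.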
Show child attributes
name
The name of the evaluation.
metadata
A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters; values are strings with a maximum length of 512 characters.
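For example, a hypothetical metadata object within those limits:

{
  "project": "support-bot",
  "owner": "evals-team",
  "ticket": "EVAL-123"
}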
share_with_openai
Indicates whether the evaluation is shared with OpenAI.
Response (200 OK)

An Eval object with a data source config and testing criteria. An Eval represents a task to be done for your LLM integration, such as improving the quality of a chatbot, seeing how well a chatbot handles customer support, or checking whether one model outperforms another for your use case.
object
The object type, which is always eval.

id
Unique identifier for the evaluation.
name
The name of the evaluation, for example "Chatbot effectiveness Evaluation".
data_source_config
A CustomDataSourceConfig object which specifies the schema of your item and, optionally, the sample namespace. The returned schema defines the shape of the data that will be used to evaluate your testing criteria and the data that is required when creating a run.
testing_criteria
A list of testing criteria. Each entry can be, for example, a LabelModelGrader object, which uses a model to assign labels to each item in the evaluation.
created_at
The Unix timestamp (in seconds) for when the eval was created.
metadata
A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters; values are strings with a maximum length of 512 characters.
share_with_openai
Indicates whether the evaluation is shared with OpenAI.