POST /evals

Create eval

Example request:
curl --request POST \
  --url https://api.openai.com/v1/evals \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "name": "<string>",
  "metadata": {},
  "data_source_config": {
    "type": "custom",
    "item_schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"name\": {\"type\": \"string\"},\n    \"age\": {\"type\": \"integer\"}\n  },\n  \"required\": [\"name\", \"age\"]\n}\n",
    "include_sample_schema": false
  },
  "testing_criteria": [
    {
      "type": "label_model",
      "name": "<string>",
      "model": "<string>",
      "input": [
        {
          "role": "<string>",
          "content": "<string>"
        }
      ],
      "labels": [
        "<string>"
      ],
      "passing_labels": [
        "<string>"
      ]
    }
  ],
  "share_with_openai": false
}'
Example response:

{
  "object": "eval",
  "id": "<string>",
  "name": "Chatbot effectiveness Evaluation",
  "data_source_config": {
    "type": "custom",
    "schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"item\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"label\": {\"type\": \"string\"}\n      },\n      \"required\": [\"label\"]\n    }\n  },\n  \"required\": [\"item\"]\n}\n"
  },
  "testing_criteria": [
    {
      "type": "label_model",
      "name": "<string>",
      "model": "<string>",
      "input": [
        {
          "role": "<string>",
          "content": "<string>"
        }
      ],
      "labels": ["<string>"],
      "passing_labels": ["<string>"]
    }
  ],
  "created_at": 123,
  "metadata": {},
  "share_with_openai": true
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
data_source_config
object
required

The configuration for the data source used for the evaluation runs. A CustomDataSourceConfig object that defines the schema for the data source used for the evaluation runs. This schema defines the shape of the data that is:

  • used to define your testing criteria, and
  • required when creating a run.
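The custom data source config above can be sketched in Python as follows. This is a minimal illustration, not the official SDK; the helper name and the example item schema (`name`/`age`) are assumptions mirroring the request example, and `item_schema` is serialized to a JSON string as shown in the curl body.

```python
import json

def build_custom_data_source_config(item_schema: dict,
                                    include_sample_schema: bool = False) -> dict:
    """Build a 'custom' data_source_config body (illustrative sketch)."""
    return {
        "type": "custom",
        # JSON Schema describing each item in the eval data set,
        # serialized as a string per the request example above.
        "item_schema": json.dumps(item_schema),
        "include_sample_schema": include_sample_schema,
    }

config = build_custom_data_source_config({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
})
```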
testing_criteria
(LabelModelGrader · object | StringCheckGrader · object | TextSimilarityGrader · object)[]
required

A list of graders for all eval runs in this group.
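A `label_model` grader from that list can be assembled like this. A hedged sketch only: the helper name, the model id, and the label values are illustrative placeholders, and the field layout mirrors the request example above.

```python
def build_label_model_grader(name: str, model: str,
                             labels: list, passing_labels: list,
                             input_messages: list) -> dict:
    """Assemble a label_model testing criterion (illustrative sketch)."""
    return {
        "type": "label_model",
        "name": name,
        "model": model,          # model used to assign one of `labels`
        "input": input_messages,  # prompt messages shown to the grading model
        "labels": labels,
        "passing_labels": passing_labels,  # subset of labels that count as passing
    }

grader = build_label_model_grader(
    name="sentiment",                 # hypothetical grader name
    model="gpt-4o-mini",              # hypothetical model choice
    labels=["positive", "negative"],
    passing_labels=["positive"],
    input_messages=[{"role": "user", "content": "Classify the sentiment."}],
)
```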

name
string

The name of the evaluation.

metadata
object | null

Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
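Those limits can be checked client-side before sending a request. A small sketch, assuming you want to fail fast rather than let the API reject the payload:

```python
def validate_metadata(metadata: dict) -> None:
    """Check metadata against the documented limits:
    at most 16 pairs, string keys up to 64 chars, string values up to 512 chars."""
    if len(metadata) > 16:
        raise ValueError("metadata may contain at most 16 key-value pairs")
    for key, value in metadata.items():
        if not isinstance(key, str) or len(key) > 64:
            raise ValueError(f"metadata key must be a string of <= 64 chars: {key!r}")
        if not isinstance(value, str) or len(value) > 512:
            raise ValueError(f"metadata value for {key!r} must be a string of <= 512 chars")

validate_metadata({"team": "support", "experiment": "v2"})  # within limits
```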

share_with_openai
boolean
default:false

Indicates whether the evaluation is shared with OpenAI.
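Putting the body parameters together, the curl request above can be reproduced with the Python standard library. A hedged end-to-end sketch: the eval name, grader values, and model id are illustrative placeholders, and the request is built but not sent.

```python
import json
import urllib.request

def create_eval_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/evals request."""
    return urllib.request.Request(
        "https://api.openai.com/v1/evals",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

body = {
    "name": "support-quality",            # hypothetical eval name
    "metadata": {"team": "support"},      # optional, up to 16 pairs
    "data_source_config": {
        "type": "custom",
        "item_schema": json.dumps({
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
            "required": ["name", "age"],
        }),
        "include_sample_schema": False,
    },
    "testing_criteria": [
        {
            "type": "label_model",
            "name": "age-check",          # hypothetical grader name
            "model": "gpt-4o-mini",       # hypothetical model choice
            "input": [{"role": "user", "content": "Label the record."}],
            "labels": ["adult", "minor"],
            "passing_labels": ["adult"],
        }
    ],
    "share_with_openai": False,
}

req = create_eval_request("YOUR_API_KEY", body)
# urllib.request.urlopen(req) would send it; omitted to avoid a live call.
```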

Response

201 - application/json

OK

An Eval object with a data source config and testing criteria. An Eval represents a task to be done for your LLM integration, for example:

  • Improve the quality of my chatbot
  • See how well my chatbot handles customer support
  • Check if o3-mini is better at my use case than gpt-4o
object
enum<string>
default:eval
required

The object type.

Available options:
eval
id
string
required

Unique identifier for the evaluation.

name
string
required

The name of the evaluation.

Example:

"Chatbot effectiveness Evaluation"

data_source_config
object
required

Configuration of the data sources used in runs of the evaluation. A CustomDataSourceConfig which specifies the schema of your item and, optionally, sample namespaces. The response schema defines the shape of the data that is:

  • used to define your testing criteria, and
  • required when creating a run.
testing_criteria
(LabelModelGrader · object | StringCheckGrader · object | TextSimilarityGrader · object)[]
required

A list of testing criteria.

created_at
integer
required

The Unix timestamp (in seconds) for when the eval was created.

metadata
object | null
required

Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

share_with_openai
boolean
required

Indicates whether the evaluation is shared with OpenAI.