Create eval

curl --request POST \
  --url https://api.openai.com/v1/evals \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "data_source_config": {
    "type": "custom",
    "item_schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"name\": {\"type\": \"string\"},\n    \"age\": {\"type\": \"integer\"}\n  },\n  \"required\": [\"name\", \"age\"]\n}\n",
    "include_sample_schema": false
  },
  "testing_criteria": [
    {
      "type": "label_model",
      "name": "<string>",
      "model": "<string>",
      "input": [
        {
          "role": "<string>",
          "content": "<string>"
        }
      ],
      "labels": [
        "<string>"
      ],
      "passing_labels": [
        "<string>"
      ]
    }
  ],
  "name": "<string>",
  "metadata": {},
  "share_with_openai": false
}
'

{
  "object": "eval",
  "id": "<string>",
  "name": "Chatbot effectiveness Evaluation",
  "data_source_config": {
    "type": "custom",
    "schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"item\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"label\": {\"type\": \"string\"},\n      },\n      \"required\": [\"label\"]\n    }\n  },\n  \"required\": [\"item\"]\n}\n"
  },
  "testing_criteria": "eval",
  "created_at": 123,
  "metadata": {},
  "share_with_openai": true
}

POST

evals

Create eval

curl --request POST \
  --url https://api.openai.com/v1/evals \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "data_source_config": {
    "type": "custom",
    "item_schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"name\": {\"type\": \"string\"},\n    \"age\": {\"type\": \"integer\"}\n  },\n  \"required\": [\"name\", \"age\"]\n}\n",
    "include_sample_schema": false
  },
  "testing_criteria": [
    {
      "type": "label_model",
      "name": "<string>",
      "model": "<string>",
      "input": [
        {
          "role": "<string>",
          "content": "<string>"
        }
      ],
      "labels": [
        "<string>"
      ],
      "passing_labels": [
        "<string>"
      ]
    }
  ],
  "name": "<string>",
  "metadata": {},
  "share_with_openai": false
}
'

{
  "object": "eval",
  "id": "<string>",
  "name": "Chatbot effectiveness Evaluation",
  "data_source_config": {
    "type": "custom",
    "schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"item\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"label\": {\"type\": \"string\"},\n      },\n      \"required\": [\"label\"]\n    }\n  },\n  \"required\": [\"item\"]\n}\n"
  },
  "testing_criteria": "eval",
  "created_at": 123,
  "metadata": {},
  "share_with_openai": true
}

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

data_source_config

CustomDataSourceConfig · object

required

A CustomDataSourceConfig object that defines the schema for the data source used for the evaluation runs. This schema is used to define the shape of the data that will be:

Used to define your testing criteria and
What data is required when creating a run

CustomDataSourceConfig
StoredCompletionsDataSourceConfig

Show child attributes

testing_criteria

(LabelModelGrader · object | StringCheckGrader · object | TextSimilarityGrader · object)[]

required

A list of graders for all eval runs in this group.

A LabelModelGrader object which uses a model to assign labels to each item in the evaluation.

LabelModelGrader
StringCheckGrader
TextSimilarityGrader

Show child attributes

name

string

The name of the evaluation.

metadata

object

Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

Show child attributes

Indicates whether the evaluation is shared with OpenAI.

Response

201 - application/json

An Eval object with a data source config and testing criteria. An Eval represents a task to be done for your LLM integration. Like:

Improve the quality of my chatbot
See how well my chatbot handles customer support
Check if o3-mini is better at my usecase than gpt-4o

object

enum<string>

default:eval

required

The object type.

Available options:

eval

string

required

Unique identifier for the evaluation.

name

string

required

The name of the evaluation.

Example:

"Chatbot effectiveness Evaluation"

data_source_config

CustomDataSourceConfig · object

required

A CustomDataSourceConfig which specifies the schema of your item and optionally sample namespaces. The response schema defines the shape of the data that will be:

Used to define your testing criteria and
What data is required when creating a run

CustomDataSourceConfig
StoredCompletionsDataSourceConfig

Show child attributes

testing_criteria

(LabelModelGrader · object | StringCheckGrader · object | TextSimilarityGrader · object)[]

required

A list of testing criteria.

A LabelModelGrader object which uses a model to assign labels to each item in the evaluation.

LabelModelGrader
StringCheckGrader
TextSimilarityGrader

Show child attributes

created_at

integer

required

The Unix timestamp (in seconds) for when the eval was created.

metadata

object

required

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

Show child attributes

Indicates whether the evaluation is shared with OpenAI.

List evals

Get an eval

⌘I

API

Assistants

Audio

Batch

Chat

Completions

Embeddings

Evals

Files

Fine-tuning

Images

Models

Moderations

API Reference

Audit Logs

Certificates

Usage

Invites

Projects

Users

Realtime

Responses

Uploads

Vector stores

Create eval

Authorizations

Body

Response