POST /evals/{eval_id}
Update an eval
curl --request POST \
  --url https://api.openai.com/v1/evals/{eval_id} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "name": "<string>",
  "metadata": {}
}'
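
The same update can also be issued with the official OpenAI Python SDK. A minimal sketch, assuming the SDK's evals.update resource method (verify against your installed SDK version); the eval ID below is a placeholder:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Sketch: update the eval's name and attach metadata
updated = client.evals.update(
    "eval_abc123",  # placeholder eval ID
    name="Chatbot effectiveness Evaluation",
    metadata={"team": "support"},
)
print(updated.id, updated.name)

A successful call returns the updated Eval object: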
{
  "object": "eval",
  "id": "<string>",
  "name": "Chatbot effectiveness Evaluation",
  "data_source_config": {
    "type": "custom",
    "schema": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"item\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"label\": {\"type\": \"string\"},\n      },\n      \"required\": [\"label\"]\n    }\n  },\n  \"required\": [\"item\"]\n}\n"
  },
  "testing_criteria": "eval",
  "created_at": 123,
  "metadata": {},
  "share_with_openai": true
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

eval_id
string
required

The ID of the evaluation to update.

Body

application/json

Request to update an evaluation.

name
string

Rename the evaluation.

metadata
object | null

Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
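
These limits can be checked client-side before sending the request. A minimal sketch in Python, using the bounds stated above (at most 16 pairs, 64-character keys, 512-character values):

def validate_metadata(metadata: dict) -> None:
    # Enforce the documented limits before calling the API.
    if len(metadata) > 16:
        raise ValueError("metadata may contain at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key too long: {key!r}")
        if len(value) > 512:
            raise ValueError(f"metadata value too long for key: {key!r}")

validate_metadata({"team": "support", "owner": "jane"})  # passes silently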

Response

200 - application/json

The updated evaluation

An Eval object with a data source config and testing criteria. An Eval represents a task to be done for your LLM integration, for example:

  • Improve the quality of my chatbot
  • See how well my chatbot handles customer support
  • Check if o3-mini is better at my use case than gpt-4o
object
enum<string>
default:eval
required

The object type.

Available options:
eval
id
string
required

Unique identifier for the evaluation.

name
string
required

The name of the evaluation.

Example:

"Chatbot effectiveness Evaluation"

data_source_config
object
required

Configuration of data sources used in runs of the evaluation. A CustomDataSourceConfig specifies the schema of your item and, optionally, sample namespaces. The resulting schema defines the shape of the data that will be:

  • Used to define your testing criteria
  • Required when creating a run
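
For readability, the escaped schema string shown in the response example above decodes to the following JSON Schema:

{
  "type": "object",
  "properties": {
    "item": {
      "type": "object",
      "properties": {
        "label": {"type": "string"}
      },
      "required": ["label"]
    }
  },
  "required": ["item"]
}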
testing_criteria
(LabelModelGrader · object | StringCheckGrader · object | TextSimilarityGrader · object)[]
required

A list of testing criteria.
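
For illustration, a single string_check criterion might look like the following. This is a sketch: the field names follow the StringCheckGrader shape, and the {{item.label}} template reference assumes the custom data source schema above; check the grader schemas for your API version:

[
  {
    "type": "string_check",
    "name": "Exact label match",
    "input": "{{sample.output_text}}",
    "reference": "{{item.label}}",
    "operation": "eq"
  }
]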

created_at
integer
required

The Unix timestamp (in seconds) for when the eval was created.
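
To render this as a human-readable date, convert it from seconds. A minimal sketch in Python:

from datetime import datetime, timezone

created_at = 123  # value from the response example
print(datetime.fromtimestamp(created_at, tz=timezone.utc).isoformat())
# -> 1970-01-01T00:02:03+00:00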

metadata
object | null
required

Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

share_with_openai
boolean
required

Indicates whether the evaluation is shared with OpenAI.