# Streaming API responses

> Learn how to stream model responses from the OpenAI API using server-sent events.

By default, when you make a request to the OpenAI API, we generate the model's entire output before sending it back in a single HTTP response. When generating long outputs, waiting for a response can take time. Streaming responses lets you start printing or processing the beginning of the model's output while it continues generating the full response.

## Enable streaming

To start streaming responses, set `stream=True` (`stream: true` in JavaScript) in your request to the Responses endpoint:

<CodeGroup>
  ```javascript javascript theme={"system"}
  import { OpenAI } from "openai";
  const client = new OpenAI();

  const stream = await client.responses.create({
      model: "gpt-4.1",
      input: [
          {
              role: "user",
              content: "Say 'double bubble bath' ten times fast.",
          },
      ],
      stream: true,
  });

  for await (const event of stream) {
      console.log(event);
  }
  ```

  ```python python theme={"system"}
  from openai import OpenAI
  client = OpenAI()

  stream = client.responses.create(
      model="gpt-4.1",
      input=[
          {
              "role": "user",
              "content": "Say 'double bubble bath' ten times fast.",
          },
      ],
      stream=True,
  )

  for event in stream:
      print(event)
  ```
</CodeGroup>

The Responses API uses semantic events for streaming. Each event is typed with a predefined schema, so you can listen for events you care about.

For a full list of event types, see the [API reference for streaming](/docs/api-reference/responses-streaming). Here are a few examples:

```typescript theme={"system"}
type StreamingEvent =
	| ResponseCreatedEvent
	| ResponseInProgressEvent
	| ResponseFailedEvent
	| ResponseCompletedEvent
	| ResponseOutputItemAdded
	| ResponseOutputItemDone
	| ResponseContentPartAdded
	| ResponseContentPartDone
	| ResponseOutputTextDelta
	| ResponseOutputTextAnnotationAdded
	| ResponseTextDone
	| ResponseRefusalDelta
	| ResponseRefusalDone
	| ResponseFunctionCallArgumentsDelta
	| ResponseFunctionCallArgumentsDone
	| ResponseFileSearchCallInProgress
	| ResponseFileSearchCallSearching
	| ResponseFileSearchCallCompleted
	| ResponseCodeInterpreterInProgress
	| ResponseCodeInterpreterCallCodeDelta
	| ResponseCodeInterpreterCallCodeDone
	| ResponseCodeInterpreterCallInterpreting
	| ResponseCodeInterpreterCallCompleted
	| Error
```

## Read the responses

If you're using our SDK, every event is a typed instance. You can also identify individual events using the `type` property of the event.

Some key lifecycle events are emitted only once, while others are emitted multiple times as the response is generated. Common events to listen for when streaming text are:

- `response.created`
- `response.output_text.delta`
- `response.completed`
- `error`

For a full list of events you can listen for, see the [API reference for streaming](/docs/api-reference/responses-streaming).
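
As a rough illustration of dispatching on `type`, the sketch below concatenates the text carried by `response.output_text.delta` events and stops at `response.completed`. The helper name `collect_text` and the stand-in events are assumptions made for this example. With the SDK you would pass the `stream` object returned by `client.responses.create(..., stream=True)` instead of the fake list.

```python
from types import SimpleNamespace


def collect_text(events):
    """Join the text deltas from an iterable of streaming events.

    `collect_text` is a hypothetical helper, not part of the SDK. It only
    relies on each event exposing a `type` attribute, plus `delta` on text
    delta events and `message` on error events.
    """
    parts = []
    for event in events:
        if event.type == "response.output_text.delta":
            # Each delta event carries the next fragment of output text.
            parts.append(event.delta)
        elif event.type == "error":
            raise RuntimeError(f"stream error: {event.message}")
        elif event.type == "response.completed":
            # The response is finished, so stop reading.
            break
        # Other event types (for example response.created) are ignored here.
    return "".join(parts)


# Stand-in events so the sketch runs without a network call.
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="double "),
    SimpleNamespace(type="response.output_text.delta", delta="bubble bath"),
    SimpleNamespace(type="response.completed"),
]

print(collect_text(fake_events))
```

In a real application you would likely print or forward each delta as it arrives rather than waiting to join them at the end. The dispatch on `event.type` stays the same.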

## Advanced use cases

For more advanced use cases, like streaming tool calls, check out the following dedicated guides:

<CardGroup cols={2}>
  <Card title="Streaming function calls" icon="antenna" iconType="solid" href="/docs/guides/function-calling#streaming" horizontal />

  <Card title="Streaming structured output" icon="radio" iconType="solid" href="/docs/guides/structured-outputs#streaming" horizontal />
</CardGroup>

## Moderation risk

Streaming the model's output in a production application makes the content harder to moderate, because partial output is harder to evaluate than a complete response. This may have implications for approved usage.
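
One possible mitigation, sketched here under assumptions rather than as a recommended design, is to buffer streamed text until a sentence boundary and check each buffered chunk before showing it. The names `moderated_chunks` and `is_allowed` are hypothetical. `is_allowed` stands in for whatever moderation check your application uses.

```python
def moderated_chunks(deltas, is_allowed, boundaries=".!?\n"):
    """Yield sentence-sized chunks of streamed text that pass a check.

    `deltas` is any iterable of text fragments. `is_allowed` is a
    hypothetical callable that returns True when a chunk may be shown.
    Streaming stops at the first chunk that fails the check.
    """
    buffer = ""
    for delta in deltas:
        buffer += delta
        # Position of the last sentence boundary seen so far, or -1.
        cut = max(buffer.rfind(ch) for ch in boundaries)
        if cut == -1:
            continue  # No complete sentence yet, keep buffering.
        chunk, buffer = buffer[: cut + 1], buffer[cut + 1 :]
        if not is_allowed(chunk):
            return  # Withhold this chunk and everything after it.
        yield chunk
    # Flush any trailing text that never reached a boundary.
    if buffer and is_allowed(buffer):
        yield buffer


# Example with a trivial stand-in check.
for chunk in moderated_chunks(["Hello wor", "ld. How are", " you?"], lambda s: True):
    print(chunk)
```

Buffering trades some latency for the ability to evaluate complete sentences, so the right chunk size depends on how sensitive your application is to delay.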
