# Create transcription

> Transcribes audio into the input language.

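Several request parameters in the schema on this page constrain one another (for example, `timestamp_granularities[]` requires `verbose_json`, and `logprobs` requires `json` with a `gpt-4o` transcribe model). The following is a minimal client-side sketch of those checks, not part of any SDK: the helper name `validate_transcription_params` and its return shape are hypothetical, and the server remains the authority on what is accepted.

```python
from typing import List, Sequence

# Values copied from the schema on this page.
GPT4O_MODELS = {"gpt-4o-transcribe", "gpt-4o-mini-transcribe"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def validate_transcription_params(
    model: str,
    response_format: str = "json",
    temperature: float = 0.0,
    include: Sequence[str] = (),
    timestamp_granularities: Sequence[str] = (),
    stream: bool = False,
) -> List[str]:
    """Hypothetical helper: return a list of problems, empty if none found."""
    problems: List[str] = []
    if response_format not in RESPONSE_FORMATS:
        problems.append(f"unknown response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")
    if model in GPT4O_MODELS and response_format != "json":
        problems.append(f"{model} only supports response_format 'json'")
    if "logprobs" in include and (
        model not in GPT4O_MODELS or response_format != "json"
    ):
        problems.append(
            "logprobs requires gpt-4o-transcribe or gpt-4o-mini-transcribe "
            "with response_format 'json'"
        )
    if timestamp_granularities and response_format != "verbose_json":
        problems.append(
            "timestamp_granularities requires response_format 'verbose_json'"
        )
    for granularity in timestamp_granularities:
        if granularity not in GRANULARITIES:
            problems.append(f"unknown timestamp granularity: {granularity!r}")
    if stream and model == "whisper-1":
        problems.append("stream is ignored for whisper-1")
    return problems
```

The helper does not reject unknown model IDs, because the schema accepts any string for `model`.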

## OpenAPI

````yaml api-definition.yaml post /audio/transcriptions
openapi: 3.0.0
info:
  title: OpenAI API
  description: >-
    The OpenAI REST API. Please see
    https://platform.openai.com/docs/api-reference for more details.
  version: 2.3.0
  termsOfService: https://openai.com/policies/terms-of-use
  contact:
    name: OpenAI Support
    url: https://help.openai.com/
  license:
    name: MIT
    url: https://github.com/openai/openai-openapi/blob/master/LICENSE
servers:
  - url: https://api.openai.com/v1
security:
  - ApiKeyAuth: []
tags:
  - name: Assistants
    description: Build Assistants that can call models and use tools.
  - name: Audio
    description: Turn audio into text or text into audio.
  - name: Chat
    description: >-
      Given a list of messages comprising a conversation, the model will return
      a response.
  - name: Completions
    description: >-
      Given a prompt, the model will return one or more predicted completions,
      and can also return the probabilities of alternative tokens at each
      position.
  - name: Embeddings
    description: >-
      Get a vector representation of a given input that can be easily consumed
      by machine learning models and algorithms.
  - name: Evals
    description: Manage and run evals in the OpenAI platform.
  - name: Fine-tuning
    description: Manage fine-tuning jobs to tailor a model to your specific training data.
  - name: Batch
    description: Create large batches of API requests to run asynchronously.
  - name: Files
    description: >-
      Files are used to upload documents that can be used with features like
      Assistants and Fine-tuning.
  - name: Uploads
    description: Use Uploads to upload large files in multiple parts.
  - name: Images
    description: Given a prompt and/or an input image, the model will generate a new image.
  - name: Models
    description: List and describe the various models available in the API.
  - name: Moderations
    description: >-
      Given text and/or image inputs, classifies if those inputs are potentially
      harmful.
  - name: Audit Logs
    description: List user actions and configuration changes within this organization.
paths:
  /audio/transcriptions:
    post:
      tags:
        - Audio
      summary: Create transcription
      description: Transcribes audio into the input language.
      operationId: createTranscription
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/CreateTranscriptionRequest'
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                oneOf:
                  - $ref: '#/components/schemas/CreateTranscriptionResponseJson'
                  - $ref: >-
                      #/components/schemas/CreateTranscriptionResponseVerboseJson
            text/event-stream:
              schema:
                $ref: '#/components/schemas/CreateTranscriptionResponseStreamEvent'
components:
  schemas:
    CreateTranscriptionRequest:
      type: object
      additionalProperties: false
      properties:
        file:
          description: >
            The audio file object (not file name) to transcribe, in one of these
            formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
          type: string
          x-oaiTypeLabel: file
          format: binary
        model:
          description: >
            ID of the model to use. The options are `gpt-4o-transcribe`,
            `gpt-4o-mini-transcribe`, and `whisper-1` (which is powered by our
            open source Whisper V2 model).
          example: gpt-4o-transcribe
          anyOf:
            - type: string
            - type: string
              enum:
                - whisper-1
                - gpt-4o-transcribe
                - gpt-4o-mini-transcribe
              x-stainless-const: true
          x-oaiTypeLabel: string
        language:
          description: >
            The language of the input audio. Supplying the input language in
            [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)
            (e.g. `en`) format will improve accuracy and latency.
          type: string
        prompt:
          description: >
            An optional text to guide the model's style or continue a previous
            audio segment. The [prompt](/docs/guides/speech-to-text#prompting)
            should match the audio language.
          type: string
        response_format:
          $ref: '#/components/schemas/AudioResponseFormat'
        temperature:
          description: >
            The sampling temperature, between 0 and 1. Higher values like 0.8
            will make the output more random, while lower values like 0.2 will
            make it more focused and deterministic. If set to 0, the model will
            use [log probability](https://en.wikipedia.org/wiki/Log_probability)
            to automatically increase the temperature until certain thresholds
            are hit.
          type: number
          default: 0
        include[]:
          description: >
            Additional information to include in the transcription response.
            `logprobs` will return the log probabilities of the tokens in the
            response to understand the model's confidence in the transcription.
            `logprobs` only works with `response_format` set to `json` and only
            with the models `gpt-4o-transcribe` and `gpt-4o-mini-transcribe`.
          type: array
          items:
            $ref: '#/components/schemas/TranscriptionInclude'
        timestamp_granularities[]:
          description: >
            The timestamp granularities to populate for this transcription.
            `response_format` must be set to `verbose_json` to use timestamp
            granularities. Either or both of these options are supported:
            `word` and `segment`. Note: There is no additional latency for
            segment timestamps, but generating word timestamps incurs additional
            latency.
          type: array
          items:
            type: string
            enum:
              - word
              - segment
          default:
            - segment
        stream:
          description: >
            If set to true, the model response data will be streamed to the
            client as it is generated using [server-sent
            events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format).
            See the [Streaming section of the Speech-to-Text
            guide](/docs/guides/speech-to-text?lang=curl#streaming-transcriptions)
            for more information.

            Note: Streaming is not supported for the `whisper-1` model; the
            parameter is ignored for that model.
          type: boolean
          nullable: true
          default: false
      required:
        - file
        - model
    CreateTranscriptionResponseJson:
      type: object
      description: >-
        Represents a transcription response returned by the model, based on the
        provided input.
      properties:
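        text:
          type: string
          description: The transcribed text.
        logprobs:
          type: array
          description: >
            The log probabilities of the tokens in the transcription. Only
            included when `logprobs` is passed in `include[]`, which requires
            `response_format` set to `json` and the model `gpt-4o-transcribe` or
            `gpt-4o-mini-transcribe`.
          items: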
            type: object
            properties:
              token:
                type: string
                description: The token in the transcription.
              logprob:
                type: number
                description: The log probability of the token.
              bytes:
                type: array
                items:
                  type: number
                description: The bytes of the token.
      required:
        - text
      x-oaiMeta:
        name: The transcription object (JSON)
        group: audio
        example: |
          {
            "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
          }
    CreateTranscriptionResponseVerboseJson:
      type: object
      description: >-
        Represents a verbose JSON transcription response returned by the model,
        based on the provided input.
      properties:
        language:
          type: string
          description: The language of the input audio.
        duration:
          type: number
          description: The duration of the input audio.
        text:
          type: string
          description: The transcribed text.
        words:
          type: array
          description: Extracted words and their corresponding timestamps.
          items:
            $ref: '#/components/schemas/TranscriptionWord'
        segments:
          type: array
          description: Segments of the transcribed text and their corresponding details.
          items:
            $ref: '#/components/schemas/TranscriptionSegment'
      required:
        - language
        - duration
        - text
      x-oaiMeta:
        name: The transcription object (Verbose JSON)
        group: audio
        example: |
          {
            "task": "transcribe",
            "language": "english",
            "duration": 8.470000267028809,
            "text": "The beach was a popular spot on a hot summer day. People were swimming in the ocean, building sandcastles, and playing beach volleyball.",
            "segments": [
              {
                "id": 0,
                "seek": 0,
                "start": 0.0,
                "end": 3.319999933242798,
                "text": " The beach was a popular spot on a hot summer day.",
                "tokens": [
                  50364, 440, 7534, 390, 257, 3743, 4008, 322, 257, 2368, 4266, 786, 13, 50530
                ],
                "temperature": 0.0,
                "avg_logprob": -0.2860786020755768,
                "compression_ratio": 1.2363636493682861,
                "no_speech_prob": 0.00985979475080967
              },
              ...
            ]
          }
    CreateTranscriptionResponseStreamEvent:
      anyOf:
        - $ref: '#/components/schemas/TranscriptTextDeltaEvent'
        - $ref: '#/components/schemas/TranscriptTextDoneEvent'
      discriminator:
        propertyName: type
    AudioResponseFormat:
      description: >
        The format of the output, in one of these options: `json`, `text`,
        `srt`, `verbose_json`, or `vtt`. For `gpt-4o-transcribe` and
        `gpt-4o-mini-transcribe`, the only supported format is `json`.
      type: string
      enum:
        - json
        - text
        - srt
        - verbose_json
        - vtt
      default: json
    TranscriptionInclude:
      type: string
      enum:
        - logprobs
      default: []
    TranscriptionWord:
      type: object
      properties:
        word:
          type: string
          description: The text content of the word.
        start:
          type: number
          format: float
          description: Start time of the word in seconds.
        end:
          type: number
          format: float
          description: End time of the word in seconds.
      required:
        - word
        - start
        - end
    TranscriptionSegment:
      type: object
      properties:
        id:
          type: integer
          description: Unique identifier of the segment.
        seek:
          type: integer
          description: Seek offset of the segment.
        start:
          type: number
          format: float
          description: Start time of the segment in seconds.
        end:
          type: number
          format: float
          description: End time of the segment in seconds.
        text:
          type: string
          description: Text content of the segment.
        tokens:
          type: array
          items:
            type: integer
          description: Array of token IDs for the text content.
        temperature:
          type: number
          format: float
          description: Temperature parameter used for generating the segment.
        avg_logprob:
          type: number
          format: float
          description: >-
            Average logprob of the segment. If the value is lower than -1,
            consider the logprobs failed.
        compression_ratio:
          type: number
          format: float
          description: >-
            Compression ratio of the segment. If the value is greater than 2.4,
            consider the compression failed.
        no_speech_prob:
          type: number
          format: float
          description: >-
            Probability of no speech in the segment. If the value is higher than
            1.0 and the `avg_logprob` is below -1, consider this segment silent.
      required:
        - id
        - seek
        - start
        - end
        - text
        - tokens
        - temperature
        - avg_logprob
        - compression_ratio
        - no_speech_prob
    TranscriptTextDeltaEvent:
      type: object
      description: >-
        Emitted when there is an additional text delta. This is also the first
        event emitted when the transcription starts. Only emitted when you
        [create a transcription](/docs/api-reference/audio/create-transcription)
        with the `stream` parameter set to `true`.
      properties:
        type:
          type: string
          description: |
            The type of the event. Always `transcript.text.delta`.
          enum:
            - transcript.text.delta
          x-stainless-const: true
        delta:
          type: string
          description: |
            The text delta that was additionally transcribed.
        logprobs:
          type: array
          description: >
            The log probabilities of the delta. Only included if you [create a
            transcription](/docs/api-reference/audio/create-transcription) with
            the `include[]` parameter set to `logprobs`.
          items:
            type: object
            properties:
              token:
                type: string
                description: |
                  The token that was used to generate the log probability.
              logprob:
                type: number
                description: |
                  The log probability of the token.
              bytes:
                type: array
                items:
                  type: number
                description: |
                  The bytes that were used to generate the log probability.
      required:
        - type
        - delta
      x-oaiMeta:
        name: Stream Event (transcript.text.delta)
        group: transcript
        example: |
          {
            "type": "transcript.text.delta",
            "delta": " wonderful"
          }
    TranscriptTextDoneEvent:
      type: object
      description: >-
        Emitted when the transcription is complete. Contains the complete
        transcription text. Only emitted when you [create a
        transcription](/docs/api-reference/audio/create-transcription) with the
        `stream` parameter set to `true`.
      properties:
        type:
          type: string
          description: |
            The type of the event. Always `transcript.text.done`.
          enum:
            - transcript.text.done
          x-stainless-const: true
        text:
          type: string
          description: |
            The text that was transcribed.
        logprobs:
          type: array
          description: >
            The log probabilities of the individual tokens in the transcription.
            Only included if you [create a
            transcription](/docs/api-reference/audio/create-transcription) with
            the `include[]` parameter set to `logprobs`.
          items:
            type: object
            properties:
              token:
                type: string
                description: |
                  The token that was used to generate the log probability.
              logprob:
                type: number
                description: |
                  The log probability of the token.
              bytes:
                type: array
                items:
                  type: number
                description: |
                  The bytes that were used to generate the log probability.
      required:
        - type
        - text
      x-oaiMeta:
        name: Stream Event (transcript.text.done)
        group: transcript
        example: |
          {
            "type": "transcript.text.done",
            "text": "I see skies of blue and clouds of white, the bright blessed days, the dark sacred nights, and I think to myself, what a wonderful world."
          }
  securitySchemes:
    ApiKeyAuth:
      type: http
      scheme: bearer

````
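
When `stream` is `true`, the response is a sequence of `transcript.text.delta` events followed by a `transcript.text.done` event. The sketch below shows one way a client might assemble the transcript from such a stream. It assumes each event arrives as a server-sent-events `data:` line carrying a JSON object shaped like the examples in the schema. The function name `assemble_transcript` is hypothetical, and lines that are not JSON objects are skipped rather than interpreted.

```python
import json
from typing import Iterable, List, Optional


def assemble_transcript(sse_lines: Iterable[str]) -> str:
    """Hypothetical helper: build the transcript text from SSE lines.

    Prefers the full text carried by `transcript.text.done`; if the stream
    ended before that event, falls back to the concatenated deltas.
    """
    deltas: List[str] = []
    final_text: Optional[str] = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # assumption: non-JSON payloads carry no transcript data
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == "transcript.text.delta":
            deltas.append(event["delta"])
        elif kind == "transcript.text.done":
            final_text = event["text"]
    return final_text if final_text is not None else "".join(deltas)
```

For example, feeding the function the lines `data: {"type": "transcript.text.delta", "delta": " wonderful"}` and `data: {"type": "transcript.text.done", "text": "what a wonderful world."}` yields the text of the `done` event.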