Speech to Text

Integrating OpenAI's Speech-to-Text (STT) capabilities into your Unity project enables you to transcribe audio content into written text. This feature is powered by OpenAI's advanced speech recognition models, making it invaluable for applications that involve voice commands, audio content accessibility, or the processing of spoken user inputs.

For detailed information about the Speech-to-Text API, including the models available, parameter options, and best practices for audio files, refer to the Speech API Reference.

Speech-to-Text Operations Overview:

  • Audio Transcription: Convert spoken words from audio files into accurate written text. This process facilitates the understanding and utilization of spoken language within your applications.

  • Audio Translation: Translate spoken language into written English text.

Sample Code for Speech-to-Text Requests:

1. Audio Transcription Request:

Transcribe audio content to text. Provide the audio as either a FormFile (for raw API requests) or an AudioClip object (within Unity).

TranscriptionRequest request = new TranscriptionRequest.Builder()
    .SetAudioFile(audioClip)            // Pass a FormFile (raw API) or an AudioClip (within Unity)
    .SetLanguage(SystemLanguage.Korean) // Language spoken in the audio
    .SetModel(WhisperModel.Whisper1)    // Whisper-1 is currently the only available model
    .Build();

string transcription = await request.ExecuteAsync();

Debug.Log($"Transcribed Text: {transcription}");

2. Audio Translation Request:

Translate audio content into English text. Provide the audio as either a FormFile (for raw API requests) or an AudioClip object (within Unity).

TranslationRequest request = new TranslationRequest.Builder()
    .SetAudioFile(audioClip)         // Pass a FormFile (raw API) or an AudioClip (within Unity)
    .SetModel(WhisperModel.Whisper1) // Whisper-1 is currently the only available model
    .Build();

string translation = await request.ExecuteAsync();

Debug.Log($"Translated Text: {transcription}");
