OpenAI unveils new audio models for real-time voice AI services
OpenAI announced the launch of three new audio models for its API, enabling voice AI services with real-time features such as translation, transcription, and support for complex dialogues.
The first of the new models, GPT-Realtime-2, is designed to sustain longer and more complex conversations. It can use multiple tools simultaneously, respond to changes in context, and handle specialized terminology. Developers can also adjust the model’s reasoning level from minimal to high. On the Big Bench Audio and Audio MultiChallenge benchmarks, GPT-Realtime-2 showed improved results compared to the previous version.
The second model, GPT-Realtime-Translate, is designed for instant voice translation. It supports over 70 input languages and 13 output languages and is already being tested in international calls and in customer support, including at Deutsche Telekom and startup BolnaAI.
The third model, GPT-Realtime-Whisper, is designed for real-time speech transcription, making it well suited to subtitling, note-taking during calls, and automating the work of voice agents.
All three models are now available through the Realtime API. Pricing varies by model: GPT-Realtime-2 costs $32 per million input audio tokens and $64 per million output audio tokens, GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute.
The launch marks a notable step in the development of voice interaction in AI, one that could ease international communication and help automate many business processes. Experts predict that OpenAI’s new models could be a significant advance in natural language processing.
| Model | Function | Cost (USD) |
|---|---|---|
| GPT-Realtime-2 | Enhanced dialogue | $32 per million input audio tokens, $64 per million output audio tokens |
| GPT-Realtime-Translate | Real-time translation | $0.034 per minute |
| GPT-Realtime-Whisper | Real-time transcription | $0.017 per minute |
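
To put the quoted rates in perspective, here is a rough cost estimate in Python. It is a minimal sketch that uses only the prices reported above. The function and constant names are illustrative and not part of any OpenAI library, and the token counts in the example are hypothetical, since the article does not say how many audio tokens a minute of speech consumes.

```python
from decimal import Decimal

# Per-minute rates quoted in the article (USD).
PER_MINUTE_RATES = {
    "GPT-Realtime-Translate": Decimal("0.034"),
    "GPT-Realtime-Whisper": Decimal("0.017"),
}

# GPT-Realtime-2 is priced per million audio tokens (USD).
REALTIME_2_INPUT_PER_MILLION = Decimal("32")
REALTIME_2_OUTPUT_PER_MILLION = Decimal("64")
MILLION = Decimal(1_000_000)


def per_minute_cost(model: str, minutes: int) -> Decimal:
    """Cost of a per-minute model for the given number of whole minutes."""
    return PER_MINUTE_RATES[model] * Decimal(minutes)


def realtime_2_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of GPT-Realtime-2 for the given audio token counts."""
    total = (
        Decimal(input_tokens) * REALTIME_2_INPUT_PER_MILLION
        + Decimal(output_tokens) * REALTIME_2_OUTPUT_PER_MILLION
    )
    return total / MILLION


if __name__ == "__main__":
    # One hour of translation and of transcription.
    print(per_minute_cost("GPT-Realtime-Translate", 60))
    print(per_minute_cost("GPT-Realtime-Whisper", 60))
    # Hypothetical session: 500,000 input and 250,000 output audio tokens.
    print(realtime_2_cost(500_000, 250_000))
```

At these rates, an hour of translation comes to about $2.04 and an hour of transcription to about $1.02, while the hypothetical GPT-Realtime-2 session of 500,000 input and 250,000 output audio tokens comes to $32.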




