OpenAI Unveils ChatGPT 4o Omni: A Multimodal AI Breakthrough

Discover the latest from OpenAI as they unveil ChatGPT 4o Omni, a groundbreaking new release that excels in voice, images, and text interactions. Explore the world of multimodal AI with this cutting-edge technology.

OpenAI has introduced a new version of ChatGPT powered by GPT-4o, a model that accepts audio, image, and text inputs and produces outputs in audio, image, and text. The "o" in GPT-4o stands for "omni," a term that signifies "all" or "complete."
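For a concrete picture of what mixed-modality input looks like in practice, here is a minimal sketch of a request that sends text plus an image to GPT-4o, assuming the official openai Python SDK and an OPENAI_API_KEY in the environment; the prompt and image URL are placeholders, not details from the announcement.

```python
# Minimal sketch: a mixed text-and-image request to GPT-4o through the
# Chat Completions API. Assumes the official `openai` Python SDK;
# the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```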

OpenAI introduced the latest model as a step toward more natural interaction between humans and machines. GPT-4o can respond to spoken input about as quickly as people do in conversation. It matches GPT-4 Turbo's performance on English text and surpasses it in other languages, and in the API it runs significantly faster and costs 50% less.
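Because GPT-4o sits behind the same Chat Completions endpoint, the speed and cost gains amount to a model swap. The sketch below, again assuming the openai Python SDK, sends an identical request to both model names; only the model string differs.

```python
# Sketch: GPT-4o as a drop-in replacement for GPT-4 Turbo in the
# Chat Completions API -- the request body is identical, only the
# model name changes.
from openai import OpenAI

client = OpenAI()

for model in ("gpt-4-turbo", "gpt-4o"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(f"{model}: {response.choices[0].message.content}")
```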

Advanced Voice Processing

GPT-4o performs at the same level as GPT-4 Turbo on text, reasoning, and coding intelligence, while setting new benchmarks for multilingual capability, audio processing, and visual understanding.

The previous way of conversing with ChatGPT by voice chained three separate models: one to transcribe voice input to text, one to process that text, and a third to convert the text response back to audio. The approach was criticized for losing nuance at each conversion.
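To make that chain concrete, here is a rough sketch of the three-model pattern using OpenAI's speech-to-text (whisper-1), text (gpt-4), and text-to-speech (tts-1) endpoints; the file names are placeholders, and this illustrates the pattern rather than ChatGPT's actual internal plumbing.

```python
# Sketch of the legacy three-model voice pipeline: transcribe, think, speak.
# Each hop works on plain text, so tone, speaker identity, and background
# sound are discarded along the way.
from openai import OpenAI

client = OpenAI()

# 1) Speech-to-text: transcribe the spoken question.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2) Text model: answer the transcribed question.
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3) Text-to-speech: voice the text answer.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=answer.choices[0].message.content,
)
speech.write_to_file("answer.mp3")
```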

OpenAI pointed out the drawbacks of the old approach that the new method is meant to address:

This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.

The new version needs no separate models because all inputs and outputs are handled end to end by a single model, including audio in and audio out. Interestingly, OpenAI states that it has not yet explored the full capabilities of the new model or fully understood its limitations.
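In API terms, that end-to-end design would mean one request carrying audio in and audio out. Audio was not available in the API at the time of this announcement, so the sketch below is hypothetical: the model name gpt-4o-audio-preview and its parameters are assumptions modeled on the audio-capable Chat Completions interface OpenAI later shipped, not something described here.

```python
# Hypothetical sketch of end-to-end audio: one request, no separate
# transcription or TTS models. Model name and parameters are assumptions.
import base64

from openai import OpenAI

client = OpenAI()

with open("question.wav", "rb") as f:
    question_b64 = base64.b64encode(f.read()).decode("ascii")

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",              # assumed audio-capable model
    modalities=["text", "audio"],              # request a spoken reply
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {"data": question_b64, "format": "wav"},
                }
            ],
        }
    ],
)

# The reply comes back as audio (base64) plus a text transcript.
with open("answer.wav", "wb") as f:
    f.write(base64.b64decode(response.choices[0].message.audio.data))
print(response.choices[0].message.audio.transcript)
```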

Enhanced Safety Measures and Gradual Rollout

GPT-4o includes enhanced safety features and filters designed to prevent unintended voice outputs. The announcement notes that the new capabilities will initially be limited to text and image inputs, text outputs, and limited audio. GPT-4o is accessible on both free and paid tiers, with Plus users getting message limits five times higher than free users.

Audio capabilities are due for a limited, alpha-phase release to ChatGPT Plus and API users within weeks.

The announcement explained:

We understand that there are new risks associated with GPT-4o's audio features. Today, we are making text and image inputs and text outputs available to the public. In the coming weeks and months, we will focus on enhancing the technical setup, usability after training, and safety measures needed to introduce other features. For instance, when we first launch, audio outputs will only include a few preset voices and will follow our current safety guidelines.

Read OpenAI's announcement: Hello GPT-4o

