ChatGPT is increasingly part of the real world

GPT-4 Omni, or GPT-4o for short, is OpenAI’s latest flagship AI model, combining human-like conversational abilities with multimodal perception across text, audio, and visual inputs.

“Omni” refers to the model’s ability to understand and generate content across different modalities, such as text, speech, and vision. Unlike previous language models limited to text inputs and outputs, GPT-4o can analyze images, audio recordings, and documents in addition to parsing written prompts. In turn, it can generate audio responses, create visuals, and compose text seamlessly. This allows GPT-4o to power more intelligent and versatile applications that perceive and interact with the world through multiple sensory channels, approaching human-like multimedia communication and comprehension.
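For developers, the input side of this multimodality is already visible in OpenAI’s chat API, where a single message can mix text and image parts. The snippet below is a minimal sketch, assuming the official `openai` Python SDK (v1+), an `OPENAI_API_KEY` set in the environment, and a placeholder image URL:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One user message can combine a text part and an image part.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                # Placeholder URL; any publicly reachable image works.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```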

In addition to making ChatGPT faster and more widely accessible, GPT-4o enhances its functionality by enabling more natural dialogues through the desktop and mobile apps.

GPT-4o marks real progress in how machines handle human communication, enabling conversations that sound nearly real, complete with the imperfections of live speech: interpreting tone, handling interruptions, and even recognizing when it has made a mistake. These advanced conversational abilities were demonstrated during OpenAI’s live product demo.

From a technical standpoint, OpenAI asserts that GPT-4o delivers significant performance upgrades over its predecessor, GPT-4. According to the company, GPT-4o is twice as fast as GPT-4 in inference, allowing for more responsive, low-latency interactions, and costs half as much when deployed via OpenAI’s API or Microsoft’s Azure OpenAI Service, which makes the model more accessible to developers and businesses. GPT-4o also ships with higher rate limits, letting developers scale up usage without hitting throughput ceilings. Together, these improvements position GPT-4o as a more capable and resource-efficient option for AI applications across domains.
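To make the rate-limit point concrete: the hypothetical helper below, a sketch assuming the official `openai` Python SDK, retries a GPT-4o call with exponential backoff whenever the API signals the limit has been hit. The function name and retry parameters are illustrative, not part of OpenAI’s API.

```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_with_backoff(prompt: str, retries: int = 5) -> str:
    """Call GPT-4o, backing off exponentially when rate-limited."""
    delay = 1.0
    for _ in range(retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            time.sleep(delay)  # wait, then retry with a longer delay
            delay *= 2
    raise RuntimeError("still rate-limited after all retries")

print(ask_with_backoff("In one sentence, how does GPT-4o differ from GPT-4?"))
```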

In the first live demo, the presenter asked for feedback on his breathing technique and took an exaggerated deep breath into his phone, to which ChatGPT replied, “You’re not a vacuum cleaner.” The exchange showed that the model can recognize and react to human subtleties.

Speaking casually to your phone and getting the answer you wanted, rather than being told to Google it, makes GPT-4o feel even more natural than typing a search query.

Other impressive features in the demo included ChatGPT acting as a simultaneous translator between two speakers; recognizing objects in the world around it through the camera and reacting accordingly (in one example, it read a handwritten equation on a sheet of paper and suggested how to solve it); detecting the speaker’s tone of voice while also reproducing different nuances of speech and emotion, including sarcasm; and even singing. A text-only approximation of the translator scenario is sketched below.
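Even without the audio channel, the translator behavior can be approximated over text. The sketch assumes the official `openai` Python SDK; the system prompt is a hypothetical wording, not the one used in OpenAI’s demo:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt approximating the live-demo interpreter setup.
system_prompt = (
    "You are a simultaneous interpreter between English and Italian. "
    "When you receive English, reply only with the Italian translation, "
    "and vice versa."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Nice to meet you. How was your trip?"},
    ],
)
print(response.choices[0].message.content)  # expected: the Italian rendering
```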

Beyond these features, image generation has also been improved, including the rendering of legible text within images and the creation of 3D visuals.

You’re probably not alone if all this reminds you of the movie Her, or of some other dystopian film about artificial intelligence; this kind of natural spoken exchange with ChatGPT is close to what that film depicts. And since GPT-4o will be available for free on both desktop and mobile devices, many people may soon experience something similar first-hand.

From this first look, it’s clear that GPT-4o is positioning itself against the best that Apple and Google have to offer in their much-awaited AI announcements.

With this development, OpenAI delivers what Google had only simulated in its staged Gemini preview not long ago. Once again, the company proves itself a leader in the field, inspiring both wonder and concern. These new features promise an intelligent ally capable of teaching us and helping us learn new things better. But how much intelligence will we delegate each time? Will we become more educated, or will we simply hand over more tasks? Simultaneous translation also sharpens the ever more obvious doubt about how easily a profession, in this case the interpreter’s, can be replaced. And how easy will it be for an increasingly capable AI, if used improperly, to pass as a human being in order to gain people’s trust and manipulate them?