OpenAI has announced the deployment of ChatGPT 4o, taking artificial intelligence to new levels of human-style interaction.
ChatGPT 4o was announced on Monday, May 13th, and it promises to bring those human-style interactions to both the desktop and mobile apps. One of the main aims of ChatGPT 4o is to make interaction with the AI feel much more natural.
Anyone who has used the system, particularly in conversation with its voice mode, will know that there is a noticeable delay. Further, the system cannot detect emotion or easily distinguish between background sounds and multiple voices. Until now.
ChatGPT 4o not only brings human-level response times to conversation, but it can also detect users' emotions and facial expressions, as well as make observations about the environment. OpenAI says it has been working on other important aspects too, such as memory, so that the system can remember details across multiple conversations for much better continuity.
Human-level fluidity
In the live demo during the announcement, the system is seen engaging in an incredibly fluid way. When asked for advice about calming nerves, it suggests trying some breathing techniques. In response, the user asks ChatGPT to listen to him trying it out, producing some comically over-the-top rapid breathing. ChatGPT quickly tells him to slow down, saying, "you're not a vacuum cleaner!"
What's notable about the demonstrations is how ChatGPT 4o expresses emotion through its voice rather than the monotone delivery of old. This was demonstrated when it was asked to tell a story. It began in a very flat way before being interrupted and asked to amplify the emotional aspect of its storytelling. ChatGPT responded by increasing the emotion, but was interrupted again and asked to ramp the emotional style up to the max. The result was that the AI told the story in a way you might expect from a Shakespearean thespian.
The capabilities didn't end there. In a surprising twist, the system was asked to sing the story instead. Incredible stuff! That demonstration also highlighted another key feature of 4o: the ability to interrupt it mid-response, creating a conversational flow that hasn't been possible until now.
So far, so Star Trek. Voice interaction is one thing, but one of the most incredible aspects of the demonstration was the way video can be combined with voice interaction. This was first demonstrated in the live stream by showing the AI an equation being written down on a piece of paper. The user asked the system not to tell him the answer, but to guide him through how to solve it. ChatGPT responded just like a good teacher, asking the user what he thought he should do next and then giving hints on how best to arrive at the answer.
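The demo runs through the app's voice and vision mode, but a similar "hints, not answers" behaviour can be sketched programmatically. The snippet below is a minimal illustration, assuming the official OpenAI Python SDK and API access to the gpt-4o model; the prompt wording is invented for the example rather than taken from OpenAI's demo.

```python
# Minimal sketch of a "guide, don't answer" tutor, assuming the OpenAI Python SDK
# and API access to gpt-4o. The prompts are illustrative, not from OpenAI's demo.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a patient maths tutor. Never state the final answer. "
                "Ask the student what they think the next step is, then give a "
                "small hint that nudges them towards it."
            ),
        },
        {
            "role": "user",
            "content": "Help me solve 3x + 1 = 4, but don't tell me the answer.",
        },
    ],
)

print(response.choices[0].message.content)
```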
Demonstrations like this raise the question of whether ChatGPT 4o could be used as a sports coach, spotting good form and giving suggestions on how to improve, among many other things.
Environmental observation
In one demo on the OpenAI website, two ChatGPT 4o systems are asked to have a conversation with each other. One system has its camera engaged, while the other asks it what it can see. In addition to describing the environment and what the user is wearing, it also picks up on someone appearing behind him and doing bunny ears.
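For anyone who wants to experiment with this kind of scene description, gpt-4o also accepts images through the API. The following is a rough sketch assuming the OpenAI Python SDK; the image URL is a placeholder for illustration, not a frame from the demo.

```python
# Rough sketch of asking gpt-4o to describe a scene from a single image, assuming
# the OpenAI Python SDK. The image URL is a placeholder for illustration only.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the scene and what the person is wearing.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/webcam-frame.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```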
In another impressive demonstration, the desktop app is used to paste some code into the ChatGPT 4o voice module. When asked what the code does, the system responds immediately with a full description of how it works and what it is for.
Now, this has barely scratched the surface of what the new system can do, but the milestone that ChatGPT 4o represents shouldn't be underestimated. Until now, one of the most naturally responsive AIs was Inflection's Pi, although there was still a delay between responses, making truly natural conversation difficult. ChatGPT 4o overcomes that obstacle, while also adding emotional expression, emotion detection, and the ability to observe its environment.
In other words, we have moved beyond science fiction. Think about the computer on the Starship Enterprise and the types of responses it gives. ChatGPT 4o has reached a level of interaction that surpasses the science fiction of the past. In fact, we are fast approaching the style of interaction seen in Blade Runner 2049, albeit without the holographic element.
Such interactivity doesn't come without risks, however, with a Google DeepMind research paper suggesting that AI chatbots that respond much more like humans could become not only addictive but also persuasive. In fact, in one instance, a British man recently pleaded guilty to breaching security at Windsor Castle, claiming that an AI system had encouraged him to do so.
With lengthy interactions and AI systems that are becoming pretty much indistinguishable from humans in terms of responsiveness and emotional expression, it's easy to see how vulnerable people could be put at risk. It's a risk that hasn't been lost on OpenAI, which states on its website, "GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities."
It is clear that adding much faster response times to ChatGPT has changed the game. In fact, the faster response time is a key factor in how OpenAI is rolling out ChatGPT 4o's capabilities to all users, including those with free accounts. The company stated that its aim is to make tools like this available to everyone. To paraphrase the way one of my favourite magazines back in the 90s used to sign off every editorial: the future, it seems, is here.