OpenAI's GPT-4o: What's in the new ChatGPT generative AI model and how does it work?


Alvin R Cabral

OpenAI has upped the ante in the highly competitive generative artificial intelligence world by introducing a new model it hopes will attract more users into its platform and fend off all challengers.

GPT-4o is an updated version of the underlying large language model technology that powers ChatGPT. Last week, OpenAI was rumoured to be launching a search engine to challenge Google, but Reuters reported that the company had delayed it.

OpenAI chief executive Sam Altman denied any launches – only to post on X that the company has "been hard at work on some new stuff we think people will love".

The "o" in the name stands for "omni" and the California-based company is touting GPT-4o as something for all, which makes sense as "omni" means "all" or "everything" – does OpenAI want to be omnipresent in our lives?

What is GPT-4o?

Short answer: GPT-4o, according to OpenAI, is its "new flagship model that can reason across audio, vision and text in real time".

Shorter answer: it's OpenAI's fastest AI model.

The "omni" name refers to "a step towards much more natural human-computer interaction", OpenAI said in a blog post on Monday.

It is also natively multimodal, meaning it can accept any combination of text, audio and image as input, and also generate any combination of text, audio and image outputs.

How fast is GPT-4o?

OpenAI claims GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation, according to several studies.

GPT-4o also requires fewer tokens in a number of languages. Tokens are the basic units of text that an AI model processes and by which the length of a text is measured, and they can include punctuation marks and spaces. Token counts vary from one language to another.

Among the languages highlighted by OpenAI that use fewer tokens with GPT-4o are Arabic (from 53 to 26), Gujarati (145 to 33), Hindi (90 to 31), Korean (45 to 27) and Chinese (34 to 24).
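To put those reductions on a common scale, the figures can be turned into simple ratios. The following is a minimal sketch that uses only the numbers quoted above; the function name is illustrative and is not part of any OpenAI tool.

```python
# Token counts quoted above: (before GPT-4o, with GPT-4o).
token_counts = {
    "Arabic": (53, 26),
    "Gujarati": (145, 33),
    "Hindi": (90, 31),
    "Korean": (45, 27),
    "Chinese": (34, 24),
}


def reduction_factor(before: int, after: int) -> float:
    """Return how many times fewer tokens are needed after the change."""
    return before / after


for language, (before, after) in token_counts.items():
    factor = reduction_factor(before, after)
    print(f"{language}: {before} -> {after} tokens ({factor:.1f}x fewer)")
```

On these figures, Gujarati sees the largest gain at about 4.4 times fewer tokens, while Chinese sees the smallest at about 1.4 times.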

For perspective, we can make some comparisons to a 1968 study from Robert Miller – Response time in man-computer conversational transactions – which detailed the three magnitudes of computer mainframe responsiveness.

The research revealed a response time of 100 milliseconds is perceived as instantaneous, while one second or less is fast enough for users to feel they are interacting freely with the information. A response time of more than 10 seconds would lose user attention completely.
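As a rough illustration, the thresholds described above can be written as a small classifier and applied to the audio response times OpenAI claims for GPT-4o. The function name and labels are illustrative; the label for the band between one and 10 seconds is an assumption, since that range is not characterised above.

```python
def perceived_responsiveness(latency_ms: float) -> str:
    """Classify a response time using the thresholds described above."""
    if latency_ms <= 100:
        return "instantaneous"
    if latency_ms <= 1_000:
        return "free-flowing"
    if latency_ms <= 10_000:
        # Assumption: this middle band is not described in the article.
        return "noticeable delay"
    return "attention lost"


# GPT-4o audio response times claimed by OpenAI, in milliseconds.
for label, latency_ms in [("fastest", 232), ("average", 320)]:
    print(f"{label}: {latency_ms} ms -> {perceived_responsiveness(latency_ms)}")
```

Both claimed figures fall in the band of one second or less, so they would feel free-flowing but not instantaneous by these thresholds.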

How does GPT-4o work?

The simplest answer is that OpenAI, well, simplified the process of converting input into output.

With OpenAI's previous models, Voice Mode let users talk to ChatGPT with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). Voice Mode was a pipeline of three separate models: one simple model transcribed audio to text, GPT-3.5 or GPT-4 took in and output text, and a third simple model converted that text back to audio.

"This process means that the main source of intelligence, GPT-4, loses a lot of information – it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion," OpenAI said.

But with GPT-4o, OpenAI was able to merge all these functions into a single model, with end-to-end capabilities across text, vision and audio, significantly reducing the amount of time consumed and information processed.
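The difference between the two designs can be sketched with a toy example. This is not OpenAI's implementation: every class and function name below is hypothetical, and the "models" are stand-ins that only show where information such as tone is dropped in a three-stage pipeline and where a single model could keep it.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Toy stand-in for audio: the words spoken plus the speaker's tone."""
    words: str
    tone: str


def transcribe(audio: Utterance) -> str:
    # Stage 1: speech-to-text keeps the words and discards the tone.
    return audio.words


def text_model(prompt: str) -> str:
    # Stage 2: stand-in for a text-only model; it never sees the tone.
    return f"reply to: {prompt}"


def synthesise(text: str) -> Utterance:
    # Stage 3: text-to-speech has no tone to work with, so it is neutral.
    return Utterance(words=text, tone="neutral")


def cascaded_voice_mode(audio: Utterance) -> Utterance:
    """Three separate models chained together, as in the older Voice Mode."""
    return synthesise(text_model(transcribe(audio)))


def end_to_end_model(audio: Utterance) -> Utterance:
    """One model that receives the whole input, so tone is still available."""
    return Utterance(words=f"reply to: {audio.words}", tone=audio.tone)


spoken = Utterance(words="great, another meeting", tone="sarcastic")
print(cascaded_voice_mode(spoken).tone)
print(end_to_end_model(spoken).tone)
```

In the chained version the tone is lost at the first stage and cannot be recovered later, which is the information loss OpenAI describes. The single-model version here simply echoes the tone back to show that it remains available; how a real model uses it is outside the scope of this sketch.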

"All inputs and outputs are processed by the same neural network," OpenAI said. A neural network is an AI technique that teaches computers to process data similarly to the human brain.

However, OpenAI said it was "still just scratching the surface" of GPT-4o's capabilities and limitations, given that it is the company's first model to merge all of these modalities.

What can GPT-4o not do?

Speaking of limitations, OpenAI acknowledged "several" of them across the GPT-4o model, including inconsistencies in responses featured in a blooper reel. It even demonstrated how GPT-4o can be adept at sarcasm.

In addition, OpenAI said it continues to refine the model's behaviour through post-training – which is critical in addressing safety concerns, a key sticking point in modern-day AI.

The company said it has created new safety systems to serve as guardrails for voice outputs, in addition to testing the model with more than 70 experts in the fields of social psychology, bias, fairness and misinformation to identify any risks that may seep through.

"We will continue to mitigate new risks as they’re discovered. We recognise that GPT-4o’s audio modalities present a variety of novel risks," OpenAI said.

How much does GPT-4o cost?

Good news – it's free for all users, with paid users enjoying "up to five times the capacity limits" of their free peers, OpenAI chief technology officer Mira Murati said in the unveiling presentation.

However, developers who access GPT-4o through OpenAI's API will pay $5 and $15 for one million tokens of input and output, respectively.

Allowing free use of GPT-4o should serve OpenAI well and complement the company's paid offerings.

In August, OpenAI launched its ChatGPT Enterprise monthly plan, the price of which varies depending on user requirements. It's the third tier after its basic free service and the $20-a-month Plus plan.

In January, the company launched its online GPT Store, which gives users access to more than three million custom versions of ChatGPT, known as GPTs, developed by OpenAI's partners and its community.

OpenAI hopes to attract more users as competition heats up in the generative AI world, where plenty of rivals are coming for it.

How does OpenAI stack up against its biggest rivals at this point?

OpenAI's move to introduce a new, free and faster large language model is an indication of how it has its hands full against its competition in generative AI.

Google, arguably its biggest rival in the space, has Gemini, which was the first AI model to beat human experts on massive multitask language understanding, one of the widely used methods to test the knowledge and problem-solving abilities of AI.

Gemini can be accessed on the Google One AI Premium plan for $19.99 a month, which includes 2TB of storage, 10 per cent back from purchases made on the Google Store and more features across Gmail, Google Docs, Google Slides and Google Meet.

In February, Google launched Gemma, which is aimed at assisting developers and researchers in “building AI responsibly” and is meant for more modest tasks such as basic chatbots or summarisation jobs.

Anthropic, meanwhile, launched Claude 3 in March, a direct challenge to generative AI leader OpenAI.

The company, which is backed by Google and Amazon, offers three tiers, Haiku, Sonnet and Opus, each with increasing capabilities to suit user needs.

Haiku is priced at $0.25 per million tokens (MTok) for input and $1.25 for output, while Sonnet costs $3 and $15. Opus is the most expensive at $15 and $75.

For comparison, OpenAI’s GPT-4 Turbo comes in at $10 per MTok for input and $30 for output, and also has a smaller context window, at 128,000 tokens.
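Because input and output tokens are billed at different rates, per-million prices are easier to compare on a concrete job. The sketch below uses only the prices quoted in this article; the workload of 200,000 input tokens and 50,000 output tokens is a made-up example, and the function name is illustrative.

```python
# Prices in US dollars per million tokens (input, output), as quoted in this article.
PRICES_PER_MTOK = {
    "GPT-4o": (5.00, 15.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3 Haiku": (0.25, 1.25),
    "Claude 3 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of a job with the given token counts."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 50_000

for model in PRICES_PER_MTOK:
    cost = request_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model}: ${cost:.4f}")
```

For this workload, GPT-4o comes to $1.75, half the $3.50 that GPT-4 Turbo would cost, while the Claude 3 tiers range from about $0.11 for Haiku to $6.75 for Opus.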

Microsoft, OpenAI's biggest backer, charges $20 a month for its Copilot Pro service, which guarantees faster performance and "everything" the service offers. If you're not willing to pay, there's a free Copilot tier, which has limited functionality.

And then, there's xAI's Grok, from OpenAI's friend-turned-enemy, Elon Musk.

Grok's current version, Grok-1.5, is only available to subscribers of X's Premium+ tier, which starts at $16 per month, or $168 a year.

Regional entities are also taking aim at the leaders: on Monday Abu Dhabi's Technology Innovation Institute introduced the second iteration of its large language model, Falcon 2, to compete with models developed by Meta, Google and OpenAI.

Also on Monday, Core42, a unit of Abu Dhabi's artificial intelligence and cloud company, G42, launched a bilingual Arabic and English chatbot developed in the UAE, Jais Chat. It can be downloaded and used for free on Apple's iPhones.


Updated: May 15, 2024, 10:34 AM