OpenAI's GPT-4o: What's in the new ChatGPT generative AI model and how does it work?


Alvin R Cabral

OpenAI has upped the ante in the highly competitive generative artificial intelligence world by introducing a new model it hopes will attract more users to its platform and fend off all challengers.

GPT-4o is an updated version of the underlying large language model technology that powers ChatGPT. Rumours last week suggested OpenAI would launch a search engine to challenge Google, but Reuters reported that the company had delayed it.

OpenAI chief executive Sam Altman denied any launches – only to post on X that the company has "been hard at work on some new stuff we think people will love".

The "o" in the name stands for "omni" and the California-based company is touting GPT-4o as something for all, which makes sense as "omni" means "all" or "everything" – does OpenAI want to be omnipresent in our lives?

What is GPT-4o?

Short answer: GPT-4o, according to OpenAI, is its "new flagship model that can reason across audio, vision and text in real time".

Shorter answer: it's OpenAI's fastest AI model.

The "omni" name refers to "a step towards much more natural human-computer interaction", OpenAI said in a blog post on Monday.

It is also natively multimodal, meaning it can accept any combination of text, audio and image as input, and also generate any combination of text, audio and image outputs.

How fast is GPT-4o?

OpenAI claims GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation, according to several studies.

GPT-4o also requires fewer tokens across languages. Tokens are the basic units an AI model uses to measure the length of text, and they can include punctuation marks and spaces. Token counts vary from one language to another.

Among the languages highlighted by OpenAI that use fewer tokens with GPT-4o are Arabic (from 53 to 26), Gujarati (145 to 33), Hindi (90 to 31), Korean (45 to 27) and Chinese (34 to 24).
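To put those figures in proportion, here is a minimal Python sketch that works out how many times fewer tokens each language needs. It assumes each pair of numbers is a before-and-after count for the same sample text; the names `TOKEN_COUNTS`, `reduction_factor` and `token_savings` are illustrative and do not come from OpenAI.

```python
# Token counts as quoted above: (before GPT-4o, with GPT-4o).
# Assumption: each pair measures the same sample text in that language.
TOKEN_COUNTS = {
    "Arabic": (53, 26),
    "Gujarati": (145, 33),
    "Hindi": (90, 31),
    "Korean": (45, 27),
    "Chinese": (34, 24),
}


def reduction_factor(before: int, after: int) -> float:
    """Return how many times fewer tokens are needed (before / after)."""
    return before / after


def token_savings() -> dict[str, float]:
    """Map each language to its reduction factor, rounded to one decimal."""
    return {
        language: round(reduction_factor(before, after), 1)
        for language, (before, after) in TOKEN_COUNTS.items()
    }


if __name__ == "__main__":
    for language, factor in token_savings().items():
        print(f"{language}: {factor}x fewer tokens")
```

On these numbers, Gujarati benefits most, at roughly 4.4 times fewer tokens, while Chinese improves by about 1.4 times.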

For perspective, we can make some comparisons to a 1968 study from Robert Miller – Response time in man-computer conversational transactions – which detailed the three magnitudes of computer mainframe responsiveness.

The research revealed that a response time of 100 milliseconds is perceived as instantaneous, while one second or less is fast enough for users to feel they are interacting freely with the information. A response time of more than 10 seconds would lose user attention completely.
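As a rough illustration, the thresholds from Miller's study can be written as a small Python function and applied to the audio response times OpenAI claims. This is a sketch only: the function name is made up for this example, and the label for delays between one and 10 seconds is an assumption, since the study as summarised here does not name that band.

```python
def perceived_responsiveness(seconds: float) -> str:
    """Classify a response time using the thresholds described above."""
    if seconds <= 0.1:
        return "instantaneous"
    if seconds <= 1.0:
        return "fast enough to feel free-flowing"
    if seconds <= 10.0:
        # Assumption: this band is not labelled in the article.
        return "noticeable delay"
    return "attention lost"


if __name__ == "__main__":
    # GPT-4o audio response times claimed by OpenAI, in seconds.
    for label, latency in [("best case", 0.232), ("average", 0.320)]:
        print(f"GPT-4o {label} ({latency}s): {perceived_responsiveness(latency)}")
```

Both GPT-4o figures land in the second band: not instantaneous, but quick enough for a conversation to flow.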

How does GPT-4o work?

The simplest answer is that OpenAI, well, simplified the process of converting input into output.

In OpenAI's previous AI models, Voice Mode was used to talk to ChatGPT at latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. Voice Mode used three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in and outputs text, and a third simple version converts that text back to audio.

"This process means that the main source of intelligence, GPT-4, loses a lot of information – it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion," OpenAI said.

But with GPT-4o, OpenAI was able to merge all these functions into a single model, with end-to-end capabilities across text, vision and audio, significantly reducing the amount of time consumed and information processed.
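The difference between the two designs can be sketched in a few lines of Python. This is a toy model, not OpenAI's code: every class and function name below is hypothetical, and the "models" are stand-ins that only show where information such as tone is dropped in the three-step pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioClip:
    """Hypothetical stand-in for audio: the words plus what text cannot carry."""
    words: str
    tone: str
    speakers: int


def transcribe(clip: AudioClip) -> str:
    """Step 1 of the old pipeline: only the words survive transcription."""
    return clip.words


def text_model(prompt: str) -> str:
    """Step 2: a text-only model that never sees tone or speaker count."""
    return f"reply to: {prompt}"


def synthesise(text: str) -> AudioClip:
    """Step 3: text-to-speech has no tone information left to work with."""
    return AudioClip(words=text, tone="neutral", speakers=1)


def cascaded_voice_mode(clip: AudioClip) -> AudioClip:
    """Three separate models chained together, as in the earlier Voice Mode."""
    return synthesise(text_model(transcribe(clip)))


def end_to_end_model(clip: AudioClip) -> AudioClip:
    """A single model receives the whole clip, so it can react to the tone.
    Mirroring the tone here is purely illustrative."""
    return AudioClip(words=f"reply to: {clip.words}", tone=clip.tone, speakers=1)


if __name__ == "__main__":
    clip = AudioClip(words="happy birthday", tone="singing", speakers=2)
    print(cascaded_voice_mode(clip).tone)  # tone was lost at step 1
    print(end_to_end_model(clip).tone)     # tone is still available
```

In the chained version, the tone disappears the moment the audio becomes text, which is the loss OpenAI describes. The single-model version never throws it away.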

"All inputs and outputs are processed by the same neural network," OpenAI said. A neural network is an AI technique that teaches computers to process data similarly to the human brain.

OpenAI said it was "still just scratching the surface" of GPT-4o's capabilities and limitations, given that it is the company's first model to merge all of these modalities.

What can GPT-4o not do?

Speaking of limitations, OpenAI acknowledged "several" of them across the GPT-4o model, including inconsistencies in responses, which it featured in a blooper reel. It even demonstrated how GPT-4o can be adept at sarcasm.

In addition, OpenAI said it continues to refine the model's behaviour through post-training – which is critical in addressing safety concerns, a key sticking point in modern-day AI.

The company said it has created new safety systems to serve as guardrails for voice outputs, in addition to testing the model with more than 70 experts in fields including social psychology, bias, fairness and misinformation to identify any risks that may seep through.

"We will continue to mitigate new risks as they’re discovered. We recognise that GPT-4o’s audio modalities present a variety of novel risks," OpenAI said.

How much does GPT-4o cost?

Good news – it's free for all users, with paid users enjoying "up to five times the capacity limits" of their free peers, OpenAI chief technology officer Mira Murati said in the unveiling presentation.

However, developers who use GPT-4o through OpenAI's API will pay $5 per one million input tokens and $15 per one million output tokens.

Offering GPT-4o for free should serve OpenAI well and complement the company's paid offerings.

In August, OpenAI launched its ChatGPT Enterprise monthly plan, the price of which varies depending on user requirements. It's the third tier after its basic free service and the $20-a-month Plus plan.

The company in January launched its online ChatGPT Store that gives users access to more than three million custom versions of GPTs, developed by OpenAI's partners and its community.

OpenAI hopes to attract more users as competition heats up in the generative AI world – and plenty of rivals are coming for it.

How does OpenAI stack up against its biggest rivals at this point?

OpenAI's move to introduce a new, free and faster large language model is an indication of how it has its hands full against its competition in generative AI.

Google, arguably its biggest rival in the space, has Gemini, which was the first AI model to beat human experts on massive multitask language understanding, one of the widely used methods to test the knowledge and problem-solving abilities of AI.

Gemini can be accessed on the Google One AI Premium plan for $19.99 a month, which includes 2TB of storage, 10 per cent back from purchases made on the Google Store and more features across Gmail, Google Docs, Google Slides and Google Meet.

In February, Google launched Gemma, which is aimed at assisting developers and researchers in “building AI responsibly” and is suited to more modest tasks such as basic chatbots or summarisation jobs.

Anthropic, meanwhile, launched Claude 3 in March – its direct challenge to generative AI leader OpenAI.

The company, which is backed by Google and Amazon, offers Claude 3 in three tiers – Haiku, Sonnet and Opus – each with increasing capabilities to suit different user needs.

Haiku is priced at $0.25 per million tokens (MTok) for input and $1.25 for output, while Sonnet costs $3 and $15. Opus is the most expensive at $15 and $75.

For comparison, OpenAI’s GPT-4 Turbo comes in at $10 per million tokens for input and $30 for output, and has a smaller context window of 128,000 tokens.
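To see what these per-token rates mean in practice, the following Python sketch estimates the bill for the same workload on each model, using only the prices quoted in this article. The workload of two million input tokens and 500,000 output tokens is an arbitrary assumption for illustration, and the names `PRICES_PER_MTOK` and `request_cost` are invented for this example.

```python
# Prices in US dollars per one million tokens, as quoted in this article:
# (input price, output price).
PRICES_PER_MTOK = {
    "Claude 3 Haiku": (0.25, 1.25),
    "Claude 3 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "GPT-4o": (5.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars for the given token counts on one model."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload, chosen only for illustration.
    input_tokens, output_tokens = 2_000_000, 500_000
    for model in PRICES_PER_MTOK:
        cost = request_cost(model, input_tokens, output_tokens)
        print(f"{model}: ${cost:.2f}")
```

For that assumed workload, GPT-4o works out at half the cost of GPT-4 Turbo, with Haiku the cheapest and Opus the most expensive.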

Microsoft, OpenAI's biggest backer, charges $20 a month for its Copilot Pro service, which guarantees faster performance and "everything" the service offers. If you're not willing to pay, there's a free Copilot tier, which has more limited functionality.

And then, there's xAI's Grok, from OpenAI's friend-turned-enemy, Elon Musk.

Grok's current version, Grok-1.5, is only available to subscribers of X's Premium+ tier, which starts at $16 per month, or $168 a year.

Regional entities are also taking aim at the leaders: on Monday Abu Dhabi's Technology Innovation Institute introduced the second iteration of its large language model, Falcon 2, to compete with models developed by Meta, Google and OpenAI.

Also on Monday, Core42, a unit of Abu Dhabi's artificial intelligence and cloud company, G42, launched a bilingual Arabic and English chatbot developed in the UAE, Jais Chat. It can be downloaded and used for free on Apple's iPhones.

Updated: May 15, 2024, 10:34 AM