A major challenge in AI development is getting from predictive text to predicting real-world events. Bloomberg


AI needs to understand the world, not just describe it


Magzhan Kenesbai

April 22, 2026

Much of today’s artificial intelligence boom has been built on a simple assumption: that scaling models, data and compute will continue to make systems more capable. That approach has taken us remarkably far. But it is becoming increasingly clear that scaling language models alone may not be enough for the next phase of AI.

As AI moves beyond chat interfaces and into real-world systems, the limits of this paradigm are starting to show.

Language models are trained primarily to predict patterns in language rather than to build a reliable model of how the world works. They can describe reality with impressive fluency, but that is not the same as robustly modelling it. That distinction matters.

A language model can explain what typically happens when a glass falls off a table. But that does not mean it has a stable internal model of gravity, motion and physical interaction that it can reliably use for grounded prediction or action. It generates an answer based on patterns in its training data.

This is also why even advanced systems sometimes produce confident but incorrect outputs – what we call hallucinations. These systems are built to produce answers that sound right, not to test them against how the world actually behaves.

For many use cases, that capability is sufficient. But if AI is going to move beyond generative language interfaces and into real-world intelligent systems, it will not be enough. Artificial intelligence is increasingly moving into environments where digital systems interact with physical infrastructure, complex operational processes and dynamic real-world conditions. Robotics, autonomous transport and other agentic operational applications require machines that can autonomously interpret changing conditions and anticipate the consequences of actions.

In those environments, AI must do more than generate plausible text. It must understand how systems behave well enough to simulate the consequences of possible actions before taking them. This is where a growing body of research around world models is reshaping the conversation about the future of artificial intelligence.

World models represent a different approach to machine intelligence. Instead of learning patterns in language alone, these systems aim to model how environments evolve over time. They capture cause and effect – how actions shape outcomes and how situations unfold.

Put simply, they shift the question from “What word comes next in a sentence?” to “What happens next in the real world?”
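The contrast can be made concrete with a toy sketch. Below, a "language model" in miniature simply looks up which word most often followed the last one, while a "world model" in miniature steps a falling glass forward through time under gravity. All of the names, bigram statistics and physical constants here are invented purely for illustration, not drawn from any real system.

```python
def next_word(history: list[str]) -> str:
    """'Language model' in miniature: return the word that most often
    followed the last word in a tiny table of observed bigrams."""
    bigrams = {"the": "glass", "glass": "falls", "falls": "off"}
    return bigrams.get(history[-1], "<unk>")


def next_state(state: dict) -> dict:
    """'World model' in miniature: advance a falling glass one time step
    under constant gravity, using simple Euler integration."""
    dt, g = 0.1, 9.8  # time step (s), gravitational acceleration (m/s^2)
    height = max(0.0, state["height"] - state["velocity"] * dt)
    velocity = state["velocity"] + g * dt if height > 0 else 0.0
    return {"height": height, "velocity": velocity, "broken": height == 0.0}


# Pattern lookup answers "what word comes next?"
print(next_word(["the", "glass"]))  # -> "falls"

# Forward simulation answers "what happens next?"
state = {"height": 1.0, "velocity": 0.0, "broken": False}
while not state["broken"]:
    state = next_state(state)
print(state["broken"])  # -> True: the simulated glass hits the floor
```

The first function can only reproduce correlations it has seen; the second can answer questions about situations it was never shown, because it encodes how the state evolves.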

Consider the simple task of packing groceries into a plastic shopping bag. A typical language model may struggle to reason through the physical constraints involved – for example, that even if eggs and bread are scanned first, heavier items like milk, fabric softener and jars of pasta sauce should go into the bag first to avoid crushing more delicate goods. It may also fail to account for practical limits, such as the need for multiple bags to distribute weight and prevent breakage.

A robot with a world model, however, doesn’t rely on verbal reasoning alone. It would use learned dynamics to anticipate that a plastic bag can tear under excessive weight, that eggs are fragile and that weight distribution matters – without engineers having to hand-specify every one of those rules.
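One way to picture this is planning by simulated rollout: the system tries candidate action sequences inside its model and keeps the first one whose predicted outcome is acceptable. The sketch below does this for the grocery example; the item weights and fragility thresholds are hypothetical values chosen for illustration, not a real robotics API.

```python
from itertools import permutations

# Hypothetical item weights (kg) and the maximum weight each fragile
# item can bear stacked on top of it before it is crushed.
ITEMS = {"milk": 1.0, "pasta_sauce": 0.8, "bread": 0.3, "eggs": 0.4}
FRAGILE = {"bread": 0.5, "eggs": 0.5}


def simulate(order: tuple) -> bool:
    """Roll the model forward: packing bottom-to-top in this order,
    does every fragile item survive the weight stacked above it?"""
    for i, item in enumerate(order):
        weight_above = sum(ITEMS[x] for x in order[i + 1:])
        if item in FRAGILE and weight_above > FRAGILE[item]:
            return False
    return True


# Plan by testing candidate actions against the model, not by
# verbal reasoning: take the first packing order predicted to be safe.
safe_order = next(o for o in permutations(ITEMS) if simulate(o))
print(safe_order)  # heavy items end up at the bottom, fragile ones on top
```

Nothing in the code states the rule "heavy things go first"; that behaviour falls out of simulating consequences, which is the point of the world-model approach.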

Now imagine applying this same capability to more complex domains like surgery, autonomous driving or disaster response – where understanding how the physical world behaves, and anticipating the consequences of actions, is critical.

That shift from correlation to causation could define the next phase of AI.

The real breakthrough will come from systems that can predict reality, simulate it and act within it

This does not mean language models stop mattering. It means the most credible path forward is to combine them with grounded world models, memory and planning.

Humans do not learn primarily through language. We learn by interacting with our environment, building mental models of how things behave, and using those models to anticipate outcomes before we act. World-model architectures attempt to replicate this capability in artificial systems.

Yann LeCun, Turing Award-winning AI researcher and former Chief AI Scientist at Meta, has long argued that scaling language models alone will not lead to human-level intelligence. In his view, intelligence depends on the ability to model and predict how the world works, not just to describe it.

The question is no longer theoretical. Mr LeCun and Professor Eric Xing, President of MBZUAI, are publicly debating the foundations of world-model architecture – highlighting a growing divergence in how the field believes intelligence should be built.

Mr LeCun’s approach focuses on learning internal representations that allow systems to reason about the world. Professor Xing, while aligned on the importance of world models, has proposed a different direction. His PAN (Physical, Agentic, Nested) framework argues that these systems must remain grounded in real-world observation and multimodal experience, ensuring that internal reasoning does not drift away from reality.

The exchange points to a deeper divide: not whether world models matter, but whether intelligence will emerge primarily from internal simulation, or from systems that continuously anchor their reasoning in the external world.

At the same time, investment is beginning to follow this shift. Advanced Machine Intelligence (AMI), a company co-founded by Mr LeCun, recently raised more than $1 billion to pursue world-model-based AI systems. Presight was an investor in that funding round through our Presight-Shorooq Fund I.

Our investment in AMI reflects a simple view: if the next phase of AI depends on systems that can reason about the real world, then backing the teams building those systems is a bet on where AI is heading. For organisations working to embed intelligence into large-scale operational systems, the shift from language prediction to world understanding will be foundational.

Scaling language models has taken us a long way. But the real breakthrough will come from systems that do more than describe reality – they will need to predict it, simulate it and act within it.

Updated: April 22, 2026, 2:00 PM