AI models must complement, not cannibalise, knowledge creation

Large language models, or LLMs, like ChatGPT are effective because they are trained on high-quality human-generated content. However, there is a concern that they also undermine the incentives humans have to create original material, meaning that they might inadvertently be killing their feedstock.

Fears of such a death cycle are slightly exaggerated. But in the long term, societies should consider the steps that might be necessary for stopping LLMs from killing the goose that lays the golden eggs.

When used correctly, LLMs are a fantastic tool for boosting productivity. First and foremost, they function as immense fonts of knowledge that shrink the time required to retrieve and organise information by many orders of magnitude. They are also highly effective at synthesising and summarising large volumes of information and can serve as competent authors, allowing white-collar workers to allocate their scarce time to other issues that still require a large dose of human attention.

This is how a team of workers in 2026 supported by LLMs can produce significantly more than a comparable group working in 2006 unassisted by AI.

The problem with LLMs, however, is their impact on the incentives for knowledge creation. LLMs’ capacity to aggregate and synthesise knowledge in a matter of seconds functions as an effective substitute for many of the activities through which knowledge creators traditionally monetise their work. As users increasingly rely on AI-generated summaries rather than engaging directly with books, articles, research papers and other primary sources, the economic rewards accruing to those who produce original content are diminished.

This creates what economists refer to as a “negative externality”: while each individual user benefits from the convenience and efficiency offered by LLMs, the collective result may be a reduction in society’s investment in generating new knowledge.

When it comes to the use of existing knowledge for generic, ever-present problems – as in the case of routine legal drafting, customer service interactions or teaching arithmetic – this does not constitute a substantive loss. However, many fields depend on novel insights, such as frontier medical applications or cutting-edge military weapons. In these areas, LLMs risk slowing the flow of fresh ideas, discoveries and creative works upon which both human progress and future generations of AI systems ultimately depend.

This possibility has already gone from hypothetical to realised: websites across the internet that provide information services and rely on visitor traffic for their commercial viability are suffering significant contractions in their income, with Google’s “AI Overview” summary box that appears at the top of a traditional Google search being a key instigator.

There are, however, two flaws in the argument that should prevent those overly concerned about this challenge from panicking for the time being.

First, LLM developers are aware that the knowledge base for new issues is growing at a slower pace than before due to the proliferation of AI solutions, and so they are now willing to pay for new data to train their LLMs, partially offsetting the reduced incentive to create new knowledge. Second, a case can be made that it is mediocre-quality knowledge production that is being squeezed, while incentives to produce the high-quality variety remain large and are potentially growing.

Nevertheless, it would be imprudent to passively wait for the LLM revolution to conclude when policymakers and the industry itself have the capacity to actively shape its outcomes. Some government interventions merit consideration, with one of the most salient being to expand support for knowledge production through continued funding for universities, libraries, museums and archives.

Potentially more important would be the strengthening of intellectual property rights: creating a system in which humans can continue to freely use existing knowledge to create new knowledge subject to formal attribution, but where LLMs using knowledge have to pay levies that are then funnelled back to knowledge production.

A valuable supporting intervention is provenance infrastructure: developing cryptographic signatures that accurately verify human authorship. This is especially important for guarding against “model collapse”, whereby successive generations of AI systems are trained on increasing quantities of AI-generated content rather than original human-created material.

Just as repeated photocopying gradually degrades image quality, repeated training on synthetic data can amplify errors, biases and informational distortions. Reliable provenance mechanisms would help preserve access to authentic human-generated knowledge, ensuring that future AI systems continue to learn from the original source rather than increasingly diluted copies of it.

Equally important is avoiding simplistic interventions that have historically failed to ensure benign technological progress. These include bans on AI-generated content, mandatory quotas of human-created content, or other broad restrictions on AI use. The economic forces are too strong, and so it is better to try to steer the ship favourably than to devise an anchor heavy enough to stop it, as the latter will just spawn a more powerful engine.

The challenge, therefore, is not to stop the rise of LLMs, but to ensure that they complement rather than cannibalise human knowledge creation. After all, a technology that consumes knowledge faster than society can produce it risks eventually running out of the very resource that made it valuable.