AI models provide assistance to terror attacks

A new benchmark for assessing how the largest artificial intelligence models could be used for terrorist attacks has found the industry has already facilitated deadly attacks.

Tech Against Terrorism ran a series of tests across 27 leading AI systems to determine how robustly they refused to provide information that could be used to kill.

The group created the counter-terrorism AI benchmark, which found one third of responses could help a terrorist perpetrate an attack. A report issued on Wednesday said there were real concerns that AI could assist in terrorist plots and identified 30 such cases with a combined death toll exceeding 70.


“What a model produces when asked to assist terrorism is therefore a major safety concern, not a hypothetical one, and as model capability rises and access widens it becomes a property that can and should be measured,” the report said. “The agentic nature of the newer models also increases the risk that 'single-shot' failures could be compounded in autonomous reinforcement loops.”

A single-shot prompt, as the term is used here, is a single request put to the AI model, with no follow-up questions or repeated attempts to wear down its safeguards. It is distinct from “one-shot” prompting, in which the model is given one worked example of the task before it receives the prompt.

Mass attacks

The report cited an incident last November when a car bomb exploded near the Red Fort in New Delhi, India, killing at least a dozen people and injuring more than 20 others.

“The plot was linked to an Al Qaeda-aligned module whose 'in-house engineer' had used ChatGPT and YouTube to research device construction and explosive chemistry – an early, lethal example of AI used as an operational assistant in terrorism,” it said.

In the real-world incident, there were links to at least 11 AI tools. Building the benchmark involved testing 27 models with almost 2,500 single-shot prompts. Two open models with their safety guardrails stripped out, a process called abliteration, complied with 89 per cent and 100 per cent of dangerous requests, respectively. These models cannot be recalled, the report notes.

Overall, the tests found that around a third of responses handed over usable content. Tech Against Terrorism said AI assistance in the inception of terrorist activity was a safety and control problem across the board. The issues were not confined to the extremes of the testing protocol.

“Until now, there was no AI benchmark focused specifically on terrorism, so we built one,” founder Adam Hadley said. “With nothing more than simple, single-shot questions, many of the models we tested handed over meaningful help towards making a bomb or planning a mass-casualty attack. This is not acceptable.

“This is a control problem as much as a safety one. The real risk is that AI developers are inadvertently creating models they cannot control.”

What the benchmark tests is the extent to which a machine provides or compiles insights and information above the baseline of what can ordinarily be found online through search.

It found that initial rates of full refusal to provide information varied widely, with some models registering almost 90 per cent but one leading product dropping below 50 per cent. It also looked at the wider help available to a potential bomber by weighting the information, a process that generated concerning outcomes.

“With rudimentary single-shot examples, many models provide meaningful uplift for terrorist use cases such as making bombs or planning mass casualty attacks,” it said.

Even when refusals were high, they could be offset by the severity of the assistance on offer. When refusals accounted for 57 per cent of responses, there was still what was deemed “hedged compliance”, accounting for 15 per cent. This is where output opens with a refusal or warning and then supplies the full requested content anyway. The report says that in these instances, the refusal is only cosmetic.

Tech Against Terrorism is offering its benchmarking process to the industry as a task force to improve safety.

Recommendations

It makes a series of recommendations to major technology companies and governments:

Treat terrorist misuse as a distinct safety category
Test models for changes in stated intent, which currently defeat many guardrails
Extend refusal training beyond the most recognisable threat
Treat the circulation of de-restricted, or abliterated, open models as a major national security concern, tracking their distribution and planning for it
