Israel, Gaza and AI machines - is this the automation of war crimes?

For the past decade, side rooms in international law conferences have hosted panel discussions on the introduction of AI software into military toolkits. The use of AI-powered drones in Afghanistan, Pakistan and elsewhere has led to campaigns to ban “killer robots”. All of this was premised on the idea that you need to keep human decision-making in the loop as a means of ensuring that – even if technology makes warfare easier – a soldier with moral awareness can ensure that human ethics and international law are still observed.
An explosive investigation released on Wednesday by +972 Magazine, an Israeli publication, may come to upend those discussions for years to come. The report, based on interviews with six anonymous Israeli soldiers and intelligence officials, alleges the Israeli military has used AI software to carry out killings of not only suspected militants but also civilians in Gaza on a scale so grand, so purposeful, that it would throw any Israeli army claim of adherence to international law out the window.
Among the most shocking elements of the allegations is that the war has not been delegated entirely to AI. Instead there has been plenty of human decision-making involved. But the human decisions were to maximise killing and minimise the “bottleneck” of ethics and the law.
To summarise the allegations briefly, the Israeli army has reportedly made use of an in-house AI-based programme called Lavender to identify possible Hamas and Palestinian Islamic Jihad (PIJ) militants from within the Gazan population, and mark them as targets for Israeli air force bombers. In the early weeks of the war, when Palestinian casualties were at their highest, the military “almost completely relied on Lavender”, with the army giving “sweeping approval for officers to adopt Lavender’s kill lists, with no requirement to thoroughly check why the machine made those choices or to examine the raw intelligence data on which they were based”.
The raw intelligence data consisted of a number of parameters drawn from Israel’s vast surveillance system in Gaza – including a person’s age, sex, mobile phone usage patterns, patterns of movement, which WhatsApp groups they are in, known contacts and addresses, and others – to collate a rating from 1 to 100 determining the likelihood of the target being a militant. The characteristics of known Hamas and PIJ militants were fed into Lavender to train the software, which would then look for the same characteristics within Gaza’s general population to help build the rating. A high rating would render someone a target for assassination – with the threshold determined by senior officers.
Four allegations, in particular, stand out because of their dire implications in international law. First, Lavender was allegedly used primarily to target suspected “junior” (ie, low-ranking) militants. Second, human checks were minimal, with one officer estimating them to last about 20 seconds per target, and mostly just to confirm whether the target was male (Hamas and PIJ do not have women in their ranks).
Third, a policy was apparently in place to try to bomb junior targets in their family homes, even if their civilian family members were present, using a system called “Where’s Daddy?” that would alert the military when the target reached the house. The name of the software is particularly malicious, as it implies the vulnerability of a target’s children as collateral damage. +972’s report notes that so-called dumb bombs, as opposed to precision weapons, were used in these strikes in spite of the fact that they cause more collateral damage, because precision weapons are too expensive to “waste” on such people.
And finally, the threshold for who was considered by the software to be a militant was toggled to cater to “a constant push to generate more targets for assassination”.
In other words, if Lavender was not generating enough targets, the rating threshold was allegedly lowered to draw more Gazans – perhaps someone who fulfilled only a few of the criteria – into the kill net.
Every time an army seeks to kill someone, customary international law of armed conflict (that is, the established, legally binding practice of what is and is not acceptable in war) applies two tests. The first is distinction – that is, you have to discriminate between what is a civilian and a military target. The second is precaution – you have to take every feasible measure to avoid causing civilian death.
That does not mean armies are prohibited from ever killing civilians. They are allowed to do so where necessary and unavoidable, in accordance with a principle called “proportionality”. The exact number of civilians who may be killed in a given military action has never been defined (and any military lawyer would tell you it would be naïve to attempt to do so). But the guiding principle has always, understandably, been to minimise casualties. The greatest number of justifiable civilian deaths is afforded to efforts to kill the highest-value targets, with the number decreasing as the target becomes less important. The general understanding – including within the Israeli military’s own stated procedures – is that killing a foot soldier is not worth a single civilian life.
But the Israeli military’s use of Lavender, allegedly, worked in many respects the other way around. In the first weeks of the war, the military’s international law department pre-authorised the deaths of up to 15 civilians, even children, to eliminate any target marked by the AI software – a number that would have been unprecedented in Israeli operational procedure. One officer says the number was toggled up and down over time – up when commanders felt that not enough targets were being hit, and down when there was pressure (presumably from the US) to minimise civilian casualties.
Again, the guiding principle of proportionality is to trend towards zero civilian deaths, based on target value – not to modulate the number of acceptable civilian deaths in order to hit a certain quantity of targets. The notion that junior militants were targeted specifically in their homes with mass-casualty weapons (allegedly because this was the method most compatible with the way Israel’s surveillance system in Gaza operates) is particularly egregious. If true, it would be evidence that Israel’s military not only ignored the possibility of civilian casualties, but actually institutionalised killing civilians alongside junior militants in its standard operating procedures.
The way in which Lavender was allegedly used also fails the distinction test and international law’s ban on “indiscriminate attacks” on multiple fronts. An indiscriminate attack, as defined in customary law, includes any that is “not directed at a specific military objective” or employs a method or means of combat “of a nature to strike military objectives and civilians … without distinction”. The +972 report paints a vivid picture of a programme that tramples over these rules. This includes not only the use of the “Where’s Daddy?” system to intentionally enmesh civilian homes into kill zones and subsequently drop dumb bombs on them, but also the occasional toggling down of the ratings threshold specifically to render the killing less discriminate.
Two of the report’s sources allege that Lavender was partly trained on data collected from Gaza public sector employees – such as civil defence workers like police, fire and rescue personnel – increasing the likelihood of a civilian being given a higher rating. On top of that, the sources allege that before Lavender was deployed, its accuracy in identifying anyone who actually matched the parameters given to it was only 90 per cent; one in 10 people marked did not fit the criteria at all. That was considered an acceptable margin of error.
The normal mitigation for that kind of margin goes back to human decision-making; you would expect humans to double-check the target list and ensure that the 10 per cent becomes 0 per cent, or at least as close to that as possible. But the allegation that soldiers routinely only conducted brief checks – mainly to ascertain whether the target was male – would show that not to have been the case.
If human soldiers can kill civilians, either intentionally or through error, and machines can kill civilians through margins of error, then does the distinction matter? In theory, the use of AI software in targeting should be a valuable asset in minimising civilian loss of life. One of the soldiers +972 interviewed sums up the rationale neatly: “I have much more trust in a statistical mechanism than a soldier who lost a friend two days ago.” Human beings can kill for emotional reasons, potentially with a much higher margin of error as a result. The idea of a drone or radio operator directing an attack from an operations room after having verified the data ought to provide some comfort.
But one of the most alarming aspects of delegating so much of the target incrimination and selection process to machines, many would argue, is not the number of civilians who could be killed. It’s the questions of accountability afterwards and the incentives that derive from that. A soldier who fires indiscriminately can be investigated and tried, the motivation for his or her actions ascertained and lessons of those actions learnt. Indiscriminate killing by humans is seen as a bug in the system, to be rooted out – even if the mission to do so at a time of war seems like a Sisyphean task. A machine’s margin of error, on the other hand, is not ideal – but when it is perceived by operators as preferable to human mistakes, it isn’t treated as a bug. It becomes a feature.
And that can create an incentive to trust the machine, and to abdicate human responsibility for error minimisation – precisely the opposite of what the laws of war intend. The testimonies of the Israeli officers to +972 provide a perfect illustration of an operational culture built on those perverse incentives. That would be the charitable interpretation. The less charitable one is an operational culture in which the human decision makers’ goal was to kill at scale, with parameters superficially designed to cater to ethics and laws being bent to fit the shape of that goal. The question of which of those cultures is more terrifying is a subjective one. Less subjective would be the criminality that gives rise to both of them.