Testing times: Back to school with ChatGPT questions and AI-assisted answers

Tests written, answered and marked with the help of artificial intelligence are starting to be used in the UK, The National has been told, as schools and universities are urged to embrace technology such as ChatGPT.
UK exam boards are open to robots setting “low-stakes” questions that could be marked by a machine “in combination with human involvement”. Pupils and students could also be allowed to use AI as a form of “open-book testing” and should be encouraged to talk freely about their use of large language models, educators said.
“In some contexts, it may be perfectly valid to allow the use of AI tools, just as some assessments allow candidates to use calculators, or search engines,” said Alex Scharaschkin, director of research and innovation at exam board AQA.
Like Google and Wikipedia before it, the rise of ChatGPT and similar AI software has sparked doom-and-gloom concerns about cheating on exams and the integrity of education. Another shift is coming with AI, which can generate plausible-sounding answers to almost anything – although it sometimes has “hallucinations” and confidently says something totally wrong.
But Sam Illingworth, an associate professor of learning and teaching at Edinburgh Napier University, says the narrative that the new technology will “destroy the world” should be rejected. He said AI gave teachers an opportunity to set “authentic assessments” that would be hard to plagiarise, such as inviting pupils to consider their own social biases. Another approach could be to ask students and pupils to generate answers with ChatGPT, then discuss the shortcomings of what was produced.
The university has invited students to comment anonymously on how they use the technology. “We’re talking to students and they’re not really using it to plagiarise with generative AI,” he said. “They’re using it as a study tool, they’re using it to check their language if they’ve got English as a second language, they’re using it to spellcheck.
“Really the best advice that we’re giving is have these open conversations with our students and not just assume that they’re using it in a way that we think they might be.”
The Tony Blair Institute, the former prime minister’s think tank, has spoken of “future-proofing education” to bridge the gap between current schooling and the demands of tomorrow’s job market. It says pupils will need skills such as critical thinking and creativity instead of “direct instruction and memorisation” to “flourish in increasingly digital workplaces”.
One of the key challenges for teachers is writing questions that will test people’s cleverness even if they have ChatGPT at their desk. Exam board OCR says that from November, it will be acceptable for pupils to use AI for initial research, which is treated as “no different from consulting published articles or books or browsing in a search engine”. They can also use two or three sentences of AI-generated text if they discuss it critically.
“Open-AI exams seem to be the new open-book tests to me. Examinations and assignments should be crafted to remain challenging even with the aid of AI,” said Kiana Jafari Meimandi, a research scientist at the University of Virginia. “While large language models can provide answers, accuracy is not always guaranteed, similar to search engine results.”
But for teachers, there is a time-saving prospect: they, too, could turn to AI to write questions, especially in science subjects where there is less room for interpretation. Laura Gould, who has been a science teacher in the UK for 10 years, said language models were a “really cool tool” that can save time and energy even with “some tweaking and fine-tuning at the end”.
“Some educators use it to brainstorm and outline lesson plans and unit plans and have found it really helpful for curriculum work,” she said.
“You can also use it to create grammar exercises and exemplar sentences, which is really nice and saves you some brainpower.”
In recent evidence presented to the UK’s Department for Education, exam boards have said they are open to the idea while acknowledging that AI has limitations. Exam board OCR said AI could provide “high volumes numbers of multiple-choice questions” but “struggles with complexity”. It said a chat bot would “still need its outputs to be monitored and approved by human operators”. Its submission said “we cannot put generative AI back in its box” and said “we have to embrace the opportunities for education while being clear-sighted about the limits”.
AQA, which has run tests on apps including ChatGPT, says “high-stakes exams are not about to be marked by robots” but that examiners can use AI as an aid and that bots could handle “informal assessments in class”. Mr Scharaschkin told The National that AI can produce draft questions but that “considerable human oversight is needed to revise and modify them”.
“It is possible to achieve reasonable accuracy when marking criteria can be specified in a tightly rules-based way, but it is more challenging to train an AI classifier to replicate expert human judgment consistently, and explain the reason for the marks awarded, for example in marking essays,” he said.
AQA has suggested that AI tools could have to pass trials like new medicines, in which they are checked for bias, effectiveness and copyright breaches. It says pupils should learn about AI and its benefits and risks. “We envisage future assessments of digital skills will include these topics,” said Mr Scharaschkin.
The National put the AI question to AI: how do you write tests or set tasks that would be difficult to use AI to cheat at? Three AI tools gave a more or less identical response with an emphasis on open-ended questions.
“For instance, instead of asking ‘What is 2 + 2?’, you could ask ‘Explain the concept of addition in mathematics’,” suggested ChatGPT. It also proposed challenges or puzzles similar to Captcha software – think of “clicking on all the pictures of buses” – that is meant to fend off spamming internet users.
A second app, Wordtune, said testing in several areas at once such as conceptual understanding and critical thinking could defeat AI bots that “often excel in specific areas, but struggle to perform consistently across multiple domains”. Like other apps, it said randomising the order of questions could stop AI picking up patterns.
Google’s chat bot Bard said pupils and students could be asked to “explain their reasoning” to show they are not simply memorising facts. “It is important to note that no test or task is completely immune to cheating, but by following these tips, you can make it more difficult for AI to cheat,” it said. In that, it may have guessed right.

Exam boards open to robots setting 'low-stakes' assessments as new academic year begins