Date: 2025-03-06
China's e-commerce giant Alibaba Group has unveiled a new generative artificial intelligence model that it claims surpasses the performance of <a href="https://www.thenationalnews.com/business/2025/01/29/deepseek-ai-r1-what-chatgpt/" target="_blank">DeepSeek</a>, triggering a stock market rally among Chinese tech companies. Alibaba's AI platform, Qwen, released the open-source QwQ-32B model on Thursday, joining <a href="https://www.thenationalnews.com/future/technology/2025/02/06/deepseek-r1-cost-artificial-intelligence/" target="_blank">the generative AI arms race</a> amid widespread adoption globally. The first version of Qwen, which is under Alibaba Cloud, was released in April 2023 and the platform has quietly gained a following. QwQ-32B, according to Qwen, was designed for “embracing the power of reinforcement learning” and was evaluated across a range of benchmarks that assess its mathematical reasoning, coding proficiency and general problem-solving capabilities. Qwen said studies have shown that reinforcement learning can significantly improve the reasoning capabilities of models, and that scaling it up can further enhance the intelligence of large language models. Reinforcement learning is a form of machine learning that lets AI models refine their decision-making based on positive, neutral and negative feedback, which helps them decide whether to repeat an action in similar circumstances, according to Oracle. It is one of the three basic paradigms of machine learning, alongside supervised and unsupervised learning. QwQ-32B has 32 billion parameters and, Qwen says, achieves performance comparable to DeepSeek-R1, which has 671 billion parameters. 
In AI, parameters are the values or variables that a machine learning model learns during training, and which in turn determine its output. The huge disparity between the two models' parameter counts, according to Qwen, “underscores the effectiveness of reinforcement learning when applied to robust foundation models pretrained on extensive world knowledge”. Qwen also said it gave QwQ-32B the ability to think “critically while utilising tools and adapting its reasoning based on environmental feedback”. This shows the potential of reinforcement learning and provides an opportunity for more AI innovations, it said. Benchmarks presented by Qwen show that QwQ-32B outperforms DeepSeek-R1 and o1-mini from OpenAI, the creator of ChatGPT. For maths, Qwen used what it calls an accuracy verifier to check that final solutions were correct; for coding, it used a code execution server to assess whether the generated code passed predefined test cases. “As training episodes progress, performance in both domains shows continuous improvement. After the first stage, we add another stage of reinforcement learning for general capabilities,” it said, noting that even “a small amount of steps” can boost the performance of other general capabilities. The emergence of more affordable open-source large language models bodes well for the broader AI community, said Richard Clode, a portfolio manager at asset management group Janus Henderson. “This is a victory for the open-source model of driving community innovation … this is positive for the longer-term development of AI, driving and proliferating innovation,” he said in a note. And, perhaps most important, Qwen is free. 
The free versions offered by fellow Chinese company DeepSeek require users to register by email, Google account or a Chinese phone number, while advanced users, such as developers, <a href="https://www.thenationalnews.com/business/2025/01/29/deepseek-ai-r1-what-chatgpt/" target="_blank">must pay for tokens</a>. Following the announcement of QwQ-32B, Alibaba's share price rose by as much as 9.2 per cent on Thursday, before closing 8.6 per cent higher. It was a boost for Hangzhou-based Alibaba, founded by tech mogul Jack Ma, after years of being hit by government investigations that stymied its business. The gains kicked off a rally among Chinese technology stocks. The Hang Seng Tech Index, which represents the 30 largest technology companies listed in Hong Kong, closed 5.4 per cent higher on Thursday. Tech stocks in mainland China also jumped, with AI agent developer Focus Technology surging above the 10 per cent daily limit. Qwen said the release of QwQ-32B is an “initial step” in scaling reinforcement learning to further develop AI reasoning capabilities. It has also recognised the “untapped possibilities” within pretrained language models. The work will also bring it closer to artificial general intelligence, an advanced form of AI capable of performing any intellectual task that a human can, the company said. OpenAI boss Sam Altman defines AGI as a technology that can reason, learn and operate across all areas of human cognition. For instance, an AGI system could write software, design complex architecture and solve quantum computing equations without requiring separate programming for each task. “As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with reinforced learning powered by scaled computational resources will propel us closer to achieving AGI … aiming to unlock greater intelligence,” Qwen said.