Reinforcement Learning

3 articles tagged "Reinforcement Learning"

Alibaba's Qwen Team Fixes Reinforcement Learning's Blind Spot to Make AI Reason More Deeply

Alibaba's Qwen research team has published a new training algorithm that addresses a fundamental limitation in how reinforcement learning assigns reward signals to reasoning models: it gives each step in a reasoning chain a weight proportional to its actual impact on the outcome. Early results show measurable improvements in multi-step reasoning quality.

D.O.T.S AI Newsroom · Apr 6, 2026 · 2 min read

Research

Alibaba's Qwen Team Fixes the Core Problem With Reasoning Model Training — and Doubles Thought Length in the Process

Reinforcement learning, as typically applied to reasoning models, gives every token the same reward, whether it was the pivot that unlocked a solution or just a filler comma. Alibaba's Qwen team has built FIPO, an algorithm that assigns rewards based on each token's downstream influence, and the reported results include doubled reasoning depth without adding a separate value model.

D.O.T.S AI Newsroom · Apr 5, 2026

Research

Meta's 'Hyperagents' Don't Just Improve at Tasks — They Improve at Improving

Meta AI researchers have developed 'hyperagents' built on an extension of the Darwin Gödel Machine framework, capable of optimizing not only task performance but the improvement mechanism itself. Across four domains — coding, paper review, robotics, and mathematics — the system showed benchmark gains of up to 6×, with improvement strategies transferring across domains.

D.O.T.S AI Newsroom · Mar 29, 2026