Reinforcement learning AI might bring humanoid robots to the real world
These robots that play soccer and navigate difficult terrain may be the future of AI
ChatGPT and other AI tools are upending our digital lives, but our AI interactions are about to get physical. Humanoid robots trained with a particular type of AI to sense and react to their world could lend a hand in factories, space stations, nursing homes and beyond. Two recent papers in Science Robotics highlight how that type of AI — called reinforcement learning — could make such robots a reality.
“We’ve seen really wonderful progress in AI in the digital world with tools like GPT,” says Ilija Radosavovic, a computer scientist at the University of California, Berkeley. “But I think that AI in the physical world has the potential to be even more transformational.”
The state-of-the-art software that controls the movements of bipedal bots often uses what’s called model-based predictive control. It’s led to very sophisticated systems, such as the parkour-performing Atlas robot from Boston Dynamics. But these robot brains require a fair amount of human expertise to program, and they don’t adapt well to unfamiliar situations. Reinforcement learning, or RL, in which AI learns through trial and error to perform sequences of actions, may prove a better approach.
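As a rough illustration of that trial-and-error loop, here is a minimal Python sketch. The toy environment, target and policy are invented for illustration only and have nothing to do with either paper's code; real robot controllers learn from millions of simulated trials with far richer states and rewards.

```python
# A toy illustration of the trial-and-error loop at the heart of RL.
# The environment, target and policy below are invented for illustration.

class ToyEnv:
    """Stand-in for a robot simulator: the state is a single position."""
    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action                    # apply the chosen action
        reward = -abs(self.pos - 1.0)         # closer to the target of 1.0 is better
        done = abs(self.pos - 1.0) < 0.05
        return self.pos, reward, done

def run_episode(env, policy, max_steps=50):
    """One trial: sense, act, collect reward. An RL algorithm would use the
    rewards from many such trials to nudge the policy toward better actions."""
    state, total_reward = env.reset(), 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

print(run_episode(ToyEnv(), policy=lambda state: 0.1))
```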
“We wanted to see how far we can push reinforcement learning in real robots,” says Tuomas Haarnoja, a computer scientist at Google DeepMind and coauthor of one of the Science Robotics papers. Haarnoja and colleagues chose to develop software for a 20-inch-tall toy robot called OP3, made by the company Robotis. The team wanted to teach OP3 not only to walk but also to play one-on-one soccer.
“Soccer is a nice environment to study general reinforcement learning,” says Guy Lever of Google DeepMind, a coauthor of the paper. It requires planning, agility, exploration, cooperation and competition.
The toy size of the robots “allowed us to iterate fast,” Haarnoja says, because larger robots are harder to operate and repair. And before deploying the machine learning software in the real robots — which can break when they fall over — the researchers trained it on virtual robots, a technique known as sim-to-real transfer.
Training of the virtual bots came in two stages. In the first stage, the team trained one AI using RL merely to get the virtual robot up from the ground, and another to score goals without falling over. As input, the AIs received data including the positions and movements of the robot’s joints and, from external cameras, the positions of everything else in the game. (In a recently posted preprint, the team created a version of the system that relies on the robot’s own vision.) The AIs had to output new joint positions. If they performed well, their internal parameters were updated to encourage more of the same behavior. In the second stage, the researchers trained an AI to imitate each of the first two AIs and to score against closely matched opponents (versions of itself).
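In code, such a controller boils down to a network that maps one snapshot of observations to a set of joint targets. The sketch below is a minimal illustration, not the paper's actual architecture: the network sizes and the size of the "everything else in the game" observation are assumptions, though OP3 does have 20 actuated joints.

```python
# Minimal sketch of a soccer policy as described: observations in, joint targets out.
# Sizes are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

NUM_JOINTS = 20                      # OP3's actuated joints
GAME_OBS = 12                        # assumed: ball, goal and opponent positions from cameras
OBS_DIM = 2 * NUM_JOINTS + GAME_OBS  # joint positions + velocities + game state

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ELU(),
    nn.Linear(256, 256), nn.ELU(),
    nn.Linear(256, NUM_JOINTS),      # a new target position for each joint
)

obs = torch.zeros(1, OBS_DIM)        # one snapshot of what the robot senses
joint_targets = policy(obs)          # the action sent to the motors
```

In the second stage, a single network of this general form is trained to imitate the get-up and goal-scoring specialists, then sharpened by playing against copies of itself.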
To prepare the control software, called a controller, for the real-world robots, the researchers varied aspects of the simulation, including friction, sensor delays and body-mass distribution. They also rewarded the AI not just for scoring goals but also for other things, like minimizing knee torque to avoid injury.
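Those two ideas, domain randomization and reward shaping, might look something like the sketch below. The `sim` object, parameter ranges and reward weights are all invented for illustration; only the choice of what to randomize and penalize comes from the description above.

```python
# Sketch of domain randomization and reward shaping as described in the text.
# Ranges and weights are invented for illustration.

import random

def randomize_episode(sim):
    """Perturb the simulator each episode so the controller can't overfit
    to one exact physics model."""
    sim.friction = random.uniform(0.5, 1.5)       # ground friction
    sim.sensor_delay = random.uniform(0.0, 0.03)  # seconds of sensor lag
    sim.mass_scale = random.uniform(0.9, 1.1)     # body-mass distribution

def reward(scored_goal, knee_torque):
    """Reward scoring, but penalize large knee torques to protect the hardware."""
    return 10.0 * float(scored_goal) - 0.01 * knee_torque ** 2
```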
Real robots tested with the RL control software walked nearly twice as fast, turned three times as quickly and took less than half the time to get up compared with robots using the scripted controller made by the manufacturer. More advanced skills also emerged, like fluidly stringing actions together. “It was really nice to see more complex motor skills being learned by robots,” says Radosavovic, who was not part of the research. And the controller learned not just single moves, but also the planning required to play the game, like knowing to stand in the way of an opponent’s shot.
“In my eyes, the soccer paper is amazing,” says Joonho Lee, a roboticist at ETH Zurich. “We’ve never seen such resilience from humanoids.”
But what about human-sized humanoids? In the other recent paper, Radosavovic worked with colleagues to train a controller for a larger humanoid robot. This one, Digit from Agility Robotics, stands about five feet tall and has knees that bend backward like an ostrich’s. The team’s approach was similar to Google DeepMind’s. Both teams used computer brains known as neural networks, but Radosavovic’s team used a specialized type called a transformer, the kind common in large language models like those powering ChatGPT.
Instead of taking in words and outputting more words, the model took in 16 observation-action pairs — what the robot had sensed and done during the previous 16 snapshots of time, covering roughly a third of a second — and output its next action. To make learning easier, the model first trained on observations of its actual joint positions and velocities before switching to observations with added noise, a more realistic task. To further enable sim-to-real transfer, the researchers slightly randomized aspects of the virtual robot’s body and created a variety of virtual terrain, including slopes, trip-inducing cables and bubble wrap.
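A minimal sketch of such a transformer controller follows: a short history of observation-action pairs goes in, the next action comes out. The dimensions, layer counts and head count below are illustrative assumptions, not the paper's architecture.

```python
# Sketch of a transformer controller conditioned on the last 16 observation-action
# pairs, as described. All sizes are illustrative assumptions.

import torch
import torch.nn as nn

CONTEXT = 16                 # past observation-action pairs, ~1/3 second of history
OBS_DIM, ACT_DIM = 70, 20    # assumed sizes for Digit's sensor readings and joint commands
D_MODEL = 128

embed = nn.Linear(OBS_DIM + ACT_DIM, D_MODEL)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True),
    num_layers=2,
)
to_action = nn.Linear(D_MODEL, ACT_DIM)

history = torch.zeros(1, CONTEXT, OBS_DIM + ACT_DIM)     # the last 16 (obs, action) pairs
next_action = to_action(encoder(embed(history))[:, -1])  # act from the most recent step
```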
After training in the digital world, the controller operated a real robot through a full week of tests outside without the robot falling over even once. And in the lab, the robot withstood external disturbances, like an inflatable exercise ball thrown at it. The controller also outperformed the non-machine-learning controller from the manufacturer, easily traversing an array of planks on the ground. And whereas the default controller got stuck attempting to climb a step, the RL one managed to figure it out, even though it hadn’t seen steps during training.
Reinforcement learning for four-legged locomotion has become popular in the last few years, and these studies show the same techniques now working for two-legged robots. “These papers are either at-par or have pushed beyond manually defined controllers — a tipping point,” says Pulkit Agrawal, a computer scientist at MIT. “With the power of data, it will be possible to unlock many more capabilities in a relatively short period of time.”
And the papers’ approaches are likely complementary. Future AI robots may need the robustness of Berkeley’s system and the dexterity of Google DeepMind’s. Real-world soccer incorporates both. According to Lever, soccer “has been a grand challenge for robotics and AI for quite some time.”