A computer was trained to play Qbert and immediately broke the game in a way no human ever has

Video game character Qbert makes an appearance in “Wreck-It Ralph.” (Source: YouTube/Disney)

  • Machine-learning researchers trained a computer program to play the Atari game “Qbert.”
  • The computer program found a bizarre way to rack up 1 million points by playing the game “in what seems to be a random manner” and making the entire stage flash.
  • Artificial intelligence agents often find techniques to win games that a human would never discover.

While the jury’s still out on whether today’s machine-learning techniques will ever create a program that could rival human intelligence, one thing about the future of artificial intelligence is clear: The machines are really good at playing games.

And when the machines get good at the games, sometimes they come up with bizarre strategies and tactics that a human never would.

For example: In a new, not-yet-peer-reviewed paper posted on arXiv, which we saw through a tweet from researcher Miles Brundage, three researchers from the University of Freiburg in Germany trained an agent using evolution strategies to play eight different Atari games from over 30 years ago.
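For readers curious what “evolution strategies” looks like in practice: the basic loop is to perturb a set of parameters with random noise, score every perturbed candidate, and recombine the best performers into the next set of parameters. Here is a minimal toy sketch of that loop, with a hypothetical `score` function standing in for an actual Atari episode score (in the paper, the score would come from playing the game):

```python
import random

# Hypothetical stand-in objective: this toy score peaks when the
# parameters equal TARGET. In the paper, the score of a candidate
# would be the points earned in an Atari episode.
TARGET = [0.5, -1.0, 2.0]

def score(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def evolution_strategies(iterations=200, pop_size=50, elite=10,
                         sigma=0.1, seed=0):
    """Simple (mu, lambda)-style ES: sample, select the best, recombine."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]  # current "policy" parameters
    for _ in range(iterations):
        # Perturb the current parameters with Gaussian noise.
        candidates = [
            [p + sigma * rng.gauss(0, 1) for p in theta]
            for _ in range(pop_size)
        ]
        # Score every candidate and keep the top performers.
        candidates.sort(key=score, reverse=True)
        top = candidates[:elite]
        # New parameters: the mean of the elite candidates.
        theta = [sum(c[i] for c in top) / elite for i in range(len(theta))]
    return theta

best = evolution_strategies()
print(best)  # drifts close to TARGET over the iterations
```

Note that no gradient of the score is ever computed: the only feedback is which candidates scored well, which is why ES works even on a black box like a video game. (The paper’s variant weights the elite candidates rather than averaging them equally, but the sample-select-recombine shape is the same.)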

For one of the games, “Qbert,” the AI found a way to exploit a bug in between levels, make the entire stage flash, and then rack up unlimited points.

Seriously. Even if you have never played “Qbert,” you can tell that the agent is crushing the game. (The goal of the game is to visit every square in the level and make them change colors by jumping on them.)

Here’s the video – the glitch starts at about 20 seconds in.

Here’s how Patryk Chrabaszcz and the other researchers describe what the agent is doing in the paper:

In the second interesting solution, the agent discovers an in-game bug. First, it completes the first level and then starts to jump from platform to platform in what seems to be a random manner. For a reason unknown to us, the game does not advance to the second round but the platforms start to blink and the agent quickly gains a huge amount of points (close to 1 million for our episode time limit). Interestingly, the policy network is not always able to exploit this in-game bug and 22/30 of the evaluation runs (same network weights but different initial environment conditions) yield a low score.

The strategies that AI agents use to win games are often fascinating. When Google’s AlphaGo Zero agent beat the world’s best Go player, its creators said it found strategies that hadn’t been used in the thousands of years the game has been played. “It found these human moves, it tried them, then ultimately it found something it prefers,” AlphaGo lead researcher David Silver said at the time.

It’s also worth noting that the Qbert agent described in the new paper uses evolution strategies, a different machine-learning technique from the reinforcement learning behind AlphaGo Zero.

The bottom line is that machine-learning researchers love games. The rules are clear, you can run them thousands or millions of times, and they’re just plain fun – even when the machines start breaking the games.

Read the entire paper here.