Introduction: Self-Learning Rock-Paper-Scissors Robot From Lego Mindstorms NXT!

Hi everyone!  This is my first instructable!

This is a REAL self-learning robot that learns how to play rock-paper-scissors!  It will learn how to beat a person 100% of the time!  A person is NOT needed to teach the robot how to play the game; it really does learn by itself!

This robot does not play rock-paper-scissors the way people play it.  It first asks the user to input a move (either rock, paper, or scissors).  The robot then calculates the best move to play, and extends a retractable arm that shows its move (a Lego rock, paper, or Lego scissors).  The player must then tell the robot whether the robot won, lost, or tied against the player.

While you may think that this robot is cheating, since it waits for the player to make a move, I did not program the robot to know the rules of the game!  The robot does not know that rock beats scissors, paper beats rock, or scissors beats paper!  Instead, the robot relies on the player to tell it whether it won, lost, or tied, and it learns from those past successes and failures to play better in the future!

This robot was featured at Robogames 2011 for the Lego Open challenge (first place!).  Moar pics and the source code will be attached!

Step 1: You Shall Not Win! - Construction Overview

The robot is made out of the following pieces:

1x Lego Mindstorms NXT - the brain!
3x Lego Mindstorms Touch Sensors - User Inputs
3x Lego Mindstorms NXT Motors - Peripherals for the robot
Tetrix Pieces for the Base (Aluminum chassis)

This robot has a very simple construction; each motor drives a retractable arm that carries a Lego rock (left), paper (center, I didn't have any Lego paper handy!), and scissors (right).  The robot will extend the arm that indicates its move.

On the bottom of the robot are the user inputs: three touch sensors with gears on them.  Each sensor has two possible meanings.

Left Sensor: Rock and Yes!
Center Sensor: Paper and No!
Right Sensor: Scissors and Tie!

I will explain why each sensor has two meanings in the next step!
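
But here's a quick preview of what I mean, as a tiny sketch written in plain C just for illustration (it's not my actual NXC code, and the names are made up).  I'm assuming here that "Yes" means the robot won the round and "No" means it lost:

    /* Illustrative sketch only (plain C, not the real NXC source).
       The same three buttons are read twice each round:
       once as the player's move, and once as the result of the round. */
    enum Move   { ROCK = 0, PAPER = 1, SCISSORS = 2 };   /* left, center, right */
    enum Result { WON  = 0, LOST  = 1, TIED     = 2 };   /* Yes!,  No!,    Tie! */

    enum Move move_from_button(int button)      /* 0 = left, 1 = center, 2 = right */
    {
        return (enum Move)button;               /* left = rock, center = paper, right = scissors */
    }

    enum Result result_from_button(int button)
    {
        return (enum Result)button;             /* left = yes (robot won), center = no (lost), right = tie */
    }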

Step 2: How Does the Robot Learn If It Doesn't Cheat?! (Part 1)

As I wrote in the first step, this robot first asks the (much smarter!) human to input a move through the touch sensors.  It will then look through a database and determine the best possible move to make.  After making that move, the human has to tell the robot whether it won, lost, or tied that round.  If the robot does not know that rock > scissors, paper > rock, or scissors > paper (I did not program these rules into the robot), how can it use this information to learn?!

The robot creates a virtual database (for you computer geeks, it uses a three-dimensional array to do this!).  Think of this database like a Rubik's Cube.  The robot has to keep track of three things: 1) The move the player entered (rock, paper, or scissors); 2) The move the robot made (again rock, paper, or scissors); and 3) The result of the round (did the robot win, lose, or tie with the player?).  For each combination, the robot stores a probability of success for that move.  This value is stored in the array, or (using the Rubik's Cube analogy) in one of the 27 little cubes.

For example, if the player chose ROCK, but the robot chose SCISSORS, the robot lost, so it will record a success rate of 0% for playing scissors when the player chooses rock in the future.
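
For the programmers out there, here's what that Rubik's Cube could look like, sketched in plain C from the description above (it's an illustration, not a copy of RPSGame.nxc):

    /* The 3 x 3 x 3 "Rubik's Cube" database: 27 little cubes.
       success[player_move][robot_move][result] holds a learned success value. */
    enum { ROCK, PAPER, SCISSORS };   /* moves:   0, 1, 2 */
    enum { WON, LOST, TIED };         /* results: 0, 1, 2 (from the robot's point of view) */

    float success[3][3][3];           /* a global array in C starts out all zeros */

    /* The example above: the player chose ROCK, the robot chose SCISSORS,
       and the robot LOST, so that little cube gets a 0% success rate. */
    void record_example_round(void)
    {
        success[ROCK][SCISSORS][LOST] = 0.0f;
    }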

To encourage the robot to learn, I reward the robot using a system of virtual points!  An analogy is that of a little kid.  If I went up to a kid and said, "Hey, I'll give you $20 if you can learn to fly by yourself!", the kid would think, "Wow!  $20!  That's a good reward!  Let me try!".  The kid will first crawl, then walk, then run, and then jump in an attempt to fly and get the $20 reward.  Eventually, the kid will learn that he/she cannot fly without an airplane and cannot succeed.  However, along the way, the kid has learned how to crawl, walk, run, and jump!

I applied these principles to the robot!  Instead of cash (would I seriously waste my time trying to give my robot $20?!), I will give the robot a virtual point (+1) if the robot beats the player.  BUT, I will take away 10,000 virtual points from the robot (yes I'm mean) if the robot either loses or ties with the player.  Since the robot wants to maximize the number of points it gains, it will use the success probabilities in its database to achieve this goal.
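
In code form, the point system is nothing more than this (again, just a plain-C sketch of the scheme above):

    /* +1 virtual point if the robot won the round, -10,000 if it lost OR tied. */
    enum { WON, LOST, TIED };

    int reward_for(int result)
    {
        return (result == WON) ? 1 : -10000;
    }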

Step 3: How Does the Robot Learn If It Doesn't Cheat?! (Part 2)

Would you believe it if I told you that to make this robot learn how to play RPS, it only requires FOUR variables?!  O_O.  

The main variable is called EPSILON.  This variable is also known as the exploration rate.  Epsilon starts out ridiculously high, which causes the robot to make random moves at the beginning of the game.  As the robot plays more (and consequently learns the best moves to make against the player), Epsilon decreases.  As Epsilon gets smaller over time, the robot slowly begins to rely on the success probabilities in its database against the player.
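
Here's roughly what that looks like (a plain-C sketch, not my actual NXC code).  The value[] array here stands for the slice of the database for the move the player just entered:

    #include <stdlib.h>

    /* Epsilon-greedy move selection: with probability epsilon the robot plays a
       completely random move (exploring); otherwise it plays the move with the
       best learned value (exploiting).  Epsilon starts near 1.0 and shrinks a
       little after every round, so the random phase fades out over time. */
    enum { ROCK, PAPER, SCISSORS };

    int choose_robot_move(double epsilon, const double value[3])
    {
        if ((double)rand() / RAND_MAX < epsilon)
            return rand() % 3;                          /* explore: random move     */

        int best = ROCK;                                /* exploit: best-known move */
        if (value[PAPER]    > value[best]) best = PAPER;
        if (value[SCISSORS] > value[best]) best = SCISSORS;
        return best;
    }

After each round, Epsilon gets shrunk a little (for example, epsilon *= 0.95; the exact starting value and decay schedule are a tuning choice).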

The three other variables are: ALPHA, GAMMA, and KAPPA.

Alpha controls how much each round's result changes the values in the robot's database.  That sounds confusing!  In practice, Alpha is intentionally set as close to zero as possible.  If a player lies (*gasp*) to the robot (say the player chose rock, and the robot chose paper, but the player claims that the robot lost), a low value of Alpha will cause the robot to mostly ignore the lie!  However, if Alpha is too low, then the robot will not learn as quickly.

Gamma is the reward discount rate: it controls how much the robot cares about future rewards compared to the current round.  Gamma is set high (0.80) because as Gamma approaches 1, the robot weighs future rewards more heavily and begins to use its success probabilities sooner.

Kappa is a thoroughness value that helps the robot refine its probabilities.
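
If you want to see how Alpha and Gamma actually get used, here's the textbook Q-Learning update (see the links in the last step).  This is just a sketch of the standard formula, not a copy of my RPSGame.nxc code, and it leaves Kappa out.  Here q[s][a] is the learned value of the robot playing move a when the player has just entered move s:

    /* Standard Q-Learning update (sketch).  The reward comes from the point
       system in Step 2 (+1 for a win, -10,000 for a loss or tie). */
    double q[3][3];   /* q[player_move][robot_move], starts at zero */

    void q_update(int s, int a, int reward, int s_next, double alpha, double gamma)
    {
        /* best value the robot expects in the next state (the player's next move) */
        double best_next = q[s_next][0];
        if (q[s_next][1] > best_next) best_next = q[s_next][1];
        if (q[s_next][2] > best_next) best_next = q[s_next][2];

        /* nudge the old estimate toward (reward + discounted future value);
           a tiny alpha means a single lie from the player barely moves it,
           a gamma near 1 means future rewards count almost as much as this one */
        q[s][a] += alpha * ((double)reward + gamma * best_next - q[s][a]);
    }

In this form the database is a 3x3 table instead of the full 3x3x3 Rubik's Cube, because the win/lose/tie information gets folded into the reward from Step 2.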


Step 4: Enough Talk! Let's Play a Sample Game!

Now that the theory part is done, let's play a sample game with the robot!  I wish I had taken a video of this bot before I tore it down, but here's a sample game:

Round 1:
Me: Rock
Robot: Scissors
Loss

Round 2:
Me: Paper
Robot: Paper
Tie

Round 3:
Me: Scissors
Robot: Rock
Win!

Round 4:
Me: Rock
Robot: Scissors
Loss

Round 5:
Me: Paper
Robot: Scissors
Win!

Round 6:
Me: Scissors
Robot: Paper
Loss

Round 7:
Me: Rock
Robot: Paper
Win!

Round 8:
Me: Paper
Robot: Scissors
Win!

Round 9:
Me: Scissors
Robot: Rock
Win!

If you're observant, you can see that in the beginning of the game, the robot makes RANDOM moves, but eventually the robot beats me every time!  If you've also kept track of the moves I've made, you should have noticed that I cycled through Rock -> Paper -> Scissors.  This is to help the robot learn what moves to make to counter mine.  However, here are rounds 10 -> 13 to prove that the robot can now beat me every time:

Round 10: 
Me: Scissors (again)
Robot: Rock
Win!

Round 11:
Me: Paper 
Robot: Scissors
Win!

Round 12:
Me: Scissors
Robot: Rock
Win!

Round 13:
Me: Rock
Robot: Paper
Win!

Step 5: Source Code and Moar Pics!

Well, I know that this project may be confusing for people, but here's the source code for it if you want to try this at home!  I wrote it in NXC (a C-like programming language for Lego Mindstorms) using Bricxcc (http://bricxcc.sourceforge.net/).  I've commented the code to help you understand it!

For more information, try Googling the following:
- Q-Learning
- Reinforcement Learning

Thank you for reading this instructable, and I hope you enjoyed it! :D

P.S. If you try downloading my source code, it may show up as a .tmp file.  Please rename it as "RPSGame.nxc" to open it with Bricxcc.