Hacker School, Week 3
tl;dr Tic-tac-toe, and more Tic-tac-toe
I’d gotten started on my Tic-tac-toe AI (artificial intelligence — a pretty fancy word for a Tic-tac-toe algorithm, but that’s the term people use) in Clojure at the very end of the previous week, and my plan for Week 3 was to wrap that up by the end of Monday, then do an implementation of the Game of Life in Clojure, then start working on my Robotwar project (see my blog post from Week 2). By the end of the week, I was still working on Tic-tac-toe. This was a little discouraging, but I was learning a new language, so I tried to give myself a break.
I actually got a basic Tic-tac-toe AI going fairly easily, using a heuristic algorithm, which means a series of basic rules for picking good moves. Mine picked them in the following order of priority:
- Pick a square that will be a winning move.
- Pick a square that will prevent the opponent from making a winning move on their next turn.
- Pick the center.
- Pick a corner.
- Pick any other free square.
This pretty much works every time, but as I learned early in the week, there is a more elegant solution, called the minimax algorithm, which can be applied to any zero-sum game like Tic-tac-toe (or really any competitive game, I think, with some modifications and compromises).
Minimax seemed like magic to me when I heard about it. The basic idea is simple to understand: when given any board, either at the beginning of the game or in the middle, have the AI play out every possible combination of moves, figure out who wins at the end of each branch in the tree of all possible moves, and work back recursively to figure out what is the best move for a given player at the given point in the game. (When doing the calculation, the AI makes the assumption that each player will play as well as the AI does.)
There are a number of variations and optimizations for speed, the most important of them being the one where you have to evaluate intermediate game states, because playing every game out until the end would consume too many computing resources (I gather this is true of pretty much any game except Tic-tac-toe). In this version of the algorithm, you set a depth limit, say 10 moves into the future. Then for each branch where the AI ends up calculating 10 moves into the future before it reaches a final board, it has to use some kind of heuristic algorithm to analyze that board 10 moves into the future and figure out which player that board is most favorable to. This means that even in games like Tic-tac-toe where there are only a few outcomes (win, lose, and draw), point values are assigned to intermediate boards so that they can be compared with each other.
The minimax algorithm gets its name from the fact that one player is designated the “maximizing” player, and the other player is designated the “minimizing” player. The maximizer wants the score to be as high as possible, and the minimizer wants the score to be as low as possible. There are a few variations, but a common version is that winning boards are assigned the value positive infinity (if it’s the maximizing player’s turn) or negative infinity (if it’s the minimizing player’s turn), and all boards that must be calculated in an intermediate state receive positive or negative scores, of various values. So in evaluating any given square before the depth limit is reached, its value is the numerical maximum (for the maximizing player) of all the values of the opponent’s next move, or the numerical minimum (for the minimizing player).
If you understand this immediately and can write out the algorithm, you’re way ahead of me. It actually took me a week to get this working. What I kept getting stuck on were the signs, and what it meant for one player to arbitrarily be positive and the other negative.
I ended up finding it helpful to first implement a preliminary version of the algorithm — one without numbers. In a game which has only a few possible outcomes (in the case of Tic-tac-toe, win, lose and draw) and which can also be played out to the end, it is actually not strictly necessary to use numbers for the return values of the recursive function. You can instead use symbols or strings like “win,” “lose,” and “draw,” or better yet, “X”, “O”, and “draw”. The reason it’s better to use “X” and “O” instead of “win” and “lose” is that if you use “win” and “lose”, you have to pass the opposite symbol (“win” instead of “lose”, or vice versa) up the chain at each level of the recursion, because one player’s win is the other player’s loss. If you use “X” or “O”, you don’t have that problem.
But we’re going to start with the most arduous version here, the version that uses no numbers and uses the generic symbols :win and :lose instead of absolute player-specific values like :X and :O. I wrote it because it helped me to work through the logic behind the algorithm, and it also helped me appreciate the numerical version, once I got to it!
Some helper functions not included below include
winner, which returns :X, :O, or nil,
board-full?, which is self-explanatory, and
free-squares, which returns a set of coordinate pairs representing all the free squares on the board. In the first
potential-board is a new board with a marker added. The values of the squares are either :X, :O, or :e for empty, and the values that the minimax function may return are :win, :lose, and :draw.
And now, here’s the eventual version I wrote using the more usual numerical version of the algorithm to compute the value of a square in Tic-tac-toe. It needs to use the Clojure
declare form because it involves two mutually recursive functions, but it is more elegant than the one above.
Because I’m still not bothering with intermediate depth calculations — I haven’t gotten to that point in my study of the minimax algorithm — I’m just assigning the arbitrary values 1 and -1 to wins by X and O. I could have actually used any positive and negative numbers.
In the version below, 1, -1, and 0 are actually used to represent both the values of the markers (X, O, and empty, respectively) and the return values of the minimax function (X wins, O wins, and it’s a draw, respectively). The
final-score function statically analyzes a board and returns 1 for X winning, -1 for O winning, 0 for a draw, and nil for a game that is not over.
Note the lack of the need for an
opposite-result function, and also the lack of a need for a complicated
reduce operation (we can just use the
max functions instead of