Tic-Tac-Toe AI

About

This is a demonstration of a Monte Carlo Tree Search (MCTS) algorithm for the game of Tic-Tac-Toe. MCTS is a sampling-based search technique that has been successfully applied to much more difficult games, the most notable example being AlphaGo (and its successor AlphaZero), a computer AI trained to play the Chinese game Go. Combined with a deep valuation neural network, AlphaGo was able to beat world champion Go player, Lee Sedol.

The algorithm explores the game tree by selecting promising moves to explore. In this demonstration, move selection is done through an upper confidence bound (UCB) formula. The explicit formula was developed for Markov decision processess (MDPs) in Chang et al. (2005), and is provided in the FAQs. MCTS has four basic steps. The first is selection, where a move from the explored game tree is selected. Selection is followed by expansion, where countermoves (moves by the opponent) are sampled/explored. From there, simulation (aka rollout) is used to evaluate a game by simulating until a terminating state or until valuation of the state is precise enough. After reaching one of these states, the algorithm backpropgates and updates the estimated score for all the moves along the path taken. These processes are repeated for a specified number of rollouts.

This demonstration allows the user to play the MCTS AI or watch the AI play other types of AIs. The user is also able to specify certain parameters for the MCTS AI and compare the differences. If there are any questions, please check the FAQs.

Enjoy!

All Rights Reserved
Created by Uro Lyi, supported by the National Science Foundation (NSF) under Grant CMMI-1434419 (Principal Investigator Michael C. Fu).
urolyi1@gmail.com, mfu@umd.edu

The difference is in semi-random the move selected is random as long as there is no instant win or block move available. So in the simulation play out instead of completely random game the game is semi random moves.

For each possible move after each play out the move's score is updated. If the simulation resulted in a win the +1, +0.5 for a draw, and +0 for a loss. The estimated score is then the total score divided by the number of times the move was sampled.

The formula for the UCT value of a move is based on a formula. The formula is $UCT = w$ _i/n_i + c √(ln(N_i)/n_i)

Yes! Just select who moves first, then before hitting start click the cells on the board that you want to be filled. The turns alternate.

The input into the each cell before the AI moves is what that node's estimated score value is initialized to. If -10 is the input, the AI will initialize that move with 1 visitCount and a score of -10.

If the AI detects that there is an instant win or block available it will take that move without simulations. This is called a greedy AI and can be disabled in the settings. The "pure" AI will use the tree search method no matter what.

The optimal AI uses a min max algorithm to determine the most optimal move in each scenario. The min max algorithm basically explores every possible outcome from each move and chooses the best move from that information.

After the AI makes a move there is a graph on the right that shows the estimated score for each move and the colors of the path correspond with the colors on the board.