## Adversary Search

## Adversary Search

So far, we have only fought the instance

Consider a two-player game with:

Perfect knowledge (for now)

No "luck" (for now)

Turn-based

Always ends

Well-defined final score + "zero sum"

Examples: Checkers, Chess, Connect-4, Tic-Tac-Toe

We can use state-space search to find "good" or maybe even "best" plays in that game at a given state

## Game State Space

State:

"Board" position (where are the checkers)

Side on move (red or black)

Maybe history of game (ugh, not checkers)

State space is directed, neighbors are states after legal moves

State space may be cyclic (which is where history rules sometimes come from)

## Zero-Sum and Negamax

Minimax Theorem (Von Neumann, Nash): Any state in the game has a well-defined

*value*assuming "best play" by both sidesBasic Negamax idea:

Choose a next state with minimum value for opponent (moves are boring)

Determine this by valuing each opponent state from opponent's point of view (recursively)

Zero-sum, so your value = negation of opponent's value

Thus, pick state that maximizes negation of opponent's value

## Win Pruning

If on some search you find a move (successor state) that is a win for the side on move — no point in searching further

This is true whether the win is immediate or by forced play

Can remove an "exponential" number of states from the search

## Game Value

Value of a game is value of starting state to side on move

Game's value is value of starting state

Game is "solved" if the value of every state is known (recorded or trivially calculated)

Game is "weakly solved" if the value of every state

*that could occur in perfect play*is known

Value is known: Hex

Weakly solved: Checkers

Solved: Tic-Tac-Toe, Connect-4, Awari, many others

## The Game of "15"

Game I found in some old Martin Gardner book

Rules:

Start with tiles labeled 1-9 in the center

Players take turns choosing a tile from the pile

Any time a player holds 3 tiles whose total is 15, they win

Perfect information, alternating, terminating, zero-sum: check

(There's a fantastic trick to playing this game)

## Solving "15"

Build a negamax search http://github.com/pdx-cs-ai/nkt

Can cheaply calculate value of any position in practice

Or can calculate the value of all possible positions and record them

- (How many possible positions are there?)

## "NKT": Generalizing "15"

We will generalize to "NKT": N tiles, exactly K tiles in a win, winning total is T

Next natural size is N=15, K=4, T=28, so we'll call this game "28"

Calculating the value of the starting position is going to take a

*long*time

## Depth-limited Negamax Search

We want an

*approximate*value for a position, and will tolerate an*approximately*best moveFor chess, we use "piece value" as a heuristic approximation. For "28", maybe number of "almost wins"?

Heuristic should look at both sides, so subtract on-move value from opponent

But just doing this right away might not be the best plan. Let's look ahead some fixed depth and use negamax to combine heuristic values

28 or fewer possible almost-wins, so we'll set the value of an actual win to 29

## Alpha-Beta Pruning

Generalization of Win Pruning. Hard to explain:

Imagine you find a forced score of 3 for some state

*s*in the middle of the searchImagine further that in some earlier state

*t*at the same level you had found a score of 2Then your opponent (parent) would never let you get to

*s*: they would avoid itSo there's no point in searching

*s*any further

## Alpha-Beta Implementation

"Alpha-Beta window" keeps track of best score for side not on move (alpha) and side on move (beta)

Prune when computed score exceeds beta

Game is zero-sum. On recursion, flip and negate the window: (alpha, beta) → (-beta, -alpha)

Window starts with (-max, max). When searching, keep bumping alpha up as better moves are found

## Effectiveness of Alpha-Beta

Depends on move ordering: a good move for the side on move will "close the window" a lot, so want to try good moves first

With optimal move ordering, will effectively

*double the depth*of the search by pruning; runtime of depth-4 without = runtime of depth-8 withNote that alpha-beta

*never changes the answer*: it only prunes states that provably don't matter