# Estimating win probability from best-of-7 series is not straightforward

(Note: The code in this post is available here as a single R script.)

Let’s say A and B play 7 games and A wins 4 of them. What would be a reasonable estimate for A’s win probability (over B)? A natural estimate would be 4/7. More generally, if A wins $x$ games, then the estimator $\hat{p} = x/7$ is an unbiased estimator of the true win probability $p$. This is not difficult to show: since $x \sim \text{Binom}(7, p)$, we have

\begin{aligned} \mathbb{E}[x/7] = \mathbb{E}[x] / 7 = (7p) / 7 = p. \end{aligned}

Now, let’s say that A and B play a best-of-7 series (or equivalently, first to 4 wins), and A wins 4 of them. It seems natural to estimate A’s win probability by 4/7 again. If we define the estimator

$\hat{p} = \dfrac{\# \text{ games A wins}}{\# \text{ games}},$

does this estimator have any nice properties? It’s hard to say anything about this estimator (if someone knows something, please let me know!). In fact, it’s not even unbiased!

Estimator is biased: a simulation

First, let’s set up two functions. The first function, SampleSeries, simulates a series which ends when either player 1 reaches n_wins1 wins or when player 2 reaches n_wins2 wins. (The true win probability for player 1 is p.) The second function, EstimateWinProb, samples n_sim series. For each series, it estimates player 1’s win probability as in $\hat{p}$ above, then returns the mean of $\hat{p}$ over the n_sim simulations.

# Player 1 has win probability p. Simulates a series of games which ends when
# player 1 reaches n_wins1 wins or when player 2 reaches n_wins2 wins.
SampleSeries <- function(p, n_wins1, n_wins2 = n_wins1) {
game_results <- rbinom(n_wins1 + n_wins2 - 1, size = 1, prob = p)
if (sum(game_results) >= n_wins1) {
n_games <- which(game_results == 1)[n_wins1]
} else {
n_games <- which(game_results == 0)[n_wins2]
}
player1_wins <- sum(game_results[1:n_games])
return(c(player1_wins, n_games - player1_wins))
}

# Run SampleSeries n_sim times and for each run, estimate p by player 1's
# win percentage in that series. Returns the estimated probabilities.
EstimateWinProb <- function(p, n_sim, n_wins1, n_wins2 = n_wins1) {
win_data <- replicate(n_sim, SampleSeries(p, n_wins1, n_wins2))
prob_estimates <- apply(win_data, 2, function(x) x[1] / (x[1] + x[2]))
return(mean(prob_estimates))
}


Next, we run a Monte Carlo simulation (20,000 simulations) to compute the mean of $\hat{p}$ for a range of values of $p$. (Because of the symmetry inherent in the setup, we only consider $p \geq 0.5$.)

set.seed(1)
p <- 20:40 / 40
n_wins1 <- 4
n_sim <- 20000
mean_phat <- sapply(p, function(p) EstimateWinProb(p, n_sim, n_wins1))
plot(p, mean_phat, asp = 1, pch = 16, ylab = "Win prob estimator mean",
main = "First to 4 wins")
abline(0, 1)


If $\hat{p}$ were unbiased, we would expect the black dots to lie on the black line $y = x$; it’s clear that they don’t. In this setting, $\hat{p}$ is biased upwards.

Estimator is biased: exact computation

I was not able to find a nice closed-form expression for $\mathbb{E}[\hat{p}]$: for a best-of-7 series, WolframAlpha “simplified” my expression to give

\begin{aligned} \mathbb{E}[\hat{p}] = \dfrac{4x}{5} + \dfrac{2x^2}{15} + \dfrac{4x^3}{105} + \dfrac{101x^4}{21} - \dfrac{1252x^5}{105} + 10x^6 - \dfrac{20x^7}{7}. \end{aligned}

However, it is possible to write a function that computes $\mathbb{E}[\hat{p}]$ from first principles:

# Return expected value of the "series win percentage" estimator
GetEstimatorMean <- function(p, n_wins1, n_wins2 = n_wins1) {
# first line is for when player 1 wins, second line is for when player 2 wins
summands1 <- sapply(0:(n_wins2 - 1),
function(x) n_wins1 / (n_wins1 + x) * p^n_wins1 * (1 - p)^x *
choose(x + n_wins1 - 1, x)
)
summands2 <- sapply(0:(n_wins1 - 1),
function(x) x / (x + n_wins2) * p^x * (1 - p)^n_wins2 *
choose(x + n_wins2 - 1, x)
)
return(sum(c(summands1, summands2)))
}


Let’s plot these exact values on the previous plot to make sure we got it right:

lines(p, sapply(p, function(x) GetEstimatorMean(x, n_wins1)),
col = "red", lty = 2, lwd = 1.5)


What might be a better estimator?

Let’s go back to the setup where A and B play a best-of-7 series (or equivalently, first to 4 wins), and A wins 4 of them. Instead of estimating A’s win probability by 4/7, we could use the plot above to find which value of $p$ would have given $\mathbb{E}[\hat{p}] = 4/7$, and use that instead. This is essentially a method of moments estimator with a single data point, and so has some nice properties (well, at least it’s consistent).

The picture below depicts this method for obtaining estimates for when A wins 4 out of 4, 5, 6 and 7 games.

p <- 20:40 / 40
original_estimate <- c(4 / 7:4)
mm_estimate <- c(0.56, 0.64, 0.77, 1)  # by trial and error
plot(p, sapply(p, function(x) GetEstimatorMean(x, n_wins1 = 4)), type = "l",
asp = 1, ylab = "Win prob estimator mean")
abline(0, 1, lty = 2)
segments(x0 = 0, x1 = mm_estimate, y0 = original_estimate, col = "red", lty = 2)
segments(y0 = 0, y1 = original_estimate, x0 = mm_estimate, col = "red", lty = 2)


What are some other possible estimators in this setting?

First to n wins: generalizing the above

Since we’ve coded up all this already, why look at just first to 4 wins? Let’s see what happens to the estimator $\hat{p}$ when we vary the number of wins the players need in order to win the series. The code below makes a plot showing how $\mathbb{E}[\hat{p}]$ varies as we change the number of wins needed from 1 to 8 (inclusive).

library(tidyverse)
theme_set(theme_bw())
df <- expand.grid(p = 20:40 / 40, n_wins1 = 1:8)
df$mean_phat <- apply(df, 1, function(x) GetEstimatorMean(x[1], x[2])) ggplot(df, aes(group = factor(n_wins1))) + geom_line(aes(x = p, y = mean_phat, col = factor(n_wins1))) + coord_fixed(ratio = 1) + scale_color_discrete(name = "n_wins") + labs(y = "Win prob estimator mean", title = "First to n_wins")  Past some point, the bias decreases as n_wins1 increases. Peak bias seems to be around n_wins1 = 3. The picture below zooms in on a small portion of the picture above: Asymmetric series: different win criteria for the players Notice that in the code above, we maintained separate parameters for player 1 and player 2 (n_wins1 and n_wins2). This allows us to generalize even further, to explore what happens when players 1 and 2 need a different number of wins in order to win the series. The code below shows how $\mathbb{E}[\hat{p}]$ varies with $p$ for different combinations of n_wins1 and n_wins2. Each line within a facet corresponds to one value of n_wins1, while each facet of the plot corresponds to one value of n_wins2. The size of $\hat{p}$‘s bias increases as the difference between n_wins1 and n_wins2 increases. df <- expand.grid(p = 20:40 / 40, n_wins1 = 1:8, n_wins2 = 1:8) df$mean_phat <- apply(df, 1, function(x) GetEstimatorMean(x[1], x[2], x[3]))
ggplot(df, aes(group = factor(n_wins1))) +
geom_line(aes(x = p, y = mean_phat, col = factor(n_wins1))) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
coord_fixed(ratio = 1) +
scale_color_discrete(name = "n_wins1") +
facet_wrap(~ n_wins2, ncol = 4, labeller = label_both) +
labs(y = "Win prob estimator mean", title = "Player 1 to n_wins1 or Player 2 to n_wins2")


# Simulating the dice game “Toss Up!” in R

Toss Up! is a very simple dice game that I’ve always wanted to simulate but never got around to doing so (until now!). This post outlines how to simulate a Toss Up! game in R, as well as how to evaluate the effectiveness of different game strategies. All the code for this blog post is available here.

Rules

The official rules for Toss Up! are available here. Here is an abbreviated version:

• As shown in the picture above, the game uses 10 six-sided dice. Each die has 3 green sides, 2 yellow sides and one red side.
• For each turn, a player rolls all 10 dice. If any greens are rolled, set them aside.
• At this point, the player can choose to end the turn and “bank” the points (one point for one green), or keep rolling the remaining dice.
• After every roll, greens are set aside. If you roll enough times to make all 10 dice green, you have 10 points and you can either bank or roll all 10 dice again.
• A player can keep going until he/she decides to stop rolling and bank all the points earned on this turn.
• Catch: If, on a roll, none of the dice are green and at least one is red, the turn ends and no points can be banked.
• First player to reach 100 points wins.

Simulating Toss Up!

There are several different ways to implement Toss Up!: I describe my (two-player) version below. To allow for greater flexibility for the resulting code, I implement the game with three global constants that the programmer can change:

1. NUM_DICE: The number of dice used in the game (default is 10).
2. FACE_PROBS: A numeric vector of length 3 denoting the probability that a die comes up red, yellow or green respectively (default is c(1/6, 2/6, 3/6)).
3. POINTS_TO_WIN: The number of points a player needs to obtain to win the game (default is 100).

Step 1: How can we describe the state of a game?

The state of the game can be encapsulated with 4 pieces of information:

1. current_player: who’s turn it is to make a decision.
2. scores: a vector of length 2 containing the scores for players 1 and 2 respectively.
3. turn_points: Number of points scored on the current turn so far (these points have not been banked yet).
4. dice_rolls: A vector of variable length denoting the outcome of the dice rolls (0: red, 1: yellow, 2: green).

In my implementation, the state is stored as a list with the 4 elements above.

Step 2: Updating the state

From a given state, there are 3 possibilities for what comes next:

1. The dice rolls don’t have any greens and at least one red. In this case, the current players turn is over. We need to change current_player, reset turn_points to 0, and re-roll the dice (NUM_DICE of them).
2. The dice rolls either (i) have at least one green, or (ii) have no reds. In this case, the current player has a choice of what to do.
1. If the player chooses to bank, then we need to update scores, reset turn_points to 0, change current_player and re-roll the dice (NUM_DICE of them).
2. If the player chooses to roll, then we need to update turn_points and re-roll just the dice that were not green.

The function updateState below does all of the above. I have also added a verbose option which, if set to TRUE, prints information on the game state to the console. (The function DiceToString is a small helper function for printing out the dice rolls.)

DiceToString <- function(dice_rolls) {
return(paste(sum(dice_rolls == 2), "Green,",
sum(dice_rolls == 1), "Yellow,",
sum(dice_rolls == 0), "Red"))
}

# Executes the current player's action and updates the state. Is also used if
# no greens and at least one red is rolled.
UpdateState <- function(state, action, verbose = FALSE) {
if (verbose) {
cat("Current roll:", DiceToString(state$dice_rolls)) if (action != "rolled at least 1 red and no green") cat(" (bank", state$turn_points + sum(state$dice_rolls == 2), "pts?)", fill = TRUE) else cat("", fill = TRUE) cat(paste0("Player ", state$current_player, " ", action, "s"),
fill = TRUE)
}

if (action == "bank") {
# action = "bank": bank the points current player earned this turn, then
# re-roll the dice.
state$scores[state$current_player] <- state$scores[state$current_player] +
state$turn_points + sum(state$dice_rolls == 2)
state$turn_points <- 0 state$dice_rolls <- sample(0:2, size = NUM_DICE, replace = TRUE,
prob = FACE_PROBS)
state$current_player <- 3 - state$current_player
} else if (action == "roll") {
# action = "roll": add the green dice to points earned this turn, then
# re-roll the remaining dice.
state$turn_points <- state$turn_points + sum(state$dice_rolls == 2) ndice <- sum(state$dice_rolls != 2)
if (ndice == 0) ndice <- NUM_DICE
state$dice_rolls <- sample(0:2, size = ndice, replace = TRUE, prob = FACE_PROBS) } else if (action == "rolled at least 1 red and no green") { # action = "rolled at least 1 red and no green": # no points banked, turn ends, re-roll dice. state$turn_points <- 0
state$dice_rolls <- sample(0:2, size = NUM_DICE, replace = TRUE, prob = FACE_PROBS) state$current_player <- 3 - state$current_player } else { stop("action must be 'bank', 'roll', or 'rolled at least 1 red and no green'") } if (verbose) { if (action != "roll") { cat("Current scores:", state$scores, fill = TRUE)
cat("", fill = TRUE)
}
}

return(state)
}


Step 3: How to express player behavior and strategy?

We can think of a player as a black box which takes a game state as an input and outputs a decision, “bank” or “roll”. In other words, the player is a function!

Below are two extremely simple strategies expressed as functions. The first strategy simply banks after the first roll. The second strategy banks once more than target points can be earned from the turn.

# strategy that stops after one roll
OneRoll <- function(state) {
return("bank")
}

# strategy that stops only when the turn yields > target points
OverTargetPoints <- function(state, target = 10) {
if (state$turn_points + sum(state$dice_rolls == 2) > target) {
return("bank")
} else {
return("roll")
}
}


(Note: In my implementation, strategy functions should assume that they are current_player: that is how they can determine their current score and that of their opponent correctly.)

Step 4: Simulating a full game

We can now put the pieces from the previous steps together to simulate a full game of Toss Up!. The SimulateGame function takes in two strategy functions as input. It sets up the initial state, then updates the state repeatedly until the game ends.

SimulateGame <- function(Strategy1, Strategy2, verbose = FALSE) {
# set up initial state
state <- list(current_player = 1,
scores = c(0, 0),
turn_points = 0,
dice_rolls = sample(0:2, size = NUM_DICE, replace = TRUE,
prob = FACE_PROBS))

# check if no greens and at least one red, if so change player
while (sum(state$dice_rolls == 2) == 0 && sum(state$dice_rolls == 0) > 0) {
state <- UpdateState(state, "rolled at least 1 red and no green", verbose)
}

# while the game has not ended:
# - get the next action from the current player's strategy
# - update the state
while (max(state$scores) < POINTS_TO_WIN) { if (state$current_player == 1) {
action <- Strategy1(state)
} else {
action <- Strategy2(state)
}

# check if no greens and at least one red
while (sum(state$dice_rolls == 2) == 0 && sum(state$dice_rolls == 0) > 0) {
state <- UpdateState(state, "rolled at least 1 red and no green", verbose)
}
}

# game has ended: return winner
if (verbose) {
cat(paste("Game ends: Player", which.max(state$scores), "wins!"), fill = TRUE) } return(which.max(state$scores))
}


Two examples of simulated games

The code below shows what a simulated game of Toss Up! might look like. In the code snippet below, players need 20 points to win. The first player uses the super conservative strategy of banking immediately, while the second player tries to win the entire game in one turn.

NUM_DICE <- 10
FACE_PROBS <- c(1/6, 2/6, 3/6)
POINTS_TO_WIN <- 20
set.seed(1)
winner <- SimulateGame(OneRoll,
function(state) { OverTargetPoints(state, 19) },
verbose = TRUE)

# Current roll: 4 Green, 3 Yellow, 3 Red (bank 4 pts?)
# Player 1 banks
# Current scores: 4 0
#
# Current roll: 5 Green, 4 Yellow, 1 Red (bank 5 pts?)
# Player 2 rolls
# Current roll: 3 Green, 1 Yellow, 1 Red (bank 8 pts?)
# Player 2 rolls
# Current roll: 2 Green, 0 Yellow, 0 Red (bank 10 pts?)
# Player 2 rolls
# Current roll: 5 Green, 4 Yellow, 1 Red (bank 15 pts?)
# Player 2 rolls
# Current roll: 2 Green, 3 Yellow, 0 Red (bank 17 pts?)
# Player 2 rolls
# Current roll: 0 Green, 3 Yellow, 0 Red (bank 17 pts?)
# Player 2 rolls
# Current roll: 2 Green, 1 Yellow, 0 Red (bank 19 pts?)
# Player 2 rolls
# Current roll: 0 Green, 1 Yellow, 0 Red (bank 19 pts?)
# Player 2 rolls
# Current roll: 0 Green, 1 Yellow, 0 Red (bank 19 pts?)
# Player 2 rolls
# Current roll: 1 Green, 0 Yellow, 0 Red (bank 20 pts?)
# Player 2 banks
# Current scores: 4 20
#
# Game ends: Player 2 wins!


In this particular simulation, it paid off to be a bit more risk taking. As the simulation below shows, that’s not always the case.

NUM_DICE <- 10
FACE_PROBS <- c(0.5, 0.0, 0.5)
POINTS_TO_WIN <- 20
set.seed(1)
winner <- SimulateGame(OneRoll,
function(state) { OverTargetPoints(state, 19) },
verbose = TRUE)

# Current roll: 6 Green, 0 Yellow, 4 Red (bank 6 pts?)
# Player 1 banks
# Current scores: 6 0
#
# Current roll: 5 Green, 0 Yellow, 5 Red (bank 5 pts?)
# Player 2 rolls
# Current roll: 2 Green, 0 Yellow, 3 Red (bank 7 pts?)
# Player 2 rolls
# Current roll: 0 Green, 0 Yellow, 3 Red
# Player 2 rolled at least 1 red and no greens
# Current scores: 6 0
#
# Current roll: 5 Green, 0 Yellow, 5 Red (bank 5 pts?)
# Player 1 banks
# Current scores: 11 0
#
# Current roll: 7 Green, 0 Yellow, 3 Red (bank 7 pts?)
# Player 2 rolls
# Current roll: 2 Green, 0 Yellow, 1 Red (bank 9 pts?)
# Player 2 rolls
# Current roll: 1 Green, 0 Yellow, 0 Red (bank 10 pts?)
# Player 2 rolls
# Current roll: 3 Green, 0 Yellow, 7 Red (bank 13 pts?)
# Player 2 rolls
# Current roll: 2 Green, 0 Yellow, 5 Red (bank 15 pts?)
# Player 2 rolls
# Current roll: 2 Green, 0 Yellow, 3 Red (bank 17 pts?)
# Player 2 rolls
# Current roll: 2 Green, 0 Yellow, 1 Red (bank 19 pts?)
# Player 2 rolls
# Current roll: 0 Green, 0 Yellow, 1 Red
# Player 2 rolled at least 1 red and no greens
# Current scores: 11 0
#
# Current roll: 5 Green, 0 Yellow, 5 Red (bank 5 pts?)
# Player 1 banks
# Current scores: 16 0
#
# Current roll: 4 Green, 0 Yellow, 6 Red (bank 4 pts?)
# Player 2 rolls
# Current roll: 4 Green, 0 Yellow, 2 Red (bank 8 pts?)
# Player 2 rolls
# Current roll: 1 Green, 0 Yellow, 1 Red (bank 9 pts?)
# Player 2 rolls
# Current roll: 0 Green, 0 Yellow, 1 Red
# Player 2 rolled at least 1 red and no greens
# Current scores: 16 0
#
# Current roll: 5 Green, 0 Yellow, 5 Red (bank 5 pts?)
# Player 1 banks
# Current scores: 21 0
#
# Game ends: Player 1 wins!


Comparing some simple strategies

We can’t tell whether one strategy is better than another by looking at a single game since there is so much randomness involved. What we should do is simulate many games and see which strategy wins out over the long run.

OneRoll vs. OneRoll

Let’s do a sanity check. Here, we run 10,000 games of the strategy OneRoll vs. itself (this takes around 4 seconds on my machine). Since the strategy for both players is the same, we might expect player 1 to win around 50% of the time right?

NUM_DICE <- 10
FACE_PROBS <- c(1/6, 2/6, 3/6)
POINTS_TO_WIN <- 100

nsim <- 10000
set.seed(1)
winners <- replicate(nsim, SimulateGame(OneRoll, OneRoll))
table(winners)  # not 50-50: player who starts first has an advantage

# winners
# 1    2
# 5910 4090


It looks like player 1 wins around 59% of the time! On second thought, we would expect player 1 to win more because he/she has a first mover advantage: if player 1 reaches 100 points, he/she wins even if player 2 might reach 100 points on the very next turn.

To make this sanity check work, we should have the players each go first half the time. The code below achieves that:

# for even numbered simulations, let player 2 start first
winners2 <- winners
winners2[2 * 1:(nsim/2)] <- 3 - winners2[2 * 1:(nsim/2)]
table(winners2)

# winners2
# 1    2
# 5030 4970


Now player 1 wins just 50.3% of the time, which is much closer to the expected 50%.

OneRoll vs. >20 points

Next, let’s compare the “one roll” strategy with the strategy which stops once the player can bank more than 20 points for the turn:

set.seed(1)
winners <- replicate(nsim, SimulateGame(
OneRoll,
function(state) { OverTargetPoints(state, 20) }))
table(winners)

# winners
# 1    2
# 75 9925


Wow! The strategy of banking only when >20 points have been earned on a turn wins almost all the time, even though it doesn’t have the first mover advantage!

>20 points vs. >50 points

Taking a bit more risk than always banking seems to pay off. What about even more risk? How does banking after >20 points compare with banking after >50 points?

set.seed(1)
winners <- replicate(nsim, SimulateGame(
function(state) { OverTargetPoints(state, 20) },
function(state) { OverTargetPoints(state, 50) }))
table(winners)

# winners
# 1    2
# 6167 3833

# switch order of players
set.seed(1)
winners <- replicate(nsim, SimulateGame(
function(state) { OverTargetPoints(state, 50) },
function(state) { OverTargetPoints(state, 20) }))
table(winners)

# winners
# 1    2
# 4414 5586


The >20 points strategy won 61.7% of the games when it went first, and won 55.9% of the games when it went second, so it’s clearly a superior strategy.

Where can we go from here?

There are several future directions we could take with this code base.

• The strategies presented here are very simple. Can we program more complex strategies (e.g. some that depend on the score of the other player, or on how close the player is to winning)?
• Among strategies of the form “bank once you have more than xx points”, is there an optimal value for xx?
• What is the optimal strategy for Toss Up!? Can we learn it through methods such as Q-learning?
• Is there even such a thing as an optimal strategy? Do there exist 3 strategies such that A beats B, B beats C, but C beats A?
• Can we quantify the first mover advantage?
• Can we modify the code so that we can visualize the players’ scores as the game progresses?
• Can we modify the code so that more than 2 players can play?

# Exploring the game “First Orchard” with simulation in R

My daughter received the board game First Orchard as a Christmas present and she’s hooked on it so far. In playing the game with her, a few probability/statistics questions came to mind. This post outlines how I answered some of them using simulation in R. All code for this blog post can be found here.

(In my googling I found that Matt Lane has an excellent blog post on this game, answering some of the questions that I was interested in.)

Gameplay

Before we get to the questions, let me give a quick explanation of how the game works (see Reference 1 for a more colorful explanation as well as an applet to play the game online).

• It’s a cooperative game, with all players playing against the board.
• The game starts with 16 fruit: 4 of each color (red, green, blue, yellow), and a raven at one end of a path that is 5 steps long.
• On each player’s turn, the player rolls a 6-sided die.
• If the die comes up red, green, blue or yellow, the player gets to “harvest” a fruit of that color if there are any left to harvest. If all 4 fruits of that color have been harvested, nothing happens.
• If the die shows a fruit basket, the player gets to harvest a fruit of any color.
• If the die shows a raven, the raven moves one step along the path.
• The game ends when either all the fruit are harvested (players win) or when the raven reaches the end of the path (raven wins).

As you can see this is a really simple game (hence it’s “suitable for 2+ years” rating). The only strategy is in choosing what fruit to take if a fruit basket shows up: see Reference 1 for some simulations for different strategies. Intuitively it seems like choosing the color with the most fruit remaining is the best strategy, and that’s what I bake into my code. (Is there a proof for this, and is this still true in more general circumstances described in Reference 1?)

Code for simulating the game

The state of the game can be captured in a numeric vector of length 5. The first 4 numbers refer to the number of fruit left for each color, and the 5th number keeps track of the number of steps the raven has taken so far. I created 3 functions to simulate one game of First Orchard (see full code here):

• SimulateTurn(state, verbose) takes one dice roll and updates the state of the game. For simplicity, if a 1-4 is rolled, a fruit is harvested from that corresponding tree. If 5 is rolled, the raven takes a step. A rolled 6 is taken to mean “fruit basket”, and I remove a fruit from the tree with the most remaining fruits.
• CheckGameState(state, max_raven_steps) checks if the game has ended or not, and if so, who won.
• SimulateGame(fruit_count, max_raven_steps, verbose) runs an entire game of First Orchard: while the game has not ended, run SimulateTurn. Once the game has ended, this function returns (i) who won, (ii) the number of turns taken, (iii) the number of steps the raven took, and (iv) the number of fruit left.

We allow for two game parameters to be defined by the user: the number of fruit of each type at the start of the game (fruit_count, default is 4) and the number of steps the raven must take in order for it to win (max_raven_steps, default is 5). The verbose option for these functions so that the user can see what happened in the game. The code below is an example of the output from SimulateGame:

set.seed(1)
results <- SimulateGame(fruit_count = 2, max_raven_steps = 3,
verbose = TRUE)
# Roll: 1 , State: 1,2,2,2,0
# Roll: 4 , State: 1,2,2,1,0
# Roll: 1 , State: 0,2,2,1,0
# Roll: 2 , State: 0,1,2,1,0
# Roll: 5 , State: 0,1,2,1,1
# Roll: 3 , State: 0,1,1,1,1
# Roll: 6 , State: 0,0,1,1,1
# Roll: 2 , State: 0,0,1,1,1
# Roll: 3 , State: 0,0,0,1,1
# Roll: 3 , State: 0,0,0,1,1
# Roll: 1 , State: 0,0,0,1,1
# Roll: 5 , State: 0,0,0,1,2
# Roll: 5 , State: 0,0,0,1,3
# Raven wins
# # of turns: 13
# # of steps raven took: 3
# # fruit left: 1


Simulation time!

What is the probability of winning the game? How does that change as we vary (i) the number of fruit of each color, and (ii) the number of steps the raven must take in order for the players to lose?

We let the number of fruit of each color vary from 1 to 8, and the number of steps the raven must take from 1 to 8. For each parameter setting, we simulate the game 10,000 times and compute the player win probability. We plot the results as a heatmap.

As one might expect, win probability goes down as the number of fruit increases and as the number of steps the raven must take decreases. For the original game (4 fruit, 5 steps), the win probability is approximately 62%. Sounds reasonable: we would like a win probability >50% so that kids will not get discouraged by too much losing, but not so high that they think the game is trivial.

For the original game (4 fruit, 5 steps), what is the expected number of steps until the game ends? Does this change depending on whether the player or the raven wins?

We simulate the game 100,000 times and keep track of the number of steps taken in each game. The shortest game took 5 steps while the longest took 45 steps, with the modal number of steps being 21 (it also happens to be the mean and median). Here is the histogram for all 100,000 runs:

Here are the histograms split by the outcome:

Games where the raven wins tend to be shorter than those when players win. Maybe that’s not too surprising, since a game where the raven wins needs just 5 steps, while a game where the players win needs at least 16 steps. On average, the game takes 19 steps for raven wins and 22 steps for player wins.

For the original game (4 fruit, 5 steps), given that the raven loses, what is distribution of the number of steps the raven has taken?

Because we programmed SimulateGame to return the number of steps the raven has taken as well, we don’t have to rerun the simulations: we can just use the 100,000 simulations we ran previously and look at the ones that the raven lost. Here is the histogram of steps the raven took in losing games, with the vertical red line representing the mean:

For the original game (4 fruit, 5 steps), given that the raven wins, what is distribution of the number of unharvested fruit?

Again, we can just use the 100,000 simulations we ran previously and look at the ones that the raven won. Here is the histogram along with the mean and median marked out with vertical lines:

The modal number of fruit left in player-losing games is 1: ugh tantalizingly close!

If there was no raven, how many turns would it take to harvest all the fruit?

“No raven” is the same as saying that the raven needs to take an infinite number of steps in order to win. Hence, we can use our existing simulation code with max_raven_steps = Inf to simulate this setting.

The shortest game took 16 turns while the longest game took 63 turns, with 22 and 24 turns being the modal and mean number of turns respectively. (In theory, a game could go on forever.) Here is the histogram:

References:

1. Matt Lane. (2018). Harvesting Wins.

# Simulating paths from a random walk

If you’ve ever visited this blog at wordpress.com, you might have noticed a header image that looks like this:

Ever wonder how it was generated? The image depicts 100 simulations of an asymmetric random walk. In this post, I’ll go through the code used to generate this image. All the code can also be found here.

For $t = 1, 2, \dots$, consider a series of i.i.d. random variables $X_1, X_2, \dots$ such that for any $t$, $X_t = 1$ with probability $p$, and $X_t = -1$ with probability $1-p$. ($p \in [0,1]$ is a parameter that we get to choose.) A random walk simply tracks the cumulative sum of these random variables, i.e.

$S_0 = 0, \quad S_t = \sum_{s = 1}^t X_s.$

In my image, I let the random walk run until it hits a fixed upper limit or a fixed lower limit. Here is an R function that generates one realization of this random walk:

# returns the random walk path values as a vector
# (random walk always starts at 0)
# p: probability of increasing by 1
# stop if path value hits either lower or upper
run <- function(p, lower, upper) {
values <- c(0)
current <- 0
while (current > lower & current < upper) {
current <- current + ifelse(runif(1) < p, 1, -1)
values <- c(values, current)
}
values
}


(There might be more efficient ways of doing this, but since computation is relatively fast this is good enough for our purposes.)

The code below creates a list of 100 elements, each element being a simulated random walk. The probability of going up at any one step is 0.48, and we stop the random walk once we hit -50 or 50.

N <- 100  # no. of paths to simulate
p <- 0.48
lower <- -50
upper <- 50

# simulate paths
set.seed(1055)
vlist <- replicate(N, run(p, lower, upper))


We can plot these paths along with the x-axis and the upper & lower limits:

# get length of longest path
max_length <- max(sapply(vlist, length))

# make plot
par(mar = rep(0, 4))  # no margins
plot(c(1, max_length), c(lower, upper), type = "n")
for (i in 1:N) {
lines(1:length(vlist[[i]]), vlist[[i]])
}
abline(h = 0, lty = "dashed")
abline(h = lower, lwd = 2)
abline(h = upper, lwd = 2)


Without color, it’s hard to make much sense of the image. To introduce color, let’s create a function that picks a color for path based on (i) whether the path hit the upper or lower limit, and (ii) how long the path is when compared to the longest path in the image.

colorPicker <- function(values, max_length,
ls_color = c(178, 34, 34), ll_color = c(255, 204, 0),
us_color = c(0, 0, 102), ul_color = c(102, 204, 225)) {
l <- length(values)
if (values[l] < 0) {
rgb_values <- (ls_color + (ll_color - ls_color) * l / max_length) / 255
} else {
rgb_values <- (us_color + (ul_color - us_color) * l / max_length) / 255
}
rgb(rgb_values[1], rgb_values[2], rgb_values[3])
}


If a path hits the lower limit and has length 0, it will have color ls_color / 255, and if a path hits the lower limit and has the longest length among all our simulated paths, it will have color ll_color / 255. There is a similar relationship for paths hitting the upper limit and us_color and ul_color. (You can see what these colors are at a website such as this.)

We can now color our black-and-white image:

plot(c(1, max_length), c(lower, upper), type = "n")
for (i in 1:N) {
lines(1:length(vlist[[i]]), vlist[[i]],
col = colorPicker(vlist[[i]], max_length), lwd = 0.5)
}
abline(h = 0, lty = "dashed")
abline(h = lower, lwd = 2)
abline(h = upper, lwd = 2)


From the image it is now obvious that most of the paths hit the lower limit first, and fairly quickly at that! Here is the same image, but with different choices of color scale:

plot(c(1, max_length), c(lower, upper), type = "n")
for (i in 1:N) {
lines(1:length(vlist[[i]]), vlist[[i]],
col = colorPicker(vlist[[i]], max_length,
ls_color = c(230, 230, 230),
ll_color = c(166, 166, 166),
us_color = c(255, 0, 0),
ul_color = c(0, 0, 255)),
lwd = 0.5)
}
abline(h = 0, lty = "dashed")
abline(h = lower, lwd = 2)
abline(h = upper, lwd = 2)


The possibilities are endless!

If you haven’t seen random walks before, you might be surprised at how a slightly biased walk ($p = 0.48$ instead of $p = 0.5$) results in so many more paths hitting the lower limit before the upper limit, even though the two limits are the same distance from 0. This is an example of 100 simulations when $p = 0.5$:

This is an example of 100 simulations when $p = 0.52$:

Isn’t it interesting how slight deviations in the probability of going one step up (instead of down) completely change the dynamics? It turns out that there is a closed form for the probability of hitting one limit (as opposed to hitting the other). If the upper limit is $x = b$ and the lower limit is $x = -a$, and $q = 1-p$, then

\begin{aligned} \mathbb{P} \{ \text{hit } x = b \text{ before } x = -a \} = \begin{cases} \frac{1 - (q/p)^a}{1 - (q/p)^{a+b}} &\text{if } p \neq 1/2, \\ \frac{a}{a+b} &\text{if } p = 1/2. \end{cases} \end{aligned}

For my blog header image, $p = 0.48$, $a = b = 50$, which means the probability of hitting the upper limit is approximately 0.0179: not high at all!

# Probability of winning a best-of-7-series (part 2)

In this previous post, I explored the probability that a team wins a best-of-n series, given that its win probability for any one game is some constant $p$. As one commenter pointed out, most sports models consider the home team to have an advantage, and this home advantage should affect the probability of winning a series. In this post, I will explore this question, limiting myself (most of the time) to the case of $n = 7$.

To be as general as possible, instead of giving a team just one parameter $p$, I will give a team two parameters: $p_H$ for the probability of winning at home, and $p_A$ for winning away. In general we will have $p_H > p_A$, although that is not always the case. (For example, in the NBA 2018/19 regular season, all but 2 teams had more home wins than road wins, the 2 teams being the Miami Heat and the Chicago Bulls.) In my simulations, I will allow $p_H$ and $p_A$ to take on any values between 0 and 1.

The random variable denoting the number of wins in a best-of-7 series will depend on whether the team starts at home or not. (We will assume our team will alternate being playing at home and playing away.) If starting at home, the number of wins has distribution $\text{Binom}(4, p_H) + \text{Binom}(3, p_A)$; if starting away, the distribution would be $\text{Binom}(3, p_H) + \text{Binom}(4, p_A)$.

We can modify the sim_fn function from the previous post easily so that it now takes in more parameters (home_p, away_p, and start indicating whether the first game is at “home” or “away”):

sim_fn <- function(simN, n, home_p, away_p, start) {
if (n %% 2 == 0) {
stop("n should be an odd number")
}

if (start == "home") {
more_p <- home_p
less_p <- away_p
} else {
more_p <- away_p
less_p <- home_p } mean(rbinom(simN, (n+1)/2, more_p) + rbinom(simN, (n-1)/2, less_p) > n / 2)
}


The following code sets up a grid of home_p,away_p and start values, then runs 50,000 simulation runs for each combination of those values:

home_p <- seq(0, 1, length.out = 51)
away_p <- seq(0, 1, length.out = 51)
df <- expand.grid(home_p = home_p, away_p = away_p, start = c("home", "away"),
stringsAsFactors = FALSE)

set.seed(111)
simN <- 50000
n <- 7
df$win_prob <- apply(df, 1, function(x) sim_fn(simN, n, as.numeric(x[1]), as.numeric(x[2]), x[3]))  How can we plot our results? For each combination of (home_p, away_p, start) values, we want to plot the series win probability. A decent 2D representation would be a heatmap: library(tidyverse) ggplot(df, aes(x = home_p, y = away_p)) + geom_raster(aes(fill = win_prob)) + labs(title = paste("Win probability for best of", n, "series"), x = "Home win prob", y = "Away win prob") + scale_fill_distiller(palette = "Spectral", direction = 1) + facet_wrap(~ start, labeller = label_both) + coord_fixed() + theme(legend.position = "bottom")  While the colors give us a general sense of the trends, we could overlay the heatmap with contours to make comparisons between different points easier. Here is the code, the geom_text_contour function from the metR package is for labeling the contours. library(metR) ggplot(df, aes(x = home_p, y = away_p, z = win_prob)) + geom_raster(aes(fill = win_prob)) + geom_contour(col = "white", breaks = 1:9 / 10) + geom_text_contour(col = "#595959") + labs(title = paste("Win probability for best of", n, "series"), x = "Home win prob", y = "Away win prob") + scale_fill_distiller(palette = "Spectral", direction = 1) + facet_wrap(~ start, labeller = label_both) + coord_fixed() + theme(legend.position = "bottom")  For a full 3D surface plot, I like to use the plotly package. This is because it is hard to get information from a static 3D plot: I want to be able to turn the plot around the axes to see it from different angles. plotly allows you to do that, and it also gives you information about the data point that you are hovering over. The code below gives us a 3D surface plot for the start = "home" data. (Unfortunately WordPress.com does not allow me to embed the chart directly in the post, so I am just showing a screenshot to give you a sense of what is possible. Run the code in your own R environment and play with the graph!) library(plotly) df_home <- df %>% filter(start == "home") win_prob_matrix <- matrix(df_home$win_prob, nrow = length(away_p), byrow = TRUE) plot_ly(x = home_p, y = away_p, z = win_prob_matrix) %>%
z = list(
show = TRUE,
usecolormap = TRUE,
highlightcolor = "#ff0000",
project = list(z = TRUE)
)
)
) %>%
layout(
title = paste("Win probability for best of", n, "series (start at home)"),
scene = list(
xaxis = list(title = "Home win prob"),
yaxis = list(title = "Away win prob"),
zaxis = list(title = "Series win prob")
)
)


How much better is it to start at home than away?

To answer this question, we will subtract the probability of winning a series when starting away from that when starting at home. The more positive the result, the more “important” starting at home is.

# difference in starting at home vs. away
df2 <- df %>% spread(start, value = win_prob) %>%

# raster plot
ggplot(df2, aes(x = home_p, y = away_p)) +
labs(title = "Difference in series win prob (starting at home vs. away)",
x = "Home win prob", y = "Away win prob") +
scale_fill_distiller(palette = "Spectral", direction = 1) +
coord_fixed() +
theme(legend.position = "bottom")


It looks like most of the variation is in the corners. Drawing contours at manually-defined levels helps to give the center portion of the plot more definition (note the “wigglyness” of the contours: this is an artifact of Monte Carlo simulation):

breaks <- c(-0.5, -0.2, -0.1, -0.05, -0.02, 0, 0.02, 0.05, 0.1, 0.2, 0.5)
ggplot(df2, aes(x = home_p, y = away_p, z = home_adv)) +
geom_contour(col = "white", breaks = breaks) +
geom_text_contour(breaks = breaks,
col = "#595959", check_overlap = TRUE) +
labs(title = "Difference in series win prob",
subtitle = "Starting at home vs. away",
x = "Home win prob", y = "Away win prob") +
scale_fill_distiller(palette = "Spectral", direction = 1) +
coord_fixed() +
theme(legend.position = "bottom")


When does being at home tip you over (or back under) the 50% threshold?

That is, when does starting at home make you more likely to win the series than your opponent, but starting away makes your opponent the favorite? This can be answered easily by performing some data manipulation. In the plot, “Better” means that you are the favorite at home but the underdog away, while “Worse” means the opposite.

threshold <- 0.5
df2$home_impt <- ifelse(df2$home > threshold & df2$away < threshold, 1, ifelse(df2$home < threshold & df2$away > threshold, -1, NA)) df2$home_impt <- factor(df2$home_impt) df2$home_impt <- fct_recode(df2$home_impt, "Better" = "1", "Worse" = "-1") df2 %>% filter(!is.na(home_impt)) %>% ggplot(aes(x = home_p, y = away_p)) + geom_raster(aes(fill = home_impt)) + labs(title = paste("Does starting at home push you \nover the series win threshold", threshold, "?"), x = "Home win prob", y = "Away win prob") + scale_fill_manual(values = c("#ff0000", "#0000ff")) + coord_fixed() + theme(legend.position = "bottom")  The blue area might not look very big, but sometimes teams do fall in it. The plot below is the same as the above, but overlaid with the 30 NBA teams (home/away win probabilities based on the regular season win percentages). The team squarely in the blue area? The Detroit Pistons. The two teams on the edge? The Orlando Magic and the Charlotte Hornets. How does the length of the series affect the series win probability? In the analysis above, we focused on best-of-seven series. What happens to the series win probabilities if we change the length of the series. Modifying the code for the first plot above slightly (and doing just 10,000 simulations for each parameter setting), we can get the following plot: There doesn’t seem to be much difference between best-of-7 and best-of-9 series. What happens as $n$ grows large? The plot for best-of-1001 series is below, try to figure out why this is the case! Code for this post (in one file) can be found here. # Probability of winning a best-of-7 series The NBA playoffs are in full swing! A total of 16 teams are competing in a playoff-format competition, with the winner of each best-of-7 series moving on to the next round. In each matchup, two teams play 7 basketball games against each other, and the team that wins more games progresses. Of course, we often don’t have to play all 7 games: we can determine the overall winner once either team reaches 4 wins. During one of the games, a commentator made a remark along the lines of “you have no idea how hard it is for the better team to lose in a best-of-7 series”. In this post, I’m going to try and quantify how hard it is to do so! I will do this not via analytical methods, but by Monte Carlo simulation. (You can see the code all at once here.) Let’s imagine an idealized set-up, where for any given game, team A has probability $p$ of beating team B. This probability is assumed to be the same from game to game, and the outcome of each game is assumed to be independent of the others. With these assumptions, the number of games that team A wins has a binomial distribution $\text{Binom}(7, p)$. In our simulation, for each $p$, we will generate a large number of $\text{Binom}(7, p)$ random values and determine the proportion of them that are $\geq 4$: that would be our estimate for the winning probability of the 7-game series. How many random values should we draw? We should draw enough so that we are reasonably confident of the accuracy of the proportion estimate. If we draw $N$ samples, then the plug-in estimate of standard error is $\sqrt{\hat{p}(1 - \hat{p})/N}$. In what follows, we will plot estimates with error bars indicating $\pm 1$ standard error. First, let’s set up a function that takes in three parameters: the number of simulation runs (simN), the total number of games in a series (n), and the probability that team A wins (p). The function returns the proportion of the simN runs which team A wins. sim_fn <- function(simN, n, p) { if (n %% 2 == 0) { stop("n should be an odd number") } mean(rbinom(simN, n, p) > n / 2) }  Next, we set up a data frame with each row representing a different win probability for team A. Because of symmetry inherent in the problem, we focus on win probabilities which are $\geq 0.5$. n <- 7 df <- data.frame(p = seq(0.5, 1, length.out = 26), n = n)  For a start, we set simN to be 100. For each row, we run simN simulation runs to get the win probability estimate and compute the associated standard error estimate. We also compute the upper and lower 1 standard error deviations from the probability estimate, to be used when plotting error bars. set.seed(1043) simN <- 100 df$win_prob <- apply(df, 1, function(x) sim_fn(simN, x[2], x[1]))

# compute std err & p_hat \pm 1 std err
df$std_err <- with(df, sqrt(win_prob * (1 - win_prob) / simN)) df$lower <- with(df, win_prob - std_err)
df$upper <- with(df, win_prob + std_err)  Here is the code for the plot and the plot itself: library(ggplot2) ggplot(df, aes(x = p, y = win_prob)) + geom_line() + geom_linerange(aes(ymin = lower, ymax = upper)) + labs(title = paste("Win probability for best of", n, "series"), x = "Win prob for single game", y = "Win prob for series")  We can see that the line is still quite wiggly with large error bars, suggesting that 100 simulation runs is not enough. (Indeed, we expect the graph to be monotonically increasing.) Below, we run 100,000 simulations for each value of p instead (with the same random seed): We get a much smoother line, and the error bars are so small we can hardly see them. My conclusion here is that while it is harder for the weaker team to win a best-of-7 series that a single game, the odds are not insurmountable. For example, a team that has a 70% chance of winning any one game, which is a huge advantage, still has about a 13% chance of losing a best-of-7 series: not insignificant! How do these probabilities change if we consider best-of-n series for different values of n? The code below is very similar to that above, except that our data frame contains data for n equal to all the odd numbers from 1 to 15, not just n = 7. p <- seq(0.5, 1, length.out = 26) n <- seq(1, 15, length.out = 8) df <- expand.grid(p, n) names(df) <- c("p", "n") set.seed(422) simN <- 100000 df$win_prob <- apply(df, 1, function(x) sim_fn(simN, x[2], x[1]))

# compute std err & p_hat \pm 1 std err
df$std_err <- with(df, sqrt(win_prob * (1 - win_prob) / simN)) df$lower <- with(df, win_prob - std_err)
df\$upper <- with(df, win_prob + std_err)

ggplot(df, aes(x = p, y = win_prob, col = factor(n))) +
geom_line() +
geom_linerange(aes(ymin = lower, ymax = upper)) +
labs(title = paste("Win probability for best of n series"),
x = "Win prob for single game", y = "Win prob for series") +
theme(legend.position = "bottom")


Here is the plot:

As we expect, the series win probabilities increase as n increases. Also, the n = 1 graph corresponds to the $y = x$ line. It looks like the win probabilities don’t change much from best-of-9 to best-of-15.

# Simulated annealing for optimization

Simulated annealing is a probabilistic technique for approximating the global optimum of a given objective function. Because it does not guarantee a global optimum, it is known as a metaheuristic.

According to Wikipedia, annealing is “a heat treatment that alters the physical and sometimes chemical properties of a material to increase its ductility and reduce its hardness, making it more workable. It involves heating a material above its recrystallization temperature, maintaining a suitable temperature for a suitable amount of time, and then cooling” (emphasis mine). As we will see, simulated annealing mimics this physical process in trying to find a function’s global optimum.

There are several versions of simulated annealing. Below, we give a high-level outline of a basic version for minimizing a function $f: \mathcal{X} \mapsto \mathbb{R}$. (Simulated annealing can be used for constrained optimization as well; for simplicity we limit our exposition to unconstrained optimization.) Let $i$ index the iteration number. Simulated annealing has a positive temperature parameter, denoted by $T_i$ at iteration $i$, whose evolution strongly influences the result.

Start at some initial point $x_0 \in \mathcal{X}$. For each iteration $i = 0, \dots$, if the stopping criterion is not satisfied:

1. Choose some random proposal point $y$ from a probability distribution. The lower the temperature, the more concentrated the probability distribution’s mass is around $x_i$.
2. With some acceptance probability, set $x_{i+1} = y$. If not, set $x_{i+1} = x_i$. The acceptance probability is a function of the function values $f(x_i)$ and $f(y)$, and the current temperature $T_i$.
3. Update the temperature parameter.

That’s all there is to it! The devil, of course, is in the details. Here are some examples of what each of the building blocks could be.

Stopping criterion:

• Terminate after a fixed number of iterations, or when the computational budget has been fully utilized.
• Terminate when the objective function does not improve much, e.g. $|f(x_i) - f(x_{i+1})| < 10^{-7}$.

Choosing a random proposal point: This depends very much on what the underlying state space looks like. A benefit of simulated annealing is that it can be applied to very general state spaces $\mathcal{X}$, even discrete ones, as long as there is a notion of distance between two points in the space. (In particular, first-order information about the space (i.e. gradients) is not needed.)

• If $\mathcal{X} = \mathbb{R}^d$, we could set $y = x_i + T \epsilon$, where $\epsilon \sim \mathcal{N}(0,I_d)$, or $\epsilon \sim U[-1, 1]^d$.
• The dependence on $T$ could be different as well, e.g. $y = x_i + \sqrt{T} \epsilon$.
• If $\mathcal{X}$ is the space of permutations on $\{ 1, 2, \dots, n\}$, then $y$ could be selected uniformly at random from the set of permutations which are at most $T$ transpositions from $x_i$.

In the proposals above, the smaller $T$ is, the closer $y$ is likely to be than $x_i$.

Acceptance probability: The probability of accepting $y$ as the next value in the sequence (i.e. as $x_{i+1}$) is a function which depends on $f(x_i)$, $f(x_{i+1})$ and (crucially) the temperature $T$. Typically if $f(y) < f(x_i)$, we would accept $y$ with probability 1, but this does not have to be the case. In general, the acceptance probability should decrease as $T$ decreases and as $f(y) - f(x)$ becomes more positive.

• This is the most popular acceptance function in practice: if $f(y) - f(x) < 0$, accept. If not, accept with probability $\dfrac{1}{1 + \exp \left( \frac{f(y) - f(x)}{T} \right)}$.

Updating the temperature parameter: We just require $T_i \rightarrow 0$ as $i \rightarrow \infty$.

• $T_i = T_0 \cdot (0.95)^i$.
• $T_i = T_0 / i$.
• $T_i = T_0 / \log i$.

References:

1. Henderson et al. The theory and practice of simulated annealing.
2. Mathworks. How simulated annealing works.
3. Mathworks. Simulated annealing terminology.
4. Wikipedia. Simulated annealing.