As I mentioned in a previous blog post, my family and I have found different activities to do as all our other activities are on hold due to the COVID-19 lockdown.

In that post, I looked at first player’s strategy when spinning the big wheel prior to the showcase showdown.

In this post, I’ll explore the rarely won game, Pay the Rent

Pay The Rent

Pay The Rent

Pay The Rent is a challenging game. Essentially, given 6 products, you must estimate the prices of all the products, and place them in 4 tiers. The four tiers are the mailbox (1 product), first floor (2 products), second floor (2 products), and the attic (1 product). The total price of all the products on each floor must cost more than the products on the floor below it.

After arranging the items, the player must opt in to look at the prices for each floor. The mailbox is a given 1000 because its the first selection. For each additional floor the player opts in, they win a successive 5000, 10,000, and potentially 100,000. However, if the player opts in and loses, they lose everything. More specific and thorough rules can be found at The Price Is Right Fandom.

The typical strategy for a player is to choose the cheapest product for the bottom floor, but this strategy is rarely successful because that player needs this cheaper product to offset the cost of a more expensive product on a higher level. One can quickly see the challenge of picking which price to put on the bottom level to maximize the chance of winning $100,000.

Because of this challenge, the game has only been won 5 times out of the 85 times its been played. The Price Is Right has changed up the spread of the prices of the items on several occasions to make the game more likely to win (thus having more than 1 correct solution). Details about these games are here.

To gain a better understanding of how to play, here is one of the few winning contestants playing the game.

What We’ll Do

To Analyze this game, we’ll…

  • Scrape all the prices from historical games
  • Specify an integer program to solve one of these games
  • Find the optimal solution using R
  • Functionalize this process and solve all games

Scrape the data

Since the point of this post is not to talk through web scraping, I’ll summarize how I do this.

First, download the raw data. More specifics on how to scrap data can be found here using rvest.

library(tidyverse)
library(rvest)
library(strex)
tpir <- read_html("https://priceisright.fandom.com/wiki/Pay_the_Rent/Solutions#September_20.2C_2010_.28.235231K.29")

tpir_text <- tpir %>% 
  html_nodes("td") %>%
  html_text() 
tpir_text %>%  head
## [1] "1\n"                                                                     
## [2] "Pantene shampoo ($5.99)\n"                                               
## [3] "Red Baron pizza slices ($3.49) & McCormick cinnamon ($2.98) = $6.47\n"   
## [4] "Whink cook top cleaner ($5.49) & Del Monte canned corn ($1.49) = $6.98\n"
## [5] "9 Lives cat food ($7.30)\n"                                              
## [6] "1\n"

Second, identify which prizes to go which games, delete duplicate rows, and extract dollar values from the text string.

cleaning <- 
tpir_text %>% 
  as_tibble() %>% 
  mutate(value = str_replace(string = value,pattern = "\\(2-pack\\)",replacement = "2-pack"))   %>% 
  mutate(game = row_number()) %>% 
  mutate(group = if_else(str_length(value)<6 & !str_detect(string = value,pattern = "Same"),"New","")) %>% 
  mutate(group_id = if_else(group == "New", str_c(group, "_", game), NULL)) %>% 
  fill(group_id) %>% 
  group_by(group_id) %>% 
  filter(n()==5) %>% 
  ungroup() %>% 
  mutate(group_number = lag(floor(1:nrow(.)/5),1)) %>% 
  mutate(really_new = if_else(group=="New", str_sub(string = value,start = 1,1),NULL)) %>% 
  fill(really_new) %>% 
  filter(really_new == 1) %>% 
  mutate(group_number = replace_na(group_number,0))  %>% 
  mutate(dollars = as_tibble(str_extract_all(string = value,pattern = "\\([^()]+\\)",simplify = TRUE))) %>% 
  mutate(dollars1 = dollars$V1, dollars2 = dollars$V2) %>% 
  select(-c(dollars,game,group,really_new,-group_id)) %>%
  rename(text = value) %>% 
  pivot_longer(cols = contains("dollars")) %>% 
  mutate(string_num = row_number()) 

cleaning
## # A tibble: 1,010 x 6
##    text                           group_id group_number name   value  string_num
##    <chr>                          <chr>           <dbl> <chr>  <chr>       <int>
##  1 "1\n"                          New_1               0 dolla~ ""              1
##  2 "1\n"                          New_1               0 dolla~ ""              2
##  3 "Pantene shampoo ($5.99)\n"    New_1               0 dolla~ "($5.~          3
##  4 "Pantene shampoo ($5.99)\n"    New_1               0 dolla~ ""              4
##  5 "Red Baron pizza slices ($3.4~ New_1               0 dolla~ "($3.~          5
##  6 "Red Baron pizza slices ($3.4~ New_1               0 dolla~ "($2.~          6
##  7 "Whink cook top cleaner ($5.4~ New_1               0 dolla~ "($5.~          7
##  8 "Whink cook top cleaner ($5.4~ New_1               0 dolla~ "($1.~          8
##  9 "9 Lives cat food ($7.30)\n"   New_1               0 dolla~ "($7.~          9
## 10 "9 Lives cat food ($7.30)\n"   New_1               0 dolla~ ""             10
## # ... with 1,000 more rows

Finally, to complete the dollar value extraction, we use str_extract_currencies and join it back into the data frame. Lastly, we convert it into ‘wide’ format.

Admittedly, this is a little ‘hacky’ so I’d be happy for any feedback to improve this process.

final_data <-
  cleaning %>%
  left_join(str_extract_currencies(cleaning$value)) %>%
  mutate(amount = if_else(curr_sym == "(", amount / 100, amount)) %>%
  filter(!is.na(amount)) %>%
  group_by(group_number) %>%
  filter(n() == 6) %>%
  mutate(product_order = row_number()) %>%
  select(group_number, product_order, amount) %>%
  mutate(product_order = str_c("product_", product_order)) %>%
  pivot_wider(names_from = product_order, values_from = amount) %>%
  janitor::clean_names() %>%
  ungroup()

final_data
## # A tibble: 99 x 7
##    group_number product_1 product_2 product_3 product_4 product_5 product_6
##           <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
##  1            0      5.99     3.49       2.98      5.49     1.49       7.3 
##  2            1      2.89     1.69       1.49      3.29     0.290      4.19
##  3            2      5.49     2.99       3.19      2.09     4.99       7.49
##  4            3      4.99     1.79       3.46      2.99     2.59       6.29
##  5            4      3.99     4.99       0.79      3.39     2.99       6.79
##  6            5      4.99     3.39       3.49      7        0.79       7.99
##  7            6      5.29     4.49       1.49      2.79     3.49       6.69
##  8            7      6.99     3.69       3.49      1.89     5.7        7.99
##  9            8      4.49     0.290      4.99      2.99     3.49       6.99
## 10            9      2.49     0.99       3.69      0.69     4.09       4.99
## # ... with 89 more rows

Specify the Integer Program

Since the point of this blog is to talk about the solution, I’ll go into more details here.

Integer Program

To solve this program, we’ll need to set up a linear program.

We’ll define the objective equation as such:

Max

\(price_1 x_{1m} + price_2 x_{2m} + price_3 x_{3m} + price_4 x_{4m} + price_5 x_{5m} + price_6 x_{6m} + price_1 x_{1f} + ... + price_4 x_{4a}\)

Where

\(price_n\) is the price of prize ‘\(n\)

\(x_{ny}\) is 1 if prize \(n\) is selected on row \(y\) or 0 otherwise.

\(y \in \{m = mailbox, f = first floor, s = second floor, a = attic \}\)

Subject to;

Floor capacity constraints:

\(x_{1a} + x_{2a} + x_{3a} + x_{4a} + x_{5a} + x_{6a} = 1\)

\(x_{1s} + x_{2s} + x_{3s} + x_{4s} + x_{5s} + x_{6s} = 2\)

\(x_{1f} + x_{2f} + x_{3f} + x_{4f} + x_{5f} + x_{6f} = 2\)

\(x_{1m} + x_{2m} + x_{3m} + x_{4m} + x_{5m} + x_{6m} = 1\)

A price can only be selected once:

\(x_{1m} + x_{1f} + x_{1s} + x_{1a} = 1\)

\(x_{2m} + x_{2f} + x_{2s} + x_{2a} = 1\)

\(x_{3m} + x_{3f} + x_{3s} + x_{3a} = 1\)

\(x_{4m} + x_{4f} + x_{4s} + x_{4a} = 1\)

\(x_{5m} + x_{5f} + x_{5s} + x_{5a} = 1\)

\(x_{6m} + x_{6f} + x_{6s} + x_{6a} = 1\)

The sum of the prices on each floor must cost more than the floor under it:

\(\sum_{n = 1}^6 price_n x_{na} - \sum_{n = 1}^6 price_n x_{ns} > 0\)

\(\sum_{n = 1}^6 price_n x_{ns} - \sum_{n = 1}^6 price_n x_{nf} > 0\)

\(\sum_{n = 1}^6 price_n x_{nf} - \sum_{n = 1}^6 price_n x_{nm} > 0\)

Coding the Solution

This solution relies on the lpSolve library.

library(lpSolve)

We’ll begin by defining our objective equation.

To create the objective equation, we’ll use the prices from the first time this game was played.

It is a vector of all the prices.

prices <- as.numeric(final_data[1,2:7])
f_obj <- rep(prices,4)
f_obj
##  [1] 5.99 3.49 2.98 5.49 1.49 7.30 5.99 3.49 2.98 5.49 1.49 7.30 5.99 3.49 2.98
## [16] 5.49 1.49 7.30 5.99 3.49 2.98 5.49 1.49 7.30

We input the constraints via matrix form where the rows correspond to individual constraints and the columns correspond to elements of the objective function.

lpSolve assumes non-negativity constraints

f_con <- matrix(c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ## attic gets 1 
                  0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ## 2nd floor gets 2
                  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, ## 1st floor gets 2
                  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, ## basement gets 1
                  1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ## a selected once
                  0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ## b selected once
                  0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, ## c selected once
                  0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, ## d selected once
                  0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ## e selected once
                  0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, ## f selected once
                  prices,-prices, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ## attic more than 2nd floor
                  0, 0, 0, 0, 0, 0, prices,-prices, 0, 0, 0, 0, 0, 0, ## 2nd floor more than 1st floor
                  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, prices,-prices ## 1st floor more than mailbox
), nrow = 13, byrow = TRUE)

Next, we code the equality signs for each constraint.

f_dir <- c("=", ## attic gets 1
           "=", ## 2nd floor gets 2
           "=", ## 1st floor gets 2
           "=", ## basement gets 1
           "=", ## a selected once
           "=", ## b selected once
           "=", ## c selected once
           "=", ## d selected once
           "=", ## e selected once
           "=", ## f selected once
           ">", ## attic more than 2nd floor
           ">", ## 2nd floor more than 1st
           ">")  ## 1st floor more than mailbox
 
f_dir
##  [1] "=" "=" "=" "=" "=" "=" "=" "=" "=" "=" ">" ">" ">"

Next, we set values for the right hand side of the constraint matrix.

f_rhs <- c(1, ## attic gets 1
           2, ## 2nd floor gets 2
           2, ## 1st floor gets 1
           1, ## attic gets 1
           1, ## a selected 1
           1, ## b selected 1
           1, ## c selected 1
           1, ## d selected 1
           1, ## e selected 1
           1, ## f selected 1
           0, ## attic - 2nd floor > 0
           0, ## 2nd floor - 1st floor > 0
           0  ## 1st floor - mailbox > 0
)

Now, we can solve the problem.

lp(
  direction = "max",
  objective.in =  f_obj,
  const.mat =  f_con,
  const.dir =  f_dir,
  const.rhs =  f_rhs, 
  all.bin = TRUE
  )
## Success: the objective function is 26.74

You can see this only provides the solution to the objective equation. This isn’t interesting, because it should be the sum of all the prizes since our constrains limit us to selecting each prize once.

What we need is the location of each prize

lp(
  direction = "max",
  objective.in = f_obj,
  const.mat = f_con,
  const.dir = f_dir,
  const.rhs = f_rhs,
  all.bin = TRUE
)$solution
##  [1] 0 0 0 0 0 1 0 0 0 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0

This provides the solution vector with 1s for the location of each prize in the objective equation. With a little massaging, we can understand this better

tibble(prize = rep(c("a","b","c","d","e","f"),4),
       row = sort(rep(c("1.attic","2.second","3.first","4.mailbox"),6)),
       selected = lp("max", f_obj, f_con, f_dir, f_rhs, all.bin = TRUE)$solution,
       prices = rep(f_obj,1)
       ) %>%
  filter(selected == 1) %>% 
  group_by(row) %>% 
  mutate(floor_sum = sum(prices))
## # A tibble: 6 x 5
## # Groups:   row [4]
##   prize row       selected prices floor_sum
##   <chr> <chr>        <dbl>  <dbl>     <dbl>
## 1 f     1.attic          1   7.3       7.3 
## 2 d     2.second         1   5.49      6.98
## 3 e     2.second         1   1.49      6.98
## 4 b     3.first          1   3.49      6.47
## 5 c     3.first          1   2.98      6.47
## 6 a     4.mailbox        1   5.99      5.99

We can see that this solution meets our constraints for the game. Nice!

Celebration!

Functionalize and solve all games

Lastly, lets set up a function to solve all the games.

This function will take a price data frame, and from the optimal solution, output the location of each products that satisfies the constraints.

solution_getter <- function(floor = 25, data){
  
  prices <- as.numeric(data[floor,2:7])
  f.obj <- rep(prices,4)

solution <-
tibble(prize = rep(c("a","b","c","d","e","f"),4),
       floor = sort(rep(c("1.attic","2.second","3.first","4.mailbox"),6)),
       selected = lp("max", f_obj, f_con, f_dir, f_rhs, all.bin = TRUE)$solution,
       prices = rep(f.obj,1)) %>% 
  filter(selected == 1) %>% 
  group_by(floor) %>% mutate(floor_sum = sum(prices)) %>%  ungroup() %>% 
  select(-selected)
return(solution)
}

Given this function, we can find the solution for any game.

Solution to game 1:

solution_getter(floor = 1, data = final_data)
## # A tibble: 6 x 4
##   prize floor     prices floor_sum
##   <chr> <chr>      <dbl>     <dbl>
## 1 f     1.attic     7.3       7.3 
## 2 d     2.second    5.49      6.98
## 3 e     2.second    1.49      6.98
## 4 b     3.first     3.49      6.47
## 5 c     3.first     2.98      6.47
## 6 a     4.mailbox   5.99      5.99

Solution to game 50:

solution_getter(floor = 50,data = final_data)
## # A tibble: 6 x 4
##   prize floor     prices floor_sum
##   <chr> <chr>      <dbl>     <dbl>
## 1 f     1.attic     6.99      6.99
## 2 d     2.second    5.49      6.48
## 3 e     2.second    0.99      6.48
## 4 b     3.first     3.99      5.98
## 5 c     3.first     1.99      5.98
## 6 a     4.mailbox   3.29      3.29

Solve all games

With the help of the magical purrr family of functions, we can map this function over all the data.

purrr::map_dfr(1:(nrow(final_data)), ~solution_getter(floor = .x, data = final_data),.id = "Game_ID") %>% 
  DT::datatable(options = list(pageLength = 6))