Lesson 11: One Proportion Z-Test


Milestone 3: Let’s Talk

📄 General Instructions

Deliverables

  1. Academic Articles Worksheet
  2. Introduction & Data Section
Note

Reminder: Also add both items to your binder with an updated Annex B (not graded yet).

Exploration Exercise 1.5

One Proportion Z-Test

[Photos: Cal and Reese]

Little League On-Base Percentage

During the Little League World Series run by my son and his team, Cal got on base \(18\) out of \(35\) times. Across all Texas East Little Leagues, the average on-base percentage was about \(40\%\).

Question: Is Cal really better than average, or is this just by chance?

Framing the Problem

If the average player gets on base at a \(40\%\) rate, what is the probability that we would observe an on-base proportion of \(18/35 \approx 0.514\) or higher?

In other words: if Cal were truly a 40% hitter, how often would we see a season this good (or better) just by chance?

Simulating a Season

Let’s pretend we could replay Cal’s season many times under the assumption he is a \(40\%\) hitter.

library(tidyverse)

# simulate one season
one_season <- rbinom(n = 35, size = 1, prob = 0.4)
one_season
 [1] 0 0 0 1 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 1 1
# total times on base
on_base_at_bats <- sum(one_season)
on_base_at_bats
[1] 11
# as a proportion
on_base_at_bats / length(one_season)
[1] 0.3142857
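Another way to picture one simulated season, offered here as a sketch rather than the lesson's method, is to draw 35 plate appearances from outcomes labeled "on base" and "out" with probabilities 0.4 and 0.6. The object name `pa` and the seed value are arbitrary choices for illustration.

```r
# One simulated season of 35 plate appearances under the 40% assumption,
# drawn as labeled outcomes instead of 0/1 values
set.seed(2024)  # arbitrary seed so the draw is reproducible
pa <- sample(c("on base", "out"), size = 35, replace = TRUE, prob = c(0.4, 0.6))

table(pa)              # counts of each outcome in this season
mean(pa == "on base")  # simulated on-base proportion for this season
```

Each run gives a different season; the spread of these proportions across many runs is what the next step builds.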

Many Simulated Seasons

Now let’s repeat this process \(10{,}000\) times.

n <- 10000

sim_results <- tibble(
  trial = 1:n,
  on_base = rbinom(n, size = 35, prob = 0.4)   # times on base in each simulated season
) |> 
  mutate(proportion = on_base / 35)

head(sim_results)
# A tibble: 6 × 3
  trial on_base proportion
  <int>   <int>      <dbl>
1     1      13      0.371
2     2      11      0.314
3     3      17      0.486
4     4      17      0.486
5     5      11      0.314
6     6      15      0.429

Distribution of Outcomes

Here’s the distribution of on-base proportions from the simulations.

on_base_hist <- sim_results |>
  ggplot(aes(x = proportion)) +
  geom_histogram(binwidth = 1/35, boundary = 0, fill = "skyblue", color = "white") 

on_base_hist

Compare to Observed

Now let’s add Cal’s observed rate (\(0.514\)):

on_base_hist +
  geom_vline(xintercept = 0.514, color = "firebrick", linetype = 5, linewidth = 2)

Probability of Being This Good or Better

Finally, what proportion of simulated seasons were at least this extreme?

sim_results |> 
  summarise(prob_more_extreme = mean(proportion >= 0.514))
# A tibble: 1 × 1
  prob_more_extreme
              <dbl>
1             0.113
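Because each simulated season is a binomial count, the simulated probability can be checked against the exact binomial tail \(P(X \ge 18)\) for \(X \sim \text{Binomial}(35, 0.40)\). This cross-check is our addition, not part of the original activity; `pbinom()` and `binom.test()` are base R functions.

```r
# Exact probability of 18 or more times on base in 35 plate appearances
# if the true on-base probability is 0.40
p_exact <- 1 - pbinom(17, size = 35, prob = 0.40)
p_exact

# binom.test() with a right-tailed alternative reports the same tail probability
binom.test(x = 18, n = 35, p = 0.40, alternative = "greater")$p.value
```

The simulated value should land close to `p_exact`; the gap tends to shrink as the number of simulated seasons grows.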

Conclusion

In our simulation, about 11% of seasons (0.113) produced an on-base proportion this high or higher, even though every simulated season assumed Cal was truly a 40% hitter.

➡️ An 11% chance is not rare, so his observed \(0.514\) season could plausibly be due to chance alone; the evidence that he is better than average is not convincing.

Let’s Formalize This with Tintle’s 6 Steps

Step 1. Ask a Research Question

Is Cal’s on-base percentage higher than the Texas East Little League average of 40%?

Step 2. Design a Study & Collect Data

We have observational data from Cal’s 35 plate appearances during the Little League World Series run:

  • \(n = 35\) plate appearances
  • \(x = 18\) times on base
  • Observed proportion: \(\hat \pi = \tfrac{18}{35} \approx 0.514\)

We treat these 35 plate appearances as a random sample from his true underlying ability, with each plate appearance an independent chance to get on base.

Step 3. Explore the Data

The observed proportion of \(0.514\) is above the reference average of \(0.40\). This is about 11 percentage points higher. The key question is whether this difference is large enough to be unlikely by chance.

Step 4. Draw Inferences Beyond the Data

We set up hypotheses:

  • Null hypothesis: \(H_0 : \pi = 0.40\)
  • Alternative hypothesis: \(H_A : \pi > 0.40\)

Compute the standard error and test statistic:

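\[ SE = \sqrt{\frac{\pi_0(1-\pi_0)}{n}} = \sqrt{\frac{0.40(0.60)}{35}} \approx 0.083 \]

Because we are testing \(H_0\), the standard error is computed from the null value \(\pi_0 = 0.40\), not from \(\hat{\pi}\).

\[ z = \frac{\hat{\pi} - \pi_0}{SE} = \frac{0.514 - 0.40}{0.083} \approx 1.37 \]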

Step 5. Formulate Conclusions

The one-tailed \(p\)-value is:

\[ p = P(Z \geq 1.37) \approx 0.085 \]

1 - pnorm(q = 1.37, mean = 0, sd = 1)
[1] 0.08534345

At \(\alpha = 0.05\), this result is not statistically significant. We do not have strong enough evidence to conclude Cal’s on-base percentage is truly higher than 40%.
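The hand calculation above rounds \(\hat{\pi}\) and \(SE\) before dividing. Below is a minimal sketch of the same right-tailed test computed from the raw counts. Base R's `prop.test()` with `correct = FALSE` (no continuity correction) runs the equivalent score test, so its p-value should agree with the z-based one. Unrounded, \(z\) comes out slightly larger than 1.37 and the p-value slightly smaller than 0.085, which does not change the conclusion.

```r
x   <- 18     # times on base
n   <- 35     # plate appearances
pi0 <- 0.40   # null on-base proportion

pihat  <- x / n
SE     <- sqrt(pi0 * (1 - pi0) / n)          # SE computed under H0
z_stat <- (pihat - pi0) / SE
p_val  <- pnorm(z_stat, lower.tail = FALSE)  # right-tailed p-value
c(z = z_stat, p = p_val)

# Same test via prop.test(); its X-squared statistic equals z^2
check <- prop.test(x = x, n = n, p = pi0, alternative = "greater", correct = FALSE)
check$p.value
```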

Step 6. Look Back & Ahead

  • The observed rate of \(0.514\) is higher than average and suggestive (\(p \approx 0.085\)).
  • With only 35 plate appearances, there is considerable variability — a larger sample would provide a clearer answer.
  • Context matters: 35 plate appearances may not represent long-term ability. Future seasons, or seasons with more plate appearances, could confirm or refute this pattern.
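To make the "larger sample" point concrete, here is a hypothetical what-if. Suppose Cal kept the same observed rate, \(\hat{\pi} = 18/35\), over more plate appearances. The sketch below finds the smallest \(n\) for which the right-tailed p-value drops to 0.05 or below. This is an illustration under that assumption, not a prediction.

```r
pihat  <- 18 / 35   # assumed to stay fixed as n grows (hypothetical)
pi0    <- 0.40
n_grid <- 35:100    # candidate numbers of plate appearances

# Right-tailed p-value at each n, with SE computed under H0
p_vals <- pnorm((pihat - pi0) / sqrt(pi0 * (1 - pi0) / n_grid), lower.tail = FALSE)

min_n <- min(n_grid[p_vals <= 0.05])
min_n   # → 50
```

The same observed rate becomes statistically significant once the sample is large enough, which is why sample size matters so much here.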

Let’s Further Define a Few Things

Null Hypothesis

The null hypothesis (\(H_0\)) is the starting assumption — usually that there is “no difference” or “no effect.”

  • In our example: \(H_0 : \pi = 0.40\)
  • This means we assume Cal’s true on-base probability is the same as the Texas East Little League average of 40%.

Alternative Hypotheses

The alternative hypothesis (\(H_A\)) is what we want to investigate. Depending on the research question, there are three common forms:

  1. Right-tailed (greater than)
    • \(H_A : \pi > 0.40\)
    • In context: Is Cal’s true on-base percentage higher than 40%?
    • This is the version we are using, because the natural question is whether he’s better than average.
  2. Left-tailed (less than)
    • \(H_A : \pi < 0.40\)
    • In context: Is Cal’s true on-base percentage lower than 40%?
    • You would ask this if you suspected Cal might actually be worse than average at getting on base.
  3. Two-tailed (not equal)
    • \(H_A : \pi \neq 0.40\)
    • In context: Is Cal’s true on-base percentage different from 40% (either higher or lower)?
    • You would use this if you want to know whether Cal performs differently than average, without assuming in advance which direction.

Choosing the Right Alternative

  • If you are asking, “Is Cal better than average?”, the right-tailed test is appropriate.
  • If the concern were that Cal struggles at the plate, you’d use a left-tailed test.
  • If you only care whether Cal is different from average in either direction, the two-tailed test is the right choice.
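The choice of alternative changes only which tail(s) of the \(z\) distribution count as "at least as extreme." Here is a quick sketch using Cal's \(z \approx 1.37\) from the worked example above.

```r
z_stat <- 1.37   # Cal's test statistic from Step 4

p_right <- pnorm(z_stat, lower.tail = FALSE)  # H_A: pi > 0.40
p_left  <- pnorm(z_stat)                      # H_A: pi < 0.40
p_two   <- 2 * pnorm(-abs(z_stat))            # H_A: pi != 0.40

round(c(right = p_right, left = p_left, two_sided = p_two), 4)
# right 0.0853, left 0.9147, two_sided 0.1707
```

A left-tailed test gives a p-value near 0.91 here because Cal's result points the other way. The two-sided p-value is double the right-tailed one.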

The Parameter \(\pi\)

  • \(\pi\) (the Greek letter pi) represents the population proportion: the parameter we are trying to learn about.
  • In context: \(\pi\) is Cal’s true long-run probability of getting on base each plate appearance.
  • We never observe \(\pi\) directly — we estimate it with \(\hat{\pi}\) (the sample proportion).

The Significance Level \(\alpha\)

  • \(\alpha\) (alpha) is the threshold for evidence against the null hypothesis.
  • Common choices: \(\alpha = 0.05\) (5%) or \(\alpha = 0.01\) (1%).
  • Interpretation: If \(p \leq \alpha\), the result is considered statistically significant — unlikely to occur just by chance if \(H_0\) were true.
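The decision rule is a single comparison. Below is a minimal sketch; `decide()` is a hypothetical helper written for this illustration, not a built-in R function.

```r
# Hypothetical helper: apply the "p <= alpha" rule
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "Reject H0" else "Fail to reject H0"
}

decide(0.085)                # Cal's right-tailed test at alpha = 0.05
decide(0.085, alpha = 0.10)  # same p-value, looser threshold
```

The same evidence can lead to different decisions depending on \(\alpha\). This is why \(\alpha\) should be chosen before looking at the data.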

The \(z\) Distribution

The \(z\) distribution (also called the standard normal distribution) is a bell-shaped curve with:

  • Mean = \(0\)
  • Standard deviation = \(1\)

So how do we get from our sample proportion \(\hat{\pi}\) to this special distribution?

Step 1. Sampling Distribution of \(\hat{\pi}\)

Under the null hypothesis \(H_0 : \pi = 0.40\), the sample proportion \(\hat{\pi}\) has a sampling distribution that is approximately normal (by the Central Limit Theorem) with:

  • Mean = \(\pi_0 = 0.40\)
  • Standard deviation = \(SE = \sqrt{\frac{\pi_0(1-\pi_0)}{n}}\)

This tells us what values of \(\hat{\pi}\) we would expect just by chance if the null were true.
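The normal approximation works best when the expected counts of successes and failures under \(H_0\) are not too small. A common rule of thumb is \(n\pi_0 \ge 10\) and \(n(1-\pi_0) \ge 10\), although textbooks differ (some use 5 instead of 10). Here is a minimal check for Cal's data; the threshold of 10 is our assumption.

```r
n   <- 35
pi0 <- 0.40

expected <- c(successes = n * pi0, failures = n * (1 - pi0))
expected             # 14 and 21
all(expected >= 10)  # TRUE: the normal approximation is reasonable here
```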

Step 2. Standardizing

To compare our observed \(\hat{\pi}\) to this distribution, we standardize it:

  1. Subtract the mean under \(H_0\)
    • This centers the distribution at \(0\) by measuring how far away our observed statistic is from the null.
    • \((\hat{\pi} - \pi_0)\)
  2. Divide by the standard deviation
    • This rescales differences into standard deviation units, so we can judge how unusual they are.
    • \(\dfrac{\hat{\pi} - \pi_0}{SE}\)
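The two steps above can be sketched by simulation. Draw many \(\hat{\pi}\) values under \(H_0: \pi = 0.40\) with \(n = 35\), standardize each one, and check that the results have mean near 0 and standard deviation near 1. The seed and the 10,000 replications are arbitrary choices.

```r
set.seed(11)   # arbitrary seed for reproducibility
n   <- 35
pi0 <- 0.40
SE  <- sqrt(pi0 * (1 - pi0) / n)

pihat_sim <- rbinom(10000, size = n, prob = pi0) / n  # simulated proportions under H0
z_sim     <- (pihat_sim - pi0) / SE                   # centered, then rescaled

c(mean = mean(z_sim), sd = sd(z_sim))   # close to 0 and 1
```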

Step 3. The Resulting \(z\)

After centering and rescaling, the new standardized statistic follows (approximately) the standard normal distribution, \(N(0,1)\).

That’s why we call it the \(z\) statistic:

\[ z = \frac{\hat{\pi} - \pi_0}{SE} \]

Now we can use the \(z\) distribution to calculate probabilities (like \(p\)-values) for how extreme our observed result is relative to the null hypothesis.

ggplot() +
  geom_function(fun = dnorm, xlim = c(-4,4))
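The same curve can show a p-value as an area. The sketch below shades the region to the right of Cal's \(z \approx 1.37\) using `stat_function(geom = "area")`. The colors are arbitrary, and cutting the shading off at 4 ignores only a negligible sliver of area.

```r
library(ggplot2)

z_stat <- 1.37   # Cal's test statistic

p_curve <- ggplot() +
  geom_function(fun = dnorm, xlim = c(-4, 4)) +
  # shaded area = right-tailed p-value, P(Z >= 1.37)
  stat_function(fun = dnorm, geom = "area", xlim = c(z_stat, 4),
                fill = "firebrick", alpha = 0.5) +
  geom_vline(xintercept = z_stat, linetype = 2) +
  labs(x = "z", y = "Density",
       title = "Standard normal curve with the right-tailed p-value shaded")

p_curve
```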

Classroom Example: Rock–Paper–Scissors

Class Activity: With your eyes closed, on the count of three, everyone will make their first throw of Rock–Paper–Scissors.

  • 👊 = Rock
  • ✋ = Paper
  • ✌️ = Scissors

Hold your choice steady so we can tally the results. Closing your eyes helps reduce herding or copying from neighbors.

Step 1. Ask a Research Question

Do students tend to choose Rock more or less often than random chance (\(33\%\)) on their first throw?

Step 2. Design a Study & Collect Data

With eyes closed, on the count of 3, each student chooses one option (Rock/Paper/Scissors) as if starting a game. We then tally the class counts:

  • \(R\) = number of Rock
  • \(P\) = number of Paper
  • \(S\) = number of Scissors
  • \(n = R+P+S\)

# Enter the tallies you just collected:
R <- 2   # Rock count
P <- 10   # Paper count
S <- 7   # Scissors count

n <- R + P + S
c(R = R, P = P, S = S, n = n)
 R  P  S  n 
 2 10  7 19 

Step 3. Explore the Data

Compute the sample proportion choosing Rock and make a quick bar chart. Under complete randomness we’d expect each to be near \(n/3\).

pihat <- R / n
pihat
[1] 0.1052632
df <- tibble(option = c("Rock","Paper","Scissors"),
             count  = c(R, P, S))

ggplot(df, aes(option, count)) +
  geom_col() +
  geom_hline(yintercept = n/3, linetype = 2) +
  labs(title = "Class First-Throw Choices",
       subtitle = "Dashed line = expected count if choices were uniform (n/3)",
       x = NULL, y = "Count")

Step 4. Draw Inferences Beyond the Data

We’ll do two things:

  1. Simulate many samples of size \(n\) assuming \(\pi = 1/3\), and estimate a two-sided \(p\)-value by comparing simulated proportions to the observed one.
  2. Do the math version using the \(z\)-distribution.

Simulate samples of size \(n\) under \(H_0 : \pi = 1/3\)

pi0 <- 1/3
N  <- 10000               # number of simulations
R_obs <- R                # keep observed Rock consistent with Step 2
pihat_obs <- R_obs / n

# Simulate many samples under H0
sim_results <- tibble(
  trial = 1:N,
  rocks = rbinom(N, size = n, prob = pi0)   # Rock counts in each simulated sample
) |>
  mutate(pihat = rocks / n)                  # simulated sample proportions

sim_results
# A tibble: 10,000 × 3
   trial rocks pihat
   <int> <int> <dbl>
 1     1     4 0.211
 2     2     8 0.421
 3     3     6 0.316
 4     4     3 0.158
 5     5     8 0.421
 6     6     3 0.158
 7     7     7 0.368
 8     8     5 0.263
 9     9     2 0.105
10    10     2 0.105
# ℹ 9,990 more rows
# Plot the simulated sampling distribution with observed and null marked
sim_results |>
  ggplot(aes(x = pihat)) +
  geom_histogram(binwidth = 1/n, fill = "skyblue", color = "white") +
  geom_vline(xintercept = pihat_obs, color = "firebrick", linetype = 2, linewidth = 1.2) +
  geom_vline(xintercept = pi0, color = "gray40", linewidth = 1.2) +
  labs(title = paste0("Sampling Distribution of p̂ (proportion Rock) under H0 (n = ", n, ")"),
       subtitle = paste0("Observed p̂ = ", round(pihat_obs,3), 
                         " | Null pi0 = ", round(pi0,3)),
       x = "p̂ (proportion Rock)",
       y = "Count") +
  theme_minimal()

# Two-sided simulation p-value
obs_abs_diff <- abs(pihat_obs - pi0)

sim_p_two_sided <- sim_results |> 
  mutate(pihat_different_than_null = pihat - pi0) |>          # difference from null
  mutate(abs_of_difference = abs(pihat_different_than_null)) |> 
  mutate(is_extreme = abs_of_difference >= obs_abs_diff) |> # flag extremes
  summarise(p_val = mean(is_extreme))                       # proportion of extremes = p-value

sim_p_two_sided
# A tibble: 1 × 1
   p_val
   <dbl>
1 0.0492

Two-Sided Test (math/analytic via \(z\))

  • \(H_0: \pi = 1/3\)
  • \(H_A: \pi \neq 1/3\)

\(SE = \sqrt{\frac{\pi_0(1-\pi_0)}{n}}\)

# Standard Error under H0
SE <- sqrt(pi0 * (1 - pi0) / n)
SE
[1] 0.1081476

\[ z = \frac{\hat{\pi} - \pi_0}{SE} \]

# Test Statistic (z)
z_stat <- (pihat_obs - pi0) / SE
z_stat
[1] -2.108878

\[ p = 2 \times P(Z \geq |z|) \]

# Two-Sided p-value
p_val_two_sided <- 2 * (1 - pnorm(abs(z_stat)))
p_val_two_sided
[1] 0.03495507

Step 5. Formulate Conclusions

Now we bring the two approaches together:

  • Simulation gave us an empirical \(p\)-value by resampling under \(H_0\).
  • Math/analytic (\(z\)-test) gave us an approximate \(p\)-value using the standard normal.
              n           R_obs       pihat_obs             pi0          z_stat 
        19.0000          2.0000          0.1050          0.3330         -2.1090 
sim_p_two_sided   z_p_two_sided           alpha 
         0.0492          0.0350          0.0500 
Simulation-based decision: Reject H0 (evidence of ≠ 1/3) 
Z-approximation decision: Reject H0 (evidence of ≠ 1/3) 

Interpretation:
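With \(n = 19\) throws and an observed \(\hat{\pi} = 2/19 \approx 0.105\), the simulation \(p\)-value (\(\approx 0.049\)) tells us how unusual a result this far from \(1/3\) would be if \(\pi = 1/3\) were true. The \(z\) test gives a similar answer (\(\approx 0.035\)) using the theoretical normal curve. Both are below \(\alpha = 0.05\), so both approaches reject \(H_0\). Because \(\hat{\pi}\) is below \(1/3\), the data suggest students choose Rock less often than random chance would predict.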
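One reason the two p-values differ here is sample size. With \(n = 19\) and \(\pi_0 = 1/3\), the expected number of Rocks under \(H_0\) is \(19/3 \approx 6.3\), which falls below the common "at least 10 expected successes" rule of thumb, so the normal curve is only a rough approximation. As a cross-check (our addition, not part of the original activity), base R's `binom.test()` computes an exact binomial p-value for the same counts.

```r
R   <- 2     # observed Rock count
n   <- 19    # class size
pi0 <- 1/3   # null proportion

n * pi0        # expected Rock count under H0, about 6.33
n * (1 - pi0)  # expected non-Rock count, about 12.67

exact <- binom.test(x = R, n = n, p = pi0, alternative = "two.sided")
exact$p.value  # exact two-sided p-value, about 0.048
```

Here the exact p-value sits close to the simulation estimate, and all three approaches fall below \(\alpha = 0.05\).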

Step 6. Look Back & Ahead

library(shiny)
library(ggplot2)
library(dplyr)
library(tibble)

ui <- fluidPage(
  titlePanel("One-Proportion Test: z-formulas + Live Simulation"),
  withMathJax(),
  tags$hr(),
  
  fluidRow(
    column(
      width = 4,
      h4("Inputs"),
      numericInput("x", "Observed successes (x)", value = 2, min = 0, step = 1),
      numericInput("n", "Sample size (n)", value = 19, min = 1, step = 1),
      sliderInput("pi0", HTML("&pi;<sub>0</sub> (null proportion)"), min = 0.01, max = 0.99,
                  value = 1/3, step = 0.01),  # bounds keep SE > 0 (pi0 = 0 or 1 gives SE = 0)
      numericInput("B", "Simulations (B)", value = 10000, min = 100, step = 100),
      numericInput("seed", "Random seed", value = 26, min = 1, step = 1),
      helpText("Tip: This defaults to the R/P/S example: x = 2 (Rock), n = 19, π0 = 1/3.")
    ),
    column(
      width = 8,
      h4("Formulas"),
      # Show the formulas via MathJax
      div(style = "font-size: 1.15em; margin-bottom: 8px;",
          "$$ SE = \\sqrt{\\frac{\\pi_0(1 - \\pi_0)}{n}}, \\qquad z = \\frac{\\hat{p} - \\pi_0}{SE} $$"
      ),
      h4("Computed Values"),
      tableOutput("value_table"),
      tags$br(),
      h4("Sampling Distribution under H0 (Simulated)"),
      plotOutput("hist_plot", height = "330px"),
      helpText("Histogram shows simulated \\(\\hat p\\) under H0. Red line = observed \\(\\hat p\\). Gray line = \\(\\pi_0\\).")
    )
  )
)

server <- function(input, output, session) {
  # Basic reactives
  phat <- reactive({
    req(input$n > 0, input$x >= 0, input$x <= input$n)  # x must lie between 0 and n
    input$x / input$n
  })
  
  SE <- reactive({
    sqrt(input$pi0 * (1 - input$pi0) / input$n)
  })
  
  z_stat <- reactive({
    (phat() - input$pi0) / SE()
  })
  
  # p-values for three alternatives
  p_right <- reactive({ 1 - pnorm(z_stat()) })                 # H_A: p > pi0
  p_left  <- reactive({ pnorm(z_stat()) })                     # H_A: p < pi0
  p_two   <- reactive({ 2 * (1 - pnorm(abs(z_stat()))) })      # H_A: p != pi0
  
  # Simulation under H0
  sim_df <- reactive({
    req(input$B >= 100)
    set.seed(input$seed)
    rocks <- rbinom(input$B, size = input$n, prob = input$pi0)
    tibble(
      phat = rocks / input$n
    )
  })
  
  # Output: table of computed values
  output$value_table <- renderTable({
    tibble::tibble(
      `x (successes)` = input$x,
      `n (trials)`    = input$n,
      `π0 (null)`     = round(input$pi0, 4),
      `p̂ = x/n`      = round(phat(), 4),
      `SE`            = round(SE(), 5),
      `z`             = round(z_stat(), 4),
      `p (right)`     = signif(p_right(), 4),
      `p (left)`      = signif(p_left(), 4),
      `p (two-sided)` = signif(p_two(), 4)
    )
  }, striped = TRUE, bordered = TRUE, spacing = "s", digits = 6)
  
  # Output: histogram with vertical lines at phat and pi0
  output$hist_plot <- renderPlot({
    df <- sim_df()
    ggplot(df, aes(x = phat)) +
      geom_histogram(binwidth = 1 / input$n, color = "white") +
      geom_vline(xintercept = phat(), color = "firebrick", linetype = 2, linewidth = 1.2) +
      geom_vline(xintercept = input$pi0, color = "gray40", linewidth = 1.2) +
      labs(
        x = expression(hat(p) ~ "(proportion)"),
        y = "Count",
        title = "Simulated Sampling Distribution of p̂ under H0",
        subtitle = paste0("n = ", input$n, ", π0 = ", round(input$pi0, 3),
                          ", observed p̂ = ", round(phat(), 3),
                          "; B = ", input$B)
      ) +
      theme_minimal(base_size = 12) +
      coord_cartesian(xlim = c(0, 1))  # zoom to [0, 1] without dropping the edge bins
  })
}

shinyApp(ui, server)

Reference Table: \(z\)-Tests for One Proportion

For all cases:

\[ SE = \sqrt{\frac{\pi_0(1 - \pi_0)}{n}}, \quad z = \frac{\hat{\pi} - \pi_0}{SE} \]

| Alternative Hypothesis | Formula for \(p\)-value | R Code |
|---|---|---|
| \(H_A: \pi > \pi_0\) | \(p = 1 - \Phi(z)\) | p_val <- 1 - pnorm(z_stat) |
| \(H_A: \pi < \pi_0\) | \(p = \Phi(z)\) | p_val <- pnorm(z_stat) |
| \(H_A: \pi \neq \pi_0\) | \(p = 2 \cdot (1 - \Phi(\lvert z \rvert))\) | p_val <- 2 * (1 - pnorm(abs(z_stat))) |

Where:

  • \(\hat{\pi} = x/n\) (sample proportion, with \(x\) the number of successes)
  • \(\pi_0\) = hypothesized proportion under \(H_0\)
  • \(\Phi(\cdot)\) = cumulative distribution function (CDF) of the standard normal distribution.
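The three rows of the table can be wrapped into one reusable function. This is a sketch: the name `one_prop_z()` and its arguments are hypothetical, chosen to mirror the table, and the function does not check the normal-approximation conditions.

```r
# Hypothetical helper for a one-proportion z-test (SE computed under H0)
one_prop_z <- function(x, n, pi0, alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  pihat  <- x / n
  SE     <- sqrt(pi0 * (1 - pi0) / n)
  z_stat <- (pihat - pi0) / SE
  p_val  <- switch(alternative,
                   greater   = 1 - pnorm(z_stat),
                   less      = pnorm(z_stat),
                   two.sided = 2 * (1 - pnorm(abs(z_stat))))
  c(pihat = pihat, SE = SE, z = z_stat, p_value = p_val)
}

one_prop_z(18, 35, 0.40, "greater")   # Cal's on-base example
one_prop_z(2, 19, 1/3, "two.sided")   # Rock-Paper-Scissors example
one_prop_z(46, 60, 0.85, "less")      # discharge-summary board problem
```

Working from unrounded values, the board problem gives \(z \approx -1.81\) and \(p \approx 0.035\), slightly different from the rounded hand calculation but leading to the same decision.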

Board Problem

A hospital claims that 85% of discharge summaries are finalized within 24 hours.
In an audit of 60 summaries, 46 were finalized within 24 hours.

Research Question: Is the true proportion finalized within 24 hours less than 85%?

  • State the hypotheses.
  • Explain (in words) how you would simulate this test (do not actually simulate).
  • Then, perform the mathematical one-proportion \(z\) test.

Step 1. Ask a Research Question

Do fewer than 85% of discharge summaries get finalized within 24 hours?

Step 2. State Hypotheses

  • \(H_0 : \pi = 0.85\)
  • \(H_A : \pi < 0.85\)

Step 3. Collect and Summarize the Data

  • \(n = 60\) summaries
  • \(x = 46\) finalized within 24h
  • \(\hat{\pi} = \tfrac{x}{n} = \tfrac{46}{60} = 0.767\)

Step 4. Draw Inferences

Simulation (conceptual only)

  • Under \(H_0\), simulate many samples of size \(n=60\) with \(\pi_0=0.85\).
  • For each, compute \(\hat{\pi}_{sim}\).
  • Estimate the left-tailed \(p\)-value as the fraction of simulations with \(\hat{\pi}_{sim} \le \hat{\pi}_{obs} = 0.767\).

Mathematical Approach (z-test)

Standard error under \(H_0\):

\[ SE = \sqrt{\frac{\pi_0 (1-\pi_0)}{n}} = \sqrt{\frac{0.85 \cdot 0.15}{60}} \approx 0.046 \]

Test statistic:

\[ z = \frac{\hat{\pi} - \pi_0}{SE} = \frac{0.767 - 0.85}{0.046} \approx -1.80 \]

\(p\)-value (left-tailed):

\[ p = \Phi(z) = \Phi(-1.80) \approx 0.036 \]

pnorm(-1.8)
[1] 0.03593032

Step 5. Make a Decision

At \(\alpha = 0.05\), since \(p \approx 0.036 < 0.05\), we reject \(H_0\).

Step 6. State a Conclusion in Context

There is statistical evidence that fewer than 85% of discharge summaries are completed within 24 hours.

Before you leave

Today:

  • Any questions for me?

Upcoming Graded Events