Exploration Exercise 2.3

Setup

Dataset: sleep <- ma206data::sleep.
Variable of interest: hours (units: hours).

STEP 1: Ask a research question

Q1. Population of interest, parameter (in words), parameter (symbol)
- Population of interest: All Cadets at West Point
- Parameter (in words): The mean number of hours of sleep Cadets got last night.
- Parameter (symbol): \(\mu\).

STEP 2: Design a study and collect data

Q2. Null hypothesis (words + symbols)
- Null hypothesis (words): The population mean sleep time is 6.25 hours.
- Symbol: \(H_0: \mu = 6.25\).

Q3. Alternative hypothesis (words + symbols)
- Alternative hypothesis (words): The population mean sleep time is less than 6.25 hours.
- Symbol: \(H_a: \mu < 6.25\).

Q4. Sampling plan (n = 75)
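Obtain a roster of the entire Corps and use a random number generator to select a simple random sample of 75 Cadets. Every possible group of 75 Cadets is then equally likely to be chosen, so every individual Cadet has the same chance of being selected.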
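The selection step can be sketched in R with base `sample()`. This is a minimal sketch: the roster of Cadet IDs and the Corps size of 4,400 are hypothetical placeholders, not data from the exercise.

```r
# Hypothetical roster: one placeholder ID per Cadet (not real data)
corps_size <- 4400
roster_ids <- sprintf("C%04d", seq_len(corps_size))

set.seed(206)  # fixes the random draw so it can be reproduced
# Sampling without replacement with equal weights makes every subset of 75 IDs equally likely
srs_ids <- sample(roster_ids, size = 75, replace = FALSE)
head(srs_ids)
```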

Q5. Variable measured
Measure hours of sleep last night. This is a quantitative variable (measured in hours).

Q6. Representativeness (good)
Answers will vary.

Q7. Representativeness (not good) + bias/unbiased fill-in
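- Not representative: The class is a convenience sample from one course and instructor on a single day. Cadets in one instructor's sections may differ systematically from the Corps as a whole, so the sample may over‑ or under‑represent certain groups.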
- Fill‑in: Convenience sampling may be biased, whereas simple random sampling is unbiased.

STEP 3: Explore the data

Q8. Make a histogram of the hours variable.

library(tidyverse)   # loads ggplot2 and the %>% pipe

sleep %>% 
  ggplot(aes(x = hours)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "gray") +
  labs(title = "Sleep Hours (Last Night)",
       x = "Hours",
       y = "Count") +
  theme_minimal()

Q9. Compute the sample statistics (mean, SD, n). Describe the histogram.

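library(knitr)   # kable() for formatted tables

n    <- nrow(sleep)        # sample size
xbar <- mean(sleep$hours)  # sample mean
s    <- sd(sleep$hours)    # sample standard deviation

kable(tibble(
  n          = n,
  mean_hours = round(xbar, digits = 4),
  sd_hours   = round(s, digits = 3)
))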

n	mean_hours	sd_hours
51	6.0098	1.116

Shape: roughly symmetric and bell-shaped (approximately normal).
Center: sample mean \(\bar{x} =\) 6.01 hours.
Variability: sample standard deviation \(s =\) 1.12 hours.
Outliers: possibly the observation below 4 hours and the one above 8 hours.
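To make the outlier judgment less subjective, one common rule of thumb (the 1.5 × IQR rule used to draw boxplot whiskers; not necessarily the rule your course uses) flags values more than 1.5 interquartile ranges beyond the quartiles. A minimal sketch; `flag_iqr_outliers` is a hypothetical helper name.

```r
# Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
flag_iqr_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence_width <- 1.5 * (q[2] - q[1])   # 1.5 times the interquartile range
  lower <- q[1] - fence_width
  upper <- q[2] + fence_width
  x < lower | x > upper                # TRUE marks a flagged value
}

# Usage on the exercise data (requires the sleep data loaded in Setup):
# sleep$hours[flag_iqr_outliers(sleep$hours)]
```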

Q10. Preliminary evidence that mean < 6.25?
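Yes, somewhat. The sample mean \(\bar{x} = 6.01\) hours is below 6.25, which is preliminary evidence. However, a difference of about 0.24 hours could plausibly arise from sampling variability alone, so we still need a formal test (inference) before drawing a conclusion.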

STEP 4: Draw inference beyond the data

Q11. If \(H_0\) is true, how do sample means behave?
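By the Central Limit Theorem, the sample means would be approximately normally distributed (for large n), centered at 6.25 with standard deviation \(\sigma/\sqrt{n}\). Since \(\sigma\) is unknown, we estimate it with \(s\), giving the standard error \(s/\sqrt{n}\). Replacing \(\sigma\) with \(s\) is why the test statistic is compared to a \(t\) distribution with \(n-1\) degrees of freedom.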
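A small simulation can make this concrete. The sketch below assumes, purely for illustration, that sleep hours in the population are normal with mean 6.25 and standard deviation equal to the sample SD from Q9 (1.116 hours). It draws many samples of n = 51 and checks that the sample means center near 6.25 with spread near \(s/\sqrt{n}\).

```r
set.seed(2023)                # reproducible simulation
mu0    <- 6.25                # null value of the population mean
s_obs  <- 1.116               # sample SD from Q9, used as a stand-in for sigma (assumption)
n_obs  <- 51                  # sample size from Q9
n_reps <- 10000               # number of simulated samples

# Each replicate: draw n_obs hours from the assumed null population and keep the mean
sim_means <- replicate(n_reps, mean(rnorm(n_obs, mean = mu0, sd = s_obs)))

c(center = mean(sim_means),   # should be close to 6.25
  spread = sd(sim_means),     # should be close to s / sqrt(n)
  theory = s_obs / sqrt(n_obs))
```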

Q12. Calculate \(\frac{s}{\sqrt{n}}\) (the standard error: the estimated standard deviation of the null distribution).

sd_null <- s / sqrt(n)
sd_null

[1] 0.1562365
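As a hand check of the R output, plugging in the rounded values from Q9 gives the following (the small difference in the fourth decimal comes from rounding \(s\) to 1.116):

\[
SE = \frac{s}{\sqrt{n}} = \frac{1.116}{\sqrt{51}} \approx \frac{1.116}{7.1414} \approx 0.1563
\]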

Q13. Is the observed mean surprising under \(H_0: \mu=6.25\)?

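null <- 6.25
diff_from_null <- xbar - null
# Compute the standardized difference before building the table: inside
# tibble(), a later column that names `sd_null` would use the rounded
# column just created rather than the unrounded value.
std_diff <- diff_from_null / sd_null

kable(tibble(
  xbar              = round(xbar, digits = 4),
  mu0               = round(null, digits = 4),
  diff              = round(diff_from_null, digits = 4),
  sd_null           = round(sd_null, digits = 4),
  standardized_diff = round(std_diff, digits = 4)
))

xbar	mu0	diff	sd_null	standardized_diff
6.0098	6.25	-0.2402	0.1562	-1.5374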

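If the standardized difference is large in magnitude (a common rule of thumb is beyond about ±2), the sample mean is surprising under the null. Here it is about −1.54: the sample mean is roughly 1.5 standard errors below 6.25, which is within ±2, so the observed mean is not very surprising if \(\mu = 6.25\).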
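Written out with the rounded values from the table, the standardized difference is:

\[
\frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{6.0098 - 6.25}{0.1562} = \frac{-0.2402}{0.1562} \approx -1.54
\]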

Q14. Calculate the t‑statistic and p‑value for \(H_a: \mu < 6.25\).

t_stat <- (xbar - null) / sd_null   # same standardized statistic as in Q13
p_val  <- pt(t_stat, df = n - 1)    # lower-tail area under t with n - 1 df (Ha: mu < 6.25)

kable(tibble(
  t_statistic = sprintf("%.3f", t_stat),
  p_value     = sprintf("%.5f", p_val)
))

t_statistic	p_value
-1.537	0.06525
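As a cross-check, base R's `t.test()` runs the same one-sample, lower-tailed test in a single call. The sketch below verifies the hand computation on a small made-up vector (`demo_hours` is hypothetical, not the exercise data); with the exercise data the call would be `t.test(sleep$hours, mu = 6.25, alternative = "less")`.

```r
# Hypothetical example data (NOT the exercise's sleep data)
demo_hours <- c(5.5, 6, 7, 6.5, 4.5, 6, 5, 7.5, 6, 5.5)
mu0 <- 6.25

# Hand computation, mirroring Q12-Q14
se_demo <- sd(demo_hours) / sqrt(length(demo_hours))
t_hand  <- (mean(demo_hours) - mu0) / se_demo
p_hand  <- pt(t_hand, df = length(demo_hours) - 1)   # lower-tail area

# Built-in one-sample t-test with the same alternative
tt <- t.test(demo_hours, mu = mu0, alternative = "less")

c(t_hand = t_hand, t_builtin = unname(tt$statistic),
  p_hand = p_hand, p_builtin = tt$p.value)
```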

Report \(t\) to 3 decimals and the \(p\)-value to 5 decimals.
Conclusion: Since the \(p\)-value (0.06525) is greater than the significance level \(\alpha = 0.05\), we fail to reject \(H_0\). There is insufficient evidence to conclude that the mean sleep time of all Cadets is less than 6.25 hours.
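Equivalently, the decision can be made by comparing \(t\) with a lower-tail critical value. A minimal sketch with base R's `qt()` and `pt()`, using \(\alpha = 0.05\) and 50 degrees of freedom (n − 1 with n = 51 from Q9). The `t_values` are illustrative inputs; only −1.537 is the observed statistic from Q14.

```r
alpha <- 0.05
df_t  <- 50   # n - 1 with n = 51 from Q9

# Lower-tail critical value: reject H0 when t < t_crit
t_crit <- qt(alpha, df = df_t)

# The p-value rule and the critical-value rule agree for any observed t
t_values       <- c(-2.5, -1.7, -1.537, 0)
reject_by_p    <- pt(t_values, df = df_t) < alpha
reject_by_crit <- t_values < t_crit
data.frame(t = t_values, reject_by_p, reject_by_crit)
```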

STEP 5: Formulate conclusions

Q15. Can we generalize?
Because the data come from three sections of one instructor’s course on one day, it is a convenience sample. Results may not generalize to the entire Corps. Use caution!

STEP 6: Look back and ahead

Q16. Concerns about design and conclusions
- One‑night snapshot may not represent typical sleep.
- Convenience sampling, not random.
- Limited to one instructor’s sections.
- Self‑reported hours may have error.
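- Practical vs. statistical significance: a difference of about a quarter hour might not matter in practice even if it were statistically significant.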

Q17. Next steps
- Use random or stratified random sampling.
- Collect across multiple nights, sections, majors.
- Consider collecting covariates (major, class, workload, etc.).
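Stratified random sampling can be sketched in base R by splitting a roster into strata and drawing a simple random sample within each. The roster, the class-year strata, and the five-per-stratum allocation below are hypothetical placeholders; equal allocation is just one choice (proportional allocation is another).

```r
set.seed(206)
# Hypothetical roster: placeholder ID and class year for each Cadet
roster <- data.frame(
  id         = sprintf("C%04d", 1:400),
  class_year = rep(c("Plebe", "Yuk", "Cow", "Firstie"), each = 100)
)

# Draw a simple random sample of the same size within each class year (stratum)
per_stratum  <- 5
strata       <- split(roster, roster$class_year)
sampled      <- lapply(strata, function(grp) grp[sample(nrow(grp), per_stratum), ])
strat_sample <- do.call(rbind, sampled)

table(strat_sample$class_year)
```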

Exploring Further

Q18. If you rejected \(H_0\), does that prove \(\mu < 6.25\)?
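No. Hypothesis tests provide evidence, not proof. Even when \(H_0\) is true, a test at \(\alpha = 0.05\) rejects it about 5% of the time (a Type I error), and the conclusion also depends on the sampling method and model assumptions.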

Q19. If you rejected \(H_0: \mu = 6.5\), does that prove \(\mu \neq 6.5\)?
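No. Rejecting \(H_0\) means the observed data would be unlikely if \(\mu = 6.5\), but it does not prove \(\mu \neq 6.5\): a Type I error (rejecting a true null) is always possible. A confidence interval gives a range of plausible values for \(\mu\), an interval estimate rather than definitive proof.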
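A one-sample \(t\) confidence interval, \(\bar{x} \pm t^{*}\, s/\sqrt{n}\), can be sketched by hand and checked against `t.test()`. In this minimal sketch `demo_hours` is a hypothetical vector, not the exercise data.

```r
demo_hours <- c(5.5, 6, 7, 6.5, 4.5, 6, 5, 7.5, 6, 5.5)  # hypothetical data
n_demo <- length(demo_hours)
conf   <- 0.95

t_star  <- qt(1 - (1 - conf) / 2, df = n_demo - 1)     # two-sided critical value
se      <- sd(demo_hours) / sqrt(n_demo)               # standard error of the mean
ci_hand <- mean(demo_hours) + c(-1, 1) * t_star * se   # lower and upper bounds

ci_builtin <- t.test(demo_hours, conf.level = conf)$conf.int  # two-sided by default
rbind(ci_hand, ci_builtin = as.numeric(ci_builtin))
```

A 95% interval that excludes 6.5 corresponds to rejecting \(H_0: \mu = 6.5\) against a two-sided alternative at \(\alpha = 0.05\).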