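# Given values
n <- 2015        # sample size
phat <- 0.311    # sample proportion living in a different state from birth
null <- 0.362    # hypothesized proportion under H0
# Standard deviation of phat under H0
SD_pi <- sqrt(null * (1 - null) / n)
# Standardized test statistic
z_stat <- (phat - null) / SD_pi
# Two-sided p-value from the standard normal distribution
p_value <- 2 * (1 - pnorm(abs(z_stat)))
SD_pi    # [1] 0.010706
z_stat   # [1] -4.763685
p_value  # [1] 1.900887e-06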
This exercise uses data from the 2024 General Social Survey (GSS), in which 2,015 adult Americans were surveyed and 31.1% reported living in a different state from where they were born.
We treat this as a categorical variable (“different state” vs. “same state”) and use z-tests and normal approximations to the binomial.
Q1. Identify the population and sample in this survey.
Population: All adult Americans aged 18 and older.
Sample: The 2,015 adults surveyed in the 2024 GSS.
Q2. Is it reasonable to believe that the sample of 2,015 adult Americans is representative of the population of all adult Americans? Justify your answer in terms of how the data were collected.
Yes, provided the respondents were selected at random.
The GSS employs probability-based sampling of U.S. households, so the sample can reasonably be treated as representative, which supports generalization to the broader adult U.S. population.
Q3. Is the value 31.1% a statistic or a parameter? Which symbol is typically used to represent this quantity?
The value 31.1% is a statistic, because it is calculated from the sample of 2,015 adults. It is typically denoted \(\hat{p}\).
Q4. Identify (in words) the population parameter that the General Social Survey is attempting to estimate.
The population parameter is \(\pi\), the true proportion of all adult Americans who currently live in a different state from where they were born.
Q5. Is it reasonable to conclude that exactly 31.1% of all adult Americans currently live in a different state from where they were born? Explain why or why not.
It is not reasonable to conclude that exactly 31.1% of all adult Americans live in a different state from where they were born.
Sampling variation means that if another random sample of 2,015 were drawn, \(\hat{p}\) would likely differ.
Instead, we estimate a plausible range for \(\pi\) using confidence intervals.
Q6. Although we expect \(\pi\) to be close to 0.311, suppose we test
\(H_0: \pi = 0.362\) vs. \(H_A: \pi \neq 0.362\).
Use R to calculate the standardized test statistic and the two-sided p-value. Based on your result, would you reject or fail to reject the null hypothesis at \(\alpha = 0.05\)?
[1] 0.010706
[1] -4.763685
[1] 1.900887e-06
Interpretation: The z-statistic measures how many standard deviations \(\hat{p}\) is from the null value; here \(\hat{p}\) is about 4.76 standard deviations below 0.362.
The p-value (about \(1.9 \times 10^{-6}\)) is far below \(\alpha = 0.05\), so we reject \(H_0\).
Q7. Interpret the standard deviation under the null hypothesis that you found in Question #6. Explain, in context, what this value tells you.
\(SD_{\pi}\) (SD_pi in the R code), about 0.0107, is the expected variability in \(\hat{p}\) if the population proportion were truly 0.362.
It tells us how much \(\hat{p}\) would vary across repeated samples of size 2,015 if \(H_0\) were true.
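One way to see what this standard deviation describes is to simulate many samples under the null hypothesis. The sketch below is an illustration rather than part of the original solution; the seed, the number of replications (`reps`), and the name `sim_phat` are assumptions chosen for the example.

```r
# Simulate sample proportions assuming H0 (pi = 0.362) is true
set.seed(1)            # arbitrary seed, for reproducibility only
n <- 2015
null <- 0.362
reps <- 10000          # number of simulated samples (an assumption)

# Each draw is the count of "different state" responses in one sample of size n
sim_phat <- rbinom(reps, size = n, prob = null) / n

sd(sim_phat)                   # empirical SD of the simulated proportions
sqrt(null * (1 - null) / n)    # theoretical SD under H0, about 0.0107
```

The two values should agree closely, which is what it means for SD_pi to describe the spread of \(\hat{p}\) across repeated samples.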
Q8. Now consider \(\pi = 0.50\). Is this a plausible value for the population proportion \(\pi\)?
Test \(H_0: \pi = 0.50\) vs. \(H_A: \pi \neq 0.50\).
Report your test statistic, p-value, and conclusion given \(\alpha = 0.05\).
[1] 0.01113865
[1] -16.96795
[1] 0
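Conclusion: The sample proportion is about 17 standard deviations below 0.50 and the p-value is far below \(\alpha = 0.05\), so we reject \(H_0\).
A true proportion of 0.50 is not plausible.
Note: the \(p\)-value is not literally zero; it is too small for R to distinguish from zero with this formula, so it prints as 0.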
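The output above is shown without the code that produced it. A minimal sketch that follows the same pattern as Question #6 might look like this; the names `null_50`, `SD_50`, `z_50`, `p_50`, and `p_50_tail` are illustrative. It also computes the p-value from the lower tail, which avoids the rounding to zero that `1 - pnorm(...)` produces when |z| is very large.

```r
n <- 2015
phat <- 0.311
null_50 <- 0.50

SD_50 <- sqrt(null_50 * (1 - null_50) / n)   # SD of phat under H0: pi = 0.50
z_50 <- (phat - null_50) / SD_50             # standardized test statistic

# Same formula as Question #6; pnorm(abs(z)) rounds to 1, so this gives 0
p_50 <- 2 * (1 - pnorm(abs(z_50)))

# Lower-tail version: mathematically the same, but keeps the tiny value
p_50_tail <- 2 * pnorm(-abs(z_50))

SD_50
z_50
p_50
p_50_tail
```

Both versions lead to the same conclusion; the lower-tail form simply makes it visible that the p-value is positive.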
Q9. Calculate the standard error (SE\(_{\hat{p}}\)) for this study.
How does it compare to the SD\(_\pi\) values from Questions #7 and #8? Explain why they differ.
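\(SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n} \approx 0.0103\). SE_phat differs from SD_pi because it substitutes \(\hat{p}\) for the hypothesized \(\pi\).
It is slightly smaller than both null values (about 0.0107 and 0.0111) because \(p(1-p)\) shrinks as \(p\) moves away from 0.5, and \(\hat{p}=0.311\) is farther from 0.5 than 0.362 or 0.50.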
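The comparison can be checked numerically. A short sketch, using `SE_phat`, `SD_362`, and `SD_50` as assumed names:

```r
n <- 2015
phat <- 0.311

SE_phat <- sqrt(phat * (1 - phat) / n)     # uses the sample proportion
SD_362 <- sqrt(0.362 * (1 - 0.362) / n)    # null SD from Question #6
SD_50 <- sqrt(0.50 * (1 - 0.50) / n)       # null SD from Question #8

# Print the three values side by side
c(SE_phat = SE_phat, SD_362 = SD_362, SD_50 = SD_50)
```

With \(n\) held fixed, the ordering of the three values depends only on \(p(1-p)\), which is largest at \(p = 0.5\).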
Q10. Calculate and interpret a 95% confidence interval for \(\pi\).
Explain what the interval means in the context of this study.
[1] 0.2907884 0.3312116
Interpretation: We are 95% confident that the true proportion of all adult Americans who live in a different state from where they were born is between about 0.291 and 0.331.
About 95% of such intervals from repeated samples would contain the true \(\pi\).
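The interval above is shown without its code. One way to compute it, assuming the critical value comes from `qnorm(0.975)` rather than the rounded 1.96 (the names `z_star` and `ci_95` are illustrative):

```r
n <- 2015
phat <- 0.311

SE_phat <- sqrt(phat * (1 - phat) / n)   # estimated standard error
z_star <- qnorm(0.975)                   # critical value for 95% confidence, about 1.96

# Point estimate plus and minus the margin of error
ci_95 <- phat + c(-1, 1) * z_star * SE_phat
ci_95
```

Using 1.96 in place of `qnorm(0.975)` changes the bounds only in the later decimal places.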
Q11. Calculate a 99% confidence interval for \(\pi\).
Compare it to your 95% interval. How do the midpoint and margin of error change?
[1] 0.2844375 0.3375625
Comparison: The midpoint (0.311) stays the same, but the interval widens.
The higher confidence level increases the margin of error from about 0.020 to about 0.027.
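To compare the two intervals directly, both margins of error can be computed in one step. This is a sketch with assumed names (`conf_levels`, `z_star`, `margin`), not the code used for the output above:

```r
n <- 2015
phat <- 0.311
SE_phat <- sqrt(phat * (1 - phat) / n)

conf_levels <- c(0.95, 0.99)
z_star <- qnorm(1 - (1 - conf_levels) / 2)   # critical values, about 1.96 and 2.576
margin <- z_star * SE_phat                   # margin of error at each level

# One row per confidence level; both intervals are centered at phat
data.frame(
  level = conf_levels,
  lower = phat - margin,
  upper = phat + margin,
  margin = margin
)
```

Only the critical value changes between the two rows, so the midpoint is unchanged and the width grows in proportion to \(z^*\).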
Q12. Suppose that the GSS had only taken a sample size of \(n = 215\).
How would this change your confidence interval?
[1] 0.2491234 0.3728766
Interpretation: A smaller \(n\) increases the standard error, so the interval widens (a margin of error of about 0.062 instead of about 0.020) and the estimate is less precise.
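The effect of sample size can be sketched by computing the 95% margin of error for both values of \(n\). The names `sizes`, `SE_by_n`, and `margin_by_n` are assumptions for this example, and the bounds may differ from the printed output in the later decimal places depending on whether the critical value is rounded to 1.96.

```r
phat <- 0.311
sizes <- c(215, 2015)                        # hypothetical and actual sample sizes

SE_by_n <- sqrt(phat * (1 - phat) / sizes)   # standard error for each n
margin_by_n <- qnorm(0.975) * SE_by_n        # 95% margin of error for each n

# Interval bounds, one column per sample size
rbind(lower = phat - margin_by_n, upper = phat + margin_by_n)

# Ratio of the two margins, equal to sqrt(2015 / 215)
margin_by_n[1] / margin_by_n[2]
```

Because the margin of error scales with \(1/\sqrt{n}\), reducing the sample from 2,015 to 215 roughly triples it.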
| Concept | Formula | When Used |
|---|---|---|
| Standard deviation under null | \(\sqrt{\pi(1-\pi)/n}\) | Hypothesis tests (\(H_0\) assumed true) |
| Standard error (estimated) | \(\sqrt{\hat{p}(1-\hat{p})/n}\) | Confidence intervals (sample-based) |
| z-statistic | \((\hat{p}-\pi_0)/SD_\pi\) | Tests population proportion hypotheses |
| Confidence interval | \(\hat{p} \pm z^* SE_{\hat{p}}\) | Estimates plausible range for \(\pi\) from data |