Lecture 3.3 - Hypothesis Testing

Author

Professor MacDonald

Published

April 16, 2025

Hypothesis tests

Hypotheses
$p$ values
The reasoning of hypothesis testing
A hypothesis test for the mean
Intervals and tests
$p$ values and decisions: what to tell about a hypothesis test

Hypotheses

King County

Percent of houses with a view

11% of houses in King County have a view
Put ourselves in the shoes of a researcher: we only have the budget to drive around and sample 100 houses in a particular neighborhood
Does our sample support the idea that the neighborhood we chose has a higher or lower percentage of houses with a view than the county overall?

Hypotheses

The starting hypothesis to be tested is called the null hypothesis: null because it assumes that nothing has changed.
- We denote it $H_0$
- Called “H naught”
  - $H_0$ : parameter = hypothesized value
  - $H_0$ : has.a.view = 0.11
The alternative hypothesis is not a single value; it contains all other values
- $H_A$ : parameter $\neq$ hypothesized value
- $H_A$ : has.a.view $\neq$ 0.11

How small to convince us?

If the percent of houses with a view in our sample is 16%, we would be skeptical that there was any difference
- Not so unlikely to be different only by random chance
If the percent of houses with a view is 31%, it would clearly indicate a change from 11%
- Extremely unlikely that this could happen just by random chance
Can turn to the same reasoning as a confidence interval
- Standard deviation of the sampling distribution, assuming $H_0$ is true
- $SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.11\times0.89}{100}} = 0.031$

Is the rate of houses with a view different?

Idea: simulation
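
A minimal sketch of the simulation idea in Python, assuming the null hypothesis is true, so each of the 100 sampled houses has a view with probability 0.11. The number of simulated samples, the seed, and all names are illustrative choices, not part of the lecture.

```python
import random

P_NULL = 0.11       # county-wide proportion of houses with a view (H0)
N_HOUSES = 100      # houses per sample
N_SAMPLES = 10_000  # number of simulated samples (arbitrary choice)

rng = random.Random(2025)  # fixed seed so the run is repeatable


def simulate_proportion(rng, n, p):
    """Proportion of n simulated houses that have a view."""
    views = sum(1 for _ in range(n) if rng.random() < p)
    return views / n


proportions = sorted(
    simulate_proportion(rng, N_HOUSES, P_NULL) for _ in range(N_SAMPLES)
)

# Middle 95% of the simulated sample proportions
lower = proportions[int(0.025 * N_SAMPLES)]
upper = proportions[int(0.975 * N_SAMPLES)]
print(f"Middle 95% of simulated samples: {lower:.2f} to {upper:.2f}")
```

Because sample proportions from 100 houses can only take values in steps of 0.01, the simulated endpoints may differ by a percentage point from a Normal-model calculation.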

Is the rate of houses with a view different?

In 95% of samples of this size, the percent of houses with a view falls between about 5% and 17% just by chance ($0.11 \pm 2 \times 0.031$)
So if the neighborhood matches the county, most samples should show between 5% and 17% of houses with a view
If we surveyed a neighborhood and actually saw 14%
Surprising?
Should we conclude that the percent of houses with a view in this neighborhood is different overall compared to King County?

CLT means we can use the Normal model

Central Limit Theorem: use Normal model to find probability instead of using a simulation
View rate of 0.14, 0.03 from null hypothesis, 0.11
Use the SD = 0.031 to find the area in the tails that lies more than $z = 0.03/0.031 \approx 0.97$ standard deviations away from the null hypothesis value
Indicates how rare the observed rate is
Probability is about 0.17 in each tail, or about 0.33 total, that we would get an observation this far or farther from the mean by chance
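
The Normal-model calculation above can be sketched in Python with only the standard library. Here `NormalDist().cdf` plays the role that `pnorm` plays in R; the variable names are illustrative. The sketch keeps the unrounded SD, so its output can differ slightly from a hand calculation that uses 0.031.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.11     # null hypothesis proportion
p_hat = 0.14  # observed sample proportion
n = 100       # sample size

sd = sqrt(p0 * (1 - p0) / n)  # SD of the sampling distribution under H0
z = (p_hat - p0) / sd         # number of SDs between the observation and p0
one_tail = 1 - NormalDist().cdf(abs(z))
two_tail = 2 * one_tail       # "this far or farther" in either direction

print(f"SD = {sd:.3f}, z = {z:.2f}")
print(f"one tail = {one_tail:.3f}, both tails = {two_tail:.3f}")
```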

Conclusion

A difference in view percentages of houses of 0.03 or larger would happen about 33% of the time just by chance.
Does/Doesn’t seem very unusual.
- Both?
So observed proportion of 0.14 does/doesn’t provide evidence that this sample has a different percentage of houses with a view
- Borderline

A trial as a hypothesis test

Begin with the presumption of innocence ( $H_0$ )
Collect evidence
- Bank money in house
- Still wearing mask
- Getaway car found in his name
Evidence beyond a reasonable doubt?
- Is 5% small enough chance?
- How about 1%?
- 6.7%?

A trial as a hypothesis test - decision

Beyond a reasonable doubt - ambiguous
Jury does not use probability to decide.
Null hypothesis - quantify exactly how surprising the evidence would be if the null hypothesis were true
How unlikely is unlikely?
1 out of 20, 5%, or 0.05
1 out of 100, 1%, or 0.01
You must judge for yourself in each situation whether the probability of observing your data is small enough to constitute “reasonable doubt.”

$p$ values

The $p$ value and surprise

The $p$ value is the probability of seeing data like these (or even more unlikely data) given the null hypothesis is true.
Tells us how surprised we would be to get these data given $H_0$ is true.
- $p$ value is very low: Either $H_0$ is not true or something remarkable occurred. Reject $H_0$ .
- $p$ value is high: Not a surprise. Data consistent with the model. Do not reject $H_0$ .

Guilty or not enough evidence?

Defendant is either:
- Guilty: $p$ value too small. The evidence is clear.
- Not Guilty: $p$ value not small enough. The evidence is not sufficient. Not the same as innocent.
  - May be innocent or may be guilty, but not enough evidence found.
Two Choices
- Fail to reject $H_0$ if $p$ value large.
  - Never accept $H_0$ .
- Reject $H_0$ if $p$ value is small. Accept $H_A$ .

When the $p$ value is not small

It is wrong to say:
- Accept $H_0$ .
- We have proven $H_0$ .
It is correct to say:
- Fail to reject $H_0$
- There is insufficient evidence to reject $H_0$ .
- $H_0$ may or may not be true.
Example: $H_0$: All swans are white.
- If we sample 100 swans that are all white, there could still be a nonwhite swan.

The reasoning of a hypothesis test

Step 1: state the hypothesis

$H_0$:
- $H_0$ usually states that there’s nothing different.
- $H_0:$ parameter = hypothesized value
- Note the parameter describes the population not the sample.
- $H_0$ is called the null hypothesis.
$H_A$:
- $H_A$ is a statement that something has changed, gotten bigger or smaller
- $H_A$ is called the alternative hypothesis.

Hypotheses about breakfast

What would our hypotheses be if we did a study of DKU students and compared it to the percentage found in this paper (treat it as the population mean)?

Hypotheses about breakfast

The paper claims that 65.6% of students eat breakfast.
In a survey of 26 students at DKU, 11 ate breakfast.
Is there evidence that the DKU breakfast rate differs from 65.6%?
- $H_0: p = 0.656$
- $H_A: p \neq 0.656$

Step 2: the model

Decide on the model to test the null hypothesis and parameter.
Check conditions, e.g. independence, sample size.
If the conditions are not met, either quit or redesign the study.
Normal models use $z$ scores. Other models may not use $z$ scores.
Name the model, e.g. 1-proportion $z$ test.

Step 2a: 1-proportion $z$ test

Conditions: same as a 1-Proportion $z$ interval
Null hypothesis: $H_0: p = p_0$
Test statistic
- $z = \frac{\hat{p}-p_0}{SD(\hat{p})}$
- $SD(\hat{p}) = \sqrt{\frac{p_0q_0}{n}}$
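
A minimal sketch of the test statistic as a reusable Python function, assuming a two-sided alternative. The function name `one_proportion_z_test` and the example numbers (60 successes in 100 trials, $p_0 = 0.5$) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided p value) for H0: p = p0 vs HA: p != p0."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)  # uses p0, not p_hat, because H0 is assumed true
    z = (p_hat - p0) / sd
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5
z, p_value = one_proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p value = {p_value:.4f}")
```

The function does not check the conditions for the test; those still need to be verified separately before trusting the result.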

What conditions do we need to check to do a 1-proportion $z$ test?

Step 2b: checking conditions - eating breakfast

Randomization Condition: The 26 students were a random selection of DKU students
10% Condition: 26 is fewer than 10% of the total number of all students who are of interest.
Success/Failure Condition:
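$np_0 = (26)(0.656) \approx 17 \geq 10$: satisfied
$nq_0 = (26)(0.344) \approx 9 < 10$: falls just short
Because $nq_0$ is slightly below 10, the Normal model is only a rough approximation here. We proceed with a 1-proportion $z$ test, but interpret the result with caution.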
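
The Success/Failure arithmetic can be checked with a few lines of Python. This is only a sketch: the threshold of 10 is the rule of thumb used above, and the variable names are illustrative. Note that $nq_0$ comes out just under 10.

```python
n = 26      # students surveyed
p0 = 0.656  # hypothesized proportion who eat breakfast
q0 = 1 - p0

expected_successes = n * p0  # expected number who eat breakfast under H0
expected_failures = n * q0   # expected number who do not

print(f"n*p0 = {expected_successes:.2f}, at least 10: {expected_successes >= 10}")
print(f"n*q0 = {expected_failures:.2f}, at least 10: {expected_failures >= 10}")
```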

Step 3: mechanics

Claim: 65.6% of students eat breakfast.
- 11 of the 26 students in our class ate breakfast, so $\hat{p} = 11/26 = 0.423$.
Find $p$ value
- $SD(\hat{p})=\sqrt{\frac{0.656\cdot0.344}{26}} = 0.093$
- $z$ score of difference = $\frac{0.423 - 0.656}{0.093} = -2.5$
Probability of seeing that large a difference: 2 * pnorm(-2.5) $\rightarrow$ 1.2% of the time we will see a difference from the hypothesized value this large or larger, in either direction
- 98.8% of the time we will see a difference this small or smaller.
- Note: $p$ values usually reported as “this large or larger”
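
The mechanics for the breakfast example can be reproduced in Python as a rough check. `NormalDist().cdf(z)` is used here as the analogue of R's `pnorm(z)`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

ate_breakfast, n = 11, 26  # survey counts
p0 = 0.656                 # hypothesized proportion

p_hat = ate_breakfast / n
sd = sqrt(p0 * (1 - p0) / n)     # SD of the sampling distribution under H0
z = (p_hat - p0) / sd
lower_tail = NormalDist().cdf(z)  # z is negative, so this is the lower tail
p_value = 2 * lower_tail          # both tails, for a two-sided alternative

print(f"p_hat = {p_hat:.3f}, SD = {sd:.3f}, z = {z:.2f}")
print(f"one tail = {lower_tail:.4f}, two-sided p value = {p_value:.4f}")
```

A single tail is about 0.6%; the 1.2% figure corresponds to counting both tails.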

What can we conclude from these results?

Step 4: conclusion

Is the breakfast rate in our sample different from the overall rate for students in the US? $p$ value = 0.012
What can be concluded? What does the $p$ value mean?
- $p$ value = 0.012 $\rightarrow$ Fail to Reject/Reject $H_0$
- If the DKU rate really were 65.6%, a sample this far from it would occur only about 1.2% of the time. At the 0.05 level we reject $H_0$; at the stricter 0.01 level we would not.
- The survey data provide some evidence that the rate at which DKU students eat breakfast is different from the US average.
- This should not be the end of the conversation.
- The next step would be to see if the breakfast eating rate is lower for all classes/among freshmen/sophomores/etc.

A hypothesis test for the mean

One sample $t$ test for the mean

Assumptions are the same.
$H_0: \mu=\mu_0$
$t_{n-1}=\frac{\bar{y}-\mu_0}{SE(\bar{y})}$
Standard Error of $\bar{y}: SE(\bar{y})=\frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation
When the conditions are met and $H_0$ is true, the statistic follows the Student’s $t$ Model with $n - 1$ df.
Use this model to find the $p$ value.
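
A minimal sketch of the $t$ statistic in Python. The sample values and $\mu_0$ below are hypothetical, made up purely for illustration. Python's standard library has no Student's $t$ distribution, so the sketch stops at the statistic and degrees of freedom; the $p$ value would then come from a $t$ table or statistical software.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample and hypothesized mean, for illustration only
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.3, 5.7]
mu0 = 5.0

n = len(sample)
y_bar = mean(sample)
s = stdev(sample)   # sample standard deviation (divides by n - 1)
se = s / sqrt(n)    # standard error of the sample mean
t_stat = (y_bar - mu0) / se
df = n - 1          # degrees of freedom for the t model

print(f"y_bar = {y_bar:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.2f} with {df} df")
```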

Intervals and tests

Intervals and tests

Confidence Intervals
- Start with data and find plausible values for the parameter. CI is data-centric.
- Always 2-sided
Hypothesis Tests
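- Using the CI: is the proposed parameter value consistent with the interval? Test is model-centric.
- 2-sided test: a hypothesized value within the confidence interval means fail to reject $H_0$.
- The $p$ value cutoff is $\alpha = 1 - C$, where $C$ is the confidence level (e.g. a 95% interval corresponds to a cutoff of 0.05).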
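
As a sketch of how an interval and a two-sided test line up, the Python below builds a 95% interval from the houses-with-a-view sample ($\hat{p} = 0.14$, $n = 100$) and checks whether the hypothesized 0.11 falls inside it. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.14, 100  # observed proportion and sample size
p0 = 0.11             # hypothesized value
confidence = 0.95     # C; the matching cutoff is 1 - C = 0.05

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
se = sqrt(p_hat * (1 - p_hat) / n)                       # SE uses p_hat
lower, upper = p_hat - z_star * se, p_hat + z_star * se

inside = lower <= p0 <= upper
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("fail to reject H0" if inside else "reject H0")
```

The hypothesized value lies inside the interval, which agrees with the earlier two-sided $p$ value of about 0.33 being well above 0.05.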

The special case with proportions

Confidence intervals
- Use $\hat{p}$ to calculate $SE(\hat{p})=\sqrt{\frac{\hat{p}\hat{q}}{n}}$
Hypothesis Tests
- Use $p_0$ from the null hypothesis to calculate $SD(\hat{p})=\sqrt{\frac{p_0q_0}{n}}$

If $SE(\hat{p})$ and $SD(\hat{p})$ are far from each other, the relationship between the confidence interval and the hypothesis test breaks down.
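
As a worked illustration using the breakfast survey numbers from earlier in the lecture ($\hat{p} = 11/26 \approx 0.423$, $p_0 = 0.656$, $n = 26$), the two quantities come out as:

$$
\begin{aligned}
SE(\hat{p}) &= \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{0.423 \times 0.577}{26}} \approx 0.097 \\
SD(\hat{p}) &= \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{0.656 \times 0.344}{26}} \approx 0.093
\end{aligned}
$$

Here the two values are close, so an interval and a test built from these data would be expected to tell a similar story.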

$p$ values and decisions: what to tell about a hypothesis test

How small a $p$ value is small enough?

How small is small enough is context specific.
Test to see if a renowned musicologist can distinguish between Mozart and Haydn.
- $p$ value of 0.1 may be good enough. Just reaffirming known talent.
A friend claims psychic abilities and can predict heads or tails.
- Very small $p$ value such as 0.01 needed. Breaking scientific theory.

Acceptable $p$ value depends on result’s importance

Proportion of students with full time jobs has increased.
- Not that important. $p$ value = 0.05 will work.
Testing the strength of a bridge based on a sample of concrete quality
- Life and death decision. Need a very small $p$ value
Whether rejecting or failing to reject, always cite the $p$ value.
An accompanying confidence interval helps also.

Recommendations from the American Statistical Association

$p$ values can indicate how incompatible the data are with a specified statistical model.
$p$ values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Scientific conclusions and business or policy decisions should not be based only on whether a $p$ value passes a specific threshold.
Proper inference requires full reporting and transparency.
A $p$ value, or statistical significance, does not measure the size of an effect or the importance of a result.
By itself, a $p$ value does not provide a good measure of evidence regarding a model or hypothesis.
