Lecture 3.3 - Hypothesis Testing

Author

Professor MacDonald

Published

April 16, 2025

Hypothesis tests

  • Hypotheses
  • $p$ values
  • The reasoning of hypothesis testing
  • A hypothesis test for the mean
  • Intervals and tests
  • $p$ values and decisions: what to tell about a hypothesis test

Hypotheses

King County

One-sided tests

Percent of houses with a view

  • 11% of houses in King County have a view

  • Put ourselves in the shoes of a researcher: we only have the budget to drive around and sample 100 houses in a particular neighborhood.

  • Does our sample support the idea that the neighborhood we chose has a higher or lower percentage of houses with a view than the county overall?

Hypotheses

  • The starting hypothesis to be tested is called the null hypothesis, “null” because it assumes that nothing has changed.
    • We denote it $H_0$.
    • Called “H naught”
      • $H_0$: parameter = hypothesized value
      • $H_0$: has.a.view = 0.11
  • The alternative hypothesis is not a single value; it contains all other values.
    • $H_A$: parameter $\neq$ hypothesized value
    • $H_A$: has.a.view $\neq$ 0.11

How small to convince us?

  • If the percent of houses with a view in our sample is 16%, we would be skeptical that there was any real difference.
    • A difference this size is not so unlikely to arise by random chance alone.
  • If the percent of houses with a view is 31%, it would clearly indicate a difference from 11%.
    • Extremely unlikely that this could happen just by random chance.
  • We can turn to a confidence interval.
    • Standard deviation of the sampling distribution:
    • $SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.11\times0.89}{100}} = 0.031$
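The arithmetic above can be checked in a few lines of Python (a sketch using only the standard library; the notes themselves use R). It also applies the rough 68-95-99.7 rule, taking 2 SDs on either side of 0.11 as an approximation of the middle 95%, which reproduces the 5% to 17% range that follows.

```python
import math

# King County example: null proportion p = 0.11, sample size n = 100
p = 0.11
q = 1 - p
n = 100

# Standard deviation of the sampling distribution of p-hat under the null model
sd_phat = math.sqrt(p * q / n)

# Rough middle 95%: about 2 SDs either side of p (68-95-99.7 rule)
low = p - 2 * sd_phat
high = p + 2 * sd_phat

print(round(sd_phat, 3))               # 0.031
print(round(low, 3), round(high, 3))   # 0.047 0.173
```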

Is the rate of houses with a view different?

  • Idea: simulation
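A minimal simulation sketch, assuming each of the 100 sampled houses independently has a view with probability 0.11 (the null value). The function name `simulate_sample_proportions`, the 10,000 repetitions, and the seed are illustrative choices, not taken from the notes. The middle 95% of the simulated proportions should land roughly between 0.05 and 0.17.

```python
import random

def simulate_sample_proportions(p=0.11, n=100, reps=10_000, seed=1):
    """Draw `reps` samples of n houses, each house having a view with
    probability p, and return the sample proportion from each sample."""
    rng = random.Random(seed)
    props = []
    for _ in range(reps):
        with_view = sum(1 for _ in range(n) if rng.random() < p)
        props.append(with_view / n)
    return props

props = sorted(simulate_sample_proportions())

# Middle 95% of the simulated proportions: trim 2.5% from each end
lower = props[int(0.025 * len(props))]
upper = props[int(0.975 * len(props)) - 1]
print(lower, upper)
```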

Is the rate of houses with a view different?

  • In 95% of samples of this size, the percent of houses with a view falls between 5% and 17% just by chance.

  • So in most samples we should see between 5% and 17% of houses with a view.

  • Suppose we surveyed a neighborhood and actually saw 14%.

  • Surprising?

  • Should we conclude that the percent of houses with a view in this neighborhood is different from King County overall?

CLT means we can use the Normal model

  • Central Limit Theorem: we can use the Normal model to find the probability instead of running a simulation.

  • The observed view rate of 0.14 is 0.03 away from the null hypothesis value, 0.11.

  • Use SD = 0.031 to find the area in the tails that lie more than about 0.97 $z$ scores away from the null hypothesis value ($z = 0.03/0.031 \approx 0.97$).

  • This indicates how rare the observed rate is.

  • The probability is about 0.167 in each tail, or about 0.33 total, that we would get an observation this far or farther from 0.11 by chance.
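The same tail-area calculation in Python, using `statistics.NormalDist` from the standard library, whose `cdf` method plays the role of R's `pnorm`. This sketch uses the unrounded SD (about 0.0313), which gives $z \approx 0.96$ and a two-sided probability of about 0.34; rounding the SD to 0.031 first gives $z \approx 0.97$ and about 0.33, so the small gap is only rounding.

```python
import math
from statistics import NormalDist

p0 = 0.11      # null hypothesis value (county-wide rate)
p_hat = 0.14   # observed view rate in the sampled neighborhood
n = 100

sd = math.sqrt(p0 * (1 - p0) / n)  # about 0.0313 before rounding
z = (p_hat - p0) / sd

# Area in one tail beyond |z|, then double it for both tails
one_tail = 1 - NormalDist().cdf(abs(z))
two_sided_p = 2 * one_tail

print(round(z, 2), round(one_tail, 3), round(two_sided_p, 2))  # 0.96 0.169 0.34
```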

Conclusion

  • A difference in view percentages of 0.03 or larger (in either direction) would happen about 33% of the time just by chance.

  • Does this seem very unusual?

    • No: something that happens about one time in three is not rare.
  • So the observed proportion of 0.14 does not provide evidence that this neighborhood has a different percentage of houses with a view.

A trial as a hypothesis test

  • Begin with the presumption of innocence (H0H_0)

  • Collect evidence

    • Bank money in house
    • Still wearing mask
    • Getaway car found in his name
  • Evidence beyond a reasonable doubt?

    • Is a 5% chance small enough?
    • How about 1%?
    • 6.7%?

A trial as a hypothesis test - decision

  • “Beyond a reasonable doubt” is ambiguous.

  • A jury does not use probability to decide.

  • A null hypothesis lets us quantify exactly how surprising the evidence would be if the null hypothesis were true.

  • How unlikely is unlikely?

  • 1 out of 20, 5%, or 0.05

  • 1 out of 100, 1%, or 0.01

  • You must judge for yourself in each situation whether the probability of observing your data is small enough to constitute “reasonable doubt.”

$p$ values

The $p$ value and surprise

  • The $p$ value is the probability of seeing data like these (or even more unlikely data) given the null hypothesis is true.

  • Tells us how surprised we would be to get these data given $H_0$ is true.

    • $p$ value is very low: Either $H_0$ is not true or something remarkable occurred. Reject $H_0$.

    • $p$ value is high: Not a surprise. Data consistent with the model. Do not reject $H_0$.

Guilty or not enough evidence?

  • Defendant is either:
    • Guilty: $p$ value too small. The evidence is clear.
    • Not Guilty: $p$ value not small enough. The evidence is not sufficient. Not the same as innocent.
      • May be innocent or may be guilty, but not enough evidence found.
  • Two Choices
    • Fail to reject $H_0$ if $p$ value large.
      • Never accept $H_0$.
    • Reject $H_0$ if $p$ value is small. Accept $H_A$.
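The decision rule can be written as a tiny helper. `decide` and its default `alpha = 0.05` are hypothetical, for illustration only; the point is that the only two possible outcomes are "Reject H0" and "Fail to reject H0", never "Accept H0". This sketch rejects when the $p$ value is strictly below alpha.

```python
def decide(p_value, alpha=0.05):
    """Return the wording of a test decision at significance level alpha.
    Hypothetical helper: it never returns 'Accept H0'."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "Reject H0"
    return "Fail to reject H0"

print(decide(0.33))               # Fail to reject H0
print(decide(0.012))              # Reject H0
print(decide(0.012, alpha=0.01))  # Fail to reject H0
```

The last two calls show the same $p$ value leading to different decisions under different thresholds, which is why the threshold has to be chosen for the situation.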

When the $p$ value is not small

  • It is wrong to say:
    • Accept $H_0$.
    • We have proven $H_0$.
  • It is correct to say:
    • Fail to reject $H_0$.
    • There is insufficient evidence to reject $H_0$.
    • $H_0$ may or may not be true.
  • Example: $H_0$: All swans are white.
    • If we sample 100 swans that are all white, there could still be a nonwhite swan.

The reasoning of a hypothesis test

Step 1: state the hypotheses

  • $H_0$:
    • $H_0$ usually states that there’s nothing different.
    • $H_0$: parameter = hypothesized value
    • Note that the parameter describes the population, not the sample.
    • $H_0$ is called the null hypothesis.
  • $H_A$:
    • $H_A$ states that something has changed: the parameter is different from, bigger than, or smaller than the hypothesized value.
    • $H_A$ is called the alternative hypothesis.

Hypotheses about breakfast

Breakfast study

What would our hypotheses be if we did a study of DKU students and compared the result to the percentage found in this paper (treating it as the population proportion)?

Hypotheses about breakfast

  • The paper’s data claim that 65.6% of students eat breakfast.

  • In a survey of 26 students at DKU, 11 ate breakfast.

  • Is there evidence that the DKU breakfast rate is different from 65.6%?

    • $H_0: p = 0.656$
    • $H_A: p \neq 0.656$

Step 2: the model

  • Decide on the model for testing the null hypothesis about the parameter.

  • Check conditions, e.g., independence and sample size.

  • If the conditions are not met, either stop or redesign the study.

  • Normal models use $z$ scores. Other models may not use $z$ scores.

  • Name the model, e.g., 1-proportion $z$ test.

Step 2a: 1-proportion $z$ test

  • Conditions: same as for a 1-proportion $z$ interval

  • Null hypothesis: $H_0: p = p_0$

  • Test statistic

    • $z = \frac{\hat{p}-p_0}{SD(\hat{p})}$
    • $SD(\hat{p}) = \sqrt{\frac{p_0q_0}{n}}$
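A minimal sketch of the 1-proportion $z$ test using Python's standard library. The function name `one_prop_z_test` and its `alternative` argument are hypothetical; `NormalDist().cdf` stands in for R's `pnorm`. The standard deviation is computed from $p_0$, as in the formula above. Applied to the breakfast data from these notes (11 of 26, $p_0 = 0.656$), the two-sided $p$ value is about 0.012, twice the one-sided value of about 0.006 that `pnorm(-2.5)` alone gives.

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """1-proportion z test; SD(p-hat) uses the null value p0, not p-hat.
    alternative: "two-sided", "less", or "greater" (hypothetical API)."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return z, p_value

# Breakfast example: 11 of 26 students ate breakfast, H0: p = 0.656
z, p_two = one_prop_z_test(11, 26, 0.656)
_, p_less = one_prop_z_test(11, 26, 0.656, alternative="less")
print(round(z, 2), round(p_two, 3), round(p_less, 3))  # -2.5 0.012 0.006
```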

What conditions do we need to check to do a 1-proportion $z$ test?

Step 2b: checking conditions - eating breakfast

  • Randomization Condition: The 26 students were a random selection of DKU students.

  • 10% Condition: 26 is fewer than 10% of all the students who are of interest.

  • Success/Failure Condition:

  • $np_0 = (26)(0.656) \approx 17 \geq 10$: met

  • $nq_0 = (26)(0.344) \approx 9 < 10$: not met

  • The Success/Failure Condition is not quite met (about 9 expected failures), so the Normal model is only a rough approximation here. We proceed with a 1-Proportion $z$-Test but interpret the result with caution.
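The Success/Failure arithmetic can be checked directly. `success_failure_check` is a hypothetical helper, and it uses "at least 10" as the cutoff. For the breakfast survey the expected number of failures, $nq_0 \approx 8.9$, falls just short of 10.

```python
def success_failure_check(n, p0, threshold=10):
    """Expected successes n*p0 and failures n*q0 under H0, and whether
    both reach the threshold (hypothetical helper)."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    ok = expected_successes >= threshold and expected_failures >= threshold
    return expected_successes, expected_failures, ok

succ, fail, ok = success_failure_check(26, 0.656)
print(round(succ, 1), round(fail, 1), ok)  # 17.1 8.9 False
```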

Step 3: mechanics

  • Claim: 65.6% of students eat breakfast.
    • 11 of the 26 students surveyed ate breakfast, so $\hat{p} = 11/26 = 0.423$.
  • Find the $p$ value
    • $SD(\hat{p})=\sqrt{\frac{0.656\cdot0.344}{26}} = 0.093$
    • $z$ score of the difference: $\frac{0.423 - 0.656}{0.093} = -2.5$
  • Probability of seeing a difference that large in either direction: 2 * pnorm(-2.5) $\rightarrow$ about 1.2% of the time we will see a difference from the hypothesized value this large or larger
    • 98.8% of the time we will see a difference this small or smaller.
    • Note: $p$ values are usually reported as “this large or larger”.

What can we conclude from these results?

Step 4: conclusion

  • Is our sample different from the overall proportion of students in the US? $p$ value = 0.012

  • What can be concluded? What does the $p$ value mean?

    • $p$ value = 0.012 $\rightarrow$ Fail to Reject/Reject $H_0$

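    • At the 0.05 level, $p = 0.012$ is small enough to reject $H_0$: the survey data provide evidence that the rate at which DKU students eat breakfast differs from the US rate. At a stricter 0.01 level, we would fail to reject $H_0$.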

    • This should not be the end of the conversation.

    • The next step would be to see if the breakfast rate is lower in all classes or among freshmen, sophomores, etc.

A hypothesis test for the mean

One-sample $t$ test for the mean

  • Assumptions and conditions are the same as for the one-sample $t$ interval.

  • $H_0: \mu=\mu_0$

  • $t_{n-1}=\frac{\bar{y}-\mu_0}{SE(\bar{y})}$

  • Standard error of $\bar{y}$: $SE(\bar{y})=\frac{s}{\sqrt{n}}$

  • When the conditions are met and $H_0$ is true, the statistic follows Student’s $t$ model with $n - 1$ degrees of freedom.

  • Use this model to find the $p$ value.
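Python's standard library has no $t$ distribution, so this sketch computes the two-sided $p$ value by integrating the Student's $t$ density numerically with Simpson's rule; in practice you would call R's `pt` or SciPy's `scipy.stats.t.sf` instead. The helper names and the five-value sample are hypothetical, chosen only to illustrate $t = (\bar{y}-\mu_0)/(s/\sqrt{n})$ with $n - 1$ degrees of freedom.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value P(|T| >= |t|) via Simpson's rule on [0, |t|].
    Uses symmetry: p = 1 - 2 * (area under the density from 0 to |t|)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

def one_sample_t_test(data, mu0):
    """Return (t statistic, degrees of freedom, two-sided p value)."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)  # SE(y-bar) = s / sqrt(n), s = sample SD
    t = (mean(data) - mu0) / se
    return t, n - 1, t_two_sided_p(t, n - 1)

# Hypothetical data, for illustration only
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
t, df, p = one_sample_t_test(sample, mu0=5.0)
print(round(t, 3), df, round(p, 3))
```

For this made-up sample, $t \approx 1.41$ with 4 df and the two-sided $p$ value is roughly 0.23, so we would fail to reject $H_0$.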

Intervals and tests

Intervals and tests

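  • Confidence intervals
    • Start with data and find plausible values for the parameter. A CI is data-centric.
    • Always 2-sided
  • Hypothesis tests
    • Start with a proposed parameter value and ask whether it is consistent with the data, e.g., whether it lies inside the CI. A test is model-centric.
    • 2-sided test: a hypothesized value within the confidence interval means fail to reject $H_0$.
    • The cutoff for the $p$ value is $\alpha = 1 - C$; for example, a 95% interval matches a 2-sided test at $\alpha = 0.05$.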

The special case with proportions

  • Confidence intervals
    • Use $\hat{p}$ to calculate $SE(\hat{p})=\sqrt{\frac{\hat{p}\hat{q}}{n}}$
  • Hypothesis tests
    • Use $p_0$ to calculate $SD(\hat{p})=\sqrt{\frac{p_0q_0}{n}}$

If $SE(\hat{p})$ and $SD(\hat{p})$ are far from each other, the relationship between the confidence interval and the hypothesis test breaks down.
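To see how close the two standard errors are in practice, here is a sketch using the breakfast numbers from these notes (11 of 26, $p_0 = 0.656$). It builds a 95% interval from $SE(\hat{p})$ and runs the two-sided test with $SD(\hat{p})$. The two values (about 0.097 and 0.093) are close, and the interval and the test agree: 0.656 lies outside the interval, and $p < 0.05$.

```python
import math
from statistics import NormalDist

successes, n, p0 = 11, 26, 0.656
p_hat = successes / n

se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE(p-hat): built from p-hat, for the interval
sd = math.sqrt(p0 * (1 - p0) / n)        # SD(p-hat): built from p0, for the test

z_star = NormalDist().inv_cdf(0.975)     # about 1.96 for a 95% interval
ci = (p_hat - z_star * se, p_hat + z_star * se)

z = (p_hat - p0) / sd
p_value = 2 * NormalDist().cdf(-abs(z))

print(round(se, 4), round(sd, 4))            # 0.0969 0.0932
print(round(ci[0], 3), round(ci[1], 3))      # 0.233 0.613
print(ci[0] <= p0 <= ci[1], p_value < 0.05)  # False True
```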

$p$ values and decisions: what to tell about a hypothesis test

How small a $p$ value is small enough?

  • How small is small enough is context specific.

  • Test to see if a renowned musicologist can distinguish between Mozart and Haydn.

    • A $p$ value of 0.1 may be good enough; we are just reaffirming known talent.
  • A friend claims psychic abilities and can predict heads or tails.

    • A very small $p$ value, such as 0.01, is needed; the claim would overturn established scientific theory.

The acceptable $p$ value depends on the result’s importance

  • Proportion of students with full-time jobs has increased.

    • Not that important. A $p$ value of 0.05 will work.
  • Testing the strength of a bridge based on a sample of concrete quality

    • Life-and-death decision. Need a very small $p$ value.
  • Whether rejecting or failing to reject, always cite the $p$ value.

  • An accompanying confidence interval also helps.

Recommendations from the American Statistical Association

  1. $p$ values can indicate how incompatible the data are with a specified statistical model.
  2. $p$ values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based only on whether a $p$ value passes a specific threshold.
  4. Proper inference requires full reporting and transparency.
  5. A $p$ value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a $p$ value does not provide a good measure of evidence regarding a model or hypothesis.