Lecture 2.1 - Association and correlation

Author

Professor MacDonald

Published

March 26, 2025

Association and correlation

Exercise 1

In your pairs, try to think of two real-world variables that might have each of the following correlations. Try to think of a few examples for each correlation.

  • 0.95
  • 0.75
  • 0.5
  • 0.25
  • 0.0
  • -0.25
  • -0.5
  • -0.75
  • -0.95

Pick a few of these and draw by hand what you expect these graphs to look like.
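To check a hand drawing afterwards, one option is to simulate pairs of values with a chosen correlation. The sketch below is illustrative only and is not part of the lecture materials: the helper names `pearson_r` and `simulate_pairs`, the sample size, and the seed are all assumptions. It builds y as a weighted mix of x and independent noise, which gives a population correlation of rho when both are standard normal.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_pairs(rho, n=2000, seed=1):
    """Hypothetical helper: draw n (x, y) pairs whose population correlation is rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        xs.append(x)
        # Mixing x with independent noise in these proportions yields correlation rho.
        ys.append(rho * x + math.sqrt(1 - rho ** 2) * noise)
    return xs, ys

targets = [0.95, 0.75, 0.5, 0.25, 0.0, -0.25, -0.5, -0.75, -0.95]
results = {}
for rho in targets:
    xs, ys = simulate_pairs(rho)
    results[rho] = pearson_r(xs, ys)
    print(f"target {rho:+.2f}  sample r {results[rho]:+.3f}")
```

The sample values will not match the targets exactly, because each run is a finite random sample. Plotting `xs` against `ys` for a given target gives a picture to compare with the hand-drawn version.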

Exercise 2

Seattle

Viewing an example relationship

  • First, what is our expectation about the relationship between bedrooms and square feet?
  • Direction?
  • Form?
  • Strength?
  • Outliers?

Bedrooms and square feet - direction

A scatterplot is the easiest way to check for direction. In this case, the direction is obvious.

Bedrooms and square feet - form

Bedrooms and square feet - strength

Correlation as a measure of strength

This correlation is a little weaker than we might have expected.

  • In general, mechanically generated processes with little noise can have very high correlations
  • Social and other real-world processes rarely show more than moderate correlations, because they are noisy
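The effect of noise on correlation can be sketched with a small simulation. This is an assumed toy example, not the Seattle data: the same linear rule (y = 2x) is used twice, once with a small amount of noise and once with a large amount, and the function name `noisy_line` and both noise levels are made up for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def noisy_line(noise_sd, n=1000, seed=7):
    """Hypothetical process: y = 2x plus normal noise with the given standard deviation."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

# Same underlying rule, different amounts of noise (both values are assumptions).
r_low_noise = pearson_r(*noisy_line(noise_sd=0.5))
r_high_noise = pearson_r(*noisy_line(noise_sd=10.0))

print(f"little noise: r = {r_low_noise:.3f}")
print(f"lots of noise: r = {r_high_noise:.3f}")
```

The slope of the line is identical in both runs; only the scatter around it changes, and that alone is enough to pull the correlation down.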

Bedrooms and square feet - outlier

Again, we do not have a rule for selecting outliers other than to observe them on the scatterplot. In this case, there is one very obvious value far from the other values.

To investigate whether this outlier matters, we can check some other values of the observation.

What kind of outlier do you think this is? Why?

Outlier - actual observation

Location

House details

Data with no outlier
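One way to judge whether an outlier matters is to compute the correlation with and without it. The sketch below uses a tiny, made-up set of houses (the numbers are hypothetical and do not come from the Seattle data) plus one implausible observation with many bedrooms in a modest floor area.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (bedrooms, square feet) pairs, invented for illustration.
houses = [
    (1, 700), (2, 950), (2, 1100), (3, 1400), (3, 1600),
    (3, 1750), (4, 2100), (4, 2400), (5, 2900), (5, 3200),
]
# Hypothetical outlier: far more bedrooms than the floor area suggests.
outlier = (30, 1600)

def correlation(pairs):
    bedrooms = [b for b, _ in pairs]
    sqft = [s for _, s in pairs]
    return pearson_r(bedrooms, sqft)

r_without = correlation(houses)
r_with = correlation(houses + [outlier])

print(f"r without the outlier: {r_without:.2f}")
print(f"r with the outlier:    {r_with:.2f}")
```

With only ten other points, a single extreme value dominates the result. In a dataset with thousands of observations, one point usually moves the correlation far less, which is why it is worth checking rather than assuming.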

Describing the association

  1. Direction - positive
  2. Form - linear
  3. Strength - moderate/strong
  4. Outliers - one possible

Outlier:

Does the bedrooms and square feet relationship match expectations?

  • More or less: the larger the house, the more bedrooms, so the relationship is positive
  • The relationship is fairly linear
  • The strength of the relationship is moderate
  • The outlier does not seem to be a problem

However….

  • What are some possible lurking variables that influence the relationship between bedrooms and square feet?

Relationship reexpressed?

One final issue to consider is whether this relationship should be reexpressed, that is, transformed to make it more linear.

  • Seems clearer
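Reexpression can be sketched with simulated data. The example below assumes a curved, exponential-style relationship and a log transform of y; both are illustrative choices, and the lecture's actual transformation may differ. The correlation is computed before and after the transform to show how straightening the form changes the linear summary.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(3)

# Hypothetical curved relationship: y grows exponentially with x, with noise.
xs = [rng.uniform(0, 10) for _ in range(1000)]
ys = [math.exp(0.5 * x + rng.gauss(0, 0.3)) for x in xs]

# Reexpress y on the log scale (assumed transform), which straightens the curve.
log_ys = [math.log(y) for y in ys]

r_raw = pearson_r(xs, ys)
r_log = pearson_r(xs, log_ys)

print(f"r on the original scale: {r_raw:.3f}")
print(f"r after taking log(y):   {r_log:.3f}")
```

Correlation measures linear association, so a curved pattern is understated on the original scale. After the transform, the same data look more linear and the correlation is higher.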

Your turn

With your partner, develop some expectations about how some of the variables in the kc.houses dataset might be related.

Variables:

What to do with your partner:

  1. Write down an interesting question you think you might be able to answer by examining a relationship in this dataset
  2. Choose the variables that you think might be able to answer this question
  3. Write down what you expect the relationship to be between these two variables based on any prior knowledge
  4. Decide which variable is the response variable and which is the predictor variable
  5. Make a scatterplot using one of the codeblocks in the previous section and identify the features of the association