Lecture 1.2 - Characteristics of distributions
Characteristics of distributions
Distribution of common quantities
Many phenomena in nature have distribution characteristics that are relatively easy to guess
- What is the distribution of length of rivers in the U.S.?
- What is the distribution of width of flower sepals?
- What is the distribution of life expectancy across countries in 2007?
Features to guess:
- Shape
- Center
- Spread
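The three features above can each be given a rough numeric summary. The following is a minimal sketch using only Python's standard library; the function name `describe` and the sample values are made up for illustration and are not real river, sepal, or life expectancy data.

```python
import statistics

def describe(values):
    """Return simple numeric summaries of center, spread, and shape."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    # Moment-based skewness: > 0 suggests a longer right tail, < 0 a longer left tail.
    skewness = sum((x - mean) ** 3 for x in values) / (n * sd ** 3)
    return {
        "mean": mean,          # center
        "median": median,      # center, less sensitive to extreme values
        "sd": sd,              # spread
        "iqr": q3 - q1,        # spread, less sensitive to extreme values
        "skewness": skewness,  # shape
    }

# Hypothetical, made-up values with a long right tail.
made_up = [2, 3, 3, 4, 4, 5, 6, 8, 15, 40]
summary = describe(made_up)
print(summary)
```

In a right-skewed sample like this one, the mean sits above the median, which is one quick numeric hint about shape before any graph is drawn.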
Graphs of common quantities
Length of rivers in the U.S.
Flower sepal width
Life expectancy in 2007
Data generating process
Data is what we record.
Each recorded data point is a function of three components: Data point = underlying process + random variation + measurement error
Example: flower size.
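One way to make the decomposition concrete is to simulate it. The sketch below assumes that both the random variation and the measurement error are normally distributed, which the lecture does not state; the function name `simulate_measurements` and all numeric values (a typical size of 3.0, standard deviations of 0.4 and 0.1) are hypothetical.

```python
import random
import statistics

def simulate_measurements(n, true_mean, natural_sd, measurement_sd, seed=0):
    """Simulate n recorded values as process + variation + error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    data = []
    for _ in range(n):
        underlying = true_mean                   # underlying process (assumed constant here)
        variation = rng.gauss(0, natural_sd)     # random variation between flowers
        error = rng.gauss(0, measurement_sd)     # measurement error from the ruler
        data.append(underlying + variation + error)
    return data

# Hypothetical flower sizes in centimeters; these numbers are assumptions.
sizes = simulate_measurements(n=1000, true_mean=3.0, natural_sd=0.4, measurement_sd=0.1)
print(round(statistics.fmean(sizes), 2), round(statistics.stdev(sizes), 2))
```

Under these assumptions the recorded values come out roughly symmetric around the underlying value, and their spread reflects both sources of noise combined.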
Truncated data
Data is not generated for values above or below specific cutoffs.
For example, all age data is truncated at zero.
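The effect of truncation on shape and center can be sketched with a small simulation. This is an illustration under assumptions: the untruncated process is taken to be normal with made-up parameters, and the helper name `draw_truncated` is hypothetical.

```python
import random
import statistics

def draw_truncated(n, mean, sd, lower, seed=0):
    """Draw n values from a normal process, keeping only values >= lower."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = []
    while len(kept) < n:
        value = rng.gauss(mean, sd)
        if value >= lower:  # values below the cutoff are never generated
            kept.append(value)
    return kept

# Made-up parameters: a process centered near 2 that cannot go below zero.
values = draw_truncated(n=1000, mean=2.0, sd=3.0, lower=0.0)
print(min(values), statistics.fmean(values))
```

Because the left tail is cut off at zero, the observed mean ends up above the mean of the untruncated process and the distribution looks right-skewed, even though the process before truncation is symmetric.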
Activity
Data generating process - advanced: height activity
What do you expect the shape, center, and spread of the class height distribution to be? Why? Write down your guesses with your partner.
Height distribution
Closing thoughts
- Many distributions can be guessed in advance based on the data generating process
- You should have at least a guess as to what the distribution is before starting your exploratory data analysis
- Think carefully about what your variable is actually measuring
- Characteristics of distributions are summaries of the data, and summaries almost always obscure some features of the data
- Don’t mislead your readers!!
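The point that summaries can obscure features of the data can be illustrated with two tiny, made-up samples that share the same mean and standard deviation but have very different shapes. The variable names and values below are invented for this sketch.

```python
import statistics

# Two hypothetical samples constructed to share center and spread.
two_clusters = [-1, 1, -1, 1, -1, 1, -1, 1]   # values sit in two groups, none near the center
peaked = [-2, 0, 0, 0, 0, 0, 0, 2]            # most values at the center, two far out

for name, sample in [("two_clusters", two_clusters), ("peaked", peaked)]:
    mean = statistics.fmean(sample)
    sd = statistics.pstdev(sample)  # population standard deviation
    at_center = sample.count(0)     # how many values sit exactly at the mean
    print(name, mean, sd, at_center)
```

Reporting only the mean and standard deviation would make these two samples look identical, so a graph or an additional summary is needed to show the difference.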