# Course Introduction

# Course at a glance

- Tuesday/Thursday class from 8:30 am to 11:00 am in IB3106
- Wednesday lab from 1:15 pm to 2:30 pm in IB3106
- Office hours:
  - Tuesday, 1:00 pm to 3:00 pm in WDR 3114
  - Wednesday, 9:45 am to 11:45 am in WDR 3114

- Homeworks are all due on Sunday at 11:59 pm
- All other announcements and information are posted on the class Teams site

# Course description

How can we use data to shed light on age-old and new human problems such as pollution, discrimination, and economic growth? How can we be “sure” that the evidence we have points us in the right direction? How meaningful are our findings? Do our results suggest that the relationships we find between factors such as smoking and cancer are meaningful or meaningless? How would we know? How should you properly display and explain statistical results that bear on these important issues?

This class introduces you to the tools and concepts that begin to tackle these questions. We will cover topics such as data summaries, sampling, data analysis, production of graphical displays, and regression techniques. By the end of the course, you should be able to conduct basic data manipulation, properly summarize and display data, and make basic statistical inferences using real datasets.

The emphasis in the course will not be on learning mathematical formulas related to statistics but rather on developing an intuitive understanding of statistical inference and measures of uncertainty. If you are interested in studying statistics more rigorously, you may also consider taking the following courses:

- Math 205: the mathematical foundations of statistics
- Econ 203: advanced study of modern regression techniques for Economists
- PoliSci 301: Statistical techniques to infer causation
- Social Sciences 320: advanced statistical techniques applied to real-world problems

A further goal for the course is that you will be able to read research literature that employs statistics more fluently. During this course I will reference, in class, a number of historically important academic articles and we will analyze the data from those articles. Doing so should help you understand how data is used (and misused) to construct social science arguments.

# Course objectives

Upon completing the course, you will have developed the following abilities:

- Intuitively interpret statistics in course materials and in the larger world
- Become a producer of statistical results in addition to a consumer of them
- Assess when and how to use statistics to answer specific questions in the social sciences
- Analyze how previously learned problems can be answered with statistical methods
- Apply statistical methods to future social science coursework and your capstone project
- Judge how appropriately statistics are used in everyday life when reading the news, business reports, and other real-world applications

In support of this you will be able to:

- Understand and interpret basic statistical properties of data (confidence intervals, t-tests, etc.)
- Identify when various statistical tests are appropriate given a specific dataset
- Formulate testable hypotheses in the data and learn how to execute those tests
- Interpret statistical results to understand both the statistical significance of the results and their substantive impact
- Illustrate statistical results with appropriate and clear graphical displays that provide meaning to the reader
- Evaluate critically other, published, statistical work with the skills and techniques learned in class
- Propose an independent research project that integrates statistical methods with your research interest for your capstone project

# Course structure

Because of the pandemic situation this term, the course structure is going to be a little flexible - please bear with me as we make it through this term. All changes to the usual course schedule will be announced on Teams.

*In general*, each week will proceed roughly as follows:

- **Monday-Thursday**: Read the textbook chapter, make progress on the lab(s) and homework
- **Tuesday 8:30 am-11:00 am**: Class session with a conceptual review of the chapter material
- **Wednesday 1:15 pm-2:30 pm**: Lab activities designed to help gain familiarity with the technical aspects of using R, RStudio and Quarto
- **Thursday 8:30 am-11:00 am**: Second weekly class session
- **Friday**: Complete DataCamp lab(s) before **midnight**
- **Sunday**: Complete the homework that is due before **midnight** on Sunday

# How to prepare

We will be using three different online tools for this class, and you will need to sign up for two of them. Our main class hub will be on Teams, where I will provide appropriate links for when you need to use the other tools.

- Teams: Teams works a bit like WeChat and a bit like Sakai and is perfect for collaboration. I encourage you to post often on Teams, including interesting statistical things you find in the world, questions about statistics you see in the news, and any other thoughts you have. Make sure to fill out your profile and upload a profile picture so we can keep track of who you are!
- R and RStudio: These software packages that implement the R statistical programming language are free and very popular tools for conducting statistical investigations, including all of the homeworks for this class.
- Quarto: Quarto is a flexible document formatting standard that allows you to create nice looking documents that easily mix text and R code output (I made this website with Quarto - you can view the code here). All your homeworks, midterm, and final projects will be written inside of Quarto documents. Instructions on how to use the Quarto syntax will be provided with the first homework assignment.

- DataCamp: DataCamp has a lot of very useful tutorials that will help you learn how to code in R.

# Required texts

Intro Stats, 6th Edition by De Veaux et al.

# Course policies

I detail the policies for this course below. Basically, try to learn stuff, don’t cheat, and be active in class, and things should go just fine for you.

## Assessment

Beginning of class warmup quiz

**10%**: At the start of each class session there will be a series of short online questions to start discussion of the class material. I will drop your two lowest quiz grades. I will also curve the grade such that the student with the highest overall quiz grade will receive 100% and all other students will receive a corresponding boost to their grade.

Homework

**55%**: At the end of a group of content, a homework will be assigned that will ask you to analyze a dataset and answer questions related to that concept group. These homeworks are always due at 11:59 pm China time on the Sunday after they are assigned. Each homework/midterm will also have a best graph contest, where the person whose graph is voted best by their classmates wins some extra credit points. Note that Homework 1 is worth less than the other homeworks, so you have a chance to make up a poor grade with later assignments.

- Homework 1: 15%
- Homework 2: 20%
- Homework 3: 20%

Homework checks

**4%**: To make sure you are making good progress on your homework, the Sunday before the homework is due (except for the first homework) you will be required to submit your progress on the homework so far. You are required to have attempted all the questions covered by the lectures and textbook up to that Sunday. I will not check your answers but rather check to see if you have made a good effort to answer all the questions derived from the material already covered. If you have a reasonable answer for each question checked you will get full points. If you have not made an effort to answer all the questions reviewed you will get a zero.

Final project

**25%**: The final project will ask you to analyze a dataset using all the strategies we have learned in class. The paper for this project should be about 2000 words.

DataCamp labs

**5%**: A number of labs on the website DataCamp will be assigned to you; these labs are pass/fail and you will receive full credit if you complete each of the labs by the specified due date.

Syllabus quiz

**1%**: A short quiz after the first class regarding the course requirements.

Extra credit

**maximum 3%**: You have several opportunities to earn extra credit. There will be three graph contests, in which the student whose graph is voted best by the other students will receive extra credit points. Additionally, those who respond to other students’ questions on Teams will also be eligible to earn extra credit points.
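As a rough illustration of how the weights above combine, here is a minimal Python sketch. The dictionary keys, the `course_grade` function, the 0-to-1 score scale, and the way extra credit is added on top of the weighted total are all assumptions made for this example; it is not the official grade calculation.

```python
# Assessment weights as listed in this syllabus (they sum to 100%).
WEIGHTS = {
    "warmup_quizzes": 0.10,
    "homework_1": 0.15,
    "homework_2": 0.20,
    "homework_3": 0.20,
    "homework_checks": 0.04,
    "final_project": 0.25,
    "datacamp_labs": 0.05,
    "syllabus_quiz": 0.01,
}

# Extra credit is capped at 3% according to the syllabus.
MAX_EXTRA_CREDIT = 0.03


def course_grade(scores, extra_credit=0.0):
    """Weighted course grade on a 0-to-1 scale (hypothetical helper).

    `scores` maps each key in WEIGHTS to a score between 0 and 1.
    Assumption: extra credit is added directly to the weighted total.
    """
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return base + min(extra_credit, MAX_EXTRA_CREDIT)
```

For example, a student who scores 90% on every component and earns more extra credit than the cap allows would end up at roughly 93% under these assumptions.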

## Lateness policy

Since the course moves very quickly, if you are submitting work late that means you are falling behind on other material and it may be difficult for you to recover. Therefore, I have a fairly strict lateness policy.

- All major assignments are due at **11:59:00 pm**. Not 11:59:01 or 11:59:31.
- If it is later than 11:59:00 pm, then the assignment will be assessed a 5% lateness penalty
- If it is later than 12:29:00 am, then the assignment will be assessed a 10% lateness penalty
- If it is later than 11:59:00 pm the next day, the assignment will be assessed a 50% lateness penalty
- If it is later than 2 days from the due date, I will no longer accept the assignment
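The tiers above can be read as a simple step function of how late a submission is. The Python sketch below is only an illustration: the function name `lateness_penalty` is hypothetical, and it assumes that "2 days from the due date" means 48 hours after the 11:59:00 pm deadline.

```python
from datetime import datetime, timedelta
from typing import Optional


def lateness_penalty(due: datetime, submitted: datetime) -> Optional[float]:
    """Return the penalty as a fraction, or None if no longer accepted.

    Hypothetical helper; assumes "2 days" means 48 hours after the deadline.
    """
    late_by = submitted - due
    if late_by <= timedelta(0):
        return 0.0   # on time (at or before 11:59:00 pm)
    if late_by <= timedelta(minutes=30):
        return 0.05  # after 11:59:00 pm, up to 12:29:00 am
    if late_by <= timedelta(days=1):
        return 0.10  # after 12:29:00 am, up to 11:59:00 pm the next day
    if late_by <= timedelta(days=2):
        return 0.50  # after 11:59:00 pm the next day, up to 2 days late
    return None      # more than 2 days late: not accepted
```

Under this reading, a submission stamped one second after the deadline already falls into the 5% tier.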

Please be sure to check that your homework is complete and make sure to submit it a few minutes early. You can submit multiple times on Teams so make sure you have a nearly complete version uploaded even if you want to keep working on it right up to the deadline. I will not be sympathetic to messages that complain of computer problems when you are trying to submit for the first time at 11:58:51 pm.

## Attendance Policy

Class will be highly interactive and therefore it is to your advantage to attend class. I do not take roll, but if you are not in class for the beginning of class quiz there is no opportunity to make it up without a valid excused absence (doctor’s note, absence arranged in advance, etc.). The lecture will have a number of points at which I will ask students to modify some code and return a result, and doing these exercises is mandatory. If you are called on to provide your solution but are absent without having notified me beforehand, it will also be quite embarrassing for you!

## Contact Policy

- I do not respond to email about class unless it is for emergency circumstances. Use Teams to DM me instead.
- I usually try to reply to Teams DMs on the same day (though not in the evenings); however, responses may be slower on the weekend
- Do not DM me 2 hours before a homework is due and expect an immediate response!
- For general questions, such as how should one interpret a question on a homework or quiz, you should ask the question in the appropriate Teams channel. That way others can benefit from the response or someone else may be able to answer more quickly than I can. If you DM me a question that really belongs in the help channel I will ask you to repost it.
- Many questions can be answered by carefully checking the class website or reviewing the materials in the Teams chat. If in doubt, though, feel free to ask.

## ChatGPT (and similar) Policy

Unless otherwise specified on an assignment, you may use ChatGPT as much as you wish. However, most of the assignments in this class are not very amenable to ChatGPT help. Most of your grade will depend on understanding, annotating, and interpreting your output. Producing the output is only the minimal requirement. Also, ChatGPT often makes coding mistakes. If you rely too much on ChatGPT to code, you will not understand when and why it makes mistakes.

## Academic Dishonesty Policy

Don’t cheat. Don’t be that person. Yes, you. You know exactly what I’m talking about too.

More specifically, you are expected to strictly adhere to the Duke Kunshan University Community Standard in all of your work and participation, and the standard will be enforced. More details can be found here.

All work must be done exclusively by the individual to whom it has been assigned. You should assume that collaboration on assignments, the use of previously-assigned homework, quizzes and answer keys, outside sources or outside aids (both written and electronic) are not allowed unless explicitly noted in the assignment guidelines or in this syllabus. All cases of suspected cheating will be referred for adjudication to the Dean’s Office. Any violation for which a student is found responsible is considered grounds for failure in the course.

It may sound cliché to say, but if you cheat and borrow others’ code or answers you are only cheating yourself; you will not learn how to do statistics, and as a result you will do worse on the midterm and the final anyway. Cheating is ultimately self-defeating so for both of our benefit, please, don’t do it. If you are having trouble completing the assignment and feel tempted to cheat, please contact me directly instead with the difficulties you are having.

## Disabilities Policy

If you need an accommodation due to a disability, you should not hesitate to request one. Requests should be sent to the Dean of Undergraduate Studies, who will contact me with the recommended type of accommodation. You do not need to disclose your reason for requesting an accommodation to me, and asking through the Dean of Undergraduate Studies helps make things official for both you and me.