Quick Introduction to ECON 216

In ECON 216, you will learn how to create data visualizations (graphs and charts) such as the ones that support this article, so that you too can use data to study social phenomena and make arguments. The course is designed for people who have never worked with data before.

The class focuses on teaching you how to produce documents just like this one, which combine text with graphics based on data, using software called RStudio, which runs the statistical programming language R.

Who Votes in America and Who Doesn’t?

The website 538 published an interesting story on October 26, 2020, that explored who does and does not vote in America.

The story is an example of data journalism, where, rather than interviewing people, a journalist examines a dataset to support the story they want to tell.

A big advantage of data journalism is that, when the data is treated properly, we can get a more accurate view of what average Americans do.

Getting Data

538 makes the data used in its stories available for everyone to use. For this story, 538 surveyed 5,836 people, and I have this data stored on my computer. I will load it using some code that you won’t understand now, but trust me, by the end you will!

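library(dplyr)  # provides %>%, mutate(), case_when(), and select()

data <- read.csv("nonvoters_data.csv") %>%
  subset(Q30 != -1) %>%  # drop rows where Q30 is coded -1
  mutate(
    # recode the numeric Q30 answer into party labels
    party = as.factor(case_when(
      Q30 == 1 ~ "Republican",
      Q30 == 2 ~ "Democrat",
      Q30 >= 3 ~ "Independent/Neither"
    )),
    voting_frequency = voter_category  # give the column a clearer name
  ) %>%
  # keep only the columns used in this article
  select(RespId, weight, educ, race, gender, income_cat, voting_frequency, party)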

The data is just a big table in which each person surveyed has a separate row, and the information collected about them is stored in the columns of the table. Here is what the table looks like for the first six people. Note that there is a column called voting_frequency that indicates how frequently someone votes, based on public voter records matched to the survey respondents.

head(data)
##   RespId weight                educ  race gender    income_cat voting_frequency
## 1 470001 0.7516             College White Female      $75-125k           always
## 2 470002 1.0267             College White Female $125k or more           always
## 3 470003 1.0844             College White   Male $125k or more         sporadic
## 4 470007 0.6817        Some college Black Female       $40-75k         sporadic
## 5 480008 0.9910 High school or less White   Male       $40-75k           always
## 6 480009 1.0591 High school or less White Female       $40-75k     rarely/never
##                 party
## 1            Democrat
## 2 Independent/Neither
## 3            Democrat
## 4            Democrat
## 5          Republican
## 6 Independent/Neither

Learning from Data

I’ve chosen to look at a section of the data that is about voter demographic characteristics, like race, gender, party affiliation, and income.

A natural question to ask is whether voting frequency differs across demographic groups. First, let’s make a visualization that shows how voting frequency differs by gender. The code below won’t make sense to you now; the point of the class is to teach you how to do this.

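library(ggplot2)  # provides ggplot(), aes(), and geom_bar()

# stacked bars scaled to 100%, with each respondent counted by survey weight
ggplot(data = data, mapping = aes(x = gender, fill = voting_frequency)) +
  geom_bar(aes(weight = weight), position = "fill")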

We learn two things from this visualization. First, most people vote either rarely or sporadically. Second, there is not much difference in voting frequency by gender. Does the latter surprise you? I personally expected women to vote more regularly.
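
If you are curious where the heights of the bar segments come from, here is a minimal sketch of the calculation. It uses a tiny made-up table called toy (not the survey data) that only borrows the column names gender, voting_frequency, and weight. The assumption is that the graph shows, within each gender, the share of total survey weight in each voting category. This is an illustration in base R, not the code behind the graph above.

```r
# Made-up example rows for illustration only; these are NOT survey results.
toy <- data.frame(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male"),
  voting_frequency = c("always", "sporadic", "rarely/never",
                       "always", "sporadic", "sporadic"),
  weight = c(1.0, 0.5, 0.5, 1.0, 1.0, 2.0)
)

# Add up the survey weights in each gender-by-voting-frequency cell.
weighted_counts <- xtabs(weight ~ gender + voting_frequency, data = toy)

# Divide each row by its total so the shares within a gender sum to 1.
weighted_shares <- prop.table(weighted_counts, margin = 1)

print(round(weighted_shares, 2))
```

Each row of the resulting table corresponds to one bar in the graph, and each entry in the row to one colored segment of that bar.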

Next I’m going to fast forward a bit to a fact that surprised me when I looked at the article. Let’s make the same graph, but for party affiliation rather than gender. (If you are looking at the code, you’ll see that all I have to do is replace gender with party.)

ggplot(data = data, mapping = aes(x = party, fill = voting_frequency)) +
  geom_bar(aes(weight = weight), position = "fill")

In this graph we see some more pronounced differences in behavior across categories. People who