In ECON 216, you will learn how to create data visualizations (graphs and charts) such as the ones that support this article, so that you can also use data to study social phenomena and make arguments. The course is designed for people who have never worked with data before.
The class focuses on teaching you how to produce documents just like this one, which contain text and graphics based on data, using software called RStudio that uses the statistical programming language R.
The website 538 published an interesting story on October 26, 2020 that explored who does and does not vote in America.
The story is an example of data journalism, where, rather than interviewing people, a journalist examines a dataset to support the story they want to tell.
A big advantage of data journalism is that, when the data is treated properly, we can get a more accurate view of what average Americans do.
538 makes the data used in their stories available for everyone to use. For this story, 538 surveyed 5,836 people, and I have this data stored on my computer. I will load it in using some code that you won’t understand now, but trust me, by the end you will!
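library(tidyverse)  # assumed setup: provides %>%, mutate(), case_when(), select(), and ggplot()

data <- read.csv("nonvoters_data.csv") %>%
  # Drop respondents whose party answer (Q30) is coded -1
  subset(Q30 != -1) %>%
  # Recode the numeric party answer into a labeled factor
  mutate(
    party = as.factor(case_when(
      Q30 == 1 ~ "Republican",
      Q30 == 2 ~ "Democrat",
      Q30 >= 3 ~ "Independent/Neither"
    )),
    voting_frequency = voter_category
  ) %>%
  # Keep only the columns used below
  select(RespId, weight, educ, race, gender, income_cat, voting_frequency, party)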
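If you are curious what the middle of that code is doing, here is a small sketch of the recoding step on its own. The table `toy` is made up for illustration and is not the survey data. The only thing carried over is the way the code above treats the Q30 codes: 1 becomes Republican, 2 becomes Democrat, 3 or higher becomes Independent/Neither, and rows coded -1 are dropped.

```r
library(dplyr)

# Made-up answers to a party question, using the same numeric codes as Q30.
toy <- data.frame(RespId = 1:5, Q30 = c(1, 2, 3, 5, -1))

toy_recoded <- toy %>%
  filter(Q30 != -1) %>%                 # drop the row coded -1
  mutate(party = as.factor(case_when(   # turn numeric codes into labels
    Q30 == 1 ~ "Republican",
    Q30 == 2 ~ "Democrat",
    Q30 >= 3 ~ "Independent/Neither"
  )))

print(toy_recoded)
```

The conditions inside `case_when()` are checked from top to bottom, and each row gets the label of the first condition it matches.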
The data is just a big table in which each person surveyed has a separate row, and the information collected about them is stored in the columns of the table. Here is what the table looks like for the first six people. Note that there is a column called voting_frequency that indicates how frequently someone votes, based on public voter records matched to the survey respondents.
head(data)
##   RespId weight                educ  race gender    income_cat voting_frequency
## 1 470001 0.7516             College White Female      $75-125k           always
## 2 470002 1.0267             College White Female $125k or more           always
## 3 470003 1.0844             College White   Male $125k or more         sporadic
## 4 470007 0.6817        Some college Black Female       $40-75k         sporadic
## 5 480008 0.9910 High school or less White   Male       $40-75k           always
## 6 480009 1.0591 High school or less White Female       $40-75k     rarely/never
##                 party
## 1            Democrat
## 2 Independent/Neither
## 3            Democrat
## 4            Democrat
## 5          Republican
## 6 Independent/Neither
I’ve chosen to look at a section of the data that is about voter demographic characteristics, like race, gender, party affiliation, and income.
A natural question to ask is whether voting frequency differs across demographic groups. First, let’s make a visualization that shows how voting frequency differs by gender. The code below won’t make sense to you now; the point of the class is to teach you how to do this.
ggplot(data = data, mapping = aes(x = gender, fill = voting_frequency)) +
  geom_bar(aes(weight = weight), position = "fill")
We learn two things from this visualization. First, most people vote either rarely or sporadically. Second, there is not much difference in voting frequency by gender. Does the latter surprise you? I personally expected more regular voting among women.
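For the curious, here is a rough sketch of the arithmetic behind a chart like this. With a `weight` aesthetic, `geom_bar()` adds up the survey weights in each group instead of counting rows, and `position = "fill"` rescales each bar so its segments add up to 1. The table `toy` and its weights are made up for illustration, and the column names `weighted_n` and `share` are my own choices, not part of the survey data.

```r
library(dplyr)

# Made-up respondents; the real survey has thousands of rows.
toy <- data.frame(
  gender = c("Female", "Female", "Female", "Male", "Male"),
  voting_frequency = c("always", "sporadic", "always", "always", "rarely/never"),
  weight = c(1.0, 0.5, 0.5, 1.2, 0.8)
)

shares <- toy %>%
  # Sum the weights (rather than count rows) in each gender-by-frequency group
  count(gender, voting_frequency, wt = weight, name = "weighted_n") %>%
  group_by(gender) %>%
  # Rescale so the shares within each gender add up to 1
  mutate(share = weighted_n / sum(weighted_n)) %>%
  ungroup()

print(shares)
```

Each value of `share` corresponds to the height of one colored segment in a bar, which is why every bar in the chart reaches the same total height.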
Next I’m going to fast forward a bit to a fact that surprised me when I looked at the article. Let’s do the same graph but for party affiliation rather than gender. (If you are looking at the code, you’ll see that all I have to do is replace gender with party in the code.)
ggplot(data = data, mapping = aes(x = party, fill = voting_frequency)) +
  geom_bar(aes(weight = weight), position = "fill")
In this graph we see some more pronounced differences in behavior across categories. People who