Map maker, map maker, make me a map!

Today we’re going to look at how we can use joins and geospatial data more explicitly to make maps. Maps are among the earliest data visualizations ever made, and some of the most powerful. They’re also one of the places where joins become incredibly important: to put data on a map, we have to join our data with a geospatial description of the map we want.

Death from Heart Disease

Today’s data set is county-level heart disease mortality from the CDC.

# readxl reads the Excel file; dplyr is for data manipulation later on
library(readxl)
library(dplyr)

# cells containing "Insufficient Data" are read in as NA
heart_disease <- read_excel("./join_maps/hd_all.xlsx",
                            na = "Insufficient Data")

head(heart_disease)
## Source: local data frame [6 x 4]
## 
##     State  County Death_Rate FIPS_Code
##     (chr)   (chr)      (dbl)     (dbl)
## 1 Alabama Autauga      463.0      1001
## 2 Alabama Baldwin      391.4      1003
## 3 Alabama Barbour      533.1      1005
## 4 Alabama    Bibb      511.1      1007
## 5 Alabama  Blount      425.6      1009
## 6 Alabama Bullock      483.2      1011

OK, we have state, county, and a death rate. FIPS codes, FYI, are standardized county codes. We’ll be ignoring them.
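Even though we won’t use them, it helps to know how FIPS codes are built: a county FIPS code is five digits, two for the state followed by three for the county. Because the column was read in as a number, the leading zero is gone (1001 rather than 01001). Here is a minimal sketch of putting it back, using a few values copied from the output above; the variable names are made up for illustration.

```r
# FIPS codes as they were read in: numeric, so leading zeros are lost
fips_numeric <- c(1001, 1003, 1005)

# Pad back out to five characters
fips_padded <- sprintf("%05d", as.integer(fips_numeric))

# First two digits identify the state, last three the county
state_fips  <- substr(fips_padded, 1, 2)
county_fips <- substr(fips_padded, 3, 5)
```

Storing the codes as text rather than numbers keeps the zero-padding intact if you ever do want to match on them.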

Introducing: Maps

There are a LOT of ways to get map data into R. We’re going to begin with the simplest: grabbing it from an R package. ggplot2 works in tandem with the maps package to provide a few standardized sets of maps for easy plotting. Let’s take a look at the one of counties in the U.S. lower 48.

#install the mapdata library if you
#don't have it
library(mapdata)
## Loading required package: maps
## Warning: package 'maps' was built under R version 3.2.3
## 
##  # ATTENTION: maps v3.0 has an updated 'world' map.        #
##  # Many country borders and names have changed since 1990. #
##  # Type '?world' or 'news(package="maps")'. See README_v3. #
library(ggplot2)

# map_data() returns one of the maps from the maps package as a data frame
map_df <- map_data("county")

head(map_df)
##        long      lat group order  region subregion
## 1 -86.50517 32.34920     1     1 alabama   autauga
## 2 -86.53382 32.35493     1     2 alabama   autauga
## 3 -86.54527 32.36639     1     3 alabama   autauga
## 4 -86.55673 32.37785     1     4 alabama   autauga
## 5 -86.57966 32.38357     1     5 alabama   autauga
## 6 -86.59111 32.37785     1     6 alabama   autauga

OK, we have latitude and longitude of county borders, a group (one per polygon, which for most counties means one per county), and both a region and subregion. Note the columns aren’t called state and county: map_data uses the same generic column names for every map it returns, whatever the regions happen to be. Also note that capitalization is wonky. It’s all lower case, while our heart disease data capitalizes state and county names.
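The capitalization matters because string matching in R is case sensitive. Here is a small sketch using names copied from the two head() outputs above; the vector names are made up for illustration, and tolower() is just one way to line the two sides up.

```r
# Names as they appear in the heart disease data and in the map data
hd_names  <- c("Alabama", "Autauga")
map_names <- c("alabama", "autauga")

# Comparing as-is finds no matches, because "Alabama" is not "alabama"
raw_match <- hd_names == map_names

# Lower-casing the heart disease names makes the two sides agree
lower_match <- tolower(hd_names) == map_names
```

Any join on these columns would need a step like this first, or nothing would match.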

To show you how we would use this data, here is a basic plot of the county outlines:

ggplot(data = map_df, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon()
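The group aesthetic is what tells geom_polygon() where one shape ends and the next begins. To see that without downloading anything, here is a sketch on a toy data frame of two unit squares standing in for two counties. The data are invented, and the outline colours and coord_quickmap() (which keeps the aspect ratio roughly map-like) are additions for illustration rather than part of the plot above.

```r
library(ggplot2)

# Two hypothetical squares, laid out like the map_data() columns
squares <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  order = 1:8
)

# One polygon is drawn per value of group
p <- ggplot(data = squares,
            mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", colour = "black") +
  coord_quickmap()

p
```

If you drop group = group from aes(), all eight points are treated as a single polygon and the two squares get joined together, which is the same thing that would happen to the counties.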