Preliminary Visualization

For this section, I keep only the crashes that were caused by alcohol/drinking and drop the remaining rows. Alcohol-related crashes are referred to as ARCs hereafter. Plotting the locations of ARCs on a map of Illinois helps visualize which regions are more prone to them. Using the lat-long information provided in the crash data set, I plot ARCs in Illinois; each magenta point on the map corresponds to a crash. County borders were drawn using the data available in map_data("county"). From the map plot below it is clear that Cook County has a high density of ARCs, which isn’t surprising given that Cook County is the most populated county of Illinois and contains Chicago, which is known for its bars and nightlife. Other big cities and urban areas of Illinois (Champaign, Rockford, the St. Louis metro area, Springfield etc.) and major interstate highways (I-70, I-90, I-39 etc.) are also high-density regions for ARCs.
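The filtering step described above can be sketched in a few lines of base R. This is only an illustration under assumptions: the column name `cause`, its values, and the sample rows are hypothetical, since the layout of the raw crash file is not shown in this post. Longitudes are stored as positive numbers here only to match the sign flip used in the plotting code.

```r
# Hypothetical raw crash table; `cause` is an assumed column name and
# the rows are made-up values for illustration only.
crashes <- data.frame(
  cause = c("alcohol", "weather", "alcohol", "speeding"),
  lat   = c(41.88, 40.11, 39.80, 42.27),
  lon   = c(87.62, 88.24, 89.64, 89.09)
)

# Keep only the alcohol-related crashes (ARCs) and drop the remaining rows
arc <- crashes[crashes$cause == "alcohol", ]
nrow(arc)
```

The resulting `arc` data frame has the same columns as the input, so it can be passed straight to the plotting code.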

# Code to plot ARCs in IL
library(ggplot2)     # map_data() also needs the maps package to be installed
library(ggmap)
library(data.table)

# County outlines; we only want IL, so choose region illinois
county <- map_data("county")
counties_of_il <- subset(county, region == "illinois")

# Base map centered on Illinois; coordinates are (lon, lat)
# Note: Google Maps now requires an API key, see register_google() in ggmap
center_of_illinois <- c(-89.3985, 40.6331)
il.map <- get_map(location = center_of_illinois, zoom = 7, source = "google", maptype = "roadmap")

# lon is negated below, which presumes the file stores western longitudes as positive values
arc <- read.csv("ARCdata.csv", header = TRUE)
il.map <- ggmap(il.map) +
  geom_polygon(data = counties_of_il, aes(x = long, y = lat, group = group), fill = NA, color = "grey42") +
  geom_point(data = arc, aes(x = -lon, y = lat), color = "magenta", shape = ".")
il.map


Since Chicago was the first city in Illinois where ride-sharing was introduced, let’s take a closer look at it. The border file for Chicago can be downloaded from the City of Chicago website. I had to modify the border file using some regex search-and-replace operations to group the discontinuous parts of the Chicago map; there are 5 discontinuous regions within Chicago. The modified file can be found here.

# Code to plot ARCs in Chicago
chicago_lat_long <- fread("chicago_boundary_file.csv", col.names = c("long", "lat", "group"))
center_of_chicago <- c(-87.623177, 41.881832)  # (lon, lat) of the center of Chicago
chicago.map <- get_map(location = center_of_chicago, zoom = 10, source = "google", maptype = "roadmap")
# ARCs in Cook County as points, with the Chicago boundary drawn on top
chicago.map.plot <- ggmap(chicago.map) +
  geom_point(data = arc[arc$county == "Cook", ], aes(x = -lon, y = lat), color = "magenta", shape = ".") +
  geom_polygon(data = chicago_lat_long, aes(x = long, y = lat, group = group), fill = NA, color = "grey42")
chicago.map.plot


In summary, ARC density is higher along the major highways connecting cities, in Cook County (which includes Chicago and Evanston), and in the other urbanized areas of IL: Peoria, Springfield, Champaign, Rockford, Bloomington and the St. Louis metro area. Within Chicago, Downtown and highways like I-90 and I-290 have a large number of ARCs. The big empty patches in Chicago are O’Hare airport and the Harborside International Golf Center.

Cook County, Chicago and a few other major cities seem to be hubs for ARCs. However, it remains to be seen whether this is because people there drink and drive more often, or because some other feature of these cities (for example, large cities are more crowded and so may be more prone to road accidents of any kind) leads to more crashes in general. This can be checked by repeating the above analysis for crashes with another cause, say weather-related crashes. This will be inspected in a future post.
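A rough numerical complement to that comparison, which is not the approach taken in this post, would be to look at the share of each county's crashes that are alcohol related rather than at raw counts. The sketch below assumes a full crash table with hypothetical columns `county` and `alcohol`; the rows are made up for illustration and are not results from the data set.

```r
# Hypothetical crash table covering all causes; `county` and `alcohol`
# are assumed column names and the rows are made-up values.
crashes <- data.frame(
  county  = c("Cook", "Cook", "Cook", "Champaign", "Champaign", "Peoria"),
  alcohol = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Fraction of each county's crashes that are alcohol related.
# mean() of a logical vector gives the proportion of TRUE values.
arc_share <- tapply(crashes$alcohol, crashes$county, mean)
arc_share
```

A county with many ARCs but a low share would point towards more crashes in general, while a high share would point towards more drinking and driving.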

References

Kahle, D., & Wickham, H. (2013). ggmap: Spatial Visualization with ggplot2. R Journal, 5(1).