Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.11.30 edition

Causes of death, gunshot detections, election forecasts, news on Facebook, and Alex Trebek.

What kills us. The CDC’s Underlying Cause of Death database provides county-level mortality statistics based on death certificates of U.S. residents for each year from 1999 to 2014. The tool lets you group the data by geography, demographics, place of death (e.g., inpatient hospital, hospice, or home), and other variables. In 2014, for example, about 40,000 residents died of pancreatic cancer, with the highest rates coming in America’s most-rural counties (~15.6 deaths per 100,000 residents) and the lowest rates in the country’s most-urban counties (~11.3 per 100,000). The CDC’s “compressed mortality” datasets contain slightly less detail, but go all the way back to 1968. [h/t Drew Ivan]
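For readers who want to sanity-check figures like these, here is a minimal sketch of the crude "deaths per 100,000 residents" arithmetic. The function name is illustrative, and the national population figure (roughly 318.9 million in 2014) is an approximation supplied here, not a number from the CDC tool. The CDC tool can also report age-adjusted rates, which this simple calculation does not reproduce.

```python
def rate_per_100k(deaths: int, population: int) -> float:
    """Crude death rate: deaths per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


# Assumption: ~318.9 million U.S. residents in 2014 (approximate figure).
# The ~40,000 deaths figure comes from the newsletter text above.
national_rate = rate_per_100k(40_000, 318_900_000)
print(f"{national_rate:.1f} deaths per 100,000 residents")
```

The national result lands between the most-urban and most-rural county rates quoted above, which is what you would expect from an overall average.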

Gunshot detections. Earlier this month, Forbes published an examination of ShotSpotter, a company that uses networks of outdoor microphones to detect and locate gunshot-like sounds. Forbes found that ShotSpotter has produced “few tangible results.” “In some cities, ShotSpotter hasn’t had the effect city officials and residents had hoped for. While officers are responding to more illegal gunfire, they rarely catch the shooter.” To support its findings, Forbes has published the ShotSpotter data it received from police departments in seven cities: Brockton, Mass.; East Palo Alto, Calif.; Kansas City, Mo.; Milwaukee, Wis.; Omaha, Neb.; San Francisco, Calif.; and Wilmington, N.C. The data varies somewhat for each city, but typically includes the date, time, location, and outcome of each gunshot alert. [h/t Matt Drange]

Comparing election forecasts. This year, I decided to grade a bunch of prominent election forecasts for BuzzFeed News. Now that Michigan has finally been called, I’ve published the results. I’ve also published the underlying data and code on GitHub, including state-level predictions from all nine forecasters in the analysis.

Five years of Facebook posts from 15 news sites. Data analyst Patrick Martinchek has published a dataset of all Facebook posts from “15 of the top mainstream media sources” (a group that includes The New York Times, The Wall Street Journal, NPR, Fox News, and other familiar sources) from January 2012 through Nov. 8, 2016. Related: “What I Discovered About Trump and Clinton From Analyzing 4 Million Facebook Posts.”

I’ll take “Datasets” for $200. A few years ago, Reddit user trexmatt uploaded 216,930 Jeopardy! trivia tidbits, scraped from “the nearly comprehensive online Jeopardy! archive maintained by obsessive fans.” Each entry lists the question, answer, category, value, round, show number, and show air-date.
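As a rough illustration of the seven fields listed above, here is a sketch of how one entry might be turned into a typed record. The key names, the "$200"-style value string, the ISO date format, and the sample entry are all assumptions made for illustration; check the actual file before relying on any of them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Clue:
    category: str
    question: str
    answer: str
    value: Optional[int]  # dollar amount, or None when no value is given
    round_name: str
    show_number: int
    air_date: date


def parse_value(raw: Optional[str]) -> Optional[int]:
    """Convert a string such as "$1,000" to 1000; pass None through."""
    if raw is None:
        return None
    cleaned = raw.replace("$", "").replace(",", "").strip()
    return int(cleaned) if cleaned else None


def parse_clue(record: dict) -> Clue:
    """Build a Clue from one raw entry (key names are assumed, not confirmed)."""
    return Clue(
        category=record["category"],
        question=record["question"],
        answer=record["answer"],
        value=parse_value(record.get("value")),
        round_name=record["round"],
        show_number=int(record["show_number"]),
        air_date=date.fromisoformat(record["air_date"]),
    )


# Made-up entry for illustration only; not taken from the dataset.
sample = {
    "category": "DATASETS",
    "question": "A hypothetical clue",
    "answer": "A hypothetical response",
    "value": "$200",
    "round": "Jeopardy!",
    "show_number": "1",
    "air_date": "2000-01-01",
}
clue = parse_clue(sample)
print(clue.category, clue.value, clue.air_date.year)
```

Storing the value as an integer and the air-date as a real date makes it easier to sort and group entries, for example by year or by dollar amount.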