Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.11.02 edition

Medicare beneficiaries, human settlement, Amazon reviews, NYC police complaints, and dangerous dogs.

Medicare beneficiaries. The U.S. government’s Medicare Health Outcomes Survey tracks the “physical and mental health and well-being” of Americans covered by Medicare. Each survey, currently available for 1998–2000 to 2012–2014, follows a sample of Medicare beneficiaries for two years, and asks them questions along the lines of, “In the past 12 months, have you had a problem with balance or walking?” The 2012–2014 data includes (at least partial) responses from 296,320 people. [h/t Ricardo Pietrobon]

Where we live and build. The European Commission’s Global Human Settlement Layer combines satellite imagery and census data to measure three things: population, building density, and urban/rural classification. The resulting datasets are fairly detailed (they provide population estimates for every 250-meter square in the world, for example) and are available for 1975, 1990, 2000, and 2015. [h/t Alasdair Rae]

Complaints against NYC police. Earlier this autumn, New York City began publishing a dataset of official citizen complaints against the city’s police, for every case closed since 2006. For each of the 200,000+ allegations, the main dataset includes various details about the incident (e.g., where it took place, and whether there’s video evidence) but no information about the officer involved. Related: Similar data from Indianapolis, which includes demographic information about the complained-against officers but not their names. Also related: “The local projects that are making police complaint data open and accessible.” Previously: Complaints against Chicago police, featured Nov. 11, 2015. [h/t Eve Ahearn]

Millions of Amazon reviews. Julian McAuley, an assistant professor at UC San Diego, has collected a massive amount of user-generated data from Amazon, including 142.8 million reviews and 1.4 million answered Q&As. (As of mid-2014, Sophie la Girafe was the most-reviewed item in the baby category. Backstory here.) Much of the data can be downloaded directly, but the largest files require contacting McAuley for access. [h/t Reddit user samofny]

The dangerous dogs of Austin, Texas. The city publishes a spreadsheet — last updated in May — of local dogs who’ve officially been “declared dangerous.” (“They have attacked in the past. The owner is required to provide $100,000 in financial responsibility. If they attack again the court could order them put to sleep.”) The file currently contains 63 entries, from a Labrador named Charlie to a Blue Lacy named Flint. [h/t Sharon Machlis]