Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.08.24 edition

Crime in American cities, one billion healthcare claims, oil concessions, horse deaths, and traffic signs.

Crime in cities. The Marshall Project has collected and analyzed four decades of FBI data “on the most serious violent crimes in 68 police jurisdictions.” The FBI data covers 1975 through 2014; the reporters “also obtained data directly from 61 local agencies for 2015 — a period for which the FBI has not yet released its numbers.” Between 2010 and 2015, violent crime increased most in Milwaukee (+11%) and declined most in Prince George’s County, Md. (-22%).

One billion Australian healthcare claims. Australia’s Department of Health has recently released an enormous dataset of Medicare and subsidized-prescription claims. It includes all claims from a random 10% sample of patients, and “contains approximately 1 billion lines of data relating to approximately 3 million Australians.” The Medicare claims go back to 1984, and the prescription claims go back to 2003. [h/t Drew Ivan]

Oil concessions. The OpenOil project aims to collect and standardize data on oil and gas development contracts around the world. So far, they’ve gathered at least some data from more than 60 countries. They’ve also published a map of oil concessions in the Middle East and Africa. [h/t Michael Gardiner]

New York racehorse deaths and injuries. Since March 2009, New York State has tracked every instance of a horse being injured or dying at a state race track. The dataset, which is updated often, also includes a few other types of incidents, such as when a rider falls or a horse loses badly. Related: “Horses’ Deaths at Aqueduct Prompt New Rules.” [h/t Mark Secada]

German traffic signs. The German Traffic Sign Recognition Benchmark dataset contains 50,000+ images of 43 kinds of German traffic signs, from the classic “STOP,” to various speed limits, to roundabout indicators. The dataset, published by researchers at Ruhr-Universität Bochum’s Institut für Neuroinformatik, formed the basis of a 2011 machine-learning competition. [h/t Viktor Schepik]