Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.05.18 edition

Drug safety, fossils, immigrants, parking tickets, and one million musical artists.

Drug safety. To help monitor drug safety, the FDA collects “adverse event” reports submitted by patients, doctors, and manufacturers. You can download the (anonymized) reports from the FDA directly, but that dataset includes duplicate cases, and sometimes calls the same drug by different names. A group of researchers recently announced that they’ve cleaned up the data — removing duplicates and standardizing nomenclature — so that you don’t have to. The resulting dataset covers 4,245 drugs, more than 17,000 types of reactions, and nearly 5 million case reports. Previously: The SIDER database of pharmaceutical side effects, featured Nov. 11, 2015.
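For a feel of what that cleanup involves, here is a minimal sketch in Python. It is not the researchers' actual method: the field names (`case_id`, `drug`, `reaction`), the tiny synonym table, and the sample reports are all hypothetical, made up purely for illustration.

```python
import re

# Hypothetical synonym table mapping name variants to one standard name.
# A real cleanup would rely on a full drug vocabulary, not three entries.
DRUG_SYNONYMS = {
    "tylenol": "acetaminophen",
    "paracetamol": "acetaminophen",
    "advil": "ibuprofen",
}


def standardize_drug(name):
    """Lowercase, collapse whitespace, then map known variants to one name."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    return DRUG_SYNONYMS.get(key, key)


def clean_reports(reports):
    """Drop reports that repeat the same (case, drug, reaction) combination."""
    seen = set()
    cleaned = []
    for report in reports:
        drug = standardize_drug(report["drug"])
        reaction = report["reaction"].strip().lower()
        key = (report["case_id"], drug, reaction)
        if key in seen:
            continue  # duplicate of a report we already kept
        seen.add(key)
        cleaned.append(
            {"case_id": report["case_id"], "drug": drug, "reaction": reaction}
        )
    return cleaned


# Made-up sample: the first two rows describe the same case under two names.
reports = [
    {"case_id": 1, "drug": "Tylenol", "reaction": "Nausea"},
    {"case_id": 1, "drug": "acetaminophen ", "reaction": "nausea"},
    {"case_id": 2, "drug": "Advil", "reaction": "Rash"},
]
cleaned = clean_reports(reports)
```

In this toy example, standardizing names first is what lets the duplicate surface: "Tylenol" and "acetaminophen" only collide once both map to the same key.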

Fossils. The Paleobiology Database, run by a non-profit group of researchers, has aggregated data on more than a million fossils from all around the world. You can access the dataset — organized by species, era, and location — via an interactive map, download form, or API.

Immigrants, internationally. The United Nations publishes estimates of the number of foreign-born residents living in every country. The figures cover 1990 to 2015, at five-year intervals. The Vatican (100% foreign-born) and the United Arab Emirates (88%) had the highest proportion of immigrant residents in 2015; the U.S. (46.6 million) boasted the largest total immigrant population. The dataset also includes estimates by age, sex, and country of origin. Previously: Refugees in America, featured Nov. 25, 2015. [h/t Manu Balachandran]

Tens of millions of parking tickets. I Quant NY author Ben Wellington recently discovered that New York City had been “ticketing legally parked cars for millions of dollars a year.” To reach that finding, Wellington analyzed three years of parking tickets, amounting to more than 30 million summonses. NYC isn’t alone in providing parking ticket data; Philadelphia, Toronto, Baltimore, Seattle, and others publish similar datasets.

Musical metadata. The MusicBrainz database contains metadata on more than one million artists, 16 million recordings, and 900,000 pieces of cover art. You can download the data in bulk or query it via an API. Previously: The smaller-but-more-detailed Million Song Dataset, featured Feb. 10. [h/t Geoff Boeing]
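If you'd like to try the API route, the sketch below builds an artist-search URL for the MusicBrainz web service without sending any request. The helper name `artist_search_url`, the example artist, and the default `limit` are assumptions for illustration; check the MusicBrainz documentation for the current parameters and rate limits before relying on it.

```python
from urllib.parse import urlencode

# Root of the MusicBrainz web service (version 2).
MUSICBRAINZ_ROOT = "https://musicbrainz.org/ws/2"


def artist_search_url(name, limit=5):
    """Build (but do not send) a search URL for artists matching `name`."""
    params = {
        "query": f'artist:"{name}"',  # Lucene-style field search
        "fmt": "json",                # ask for JSON rather than the default XML
        "limit": limit,
    }
    return f"{MUSICBRAINZ_ROOT}/artist?{urlencode(params)}"


# Hypothetical example query.
url = artist_search_url("Nina Simone")
```

To actually fetch results, pass the URL to your HTTP client of choice. MusicBrainz asks clients to identify themselves with a descriptive User-Agent header and to keep request rates low.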