Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.03.29 edition

Military spending, food surveillance, failed banks, (more) real-time air quality, and 100 million domain names.

Military spending. The Stockholm International Peace Research Institute’s Military Expenditure Database is based on official reports, International Monetary Fund yearbooks, newspaper articles, and other sources. It covers most major countries since the 1950s and more than 100 countries since 1988. The dataset also quantifies military spending on a per-capita basis, as a share of the country’s GDP, and as a proportion of total government spending. Also: The Defense Manpower Data Center publishes spreadsheets detailing the number of active and reserve U.S. personnel stationed in each state, territory, and foreign country. Previously: SIPRI’s database of international arms transfers (Nov. 18, 2015). [h/t K.K. Rebecca Lai, Troy Griggs, Max Fisher and Audrey Carlsen]

Food surveillance. Late last year, the FDA began publishing a dataset of “adverse events” that have been reported to its Center for Food Safety and Applied Nutrition. The database currently covers January 2004 through December 2016, and includes reports of (suspected) bad reactions to foods, dietary supplements, and cosmetics. For instance, the first row names a particular brand of chocolate chips as the potential culprit in the hospitalization of a two-year-old girl, whose symptoms included a rash, swelling face, cough, and difficulty breathing. Previously: FDA adverse event data for pharmaceutical drugs (May 18, 2016). [h/t Sheila Hagar + Drew Ivan]

Failed banks. The Federal Deposit Insurance Corporation publishes a spreadsheet of failed banks for which the agency has been appointed as a receiver, some 550 banks since October 2000. It also provides short descriptions of each bank failure. The most recent: Proficio Bank of Cottonwood Heights, Utah, which closed on March 3. The FDIC also publishes more information about its receivership program.

Real-time air quality, part II. After last week’s item on Berkeley Earth’s real-time air quality data, reader Olaf Veerman pointed me to OpenAQ. The open-source project currently gathers pollution data from nearly 5,500 locations in 47 countries, aggregated “from real-time government and research grade sources.” You can download the data via OpenAQ’s API. [h/t Olaf Veerman]

100 million domain names. The anonymously published DNS Census 2013 “is an attempt to provide a public dataset of registered domains and DNS records,” essentially the Internet’s phone book. The dataset, which has also been uploaded to the Internet Archive, includes 2.7 billion Domain Name System records and 106,928,034 distinct domains, organized by extension (e.g., .com, .info, .edu). RIP, [h/t Andrew Ferlitsch]