Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.02.08 edition

Metro/subway ridership, house prices, Nobel Prizes, recipe ingredients, and life expectancies.

Metro/subway ridership. Two weeks ago, Bloomberg News reporters requested entrance and exit data from Washington, DC’s Metrorail system for three days: Jan. 20, 2009 (Obama’s first inauguration), Jan. 20, 2017 (Trump’s inauguration), and Jan. 21, 2017 (the Women’s March). A week later, they received the data, but as PDFs, which they turned into structured data and published this week. Related: NYC’s MTA publishes detailed turnstile-by-turnstile data, and Chicago publishes daily “L” ridership data for each station going back to 2001. Plus: “Second Avenue Subway Relieves Crowding on Neighboring Lines,” which uses the NYC data.

International house prices since 1975. The International House Price Database combines and standardizes house price indices from 23 countries — mostly in Europe and North America, but also including South Africa, Australia, New Zealand, Japan, South Korea, and Israel. The dataset, published by the Federal Reserve Bank of Dallas, is deeply documented and updated quarterly. Previously: Historical San Francisco rents (May 25, 2016) and the U.S. Census Bureau’s Annual Characteristics of New Housing (June 22, 2016).

Nobel Prizes. The prestigious Scandinavian awards have an API. The official documentation explains it succinctly: “The data is free to use and contains information about who has been awarded the Nobel Prize, when, in what prize category and the motivation, as well as basic information about the Nobel Laureates such as birth data and the affiliation at the time of the award. The data is regularly updated as the information on Nobelprize.org is updated, including at the time of announcements of new Laureates.” Related: “These Nobel Prize Winners Show Why Immigration Is So Important For American Science,” by my colleague Peter Aldhous. Plus: The R code supporting Peter’s analysis.

Recipe ingredients. For their 2011 paper, “Flavor network and the principles of food pairing,” four scientists analyzed 56,498 recipes downloaded from three websites: allrecipes.com, epicurious.com, and menupan.com. To support their findings, the authors published two datasets. One names the cuisine and ingredients for each recipe. The other dataset counts how often any two ingredients appeared in the same recipe. (Parmesan cheese and beef appeared together 93 times; starfruit and Algerian geranium oil just once.) Related: “food2vec – Augmented cooking with machine intelligence,” published last month. [h/t Rob Barry]
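
For the curious, here is a rough sketch of how a pair-count table like the second dataset could be derived from a recipe-by-recipe ingredient list like the first. It is not the authors' code: the function name, the ingredient spellings, and the three sample recipes are all made up for illustration, and the real files may be laid out differently.

```python
from collections import Counter
from itertools import combinations


def count_ingredient_pairs(recipes):
    """Count how many recipes contain each unordered pair of ingredients.

    `recipes` is an iterable of ingredient lists (a hypothetical layout;
    the published dataset's actual format is not assumed here).
    """
    pair_counts = Counter()
    for ingredients in recipes:
        # Deduplicate and sort so every pair has one canonical ordering
        # and a repeated ingredient is not counted twice per recipe.
        unique = sorted(set(ingredients))
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Invented sample recipes, for illustration only.
sample_recipes = [
    ["beef", "parmesan_cheese", "tomato"],
    ["beef", "parmesan_cheese", "garlic"],
    ["garlic", "tomato"],
]

pair_counts = count_ingredient_pairs(sample_recipes)
print(pair_counts[("beef", "parmesan_cheese")])
```

Because each pair is stored in alphabetical order, look it up that way too: `("beef", "parmesan_cheese")`, not the reverse.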

Life expectancies. The World Health Organization publishes life expectancy estimates for 194 countries, for each year between 2000 and 2015. Related: “One Dataset, Visualized 25 Ways.” Previously: American life expectancies by city (April 13, 2016).