Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.09.13 edition

Global flooding, more house price indices, political facetime, SEC server logs, and wood densities.

Global flooding. The Dartmouth Flood Observatory’s Global Archive of Large Flood Events contains data about 4,500+ floods, dating back to 1985. It’s updated often, and is available in Excel, XML, HTML, and geospatial formats. The variables include each flood’s location, timespan, severity, main cause, and estimated impact. The organization also publishes detailed maps of the “maximum observed flooding” for specific disasters, such as for Hurricane Harvey and for Hurricane Irma. Related: A Science Magazine mini-profile of the DFO and its founder. Previously: U.S. tide gauges and flood observations (DIP 2016.03.23), UK coastal flooding (DIP 2017.08.09), and FEMA flood risk maps (DIP 2017.08.30).

House price indices, part two. Two weeks ago, DIP featured Case-Shiller’s home price index data. There are, in fact, several other prominent (and downloadable) house price indices, including the Federal Housing Finance Agency’s House Price Index, the National Association of Realtors’ indices, and Zillow’s Home Value Index. Helpful: This guide to various home price indices and how they’re constructed, by Jed Kolko, formerly Trulia’s chief economist. Related: This critique of Case-Shiller’s approach, also by Kolko.

Trump, McConnell, Schumer, Ryan, and Pelosi on TV. The Internet Archive has pumped footage from CNN, Fox News, MSNBC, and the BBC through software trained to recognize the faces of Donald Trump and majority/minority leaders of the U.S. House and Senate. The result: Face-O-Matic, a dataset released to the public last week. For each face the software found, the dataset includes the network, program, date, time, duration, and a link to the footage on the TV News Archive. Since mid-July, Face-O-Matic has logged more than 50,000 sightings. [h/t Nancy Watzman]

SEC server logs. When companies file reports to the U.S. Securities and Exchange Commission, they do so through the SEC’s EDGAR system. The SEC makes those filings available online, and it uses EDGAR’s server logs to analyze web traffic to the site. The SEC’s EDGAR Log File Data Set contains a set of CSVs (one for each day between February 14, 2003 and December 31, 2016) extracted from those server logs. For each document visited, the data includes the visitor’s unique-but-obfuscated IP address, the date and time of the visit, the IDs of the document and associated company, and some information about the visitor’s browser. [h/t Brian C. Keegan]
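
For a rough sense of how one of those daily CSVs might be tallied, here is a minimal Python sketch. The column names (`ip`, `date`, `time`, `cik`, `accession`, `browser`) are assumptions chosen to mirror the fields described above, and the sample rows are made up for illustration; check the SEC's own documentation for the actual layout before relying on it.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; this is not real EDGAR log data.
# The column names are assumptions that mirror the fields described above.
SAMPLE_LOG = """ip,date,time,cik,accession,browser
101.81.76.dfd,2016-12-31,00:00:00,1111711,0001193125-12-324016,win
101.81.76.dfd,2016-12-31,00:00:03,1111711,0001193125-12-324016,win
107.23.85.jfd,2016-12-31,00:00:05,789019,0001193125-16-662209,mac
"""


def visits_per_company(csv_file):
    """Count logged document visits per company ID (the assumed `cik` column)."""
    reader = csv.DictReader(csv_file)
    return Counter(row["cik"] for row in reader)


counts = visits_per_company(io.StringIO(SAMPLE_LOG))
for cik, n in counts.most_common():
    print(cik, n)
```

To run this against a downloaded daily file instead of the inline sample, pass an open file handle (for example `open(path, newline="")`, where `path` is wherever you saved the CSV) in place of the `io.StringIO` object.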

It wood be hard to ignore this dataset. The “robust and curated” Global Wood Density Database contains more than 16,000 entries, culled from scientific literature, websites, and unpublished scholarship. The densest so far is a Caesalpinia sclerocarpa from Mexico, weighing in at 1.39 grams per cubic centimeter. Related: The TRY database of “curated plant traits” (free registration required). [h/t Amy Zanne]