Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.07.26 edition

Interned Japanese Americans, Trump visiting Trump, women who’ve run for the House, the Enron emails, and data about a data podcast.

Interned Japanese Americans. The Densho Digital Repository is an archive of oral histories, photographs, newspaper clippings, and other primary sources relating to the internment of Japanese Americans during World War II. Among the materials: several datasets listing people sent to the internment camps, based on official government records. The largest dataset contains more than 100,000 entries and includes details such as each internee’s “relocation” site, arrival date, hometown, birth year, time spent in Japan, marital status, religion, educational degrees, occupation, and military service. The National Archives hosts the raw data, as well as its documentation.

Trump’s visits to Trump properties. NBC News has been tracking the president’s visits to his own luxury properties. For each day since Trump took office, the data — available to download at the bottom of the page — tells you which properties he visited and whether any were golf courses. Since February, Trump has visited his properties roughly 10 days a month, including 25 trips to Mar-a-Lago and 42 trips to his golf courses. Related: A similar tracker from The New York Times. [h/t Rachel Schallom]

Women running for the U.S. House. As the basis for his recent study, “Is Running Enough? Reconsidering the Conventional Wisdom about Women Candidates” (paywalled, but a draft is freely available), PhD candidate Peter Bucchianeri compiled a dataset of female candidates in House primary elections from 1972 to 2010. The spreadsheet covers 1,242 candidacies, and includes each candidate’s party, votes garnered in the primary and general elections, the seat’s incumbency status, the district’s demographics, and more.

The Enron emails. During the course of its Enron investigation, the Federal Energy Regulatory Commission obtained the emails of approximately 150 (mostly high-ranking) Enron staff. You can find versions of the dataset (cleaned, deduplicated, and restructured in various ways) hosted by Carnegie Mellon, UC Berkeley, and Duke Law. Related: “What the Enron Emails Say About Us,” published by The New Yorker last week. Nathan Heller writes: The Enron archive “remains one of the country’s largest private e-mail corpora turned public. Its lasting value is less as an account of Enron’s daywork than as a social and linguistic data pool, a record of the way we write online when we’re not preening for the public eye.”

Data podcast data. Data Stories is a podcast about data visualization, hosted by Enrico Bertini and Moritz Stefaner. To celebrate their recently published 100th episode, the hosts released a spreadsheet detailing the date, title, number and genders of guests, length, and timestamped subchapters of each episode so far. Related: Christian Laesser’s visualization of the data. [h/t Benjamin Cooley]