Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.02.03 edition

Angry travelers, Wikipedia biographies, Zika data, post-Fukushima radiation, and movie chatter.

Angry travelers. The Transportation Security Administration publishes spreadsheets of legal claims against the agency, including the location, circumstances, and outcome of each claim. The most expensive settlement on record appears to involve a vehicle-related personal injury in July 2004, for which the TSA paid $125,000. At the other end of the spectrum: In 2014, a traveler recouped $1.25 for lost food or drink at Hilton Head Island Airport. [h/t Seth Kadish + Lindsey Cook]

Famous people on Wikipedia. Last month, a group of researchers introduced Pantheon 1.0, “a manually verified dataset of globally famous biographies.” It starts with the 11,341 people whose Wikipedia biographies appear in more than 25 language editions, and adds each person's birthplace, birthdate, gender, occupation, and page views. You can download the data or explore it online. Baffling factoid: As of May 2013, High School Musical star Corbin Bleu had biographies in more language editions than anyone other than Jesus Christ and Barack Obama. Related: A broader-but-shallower dataset of more than 400,000 influential people on the English-language Wikipedia. [h/t Ben Dilday]

Zika data. Fears are growing about the Zika virus and its possible, but not proven, connection to microcephaly. Little data on the latest outbreak has been published, but here’s an open guide to what’s available so far, including reported cases of microcephaly in Brazil and the number of suspected Zika samples sent to Colombia’s National Institute of Health.

Post-Fukushima radiation. Next month marks the five-year anniversary of the Fukushima Daiichi disaster, the worst nuclear accident since Chernobyl. Since shortly after the meltdown, volunteers for Safecast have been collecting radiation measurements in Japan and beyond. The measurements are available as bulk downloads or through an API.

Movie chatter. The Cornell Movie-Dialogs Corpus contains 220,579 “conversational exchanges” between 9,035 characters in 617 movies. Included: “Hello. My name is Inigo Montoya. You killed my father. Prepare to die.”