Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2019.07.31 edition

Foreign military trainings, talk radio transcripts, patent geography, UK ministerial resignations, and Soviet space dogs.

Foreign military trainings. For nearly two decades, the US Department of Defense has released detailed tables on the foreign military units it has trained. For each training, the information describes the units trained, number of trainees, course name, start and end dates, location, cost, and more. Unfortunately, the government publishes these records only as PDFs. To make the data more accessible, Security Force Monitor, a project of the Columbia Law School Human Rights Institute, has converted the PDFs into an open, queryable database. An associated GitHub repository contains an extensive methodology, the extraction code, and the raw data. [h/t Jamon Van Den Hoek]

Talk radio transcripts. A team of researchers at the MIT Media Lab has built a corpus of machine-generated transcriptions from 284,000 hours of talk radio. The transcripts capture approximately 2.8 billion words from 50 semi-randomly selected stations, and include metadata, such as the program name, the speaker’s (guessed) gender, and whether the speaker seemed to be in the studio or on the phone. [h/t Lynn Cherny]

Patent geography. Researchers at two Swiss universities have created a dataset of inventors’ and applicants’ locations listed in 18.8 million patents filed between 1980 and 2014. The locations, which span 46 countries, are specified both by their geographic coordinates and by their administrative areas (e.g. city, state, country). [h/t Gaétan de Rassenfosse]

UK ministerial resignations. The UK Institute for Government has been updating a spreadsheet of ministers who’ve resigned since 1979, the post each one held, the reasons for resignation, and the prime minister in charge at the time. The spreadsheet, which so far contains 151 resignations through last week, includes a few methodological notes embedded as comments in the header row. [h/t Gavin Freeguard]

Soviet space dogs. Duncan Geere has compiled a database of the 48 dogs who participated in the USSR’s space program in the 1950s and 1960s. The information, which also includes details about the canines’ 42 flights, is based on Olesya Turkina’s book, Soviet Space Dogs.