Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.06.08 edition

House primaries, where college grads go, Hong Kong political prisoners, mercenaries, and Roman amphitheaters.

Six decades of House primaries. In 2014, Stephen Pettigrew, Karen Owen, and Emily Wanless published a dataset of all Democratic and Republican primary election results for the US House of Representatives between 1956 and 2010. It indicates each election’s year, state, redistricting status, primary system (open, closed, semi-open, multiparty), and more. The dataset also lists each candidate’s name, gender, prior office, and votes received. In 2020, Michael G. Miller and Nicki Camberg published a follow-up dataset, adding coverage for 2012 through 2018. It uses the same variable names and structure as the earlier dataset, so that the two files can be easily combined.
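
Because the two datasets share variable names and structure, combining them amounts to stacking rows. A minimal sketch follows, assuming both files are available as CSV. The column names and sample rows are hypothetical placeholders, not taken from the actual files, and `combine_csv` is an illustrative helper rather than anything the authors provide.

```python
import csv
import io


def combine_csv(earlier, later):
    """Stack the rows of two CSV streams that share the same header."""
    first = csv.DictReader(earlier)
    second = csv.DictReader(later)
    # Refuse to combine if the column names or their order differ.
    if first.fieldnames != second.fieldnames:
        raise ValueError("column names differ; files cannot be stacked as-is")
    return list(first) + list(second)


# Made-up samples standing in for the 1956-2010 and 2012-2018 files.
# The real column names are an assumption here.
older = io.StringIO("year,state,candidate,votes\n2010,XX,Candidate A,100\n")
newer = io.StringIO("year,state,candidate,votes\n2012,XX,Candidate B,200\n")

rows = combine_csv(older, newer)
```

With the real files, the same function would take two handles from `open(..., newline="")`. The header check is there to catch any column that was renamed or reordered between the two releases.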

Where college grads go. Johnathan Conzelmann et al. have created a dataset that estimates the geographic distribution of recent graduates from 2,600 US colleges and universities, calculated from information on the schools’ official LinkedIn landing pages. For each institution, the dataset indicates the proportions of alumni in each of the 278 specific US locations in LinkedIn’s geographic lexicon and cross-references them with government-defined metropolitan and micropolitan statistical areas. Read more: An introductory Twitter thread. [h/t Sharon Machlis]

Hong Kong political prisoners. The Hong Kong Democracy Council, a US-based advocacy group, last month published the first version of its Hong Kong Political Prisoners Database, which contains information about 1,000+ protesters, opposition leaders, and national security law defendants incarcerated since the city’s pro-democracy mass protests in mid-2019. It lists each defendant’s age, arrest date, arrest location, conviction date, convicted offenses, sentencing date, sentence length, and other details. An accompanying report describes the database’s context and methodology. [h/t Samuel Bickett]

Mercenaries. Ulrich Petersohn et al.’s Commercial Military Actor Database examines “the market for force” in 72 countries from 1980 to 2016. It contains information, primarily sourced from news reports, on thousands of contractual relationships between providers (mercenaries and private military/security companies) and their clients (governments, opposition groups, NGOs, and transnational corporations). The contracted work ranges “from combat services and support services (e.g., communication, maintenance), to logistics, security, consultancy, training, and reconstruction.”

Roman amphitheaters. Sebastian Heath, a professor of computational humanities and Roman archaeology, has constructed a dataset of 260+ amphitheaters in the Roman Empire. It provides the structures’ known names, coordinates, orientations, and capacities, among other characteristics, and links the entries to external data sources.