Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.08.18 edition

US military bases and deployments, Louisiana deaths behind bars, A/B headline tests, the Magazine of Early American Datasets, and shows within shows.

US military bases and deployments. Michael A. Allen et al. are gathering and standardizing data on the United States military’s global presence. The project’s CSV files and R package include annual, country-level troop counts between 1950 and 2020, drawn from prior work by economist Tim Kane and from the government’s Defense Manpower Data Center. They also include a listing of US military bases abroad, primarily sourced from Base Nation, a book by political anthropologist David Vine (who, disclosure, is a cousin of mine). Related: Vine also maintains various lists of US military bases abroad since 1776 and has published a follow-up book, The United States of War. [h/t u/smurfyjenkins]

Louisiana deaths behind bars. Incarceration Transparency, a project undertaken by law students and faculty at Loyola University New Orleans, has compiled data on more than 830 deaths in Louisiana jails, prisons, and juvenile detention centers, primarily between 2015 and 2019, based on 130+ public records requests. The information includes each decedent’s name, age, sex, race, and trial status; the date, facility, and cause of death; and other factors. Read more: The New Yorker’s recent profile of the project and the professor leading it. Previously: Deaths in US jails, via Reuters (DIP 2020.10.21).

A/B headline tests. The Upworthy Research Archive describes 32,000+ headline-testing experiments conducted in 2013–15 by Upworthy, the online publication that popularized a once-ubiquitous style of headline. The dataset, contributed by the publication to a team of academics, is split into three tranches for use in different phases of research. In total, it covers 150,000+ headline-plus-image permutations; for each, it provides the headline, an image identifier, the number of viewers assigned to see it, the number who clicked, and other details.
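
Since each record pairs a count of assigned viewers with a count of clicks, a natural first step is a click-through rate per headline variant. Below is a minimal sketch of that calculation, not the archive's own tooling. The field names (`test_id`, `headline`, `impressions`, `clicks`) and the sample rows are assumptions made up for illustration; check the archive's documentation for the real column names.

```python
def click_through_rate(clicks, impressions):
    """Share of assigned viewers who clicked; 0.0 when nobody was assigned."""
    return clicks / impressions if impressions else 0.0


def best_variant_per_test(rows):
    """Return {test_id: (headline, rate)} for the highest-rate variant in each test."""
    best = {}
    for row in rows:
        rate = click_through_rate(row["clicks"], row["impressions"])
        current = best.get(row["test_id"])
        # Keep the first variant seen, then replace it only on a strictly higher rate.
        if current is None or rate > current[1]:
            best[row["test_id"]] = (row["headline"], rate)
    return best


# Hypothetical rows, invented for this example (not taken from the archive).
sample_rows = [
    {"test_id": "t1", "headline": "Headline A", "impressions": 1000, "clicks": 20},
    {"test_id": "t1", "headline": "Headline B", "impressions": 1000, "clicks": 35},
    {"test_id": "t2", "headline": "Headline C", "impressions": 500, "clicks": 5},
]

winners = best_variant_per_test(sample_rows)
for test_id, (headline, rate) in sorted(winners.items()):
    print(f"{test_id}: {headline} ({rate:.1%})")
```

A raw rate comparison like this ignores sampling noise; with only a few thousand viewers per variant, small differences may not be meaningful without a significance test.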

The Magazine of Early American Datasets. MEAD, as the publication acronymizes itself, “provides sweet, intoxicating data for your investigations of early North America and the Atlantic World.” The initiative, affiliated with the University of Pennsylvania, hosts a few dozen datasets on a range of topics; many focus on Pennsylvania and on slavery, while other subjects include George Washington’s shipping invoices and the 19th-century children’s book industry. [h/t Noah Veltman]

Shows within shows. Nestflix, a new website by designer/developer Lynn Fisher, catalogs more than 400 fictional films and TV shows that appear within actual films and TV shows. For instance: 30 Rock’s The Rural Juror and Home Alone’s Angels with Filthy Souls. The project is open-source; the data files for each item include the title, a description, a quotation, the parent show/film, and more.