Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.05.10 edition

Prison demographics, more AI incidents, blocked rail crossings, grammatical phenomena, and planetary nomenclature.

Prison demographics. In a recent paper in Nature, Brennan Klein et al. describe how they “manually assembled and validated a dataset covering all 50 US states, the District of Columbia and the Federal Bureau of Prisons to both quantify the widening racial disparity observed during the first year of the COVID-19 pandemic and uncover its plausible causes.” Their Dataset on Incarcerated Populations provides the monthly number (or interpolated estimates) of people, by race/ethnicity, in each prison system. The dataset goes back at least to 2010 for most states; for 18 states, it also includes admission/release counts. Related: The Bureau of Justice Statistics’ annual data on prison demographics, which provide state-level counts by sex but not race/ethnicity. Previously: The Vera Institute’s Incarceration Trends Dataset (DIP 2019.01.02) and NYU Public Safety Lab’s Jail Data Initiative (DIP 2023.04.12). [h/t Shawn Musgrave]

More AI incidents. The AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) initiative, founded by Charlie Pownall in 2019, maintains a repository of such events, as well as related systems (e.g., GPT-4) and datasets (e.g., Labeled Faces in the Wild). The project’s spreadsheet features 1,000+ entries, each listing a title, type, year, country, sector, operator, purpose, and more, and linking to more detailed descriptions on AIAAIC’s website. Previously: The AI Incident Database (DIP 2023.04.19).

Blocked rail crossings. “As rail profits soar, blocked crossings force kids to crawl under trains to get to school,” a recent investigation by ProPublica and InvestigateTV has found. In addition to on-the-ground reporting, photos, and video, the article cites the Federal Railroad Administration’s database of blocked crossing complaints. The database’s ~70,000 reports, going back to December 2019, each list a crossing ID, street, city, state, railroad, reported incident date, duration, reason, impacts, and additional comments. Related: The FRA’s database of all rail crossings. [h/t Tom Hughes]

Grammatical phenomena. Grambank, the result of a collaboration involving 100+ linguists, examines a range of grammatical phenomena, “from word order to verbal tense, nominal plurals, and many other well-studied comparative linguistic variables.” The project’s dataset, available to download and explore online, spans 195 such features across 2,400+ languages and dialects. For instance, here’s the page for feature GB030, which asks, “Is there a gender distinction in independent 3rd person pronouns?” [h/t Robin Sloan]

Planetary nomenclature. The International Astronomical Union’s Gazetteer of Planetary Nomenclature “provides a unique system of official names for planetary surface features, natural satellites, dwarf planets, and planetary rings for the benefit of the international science community, educators, and the general public.” You can browse, search, and download the data, as well as view images of their locations. As seen in: Cinzia Bongino’s The Names on the Moon. [h/t Giuseppe Sollazzo]