Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.10.25 edition

Court-debt jailings, runway incursions, Goldin records, postal code ecology, and spreadsheets from Information is Beautiful.

Court-debt jailings. “In the absence of a clear picture of debt imprisonment,” Stanford’s Computational Policy Lab and Big Local News “set out in 2018 on a first-of-its-kind data integration effort to answer the most basic question of all: how many people are being jailed for unpaid court debts?” To do so, the team submitted “hundreds of public records requests with county jails”; last month, their Debtors’ Prisons Project published standardized, anonymized versions of data received from 100 counties (primarily in Texas and Wisconsin), “totalling more than 4 million individual jail booking records,” plus warrant data from Oklahoma and Delaware. The datasets indicate (where available) each arrestee’s race, ethnicity, sex, age, state, ZIP code, booking date, release date, and release type, as well as each charge’s description, severity, and whether it represents a “failure to pay.” Read more: The team’s tutorial for analyzing the data.
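For a sense of how the project's "most basic question" might be approached once the files are downloaded, here is a minimal sketch. It assumes one row per charge and uses hypothetical column names (`booking_id`, `failure_to_pay`) and made-up sample rows; the published files and the team's tutorial may name and structure things differently.

```python
import csv
import io


def count_failure_to_pay_bookings(rows):
    """Count distinct bookings that include at least one failure-to-pay charge.

    Assumes one row per charge, with hypothetical columns `booking_id`
    and `failure_to_pay` (the real files may use different names).
    """
    flagged = set()
    for row in rows:
        if row["failure_to_pay"].strip().lower() == "true":
            flagged.add(row["booking_id"])
    return len(flagged)


# Made-up sample rows for illustration only; not real booking records.
SAMPLE = """booking_id,charge_description,failure_to_pay
b1,Speeding,false
b1,Failure to pay fine,true
b2,Theft,false
b3,Failure to pay court costs,true
b3,Failure to pay fine,true
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ftp_bookings = count_failure_to_pay_bookings(sample_rows)
print(ftp_bookings)
```

Counting distinct bookings rather than rows avoids double-counting a person booked on several failure-to-pay charges at once.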

Runway incursions. The Federal Aviation Administration maintains a database of runway incursions, which it defines as “any occurrence at an aerodrome involving the incorrect presence of an aircraft, vehicle, or person on the protected area of a surface designated for the landing and take-off of aircraft.” It currently contains 30,000+ entries, spanning October 2001 through July 2023; each lists an incursion’s date, category, location, severity, aircraft types, and weather conditions. You can export those elements as a CSV file, but the exports lack the online database’s event narratives. As seen in: “Airline Close Calls Happen Far More Often Than Previously Known” and “How a Series of Air Traffic Control Lapses Nearly Killed 131 People,” by the NYT’s Sydney Ember and Emily Steel. [h/t Alan Levenson]

Goldin records. Claudia Goldin, who was awarded 2023’s Nobel Memorial Prize in Economic Sciences, maintains a faculty webpage with some of the data she’s created, digitized, and/or improved. They include records from the 1915 Iowa State Census (“the first census in the United States to include information on education and income”), rosters of eleven US orchestras from the 1930s to 1990s, compulsory education and child labor state laws during the early 1900s, appendix tables for a 1975 paper estimating the economic costs of the US Civil War, and more.

Postal code ecology. David Willinger et al. have used two major satellite data sources — the Shuttle Radar Topography Mission and Advanced Spaceborne Thermal Emission and Reflection Radiometer — to create ecolo-zip, “a novel geospatial dataset that provides a granular-yet-global, parsimonious-yet-rich ecological characterization of over 1.5 million postal codes across 94 countries and regions.” Those characterizations include “physical topography (elevation, mountainousness, distance to sea), vegetation (normalized difference vegetation index), and climate (surface temperature).”
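For readers unfamiliar with the vegetation measure, the normalized difference vegetation index is conventionally defined as (NIR - Red) / (NIR + Red), where NIR and Red are near-infrared and red reflectance. A minimal sketch of that standard formula follows; it is not the ecolo-zip authors' processing pipeline, and the function name and the handling of a zero denominator are assumptions made here.

```python
def ndvi(nir, red):
    """Standard NDVI: (NIR - Red) / (NIR + Red), ranging from -1 to 1.

    Returns None when both bands are zero, where the index is undefined
    (a choice made for this sketch).
    """
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


# Dense vegetation reflects far more near-infrared than red light,
# so higher values generally indicate greener land cover.
print(ndvi(0.5, 0.1))
```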

Beautiful spreadsheets. The team behind infographics powerhouse Information is Beautiful maintains a catalog of the data used in 130+ of their projects, going back to 2009. The entries link to spreadsheet tables that present original research as well as numbers drawn more directly from academic studies, government reports, Wikipedia, and other sources. Recent topics include plastic waste, large language models, and Marvel movies.