Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2025.05.14 edition

Deportation records, European workforces, California ghost guns, US dams, and what the nose knows.

Deportation records. The Deportation Data Project, run by a team of academics and lawyers, “collects and posts public, anonymized U.S. government immigration enforcement datasets.” These include data from border apprehensions, deportations, Title 42 expulsions, ICE arrests and detentions, ICE-operated flights, and more. Some of the data files come directly from the government, while others were initially obtained from the government by other organizations, such as the University of Washington Center for Human Rights. The project also posts information about its Freedom of Information Act requests. Read more: The project’s “U.S. Immigration Enforcement Data: A Short Guide.” As seen in: “The Rising Cost of ICE Flying Immigrants to Far-Flung Detention Centers” (Bloomberg). [h/t Alex Albright]

European workforces. Each quarter, dozens of countries collectively conduct more than 1.7 million interviews for the European Union Labour Force Survey. The survey, the continent’s largest, aims “to classify people into 3 groups that are mutually exclusive and cover the whole target population”: employed, unemployed, and outside the labor force. Eurostat publishes aggregate results, with breakdowns by age, sex, country, nationality, citizenship status, education level, sector, and more. Detailed microdata are also available to approved researchers. As seen in: Bruegel’s labor market dashboard. [h/t Nina Ruer]
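To make the "mutually exclusive and cover the whole target population" idea concrete, here is a minimal Python sketch of a three-way classification. The `Respondent` fields and the decision rules are simplified assumptions for illustration only; they are not the survey's actual variables or Eurostat's operational definitions. Because the rules are checked in a fixed order and end in a catch-all, every respondent lands in exactly one group.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class LabourStatus(Enum):
    EMPLOYED = "employed"
    UNEMPLOYED = "unemployed"
    OUTSIDE_LABOUR_FORCE = "outside the labor force"


@dataclass
class Respondent:
    # Hypothetical, simplified interview answers (not real survey variables).
    worked_in_reference_week: bool
    actively_seeking_work: bool
    available_to_start: bool


def classify(r: Respondent) -> LabourStatus:
    """Assign exactly one status; rules are assumed and checked in priority order."""
    if r.worked_in_reference_week:
        return LabourStatus.EMPLOYED
    if r.actively_seeking_work and r.available_to_start:
        return LabourStatus.UNEMPLOYED
    # Catch-all: everyone else is outside the labor force.
    return LabourStatus.OUTSIDE_LABOUR_FORCE


def tally(respondents):
    """Count respondents per status; the counts always sum to the input size."""
    counts = {status: 0 for status in LabourStatus}
    for r in respondents:
        counts[classify(r)] += 1
    return counts


# Every possible combination of the three answers, one respondent each.
all_cases = [Respondent(*answers) for answers in product([True, False], repeat=3)]
counts = tally(all_cases)
```

Running `tally` over every combination of answers shows that the group counts add up to the number of respondents, which is what "mutually exclusive and exhaustive" means in practice.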

California ghost guns. “Ghost guns have been a uniquely Californian issue,” with the state accounting for a majority of the untraceable firearms that are reported to the ATF, according to The Trace. Earlier this year, on its Gun Violence Data Hub, the publication posted datasets counting the ghost guns recovered by California law enforcement agencies, as well as “firearm-level data on guns reported lost or stolen in the state.” [h/t Aaron Mendelson]

US dams. The National Inventory of Dams “documents all known dams in the U.S. and its territories that meet certain criteria” related to the dam’s height, reservoir size, and likely impacts of its “failure or mis-operation.” The inventory, maintained by the US Army Corps of Engineers since the 1970s, now includes 92,000+ structures. The data — available via a searchable map, bulk downloads, and an API — indicate each dam’s name, location, year built, structural characteristics, purpose, operational status, and much more. Previously: Global Dam Watch’s datasets (DIP 2020.01.29) and the USGS’s National Hydrography Dataset (DIP 2022.10.12).

What the nose knows. Antonie Louise Bierling et al. have published a dataset of “descriptions, evaluative ratings, and qualitative labels for 74 chemically diverse mono-molecular odors, rated by a large sample of young adults.” Another paper by Bierling et al. “elicited body odor descriptions from 2,607 participants across 17 countries and 13 languages” to assemble “a standardized lexicon of body odor words.” Related: The Pyrfume Project provides “tools, models, and data for odorant-linked research.”