Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.04.12 edition

Jail rosters, sanctions enforcement, border surveillance, flash flooding in urban England, and Dutch textile shipments.

Jail rosters. The NYU Public Safety Lab’s Jail Data Initiative has built a fleet of web scrapers to gather and process daily roster data from 1,000+ US city and county jails. The project’s aggregate metrics include the number of people in these jails each day, the daily numbers of people newly incarcerated and released, and how long people have been held (among those released). You can filter and download these counts by age, gender, and race. Profiles of individual jails also list the most common charges and the overall demographics of people held there. The time frames available vary by jail, but mostly begin in 2020 or 2021. You can also apply for access to person-level records. [h/t Adam Vine + Orion Taylor]
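For readers curious how counts like these can be derived from daily rosters, here is a minimal sketch. It is not the Jail Data Initiative's actual pipeline: the function name, the input shape (one set of booking IDs per day), and the rule that a booking missing from the next snapshot counts as a release are all assumptions made for illustration.

```python
from datetime import date


def roster_metrics(snapshots):
    """Derive daily counts from roster snapshots.

    `snapshots` is a hypothetical input: a dict mapping a date to the set
    of booking IDs that appear on that day's roster.
    """
    first_seen = {}   # booking ID -> first date it appeared on a roster
    rows = []         # one summary row per snapshot date
    stays = []        # length of stay in days, for released bookings only
    previous = None

    for day in sorted(snapshots):
        current = snapshots[day]
        if previous is None:
            # Nothing to compare the first snapshot against.
            admitted, released = set(), set()
        else:
            admitted = current - snapshots[previous]
            released = snapshots[previous] - current
        for booking in released:
            # Assumption: absence from today's roster means release.
            stays.append((day - first_seen.pop(booking)).days)
        for booking in current:
            first_seen.setdefault(booking, day)
        rows.append({
            "date": day,
            "population": len(current),
            "admitted": len(admitted),
            "released": len(released),
        })
        previous = day

    return rows, stays
```

Real rosters are messier: scrapes can fail on a given day, and booking IDs can be reused or reformatted, so a production version would need to handle gaps rather than treat every absence as a release.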

Sanctions enforcement. Political scientists Bryan R. Early and Keith A. Preble have assembled a dataset of the US Office of Foreign Assets Control’s sanctions-related penalties, settlements, and findings of violation since 2003. To do so, they combed through the agency’s mostly-PDF-based archive of these enforcement actions. Each of the dataset’s 1,000+ entries indicates the action date, type, and monetary amount; entity name, type, location, and sector; the specific sanctions programs violated; and more. Previously: OFAC’s sanctions lists (DIP 2018.02.21) and OpenSanctions (DIP 2021.09.08).

Border surveillance. The Electronic Frontier Foundation has constructed a map and dataset of 340+ Customs and Border Protection surveillance towers along the US-Mexico border. “Compiled using public records, satellite imagery, road trips, and even exploration in virtual reality,” the dataset indicates each tower’s location, name, type, and vendor; it also links to sources and satellite imagery. Separate entries list potential future towers proposed by CBP and automated license plate readers at CBP checkpoints. [h/t Corin Faife]

Flash flooding in urban England. ClimateNode’s Helen Jackson has created a dataset and series of interactive maps of recent summertime flash flooding in urban England. To construct the dataset, Jackson “analysed approximately 17,400 articles about flooding from around 300 newspaper websites,” and then used natural language processing to “extract the names of streets, buildings and other places affected” — some 2,800+ locations in all, corresponding to 56 flooding events since 2010. [h/t Giuseppe Sollazzo]

Dutch textile shipments. The Dutch Textile Trade Project “aims to understand the circulation of globally-sourced textiles on Dutch ships around the world in the seventeenth and eighteenth centuries by examining data drawn from trade records alongside samples of textiles and visual culture depicting textiles in use.” The 22,000+ entries in the project’s main dataset each represent a shipment from one port to another; they indicate the textile type, quantity, and value; the shipment date; the supplier; and more. [h/t Dan Bouk]