Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.01.24 edition

Police data metadata, historical fishing intensity, Medicaid offices, Seine water quality, and human heights.

Police data metadata. The Police Data Accessibility Project is compiling a meta-dataset of police records: where to find them online, what time period they cover, how often they’re updated, and other characteristics. The searchable, downloadable dataset includes links to 1,700+ resources, such as traffic stop datasets, crime maps, use-of-force reports, contract and policy listings, and many other types of records, across hundreds of agencies. The team also maintains a dataset of 23,000+ criminal legal agencies. Related: The Vera Institute of Justice’s Police Data Transparency Index, which scored 90+ local police agencies across 10 categories of data transparency; its methodology page links to a more detailed, downloadable spreadsheet.

Historical fishing intensity. Yannick Rousseau et al. have generated a series of datasets estimating annual fishing effort from 1950 to 2017 by country, year, gear type, vessel length, sector (industrial, artisanal motorized, and artisanal unmotorized), and category of species targeted. The datasets provide “information on number of vessels, engine power, gross tonnage, and nominal effort,” the last being a metric that multiplies engine power by the number of days at sea. Their sources include “a range of publicly available sources, governmental reports, and grey literature.” Related: Co-author Reg A. Watson’s Global Fisheries Landings dataset, which estimates “commercial, small-scale, illegal and unreported fisheries catch,” also since 1950. Previously: Global Fishing Watch’s fishing effort datasets, based on vessel tracking signals (DIP 2021.01.13).

Medicaid offices, geocoded. Paul R. Shafer et al. have created a dataset of 3,000+ Medicaid offices in the US, identified via state and county government websites. The team of Boston University researchers, who focused on “public-facing Medicaid offices providing enrollment support,” provide each office’s agency name, state, city, address, and latitude/longitude coordinates (primarily sourced via the US Census Bureau’s geocoder).

Seine water quality. Ahead of the 2024 Summer Olympics, Paris has been trying to clean up the Seine to swimmable levels. But the city’s efforts appear to be falling short, according to government water testing data obtained, published, and analyzed by reporter Mathieu Lehot-Couette. The records include results from periodic samples taken at 14 points along the river, which Lehot-Couette has standardized into a spreadsheet of 1,400+ measurements of E. coli and enterococci between 2015 and 2023.

Human heights. Economic historians Jörg Baten and Matthias Blum have assembled a dataset on average male heights by decade and country. The estimates, derived from hundreds of scholarly and statistical sources, stretch back several centuries and span 140+ countries. A related resource page also provides individual-level data compiled by Baten and others, such as the heights of 1,000+ 19th-century Bavarian military conscripts. [h/t Karsten Johansson]