Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.03.03 edition

Police misconduct settlements, congressional e-newsletters, California vaccine availability, illegal cheetah commerce, and soil.

Police misconduct settlements. Reporters at FiveThirtyEight and The Marshall Project filed freedom-of-information requests to 50 US cities, asking for data on all civil lawsuits against their police departments/officers “that resulted in a monetary legal settlement” in 2010–2019 — including incident and settlement dates, plaintiff and defendant names, allegation descriptions, amounts awarded, and more. They received full or partial data from 31 of those cities, and found that total payouts exceeded $3 billion. Don’t miss: The data repository’s “words of caution,” which discourage comparisons between cities. Related: Coauthor Laura Bronner’s introductory Twitter thread. [h/t Eric Gardner]

Congressional e-newsletters. For more than a decade, political scientist Lindsey Cormack’s DCinbox project has collected “every official e-newsletter sent by sitting members of the U.S. House and Senate.” You can search the corpus online and also download all the emails as a series of CSV files, grouped by month. For each of the 130,000+ mailings, the files provide the date, subject, body, and sender’s Bioguide ID. (April 2020 was the highest-volume month, with more than 2,300 messages, nearly all of them mentioning the coronavirus.)
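
To explore the monthly CSVs, a tally like the one below is one place to start. It is a minimal sketch, not DCinbox's own tooling: the column names (`date`, `body`, `bioguide_id`), the ISO-style date format, and the sample rows are all assumptions made for illustration, so check the headers in the real files and adjust.

```python
import csv
import io
from collections import Counter


def count_by_month(csv_text, date_col="date", body_col="body", keyword=None):
    """Count mailings per YYYY-MM, optionally only those mentioning `keyword`.

    Column names are hypothetical; pass the real header names from the files.
    """
    counts = Counter()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        body = row.get(body_col) or ""
        # Case-insensitive keyword filter on the message body.
        if keyword and keyword.lower() not in body.lower():
            continue
        # Assumes dates start with YYYY-MM (e.g. 2020-04-15).
        month = (row.get(date_col) or "")[:7]
        counts[month] += 1
    return counts


# Made-up rows standing in for one of the downloaded files.
sample = """date,subject,body,bioguide_id
2020-04-02,Update,Coronavirus resources for our district,A000001
2020-04-09,Town hall,Join our telephone town hall,B000002
2020-05-01,News,More on the coronavirus response,A000001
"""

print(count_by_month(sample))
print(count_by_month(sample, keyword="coronavirus"))
```

Reading a real file would mean swapping the sample string for the file's contents, for example `open(path, newline="", encoding="utf-8").read()`.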

California vaccine availability. The volunteer-driven VaccinateCA project calls “hundreds of potential [COVID-19] vaccination sites daily, asking them if they have the vaccine and if so to whom they will administer it to and how to get an appointment.” You can examine the results on their website, and also use their API to access the latest data, which includes status reports on every California county, 20+ health care providers, and thousands of potential vaccination sites. Related: “What We’ve Learned (So Far).” [h/t Simon Willison]

Illegal cheetah commerce. A team of conservationists has compiled a decade of data on illegal cheetah sales and ownership, drawing from “over 300 sources, including direct communications with field informants, veterinarians, and cheetah owners,” court records, social media, and more. The dataset covers 1,800+ cases — including both actual seizures and alleged/suspected incidents — involving 4,000+ cheetahs or cheetah parts/derivatives.

Soil. SoilGrids “uses state-of-the-art machine learning methods to map the spatial distribution of soil properties across the globe,” including organic carbon density, pH, clay content, and more. The maps use data from the World Soil Information Service, which standardizes millions of soil records. Both projects are run by the International Soil Reference and Information Centre, which also catalogs dozens of public-access soil datasets. [h/t Jonathan Whitaker]