Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.08.02 edition

Nursing home inspections, local ideology, COVID-19 in deer, Colombian public-sector algorithms, and forageable fruit.

Nursing home inspections. A couple of weeks ago, ProPublica launched a major update to its Nursing Home Inspect database, originally published in 2012 (and briefly mentioned in DIP 2016.07.06). The new version “includes more data, new views that summarize problems, and advanced search features.” As an accompanying guide explains, the database now “covers nearly 400,000 deficiencies from over 90,000 reports at over 15,000 homes.” The project links to its source data, including the federal government’s list of Medicare-certified nursing homes and the date, regulation, scope/severity, and narrative text of all deficiencies. Tip: The Seattle Times provides guidance for downloading those deficiency records. Later today: ProPublica is hosting a webinar about the updated database.

Local ideology. “Little is known about the American public’s policy preferences at the level of Congressional districts, state legislative districts, and local municipalities,” Chris Tausanovitch and Christopher Warshaw wrote in the Journal of Politics a decade ago. To address the issue, the researchers applied Bayesian statistical methods to large public-opinion surveys to generate numeric “ideal point” (left-vs-right) estimates at many geographic levels. Through their American Ideology Project, they have since updated these estimates, most recently with data through 2021. Previously: Ideology estimates for state legislators (DIP 2020.01.01), updated in April 2023, and members of Congress (DIP 2023.03.01). [h/t Mike Stucka]

COVID-19 in deer. Aijing Feng et al. “collected 8,830 respiratory samples from free-ranging white-tailed deer across Washington, D.C. and 26 states in the United States between November 2021 and April 2022.” The researchers sequenced SARS-CoV-2 genomes from 391 of the samples that tested positive for the virus, finding that the infections “originated from at least 109 independent spillovers from humans, which resulted in 39 cases of subsequent local deer-to-deer transmission and three cases of potential spillover from white-tailed deer back to humans.” The study’s public spreadsheets include data about each sequenced sample (collection date, state, virus lineage, etc.), each spillover event, and more. [h/t Tyler Dedrick]

Colombian public-sector algorithms. Juan David Gutiérrez et al. have compiled a spreadsheet listing 113 automated decision systems (“sistemas de decisión automatizada”) in the Colombian public sector. Each entry lists the system’s name, government entity, level of government (national, department, municipal), sector, description, objectives, data used, various categorizations, and much more. The project also includes a spreadsheet of the 300+ sources used in the compilation.

Forageable fruit. The Falling Fruit project “is a celebration of the overlooked culinary bounty of our city streets.” It provides a map (“not the first of its kind, but [aspiring] to be the world’s most comprehensive”) and a downloadable dataset of 1.5 million locations of edible plants in public spaces, not all of them strictly fruit. The entries come from user contributions, as well as imports of community maps and tree inventories. [h/t Susie Cambria]