Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.04.27 edition

Earmarks, intermediate care facilities, more county-level mortality data, alpine snow, and the meta-Dataverse.

Earmarks. The US government’s spending bill for FY 2022, enacted last month, heralded the return of earmarks, which had been banned for a decade. This time around, Congress is publishing a set of documents that describe legislators’ specific funding requests, but as sideways PDFs instead of structured data. So the Bipartisan Policy Center has converted the PDFs into a spreadsheet listing the 4,975 approved earmarks within them. Bloomberg Government has also compiled and shared a similar spreadsheet. Both files list each earmarked project’s name, recipient, state, and price tag, as well as the legislator(s) who made the request and the subcommittee that approved it; BPC’s file additionally includes the location, agency, agency-account, and legislators’ Bioguide IDs. [h/t First Branch Forecast]

Intermediate care facilities. In the US, intermediate care facilities provide residential, long-term care to people whose intellectual or developmental disabilities require active treatment and continuous supervision. On Monday, colleagues at BuzzFeed News published an investigation into one of the largest ICF owners, supplemented by an analysis of inspection data gathered through the Centers for Medicare & Medicaid Services’ QCOR portal. (Disclosure: I helped to write an initial version of the data-collection code.) The records, available in bulk for the first time, enumerate surveys conducted between 2010 and 2021, the facilities examined, and the types of deficiencies inspectors found. [h/t John Templon]

More county-level, COVID-related mortality data. The CDC provides provisional, county-level mortality data for 2018 to the near-present (DIP 2021.12.15), but requires you to build specific, targeted queries. So researchers at Documenting COVID-19 used the agency’s WONDER API to gather key stats for every county, such as breakdowns of deaths among causes the CDC says are commonly comorbid with COVID-19, and have shared them in a downloadable repository. [h/t Betsy Ladyzhets + Juan Francisco Saldarriaga]

Alpine snow. Michael Matiu et al. (2021) compiled a dataset of daily snowfall and snow depths at 2,000+ measurement stations in the European Alps, spanning the years 1970 to 2019. They gathered records from open data portals and through requests to authorities in five countries, standardized them into a consistent format, and filled in gaps where possible. [h/t Olivier Lejeune]

The meta-Dataverse. The Dataverse project is an open-source platform for data sharing. (This newsletter has linked to dozens of Dataverse-hosted datasets, mostly on the original Harvard Dataverse.) It offers an API for fetching metadata on each known installation and the datasets it hosts, which project member Julian Gautier has been using to assemble and update bulk snapshots.