Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.11.30 edition

Pills, per-pupil spending, travelers’ coronavirus variants, Indonesia earthquake intensities, and more roadkill.

Pills. From its launch in 2009 until its retirement last year, the National Library of Medicine’s Pillbox project collected and created 8,600+ photographs of medical pills. The images, which are still available to download, are accompanied by a dataset that provides information on 83,000+ pills’ shape, size, color, markings, dosage, and other characteristics derived from drug labels. Related: The library’s DailyMed service provides frequently updated images and data from 140,000+ labels submitted to the FDA for drugs and other regulated products. As seen in: Jon Keegan’s Pillbox overview in Beautiful Public Data. [h/t Giuseppe Sollazzo]

Per-pupil spending. The National Education Resource Database on Schools (“NERD$”) describes itself as the “first-ever national dataset of public K-12 spending by school.” Its researchers, based at Georgetown University, aggregate and standardize the expenditure disclosures that the Every Student Succeeds Act requires states to publish. You can explore and download the data they’ve processed for fiscal year 2019, including spending totals, enrollment counts, and normalized figures that facilitate cross-state comparisons. For 2020–22, you can access “the raw files we obtain from states while our team conducts validation checks and norms the data.” As seen in: “How much money do states spend on education?” (USAFacts). [h/t Douglas Hummel-Price]

Travelers’ coronavirus variants. In the past year, the CDC’s Traveler-Based Genomic Surveillance program has collected 60,000+ voluntary nasal swabs from people disembarking international flights at four major US airports. The agency uses the samples as an “early warning system” to detect emerging SARS-CoV-2 variants and publishes weekly metrics that include participation counts, positivity rates (per pooled sample), and variant distributions. Read more: An interview with two private-industry experts working on the program, by the COVID-19 Data Dispatch’s Betsy Ladyzhets.

Indonesia earthquake intensities. Gempa Nusantara, a database compiled by Stacey S. Martin et al., uses historical documents to catalog 7,300+ “macroseismic effects” of 1,200 earthquakes near Indonesia during a four-century span, from 1546 to 1950. It provides summaries of the local reports and categorizes the effects according to the European Macroseismic Scale, which focuses on the intensity of ground-shaking and potential impacts on buildings and terrain.

More roadkill. Florian Heigl et al. have compiled a pair of datasets containing 15,000+ reports of vertebrate roadkill from 2014 to 2020, submitted by 900+ people through a phone app. The datasets differ in identification confidence, but both provide locations, dates, and taxonomic classifications. Although the records span 40+ countries, the majority come from Austria, where the project is now focused. Previously: Andean roadkill (DIP 2021.07.07).