Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.02.19 edition

Standardized electric-utility info, school vaccination rates, global empires, farming households, and pinball.

Electric utilities, standardized. “Electric utilities report a huge amount of information to the US government,” but “much of this data is not released in well documented, ready-to-use, machine readable formats.” That assessment comes from the Public Utility Data Liberation (PUDL) project, which aims to clean, standardize, and cross-link the electric utility information gathered by various agencies. Earlier this month, PUDL published its first data release; it includes information originally collected through Energy Information Administration Form 860 (details about individual generators) and Form 923 (individual power plants), the Environmental Protection Agency’s Continuous Emissions Monitoring System (hourly emissions), and the Federal Energy Regulatory Commission’s Form 1 (price rates and financial audits). The code PUDL uses to download, extract, and standardize the raw data is also available online. [h/t Zane Selvans]

School vaccination rates. Reporters at the Wall Street Journal collected school-specific vaccination rates, both overall and for the measles, mumps, and rubella (MMR) vaccine, from the health departments of 32 US states. In total, the WSJ’s dataset covers more than 46,000 schools, of which 42,000 have at least one vaccination rate available. Most states provided data for the 2018–19 school year; the rest provided data for 2017–18.

4,500 years of empire. To help study “the imperial roots of global trade,” a trio of economists have built a dataset of 168 historical empires. For each empire, the dataset lists the modern-day countries under its rule (and during which years), plus whether the empire had a centralized administration, a centralized religion, and/or a monopoly on coin-minting. [h/t Jain Family Institute]

Farm households. The Rural Household Multiple Indicator Survey (RHoMIS) “collects information on 758 variables covering household demographics, farm area, crops grown and their production, livestock holdings” and more. In an academic article published this month, researchers from the Nairobi-based International Livestock Research Institute presented a dataset of responses collected from 13,310 farm households in 21 countries in sub-Saharan Africa, Central America, and Asia between 2015 and early 2018.

Pinball. The crowdsourced website provides data on more than 25,500 pinball machines at more than 7,400 locations in the US, UK, Canada, Australia, Finland, and Japan. The website’s API lets you access the underlying data, including the specific machines available at each location.