Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.09.20 edition

US facility greenhouse gas emissions, net migration estimates, gubernatorial approval, Department of Interior drones, and one million bugs.

US facility GHG emissions. The EPA’s Facility Level Information on Greenhouse Gases Tool “gives you access to greenhouse gas data reported to EPA by large emitters, facilities that inject CO2 underground, and suppliers of products that result in GHG emissions when used in the United States.” The information comes from the agency’s Greenhouse Gas Reporting Program, which also provides bulk data downloads. Per the EPA: “Approximately 8,000 facilities are required to report their emissions annually, and the reported data are made available to the public in October of each year.” The data, which go back to 2010, indicate the facility type, emissions reported, measurement methods, types of fuel used, and much more. Previously: Climate TRACE’s estimates of the world’s largest GHG emitters (DIP 2022.11.16). [h/t Terin V. Mayer]

Net migration estimates. Using national and subnational data on birth rates, death rates, and population counts, Venla Niva et al. have constructed a dataset of estimated net migration for each part of the world, each year between 2000 and 2019. The estimates are available as gridded data (with ~10km resolution) and at three levels of administrative units: national, provincial, and communal. (For the US, the latter two levels correspond to states and counties, respectively.) The researchers have also published an interactive map of the estimates for each administrative unit.
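
For readers curious how births, deaths, and population counts can yield a migration figure: a common demographic approach treats net migration as the residual, i.e., the population change that natural increase (births minus deaths) does not explain. The sketch below illustrates that general idea only; it is not the researchers' actual pipeline. The function names, the per-1,000 rate convention, and the choice to apply rates to the start-of-year population are all assumptions made for illustration.

```python
def net_migration(pop_start, pop_end, births, deaths):
    """Residual estimate: population change not explained by natural increase."""
    natural_increase = births - deaths
    return (pop_end - pop_start) - natural_increase


def net_migration_from_rates(pop_start, pop_end, birth_rate, death_rate):
    """Same residual, starting from crude rates per 1,000 people.

    Assumption (illustrative): rates are applied to the start-of-year population.
    """
    births = pop_start * birth_rate / 1000
    deaths = pop_start * death_rate / 1000
    return net_migration(pop_start, pop_end, births, deaths)


if __name__ == "__main__":
    # Hypothetical area: grows from 100,000 to 101,000 people in a year,
    # with 12 births and 8 deaths per 1,000 residents.
    print(net_migration_from_rates(100_000, 101_000, 12, 8))
```

A positive result suggests more people arrived than left; a negative one suggests the reverse. Because the estimate is a residual, any error in the population counts or vital rates flows directly into it.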

Gubernatorial approval. Political scientist Matthew M. Singer’s State Executive Approval Database contains the results of 10,000+ gubernatorial approval polls, spanning all 50 states and going back decades. The database, which builds on earlier efforts by Thad Beyle et al., lists each poll’s date, state, governor, pollster, sample size, sample type, ratings scale, percentage of positive/negative responses, and more. Previously: The Executive Approval Project (DIP 2019.10.16), which Singer co-directs.

DOI drones. Through a FOIA request to the US Department of the Interior, journalist Ben Welsh has obtained and published the agency’s drone roster. For each of the 850+ remote-controlled aircraft, the dataset lists the agency bureau and office, drone manufacturer, model, cost, serial number, and more. The FOIA request also unearthed spreadsheets listing specific drone flights and multi-flight deployments. Previously: Drone registration data (DIP 2022.12.21), also obtained by Welsh via FOIA.

Bug shots. Zahra Gharaee et al.’s BIOSCAN-1M Insect Dataset contains one million microscope photographs of bugs, each taxonomically classified by experts and supplemented with raw DNA sequences and genetic “barcode” identifiers. The dataset, part of the broader BIOSCAN initiative, includes 8,300+ species across 3,400+ genera; the specimens were primarily collected in Costa Rica, Canada, and South Africa, using tent-like traps. Previously: Bug splats (DIP 2020.03.04). [h/t Robin Sloan]