Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.10.05 edition

Grid emissions, chain and indie restaurants, wildfire smoke pollution, federal audits, and a decade of tasks.

Grid emissions. Ember, an “energy think tank that uses data-driven insights to shift the world from coal to clean electricity,” has begun compiling annual and monthly statistics on electricity demand, generation, and estimated greenhouse gas emissions by country, standardized from national and international sources. The annual estimates span two decades and 200+ countries and territories; the monthly dataset provides somewhat less coverage. Both can also be explored online. Related: Singularity’s Open Grid Emissions initiative estimates the hourly grid emissions of balancing authorities and power plants in the US, currently for 2019 and 2020. Previously: Other energy-related datasets. [h/t Philippe Quirion]

Chain and indie restaurants. Xiaofan Liang and Clio Andris of Georgia Tech’s Friendly Cities Lab have published a map and dataset examining the “chainness” of 700,000+ US restaurants. Starting with records provided by a marketing-data company, the researchers standardized the restaurants’ names, counted their frequencies, and classified them as chains (those with more than five outlets) or not. The dataset also lists each restaurant’s cuisine and location. As seen in: Andrew Van Dam’s exploration of the data for his new-ish Washington Post column, Department of Data.
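The classification rule described above (count how often a standardized name appears, then call it a chain if it has more than five outlets) can be sketched in a few lines. This is a hypothetical illustration with toy data, not the researchers' actual code or dataset:

```python
from collections import Counter

# Toy stand-ins for standardized restaurant names (hypothetical, not the real data).
restaurants = ["Burger Barn"] * 7 + ["Mel's Diner", "Taco Town", "Taco Town"]

# Count outlets per standardized name.
outlet_counts = Counter(restaurants)

CHAIN_THRESHOLD = 5  # more than five outlets => classified as a chain

def is_chain(name: str) -> bool:
    """Return True if the standardized name has more than five outlets."""
    return outlet_counts[name] > CHAIN_THRESHOLD

print(is_chain("Burger Barn"))  # True  (7 outlets)
print(is_chain("Taco Town"))    # False (2 outlets)
```

The real study applies the same idea after name standardization, which is the hard part: "McDonald's", "McDonalds", and "McDonald's #1234" must collapse to one name before counting.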

Wildfire smoke pollution. Marissa L. Childs et al. have developed a “machine learning model of daily wildfire-driven PM2.5 concentrations using a combination of ground, satellite, and reanalysis data sources that are easy to update.” (PM2.5 refers to particulate matter 2.5 micrometers in diameter or smaller.) The researchers then used that model to generate daily smoke PM2.5 estimates for each county, Census tract, and 10-kilometer-grid tile in the contiguous US, for 2006–2020. Read more: Coverage and maps in the New York Times. [h/t George LeVines]

Federal audits. Nonprofits, state/local governments, and other noncommercial entities expending $750,000+ of federal funds in a year are required to undergo a standardized audit of their financials and compliance. The US Federal Audit Clearinghouse maintains a public database of those audits; it offers bulk downloads of the report data (about the auditee, auditor, findings, and more), as well as a tool to search and access individual reports. [h/t Big Local News]

A decade of tasks. Between April 2009 and February 2019, software engineer Renzo Borgatti set 17,000+ daily tasks for himself. He completed slightly less than half of them. He labeled them with tags such as “@meeting”, “@talk”, and “@clojure”. He estimated how many “pomodoros” each would take, and how many they really did. We know this because Borgatti allowed Derek M. Jones to publish a partially redacted dataset of his tracked tasks. Previously: One software company’s task estimates (DIP 2019.04.24), also published by Jones.