Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.11.10 edition

Toxic pollution, campsite reservations, time zones, NFTs, and damn fine coffees.

Toxic pollution. Last week, ProPublica published what it’s calling “the most detailed map of cancer-causing industrial air pollution in the U.S.,” along with an investigation based on the map’s revelations. In a methodology article, reporters explain how they analyzed billions of rows of data from the Environmental Protection Agency’s Risk-Screening Environmental Indicators model, which “takes a variety of inputs, including emissions data, weather modeling, and facility specific information, and puts out estimated concentrations of toxic chemicals in the air around industrial facilities.” The EPA publishes the model’s output as bulk downloads, in an online dashboard, and in other formats. Related: The model incorporates information from the EPA’s Toxics Release Inventory, which publishes self-reported emissions data from certain mandated industrial facilities.

Campsite reservations. The US government’s Recreation Information Database “represents an authoritative source of information and services for millions of visitors to federal lands, historic sites, museums, and other attractions/resources.” It provides bulk data and an API describing recreational areas, campgrounds, campsites, permit entrances, scheduled tours, and more. You can also download detailed historical data on individual campsite and tour reservations going back to 2006. As seen in: “The Camping Crunch,” published by the Center for Western Priorities, and accompanying methodology. [h/t @mtmagog]

Time zones. The Time Zone Database, used extensively by major operating systems and programming languages, “contains code and data that represent the history of local time for many representative locations around the globe.” Its files include detailed notes on sourcing and are “updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight-saving rules.” Read more: “Exploring 120 years of timezones,” by Colin Eberhardt. [h/t Lon Riesberg]
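For a quick feel for how programming languages consume this database, here is a minimal Python sketch using the standard library's `zoneinfo` module, which reads Time Zone Database files. It looks up the UTC offset in effect in one zone on two dates. The helper name `utc_offset_hours` and the choice of `America/New_York` are illustrative assumptions, not part of the database, and the code assumes Python 3.9+ with time zone data available on the system (or via the `tzdata` package).

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def utc_offset_hours(zone_name: str, local_time: datetime) -> float:
    """Return the UTC offset, in hours, in effect in a zone at a local time.

    Hypothetical helper for illustration; zone_name is a Time Zone Database
    identifier such as "America/New_York".
    """
    # Attach the zone's rules to the naive local time.
    aware = local_time.replace(tzinfo=ZoneInfo(zone_name))
    # utcoffset() returns a timedelta; dividing by one hour gives a float.
    return aware.utcoffset() / timedelta(hours=1)


if __name__ == "__main__":
    # One date during daylight-saving time, one after it ended in 2021.
    for when in (datetime(2021, 7, 1, 12), datetime(2021, 11, 10, 12)):
        print(when.date(), utc_offset_hours("America/New_York", when))
```

The two dates should yield different offsets (-4.0 and -5.0), because the database encodes the daylight-saving rules that applied in that zone in 2021.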

NFTs. A team developing open-source tools for monitoring cryptocurrency activity has built a dataset of 7 million transactions of non-fungible tokens (NFTs) on the Ethereum blockchain. The dataset covers April to late September 2021, spans 9,000 NFT projects, and records each transaction’s sender, receiver, value, timestamp, and location in the blockchain. Read more: The team’s analysis. [h/t Ibrahim Ahmed]

Damn fine coffees. Earlier this year, data visualist Judit Bekker live-blogged her effort to catalog and visualize every coffee consumed in all three seasons of Twin Peaks. Bekker’s dataset records the episode, timestamp, scene, location, and circumstances of 258 coffee-drinkings, plus who drank them. [h/t Soph Warnes]