Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.12.07 edition

Pipelines, solar panels, Chicago cab rides, STEM surveys, and classical music.

Pipelines. The U.S. Energy Information Administration publishes a bunch of geographic data, including shapefiles mapping the country’s crude oil, petroleum product, hydrocarbon gas liquid, and natural gas pipelines. (They were last updated five months ago.) Additionally, the Pipeline and Hazardous Materials Safety Administration keeps track of “significant incidents,” such as those that caused a serious injury or $50,000 in damage. Related: “Six maps that show the anatomy of America’s vast infrastructure.” Also related: ProPublica’s Pipeline Safety Tracker, covering 1986–2012.

Solar panels. The Open PV Project is a “community driven, comprehensive database” of solar panel installations in the U.S., ranging from home installations to utility-scale projects. The database, run by the Department of Energy, contains more than 1 million installations — with a total capacity of 16,000+ megawatts — and tracks their locations, sizes, costs, installers, and other variables. [h/t Dad]

Chicago cab rides. Last month, Chicago’s city government published data on more than 100 million taxi rides taken in the city since 2013. (The city gathers the data through “periodic reporting by two major payment processors believed to cover most taxis in Chicago.”) The dataset contains each ride’s start/end times, pickup/dropoff location (based on Chicago’s “community areas”), distance, cost, payment type, and taxi company. Related: “Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance,” which contains pointers to similar data for New York City. [h/t Dan Nguyen]

STEM surveys. The IPUMS Higher Ed portal provides data from three “leading surveys for studying the science and engineering (STEM) workforce in the United States.” The surveys currently cover 1993 through 2013 and include questions about educational choices, demographics, employment outcomes, and more. Requires a free account. [h/t Michael A. Rice, a teacher at Ingraham High School in Seattle]

Classical music, annotated. “MusicNet is a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note’s position in the metrical structure of the composition.” [h/t Lon Riesberg]