Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2018.06.06 edition

Gas and oil infrastructure, volcanoes and eruptions, retracted medical papers, cenotes, and NICAR lightning talks.

Global gas and oil infrastructure. The Department of Energy’s National Energy Technology Laboratory has published what it says is the “first-ever database inventory of oil and natural gas infrastructure information from the top hydrocarbon-producing and consuming countries in the world.” The database contains tons of geospatial information and “identifies more than 4.8 million individual features like wells, pipelines, and ports from more than 380 datasets in 194 countries. It includes information about the type, age, status, and owner/operator of infrastructure features.” Helpful: The authors’ (detailed) methodology paper. [h/t Michael McLaughlin]

Volcanoes and eruptions. The Smithsonian Institution’s Global Volcanism Program maintains a database of more than 12,000 volcanoes and 11,000 eruptions, dating from 10450 BCE to the present year. You can search the data online, and then download the results as a spreadsheet. Related: “Here’s every volcano that has erupted since Krakatoa.” [h/t Duncan Geere + Rachel Schallom + Lazaro Gamio]

Retracted medical papers. PubMed, the National Library of Medicine’s search engine for biomedical and life-sciences literature, lets you search for retracted publications; just add “retracted publication”[PTYP] to your query. For instance, here are retracted articles that were originally published in 2016. Using the “Send to” link at the top-right of the query pages, you can download all the results. Data scientist Neil Saunders has gathered this data and condensed it into an interactive, graphical report. (Clicking on the axis labels takes you to the relevant PubMed search.) Related: The code behind Saunders’ report. [h/t u/cavedave]
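For readers who would rather build these searches programmatically, here is a minimal Python sketch of the query pattern described above. It only assembles the query string and a search link; it does not contact PubMed. The helper names, the `[dp]` (publication date) field tag used for the 2016 example, and the base URL are assumptions for illustration, not taken from the newsletter or from Saunders’ code.

```python
from urllib.parse import urlencode

# The publication-type filter quoted in the blurb above.
RETRACTED_FILTER = '"retracted publication"[PTYP]'


def retracted_query(terms=""):
    """Combine optional search terms with the retracted-publication filter."""
    terms = terms.strip()
    if not terms:
        return RETRACTED_FILTER
    return f"({terms}) AND {RETRACTED_FILTER}"


def pubmed_search_url(query, base="https://pubmed.ncbi.nlm.nih.gov/"):
    """Build a search link. The base URL is an assumption; adjust as needed."""
    return base + "?" + urlencode({"term": query})


if __name__ == "__main__":
    # Retracted articles published in 2016, assuming the [dp] date tag.
    query = retracted_query("2016[dp]")
    print(query)
    print(pubmed_search_url(query))
```

Wrapping the user’s own terms in parentheses keeps the filter from binding to only the last term when the terms contain OR.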

Cenotes. The Mexican state of Yucatán publishes a dataset listing the names and locations of cenotes, the region’s famous water-filled sinkholes. Related: Other datasets from the Programa de Ordenamiento Ecológico Territorial del Estado de Yucatán. [h/t Forest Gregg]

NICAR lightning talks. Ever since 2010, the National Institute for Computer-Assisted Reporting (NICAR) annual conference has featured a session of five-minute “lightning talks,” selected by popular vote. NICARian Christine Zhang has compiled a spreadsheet of all 309 lightning talk proposals, the proposed presenters, their professional affiliations, how many votes each proposal received, and more. Related: “Nine Years of NICAR Lightning Talks (and Cats),” Zhang’s analysis of the data. Also related: The code behind Zhang’s analysis.