Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.05.17 edition

North Korean missile tests, food prices, rising seas, a Chicago PD watchlist, and 112,936 story arcs.

North Korean missile tests. The James Martin Center for Nonproliferation Studies publishes what it calls “the first database to record flight tests of all missiles launched by North Korea capable of delivering a payload of at least 500 kilograms a distance of at least 300 kilometers.” The database currently contains 107 missile tests, from North Korea’s first, launched in April 1984, to its latest, launched Sunday morning. For each test, the data includes the missile’s launch site, highest altitude, distance travelled, landing location, success/failure, and other details. [h/t Ian Greenleigh]

Global food prices. The UN World Food Programme’s vulnerability analysis group collects and publishes food price data for more than 1,000 towns and cities in more than 70 countries. The dataset, which goes back more than a decade, covers basic staples such as wheat, rice, milk, and oil. It’s updated monthly and feeds into (among other things) the UNWFP’s price-spike indicators. Related: The Humanitarian Data Exchange, which hosts the dataset for the UN. Also: The Economist’s Big Mac Index. [h/t Andrew McCartney]

Rising seas. How might rising sea levels affect coastal flooding? A new-ish NOAA Technical Report, published in January, combines historical data on global sea levels with “regional factors contributing to sea level change for the entire U.S. coastline.” The result: Localized projections under six sea-level rise scenarios, ranging from “low” to “extreme.” You can download the data (at the bottom of this page) or explore it on a map. Related: Climate Central describes what NOAA’s “extreme” scenario could mean for America (including more maps and calculations). Previously: Tide gauge data (DIP 2016.03.23) and sea ice measurements (DIP 2016.09.14). [h/t Susie Cambria]

“The watch list Chicago police fought to keep secret.” The Chicago Sun-Times has obtained and published an August 2016 copy of the Chicago Police Department’s “Strategic Subject List,” a database that scores nearly 400,000 (unnamed) people on a scale from 10 to 500, based on an algorithm that attempts to estimate their risk of being involved in gun violence (either as a shooter or a victim). The database includes demographic, geographic, criminal history, and other information about the people it ranks. “But the database doesn’t indicate — and the police won’t say — how much weight is given to each factor in computing the scores, which are produced using an algorithm developed at the Illinois Institute of Technology,” according to the Sun-Times.

Story arcs. “The WikiPlots corpus is a collection of 112,936 story plots extracted from English language Wikipedia.” The plots describe movies, books, plays, TV series, TV episodes, video games, and other stories; essentially, anything that has a Wikipedia article with the word “plot” in one of its subheadings. Related: “Examining the arc of 100,000 stories: a tidy analysis” and “Gender and verbs across 100,000 stories: a tidy analysis,” two blog posts by David Robinson that use the data.