Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2018.05.23 edition

Ebola, public transit, political resistance campaigns, wind, and Wikipedia’s citations.

Ebola. Caitlin Rivers, a computational epidemiologist at the Johns Hopkins Center for Health Security, has started compiling data tracking the current Ebola outbreak in the Democratic Republic of Congo. So far, the datasets are based on case counts and other information from the DRC’s Ministry of Health and the World Health Organization. A series of “data interpretation notes” accompanies each dataset. (Rivers administered a similar data repository during the 2014 Ebola outbreak.) Related: “Most Maps of the New Ebola Outbreak Are Wrong,” by Ed Yong.

Public transit, curated. As a way to “lower the barrier” for analyzing public transportation data, researchers at Finland’s Aalto University have published “a curated collection of [now more than] 25 cities' public transport networks in multiple easy-to-use formats including network edge lists, temporal network event lists, SQLite databases, GeoJSON files, and the GTFS data format.” On the project’s website, you can browse, visualize, and download each city’s data. (The cities are mostly in Europe and Australia, but also include Detroit, Winnipeg, and Antofagasta, Chile.) Previously: TransitLand and TransitFeeds (DIP 2016.07.27). [h/t NYU Data Science Community Newsletter]

Political resistance campaigns. The Nonviolent and Violent Campaigns and Outcomes (NAVCO) Data Project, based at the University of Denver, “catalogues major nonviolent and violent resistance campaigns around the globe from 1900-2013.” The project’s initial dataset explored the general characteristics of hundreds of campaigns; follow-up datasets have examined the annual activity and tactics of smaller subsets. Each dataset comes with a detailed codebook. Note: Free registration is required to download the most recent datasets. [h/t Peace Science Digest]

Wind. Earlier this month, the Department of Energy’s National Renewable Energy Laboratory made a big new slice of its Wind Integration National Dataset available online. The latest version provides API access to 50 terabytes of wind-related measurements — about 10% of the full database. It includes “barometric pressure, wind speed and direction, relative humidity, temperature, and air density data” between 2007 and 2013, from nearly 5 million locations in/near the continental United States. The NREL has also published an animated map of the data. Note: Free registration is required to access the API. Previously: Wind turbines (DIP 2018.04.25). [h/t Michael McLaughlin]

What Wikipedians cite. The Wikimedia Foundation has published a dataset listing each clearly cited source (e.g., a book with an ISBN, a scholarly article with a DOI, etc.) on each page of each of Wikipedia’s 298 language editions: 15,693,732 source-page combinations in all. Related: “The Most-Cited Authors on Wikipedia Had No Idea,” by Louise Matsakis. [h/t Ted Lawless]