Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.05.04 edition

Pirated papers, the Ku Klux Klan, disciplined doctors, goooooooooaaaaals, and grape harvests.

Scientific paper trails. Sci-Hub bills itself as “the first pirate website in the world to provide mass and public access to tens of millions of research papers.” Who’s downloading papers from the site? “Everyone,” Science magazine concluded after analyzing data culled from six months of Sci-Hub server logs. For every download, the dataset identifies the paper downloaded, the date and time, an anonymized version of the downloader’s IP address, and a rough location. [h/t Melissa Bierly + Tom Grahame]

The Ku Klux Klan, 1915–1940. Scholars at Virginia Commonwealth University have identified and mapped the locations of 2,000 KKK branches active in the early 20th century. The dataset contains the city, state, earliest known date, and sources for each “klavern.” Related: “Active Hate Groups in the United States in 2015,” a report by the Southern Poverty Law Center. [h/t K Reed]

Disciplined doctors. The National Practitioner Data Bank tracks medical malpractice payments, license suspensions, Medicare expulsions, and other lists of penalized physicians. The public use data file includes dozens of details per entry but excludes the part that is almost certainly most important to patients: the doctors’ names. Related: “Doctors perform thousands of unnecessary surgeries,” according to a 2013 USA Today investigation that relied partly on the NPDB.

Goooooooooaaaaal. OpenFootball collects and publishes results and rosters from national and international soccer/football matches, including the Premier League and the World Cup. Related: English soccer/football results, 1871–2014. [h/t Wendy Mak]

Grape timing. Climate scientists have compiled a dataset of grape harvest dates from 380 European vineyards, across 27 regions and stretching back 650 years. The earliest data point refers to a Burgundy harvest in 1354. Related: The original academic paper. [h/t Martín González]