Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.05.24 edition

America’s card catalog, domestic radicalization, ransomware payments, 50 million doodles, and canned beer.

America’s card catalog. Last week, the Library of Congress released its largest dataset ever: nearly 25 million records for books, maps, manuscripts and other items in its online catalog. For each item, the data includes standardized bibliographic information, such as the title, author, publication date, and genre. (The dataset represents the online catalog as it was in 2013; more recent data will cost you.) Related: A bit of background about the library’s MARC (Machine Readable Cataloging Records) data format.

Domestic radicalization. The Profiles of Individual Radicalization in the United States (PIRUS) database “contains deidentified individual-level information on the backgrounds, attributes, and radicalization processes of nearly 1,500 violent and non-violent extremists who adhere to far right, far left, Islamist, or single issue ideologies in the United States,” including the Ku Klux Klan, the Taliban, and the Animal Liberation Front, among others. The dataset covers 1948 through 2013 and was released earlier this year by a team at the University of Maryland. [h/t Lorand Bodo]

Ransomware payments. When the malware program known as “WannaCry” hit hundreds of thousands of computers earlier this month, it demanded that the computers’ owners pay $300 in Bitcoin, or lose all of their data. Keith Collins at Quartz has been using Blockchain’s API to track Bitcoin payments to the three digital wallets that the hackers designated to receive the ransoms. He’s published the data and is also using it to power a Twitter bot. Related: “Victims of the WannaCry ransomware attacks have stopped paying up” and “Inside the digital heist that terrorized the world—and only made $100k,” both by Collins. Previously: Historical Bitcoin prices (DIP 2017.03.08).
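
If you want to try a similar tally yourself, here is a minimal sketch of the arithmetic. It is not Collins's code. It assumes you have already fetched one JSON record per wallet and that each record carries a `total_received` field denominated in satoshis (1 bitcoin = 100,000,000 satoshis); check the API documentation for the actual field names. The wallet names and amounts below are made up for illustration.

```python
from decimal import Decimal

# 1 bitcoin = 100,000,000 satoshis.
SATOSHIS_PER_BTC = Decimal(100_000_000)


def total_received_btc(wallets):
    """Sum 'total_received' (assumed to be in satoshis) across wallet records
    and return the combined amount in bitcoin."""
    total_satoshis = sum(int(w["total_received"]) for w in wallets)
    return Decimal(total_satoshis) / SATOSHIS_PER_BTC


# Hypothetical records standing in for per-wallet API responses.
# These are NOT the real ransom wallets or real amounts.
sample_wallets = [
    {"address": "wallet-a", "total_received": 150_000_000},
    {"address": "wallet-b", "total_received": 25_000_000},
    {"address": "wallet-c", "total_received": 0},
]

print(total_received_btc(sample_wallets))
```

Using `Decimal` rather than floats keeps the satoshi-to-bitcoin conversion exact, which matters if you later multiply by an exchange rate.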

Fifty million doodles. Google is clever: It created a drawing game, got 15 million people to play it, and then turned those doodles into a public dataset of people drawing. You can download the raw data, or just browse the doodles online.

🎶 Two thousand cans of craft beer on the wall 🎶. The website CraftCans.com publishes a database of 2,000+ canned beers. For each beer, the database lists its name, style, brewery, size, alcohol level, and bitterness. The website doesn’t provide a direct download, but — as Jean-Nicholas Hould points out — you can basically just copy-paste the website’s data into your favorite spreadsheet program. Or, if you want something slightly cleaner, you can use this script. Related: This data-profiling tutorial by Hould, which uses the data. Also related: RateBeer.com’s API, but you’ll need to request a developer key to use it. Plus: This interactive graphic, which uses the RateBeer data to explore America’s microbrew epicenters. And also: Official brewery production stats from the U.S. Alcohol and Tobacco Tax and Trade Bureau. [h/t Daniel Brady]