Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.03.08 edition

The federal checkbook, historical Bitcoin prices, drug exclusivity, blockbuster speaking roles, and pictures of food.

The federal checkbook. From Treasury.io: “Every day at 4pm, the United States Treasury publishes data tables summarizing the cash spending, deposits, and borrowing of the federal government.” Those data tables “catalog all the money taken in that day from taxes, the programs, and how much debt the government took out.” On Monday, for instance, the government spent $481 million on the Postal Service. One hitch: The Treasury’s data tables are (subjectively) ugly and (objectively) spreadsheet-unfriendly. So Treasury.io — an open-source civic project — continuously converts the files into good ol’ tabular data. You can download individual tables as CSVs, get the whole dataset as a big SQLite database, or query the API. There’s also a data dictionary and a Twitter bot.
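For readers who'd rather work with the SQLite route, here's a minimal sketch of the general idea: load a CSV of daily withdrawals into SQLite and total the spending per line item. The table name (withdrawals) and the columns (date, item, amount_millions) are hypothetical and are not Treasury.io's actual schema. The single sample row reuses the Postal Service figure mentioned above.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout; Treasury.io's real tables and column names may differ.
SAMPLE_CSV = """date,item,amount_millions
2017-03-06,Postal Service,481
"""

def load_csv_into_sqlite(csv_text, conn, table="withdrawals"):
    """Create a (hypothetical) table and insert every CSV row into it."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute(f"CREATE TABLE {table} (date TEXT, item TEXT, amount_millions INTEGER)")
    conn.executemany(
        f"INSERT INTO {table} (date, item, amount_millions) "
        "VALUES (:date, :item, :amount_millions)",
        rows,
    )
    return len(rows)

def spending_by_item(conn, table="withdrawals"):
    """Total spending per line item, largest first."""
    return conn.execute(
        f"SELECT item, SUM(amount_millions) FROM {table} GROUP BY item ORDER BY 2 DESC"
    ).fetchall()

conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(SAMPLE_CSV, conn)
print(spending_by_item(conn))  # [('Postal Service', 481)]
```

SQLite's INTEGER column affinity converts the CSV's text "481" into a number, which is why the SUM comes back as an integer.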

Historical Bitcoin prices. The Bitcoin exchange rate hit an all-time high last week, at more than $1,290 per bitcoin. That’s according to CoinDesk’s Bitcoin Price Index, an average rate derived from several major exchanges. You can download daily and hourly data for the index and its components. [h/t Jan Doggen]

Drug patents and exclusivity. The FDA’s “Orange Book” lists approved drugs, their associated patents, and government-granted exclusivity rights. The Orange Book is available as a 1,400-page PDF, but you can also download the key data as structured text files. The files are updated monthly. Related: “Drugs For Rare Diseases Have Become Uncommonly Rich Monopolies,” published by Kaiser Health News and NPR in January. Question for readers: The Orange Book data comes as tilde-delimited files, the first I’ve ever seen. Do you have ~any other examples~? [h/t Sydney Lupkin]
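Tilde-delimited files are less exotic than they look. Most CSV readers let you swap in a different delimiter. Here's a minimal Python sketch using the standard library's csv module. The header and the row below are made-up placeholders, not the Orange Book's actual column names or contents.

```python
import csv
import io

# Placeholder excerpt; the real Orange Book files have their own, longer headers.
SAMPLE = """Ingredient~Trade_Name~Patent_No
PLACEHOLDER_INGREDIENT~PLACEHOLDER_BRAND~0000000
"""

def read_tilde_delimited(text):
    """Parse tilde-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="~"))

rows = read_tilde_delimited(SAMPLE)
print(rows[0]["Trade_Name"])  # PLACEHOLDER_BRAND
```

For a downloaded file, you'd pass the opened file object (opened with newline="") to csv.DictReader in place of the StringIO wrapper.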

Speaking roles in 2016’s blockbusters. Researcher Amber Thomas has parsed the transcripts of last year’s 10 highest-grossing films. The resulting data files indicate each character’s number of turns speaking, number of words spoken, and gender. Previously: Dialogue from 2,000 movies, by gender (April 13, 2016).
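To show roughly how "turns" and "words" can be tallied from a transcript, here's a minimal sketch. It assumes a hypothetical format where each spoken line looks like "CHARACTER: dialogue" and skips lines with no speaker label, such as stage directions. It is not necessarily how Thomas parsed her transcripts.

```python
from collections import Counter

def speaking_stats(lines):
    """Count turns and words per speaker, assuming 'SPEAKER: dialogue' lines."""
    turns = Counter()
    words = Counter()
    for line in lines:
        speaker, sep, speech = line.partition(":")
        if not sep:
            continue  # no speaker label, e.g. a stage direction
        speaker = speaker.strip()
        turns[speaker] += 1
        words[speaker] += len(speech.split())
    return turns, words

# Hypothetical transcript lines, for illustration only.
transcript = [
    "ALICE: Hello there.",
    "BOB: Hi.",
    "ALICE: How are you today?",
    "(They sit.)",
]
turns, words = speaking_stats(transcript)
print(turns["ALICE"], words["ALICE"])  # 2 6
```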

Pictures of food. A trio of European researchers has published a dataset containing 101,000 photos of food — 1,000 images each from 101 food categories, all downloaded from foodspotting.com. The categories include apple pie, escargots, onion rings, paella, bibimbap, prime rib, and more. [h/t Reddit user cavedave]