Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2019.11.13 edition

Millions of violent crimes, online news language, interwar debt, food flows, and perceived vs. real ages.

Millions of violent crimes. The Trace has posted raw data on 4.3 million murders, nonfatal shootings, assaults, robberies, and rapes, obtained from 56 police and sheriff’s departments in the United States. Related: Sarah Ryley’s introductory Twitter thread. Also related: The Trace and BuzzFeed News’ investigative reporting on cities’ failure to arrest shooters, for which Sarah, Sean Campbell, and I used many of these datasets.

Online news language, updated live. The GDELT Project’s Web News Ngram dataset keeps track of the frequency of individual words and two-word phrases in online news around the world. The dataset incorporates news sources in 142 languages and provides overall word counts for every 15-minute window since January 1, 2019. An additional dataset tracks phrasings used in 10 character-based languages. Previously: GDELT’s similar dataset for television news (DIP 2019.08.21). [h/t Kalev Leetaru]

Interwar debt. In a recent working paper titled “Instruments of Debtstruction,” researchers at the International Monetary Fund share a “comprehensive instrument-level database of sovereign debt for 18 advanced and emerging countries over the period 1913–46.” (The dataset currently published alongside the paper seems to be missing one of the 18 countries, Russia.) The “instruments” include bonds, credit lines, and several other forms of debt; for each instrument issued, the dataset contains the debt’s coupon rate, maturity, and currency.

Food flows. A team of researchers has developed a statistical model to estimate the flow of food commodities between every pair of US counties in 2012. To calculate the estimates, the researchers used data from the Census’s Commodity Flow Survey, ORNL’s Freight Analysis Framework, the USDA’s Census of Agriculture, and several other sources. Related: One of the paper’s authors summarizes the findings. [h/t Jain Family Institute weekly newsletter]

Perceived vs. real ages. An online research project asks visitors to guess people’s ages, based on photographs. You can download a database of the results, which currently includes more than 220,000 guesses about more than 4,600 photos. For each photograph, the database also includes some metadata, such as the person’s actual age. Related: The researchers describe their project in Scientific Data.