Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.08.09 edition

Nutrition facts, commodity flows, pidgin and creole languages, UK coastal flooding, and talent agencies.

Nutrition facts. The USDA National Nutrient Database for Standard Reference is the primary source for most of the food nutrition facts you see in America. The database assesses more than 8,000 foods, from abiyuch to zwieback, and provides the average nutrient levels per 100 grams — e.g., protein, carbohydrates, vitamin D, caffeine, lycopene, and water. North of the border, you can find the (bilingual) Canadian Nutrient File. It’s based on the USDA data, but excludes stateside foods “known not to be on the Canadian market”, adds some foods (such as poutine and ptarmigan), and makes adjustments based on “Canadian levels of fortification and regulatory standards.” The United Kingdom has its own nutrient file, as do many other countries. [h/t Reddit user Alacritous]

Interstate commodity flows. The federally funded Freight Analysis Framework “integrates data from a variety of sources to create a comprehensive picture of freight movement among states and major metropolitan areas by all modes of transportation.” For each year between 2012 and 2015, the database “provides estimates for tonnage (in thousand tons) and value (in million dollars) by regions of origin and destination, commodity type, and mode.” Last week, Axios published an interactive map of the state-to-state flows for each commodity group, as well as some helpful caveats and “head-scratchers.” [h/t Chris Canipe]

Pidgin and creole languages. The Atlas of Pidgin and Creole Language Structures contains data on 76 languages, such as Trinidad English Creole, Afrikaans, Guadeloupean Creole, and Singapore Bazaar Malay. For each language, the dataset includes information about 130 “structural features,” example sentences, and more. Previously: The World Atlas of Language Structures (DIP 2016.01.06) and a database of the Trans-New Guinea language family (DIP 2015.11.04). [h/t Rachael Tatman]

A century of UK coastal flooding. Earlier this year, researchers published an updated version of their database of UK coastal floods. They combined tidal gauge data with reports from scientific journals, newspapers, and social media to identify 329 "coastal flooding events" that occurred between 1915 and 2016. For each event, the dataset includes the date, region, and severity level. Severity ranges from 1 ("nuisance") to 6 ("disaster"), and the top level applies to only one event: the North Sea flood of 1953.

Talent agencies. California’s Department of Industrial Relations publishes a dataset of all licensed talent agencies, with each agency’s name, address, license number, workers’ comp insurer, and bond issuer. Florida publishes something similar. Previously: Texas’s licensed professionals (DIP 2015.12.09).