Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.01.25 edition

Colleges and economic mobility, three centuries of the UK economy, TV talk, European trees, and mugshots.

Colleges and economic mobility. A team of economists studying “the equality of opportunity” has published new research identifying which colleges “help the most children climb the income ladder.” For their analysis, the researchers combined federal tax records with data from the Department of Education. California State University–Los Angeles was one of the greatest engines of mobility: nearly 1 in 10 students enrolled there began in the bottom 20% of income but reached the top 20% by their early thirties. You can download the findings, which include similar statistics for more than 2,000 schools, as a series of spreadsheets. Related: “Some Colleges Have More Students From the Top 1 Percent Than the Bottom 60. Find Yours,” from the New York Times.

Three centuries of UK macroeconomic data. The Bank of England publishes a spreadsheet of historical economic data going back, in some cases, to the late 1600s. The country’s GDP in 1700 was £11.7 billion in 2013 prices. That’s about 1/157th the size of the UK’s GDP in 2015. And in November 1694, monthly short-term interest rates were roughly 6%. [h/t Ian Greenleigh]
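To see what the “1/157th” comparison implies, multiply the 1700 figure back up. The snippet below is a rough sketch that uses only the two numbers quoted above. Because the ratio is rounded, the resulting 2015 total (in 2013 prices) is approximate and is not a figure taken from the Bank of England spreadsheet.

```python
# Figures quoted in the newsletter (GDP in 2013 prices).
gdp_1700_bn = 11.7   # £ billion, 1700
ratio = 157          # "about 1/157th", a rounded figure

# Implied 2015 GDP in 2013 prices; approximate because the ratio is rounded.
implied_gdp_2015_bn = gdp_1700_bn * ratio
print(f"Implied 2015 GDP: about £{implied_gdp_2015_bn:,.0f} billion")
```

This prints roughly £1,837 billion, or about £1.8 trillion.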

TV talk. The GDELT Project and the Internet Archive have collaborated to make the latter’s Television News Archive more powerfully searchable. Their new tool, announced in December, lets you search across “more than 5.7 billion words from over 150 distinct stations spanning July 2009 to present” at a sentence-by-sentence level. The results are downloadable as CSV or JSON files. Previously: The Political TV Ad Archive (Feb. 2, 2016).

European trees. EU-Forest is a new dataset that, according to its authors, “extends by almost one order of magnitude the publicly available information on European tree species distribution.” The new project merges and harmonizes data from 21 national forest surveys and two related databases. In all, EU-Forest includes more than 580,000 observations of more than 200 species in 1km-by-1km square plots of land, and is available in both tabular and geospatial file formats. Previously: American tree maps (Dec. 23, 2015) and NYC street trees (Nov. 16, 2016).

Standard mugshots. The National Institute of Standards and Technology publishes Special Database 18 “for use in development and testing of automated mugshot identification systems.” The dataset contains 3,248 mugshot photos portraying 1,573 different people (mostly men), and includes each arrestee’s age and gender. [h/t Noah Veltman]