Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.03.09 edition

Billionaires, legislative linguistics, historical mortgages, crossword plagiarism, and baseball.

Two thousand billionaires. Researchers have compiled a multi-decade database of the super-rich. Building on the Forbes World’s Billionaires lists from 1996–2014, scholars at the Peterson Institute for International Economics have added a couple dozen more variables about each billionaire, including whether they were self-made or inherited their wealth. (Roughly half of European billionaires and one-third of U.S. billionaires got a significant financial boost from family, the authors estimate.)

Legislative linguistics. The Sunlight Foundation’s Capitol Words project lets you explore the frequency of words and phrases in the Congressional Record since 1996. For example: “weapons of mass destruction”, “war” vs. “peace”, or “Obamacare”. The underlying data is available via an API.

Historical mortgages. With the help of volunteers, the New York Public Library is transcribing 6,000+ mortgage and bond ledgers from Emigrant Savings Bank, founded in 1850 and the oldest such bank in the city. You can search the transcribed records, or download the (very) raw data.

Overlapping crosswords. The cruciverb industry is facing its first major plagiarism scandal, unearthed thanks to a newly published database of crosswords that are at least 25% similar to previously published puzzles.
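
For a rough sense of what a 25% similarity cutoff could look like in practice, here is a minimal Python sketch. It is not the database's actual method, which isn't described here; it simply assumes similarity means the share of one puzzle's answers that also appear in an earlier puzzle. The function name answer_overlap and the sample answer lists are made up for illustration.

```python
def answer_overlap(puzzle_a, puzzle_b):
    """Fraction of puzzle_a's distinct answers that also appear in puzzle_b.

    Assumption: similarity = shared answers / answers in puzzle_a.
    The real database may compare grids, clues, or something else.
    """
    a = {word.upper() for word in puzzle_a}
    b = {word.upper() for word in puzzle_b}
    if not a:
        return 0.0
    return len(a & b) / len(a)


# Hypothetical answer lists, not taken from any real puzzle.
older = ["OREO", "AREA", "ERIE", "ALOE"]
newer = ["OREO", "AREA", "EPEE", "IOTA"]

share = answer_overlap(newer, older)
flagged = share >= 0.25  # the 25% cutoff mentioned above
print(f"{share:.0%} of answers shared; flagged: {flagged}")
```

A set-based measure like this ignores where answers sit in the grid, so it would likely flag more pairs than a cell-by-cell comparison would.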

Baseball, baseball, baseball. If you’re looking for historical data on baseball teams, players, salaries, or managers, Sean Lahman’s Baseball Archive likely has it. The archive was updated with data from the 2015 season last week. Related: Retrosheet’s game logs — a record of every major league game since 1871. [h/t Joe Murphy]