Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.05.25 edition

Elevation maps, criminal risk assessments, historical San Francisco rents, soccer salaries, and photography biographies.

Ain’t no mountain high, ain’t no valley low. Governments around the world have used “LiDAR” (a laser-powered surveying technology) to build impressively precise elevation maps. In many cases, they’ve also released these topographic datasets to the public. The U.S., for instance, publishes gobs of LiDAR data through the Interagency Elevation Inventory. You can also find LiDAR datasets for the United Kingdom, Spain, Finland, Slovenia, Denmark, Switzerland, the Netherlands, and New York City. Related: Using LiDAR data to print a 3D map of London.

Risky predictions. “There’s software used across the country to predict future criminals. And it’s biased against blacks,” a ProPublica analysis has found. The investigation focused on risk assessments and recidivism in Broward County, Florida, and found that black defendants were more likely than white defendants to be mislabeled as “high risk.” The reporters have published their methodology, code, and the underlying data — including two years of Broward County risk assessments — on GitHub.
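For readers who want to see what "mislabeled as high risk" means as arithmetic, here is a minimal sketch of a false positive rate computed per group: among people who did not go on to reoffend, the share who were nonetheless labeled high risk. This is not ProPublica's code. The field names (`group`, `high_risk`, `reoffended`) are hypothetical, and the toy records are made up purely to show the calculation.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Per group: share of non-reoffenders who were labeled high risk."""
    flagged = defaultdict(int)  # non-reoffenders labeled high risk
    total = defaultdict(int)    # all non-reoffenders
    for record in records:
        if record["reoffended"]:
            continue  # only people who did NOT reoffend can be false positives
        total[record["group"]] += 1
        if record["high_risk"]:
            flagged[record["group"]] += 1
    return {group: flagged[group] / total[group] for group in total}


# Made-up toy records with hypothetical field names; not real Broward County data.
toy_records = [
    {"group": "A", "high_risk": True, "reoffended": False},
    {"group": "A", "high_risk": False, "reoffended": False},
    {"group": "A", "high_risk": True, "reoffended": True},
    {"group": "B", "high_risk": False, "reoffended": False},
    {"group": "B", "high_risk": False, "reoffended": False},
    {"group": "B", "high_risk": True, "reoffended": False},
    {"group": "B", "high_risk": False, "reoffended": False},
]

rates = false_positive_rates(toy_records)
print(rates)
```

To try this on the published data, you would first map the dataset's actual columns onto these three fields; check the methodology write-up for how the reporters defined "high risk" and recidivism.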

Historical San Francisco rents. To help understand San Francisco’s soaring real estate prices, Eric Fischer transcribed decades of apartment and house listings in the San Francisco Chronicle. For each year from 1948 through 1979, Fischer jotted down every monthly rent advertised in the paper on the first Sunday in April. (Similar data for 1979 through 2001 is available from San Francisco’s Housing Study DataBook.) The transcriptions are available on GitHub. [h/t Kendall Taggart + Michael Andersen]
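One simple thing to do with a year-by-year list of advertised rents is to take the median for each year. The sketch below assumes a CSV with hypothetical `year` and `rent` columns, which may not match the layout of the actual transcriptions, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_rent_by_year(csv_text):
    """Group advertised rents by year and return the median for each year."""
    rents = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        rents[int(row["year"])].append(float(row["rent"]))
    return {year: median(values) for year, values in sorted(rents.items())}


# Invented sample rows with hypothetical column names; not the real transcriptions.
toy_csv = """year,rent
1948,50
1948,60
1948,85
1979,300
1979,400
"""

medians = median_rent_by_year(toy_csv)
print(medians)
```

The median is used here rather than the mean because a handful of unusually expensive listings would otherwise pull a year's figure upward.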

American soccer salaries. The Major League Soccer Players Union publishes salary data going back to 2007, and released 2016’s figures last week. (At $7.17 million in total compensation, Orlando City’s Kaká ranks as the league’s highest-paid player.) The MLSPU publishes the data as PDFs; I’ve converted those PDFs into CSVs for you. [h/t Rose Eveleth + John Templon]

Photography biography. The Photographers’ Identities Catalog aggregates data on more than 110,000 photographers and photo studios throughout history. The information “has been culled from trusted biographical dictionaries, catalogs and databases, and from extensive original research” by the New York Public Library’s photography experts. The catalog — which includes data on gender, geography, range of years active, and more — is available as raw CSVs on GitHub.