Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.01.01 edition

Water quality, legislator ideology, Google copyright complaints, California university history, and game show risk-taking.

Global water quality. The UN’s GEMStat project provides “scientifically-sound data” on freshwater quality around the world. The data portal lets visitors explore and download water-sample results from thousands of stations in more than 80 countries. The information available for each sample varies, but it can include chemical, biological, and physical properties. Note: Not all locations have been sampled recently, and data downloads are limited to 500 stations at a time and to noncommercial use. Related: “The Invisible Water Crisis,” a World Bank report from last year that used the GEMStat data. [h/t Rylan Dobson]

State legislator ideologies. Political scientists Boris Shor and Nolan McCarty have assigned ideology scores, on a conservative-to-liberal scale, to every US lawmaker in all 50 state legislatures. The most recent update, published May 2018, covers more than 22,000 legislators from 1993 through 2016. Shor and McCarty derived the numeric scores from a combination of legislative voting records and responses to Vote Smart’s “Political Courage Test.”

Copyright complaints. Google has received millions of requests from copyright holders to delist billions of URLs from its search results. The company’s transparency reports include a section where you can explore and download data about these requests. One of the datasets describes 8.5 million delisting requests, with their dates, copyright holders, numbers of URLs targeted, and links to more details in the Lumen archive. Another contains every web domain targeted, while another lists the URLs for which Google says it took no action. [h/t Dan Nguyen]

California university history. The UC ClioMetric History Project is digitizing decades of administrative records from the University of California and other schools in the state (such as USC and Stanford). So far, they’ve uploaded data on more than 750,000 student enrollments, tens of thousands of faculty members, and 800,000 courses.

Game show gambles. To study how people make decisions in risky situations, a team of academics analyzed contestants’ choices in 100+ episodes of Deal or No Deal that aired in the Netherlands, Germany, and the US. Their dataset is available through ICPSR (registration required) and the Wayback Machine.