Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.06.08 edition

Nuclear accidents, government court payouts, local justice data, the Netflix Prize, and chart-topping music.

Nuclear accidents. Researchers in Europe have published a database of 216 nuclear energy accidents — a compendium they say is “twice the size of the previous best data set.” For each accident, the database contains the date, location, description, and four measurements of severity: its ratings on the International Nuclear Event Scale and on the Nuclear Accident Magnitude Scale, the number of fatalities, and total monetary cost. (The three most expensive: Chernobyl, Fukushima, and a 1995 accident at Japan’s Monju Nuclear Power Plant, estimated to have caused $15.5 billion in damages.) [h/t Dad]

Government court payouts. The U.S. government maintains a “judgment fund,” which it uses to pay plaintiffs when federal agencies lose in court (or settle “actual or imminent lawsuits”). The Department of the Treasury, which administers the fund, publishes data on these payouts for each fiscal year going back to FY2006. [h/t CJ Ciaramella]

Local justice data. The Sunlight Foundation’s Hall of Justice brings together “nearly 10,000” criminal justice datasets and research documents from across the United States. You can search for topics and filter by geography, publisher, and accessibility (open, open-but-not-machine-readable, restricted access, et cetera). Related: Sunlight’s “lessons learned from a year of opening police data.” [h/t Susie Cambria + Noah Veltman]

The Netflix Prize, archived. In 2006, Netflix launched a $1 million challenge to beat the company’s movie-recommendation algorithm. In 2009, Netflix awarded the prize to a team that included AT&T scientists (though it ultimately didn’t use the winning algorithm). The challenge, which was open to the public, was based on a dataset of 100 million ratings that 480,000 (anonymized) users gave to more than 17,000 movies between Oct. 1998 and Dec. 2005. The dataset, once hosted at UC Irvine, is currently available through the Internet Archive. Previously: MovieLens, featured Jan. 27. [h/t Brandon Loudermilk]

Billboard hits and lyrics. Statistics grad student Kaylin Walker scraped 50 years of Billboard’s “Year-End Hot 100” rankings and those songs’ lyrics. Related: Walker’s analysis and methodology. [h/t Melissa Bierly]