Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2019.05.01 edition

National oil companies, decertified police officers, Nobel laureates’ papers, piano performances, and Fortnite.

State-owned oil companies. The browseable and downloadable National Oil Company Database, a project of the Natural Resource Governance Institute, pulls together official data on nearly 100 metrics concerning 71 oil/gas companies owned by 61 countries. For instance: Petróleos de Venezuela, S.A., reported transferring roughly $5.5 billion to its government in 2016, down from nearly $28 billion in 2013; Saudi Aramco produces the equivalent of 13 million barrels of oil daily; and in 2017, Russia’s Rosneft generated approximately $283,000 in revenue per employee. [h/t Rachel Ziemba]

Decertified police officers. USA Today has collaborated with more than 100 of its affiliated newsrooms and the Invisible Institute to gather police disciplinary records “from thousands of state agencies, prosecutors and local police departments” around the country, creating “the biggest collection of police misconduct records” ever assembled. They’re starting to make the records public, beginning with a database of 30,000+ officers who’ve had their certifications revoked. The database lists each officer’s name, state, agency, and year decertified. It includes records from 44 states, but you won’t find Massachusetts in it, for instance, because the state doesn’t license police officers. And although there are a handful of records from New York state, none regard NYPD officers; that’s in part because the country’s largest police force keeps its misconduct cases secret. (Last year, colleagues at BuzzFeed News published a database of 1,800 NYPD officers accused of misconduct, based on some of those secret records, obtained from a source who requested anonymity.)

Nobel laureates’ papers. A team of researchers has compiled the publication histories of 545 Nobel laureates, or 92% of the prize-winners in physics, chemistry, and physiology-or-medicine between 1900 and 2016. The researchers say they spent more than 1,000 hours collecting and validating the data, drawing on the Nobel website, laureates’ personal pages, Wikipedia entries, and the Microsoft Academic Graph (featured in DIP last month).

Piano performances. The MAESTRO dataset gathers recordings from nine years of the International Piano-e-Competition, where “virtuoso pianists perform on Yamaha Disklaviers which, in addition to being concert-quality acoustic grand pianos, utilize an integrated high-precision MIDI capture and playback system.” The MIDI data “includes key strike velocities and sustain pedal positions”; additional metadata contains each performance’s year, composer, and title. Related: OpenAI’s music-composing MuseNet neural network, trained in part on the MAESTRO data.

Fortnite. Through an unofficial API, you can access data on the latest items, weapons, challenges, and other aspects of the global video game phenomenon.