Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.02.12 edition

Banking crises, Europe in translation, simulated hurricanes, arXiv metadata, and the greatest hip-hop tracks of all time.

Banking crises. Luc Laeven and Fabián Valencia (economists at the European Central Bank and the International Monetary Fund, respectively) have built and maintain a dataset of systemic banking crises, like those that rippled across the globe in 2008. First published in 2008 and most recently updated in 2018, the dataset covers 151 crises affecting 118 countries from 1970 to 2017. For each episode, the dataset provides the starting and ending dates, policy responses, output loss, fiscal cost, increase in public debt, and more. [h/t Erik Gahner Larsen]

Europe in translation. IATE is the European Union’s official terminology database, containing translations for words and phrases such as “orange juice,” “climate change policy,” and “competence of the Member States.” (That’s succo d’arancia in Italian, ilmastonmuutospolitiikka in Finnish, and tagállami hatáskör in Hungarian.) Over the past 20+ years, the project has accumulated more than 970,000 entries, translated into nearly 8 million phrasings in 25 languages. You can search the entries online or download the entire dataset as a single XML file. [h/t Laura Solana Garzón]

Tropical cyclone simulations. A team of scientists has used historical hurricane and typhoon data to simulate 10,000 plausible years of cyclone activity. The dataset covers the world’s most active “basins” (the areas where cyclones form) and includes each simulated storm’s path, maximum wind speed, average pressure, and more. [h/t Jose A Cañizares]

arXiv metadata. Founded nearly 30 years ago, arXiv is an open-access repository of more than 1,600,000 scholarly articles (typically “preprints” of papers, uploaded by the authors before being peer-reviewed) in physics, math, computer science, statistics, economics, and several other fields. The website participates in the Open Archives Initiative, providing metadata on uploaded articles through the initiative’s protocol; it also has an API. Last summer, computer science student Bora M. Alper collected the metadata for all the site’s papers and published it as a single file.

Biggie data. What are the greatest hip-hop songs of all time? Last year, the BBC posed that question to more than 100 artists, producers, critics, and other experts, asking each to rank their top five tracks. (Notorious B.I.G.’s “Juicy” nabbed the highest rating.) Software engineer Simon Jockers has turned the responses into a structured dataset and visualized the results.