Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.01.15 edition

Missing migrants, global antitrust, microbes, geographic renaming proposals, and the life of a supercomputer.

Missing migrants. The Missing Migrants Project “tracks deaths of migrants, including refugees and asylum-seekers, who have gone missing along mixed migration routes worldwide.” Research for the project, an initiative of the International Organization for Migration, began after the fatal Lampedusa shipwrecks of October 2013. For each incident, the project’s datasets specify the location, the number of people who died, the number who are missing, the number who survived, the sources of information, the source quality, and more. Previously: European migration deaths, 1993 to May 2018 (DIP 2018.07.18). [h/t u/cavedave + Topi Tjukanov]
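
As a rough illustration of working with the per-incident fields described above, here is a minimal sketch that tallies the dead, missing, and surviving by region. The sample rows are made up, and the column names (`region`, `dead`, `missing`, `survivors`) are assumptions for illustration, not the project's actual headers; check the real download before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. The column names are assumptions,
# not the Missing Migrants Project's real headers.
SAMPLE_CSV = """region,dead,missing,survivors
Mediterranean,12,30,45
Mediterranean,3,,10
US-Mexico Border,1,0,
"""

COUNT_FIELDS = ("dead", "missing", "survivors")


def to_int(value):
    """Treat blank cells as zero; incident records often leave counts empty."""
    value = (value or "").strip()
    return int(value) if value else 0


def tally_by_region(csv_text):
    """Sum the count fields for each region across all incidents."""
    totals = defaultdict(lambda: {field: 0 for field in COUNT_FIELDS})
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in COUNT_FIELDS:
            totals[row["region"]][field] += to_int(row[field])
    return dict(totals)


totals = tally_by_region(SAMPLE_CSV)
for region, counts in totals.items():
    print(region, counts)
```

Treating blanks as zero is a simplification: in real data an empty cell may mean "unknown" rather than "none," so the sums are best read as lower bounds.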

Global antitrust. The Comparative Competition Law project classifies the legal provisions and enforcement of antitrust laws around the world, over time. The project is run by law professors Anu Bradford and Adam Chilton, and features several datasets and detailed codebooks. (They require an email address but no formal registration.) Related: Chilton’s introductory Twitter thread. [h/t Libor Dusek]

Microorganisms. The Microbe Directory is an attempt to profile more than 7,000 bacteria, viruses, archaea, and other microorganisms. The directory can be downloaded in bulk and describes the microbes’ optimal temperature, optimal pH, Gram stain, pathogenicity, antimicrobial resistance, and more.

Geographic (re)naming proposals. When someone wants to officially name or rename a geographic feature of the United States — such as a mountain, creek, or island — they file a proposal with the US Board on Geographic Names. Those proposals end up on the agency’s “Action List,” the most recent year of which can be downloaded as a spreadsheet. Previously: The Board’s database of every US geographic name (DIP 2015.10.21). [h/t Noah Veltman]

The life of a supercomputer. CIEMAT — a public institution in Spain that studies energy and the environment — has published nearly a decade of processing receipts from its Euler supercomputer. The records, which span the supercomputer’s entire lifetime, contain metadata for more than 9 million computing jobs, including timestamps, memory usage, and more.
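
For a sense of what job metadata like this supports, here is a minimal sketch that computes total runtime and peak memory from a handful of job records. The records are invented, and the field names (`job_id`, `start`, `end`, `max_memory_mb`) and ISO timestamp format are assumptions, not CIEMAT's actual schema.

```python
from datetime import datetime

# Invented job records. Field names and timestamp format are assumptions,
# not the actual layout of the Euler logs.
SAMPLE_JOBS = [
    {"job_id": "1", "start": "2010-03-01T08:00:00",
     "end": "2010-03-01T09:30:00", "max_memory_mb": 2048},
    {"job_id": "2", "start": "2010-03-01T10:00:00",
     "end": "2010-03-01T10:15:00", "max_memory_mb": 512},
    {"job_id": "3", "start": "2010-03-01T23:00:00",
     "end": "2010-03-02T01:00:00", "max_memory_mb": 8192},
]


def summarize(jobs):
    """Return job count, total runtime in hours, and peak memory in MB."""
    runtimes = []
    for job in jobs:
        start = datetime.fromisoformat(job["start"])
        end = datetime.fromisoformat(job["end"])
        runtimes.append((end - start).total_seconds())
    return {
        "jobs": len(jobs),
        "total_hours": sum(runtimes) / 3600,
        "peak_memory_mb": max(job["max_memory_mb"] for job in jobs),
    }


summary = summarize(SAMPLE_JOBS)
print(summary)
```

With more than 9 million jobs, the real files would likely call for streaming or a dataframe library rather than an in-memory list; this only shows the shape of the calculation.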