Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.11.11 edition

Child detention, USPS performance, transit costs, the early Islamic world, and the opera.

Child detention. The Marshall Project has obtained and published official data from US Customs and Border Protection listing the 580,000+ times the agency has detained migrant children since early 2017. For each detention, the dataset includes the date and time the child entered and left CBP custody, as well as the child’s age, gender, and citizenship. Related: The Marshall Project’s report on the data.
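
Because each record carries both an entry and an exit timestamp, time in custody can be derived by subtraction. Here is a minimal sketch, assuming the timestamps arrive as strings like "2019-05-01 08:00"; that format and the function name `hours_in_custody` are hypothetical, not taken from the published files, and the sample values are placeholders rather than real records.

```python
from datetime import datetime

# Assumed timestamp layout; the published data may use a different one.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M"


def hours_in_custody(entered: str, left: str, fmt: str = ASSUMED_FORMAT) -> float:
    """Return the hours elapsed between entering and leaving custody."""
    start = datetime.strptime(entered, fmt)
    end = datetime.strptime(left, fmt)
    if end < start:
        # Guard against swapped or mis-entered timestamps.
        raise ValueError("exit time precedes entry time")
    return (end - start).total_seconds() / 3600


# Placeholder values for illustration only.
example_hours = hours_in_custody("2019-05-01 08:00", "2019-05-04 08:00")
```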

USPS performance. As part of Jones v. United States Postal Service, a federal lawsuit filed in August, USPS must submit weekly performance reports that indicate, at a national and district level, the percentage of mail that was processed (though not necessarily delivered) on time. The agency files these reports as PDFs; Save the Post Office, a decade-old website run by a retired English professor, has been collecting those PDFs and converting them into spreadsheets. Related: Aaron Gordon’s pre-election analysis of the USPS data, from Gordon’s (limited-run) newsletter about the postal service.

Transit costs. “Why do transit-infrastructure projects in New York cost 20 times more on a per kilometer basis than in Seoul?” With the aim of answering questions like these, the NYU-based Transit Costs Project is building a dataset that already spans more than 500 urban rail projects around the world. For each project, the dataset specifies the city, start year, end year, rail length, number of stations, total cost, and more.
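
The per-kilometer comparison in that question comes down to dividing total cost by rail length for each project and then taking a ratio. A minimal sketch, assuming one record per project: the `RailProject` class, its field names, and the cost unit are hypothetical rather than the dataset's actual column names, and the sample figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RailProject:
    city: str
    length_km: float
    total_cost: float  # hypothetical unit, e.g. millions of US dollars


def cost_per_km(project: RailProject) -> float:
    """Total cost divided by rail length."""
    if project.length_km <= 0:
        raise ValueError("rail length must be positive")
    return project.total_cost / project.length_km


def cost_ratio(a: RailProject, b: RailProject) -> float:
    """How many times more project a costs per kilometer than project b."""
    return cost_per_km(a) / cost_per_km(b)


# Made-up figures, not real project costs.
city_a = RailProject("City A", length_km=10, total_cost=20_000)
city_b = RailProject("City B", length_km=40, total_cost=4_000)
ratio = cost_ratio(city_a, city_b)
```

A fair comparison across countries and decades would also need currency conversion and inflation adjustment, which this sketch leaves out.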

The early Islamic world. The al-Ṯurayyā Project features an interactive map of the early Islamic world, with 2,000 named locations — from Damascus to Baghdad and beyond — and historical routes between them. The underlying dataset includes geocoordinates, Arabic spellings, transliterations, primary sources, and other details. [h/t Jajwalya Karajgikar]

The opera. Operabase has gathered information about more than 500,000 opera performances staged since 1996. The website doesn’t provide direct downloads, but you can access a dataset on six full seasons of stagings, covering thousands of runs in hundreds of cities, thanks to a “data donation” to support Alexander N. Cuntz’s study of how copyright affects performance frequency.