Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.10.07 edition

Polling places, Xinjiang internment camps, historical newspaper imagery, a millennium of UK macroeconomic data, and a lifetime of first kisses.

Polling places. As part of a new investigative series, the Center for Public Integrity and Stateline have published a dataset of polling places during the 2012–18 US general elections for 30 states (and plan to add more states “in the coming weeks”). To assemble the dataset, reporters filed 1,200 records requests, and then converted the disparate files they received into standardized CSVs listing each polling place’s county, precinct, name, and address.

Xinjiang internment camps. In August, BuzzFeed News published a two-part investigation into China’s “vast new infrastructure” for imprisoning Muslim minorities in its Xinjiang region. The reporters used a novel methodology to identify hundreds of detention facilities: examining the gaps in Baidu Maps’ satellite imagery. Last month, they published a dataset of those facilities’ coordinates and statuses. The Australian Strategic Policy Institute has also launched its Xinjiang Data Project, identifying more than 380 detention facilities as well as the destruction of religious/cultural sites in the region. The project (which builds on previous research by the institute, BuzzFeed News, and others) classifies the detention sites into four tiers, from “low-security re-education facilities” to “suspected maximum-security prisons.” [h/t William Yang]

Historical newspaper imagery. The Library of Congress’ Newspaper Navigator dataset extracts “visual content” from more than 16 million pages of newspapers from 1789 to 1963, drawn from the library’s Chronicling America project (DIP 2017.08.16). To compile the dataset, its creators used machine learning to detect seven types of visuals: photos, illustrations, maps, comics, editorial cartoons, headlines, and ads. They also built an interactive search tool. [h/t Jessamyn West]

Even older UK economic figures. The Bank of England has considerably expanded its longitudinal dataset on the UK’s economy (DIP 2017.01.25), renaming it “a millennium of macroeconomic data.” A few indicators (such as GDP per capita) now stretch back to 1086, thanks to the Domesday Book, while several others (such as consumer price inflation) now extend to the 13th century. [h/t Alex Albright]

A lifetime of first kisses. The Kiss List explores artist Galen Beebe’s 48 first kisses. Reconstructed from “memories and journals,” and developed with her partner John West, the dataset and visualization present “a set of facts that show who, what, where, when, and why I kissed how I did.”