Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.04.29 edition

Parliamentary speeches, COVID-19 preprints, mobility indicators, US retail, and poems by kids.

Six million parliamentary speeches. ParlSpeech V2 contains 6.3 million parliamentary speeches from nine countries: Austria, the Czech Republic, Germany, Denmark, the Netherlands, New Zealand, Spain, Sweden, and the United Kingdom. The dataset, created by political scientists Christian Rauh and Jan Schwalbach, includes the full text of each speech, plus the date, speaker, and the speaker’s party. Related: Roll call votes from the European Parliament’s first six terms (1979–2009). [h/t Robert Stelzle]

COVID-19 preprints. Preprints — academic papers published online before they’ve gone through traditional peer review — have become a common way for scientists to disseminate their coronavirus-related findings. So researchers Nicholas Fraser and Bianca Kramer have begun compiling a dataset of more than 6,000 COVID-19 preprints. For each paper, the dataset includes the title, abstract, DOI, date posted, and the hosting repository (such as medRxiv, the most common so far).

Mobility indicators. Tech companies have repurposed some of the data they collect from you into explorable, downloadable datasets that estimate the degree to which movement patterns have (or haven’t) changed in recent months. Among them: Apple, which is quantifying requests for directions; Google, which is counting visits to places such as grocery stores and transit stations; and Descartes Labs, which is tracking smartphone movements. Related: Sociologist Kieran Healy recently found and explained a curious February 17 spike in Apple’s data. [h/t Hillary Hartley]

US retail. Since the 1950s, the US Census Bureau has conducted monthly surveys of retail and food-services industries. The results — which estimate sales and inventory numbers by subsector — are available as machine-readable data going back to 1992. The next release is scheduled for May 15. [h/t Giuseppe Sollazzo]

Poems by kids. PoKi is “a corpus of 61,330 poems written by children from grades 1 to 12,” scraped with permission from a Scholastic website. The dataset includes each poem’s title, text, and character count, plus the author’s first name and grade. Noteworthy: “PoKi is made freely available for research with the condition that the research be used for the benefit of children.”