Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.07.06 edition

Banned and challenged books, mass expulsions, European air traffic, Shakespeare, and Saturday Night Live.

Banned and challenged books. A recent report from PEN America identified 1,500+ decisions, made between July 2021 and March 2022, to ban books from classrooms and school libraries. A spreadsheet accompanying the report lists each decision’s date, type, state, and school district, as well as each banned book’s title, authors, illustrators, and translators. Related: Independent researcher Tasslyn Magnusson, in partnership with EveryLibrary, maintains a spreadsheet of both book bans and book challenges, with 3,000+ entries since the 2021–22 school year. [h/t Gary Price]

Mass expulsions. Political scientist Meghan M. Garrity’s Government-Sponsored Mass Expulsion dataset focuses on “policies in which governments systematically remove ethnic, racial, religious or national groups, en masse.” Using a combination of archival research and secondary sources, Garrity documents 139 such events, estimated to have expelled more than 30 million people between 1900 and 2020. For each expulsion, the dataset provides “information on the expelling country, onset, duration, region, scale, category of persons expelled, and frequency.” To download it, visit the Journal of Peace Research’s replication data portal and search for “mass expulsion.”

European air traffic. Eurocontrol, the main organization coordinating Europe’s air traffic management, publishes an “aviation intelligence portal” with a range of industry metrics, including traffic reports that count the daily number of flights by country, by airport, and by operator. The portal also offers bulk datasets on topics such as airport traffic, flight efficiency, and estimated CO2 emissions. [h/t Giuseppe Sollazzo]

Shakespeare. The Folger Shakespeare “brings you the complete works of the world’s greatest playwright, edited for modern readers.” Its digital editions of the Bard’s plays and poems are available to read online and to download in various file formats. It also provides an API, with endpoints for synopses, roles, monologues, word frequencies, and more. [h/t Cameron Armstrong]

Saturday Night Live. Joel Navaroli’s SNL Archives aims to catalog and cross-reference every episode, cast member, host, character, sketch, impression, and other aspects of Saturday Night Live’s 47-and-counting seasons. An open-source project by Hendrik Hilleckes and Colin Morris scrapes much of that information into structured data files. As seen in: Morris’s 2017 analysis of gender representation in SNL sketches.