Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2025.05.21 edition

Unemployment insurance, FiveThirtyEight’s public data, corporate contracts, positions of power in Italy, and state formation/dissolution.

Unemployment insurance. The US Department of Labor’s Office of Unemployment Insurance publishes dozens of datasets collected through its coordination of state-administered programs. These include figures, reported regularly by each state, on claimant demographics, denials of eligibility, appeal caseloads, overpayments, disaster unemployment assistance, and much more. For example: Data from ETA 9050 reports, available monthly and going back to the late 1990s, indicate how many of each state’s claimants received their first payments within one week, two weeks, et cetera. The office also provides a chartbook and various report-generating tools.

FiveThirtyEight’s public data. ABC News shut down FiveThirtyEight earlier this year. Datasets built and collected by the publication were featured often in this newsletter. (For part of 2022, FiveThirtyEight also paid to republish Data Is Plural.) Its still-available data repository contains 160+ subdirectories, covering sports predictions, presidential cabinet turnover, media mentions, surveys on comma usage and steak preferences, the Bechdel Test, competitive Scrabble, and much more. [h/t Jan Willem Tulp]

Corporate contracts. Peter Adelson and Julian Nyarko’s Material Contracts Corpus contains “over one million contracts filed by public companies with the U.S. Securities and Exchange Commission (SEC) between 2000 and 2023,” which the authors collected from the SEC’s EDGAR filings database. In addition to the text of the contracts, the dataset provides metadata — including party names and contract types — extracted from the documents using machine-learning techniques. The dataset is available to download in bulk and search online. [h/t Alice Kalinowski]

Who runs Italy? Sesso è Potere, an ongoing project from info.nodes and onData, examines gender representation in positions of power in Italy. The 2025 report draws on individual-level datasets of leaders in politics (national lawmakers, ambassadors, regional legislators, mayors, city council members, etc.), business, media, higher education, and other fields. Read more: The project’s 2023, 2022, and 2021 reports. [h/t Liberiamoli Tutti]

Long- and short-lived states. Marten Scheffer et al.’s Mortality of States Index “documents commonly agreed state formation and end dates for over 440 different states, covering approximately 5,000 years from 3100 BCE (Egyptian Dynasties I and II) to 2021.” The data — built from sources including Seshat, the Correlates of War Project, and The Encyclopedia of Empire — “cover a broad set of entities ranging from persistent empires to fleeting polities such as the Maukhari dynasty of Northern India or multiple Khaganates that lasted under a century.”
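
One wrinkle when working with formation and end dates that straddle the BCE/CE boundary: the historical calendar has no year 0, so plain subtraction overcounts by one year. The sketch below shows one way to compute a lifespan, assuming years are stored as signed integers with negative values meaning BCE (for example, -3100 for 3100 BCE). That encoding and the function names are assumptions for illustration, not the dataset's documented schema.

```python
def to_astronomical(year: int) -> int:
    """Convert a historical year (negative = BCE, no year 0) to
    astronomical numbering, where 1 BCE is year 0, 2 BCE is -1, etc."""
    if year == 0:
        raise ValueError("the historical calendar has no year 0")
    return year + 1 if year < 0 else year


def lifespan_years(start: int, end: int) -> int:
    """Number of years between two historical years (hypothetical helper)."""
    if to_astronomical(end) < to_astronomical(start):
        raise ValueError("end year precedes start year")
    return to_astronomical(end) - to_astronomical(start)


# The full span the index says it covers: 3100 BCE to 2021 CE.
coverage = lifespan_years(-3100, 2021)
```

Under these assumptions, `coverage` works out to 5,120 years, in line with the "approximately 5,000 years" the authors describe. If the actual files encode BCE years differently, the conversion step would need to change accordingly.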