Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.05.22 edition

Federal court dockets, Amazon purchases, Security Council resolutions, satellite-aided rescues, and protected wines.

Federal court dockets. Journalist Matt Clark has compiled a database of more than 350 million docket entries across more than 13 million cases in 180+ federal courts — including the majority of district, appellate, and bankruptcy courts. The records, which Clark collected through the RSS feeds that many federal courts provide, span 2013 to the near-present. Clark’s downloadable database provides information about each docket entry (time filed, entry number, description, and URL), case (number, name, type, and URL), and court. Although the database does not include the docketed documents themselves, they can be retrieved via PACER and the free RECAP archive, among other sources.

Amazon purchases. The MIT Media Lab’s Alex Berke et al. have compiled “a first-of-its-kind dataset containing detailed purchase histories from 5027 U.S. consumers, spanning 2018 through 2022, with more than 1.8 million purchases […] crowdsourced through an online survey and shared with participants’ informed consent.” The published data include “order date, product code, title, price, quantity, and shipping address state,” and are “linked to survey data with information about participants’ demographics, lifestyle, and health.” The researchers found that a stratified subsample of the data demonstrated “expected seasonal trends and strong relationships to other public datasets.” [h/t Data Science Community Newsletter]

Security Council resolutions. Seán Fobbe et al.’s Corpus of Resolutions: UN Security Council “collects and presents for the first time in human and machine-readable form all resolutions, drafts, and meeting records of the UN Security Council, including detailed metadata, as published by the UN Digital Library and revised by the authors.” It covers all 2,700+ resolutions from the council’s founding in 1946 through early 2024. In addition to providing the texts in all six official UN languages, the dataset includes each resolution’s title, date, council votes, related meeting number, meeting transcript, keywords, countries of focus, and more. An auxiliary dataset represents the corpus’s internal citations as a directed graph. [h/t Sharon Machlis]

Satellite-aided rescues. NOAA’s Search and Rescue Satellite-Aided Tracking (SARSAT) program is part of an international collaboration to locate distress beacons activated (manually or automatically) by mariners, aviators, and wilderness explorers. The agency publishes annual maps of SARSAT-enabled rescues, along with data for the most recent year-plus. NOAA has also provided Data Is Plural with data for 2016–2022. The maps and data files contain each rescue’s date, category, description, beacon type, coordinates, and number of people saved. [h/t Dan Brady]

France and Italy’s protected wines. Sebastian Candiago et al. have assembled a dataset of 5,400+ Italian and French wines granted Protected Designation of Origin status, restricting their production to specific geographies and methods. For each wine, the dataset lists its name, country, designated area, color, category, grape varieties used, maximum allowed yields, registration date, and more. Previously: Protected European ham and the EU’s register of protected indications (DIP 2023.05.24).