Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.07.10 edition

Federal inmate complaints, corporate AI activity, East Asia building outlines, US coral reefs, and ancient shipwrecks.

Federal inmate complaints. The Federal Bureau of Prisons’ Administrative Remedy Program “allow[s] an inmate to seek formal review of an issue relating to any aspect of his/her own confinement.” In October 2022, the Data Liberation Project (which, disclosure, I run) filed a FOIA request seeking records from the agency database that tracks those complaints. In response, BOP last month provided data on 1.78 million complaint and appeal submissions filed from January 2000 through late May 2024, spanning nearly 1 million distinct cases. The records, published yesterday with the help of volunteers, indicate when each filing was received, its relevant case number, complaint subject, facility where the issue occurred, case status, status update date, reasons for rejection/closure, and other details.

Corporate AI activity. The Private-Sector AI Indicators dataset, from Georgetown University’s Emerging Technology Observatory, provides “a diverse range of indicators of AI-related activity for hundreds of companies worldwide, from startups to multinationals.” For each of the 670+ companies included, the dataset reports the number of AI-related research articles published by its employees (disaggregated by topic), the number of AI-related patents filed (by application area and use case), and the size of its workforce (overall and the portion estimated to be AI-involved). It also lists each company’s main location, sector, growth stage, and description, as well as aliases, stock listings, and identifiers in several external data sources. See also: An interactive version of the database. [h/t Zach Arnold]

East Asia building outlines. Recent years have seen the development of ambitious datasets that provide the outlines of buildings by the millions. For instance, from the archives: buildings in the US (DIP 2018.07.18), Africa (DIP 2021.08.25), and Canada and New Zealand (DIP 2019.09.25). In the Journal of Remote Sensing, however, Qian Shi et al. note a relative lack of such data for buildings in East Asia, which the authors attribute “to the more complex distribution of buildings and the scarcity of auxiliary data.” As an antidote, they’ve generated a dataset that outlines more than 280 million buildings in 2,800+ cities across China, Japan, South Korea, North Korea, and Mongolia.

US coral reefs. The National Coral Reef Monitoring Program collects scientific and socioeconomic/attitude survey data related to the coral reefs offshore of the continental United States, Hawaii, and US territories. It provides the data through public visualizations as well as download tools and raw files. The scientific data include species-level coral cover, colony density, bleaching prevalence, and disease rates; fish populations; water alkalinity and dissolved inorganic carbon levels; and more. [h/t Gary Price]

Ancient shipwrecks. The Summary Geodatabase of Shipwrecks 1500BCE-1500CE merges two catalogs of ancient wrecks: one from the Oxford Roman Economy Project and one from Harvard’s Mapping Past Societies project. Building on scholarly research by Toby Parker, Julia Strauss, and others, the combined effort includes 1,900+ known wrecks, listing (where known) their coordinates, depth, estimated time period when wrecked, year discovered, cargo, and more.