Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.04.26 edition

Drinking water violations, childcare prices, kinship terms, art history allocations, and “the oldest experimental crop field in America.”

Drinking water violations. Through the Safe Drinking Water Act, the US Environmental Protection Agency sets baseline standards for the country’s ~150,000 public water systems. Although enforcement is mostly delegated to the states and territories, all monitoring, violation, and enforcement data is reported to the EPA and stored in its Safe Drinking Water Information System. The agency provides bulk downloads of the data, going back decades, plus a search tool and dashboard. As seen in: “Which cities have health issues with their drinking water?” (USAFacts). Related: Sara Hughes et al.’s Municipal Drinking Water Database, which connects information about 2,000+ municipal water systems to local demographic, government, climate, and political indicators. [h/t Greg Pierce]

Childcare prices. The National Database of Childcare Prices, launched in January by the Department of Labor’s Women’s Bureau, “is the most comprehensive federal source of childcare prices at the county level.” For each county and year from 2008 to 2018, the dataset provides estimates of the median and 75th-percentile weekly cost, disaggregated by provider type and child age. The estimates are calculated from the market surveys the federal Child Care and Development Fund requires participating states to conduct. [h/t Erik Gahner Larsen]
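For readers who want a feel for the shape of such a table, here is a minimal Python sketch of grouping prices by county, year, provider type, and child age, then taking the median and 75th percentile of each group. The rows, the field names, and the `summarize` helper are all hypothetical and made up for illustration. This is not the Women's Bureau's estimation method, which the blurb above does not describe.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical survey rows, invented for illustration (not NDCP data):
# (county, year, provider_type, child_age, weekly_price_usd)
rows = [
    ("County A", 2018, "center", "infant", 100.0),
    ("County A", 2018, "center", "infant", 150.0),
    ("County A", 2018, "center", "infant", 200.0),
    ("County A", 2018, "center", "infant", 250.0),
    ("County A", 2018, "center", "infant", 300.0),
    ("County A", 2018, "home", "infant", 120.0),
    ("County A", 2018, "home", "infant", 140.0),
    ("County A", 2018, "home", "infant", 180.0),
]


def summarize(rows):
    """Group prices by (county, year, provider, age); return median and p75."""
    groups = defaultdict(list)
    for county, year, provider, age, price in rows:
        groups[(county, year, provider, age)].append(price)

    summary = {}
    for key, prices in groups.items():
        # quantiles(n=4) returns the three quartile cut points;
        # each group needs at least two prices for this call.
        _, median, p75 = quantiles(prices, n=4, method="inclusive")
        summary[key] = {"median": median, "p75": p75}
    return summary


summary = summarize(rows)
for key, stats in summary.items():
    print(key, stats)
```

The "inclusive" method treats each group's prices as the full population instead of a sample; a different quantile convention would give slightly different 75th-percentile values for small groups.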

Kinship terms. Kinbank is a browsable and downloadable database of family-tree nomenclature for 1,000+ spoken languages, across 100+ types of relationships. Examples include ahätatum (Akkadian for one’s younger sister), yerudê (Galibi Carib for one’s husband’s brother’s wife), and ɗan’ùbā (Hausa for one’s paternal half-brother). As the project’s team describes in a recent paper, they’ve collected these “kinship terminologies” mostly from secondary sources, which “ranged from ethnographies and grammars, to simpler descriptions like wordlists”; those sources “are primarily in Roman script […] and can contain transcription inconsistencies across languages.”

Art history allocations. For her undergraduate thesis, “Quantifying Art Historical Narratives,” Holland Stam measured the amount of space (in text and in images) devoted to each artwork and artist in 25 editions of two major art history textbooks: Gardner’s Art Through the Ages and Janson’s History of Art. Stam’s thesis repository includes the measurement data, which also indicates each artist’s nationality, gender, race, and ethnicity. Related: An R package for the data. As seen in: “Resampling to understand gender in #TidyTuesday art history data,” a post and screencast by Julia Silge.

The Morrow Plots. The University of Illinois’ Morrow Plots, established in 1876, “are the oldest experimental crop field in America and the second oldest in the world.” An interdisciplinary team has compiled, cleaned, and standardized the experiment’s archival records (such as this notebook). For each year and plot, their dataset lists the crop, date of planting, treatment plan, amount of various substances applied, yield per acre, and more.