Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.04.28 edition

Water access points, foreign labor requests, congressional scandals, global mail, and the Tour de France.

Water access points. Launched in 2015, the Water Point Data Exchange today describes 577,000+ specific water access points: boreholes, hand-dug wells, protected springs, rainwater harvest tanks, and more. The platform gathers information from governments and their partners in 50+ countries, mostly in Africa, but also with 10,000+ data points each in Afghanistan, Bangladesh, Haiti, and India. The records indicate the coordinates of each access point, the date checked, water availability when checked, the water source and/or transport system, and other details. [h/t Katy Sill and Adam Kariv]

Foreign labor requests. To hire foreign workers through the federal government’s H-1B, H-2A, and H-2B programs, US employers need permission from the Department of Labor. The agency collects and publishes data on each “certification” request, detailing the employer (name, location, industry, etc.), job position (title, pay, etc.), and approval status. The datasets go back more than a decade for each program and receive quarterly updates, with the most recent posted last week. Related: At BuzzFeed News, we used the data throughout our 2015/16 series investigating the H-2 program, and maintain a dataset that standardizes key fields from the raw files. [h/t George Ho]

Congressional scandals. In 2018, Michael G. Miller and Brian T. Hamel published a study examining how voters and donors responded to scandals embroiling members of the US House between 1980 and 2010, building on the work of Scott Basinger and others. They have since expanded the dataset, which now covers both the House and the Senate and extends through 2018, providing information on 316 legislator-scandal combinations (categorized as financial, sexual, political, or “other”) and their outcomes.

Global mail. For more than a century, the Universal Postal Union has collected and published statistics about the world’s postal systems. Online, you can query and export country-level data — the number of letter-boxes and permanent post offices, operating revenue, total staff, and much more — going back to 1980. Related: Jon C. Rogowski et al. have used historical UPU reports to count the number of post offices per country between 1875 and 2007. Previously: US post office locations, 1639–2000 (DIP 2021.04.07).

Le Tour. On its official website, the Tour de France lists riders’ results in its famed bicycle race since 1903. The site doesn’t provide downloads, but applied mathematician Thomas Camminady has scraped it to build a CSV file containing each finisher’s rank, time, team, and more.