Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.08.09 edition

Mapping material, sweeteners, Federal Reserve communications, English families, and winning numbers.

Much mapping material. The Overture Maps Foundation has released its first datasets, which include 59 million “points of interest” (landmarks, businesses, parks, etc.), 785 million building outlines, road network data, and administrative boundaries. The initiative, which is steered by several giant tech companies, “could help third-party developers use maps that don’t rely on Google and Apple,” The Verge’s Emma Roth writes. The datasets draw on a range of sources, including the project’s member-companies, OpenStreetMap, and USGS’s 3D Elevation Program. Read more: “Exploring the Overture Maps places data using DuckDB, sqlite-utils and Datasette,” by Simon Willison, who considers the data release “a really big deal.” [h/t Avi Levin]

Sweeteners. The US Department of Agriculture’s Sugar and Sweeteners Yearbook Tables provide “summary statistics on sugar, sugarbeets, sugarcane, corn sweeteners (dextrose, glucose, and high-fructose corn syrup), and honey.” Compiled by the agency’s Economic Research Service from a range of national, international, and industry sources, the statistics are provided as regularly updated spreadsheets, many of which go back multiple decades. They estimate global and country-level production, supply, distribution, and prices, as well as US imports and consumption. [h/t Sam Larson]

Federal Reserve communications. Agam Shah et al. have compiled a corpus of key communications by the US Federal Reserve’s Federal Open Market Committee, which “controls the three tools of monetary policy — open market operations, the discount rate, and reserve requirements.” Gathered from the Fed’s website, the corpus includes all meeting minutes and speeches from 1996 to mid-October 2022, and all press conferences from April 2011 to mid-October 2022. The published records include the raw text and metadata of each communication, as well as datasets filtered to key sentences. Previously: Federal Reserve Bank directors (DIP 2021.05.05) and Fed forecasts (DIP 2018.02.07).

English families. The Families of England project, led by economic historians Gregory Clark and Neil Cummins, aims “to reconstruct the economic and social position, and the demography, of a representative set of English families” over time. A recent paper by Clark includes a public version of the dataset, which “details the family connections of 422,374 people with rarer surnames in England for births from 1600 to 2022.” The dataset, based in part on genealogies from the Guild of One-Name Studies, indicates (where available) each person’s years of birth, marriage, and death, plus indicators of literacy, sex, occupational status, and more. [h/t Derek M. Jones]

Winning numbers. New York State’s Gaming Commission publishes various lottery-related datasets, including the winning numbers for many national and state lotteries, such as Powerball (since 2010), Mega Millions (since 2002), and Pick 10 (since 1987). New York isn’t alone; the Colorado Lottery, for instance, also publishes downloadable drawing histories. Colorado’s Powerball results go back to August 2001 and include the jackpot values, which are unavailable from New York. As seen in: “The jackpot is a lie,” by Zach Seward.