2025.08.27 edition
Congressional campaign policy platforms. CampaignView, a project from Rachel Porter et al., provides 43,000+ policy-platform points and 5,000+ biographical narratives of candidates for the US House of Representatives between 2018 and 2022. The information was collected from campaign websites “in real time a week before each state’s primary election,” cleaned, standardized, and categorized. For each platform point, the dataset and searchable interface indicate the candidate, the text of the policy point, its subheading on the campaign website, and a human-labeled policy topic code.
Billboard hits. For his upcoming book, Uncharted Territory: What Numbers Tell Us about the Biggest Hit Songs and Ourselves, Chris Dalla Riva has compiled a dataset of all 1,100+ Billboard number-one hits from 1958 to early 2025. For each song, the dataset includes information about the artists, songwriters, producers, and label; genre, time signature, keys, BPM, the presence of various instruments; song structure and lyrics; whether the song was entered into Eurovision; and much more. Read more: Chris’s music-through-data newsletter, Can’t Get Much Higher.
EU antitrust cases. The European Union’s Directorate-General for Competition has begun publishing case datasets, providing information about decades of antitrust and cartel cases, merger cases, Digital Markets Act cases, subsidy cases, and more. The datasets generally provide the case title, companies involved, key dates, decision, and attachments, among other variables.
Texas oil and gas. The Railroad Commission of Texas, the state’s “agency with primary regulatory jurisdiction over the oil and natural gas industry,” publishes datasets of wells and pipelines, annual production by oil/gas field, permitted production, underground injection monitoring, and more. The commission also provides query interfaces to many of its datasets. As seen in: “The surprising transparency hidden inside the US oil and gas machine” (Data Desk). [h/t Amy DiPierro]
Wines and wine ratings. Rogério Xavier de Azambuja et al.’s X-Wines dataset describes 100,000+ wines produced in 60+ countries. The dataset — collected and standardized from “wine-specialized websites,” winery websites, and other sources — indicates each wine’s name, type, grape varieties, alcohol level, acidity level, country, region, winery, and vintages. It also includes 21 million ratings (on a 1-to-5 scale) of those wines by 1 million anonymized users. Previously: Databases from the Wine Economics Research Centre (DIP 2022.10.12) and France and Italy’s protected wines (DIP 2024.05.22). [h/t Soph Warnes + Charlie Murphy]