Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.10.20 edition

Judicial financial disclosures, wildlife death and illness reports, a decade of news articles, more parliamentary speech, and Halloween candy.

Judicial financial disclosures. US federal judges must file annual reports disclosing their investments, external income, and other potential conflicts of interest. The filings are technically available to the public, but onerous to access. So the Free Law Project undertook an effort to obtain and parse as many of them as possible, ultimately creating a database of 250,000+ pages of disclosures, including “complete coverage of every judge, justice, and magistrate between 2011 and 2018.” You can search it online or via an API. Read more: “131 Federal Judges Broke the Law by Hearing Cases Where They Had a Financial Interest,” the first article in an ongoing series by the WSJ, which had early access to the database. [h/t Tom Folkes]

Wildlife death and illness reports. WHISPers, hosted by the USGS National Wildlife Health Center, is a “repository for sharing basic information about historic and ongoing wildlife mortality (death) and/or morbidity (illness) events,” with contributions from “hundreds of natural resource managers and stakeholders across the U.S. and beyond.” Wisconsin officials, for instance, reported that a bald eagle died of lead poisoning in Adams County this April. You can search WHISPers by date range, county, species, diagnosis, and more — up to 500 events at a time, exportable as CSV files. [h/t Terra R. Kelly et al.]

A decade of news articles. For their analysis of investigative publishing trends, Eray Turkel et al. gathered nearly 6 million articles published by 50 outlets (mostly local newspapers) in the 2010s, drawn from the pay-to-access NewsBank service. The study’s public dataset includes each article’s title, date, byline, and word and sentence counts, plus various linguistic metrics calculated by the researchers. Related: To examine the online news economy, the Stanford-based team is seeking volunteers willing to have some of their web browsing tracked. [h/t Shosh Vasserman]

More parliamentary speech. PhD candidate Daniel Braby’s parlCymru and parlScot provide the text of 5 and 20+ years of spoken contributions to the Welsh and Scottish parliaments, respectively, plus speaker and circumstance metadata. German publication Dekoder’s Daniel Marcus has gathered 385,000+ transcripts from 25+ years of speech in Russia’s State Duma, powering an interactive chart of word frequencies. Previously: Spoken contributions to nine other parliaments (DIP 2020.04.29). [h/t Fabrice Deprez]

Halloween candy. For FiveThirtyEight’s “Ultimate Halloween Candy Power Ranking” (2017), Walt Hickey had readers vote on head-to-head matchups among 85 confections. The project’s dataset includes each candy’s winning percentage, various categorizations (e.g., Does it contain chocolate?), relative cost, and sugariness. [h/t Eric Gardner]