Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.12.14 edition

Flu trends, university enrollments, Jewish texts, women’s college basketball rosters, and Mastodon membership.

Flu trends. The CDC’s Influenza Division collaborates with state and local health departments, hospitals, laboratories, and other partners to keep tabs on flu trends. Its Weekly U.S. Influenza Surveillance Report tracks case counts, positivity rates, strain distribution, and other metrics that you can explore and download through an interactive dashboard. The records go back to the 1997–98 flu season and are available at the national, regional, and state levels. Read more: “The US has never recorded this many positive flu tests in one week” (Vox). [h/t Jay Arthur]

University enrollments. Elizabeth Buckner’s Global Longitudinal University Enrollment Dataset “compiles and estimates institution-level enrollment data on universities worldwide from 1950 to 2020” at five-year intervals — 17,000+ institutions in all, across 180+ countries. In addition to enrollment figures, the dataset “includes a number of other useful variables on institutional characteristics, merged from various sources, including sector (i.e., public/private), founding year, and whether the institution is PhD granting or not.” It also forms the basis of an accompanying, country-level dataset.

Jewish texts. Sefaria, a nonprofit co-founded a decade ago by author Joshua Foer and engineer Brett Lockspeiser, is “assembling a free living library of Jewish texts and their interconnections, in Hebrew and in translation.” Those texts include the Torah itself, plus rabbinic scholarship, legal works, prayer books, historical dictionaries, and more. In all, the project contains more than 300 million words and has generated 3 million intertextual links between them. The initiative provides its data via an API and bulk download, and its code is open-source. Read more: “The quest to put the Talmud online” (Washington Post, 2018). [h/t Avi Levin]

Women’s college basketball rosters. Students in Derek Willis’s “Sports Data Analysis & Visualization” course at the University of Maryland’s journalism school have assembled data on 13,000+ players on women’s college basketball teams, sourced from 900+ rosters for the 2022–23 NCAA season. Their main dataset lists each player’s name, team, position, jersey number, height, year, hometown, high school, and more.

Mastodon membership. The open-source website instances.social tracks 16,000+ servers running Mastodon, the most prominent of the decentralized social networks seen as alternatives to Twitter. It collects each server’s domain, name, description, user count, status count, and more. Since late November, Simon Willison has been creating a longitudinal record of the site’s directory and charting the overall trend.
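
For readers curious what an "overall trend" from repeated directory snapshots might look like in practice, here is a minimal Python sketch. It is not Simon Willison's method or the instances.social schema: the row layout (snapshot date, domain, user count), the domains, and the numbers are all made up for illustration. It simply sums user counts across servers for each snapshot date.

```python
from collections import defaultdict

# Hypothetical snapshot rows: (snapshot_date, domain, user_count).
# The layout, domains, and counts are invented for illustration only;
# they are not taken from instances.social.
snapshots = [
    ("2022-11-20", "example.social", 1000),
    ("2022-11-20", "sample.town", 250),
    ("2022-11-27", "example.social", 1400),
    ("2022-11-27", "sample.town", 300),
    ("2022-11-27", "new.example", 50),
]


def total_users_by_date(rows):
    """Sum user counts across all servers for each snapshot date."""
    totals = defaultdict(int)
    for date, _domain, users in rows:
        totals[date] += users
    # ISO-formatted dates sort chronologically as plain strings.
    return dict(sorted(totals.items()))


trend = total_users_by_date(snapshots)
for date, users in trend.items():
    print(date, users)
```

A server that first appears in a later snapshot (like the hypothetical new.example) just adds to that date's total, so the same loop captures both growth on existing servers and the arrival of new ones.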