Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.12.20 edition

Dengue, workforce training providers, Antarctic ice sheets, finance ministers, and Netflix viewership.

Dengue. OpenDengue, a new project based at the London School of Hygiene & Tropical Medicine, “aims to build and maintain a database of dengue case counts for every dengue-affected country worldwide since 1990 or earlier.” (According to a 2013 study cited by the WHO, the virus causes an estimated 96 million symptomatic infections per year.) Drawing from a range of sources, the team has collected weekly, monthly, and/or annual national counts from 100+ countries so far and welcomes contributions. For each count, the project’s datasets indicate what was being counted: suspected cases, confirmed cases, probable cases, or a combination. For several dozen countries, the data also contain sub-national counts — including for 5,000+ municipalities in Brazil and all 70+ Thai provinces. [h/t Sarah Newey]

Workforce training providers. The Workforce Almanac, launched last month by the Harvard-based Project on Workforce, maps “almost 17,000 providers of workforce training, which we have defined as short-term (lasting less than two years), post-high school training opportunities in which learners gain work-relevant skills to help them find a job.” The dataset lists each organization’s name, address, city, state, coordinates, and various categorizations. To build it, the team merged and cleaned data from four sources, including the Department of Labor’s Registered Apprenticeship Partners Information Database System (RAPIDS), IRS nonprofit registrations, and the Integrated Postsecondary Education Data System (IPEDS).

Antarctic ice sheets. The Scientific Committee on Antarctic Research’s Bedmap Data Portal provides access to “ice bed, surface and thickness point data from all Antarctic geophysical campaigns since the 1950s.” In an accompanying paper, Alice C. Frémand et al. describe the process they undertook to standardize decades of ice-survey data and to publish the outputs based on “findable, accessible, interoperable, and reusable” (FAIR) data principles. In all, the records contain “82 million data points collected as part of 277 campaigns,” spanning three generations of data. Previously: Antarctic geology (DIP 2023.05.31).

Finance ministers. Brenna Armstrong et al. have compiled a dataset of 2,900+ people who served as a national finance minister, or in an equivalent position, between 1972 and 2017. For each of their 3,200+ tenures, the dataset lists the minister’s name, country, year-month started/ended, gender, whether the minister received an advanced economics education, and whether they’d be considered a technocrat, i.e., someone who had policy expertise but hadn’t held elective office. [h/t Phenomenal World]

Netflix viewership. In late 2021, Netflix began publishing downloadable datasets of the 10 most popular movies and TV shows each week, overall and by country. Last week the company published a new data report, which lists all 18,000+ titles viewed for 50,000+ hours on the platform in the first half of 2023. The report, which Netflix says it will publish twice a year, indicates each title’s name, release date, approximate hours viewed, and whether it was available globally. [h/t Avi Levin + Saul Pwanson]