Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.03.30 edition

WHO outbreak alerts, census tract urbanization, local digital news, research organizations, and teenagers on TV.

WHO outbreak alerts. The World Health Organization’s Disease Outbreak News provides “information on confirmed acute public health events or potential events of concern,” with reports issued when those events meet certain criteria. Colin J. Carlson et al. have assembled a dataset of all 2,700+ reports published between January 1996 and December 2019; it includes report metadata (date and headline, for example), as well as variables drawn from the descriptive text, such as the diseases discussed, countries affected, case counts, whether the report implicates mass gatherings, and more.

Census tract urbanization. Studies of US urbanization often examine census tracts, but those boundaries have changed over time and are relatively new in many parts of the country. To aid longitudinal analyses, Scott N. Markley et al. have taken 2010’s tract boundaries and used several techniques to estimate the number of housing units within them every decade from 1940 to 2010, plus 2019. The dataset also estimates each tract’s “urbanization year,” when it surpassed 200 units per square mile. Read more: An introductory Twitter thread. Related: The Longitudinal Tract Data Base, which also provides historical estimates of housing units (and other Census variables) matched to 2010 tracts, going back to 1970.
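For readers who want to see the "urbanization year" definition in concrete terms, here is a minimal sketch. It assumes you already have a tract's estimated housing units by year and its land area in square miles. The function name, the input layout, and the example numbers are hypothetical, not taken from the dataset, and the authors' own method may differ in its details.

```python
from typing import Dict, Optional

# Threshold quoted in the blurb above: 200 housing units per square mile.
URBAN_THRESHOLD = 200.0


def urbanization_year(units_by_year: Dict[int, float], area_sq_mi: float) -> Optional[int]:
    """Return the earliest year whose housing density exceeds the threshold.

    Returns None if the tract never surpasses the threshold in the given years.
    """
    if area_sq_mi <= 0:
        raise ValueError("area_sq_mi must be positive")
    for year in sorted(units_by_year):
        density = units_by_year[year] / area_sq_mi
        if density > URBAN_THRESHOLD:
            return year
    return None


# Made-up tract: 2 square miles, housing-unit estimates by decade.
example_units = {1940: 50, 1950: 120, 1960: 450, 1970: 900}
print(urbanization_year(example_units, 2.0))  # 1960 (450 / 2 = 225 units per sq. mile)
```

The sketch treats "surpassed" as strictly greater than 200 and only checks the years supplied, so it cannot place the crossing between two estimates.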

Local digital news. Project Oasis, which aims to “map and showcase the growing number of locally focused digital news publications in the U.S. and Canada,” has collected data on 700+ such organizations. Its downloadable, browsable dataset describes their ownership, tax status, years in operation, communities served, and other characteristics. A parallel effort is underway in Europe, with plans to publish a final report in early 2023. [h/t Hacks/Hackers]

Research organizations. The Research Organization Registry is “a community-led project to develop an open, sustainable, usable, and unique identifier for every research organization in the world.” Earlier this month, the project published its first “independent” release, expanding on data seeded from a prior initiative. It contains the names, locations, contact information, and other structured details for 100,000+ organizations.

Teenagers on TV. In search of “a better understanding of the age differences between teen characters in TV shows and the actors who portray them,” Amber Thomas last year manually compiled data on 240+ characters in 33 series that premiered between 2000 and 2021. For each character, Thomas’s dataset lists their name, fictional age/grade, gender, and love interests, as well as the actor’s name and birth date.
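As a rough illustration of the age comparison the dataset enables, the sketch below computes how much older an actor is than the character they play. It assumes a reference date, such as a series premiere, that you supply yourself; the function names and example values are hypothetical and do not come from Thomas's data.

```python
from datetime import date


def age_on(birth_date: date, on_date: date) -> int:
    """Return a person's age in whole years on a given date."""
    years = on_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_gap(character_age: int, actor_birth_date: date, reference_date: date) -> int:
    """Return the actor's age minus the character's age on the reference date."""
    return age_on(actor_birth_date, reference_date) - character_age


# Made-up example: a 16-year-old character played by an actor born in mid-1985,
# measured on a hypothetical premiere date in September 2007.
print(age_gap(16, date(1985, 6, 15), date(2007, 9, 19)))  # 6
```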