Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.03.31 edition

US coronavirus case details, high-resolution population densities, electoral gender quotas, EU merger decisions, and dried beans.

US coronavirus case details. Last week, the US CDC began publishing a new, geographically specific COVID-19 “case surveillance” dataset. Each of the 22 million rows represents a (de-identified) coronavirus case, accounting for roughly 70% of the country’s current official case count. The details include (where available) the person’s county and state of residence; age, sex, race, and ethnicity; “presence of any underlying medical conditions and risk behaviors”; and whether the person was hospitalized and/or died. Read more: Betsy Ladyzhets writes at the COVID-19 Data Dispatch, “After months of no state-by-state demographic data from the federal government, we now have county-by-county demographic data. This is a pretty big deal!”

High-resolution population densities. “Using a mixture of machine learning techniques, high-resolution satellite imagery, and population data,” researchers at Facebook and Columbia University have “mapped hundreds of millions of structures distributed across vast areas and then used that to extrapolate the local population density.” The project, which began half a decade ago, now provides population density datasets covering much of the world. (China, Russia, and Canada are among the notable countries missing.) Read more: Benjamin Schmidt’s exploratory Twitter thread. Previously: The Global Human Settlement Layer (DIP 2016.11.02).

Electoral gender quotas. Many countries apply gender quotas to their parliamentary elections, typically by reserving a certain number of seats for women or by regulating political parties’ candidate lists. In other instances, parties have instituted voluntary quotas. The Gender Quotas Database categorizes these rules for more than 120 nations and provides additional details through its country profile pages.

EU merger decisions. Pauline Affeldt et al. have compiled a quarter century of merger control decisions by the European Commission’s Directorate-General for Competition — decisions relating to 5,000+ cases and 31,000+ product/market combinations between 1990 and 2014. The dataset lists the target company, acquiring company, industry, product, outcome, decision date, and more. [h/t Anna Rita Bennato]

Dried beans. Researchers at Turkey’s Selçuk University have built a computer vision program to measure and classify images of dried beans. Their dataset includes 13,611 specimens across seven varieties; for each, it reports the bean’s perimeter, axis lengths, roundness, and more. [h/t Meredith Broussard]