Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.02.02 edition

Abortion facility distances, China’s tech giants abroad, House committee witnesses, immigrant populations in 1900, and borrowed words.

Abortion facility distances. Caitlin Knowles Myers, an economist with a focus on reproductive policy, has compiled a dataset that calculates, for every county in the contiguous US and every month between January 2009 and June 2021, the distance you would have to travel to the nearest abortion facility. Researchers can also request access to Myers’s underlying database of the facilities themselves. As described in her working paper, “Measuring the Burden: The Effect of Travel Distance on Abortions and Births,” Myers gathered the information from a range of sources, including state licensing databases, facility websites, and Planned Parenthood directories. As seen in: “Abortions could require 200-mile trips if Roe is overturned” (Axios). [h/t Rose Mintzer-Sweeney]

Chinese technology companies abroad. Mapping China’s Tech Giants, a project relaunched by the Australian Strategic Policy Institute in 2021, examines the overseas expansion of 27 major Chinese technology companies, from Alibaba to ZTE. The project’s dataset includes 3,900+ entries, each describing and locating an operation or connection abroad. The entries are grouped into a couple dozen categories, such as commercial partnerships, overseas offices, data centers, 5G relationships, training, and donations. [h/t Samantha Hoffman]

House committee witnesses. Political scientists Lauren C. Bell and J.D. Rackey have compiled a spreadsheet of 435,000+ people testifying before the US House of Representatives from 1971 to 2016. They began with a text file scraped from a ProQuest database, provided by the authors of a dataset that focused on social scientists’ testimony (DIP 2020.12.23). Then, they determined each witness’s first and last name; type of organization; the committee, date, title, and summary of the relevant hearing; and more. [h/t Derek Willis]

Immigrant populations in 1900. The 1900 US Census’s public report includes a table counting the foreign-born residents of each state and territory — overall and disaggregated into a few dozen origins, which range from subdivisions of countries (Poland is split into “Austrian,” “German,” “Russian,” and “unknown” columns) to entire continents (“Africa”). It’s officially available as a low-resolution PDF. Reporters at Stacker, however, recently transcribed it into a CSV file for easier use. [h/t Emilia Ruzicka]

Borrowed words. The World Loanword Database, a project from the Max Planck Institute for Evolutionary Anthropology, examines how languages have borrowed words from each other. For each of 41 historical and contemporary languages, it lists 1,000–2,500 words, along with experts’ judgments of whether each word was borrowed, from what language, and other etymological details. Previously: The World Atlas of Language Structures (DIP 2016.01.06). [h/t blopeur]