Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.07.08 edition

The Paycheck Protection Program, women in office, programmer surveys, UK land parcels, and Dungeons & Dragons dialogue.

The Paycheck Protection Program. On Monday, the US Treasury Department and Small Business Administration released detailed data on the financial assistance given to businesses through the government’s Paycheck Protection Program. For the 660,000+ loans of at least $150,000, the dataset includes each recipient’s name, address, industry classification, and business type, plus the lender’s name, the number of jobs the business said would be supported, the loan amount (grouped into several ranges), and more. For aid less than $150,000, the dataset contains similar information, but without names or addresses. Related: Efforts to make the data more accessible are already underway. Simon Willison, for instance, generated a searchable database of all loans of $150,000+, and The Washington Post has published an interactive database of the $1,000,000+ loans. Also related: There are some errors in the data. [Update, 2023-01-11: The page linked to “the dataset” no longer provides access; instead, see this page from the Small Business Administration.]
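
Because the $150,000+ loans report amounts as ranges rather than exact figures, a natural first pass is to count loans and reported jobs per range. The sketch below is only an illustration: the column names (`loan_range`, `business_name`, `lender`, `jobs_supported`), the range labels, and the sample rows are all hypothetical and do not reflect the actual headers or contents of the SBA files. It also assumes the jobs field can be blank.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: column names, range labels, and rows are
# made up for illustration and are NOT the SBA's actual headers or records.
SAMPLE_CSV = """\
loan_range,business_name,lender,jobs_supported
$150K-$350K,Example Bakery LLC,First Example Bank,12
$150K-$350K,Sample Dental PC,Second Example Bank,
$1M-$2M,Placeholder Logistics Inc,First Example Bank,140
"""


def summarize_by_range(csv_text):
    """Return {loan_range: (loan_count, total_reported_jobs)}."""
    loan_counts = Counter()
    job_totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        loan_range = row["loan_range"]
        loan_counts[loan_range] += 1
        # Assumption: the jobs field may be blank; blank values add nothing.
        jobs = row["jobs_supported"].strip()
        if jobs:
            job_totals[loan_range] += int(jobs)
    return {r: (loan_counts[r], job_totals[r]) for r in loan_counts}


if __name__ == "__main__":
    for loan_range, (loans, jobs) in summarize_by_range(SAMPLE_CSV).items():
        print(f"{loan_range}: {loans} loans, {jobs} jobs reported")
```

To try this against a real download, replace `SAMPLE_CSV` with the contents of the file and adjust the column names to match its actual header row.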

Women in office. Rutgers University’s Center for American Women and Politics has made public its Women Elected Officials Database, which “represents the most complete collection of information anywhere in the world about women elected officials in the United States.” It covers all “women who have held office at the congressional, statewide elected executive, and state legislative levels nationwide,” going back to the 1890s. For each of the 11,540 officeholders, “the database includes their geographic information, party identification, and race identification where available.” You can explore the data online and also (with a free registration) download it. Previously: Women candidates for the US House, 1972–2010 (DIP 2017.07.26).

Programmer surveys. For the past decade, programming Q&A site StackOverflow has run an annual survey, asking developers about the languages they use, their workplaces, learning goals, salaries, and more. The site provides anonymized, respondent-level data for each survey, including the 2020 edition, which received 64,000+ responses. Between 2016 and 2018, the not-for-profit FreeCodeCamp ran an annual “new coder” survey, which attracted more than 31,000 responses in its most recent year; those datasets are also available to download. [h/t Jason Norwood-Young]

UK land parcels. Last week, the UK expanded access to its datasets defining the geographical boundaries of 23 million “title extents” in England, Wales, and Scotland. Previously: UK property sales (DIP 2016.03.23).

Dungeons & Dragons dialogue. Microsoft researchers Revanth Rameshkumar and Peter Bailey have assembled the Critical Role Dungeons and Dragons Dataset, converting 159 transcripts of a popular, live-streamed role-playing show into structured information about 398,682 bits of dialogue. [h/t Lynn Cherny]