Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2020.12.02 edition

Student loans, rural facilities in India, more coups, Jefferson’s weather, and integer sequences.

Student loans. The US Department of Education publishes a range of aggregate datasets on federal student loans, including the amounts outstanding ($1.5+ trillion overall, owed by 43 million students), volumes of financial aid requested and awarded (by student demographic and by school), default rates, and forgiveness.

Rural facilities in India. As part of its Pradhan Mantri Gram Sadak Yojana road-development program, India’s Ministry of Rural Development has gathered data on 700,000+ rural facilities, which data-science engineer Pratap Vardhan has organized into state-level CSV files. The information includes each facility’s name, category (e.g., education, medical), subcategory, state, district, block, address, and geocoordinates. Related: An exploratory Twitter thread by Vardhan, who says, “This is probably the largest open indian geo-tagged dataset I’ve seen!? It’s mostly great!?”

More coups. Last month, the University of Illinois’ Cline Center for Advanced Social Research published version 2.0 of its Coup D’état Project, a dataset detailing more than 900 coups, attempted coups, and coup conspiracies from 1945 to 2019. Each entry indicates the country and date, plus the “type of actor who initiated the coup (i.e. military, palace, rebel, etc.) as well as the fate of the deposed executive (killed, injured, exiled, etc.).” Previously: Powell and Thyne’s coup dataset (DIP 2016.07.20).

Jefferson’s weather. From July 1776 to June 1826, Thomas Jefferson recorded thousands of nearly daily weather observations (temperature, precipitation, humidity, wind speed) at Monticello, Paris, Milan, and scores of other locations. Now a UVA/Princeton collaboration has turned those handwritten records into an explorable and downloadable database. [h/t Erica Cavanaugh]

Integer sequences. The decades-old, frequently updated, and downloadable On-Line Encyclopedia of Integer Sequences contains more than 338,000 lists of those things. Each has some particular significance, ranging from the famous (the Fibonacci numbers) to the intriguing (“days required to spread gossip to n people”) to the obscure (“numbers n such that 2^n + 35 is prime”) to the super-obscure. Related: This xkcd comic and its impact. [h/t Dan Brady]