Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.10.18 edition

Puerto Rico’s recovery, subnational conflicts, patents and trademarks, a sisterly survey, and a network of ideas.

Puerto Rico’s recovery. Since shortly after Hurricane Maria hit Puerto Rico, the territory’s government has been publishing a dashboard of recovery statistics. The website tracks a couple dozen metrics, including the percentage of homes with electricity, the number of people in shelters, and the number of open hospitals. For several of the main metrics, researcher Michael A. Johansson has been scraping daily figures from the dashboard and publishing them as a CSV file. Related: The Washington Post has been charting the recovery, and has published a deep dive into the island’s ongoing power outages.

Subnational conflicts. University of Michigan–based researchers have created “a repository of micro-level, subnational event data on armed conflict and political violence around the world.” The project, dubbed xSub, standardizes information from 21 data sources, and includes conflicts in 139 countries between 1942 and 2016. For each administrative boundary (e.g., country, province, district) and data source, xSub’s data counts the number of violent incidents by year, month, week, or day. The numbers are also broken down by the sides involved, who initiated the conflict, and what types of force were used. [h/t Andy Halterman]
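To make the shape of that data concrete, here is a minimal Python sketch of the kind of aggregation described above: counting incidents per data source, administrative unit, and time period. It is not xSub's own code or schema. The field names (`source`, `unit`, `date`), the function names, and the sample records are all invented for illustration.

```python
from collections import Counter
from datetime import date

# Made-up event records; real xSub files use their own fields and values.
EVENTS = [
    {"source": "source_a", "unit": "Province X", "date": date(2010, 3, 4)},
    {"source": "source_a", "unit": "Province X", "date": date(2010, 3, 19)},
    {"source": "source_a", "unit": "Province Y", "date": date(2010, 4, 2)},
    {"source": "source_b", "unit": "Province X", "date": date(2010, 3, 4)},
]


def period_key(d, level):
    """Map a date to a tuple identifying its year, month, ISO week, or day."""
    if level == "year":
        return (d.year,)
    if level == "month":
        return (d.year, d.month)
    if level == "week":
        iso = d.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if level == "day":
        return (d.year, d.month, d.day)
    raise ValueError(f"unknown level: {level}")


def count_events(events, level):
    """Count events per (source, unit, period) at the given time level."""
    return Counter(
        (e["source"], e["unit"], period_key(e["date"], level)) for e in events
    )


monthly = count_events(EVENTS, "month")
for key, n in sorted(monthly.items()):
    print(key, n)
```

Keeping the data source in the key mirrors the description above: counts stay separate per source rather than being merged across sources.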

Patents and trademarks. The U.S. Patent and Trademark Office publishes a huge amount of bulk data, including detailed XML files that contain information about millions of patent/trademark applications, assignments, trials, and appeals. The agency also publishes a collection of “research datasets”, which distill those bulk XML files into easier-to-use tabular data. [h/t Rachael Tatman]

Sister, Sister. In the wake of the Second Vatican Council in the 1960s, Sister Marie Augusta Neal conducted an enormous opinion survey of Catholic “women religious.” More than 130,000 sisters responded to the 649-question multiple-choice survey, the results of which the University of Notre Dame recently cleaned up and made available online. [h/t Kevin Schlottmann]

Get the idea? ConceptNet “is a freely-available semantic network, designed to help computers understand the meanings of words that people use.” It defines approximately 28 million “statements,” i.e., relationships between various things. For instance, ConceptNet indicates that a newsletter is a type of “report”, and that a computer can be used to “send email”. You can download the entire dataset, or access it via an API.
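One way to picture a “statement” is as a (start, relation, end) triple. The Python sketch below stores the two examples above that way and looks them up by concept. It does not call the ConceptNet API or read its download format. The relation labels `IsA` and `UsedFor` are my assumption about how these relationships are named, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Statements as (start, relation, end) triples. The two examples come from the
# text above; the relation labels "IsA" and "UsedFor" are assumed names.
STATEMENTS = [
    ("newsletter", "IsA", "report"),
    ("computer", "UsedFor", "send email"),
]


def build_index(statements):
    """Group statements by their starting concept for quick lookup."""
    index = defaultdict(list)
    for start, relation, end in statements:
        index[start].append((relation, end))
    return index


def related(index, concept, relation):
    """Return every end concept linked to `concept` by `relation`."""
    return [end for rel, end in index.get(concept, []) if rel == relation]


index = build_index(STATEMENTS)
print(related(index, "newsletter", "IsA"))     # ['report']
print(related(index, "computer", "UsedFor"))   # ['send email']
```

With roughly 28 million statements, the full dataset would call for a database or an on-disk index rather than an in-memory dictionary, but the triple structure is the same.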