Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2019.04.24 edition

Democracy, world leaders, ride-hailing, software time-estimates, and the Hunger Games.

Democracy. Varieties of Democracy bills itself as “a new approach to conceptualizing and measuring democracy” — one that “reflects the complexity of the concept of democracy as a system of rule that goes beyond the simple presence of elections.” The project scores countries annually on five high-level aspects of democracy, which are further broken down (by thousands of country-experts, based on a detailed codebook) into hundreds of more granular “indicators,” such as how often the government publicly attacks the judiciary, the extent to which authorities respect religious freedom, and the proportion of journalists who are women. Version 9 of the dataset, released earlier this month, covers 1789 to 2018 and includes 202 countries. [h/t John Polga-Hecimovich]

World leaders. The Archigos dataset provides historical data on the leaders of nearly 200 countries between 1875 and 2015. The dataset — a collaboration between political scientists Hein Goemans, Kristian Skrede Gleditsch, and Giacomo Chiozza — includes basic demographic information, plus categorizations of how each leader came to power, how they lost it, and their post-office fate. Now you know: No UK prime minister has died in office since 1865; José María Velasco Ibarra became president of Ecuador five separate times, and was removed by coup four times; Tunisian president Beji Caid Essebsi is 92 years old. [h/t Jeffrey Sachs]

Ride-hailing. Chicago has become the first city to publish detailed data from ride-hailing services, such as Uber and Lyft. Last week, officials released three datasets — on (anonymized) drivers, vehicles, and trips. The driver and vehicle datasets cover early 2015 through December 2018. The trip dataset covers only November and December 2018; even so, it includes more than 17 million rides. For each ride, the records contain the rough pickup and dropoff location, duration, the approximate fare and tip, and more. [h/t Sharon Machlis + Dan Nguyen + Karl Sluis + Michael A. Rice]

Software development time estimates. Derek M. Jones analyzes software-engineering data. Recently, he convinced a small software company to release a dataset documenting its internal time estimates, spanning 10 years, 20 projects, and 10,000+ tasks. For each task, the dataset indicates the number of hours it was predicted to take, how long it actually took, the (anonymized) developers it was assigned to, and more. [h/t Erik Bern]

Hunger Games survival. “In a Cox proportional hazards model, which covariates are associated with the odds (or hazard ratios) being ever in your favor?” To find out, Brett Keller created a spreadsheet of all 24 tributes in the 74th Hunger Games, including the districts from which they hailed, their ages, and how many days they survived.