Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.10.11 edition

US income distributions, education reform, Kia/Hyundai thefts, Michigan air permit violations, and dog genomes.

US income distributions. The Income Distributions and Dynamics in America project combines confidential Census Bureau records and IRS tax forms to generate detailed, downloadable statistics regarding “income percentiles, shares, growth rates, persistence, and more for many U.S. demographic groups at national and state levels.” The statistics, which cover 1998 through 2019, are the result of a research partnership between the Minneapolis Fed’s Opportunity & Inclusive Growth Institute and the Census Bureau. The project’s interactive tools let you track individual and household income percentiles by year, state, race/ethnicity, sex, US/foreign-born status, and age group. Previously: Global income deciles from Kanishka B. Narayan et al. (DIP 2023.06.28). [h/t Alex Albright]

Education reform. The World Education Reform Database describes 10,000+ education policy changes reported by 180+ countries to international organizations, among other sources. The project, led by education policy scholars Patricia Bromley and Rie Kijima, focuses on “systemic reforms that envision a supra-school administrative level and aim to impact the wider education system, rather than small projects that target individual schools.” The database lists each reform’s country, year, name, and summary. Although some of the reforms in the database were introduced hundreds of years ago, the vast majority occurred in the last 50 years.

Kia/Hyundai thefts. “Cities around the U.S. are facing a staggering new normal when it comes to stolen cars,” writes Motherboard reporter Aaron Gordon, who has been digging into the boom in thefts of Kia and Hyundai vehicles, millions of which lack “engine immobilizers, a basic anti-theft device that is legally mandated in Canada and Europe.” Gordon has requested car theft data from more than 100 cities, and received it from 43 so far. He’s updating a spreadsheet that lists each city’s number of thefts each month, plus the number/percentage that were Kias or Hyundais. Related: Earlier this year, USAFacts obtained and published similar data from 23 US cities.

Michigan air permit violations. For local news organization Planet Detroit, freelance journalist Shelby Jouppi has built a daily-updating dashboard of air quality permit violations cited by Michigan’s Department of Environment, Great Lakes and Energy. The dataset lists 1,500+ violation notices since 2018; for each, it provides the notice date and findings, facility name and location, and more. To construct it, Jouppi had to scrape individual notice PDFs from the department’s website and then extract the information from those documents. Read more: “Southwest Detroit steel slag processor receives 12th air quality permit violation for fallout since 2018,” an article by Jouppi based on the data.

Dog genomes. The Dog10K project aims “to coordinate the global effort on genome sequencing in dogs and build a comprehensive resource for the canine community.” In a recent Genome Biology article, the team shared data and findings from 1987 individual animals — “1611 dogs (321 breeds), 309 village dogs, 63 wolves, and four coyotes.” The records include raw sequence data and variation analysis results. [h/t Kim Nguyen]