Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.03.01 edition

Toddler vocab, immigration enforcement, vehicle specs, library checkouts, and gator hunting.

Words kids learn. Wordbank is an “open database of children’s vocabulary development.” So far, the Stanford-hosted project has gathered data from more than 71,000 standardized and anonymized vocabulary questionnaires across 23 languages. You could spend hours exploring the data online, charting how quickly children learn individual words, how quickly the same word (e.g., “grandma,” “abuela,” “ба́бушка”) is learned in different languages, and connections between words. You can download the data for each word or for each child’s vocabulary. Bonus: Wordbank has an R package and a GitHub repository. [h/t Hacker News user “Jasamba”]

Police officers as immigration enforcers. In an early executive order, Donald Trump instructed the Department of Homeland Security to expand its use of Section 287(g) of the Immigration and Nationality Act, which allows the federal government to deputize local law enforcement agencies in its search for undocumented immigrants. In response to FOIA requests, DHS has previously released data on the local agencies that participate in the 287(g) program. The Marshall Project has collated that data for 2006 through 2013 (the most recent year available), including the number of immigrants deported. Over that span, “more than 175,000 people nationwide were deported under the program,” Anna Flagg writes. “More than 30,000 of them came from Maricopa County, Ariz., the most from any single jurisdiction.” [h/t Tom Meagher]

Vehicle specs. The National Highway Traffic Safety Administration provides an impressively rich API detailing every manufacturer, make, and model in its database. The API can translate cars’ Vehicle Identification Numbers into the nitty-gritty details that those VINs encode, including the plant where the vehicle was manufactured, number of doors, engine measurements, fuel type, and more. [h/t Justin Myers]
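To give a feel for the VIN-decoding side of the API, here's a minimal Python sketch using only the standard library. It assumes NHTSA's vPIC service exposes a "DecodeVinValues" endpoint at vpic.nhtsa.dot.gov. It also assumes that endpoint returns JSON with a "Results" list holding one flat record per VIN. The field names picked out below (PlantCity, Doors, DisplacementL, FuelTypePrimary, and so on) are assumptions as well, so check them against NHTSA's own documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: vPIC's "DecodeVinValues" endpoint returns one flat JSON record
# per VIN under "Results". Verify the URL against NHTSA's official docs.
VPIC_BASE = "https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVinValues/"

# Assumed field names for the details a VIN encodes (plant, doors, engine, fuel).
FIELDS = ["Make", "Model", "ModelYear", "PlantCity", "PlantCountry",
          "Doors", "DisplacementL", "EngineCylinders", "FuelTypePrimary"]


def build_decode_url(vin: str) -> str:
    """Return the request URL for decoding a single VIN as JSON."""
    vin = vin.strip().upper()
    return VPIC_BASE + urllib.parse.quote(vin) + "?format=json"


def summarize(payload: dict) -> dict:
    """Keep only the fields of interest from a decoded-VIN response, skipping blanks."""
    record = payload["Results"][0]
    return {k: record[k] for k in FIELDS if record.get(k)}


def decode_vin(vin: str, timeout: float = 10.0) -> dict:
    """Fetch and summarize one VIN (requires network access)."""
    with urllib.request.urlopen(build_decode_url(vin), timeout=timeout) as resp:
        return summarize(json.load(resp))


if __name__ == "__main__":
    import sys

    # Decode each VIN passed on the command line.
    for vin in sys.argv[1:]:
        print(vin, decode_vin(vin))
```

Run as a script with one or more VINs as arguments. The request-building and parsing steps are kept separate from the network call so each can be checked offline.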

A decade-plus of Seattle library checkouts. Last month, the Seattle Public Library released a dataset tracking monthly checkout totals for each title from April 2005 to December 2016 (so far). The dataset isn’t limited to physical books; it also includes e-books, magazines, CDs, DVDs, and more. Last year, the three most popular physical books were Paula Hawkins’s The Girl on the Train (2,355 checkouts), Lauren Groff’s Fates and Furies (2,151 checkouts), and Ta-Nehisi Coates’s Between the World and Me (2,134 checkouts).

Gator hunting. Florida’s Fish and Wildlife Conservation Commission publishes data from its statewide recreational alligator hunt. For each alligator harvested between 2000 and 2015, the dataset includes the date, the hunting area, and the length of the carcass. (Legal hunting tools include crossbows, harpoons, spearguns, fishing poles, snatch hooks, and bang sticks — but not rifles, pistols, or other guns.) [h/t Christopher Groskopf + Neil Bedi + Eric Sagara]