Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.02.14 edition

State trust lands, household composition, comics deconstructed, nursing home prices, and twisty roads.

State trust lands. “State trust lands just might be one of the best-kept public secrets in America,” according to Grist, which published a data-driven investigation on the topic last week. These lands — expropriated from Indigenous nations and managed by state governments for profit — “exist in 21 Western and Midwestern states, totaling more than 500 million surface and subsurface acres.” Grist’s inquiry focuses on those benefiting land-grant universities, the subject of a related High Country News investigation (DIP 2020.06.24). Using state property records and Forest Service data digitized from historical cession maps, Grist identified “more than 8.2 million acres of state trust parcels taken from 123 tribes, bands, and communities” that fund 14 such institutions. Their public data describe these 41,000+ pieces of land: each parcel’s size and location, rights type, land use, benefiting university, associated tribes, and cession details. [h/t Rachel Glickhouse]

Household composition. Juan Galeano et al.’s CORESIDENCE database provides 146 indicators of household arrangements in 156 countries and ~4,000 regions, spanning the years 1964 to 2021. The indicators range in specificity, from average household size to, e.g., the average number of non-relatives in 3-person households. They also include gender breakdowns, such as the proportion of 5-person households among female-headed households. The metrics are calculated using “global-scale individual microdata from four main repositories and national household surveys, encompassing over 150 million individual records representing more than 98% of the world’s population.”
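
To make the indicator definitions concrete, here is a minimal sketch of how two of the metrics mentioned above (average household size, and the proportion of 5-person households among female-headed households) could be derived from person-level records. The field names (`household_id`, `sex`, `is_head`), the helper functions, and the sample records are all hypothetical; they are not CORESIDENCE's schema or code.

```python
from collections import defaultdict


def make_household(household_id, head_sex, size):
    """Build hypothetical person records: one head plus (size - 1) other members."""
    records = [{"household_id": household_id, "sex": head_sex, "is_head": True}]
    records += [
        {"household_id": household_id, "sex": "F", "is_head": False}
        for _ in range(size - 1)
    ]
    return records


def summarize_households(people):
    """Collapse person records into one entry per household (size, sex of head)."""
    households = defaultdict(lambda: {"size": 0, "head_sex": None})
    for person in people:
        entry = households[person["household_id"]]
        entry["size"] += 1
        if person["is_head"]:
            entry["head_sex"] = person["sex"]
    return dict(households)


def average_household_size(households):
    """Mean number of people per household."""
    return sum(h["size"] for h in households.values()) / len(households)


def share_with_size(households, size, head_sex):
    """Proportion of households of a given size among those headed by head_sex."""
    subset = [h for h in households.values() if h["head_sex"] == head_sex]
    if not subset:
        return None  # indicator is undefined when no household matches
    return sum(1 for h in subset if h["size"] == size) / len(subset)


# Made-up sample: two female-headed households (5 and 2 people), one male-headed (3).
people = (
    make_household("A", "F", 5)
    + make_household("B", "F", 2)
    + make_household("C", "M", 3)
)
households = summarize_households(people)
print(average_household_size(households))
print(share_with_size(households, size=5, head_sex="F"))
```

A real pipeline would also need survey weights and harmonized definitions of "household" and "head" across sources, which this sketch leaves out.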

Comics, deconstructed. The Tilburg University–based Visual Language Lab, led by comics scholar Neil Cohn, studies “all aspects of visual language, from the structure of individual drawings, emoji, or cartoons, to how we make meaning out of sequences of images like in comics.” The lab’s Visual Language Research Corpus provides detailed annotations of tens of thousands of panels in 300+ comic books and graphic novels (plus every Calvin & Hobbes strip). The dataset’s sources include material from multiple continents, time periods, and genres. The annotations examine “attentional framing structure and filmic shot scale, the situational changes across panels, page layouts, multimodality, visual morphology, and path structure,” among other characteristics. [h/t Cameron Yick]

Nursing home prices. Between December 2020 and March 2022, SeniorLiving.org researchers “attempted to contact 7,221 US senior facilities by telephone to obtain availability and service pricing.” The team got at least some pricing data for 3,000+ of the facilities. The results, available from SeniorLiving’s data portal, provide information about each provider, the team’s call attempts, and the average monthly price for five types of housing and care: skilled nursing with a private room, skilled nursing with a shared room, assisted living, independent living, and care tailored for people with Alzheimer’s disease and dementia. Related: The Centers for Medicare & Medicaid Services’ Skilled Nursing Facility Cost Report datasets, which contain annual metrics on finances, staffing, and care provided. [h/t Corie Wagner]

Twisty roads. Curvature uses OpenStreetMap data to map the world’s curviest roads. Built by motorcyclist and software developer Adam Franco, the open-source project “works by looking at the geometry of every road segment and adding up how much length of the road is sharp corners, broad sweeping curves, and straight areas.” Franco also provides a more detailed explanation, as well as data files scoring each road and curve segment. [h/t Giuseppe Sollazzo]
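
The quoted description can be illustrated with a rough sketch: estimate how tightly the road bends at each segment, then add up segment lengths weighted by that tightness. This is not Franco's code. The radius thresholds, the weights, and the function names are assumptions chosen for illustration, and the points are treated as planar (x, y) coordinates in meters, whereas OpenStreetMap stores latitude and longitude.

```python
import math

# Hypothetical (max radius in meters, weight) pairs; illustrative values only,
# not the ones used by the Curvature project.
CURVE_LEVELS = [
    (30.0, 2.0),   # radius under 30 m: treated as a sharp corner
    (100.0, 1.0),  # radius under 100 m: treated as a broad sweeping curve
]


def distance(a, b):
    """Euclidean distance between two planar (x, y) points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def circumradius(a, b, c):
    """Radius of the circle through three points; infinite if they are collinear."""
    # The cross product gives twice the triangle's area.
    cross = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
    if cross == 0:
        return math.inf
    return distance(a, b) * distance(b, c) * distance(c, a) / (2 * cross)


def weight_for(radius):
    """Weight for a segment: higher for tighter bends, zero for straight areas."""
    for max_radius, weight in CURVE_LEVELS:
        if radius < max_radius:
            return weight
    return 0.0


def curvature_score(points):
    """Sum of segment lengths, each weighted by how tightly the road bends there."""
    score = 0.0
    for i in range(len(points) - 1):
        # Use the tightest circle among the point triples that contain this segment.
        radii = []
        if i >= 1:
            radii.append(circumradius(points[i - 1], points[i], points[i + 1]))
        if i + 2 < len(points):
            radii.append(circumradius(points[i], points[i + 1], points[i + 2]))
        radius = min(radii) if radii else math.inf
        score += distance(points[i], points[i + 1]) * weight_for(radius)
    return score


straight_road = [(0, 0), (10, 0), (20, 0)]
right_angle_turn = [(0, 0), (10, 0), (10, 10)]
print(curvature_score(straight_road))
print(curvature_score(right_angle_turn))
```

Under these assumed weights, a straight road contributes nothing to the score, while every meter of a tight bend counts double.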