Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.03.01 edition

Congressional votes and ideology, EPA-regulated facilities, programming languages, 20th-century occupations, and coconut thumps.

Congressional votes and ideology. The Voteview project “allows users to view every congressional roll call vote in American history,” and places those votes in the context of ideology estimates along a liberal-to-conservative spectrum. The core estimates come from DW-NOMINATE, a method developed by the project’s directors emeritus, Keith T. Poole and Howard Rosenthal. Voteview’s bulk data includes ideology estimates for every member of the House and Senate since 1789, every vote taken in either chamber, and every member’s position on those votes. [h/t Philip Bump]

EPA-regulated facilities. The US Environmental Protection Agency’s Facility Registry Service “provides Internet access to a single source of comprehensive information about facilities, sites or places subject to environmental regulations or of environmental interest.” It includes each entity’s name, type, location, industry, regulatory programs, and more. That information, which spans millions of facilities, is “subjected to rigorous verification and data management quality assurance procedures.” The records also provide facilities’ ID numbers from other EPA systems, such as the agency’s Risk Management Program database featured in last week’s edition. [h/t Michael Allen]

Programming languages. PLDB is a database that describes several thousand programming languages, file formats, communications protocols, and other related concepts. Its downloads, available in several formats, provide information on the languages’ years announced, technical features, creators, countries and communities of origin, relevant books and URLs, popularity metrics, and more. [h/t Derek M. Jones]

20th-century occupations. Between 1939 and 1991, the US government published several iterations of the now-discontinued Dictionary of Occupational Titles, a precursor to the O*NET database (DIP 2017.09.27). The dictionaries included job descriptions, classification codes, and cross-references, but are mostly available only as scans. So Shahad Althobaiti et al. organized the manual transcription of five major editions into structured text files. A random sample of 1939’s titles: punch-press operator, seam dampener, base brander, box pleater, and necktie finisher.

Thump, thump, coconut. “Traditionally,” in the Philippines, “coconuts are classified into their maturity levels manually,” June Anne Caladcad and Eduardo Piedad Jr. write. “Traders often use their fingernails, knuckles, or the blunt end of the knife to tap the coconuts before assessing the sounds produced.” The authors and their colleagues have developed hardware and software to emulate that process, and used it to collect acoustic signal data from 129 premature, mature, and overmature coconuts, each mechanically knocked on each of its three ridges.