Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2023.04.05 edition

Treasury transactions, more than 200 million European buildings, 911 service areas, lists of medical codes, and Midwestern mollusks.

Treasury transactions. The US Treasury’s Daily Treasury Statement dataset paints a near-real-time picture of the federal government’s purse. The eight tables provide data on “operating cash balance, deposits and withdrawals of cash, public debt transactions, federal tax deposits, income tax refunds issued,” and more. The table of deposits and withdrawals, for example, indicates the total value (rounded to the nearest $1 million) of transactions each day, by type and category (e.g., “Economic Recovery Programs” and “Defense Vendor Payments”). The records, available to download in bulk or via API, go back to October 2005. As seen in: “Can a billionaire die without anyone noticing? The mystery behind a remarkable $7 billion tax payment.” (Tim Fernholz, Quartz). [h/t Walt Hickey]
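
For readers who grab the bulk download, here is a minimal sketch of totaling the deposits-and-withdrawals table by day and type. The column names (record_date, transaction_type, transaction_catg, transaction_today_amt) are assumptions, and the sample rows are made-up placeholders rather than real Treasury figures; check the header row of the file you actually download.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows with assumed column names; amounts are invented
# and stand in for values rounded to the nearest $1 million.
SAMPLE = """record_date,transaction_type,transaction_catg,transaction_today_amt
2023-04-03,Deposits,Example Category A,120
2023-04-03,Withdrawals,Example Category B,75
2023-04-03,Withdrawals,Example Category A,30
2023-04-04,Deposits,Example Category A,200
"""


def daily_totals(csv_text):
    """Sum amounts per (date, transaction type) across all categories."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["record_date"], row["transaction_type"])
        totals[key] += int(row["transaction_today_amt"])
    return dict(totals)


totals = daily_totals(SAMPLE)
for (day, kind), amount in sorted(totals.items()):
    print(day, kind, amount)
```

To run it against a real file, pass the file's text to daily_totals in place of SAMPLE and adjust the column names as needed.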

European buildings. EUBUCCO is a “database of individual building footprints for 200+ million buildings across the 27 European Union countries and Switzerland, together with three main attributes – building type, height and construction year – included for respectively 45%, 74%, 24% of the buildings.” The researchers, who describe their methodology in a recent paper, collected and standardized the information from 50 open government datasets and OpenStreetMap. They then “perform[ed] extensive validation analyses to assess the quality, consistency and completeness of the data in every country.” You can browse the data online, download it, and access the underlying code.

911 service areas. Public Safety Answering Points (PSAPs) are, essentially, the call centers to which 911 calls are routed. A while ago, the US government hired a contractor to compile a dataset of the service area boundaries for each of the country’s PSAPs, “the geographic area within which a 911 call placed using a landline is answered at the associated PSAP.” Although some of that information has likely changed since the dataset’s publication in 2009, the records may still be useful for certain purposes. Related: The FCC’s 911 Master PSAP Registry, which doesn’t include service boundaries but does provide the name, state, county, and ID of 6,000+ primary PSAPs, plus 2,700+ PSAPs listed as “secondary,” “duplicate,” or “orphaned.” [h/t Maddy Varner + Mike Thompson]

Lists of medical codes. OpenCodelists is “an open platform for creating and sharing codelists of clinical terms and drugs,” built by the University of Oxford’s OpenSAFELY team. The platform supports several coding systems, including ICD-10 and SNOMED CT. The lists can refer to groups of symptoms, diagnoses, medications, and more. Public lists note their creators and coding system; you can view each list’s codes online, or download them as a CSV. [h/t Ben Goldacre]

Midwestern mollusks. The Illinois Natural History Survey’s mollusk collection contains 500,000+ specimens, cataloged into 90,000+ lots, some gathered more than a century ago. “The collection is strong in freshwater mussels (Unionoida), freshwater and terrestrial snails from the Midwestern U.S. and cone shells (Conoidea),” as well as “freshwater bivalves and gastropods from the Southeastern U.S., Central, and South America.” For each lot, the collection’s dataset lists the genus, species, number of specimens, date and location collected, and more. Related: INHS’s other collections. [h/t Meredith Broussard]