Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.10.12 edition

Work-related injury counts, US hydrography, rebel leaders, file formats, and wine economics.

Work-related injury counts. The US Occupational Safety and Health Administration requires many (but not all) businesses to track employees’ work-related injuries and illnesses. Larger companies and those in high-risk industries must electronically submit annual counts to the agency. Thanks to freedom-of-information lawsuits by Reveal and Public Citizen, OSHA began to publish business-level data from those electronic submissions in 2020. The records, which go back to 2016, include each business’s name, location, industry, employee count, and employee hours worked, plus its reported number of deaths, injuries, skin disorders, respiratory conditions, poisonings, hearing loss, and other illnesses.

US hydrography. The National Hydrography Dataset, maintained by the US Geological Survey, “represents the water drainage network of the United States with features such as rivers, streams, canals, lakes, ponds, coastline, dams, and streamgages.” You can download the NHD geospatial files by hydrologic unit or state, or for the entire nation. Related: A dataset of waterfalls and rapids in the contiguous US, linked to the NHD and sourced partly from Bryan Swan and Dean Goss’s World Waterfall Database. [h/t Malcolm Tunnell + Christopher Ingraham]

Rebel leaders. Benjamin Acosta et al.’s Rebel Organization Leaders Database “provides a wide range of biographical information on all top rebel, insurgent, and terrorist leaders who were active in civil wars between 1980 and 2011.” It includes each leader’s name, gender, education, religion, languages spoken, number of children, years in role, country fought against, cause of death, and much more. The database covers 425 individuals fighting against 80+ countries; the project also features written profiles for a sample of them.

File formats. The US National Archives’ Digital Preservation Framework describes the agency’s risk assessments and recommended preservation plans for 600+ file formats. The framework’s documentation places each format into one of 16 categories, such as “digital audio,” “spreadsheets,” “navigational charts,” and “software and code.” In August, the agency added “linked open data” representations of the plans for each format. [h/t Elizabeth England]

Wine economics. Researchers at the University of Adelaide’s Wine Economics Research Centre have compiled several longitudinal datasets. One, for example, quantifies the total area devoted to growing each grape variety in each country, 1960–2016. Another compiles various market statistics (e.g., national wine production, imports, exports) going back to 1835. Related: The International Organisation of Vine and Wine maintains a database of global and national statistics going back to 1995. As seen in: Jack Zhao’s exploration of the Adelaide data.