Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.07.27 edition

Wildfires around the world, hospital price lists, monkeypox strains, startup factories, and shark incidents.

Wildfires around the world. The Global Wildfire Information System, expanding on the work of the European Forest Fire Information System, uses satellite data to provide weekly and annual estimates of the number of fires and area burned in 200+ countries. Its bulk data indicates monthly burned hectares by country, sub-country unit, and land type from 2002 to 2019, as well as the boundaries of individual fires from 2001 to 2020. It also publishes gridded spatial data relating to fire danger forecasts, active fires, emissions, and more. As seen in: El Diario’s analysis of forest fires in Spain. [h/t Olaya Argüeso Pérez]

Hospital price lists. Since January 2021, the US government has required hospitals to publish machine-readable files listing the standard charges for all items and services they provide. But there’s no standard format for these price lists (also known as “chargemasters”), no official central repository of them, and compliance has been lacking. Seeing those problems, the versioned-data platform DoltHub earlier this year ran a paid crowdsourcing campaign that pulled nearly 300 million prices from the published lists of roughly 1,800 hospitals into a single database. Related: Thanks to an earlier price transparency rule, California posts chargemasters for hundreds of hospitals, with records going back to 2011.

Monkeypox strains. Nextstrain, “an open-source project to harness the scientific and public health potential of pathogen genome data,” has begun analyzing genetic sequences from hundreds of monkeypox virus samples, the vast majority from infections in the past few months. The project provides metadata on each sample, including the date, country, variant, and mutation metrics, as well as detailed sequencing data from NCBI Virus. Previously: Coronavirus variant data (DIP 2021.03.10). [h/t Karsten Johansson]

Startup factories. Venture studios are firms that build and launch startups. Jim Moran’s Venture Studio Index tracks 260+ of them, plus 1,200+ of the startups they’ve launched. The dataset, “collected manually by a team of researchers familiar with venture capital and the technology startup ecosystem,” includes founding years, locations, employee counts, relevant URLs, and more.

Shark bites. Madeline Riley et al. describe the Australian Shark-Incident Database, which contains details about 1,100+ shark bites (and attempted shark bites) between 1791 and early 2022, gathered by the Taronga Conservation Society using “questionnaires provided to shark-bite victims or witnesses, media reports,” and information from state agencies. Read more: “New dataset shows shark bites in Australia are increasing and researchers want to know why” (The Guardian).