Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.04.19 edition

Prescriptions, vaccinations, cyclones, plant hardiness zones, and spelling self-corrections.

UK, US, Rx. The UK’s National Health Service publishes monthly data on drugs prescribed in England through the country’s single-payer health care system. (Drugs prescribed in Scotland, Wales, or Northern Ireland aren’t included.) For each prescriber-and-drug combination, the dataset includes the quantity and cost of prescriptions for each month since August 2010. The US publishes similar data about prescriptions issued through Medicare, but only on an annual basis and currently only covering 2013 and 2014. Related: ProPublica’s Prescriber Checkup, which uses the Medicare data to examine doctors’ prescribing patterns. Previously: A decade-plus of Australian prescription data (DIP 2016.08.24). [h/t Adam Crahen]

Vaccination rates by state. The CDC’s National Center for Immunization and Respiratory Diseases collects and publishes state-by-state vaccination rates for infants, kindergartners, teens, and adults, plus flu vaccination rates for several age groups. Each dataset includes several years’ worth of data, with many going back to 2008 or 2009. Related: “California Shows The Rest Of The Country How To Boost Kindergarten Vaccination Rates,” by my colleague Peter Aldhous, with additional county-level data from the Golden State. Previously: International vaccination rates and policies (DIP 2016.08.03).

Tropical cyclones. Through its International Best Track Archive for Climate Stewardship project, the National Oceanic and Atmospheric Administration publishes what it calls “the most complete global set of historical tropical cyclones available.” For each tropical cyclone — a category that includes typhoons, hurricanes, tropical depressions, and more — the dataset includes its position, wind speed, central pressure, and classification at six-hour intervals. The dataset is updated annually and includes some historical cyclones from as early as 1842. [h/t Daniel Miller]

Where plants grow best. The USDA’s Plant Hardiness Zone Map “is the standard by which gardeners and growers can determine which plants are most likely to thrive at a location.” The USDA and Oregon State, which have jointly developed the map, previously sold access to the underlying data through a vendor. But after the vendor shut down earlier this year, OSU began publishing the data free of charge (though with some licensing restrictions). The dataset is available as detailed shapefiles and as ZIP code–based spreadsheets. [h/t Waldo Jaquith + Lynn Cherny]

Spelling self-corrections. For a 2012 academic paper, researchers captured the keystrokes of paid volunteers as they typed descriptions of images. Whenever a participant used the backspace key to correct a word, the researchers added it to a dataset of self-corrections. Each of the 44,000 lines in the English-language version of the dataset contains the original mistake and the correction. The most common change was in → on. Other common fixes included waling → walking and pople → people. [h/t Seth Stephens-Davidowitz]