Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.12.27 edition

Historical credit ratings, marine traffic, mammographies, drug-free school zones, and Italian words for “watermelon.”

Historical credit ratings. The SEC requires Moody’s, Standard & Poor’s, and other “nationally recognized statistical rating organizations” to report their rating assignments and changes (e.g., upgrades, downgrades, withdrawals) going back to 2010. The agencies publish the reports as XBRL-formatted files and update them monthly. But “because most researchers are unfamiliar with XBRL and cannot easily locate the history files, this valuable resource has seen limited use,” according to the Center for Municipal Finance, which now provides the reports as easier-to-use CSVs. [h/t]

Marine traffic. Ships use the internationally standardized Automatic Identification System (AIS) to broadcast their name, speed, direction, and other details. With a bit of radio hardware and software, anyone can collect the signals emitted by nearby vessels. AISHub aggregates AIS data from hundreds of volunteer signal-collectors around the world and makes that data available via an API and online maps. The Finnish Transport Agency also provides an API of data collected by its AIS stations on the Baltic Sea and other local waters; Denmark’s government publishes free historical data on maritime traffic in Danish waters; and the U.S. Coast Guard publishes historical AIS data for U.S. coastal waters (currently only for 2009–2014). [h/t Topi Tjukanov + Miska Knapek]

A better mammography database. The Digital Database for Screening Mammography was first released two decades ago, in 1997. It contains data and images from 2,620 mammographies, a mix of normal, benign, and malignant cases. In a Scientific Data article published last week, a team of Stanford University researchers describes a series of improvements they’ve made to the original database; their Curated Breast Imaging Subset of DDSM has modernized the database’s image formatting, added detailed “region-of-interest” annotations, and converted the metadata into CSV files.

Drug-free school zones in Tennessee. As part of a recent investigation, reporters at Reason Magazine used public records law to obtain geospatial data on each of Tennessee’s 8,544 drug-free zones. In addition to the geographic boundaries, the shapefile also includes each zone’s name and type (school, childcare, park, or library). [h/t CJ Ciaramella]

Italian for watermelon. Through a series of surveys, L’Atlante della Lingua Italiana QUOTidiana has been asking Italian speakers what words they use to describe various everyday things. The results for each question can be browsed as maps, or downloaded as XML files. When shown a picture of a watermelon, most respondents wrote “anguria,” but others responded with “cocomero,” “melone,” “citrone,” or “zipangulu.” [h/t Giuseppe Sollazzo]