Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.01.27 edition

Tens of millions of flights, dot-gov characteristics, corporate risk-talk, Euro-polling, and the Codex Atlanticus.

Tens of millions of flights. The OpenSky Network crowdsources air traffic data, thanks to members who collect the radio signals that aircraft periodically broadcast. The nonprofit organization has published a “COVID-19 flight dataset,” which contains metadata for the tens of millions of flights those members observed in 2019 and 2020, with plans to update the dataset until the pandemic ends. It includes each flight’s call sign, aircraft model type, origin and destination airports, time first and last seen, and more. [h/t Evgeny Pogrebnyak]

Dot-gov characteristics. The US General Services Administration publishes a list of all registered .gov domains (DIP 2017.01.18). In 2011, 2014, and 2015, open-source advocate Ben Balter used software to scan each federally managed domain “to sniff out information about [their] technology and capabilities.” For instance: Do the domains support HTTPS? Do they use Google Analytics? Earlier this month, Balter ran the scan again, “to serve as a snapshot of the state of government technology ahead of the incoming Biden administration.” You can browse and download the results.

Corporate risk-talk. The Firm-Level Risk project uses “textual analysis of quarterly earnings conference calls held by more than 11,000 listed firms in 81 countries” to construct company-by-company “measures of exposure, risk, and sentiment.” The dataset goes back nearly two decades and includes sub-measures for various themes, such as tax policy, Brexit, and COVID-19. [h/t Stephan Hollander]

Euro-polling. The European Opinion Polls as Open Data repository collects the results of party-preference polls in 34 countries. For each poll, it lists the polling firm and commissioners, when the fieldwork began and ended, its scope and sample size, and the topline numbers for each party. Related: Project maintainer Filip Van Laenen on “how simple things can turn out to be rather complicated.” [h/t Erik Gahner Larsen]

The da Vinci codex. The Codex Atlanticus “is the largest existing collection of original drawings and text by Leonardo da Vinci”: 1,119 pages assembled by the 16th-century sculptor Pompeo Leoni. Milan’s Biblioteca Ambrosiana and The Visual Agency have created an interactive graphic that lets you explore the pages by year and subject; you can also download that metadata through the graphic’s “About the project” section. [h/t Giuseppe Sollazzo]