Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.08.25 edition

Legislator stock trades, national revenues, Africa building outlines, AI patents, and Formula One.

Legislator stock trades. US Congress members and candidates must report all stock purchases and sales exceeding $1,000, as well as those of their spouses and dependent children. Those records are technically available through the House’s and Senate’s financial disclosure portals, but neither provides bulk data. Software engineer Tim Carambat’s Senate Stock Watcher and House Stock Watcher websites fill that gap by making the transactions available to browse, query, and download. In the case of the House, where reports are still provided as PDFs, Carambat also coordinates the manual transcription of those files.

National revenues. UNU-WIDER’s Government Revenue Dataset “aims to present a complete picture of government revenue and tax trends over time.” The project, updated this month, currently covers 196 countries and goes back, in most cases, to the early 1980s. It draws on data from OECD and IMF reports and includes dozens of variables, such as total revenue, natural resource taxes, and foreign grants received. Previously: The OECD’s Global Revenue Statistics Database (DIP 2018.08.01). [h/t Lisa Chauvet & Marin Ferry + Erik Feiring]

Africa building outlines. Open Buildings, a project led by Google Research’s Ghana office, has published a dataset of 516 million building footprints in Africa, estimated from satellite imagery. The dataset, which you can explore online and download as CSVs, spans roughly 64% of the continent. It describes each estimated footprint’s coordinates, shape, and area, plus the detection algorithm’s degree of confidence. Previously: Footprints of buildings in the US (DIP 2018.07.18), and in Canada and New Zealand (DIP 2019.09.25).

AI patents. The US Patent and Trademark Office has built a series of machine-learning models to identify patents that involve AI technologies, such as natural language processing or computer vision. Its Artificial Intelligence Patent Dataset, released in June, focuses on eight of these technologies and provides predictions of their presence (or absence) in 13.2 million granted patents and patent applications since 1976, finding hits in 11% of the documents. [h/t Nicholas Rada]

Formula One. The Ergast Developer API provides seven decades of Formula One racing results, with details on each season, race, and result since 1950, each lap time since 1996, each pit stop since 2012, and more. In addition to querying the API, you can explore the data online and download it in full. As seen in: FiveThirtyEight’s “Who’s The Best Formula One Driver Of All Time?” [h/t Eric Gardner + Cameron Yick + David Ortiz]