Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.04.20 edition

Rare diseases, solar panels, electoral interventions, open source security, and spider news.

Rare diseases. Orphanet, established in 1997 by the French National Institute for Health and Medical Research, “aims to provide high-quality information on rare diseases,” defined as those affecting no more than 1 in 2,000 people in Europe. You can search its listings of diseases, relevant drugs, patient organizations, and other resources. The affiliated Orphdata provides public downloads and an API for certain data, including disease prevalence, symptoms, and genetic links. Related: Orphanet and the European Bioinformatics Institute have developed a structured vocabulary defining relevant terms and their relations to one another. [h/t Simona Gamba et al. + Kevin Lewis]

Solar panels. Berkeley Lab’s Tracking the Sun project examines US trends in residential and small non-residential solar panel installations. Its latest report describes more than 2 million such projects, based on records provided by state governments, utility companies, and other organizations. It features an interactive dashboard, summary tables, and a public dataset that lists each installation’s location, capacity, price, rebates received, owner type, installer, physical orientation, component details, and other characteristics. A companion report and dataset examine utility-scale solar plants. [h/t Ed Vine]

Electoral interventions. Political scientist Dov H. Levin’s Partisan Electoral Intervention by the Great Powers dataset describes 117 attempts by the US and Russia/USSR to influence foreign elections between 1946 and 2000, drawing on congressional investigations, declassified histories, academic publications, and other sources. For each election, it indicates the intervening nation(s), whether their acts were overt or covert, whether they involved campaign funding, and more. Related: Lucan A. Way and Adam E. Casey’s dataset of Russian electoral interventions from 1991 to 2017. [h/t @Idl3]

Open source security. GitHub recently open-sourced its database of open source–related security advisories. Now the public can download its full contents and contribute additions and improvements. The database uses the Open Source Vulnerability (OSV) schema, which Google staff introduced last year and which other major services have adopted; Google’s OSV project aggregates their reports. [h/t Grey Baker]

Spider news. Stefano Mammola et al. have “compiled an expert-curated global database on the online newspaper coverage of human-spider encounters” from 2010 to 2020, spanning 5,300+ articles from 81 countries. The database provides the location of each encounter, plus information about the “presence of photographs of spiders and bites, number and type of errors, consultation of experts, and a subjective assessment of sensationalism.” Read more: “The global spread of (mis)information on spiders,” by the researchers.