Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.06.28 edition

Infectious diseases, people’s genes, real estate inventories, tax-exempt nonprofits, and politicians’ tweets.

Infectious diseases in Europe. The European Centre for Disease Prevention and Control’s Surveillance Atlas of Infectious Diseases lets you browse, map, and download data on the historical incidence of several dozen diseases — from anthrax to Zika — in each of the European Economic Area’s countries. Related: Keila Guimarães’s recent investigation into penicillin shortages, which uses the Centre’s data on syphilis cases.

People’s genes. OpenSNP is a website that lets people publish the results of their genetic tests (such as those sold by 23andMe, deCODEme, and FamilyTreeDNA), “find others with similar genetic variations, [get] the latest primary literature on their variations, and help scientists find new associations.” Since 2012, users have uploaded more than 3,000 sets of genetic variants, which you can download individually or in bulk or access via OpenSNP’s API. Users can also list various personal traits, such as eye color, height, coffee consumption, and lactose intolerance. Useful primer: SNP stands for “single nucleotide polymorphism,” the NIH explains. SNPs are “the most common type of genetic variation”; each one “represents a difference in a single DNA building block, called a nucleotide.”
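
To make the SNP idea concrete, here is a minimal Python sketch that compares two sets of genotypes and lists the positions where they differ. It assumes the tab-separated layout commonly seen in 23andMe-style raw data exports (rsid, chromosome, position, genotype, with "#" comment lines); files from other testing companies may be laid out differently. The sample rows and the function names parse_genotypes and differing_snps are made up for illustration and are not part of OpenSNP.

```python
import csv
import io


def parse_genotypes(text):
    """Map rsid -> genotype from tab-separated text.

    Assumed column order: rsid, chromosome, position, genotype.
    Blank lines and lines starting with '#' are skipped.
    """
    rows = (
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    return {row[0]: row[3] for row in csv.reader(rows, delimiter="\t")}


def differing_snps(a, b):
    """Return the sorted rsids found in both sets whose genotypes differ."""
    return sorted(rsid for rsid in a.keys() & b.keys() if a[rsid] != b[rsid])


# Made-up sample data, for illustration only.
SAMPLE_A = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tCT\n"
)
SAMPLE_B = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t1\t2000\tCT\n"
)

person_a = parse_genotypes(SAMPLE_A)
person_b = parse_genotypes(SAMPLE_B)
print(differing_snps(person_a, person_b))
```

The comparison is a plain string match, so it would treat "CT" and "TC" as different; a real analysis would likely normalize allele order first.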

Real estate inventories. The National Association of Realtors publishes monthly real estate inventory data “at the national level, the 500 largest metropolitan areas, the 1,000 largest counties, and over 15,000 zip codes.” The data, based on the realtors’ multiple listing services, goes back five years and “tracks key market metrics including list prices, days on market, and total active inventory.” As of early June, six counties — Manhattan, plus five in California — had median listing prices above $1 million. Previously: The Census Bureau’s Annual Characteristics of New Housing (DIP 2016.06.22), international house prices (DIP 2017.02.08), millions of mortgages (DIP 2015.12.30), and millions more mortgages (DIP 2017.03.15). [h/t Reddit user bbekks]

Every federally tax-exempt nonprofit. The Internal Revenue Service publishes a file listing all “organizations eligible to receive tax-deductible charitable contributions” — currently more than 1 million charities, private foundations, and other groups. (Not all nonprofits apply for, or receive, tax-exempt status from the IRS; but all tax-exempt organizations are nonprofits.) Previously: Annual IRS 990 filings, in bulk (DIP 2016.06.22). [h/t Norbert Krupa + Derek Willis]

140-character politics. The recently launched Tweets Of Congress is collecting and publishing daily archives of tweets by congressional representatives, caucuses, and committees. Meanwhile, the Trump Twitter Archive has collected more than 30,000 of @realDonaldTrump’s tweets, which you can search and download.