Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2016.11.16 edition

Hate crimes, fake news on Facebook, Election Day on “the front page of the internet,” important Wikipedia pages, and every street tree in NYC.

Hate crimes in the United States. Since the 1990s, the FBI has collected data on hate crimes from local law enforcement agencies. On Monday, the bureau released data for 2015, reporting “5,850 criminal incidents and 6,885 related offenses, as being motivated by bias toward race, ethnicity, ancestry, religion, sexual orientation, disability, gender, and gender identity.” Those numbers are based on reports from 14,997 participating agencies. On the FBI’s website, you can view and download summary tables of the most recent data. You can also download incident-specific data for 1992 through 2014 from the National Archive of Criminal Justice Data. Unfortunately, as ProPublica noted yesterday, the FBI dataset is “deeply flawed”; more than 3,000 law enforcement agencies don’t participate in the program. [h/t John Templon]

Fake news on Facebook. Last month, colleagues at BuzzFeed News and I analyzed and fact-checked 1,000+ posts from hyperpartisan Facebook pages, and found a disturbingly high rate of fake news. Here’s the data. Facebook CEO Mark Zuckerberg has dismissed the possibility that fake news influenced the election, calling it a “pretty crazy idea”. Meanwhile, renegade Facebook employees have now formed an unofficial task force to battle fake news on the platform.

Election Day on “the front page of the internet.” Jason Baumgartner — a.k.a. Stuck_In_the_Matrix — has collected and published every submission and comment posted to Reddit from November 8th through November 10th. For each of the nearly 8 million comments, the dataset includes the message, the author, the subreddit it was posted to, the comment thread’s ID, and more. Previously: 1.7 billion Reddit comments, featured Nov. 25, 2015.

The most important entries on Wikipedia. Germany-based researcher Andreas Thalhammer has applied PageRank — the algorithm at the heart of Google’s origin story — to the world of Wikipedia. The result: the DBpedia PageRank dataset, which estimates the importance of each page based on the other pages that link to it. You can download the data directly, or query it online. (According to the metric, Aristotle, Plato, and Karl Marx are history’s three most Wiki-central philosophers.)
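For readers curious how PageRank scores a page by its in-links, here is a minimal Python sketch. It is a generic power-iteration version that uses the conventional damping factor of 0.85 and runs on a hypothetical four-page link graph. It is not Thalhammer's code, and it makes no claim about how the DBpedia PageRank dataset was actually computed. The function name `pagerank` and the page labels are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Generic power-iteration PageRank (illustrative sketch, not DBpedia's code).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping every page to a score; scores sum to 1.
    """
    # Collect every page, including pages that appear only as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes a damped share of its rank to every page it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        diff = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if diff < tol:
            break
    return rank


# Hypothetical toy graph: A and B link to each other, C links to both, D links to C.
scores = pagerank({"A": ["B"], "B": ["A"], "C": ["A", "B"], "D": ["C"]})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this toy graph, D has no incoming links, so it keeps only the baseline share of (1 - 0.85) / 4. A and B end up on top because they receive links from each other and from C. This mirrors the intuition that a page matters when other pages point to it.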

Every street tree in NYC. Earlier this month, New York City published the results of its decennial tree count. You can explore a map of every street tree in NYC (nearly 700,000 of ’em) or download the corresponding dataset, which contains info on each tree’s species, circumference, health status, and other observations. (Note: That dataset appears to contain about one-third fewer trees than the map’s count, for reasons I can’t quite figure out.) Results of the 1995 and 2005 tree censuses are also available.