Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2018.08.08 edition

Militarized disputes, spending at Trump properties, overlooked computer scientists, death and taxes in the Garden State, and (more) street trees.

Militarized disputes. The Militarized Interstate Dispute datasets provide details about more than 2,200 instances between 1816 and 2010 where a government “threatened, displayed, or used force against another” — including each dispute’s timing, participants, death count, result, and more. A supplementary database tracks the disputes’ locations. The datasets are part of the Correlates of War project, which was founded in 1963 and which strives for “the systematic accumulation of scientific knowledge about war.” [h/t Erik Beuck]

Spending at Trump properties. ProPublica is tracking the money that political campaigns and government agencies have reported spending at Donald Trump’s hotels, golf clubs, and restaurants. You can download the data, which includes the spender, property, date, amount, and listed purpose for each payment. From ProPublica’s notes: “Federal government spending is incomplete because many government agencies have actively fought requests to disclose spending at Trump properties. The data we have so far was released, in part, after lawsuits.”

Overlooked computer scientists. Researchers at Primer, a machine learning and natural language processing startup, have released a dataset describing more than 36,000 notable computer scientists, “only 15%” of whom have Wikipedia biographies. The researchers trained their algorithms on a corpus of existing Wikipedia articles, Wikidata entries, news articles, and the Semantic Scholar Open Research Corpus. (The last of these contains data on more than 39 million research papers in computer science, neuroscience, and biomedical science.) The results include each computer scientist’s name, basic metadata, academic papers, and snippets of news articles mentioning them. Related: “Using Artificial Intelligence to Fix Wikipedia’s Gender Problem” (Wired). [h/t Sara Blask]

Death and taxes in the Garden State. The nonprofit organization Reclaim The Records recently obtained New Jersey’s death index, and has made it available to search and download. The records include structured data for 1,275,833 deaths in the state between 2001 and 2017, plus digitized images of the death index for 1901-1903, 1920-1929, and 1949-2000. The structured data contains each person’s name, date of birth, date of death, and death certificate number — plus, for the most recent records, the locations of birth and death. Also: NJ Advance Media has published data on 17 years of drug overdose deaths from the state’s Office of the State Medical Examiner, and property tax rolls for “all 2.3 million taxable parcels of land” in 2017. (Free registration required to download the files.) [h/t Benjamin Cooley + Martin Burch]

More street trees. London, Belfast, Vancouver, Washington (D.C.), Philadelphia, Boston, Cambridge (Mass.), Madison, Providence, San Francisco, Oakland, and Berkeley are among the many cities that publish data cataloguing the trees that line their streets. Previously: NYC’s street trees (DIP 2016.11.16). [h/t Jens von Bergmann + Sunlight Open Cities + u/willwardo]