Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2017.05.31 edition

Government payrolls, federal lobbyists, state gun laws, per-sector financials, and NYC doggies.

Government payrolls. Last week at BuzzFeed News, we shared a vast trove of federal payroll data. Those records, provided by the Office of Personnel Management through the Freedom of Information Act, cover more than 40 years and millions of employees. The dataset includes salaries, titles, job types, and demographic variables. In many-but-not-all cases (per OPM’s data release policies), it also includes names. Previously, federal payroll data had been searchable online, but very little was available in downloadable, analysis-friendly formats. Also: Many states, including New York, California, Florida, New Jersey, Minnesota, Arkansas, South Carolina, and Washington, proactively make payroll data available for download. (Some cities, such as Chicago, do, too.)

Government lobbying. U.S. lobbyists must notify Congress within 45 days of being retained by new clients. Every quarter after that, they’re required to file activity reports that detail the agencies they lobbied, the topics they covered, and the income they earned. Bulk downloads of both types of reports are available as XML files from the House (going back to 2004) and from the Senate (since 1999). Although they receive the same filings, each chamber “follows different data-cleaning, processing, and editing procedures before storing the data,” according to this recent GAO report.

State gun laws. A team of researchers at the Boston University School of Public Health has collected data on the presence/absence of 133 different types of firearm laws in each U.S. state, for each year between 1991 and 2016. The legal provisions are grouped into 14 categories, such as background checks, “Stand Your Ground” laws, and child access prevention. You can download a spreadsheet of the data, and also browse state-by-state summaries. Previously: The Correlates of State Policy Project (DIP 2016.07.06).

Industrial sector data. Aswath Damodaran, a professor of finance at NYU’s business school, maintains a trove of data on per-sector financials, including effective tax rates, return on equity, and working capital ratios by industry. For most datasets, Damodaran publishes both current and historical versions. [h/t Tim McGovern]

NYC doggies. You might have seen New York City’s bubble map of dog names. It turns out that the underlying dataset (which includes the name, gender, age as of 2015, breed, and borough of more than 110,000 dogs) is available on GitHub. You can also download slightly older but more detailed data from WNYC’s Dogs of NYC project. That data includes each dog’s coat colors, whether it had been spayed/neutered, and its ZIP code. Related: Similar pet license data from Tacoma, Wash., and Edmonton, Canada. [h/t Alex P. Miller + Dan Nguyen]