Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2018.01.17 edition

NYC marriage licenses, OECD immigration policies, London air pollution, psychometric tests, and the Ghibliverse.

Three million NYC marriage licenses, reclaimed. Reclaim The Records launched in 2015 and became a 501(c)(3) non-profit last year. Its mission: To “identify important genealogical records sets that ought to be in the public domain but which are being wrongly restricted by government archives, libraries, and agencies.” The organization files freedom-of-information requests and lawsuits to get the data, and “then we digitize everything we win and put it all online for free, without any paywalls or usage restrictions, so that it can never be locked up again.” Most of the records they’ve received so far have arrived as PDFs or microfilm. But a 2016 court settlement with the NYC City Clerk’s Office netted the group — and the public — a dataset of 3 million NYC marriage licenses from 1950 to 1995.

Three decades of immigration policies. The Immigration Policies in Comparison (IMPIC) project has quantified the immigration regulations of 33 OECD countries between 1980 and 2010. The project, led by political sociologist Marc Helbling, dives deeply into the regulations related to four policy areas: labor migration, family reunification, asylum/refugees, and “co-ethnics.” You can find the dataset’s detailed codebook and methodology in this PDF. Related: Helbling’s summary of the project’s goals, approach, and initial findings (Migration Data Portal). [h/t David Brady]

London air pollution. The London Air Quality Network, run by researchers at King’s College London, gathers data on levels of nitrogen dioxide, ozone, fine particulate matter, and other pollutants from more than 100 monitoring sites. You can download the data as CSV files (for up to six metric and site combinations at a time) or fetch JSON and XML data from the site’s API. Related: “London air pollution live data – where will be first to break legal limits in 2018?” (The Guardian). Previously: Air quality data from the EPA (DIP 2017.10.04), OpenAQ (DIP 2017.03.29), Berkeley Earth (DIP 2017.03.22), and the World Health Organization (DIP 2016.06.15). [h/t Gavin Freeguard]
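
Because the CSV download accepts at most six metric-and-site combinations at a time, pulling a larger selection means splitting it into several requests. The following is a minimal sketch of that batching step only; the site codes and pollutant labels are hypothetical placeholders rather than the network's actual identifiers, and the function does not contact the site.

```python
from itertools import islice, product

# Limit stated for the CSV download: six metric/site combinations per file.
MAX_COMBINATIONS_PER_DOWNLOAD = 6


def batch_combinations(sites, metrics, batch_size=MAX_COMBINATIONS_PER_DOWNLOAD):
    """Yield lists of (site, metric) pairs, each holding at most batch_size pairs."""
    pairs = product(sites, metrics)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical site codes and pollutant labels, for illustration only.
sites = ["SITE_A", "SITE_B", "SITE_C", "SITE_D"]
metrics = ["NO2", "O3", "PM2.5"]

batches = list(batch_combinations(sites, metrics))
for number, batch in enumerate(batches, start=1):
    print(f"download {number}: {batch}")
```

Each yielded batch corresponds to one CSV download; the resulting files can then be concatenated with whatever tool you normally use.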

Psychometric tests. The Open Source Psychometrics Project “provides a collection of interactive personality tests with detailed results that can be taken for personal entertainment or to learn more about personality assessment.” You can download results from more than 30 such tests, including the Big Five Personality Test, the Kentucky Inventory of Mindfulness Skills, and Bob Altemeyer’s Right-wing Authoritarianism Scale. Related: “Most Personality Quizzes Are Junk Science. I Found One That Isn’t” (FiveThirtyEight). [h/t Chris Zioutas]

The Ghibliverse. The unofficial Studio Ghibli API contains structured information about the famed Japanese animation studio’s films (e.g., Princess Mononoke and Spirited Away), plus the characters, locations, and vehicles featured in them. You can also download a single file containing all the data.