Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.07.20 edition

New voting laws, notable people, budget apportionments, digital trade provisions, and the World Cup.

New voting laws. The Voting Rights Lab has been tracking 2,000+ laws proposed in US state legislatures since 2021. The tracker focuses on “12 major issue areas relating to voter access and representation,” such as early voting, same-day registration, and ID requirements. It lists each bill’s state, number, author, date introduced, current status, and issue areas, plus a summary and the lab’s “assessment of whether the legislation is likely to improve or interfere with voter access or the administration of elections.” As seen in: “Has Your State Made It Harder To Vote?” (FiveThirtyEight). Related: States Newsroom’s Kira Lerner has compiled a spreadsheet of 120 new election-related criminal penalties, based partly on the tracker’s data.

Notable people. “A new strand of literature aims at building the most comprehensive and accurate database of notable individuals,” observe Morgane Laouenan et al., who contribute a “cross-verified database of 2.29 million individuals” mined from Wikidata and the English, French, German, Italian, Spanish, Portuguese and Swedish editions of Wikipedia. For each person, the dataset provides their birth and death dates, gender, citizenship, occupations, and other details. Previously: The MIT-based Pantheon dataset (DIP 2016.02.03), also based on Wikipedia and since updated. [h/t Philip Jung]

Budget apportionments. Congress, through a process called appropriations, chooses how much money goes to each US federal agency and program. But the Office of Management and Budget, through a process called apportionment, ultimately sets the rules for spending those funds, “typically limit[ing] the obligations [an agency] may incur for specified time periods, programs, activities, projects, objects, or any combination thereof.” Those binding decisions had generally not been available to the public until last week, when OMB launched a database of apportionments for FY 2022, per a requirement in Congress’s 2022 spending bill. [h/t Caitlin Emma]

Digital trade provisions. Mira Burri et al.’s TAPED dataset, which “seeks to comprehensively trace developments in the area of digital trade governance,” categorizes 100+ relevant aspects of 300+ preferential trade agreements signed since 2000. The dataset indicates, for instance, that the Peru-Australia Free Trade Agreement contains binding agreements on personal data protection, nonbinding language on cybersecurity, and no provisions regarding net neutrality.

The World Cup. Josh Fjelstul’s World Cup Database, published this month, provides “extensively cleaned and cross-validated” information about each of the 21 FIFA World Cup tournaments played so far. Its 27 tables contain “approximately 1.1 million data points” regarding the teams that participated, their players and managers, the referees, match outcomes, goals, penalties, and more.