Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.03.20 edition

Human trafficking, real-world vehicle emissions, state legislators, Meta Oversight Board decisions, and aviation waypoints.

Human trafficking. The Counter-Trafficking Data Collaborative’s Global Synthetic Dataset uses differential privacy techniques to represent “over 206,000 victims and survivors of trafficking identified across 190 countries and territories from 2002 to 2022.” The approach, developed in partnership with Microsoft Research, converts anonymized case records into “a new dataset in which records do not correspond to actual individuals, but which preserves the structure and statistics (i.e., utility) of the original data.” Each row indicates a (synthetic) individual’s gender, age group, citizenship, country of exploitation, duration of reported trafficking, traffickers’ means of control, types of exploitation, and the year the collaborative’s partners registered the case. Related: The collaborative’s Global Victim-Perpetrator Synthetic Dataset, which takes a similar approach to relationships between victims and perpetrators. [h/t Mariana Moreira + Lorraine Wong]

Real-world vehicle emissions. On Monday, the European Commission published its first report analyzing the real-world CO2 emissions of cars and vans, based on fuel consumption monitoring devices that the EU now requires. The report uses data received from 600,000+ vehicles. That sample is available to download, along with metrics aggregated by manufacturer and fuel type: average fuel consumption, emissions, and comparisons to standardized test results. Related: Data on millions of EU car registrations (and van registrations), including each vehicle’s fuel economy and emissions ratings. Previously: FuelEconomy.gov (DIP 2017.04.12), with data on decades of car models. [h/t Jan Willem Tulp + Xan Gregg]

State legislators. Nicholas Carnes and Eric Hansen’s 2023-4 State Legislators Dataset features “biographical information about state lawmakers who held office in 2023 and 2024 compiled from legislative and campaign websites and other online sources.” The dataset spans all 50 states and includes 7,300+ lawmakers. “The project’s principal aim was to record the current or most recent main occupation (outside of elected office) held by each member,” the authors write, “but the dataset also includes information about a wide range of characteristics including race, gender, and education.” A version for 2021–22 is also available. Previously: State legislator financial disclosures (DIP 2017.12.13) and ideology scores (DIP 2020.01.01). [h/t Derek Willis]

Meta Oversight Board decisions. Meta’s independent Oversight Board reviews a selection of the company’s content-moderation decisions and has the power to overturn them. The board publishes its rulings online, as does Meta itself; neither, however, provides a download link. But Information Is Beautiful has compiled a spreadsheet of the board’s 80+ decisions through early February, supporting a visualization of the cases’ topics and outcomes over time. [h/t Data Science Community Newsletter]

Aviation waypoints. For his recent exploration of the FAA’s aviation maps, Beautiful Public Data’s Jon Keegan has turned the agency’s list of 67,000+ navigation waypoints into a downloadable dataset. “Often these waypoint names will reflect the culture, food or sports teams of the city they are near,” Keegan writes. “Off the coast of New England, there is LBSTA and WHALE. Boston’s sports legacy gave us BOSOX, BRUWN, CELTS, PATSS, FENWY, ORRRR and BORQE. Salem has WITCH, and Plymouth has PLGRM.”