Guerrilla scraping: research ethics, internet law and programming tricks for critical computational sociology

The next medialab seminar will welcome Marion Lieutaud, sociologist and post-doctoral researcher, and Sophie Stalla-Bourdillon, researcher and co-director of the Brussels Privacy Hub. Their presentation will focus on the difficulties sociologists face in accessing data from digital work platforms and alternative strategies for conducting ethical and independent quantitative research.

Event, Research Seminar

Room B.010, 1 Place Saint-Thomas-d'Aquin, 75007 Paris

Abstract

Starting from a reflection on the experience of collecting internet research data on digital labour platforms, we discuss the resistance and legal intimidation that for-profit online platforms can direct at sociologists’ efforts to collect quantitative data and to investigate platform work quantitatively. As the window of researcher-platform cooperation embodied by APIs narrows, digital labour platforms (e.g. Uber, Deliveroo, TaskRabbit) enforce an ever-stricter data monopoly over the large sections of the labour market that they mediate and oversee. This holds despite the creative approach taken in some jurisdictions, notably by the EU lawmaker, with a view to forcing very large platforms to share more data with researchers. Where qualitative research remains possible, though by no means easy (see e.g. Fairwork research (Spilda et al. 2022)), we discuss whether quantitative and computational sociologists in particular may have to develop a ‘guerrilla’ approach to online research and to online data hosted by private platforms. We refer in particular to situations where cooperation and data sharing are deemed impossible, and where more autonomous, hazardous, adversarial and/or furtive approaches may come to be seen as necessary.

We review existing guidelines on internet-based social research in a variety of British and EU universities and conduct expert interviews with social scientists in the UK and in the European Union who specialize in web scraping and online data mining for their research. We use this material to investigate whether such ‘guerrilla’ approaches are already common practice; to what extent the ethical and legal guidance and disciplinary practices issued by research institutions (or the lack thereof) help resolve conflicts on the ground and/or drive researchers towards such approaches; what ethical, legal and practical steps or ‘tricks of the trade’ researchers put in place to protect themselves and the individuals under study; and finally the extent to which this is affecting the substance and practice of internet research. The final part of the project is an attempt at crafting practical methodological paths towards ethical, legally safe(r) and socially essential independent quantitative sociological scrutiny of online platforms and platform-mediated work. This entails carefully leveraging privacy-preserving data processing techniques informed by risk assessment, adopting a research-friendly interpretation of the scope of intellectual property rights consistent with recent case law, and approaching restrictive terms of use and platform gatekeeping practices with a degree of combativeness. We translate this into programming suggestions on how to go about creatively using and writing code for online data harvesting.
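As one illustration of what a privacy-preserving processing step could look like in practice, the following minimal Python sketch replaces a direct identifier with a keyed hash and drops other identifying fields before a record is stored. It is not the speakers' method: the field names (`worker_id`, `name`, `profile_url`) and the helper functions are hypothetical, and keyed hashing is only one possible pseudonymization technique whose adequacy depends on the risk assessment mentioned above.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a random secret key; it must be stored separately from the data."""
    return secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of the identifier as a hex string."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_direct_identifiers(
    record: dict,
    key: bytes,
    id_field: str = "worker_id",                 # hypothetical field name
    drop_fields: tuple = ("name", "profile_url"),  # hypothetical field names
) -> dict:
    """Drop identifying fields and replace the ID with a pseudonym."""
    cleaned = {
        k: v for k, v in record.items()
        if k not in drop_fields and k != id_field
    }
    cleaned["pseudonym"] = pseudonymize(str(record[id_field]), key)
    return cleaned


if __name__ == "__main__":
    key = make_key()
    raw = {
        "worker_id": "12345",
        "name": "Example Person",
        "profile_url": "https://example.org/u/12345",
        "rating": 4.8,
    }
    print(strip_direct_identifiers(raw, key))
```

Because the same key always maps the same identifier to the same pseudonym, records from one worker can still be linked across collection rounds, while anyone without the key cannot recompute the mapping from a list of known identifiers. Pseudonymized data of this kind generally remains personal data, so this step reduces risk rather than removing legal obligations.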

Biography

Marion Lieutaud is a sociologist and post-doctoral researcher in the Methodology Department at the London School of Economics and Political Science (LSE). She is also an anti-poverty representative on the UCU branch trade union committee at LSE.

Sophie Stalla-Bourdillon works at the interface of law and technology at the Brussels Privacy Hub and at UoS, and previously worked at Immuta.

Practical information

The session will take place on Thursday, May 22, 2025, from 2:00 PM to 4:00 PM, in person and in English at Sciences Po, Room B.010, 1 Place Saint-Thomas-d'Aquin, Paris 75007.

Registration is mandatory via this link.