Legibot: A comprehensive theory-driven annotated corpus and SLM for French legislative data

Pierre-Carl Langlais, Annina Claesson, Manon Berriche, Andreï Mogoutov, Jean-Philippe Cointet

We introduce Legibot, a large-scale, LLM-assisted framework for analyzing legislative debate practices in the French National Assembly during the current legislature (since 8 July 2024). Building on the Discourse Quality Index and adjacent deliberative-democratic scholarship, we develop a rich annotation scheme that captures a range of dimensions, including tone, adherence to procedural and deliberative norms, epistemic claims, argumentative structure, emotion, and performative acts. We further train and release a lightweight supervised language model (SLM) specialized for these annotation tasks, and we apply the fine-tuned model to the complete dataset of 407,126 individual sentences drawn from 149,934 interventions. Both the annotated corpus and the SLM are made available to the research community to support reproducibility and follow-up work. Applied analyses using this multidimensional representation reveal systematic variation in justificatory practices, engagement with opponents, and affect across party lines and agenda types, and show that multidimensional discourse features predict floor reactions and certain legislative outcomes beyond what topical controls explain. The study demonstrates how theory-guided annotations, supported by LLMs, make it possible to bridge normative concepts of deliberation with scalable text analysis in a core European legislature. As such, it also contributes to the current literature on the added value and limitations of generative AI in the social sciences.