MetAt - February 13, 2024 logbook
Share our methodological expertise and skills.
Salle K.011, 1 place Saint Thomas d'Aquin 75007 Paris
What is MetAt?
MetAt is a research methods support workshop: every month, a three-hour slot to help you resolve the methodological difficulties you encounter in the course of a scientific project.
Who is MetAt for?
MetAt is aimed at anyone needing occasional support in using a research tool or method. All profiles are welcome: students, doctoral students, researchers, research engineering professionals and others, inside and outside Sciences Po, with no restrictions on status or affiliation.
How to register?
Registration is compulsory via the form available on the MetAt page.
Session of 13/02/2024
Venue: Sciences Po
Number of participants: 10
Supervisors: Audrey Baneyx, Béatrice Mazoyer, Marion Frelat, Kelly Christensen, Claire Ecotiere, Maxime Crépel, Benjamin Ooghe-Tabanou, Guillaume Plique, Diego Antolinos Basso, Félix Alié, Emilien Schultz, Robin De Mourat, Antoine Machut, Guillaume Levrier, Yuma Ando
Mapping the terrain of sickle cell disease
Second support session for a Sciences Po master's student who wished to compare maps co-produced with interviewees, representing the living spaces of people with sickle-cell anaemia. The support took the form of a discussion on the consequences of the choice of survey protocol and on the role of (re)adjustment. The supervisors suggested interview methods, mapping possibilities and a survey approach conducted "with" rather than "on" the people concerned. The student should seek the advice of a research designer to pursue the project.
YouTube data collection
First support session for a participant who wanted to collect metadata and transcripts for a sample of YouTube video URLs. The videos were in several languages, among them Hindi, Telugu and English.
The supervisor suggested three possible places to install Minet: (1) the Windows computer, (2) the Ubuntu virtual machine, and (3) the bash terminal of a remote R server. The last option was preferred because it lets the commands run on the server itself, but it was also the hardest to set up because of the limited installation permissions of the user's profile on the server. In the end, taking advantage of the Python 3.8 already installed on the server, virtualenvwrapper and Minet could be installed in a virtual environment. After an introduction to using Minet to collect YouTube data, the supervisor guided the participant through generating a YouTube API key and using it to collect metadata from channels and videos, as well as comments.
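Minet handles URL parsing and API calls itself, but the minimal Python sketch below may help make the underlying steps concrete: it normalises a sample of YouTube URLs into video IDs, removes duplicates, and groups them into YouTube Data API v3 "videos.list" request URLs. The batch size of 50 IDs per request, the "part" values and the function names are assumptions for illustration; the sketch only builds URLs, sends no request, and does not cover transcripts or comments.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

API_URL = "https://www.googleapis.com/youtube/v3/videos"
# YouTube video IDs are 11 characters long: letters, digits, "-" and "_".
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(url):
    """Return the video ID from a common YouTube URL form, or None."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            candidate = parsed.path.split("/")[2]
    if candidate and VIDEO_ID_RE.match(candidate):
        return candidate
    return None


def build_video_requests(urls, api_key, batch_size=50):
    """Deduplicate video IDs (keeping first-seen order) and build one
    videos.list request URL per batch. Assumed batch limit: 50 IDs."""
    ids, seen = [], set()
    for url in urls:
        video_id = extract_video_id(url)
        if video_id and video_id not in seen:
            seen.add(video_id)
            ids.append(video_id)
    requests = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        query = urlencode({"part": "snippet,statistics", "id": ",".join(batch), "key": api_key})
        requests.append(f"{API_URL}?{query}")
    return requests
```

URLs that do not match one of the handled forms are silently skipped, so it is worth comparing the number of IDs with the size of the original sample.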
To install Minet in the terminal of an R server where the user has few rights but where a Python version higher than 3.7 is already installed, one option is to use "virtualenvwrapper", an older virtual environment manager. To use it, run "source virtualenvwrapper.sh", then "workon env", where "env" is the name of the environment (for example, an environment called "minet" in which Minet was installed with "pip install minet").
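After "workon", a quick way to confirm that the intended interpreter is active before running "pip install minet" is a short Python check such as the sketch below. The function names are hypothetical, and the default threshold of 3.8 is taken from these notes (a version higher than 3.7), not from Minet's official requirements.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv (< 20) sets sys.real_prefix; venv and recent virtualenv
    # versions make sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def python_is_recent_enough(version_info=sys.version_info, minimum=(3, 8)):
    """Compare (major, minor) against a minimum; (3, 8) is assumed from the session notes."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Python version OK:", python_is_recent_enough())
    print("Inside a virtual environment:", in_virtualenv())
```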
Fine-tuning BERT
Support for a post-doctoral researcher seeking to better understand the whole BERT fine-tuning process and how model performance is measured. The discussion focused on how to improve the fine-tuning of BERT for classifying texts as liberal or illiberal. The main issue identified was that the BERT model had so far been fine-tuned only on data generated zero-shot by ChatGPT, with no ground-truth data to assess the quality of either the data or the model. Three suggestions were made: (1) annotate a ground-truth set so that quality can be measured; (2) try the Augmented Social Scientist Python library to fine-tune the model and compare the results; (3) test whether active learning could speed up manual annotation (an email was sent to request access). They also discussed the different milestones for comparing national contexts and different sources.
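Once a ground-truth sample is annotated, measuring quality comes down to comparing gold labels with the model's predictions. The minimal Python sketch below computes per-label precision, recall and F1 plus overall accuracy, and adds uncertainty sampling, one common active-learning strategy (not necessarily the one the researcher would adopt), which picks the texts whose predicted probability is closest to 0.5 as the next ones to annotate. The label names and the assumption that the model outputs one probability per text are illustrative.

```python
def classification_report(gold, predicted, labels=("liberal", "illiberal")):
    """Per-label precision, recall and F1, plus overall accuracy."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    pairs = list(zip(gold, predicted))
    report = {}
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    report["accuracy"] = sum(g == p for g, p in pairs) / len(pairs)
    return report


def most_uncertain(probabilities, k):
    """Uncertainty sampling: indices of the k texts whose predicted probability
    for one class is closest to 0.5, i.e. where the model hesitates most."""
    order = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return order[:k]
```

Per-label scores matter here because a model trained on ChatGPT-generated data may do well on one class and poorly on the other, which overall accuracy alone would hide.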
YouTube monitoring tool
The participants came with a tool already developed in Streamlit for viewing the transcripts of trending YouTube videos, captured four times a day. The discussion focused on methods for highlighting keywords that stand out over a given period relative to the rest of the corpus, as well as on topic detection tools and on building and visualising networks.
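One classic way to surface words that stand out in a period relative to the rest of the corpus is a keyness score such as Dunning's log-likelihood (G2). The following minimal Python sketch implements that measure, assuming transcripts are already tokenised; it is one option among several (log-odds ratios or TF-IDF are alternatives) and not what the participants implemented.

```python
import math
from collections import Counter


def keyness(period_tokens, rest_tokens):
    """Words over-represented in a period versus the rest of the corpus,
    ranked by Dunning's log-likelihood (G2), highest first."""
    period, rest = Counter(period_tokens), Counter(rest_tokens)
    c, d = sum(period.values()), sum(rest.values())
    if c == 0 or d == 0:
        raise ValueError("both sub-corpora must contain tokens")
    scores = []
    for word, a in period.items():
        b = rest.get(word, 0)
        # Expected frequencies if the word were spread evenly across both sub-corpora.
        e1 = c * (a + b) / (c + d)
        e2 = d * (a + b) / (c + d)
        g2 = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b else 0.0))
        if a / c > b / d:  # keep only words relatively more frequent in the period
            scores.append((word, g2))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Log-likelihood tends to favour frequent words; setting a minimum frequency or combining it with an effect-size measure can help when very short periods are compared.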
Analysing central bank discourse
Support for a researcher and a post-doctoral researcher working on the textual analysis of central bank documents. The discussion first covered using the paragraph as the unit of argumentative analysis, since parsing and OCR of PDF documents are hard to optimise within a time-limited project. Next, a dataframe containing the phrases of interest was created. Further work is needed on extracting footnotes, on the regular expressions that identify phrases of interest, and on the qualitative annotation method for phrase-context sets.
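The paragraph-level approach can be sketched in a few lines of Python, assuming the text has already been extracted from the PDFs and that paragraphs are separated by blank lines (which real extractions do not always guarantee). The phrase pattern in the example is hypothetical. Each row pairs a matched phrase with its paragraph, ready to be loaded into a dataframe, for instance with pandas.DataFrame(rows).

```python
import re


def paragraphs(text):
    """Split extracted text into paragraphs on blank lines, joining wrapped lines."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def phrases_with_context(text, pattern):
    """One row per match: paragraph index, matched phrase, and the whole paragraph."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    rows = []
    for index, paragraph in enumerate(paragraphs(text)):
        for match in regex.finditer(paragraph):
            rows.append({"paragraph": index, "phrase": match.group(0), "context": paragraph})
    return rows


if __name__ == "__main__":
    # Hypothetical sample text and phrase of interest, for illustration only.
    sample = "Inflation expectations remain\nwell anchored.\n\nGrowth slowed.\n\nWe monitor inflation expectations."
    for row in phrases_with_context(sample, r"\binflation expectations?\b"):
        print(row["paragraph"], row["phrase"], "|", row["context"])
```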
Parliamentary debates as artistic installations
This was our second time working with a digital artist on a project to turn parliamentary debates into artistic installations, particularly musical ones, in a former washhouse in the Béarn region. She had made good progress since her previous MetAt session and came with two sets of questions: first, how to choose, present and articulate the various metrics generated from the debates; second, how to access data on MPs. For the latter, the supervisor put her in touch with the team at the Regards Citoyens association that runs the NosDéputés.fr website.
Mapping controversies using NLP tools, Europresse and Cortext
Support for two students who needed NLP tools to map controversies, after initial attempts with Atlas.ti had not been entirely satisfactory. The supervisor helped them export a Europresse corpus in HTML (instead of the PDF they had started with), extract terms, build a co-occurrence network with Cortext, compute a contingency matrix, and then produce a corpus demography and a blumpchart, also with Cortext.
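Cortext builds the co-occurrence network itself, with its own normalisation options. To make the underlying idea concrete, here is a minimal Python sketch that counts, for each pair of extracted terms, the number of articles in which both appear. The input format (one list of terms per article) is an assumption, and raw counts are used without any of Cortext's weighting.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, min_weight=1):
    """Count in how many documents each pair of terms appears together.

    `documents` is a list of term lists, one per article. Each pair is counted
    at most once per document; pairs are stored in alphabetical order.
    """
    weights = Counter()
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_weight}
```

The resulting pairs can be written out as a Source,Target,Weight edge list and opened in a tool such as Gephi; raising min_weight is a simple way to thin out a dense network.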
This initial introduction, carried out on a test corpus, met their needs well. The session ended with setting up a protocol for processing the complete corpora.
Finding paths from targets with Hyphe
Support for a master’s student who had built a Hyphe corpus (without ever having been trained on Hyphe) from 20 starting subreddits and wanted to see whether these opened a path to more toxic “incel-type” subreddits (he had 4 “target” subreddits). The methodology was somewhat flawed from the ground up (Reddit pages change a lot over time, so a page crawled at a given moment may have a very short shelf life), so they tried instead to find paths from the known targets (the 4 incel subreddits) back to the sources. This turned out to be quite hard, because Hyphe does not let you export links directly. They used graphology to parse a GEXF (Graph Exchange XML Format) export and take the first step: finding which nodes link to the incel subreddit nodes.
The Observable notebook with all the code is available here: https://observablehq.com/d/1a6806a0c9e1a6bb
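The original work used graphology in an Observable notebook. For readers working in Python, the same first steps can be sketched with the standard library alone: reading nodes and edges from a GEXF export, listing the nodes that link to target nodes, and finding the shortest directed path from each source to any target by breadth-first search. This is a swapped-in illustration, not a port of the notebook; it assumes that exported edges point from the linking web entity to the linked one and that node ids are unique, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque


def local_name(tag):
    """Drop the XML namespace, since GEXF versions use different ones."""
    return tag.rsplit("}", 1)[-1]


def read_gexf(gexf_data):
    """Return ({node_id: label}, [(source, target), ...]) from GEXF text or bytes."""
    root = ET.fromstring(gexf_data)
    labels, edges = {}, []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            labels[element.get("id")] = element.get("label", element.get("id"))
        elif name == "edge":
            edges.append((element.get("source"), element.get("target")))
    return labels, edges


def predecessors_of(edges, targets):
    """First step: nodes with an edge pointing to any target node."""
    targets = set(targets)
    return sorted({source for source, target in edges if target in targets})


def shortest_paths_to_targets(edges, sources, targets):
    """For each source, the shortest directed path (list of node ids) to any target, or None."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    targets = set(targets)
    paths = {}
    for start in sources:
        previous = {start: None}  # BFS tree: node -> node it was reached from
        queue = deque([start])
        found = None
        while queue:
            node = queue.popleft()
            if node in targets:
                found = node
                break
            for nxt in successors.get(node, []):
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        if found is None:
            paths[start] = None
            continue
        path = [found]
        while previous[path[-1]] is not None:
            path.append(previous[path[-1]])
        paths[start] = path[::-1]
    return paths
```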
The solution here is probably to look at the users of the incel subreddits and see which other subreddits they posted in over time. There is arguably little point in using Hyphe in that context when you can simply track which spaces users engage with.
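The user-centred approach can be sketched as follows, assuming post records (author, subreddit, timestamp) have already been collected for the users of the target subreddits; how to collect them is out of scope here, and the function name is hypothetical. For each other subreddit, the sketch counts how many of these users posted there before and after their first post in a target subreddit, which gives a rough view of trajectories over time.

```python
from collections import defaultdict


def subreddits_before_and_after(posts, target_subreddits):
    """For users active in any target subreddit, count the other subreddits they
    posted in before and after their first post in a target subreddit.

    posts: iterable of (author, subreddit, timestamp); timestamps only need to be
    comparable. Each user counts at most once per subreddit and period.
    """
    targets = {name.lower() for name in target_subreddits}
    records = [(author, subreddit.lower(), ts) for author, subreddit, ts in posts]
    first_target_post = {}
    for author, subreddit, ts in records:
        if subreddit in targets:
            if author not in first_target_post or ts < first_target_post[author]:
                first_target_post[author] = ts
    before, after = defaultdict(set), defaultdict(set)
    for author, subreddit, ts in records:
        if author in first_target_post and subreddit not in targets:
            # Posts at the same time as the first target post count as "after".
            bucket = before if ts < first_target_post[author] else after
            bucket[subreddit].add(author)
    return ({name: len(users) for name, users in before.items()},
            {name: len(users) for name, users in after.items()})
```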