
MetAt - December 10, 2024 logbook

Share our methodological expertise and skills.

Event, Workshop

Campus Grands Moulins, Université Paris Cité

NOTA BENE



What is METAT?


METAT is a research methods support workshop: every month, a three-hour slot to help you resolve the methodological difficulties you encounter in the course of a scientific project. 

Who is METAT for?


METAT is aimed at anyone needing occasional support in using a research tool or method. All profiles are welcome: students, doctoral students, researchers, research engineering professionals and others, inside and outside Sciences Po, with no restrictions on status or affiliation.

How to register?


Registration is compulsory via the form available on the METAT page.


Session of 10/12/2024

Location: Campus Grands Moulins, Université Paris Cité, Hall C of the Halle aux Farines building, Esplanade Pierre Vidal-Naquet, 75013 Paris.

Supervisors: Guillaume Plique, Maxime Crépel, Charlotte Dion, Adèle Etaix, Diego Antolinos Basso, Benjamin Ooghe-Tabanou, Julien Pontoire, Carlo Santagiustina, Béatrice Mazoyer, Blazej Palat, Audrey Baneyx, Sandra Hamiche.

Internet data collection

Support for a student seeking to collect and process data on bedbugs from forums and Facebook groups, bearing in mind that these are short and often anonymous texts. A first collection attempt was made with the médialab tool minet, a Python library and command-line web mining tool, but the relevant scraper turned out to be obsolete. A second scraping attempt was then made on another source, Doctissimo posts, using the shelob tool. Methodological support was finally provided for manual data collection using spreadsheets.
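When a dedicated scraper goes obsolete, the fallback is often a small hand-rolled extractor. As a minimal sketch only (the forum markup and the `post` class name below are hypothetical, and this is not the minet or shelob code), post texts can be pulled out of fetched HTML with the standard library:

```python
from html.parser import HTMLParser

class PostExtractor(HTMLParser):
    """Collect the text of <div class="post"> elements (hypothetical forum markup)."""
    def __init__(self):
        super().__init__()
        self.in_post = False
        self.posts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "post") in attrs:
            self.in_post = True
            self.posts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.in_post:
            self.in_post = False

    def handle_data(self, data):
        if self.in_post:
            self.posts[-1] += data.strip()

html = ('<div class="post">Bitten twice this week</div>'
        '<div class="ad">x</div>'
        '<div class="post">Try heat treatment</div>')
parser = PostExtractor()
parser.feed(html)
print(parser.posts)  # ['Bitten twice this week', 'Try heat treatment']
```

For real forum pages the markup must of course be inspected first; the point is only that short, flat posts lend themselves to this kind of lightweight extraction.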

Analysis of parliamentary and political speeches on radicalization 

Support for a Master's student in International Relations wishing to analyze, in Python, a corpus of political speeches on radicalization, in particular at the French National Assembly, taking into account the psychological dimension of the radicalization phenomenon. The supervisor introduced her to the basics of natural language processing (NLP). Various data sources were considered beforehand, such as the Vie publique website, in addition to those already in the student's possession (an Excel file of debates). An encoding problem was encountered with the XML; it will need to be resolved at export time. The structure of the two databases was discussed. Finally, once the database had been completed, a processing test was carried out on CorText.
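Encoding problems of this kind are usually avoided by declaring the encoding explicitly when the XML is exported. A minimal standard-library sketch (the element names and file name are hypothetical, not the student's actual schema):

```python
import xml.etree.ElementTree as ET

# Build a tiny debate record (structure invented for illustration).
root = ET.Element("debates")
speech = ET.SubElement(root, "speech", speaker="Députée A")
speech.text = "Texte avec accents : é, è, ç."

# Writing with an explicit encoding and XML declaration avoids
# mojibake when the file is re-read by another tool.
ET.ElementTree(root).write("debates.xml", encoding="utf-8", xml_declaration=True)

# Reading back: the parser honours the declared encoding.
reread = ET.parse("debates.xml").getroot()
print(reread[0].text)  # Texte avec accents : é, è, ç.
```

The same principle applies when exporting from Excel: choosing UTF-8 at export time is far easier than repairing mis-decoded accents downstream.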

Cleaning and analysis of data collected from insta poets

Support for a researcher wishing to map online poets, or “insta poets”, using a methodology at the crossroads of digital literature studies, information and communication sciences, sociology and contemporary history. The researcher had data in spreadsheet format drawn from the results of a Google Forms questionnaire with many open-ended questions, and needed to clean and analyze it easily, for example with statistics or networks. The supervisors began by introducing OpenRefine's features for cleaning and exploring the values in the data file. They then looked together at Table2Net and Nansi for network analysis. Iramuteq was also suggested for exploratory textual analysis of the respondents' answers. The multiplicity of tools presented opens up many possibilities for data processing; the key is to keep the initial working hypotheses in mind and to take the time to discover and learn the existing processing methods.
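The trimming and case normalization that OpenRefine automates for open-ended answers can be sketched in a few lines of standard-library Python (the answers below are invented for illustration):

```python
from collections import Counter

# Hypothetical open-ended answers, with the inconsistencies that
# OpenRefine's faceting and clustering are designed to catch.
answers = ["Instagram ", "instagram", " INSTAGRAM", "Tumblr", "tumblr "]

def normalize(value: str) -> str:
    # Same spirit as OpenRefine's "trim whitespace" + case folding.
    return value.strip().casefold()

counts = Counter(normalize(a) for a in answers)
print(counts)  # Counter({'instagram': 3, 'tumblr': 2})
```

For a few hundred questionnaire rows OpenRefine's interactive facets remain more comfortable; a script like this mainly helps when the cleaning must be re-run as new responses arrive.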

X (formerly Twitter) data collection on sexual violence

Support for a student seeking to collect data on sexual violence from social networks, in particular tweets mentioning a specific user over a given period. The supervisors helped collect and process the data from X. As X's API is no longer free, a tool developed by the Digital Methods Initiative (DMI) called Zeeschuimer was used. Supervisors and student downloaded and installed Firefox and the Zeeschuimer extension. Using Zeeschuimer, several hundred tweets were collected and downloaded in .ndjson format. The .ndjson data was then converted to a tabular .csv format using Konbert. Finally, WSL, Cargo and xan were installed to explore the data and perform lexicometric and statistical analyses. Intermediate and final results were saved in separate .csv files for further analysis by the student, who was also trained to run all the xan commands so as to carry out similar analyses independently.
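The .ndjson-to-.csv conversion that Konbert performed can also be done locally with the standard library, which matters when the data is sensitive. A sketch with invented field names (not X's actual schema):

```python
import csv
import io
import json

# Hypothetical .ndjson excerpt: one JSON object per line, as
# Zeeschuimer exports (field names here are illustrative only).
ndjson = io.StringIO(
    '{"id": "1", "user": "alice", "text": "first tweet"}\n'
    '{"id": "2", "user": "bob", "text": "second tweet"}\n'
)

# Parse each non-empty line as one record.
rows = [json.loads(line) for line in ndjson if line.strip()]

# Write the records out as CSV, one column per JSON key.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Real Zeeschuimer records are deeply nested, so the keys to keep would have to be flattened or selected first; the line-by-line structure of .ndjson is what makes this streaming-friendly.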

Data collection on bots

Support for a Ph.D. student who wanted to identify bots and trolls in several social network corpora as part of his/her thesis on RT France's audiences. During a previous METAT session (and elsewhere), he/she had collected comments concerning RT France from social networks (Twitter, Facebook, Telegram, YouTube, Odysee, etc.) and was looking to identify bots and trolls in these corpora. The supervisor and the Ph.D. student brainstormed together on different ways of identifying bots. Information on users' YouTube profiles was collected via minet. A discussion was held about different tools for processing text (CorText) and images (Panoptic), and about methods for learning Python. The Ph.D. student will test Panoptic on his/her images and try to implement the bot detection indicators mentioned.
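One of the simplest indicators commonly used in bot detection is abnormally regular posting cadence. A toy sketch with invented timestamps (not necessarily one of the indicators retained in the session):

```python
from statistics import pstdev

# Hypothetical posting timestamps in seconds. Humans post at
# irregular intervals; simple bots often post on a fixed schedule.
human_posts = [0, 40, 310, 900, 905, 2400]
bot_posts = [0, 60, 120, 180, 240, 300]

def interval_regularity(timestamps):
    """Standard deviation of inter-post gaps: near 0 means
    suspiciously clockwork-like posting."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(gaps)

print(interval_regularity(bot_posts))    # 0.0
print(interval_regularity(human_posts))  # large (irregular gaps)
```

A single indicator is never conclusive; in practice several (cadence, account age, duplicate text, profile completeness) are combined before labeling an account.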

Exploratory data analysis 

Support for a Ph.D. student wishing to be trained in ATLAS.ti to code and analyze his/her data. The supervisor guided the student through the installation of the R programming environment, including the package needed for some exploratory analysis of tabular data. The dataset was then formatted for geometric data analysis, using multiple and joint correspondence analysis methods. Finally, a discussion was held on the exploration of the tabular data in relation to the other sources of information in the thesis.

Data coding and security in sociology of health

Help provided to a Ph.D. student seeking to develop a data analysis grid in the sociology of health, as well as advice on the best way to guarantee medical confidentiality and the rights and freedoms of the patients and doctors observed and interviewed in the course of his/her research. From a methodological and theoretical point of view, he/she was also seeking to better articulate his/her position as a researcher suffering from the same pathology as his/her interviewees.

The Ph.D. student's data anonymization procedure was verified and validated. The supervisors nevertheless pointed out that the data collected was all the more sensitive in that it touched on medical confidentiality: the research method is based on ethnography, combining observations during medical consultations, interviews with patients and doctors (identifiable by their words and because the name of the hospital is mentioned as a case study) and questionnaires. He/She was advised to consult his/her university's ethics committee, a step already taken, and to avoid storing the data on unencrypted USB keys or external hard drives. The data should also be deleted within a reasonable time after the defense.

Secondly, although a draft data analysis grid had already been drawn up, the data was only available in summary form in Excel, so the supervisors trained him/her in the creation of pivot tables. Finally, a number of bibliographical recommendations were made, particularly in the fields of cultural studies, studies of sub- and counter-cultures, feminist studies and interpretive anthropology, on the relationship between researchers and their respondents and research subject.
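For readers without Excel at hand, the group-and-count at the heart of a pivot table can be reproduced with the standard library (the records below are invented and already anonymized):

```python
from collections import Counter

# Hypothetical, already-anonymized observation records.
rows = [
    {"role": "patient", "site": "hospital A"},
    {"role": "doctor",  "site": "hospital A"},
    {"role": "patient", "site": "hospital B"},
    {"role": "patient", "site": "hospital A"},
]

# Equivalent of an Excel pivot table counting rows by (role, site).
pivot = Counter((r["role"], r["site"]) for r in rows)
for (role, site), n in sorted(pivot.items()):
    print(f"{role:8s} {site:12s} {n}")
```

Scripted pivots also have a privacy advantage here: the raw rows never need to leave an encrypted working directory to produce the summary tables.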