Data engineer position (2 years) on the spread of online misinformation
The Sciences Po médialab is hiring a research engineer to supply two projects on the spread of online misinformation with tools and data (2-year position).
This position is part of two projects both focused on misinformation online: 75% of the time will be dedicated to the DE FACTO project and 25% to the SPSM project.
About the projects
The DE FACTO project, led by Dominique Cardon, is a collaboration between médialab, AFP, CLEMI and xWiki, funded by the European Commission for 3 years. The goal of the project is to develop France’s national EDMO hub, meant to observe, analyze and respond to online misinformation through the collaboration of journalists, academics and educators. On its research side, the project aims to produce detailed, independent analyses of the themes, actors and propagation pathways of online misinformation in France, and to measure its impact on public opinion and on the agenda and regulation of news outlets.
The SPSM project is led by Achim Edelmann and funded by the McCourt Institute for 3 years. It intends to study the role of scientific and political actors in the spread of misinformation in the U.S. and France on both public and private online social networks. Public platforms will be analyzed through innovative Social Media Analysis and NLP techniques while private messaging will be tackled by implementing a set of randomized controlled experiments to test how forms of scientific and political endorsements curb or foster the sharing of misinformation.
Key missions of the position
The primary mission of this position is to develop and improve software tools, to conduct data collection operations on a wide variety of digital platforms, and to participate in the analysis of the collected data. The data engineer will take part in the following activities:
- aggregate, enrich and maintain a live database of fake news articles, built from a variety of curated sources such as DeFacto’s fact-checking media partners, Facebook’s fact-checking partners and other US fact-checking agencies;
- enrich and develop médialab’s open source tools catalog to allow wide tracking, harvesting and text analysis of data from social media (Facebook, Twitter, YouTube, Instagram, TikTok…) and the public web;
- track, retrieve and process social reactions (comments, shares, likes…) to fake news articles on the different public platforms;
- run NLP/ML analyses, event detection algorithms and network analyses on the articles and their reactions to evaluate, visualize and measure the circulation, endorsements and impacts of misinformation online;
- implement randomized controlled web/online experiments to test how endorsements from social institutions curb or foster the sharing of misinformation across public and private messaging platforms (WhatsApp, Telegram…);
- contribute to the preparation and writing of academic articles and reports on misinformation spread using all of the above.
You can check examples of our existing data collection and analysis tools on our GitHub.
Desired skills & experience
- Python is our team’s preferred programming language; other languages are welcome, and R would be a plus
- Writing open source code collaboratively with git
- Experience working with web data collection (collection tools, scraping, APIs)
- Experience working with large datasets on remote servers
- Experience using Python and/or R NLP and ML libraries to extract themes and classify or categorize contents and their authors
- Experience with social network data and analysis and/or graph theory
- Some experience with indexing technologies such as Elasticsearch is appreciated
- Fluent French and English
Minimum experience required: 2 years
You will join the tech team at médialab SciencesPo (Paris 7) and work with researchers, software engineers, designers, and data scientists. This is a good opportunity to learn and contribute to a wide range of open source data science tools in the domains of data collection, natural language processing, machine learning and network analysis, with applications to sociopolitical analysis.
Research engineers at médialab are encouraged to participate as co-authors in the production of academic articles and to lead their own publications if they wish to do so.
This is a 24-month position, starting as early as possible depending on availability.
Competitive salary depending on experience, complemented with full health insurance, 40 days of paid leave per year (+5 RTT) and meal vouchers. Remote work is possible 2 days per week (naturally extendable during pandemic times).
Send a CV and a cover letter detailing your motivation and relevant skills for the position, as well as links to code and/or scientific publications, by June 1st, 2022 to the following email address, with “[DEFACTO-SPSM]” in the subject line: email@example.com
The start date can be as early as July 2022, with some flexibility. Please indicate in your application the tentative date from which you would be available to start.