La cartographie des traces textuelles comme méthodologie d’enquête en sciences sociales (Mapping textual traces as a method of inquiry in the social sciences)
Jean-Philippe Cointet
Publications – Thesis
This manuscript is situated in the trading zone where big data and social sciences meet. In this instance, “big data” refers to two intertwined transformations. One is the wealth of digital traces, often produced online, that allow us to trace individual behaviors at scales and resolutions never seen before. The other is the blossoming of new analytical tools inspired by machine learning. We focus on a very particular kind of data analysis in social sciences, namely automatic content analysis. This research report starts with a retrospective look at the history of content analysis methods for social sciences: the matrix operations of factorial methods are detailed, the sociological assumptions underlying co-word analysis are discussed, and the practice of sociological investigation with Prospero is described, among others. A general typology is introduced to distinguish these approaches in terms of the sociological theories they build on, their strategies for modeling enunciation, and the modes of calculation and intelligibility of the social that they open up. Using the same grid, the more recent approaches from artificial intelligence and computer science are analyzed, in particular topic modeling and word embedding. In the second chapter, network mapping is defended as a method in its own right and systematically compared with the other approaches. The last chapter is an opportunity to examine how digital traces produced online are likely to change the way social scientists conduct empirical investigations of textual corpora. How are the notions of speaker and enunciation, and more broadly the very epistemology of social science practice, shifted by the advent of digital traces?
Combining historical critical analysis and methodological description, this original manuscript also includes numerous references to empirical projects carried out during the last eight years, which illustrate the diversity of the practice of corpus analysis.