Personalized prediction and machine learning methods in tools for web computation

Dominique Cardon


September 2017

Cardon, Dominique. 2017. "Personalized prediction and machine learning methods in tools for web computation." Paper presented at the International Conference “Governing by Prediction? Models, data and algorithms in and for governance”, Université Paris Est, France, September 11-13.

One of the main characteristics of the modes of computation known as big data concerns the generalization of machine learning methods. These methods offer to calculate society in a way that does not match the requirements of centrality, univocity and generality of statistical methods, which distribute individuals around a statistical mean. Techniques of personalized prediction do away with any form of totalization in the representation of society. They break with traditional statistical methods. They do not aim to produce a central measure embracing all individual situations, but to measure from every situation a personalized totality. They do not manufacture a univocal representation, but a modular one that varies depending on individual positions. They do not aim to produce a shared generality overcoming all statistical individuals, but aim for more localized truths. They claim for themselves a new capacity to produce personalized predictions.
As the historiography of statistics has shown, the deployment of the vast apparatuses used to quantify societies emerged hand in hand with the probabilistic understanding that, even if social phenomena were not ruled deterministically, it was nevertheless possible to interpret society based on observable regularities. As Ian Hacking has shown, the development of statistics cannot be disconnected from the rise of democratic, liberal societies, in which, for institutions seeking to govern, individual freedom and autonomy are offset by the production of objectively derived regularities. The probabilistic paradigm thus replaced natural laws and their inherent causalism, offering a technique of uncertainty reduction which, by the end of the vast program of classification and categorization of populations, produced an image of the distribution and regularities of more or less statistically normal behaviors.
The vast enterprise of recording, quantifying and measuring society that unfolded in the nineteenth century thus made it possible to establish the credibility of social statistics and an overall trust in numbers. Practically, through investment in and maintenance of a codified system of regular recording, and epistemologically, through the distribution of statistical occurrences around mean values, the frequentist method in social statistics helped make “constant causes” more robust. Embedded in institutional and technical apparatuses, these causes acquired a kind of exteriority. They became the trusted basis on which one could establish correlations about nearly any social phenomenon, and infer causes too. So-called Bayesian techniques, long marginalized in the history of statistical methods, now offer to reopen a different method and model, that of the probability of causes, making it possible once again for “accidental” rather than “constant” causes to become the basis of new sorts of statistical inference. To broadly characterize the historical turn that is occurring with the advent of this new mode of statistical reasoning, one could say that it replaces the normal distribution with the “empty matrix”. In digitized environments, the proliferation of data recordings leads to a massive increase in the number of variables available for computation. Even if the matrices within which those variables are computed remain empty, calculations continue to follow the notion that, in certain contexts, rare and improbable variables may have some effect on some correlations. This paradigm thus revives inductive techniques of data analysis and avoids reducing and stabilizing the space of relevant variables. Causes thus become inconstant and are combined by the computer in changing ways, depending on the local objectives imposed by the various users that seek to predict their environment.
This shift towards personalized prediction implies that the causes of individual behaviors become much more uncertain. Depending on the context, the recording of multiple, disparate behaviors may in certain circumstances produce a causality sufficient to explain, in a relevant manner, the acts of individuals.