An In-depth Analysis of the Linguistic Characteristics of Science Claims on the Web and their Impact on Fact-checking
Salim Hafid, Sebastian Schellhammer, Yavuz Selim Kartal, Thomas Papastergiou, Stefan Dietze, Sandra Bringay, Konstantin Todorov
Web claims, understood as assertions shared on the web and eligible for fact-checking, are at the heart of online discourse. They have been studied extensively through a variety of downstream tasks such as fact-checking, claim retrieval, bias detection, argument mining, and viewpoint discovery. In parallel, claims originating from scientific publications have been the subject of several downstream NLP tasks. However, research carried out so far has yet to focus on scientific web claims, i.e., scientific claims made on the web (e.g., on social media or in news articles). The process of detecting and fact-checking a claim from the web can differ substantially depending on whether the claim is scientific, making it crucial for datasets, methods, and models to distinguish between the two. In this work, we aim to understand what makes this distinction necessary by analyzing the linguistic differences between scientific and non-scientific claims on the web and the impact those differences have on existing downstream tasks. To do so, we manually annotate 1,524 web claims from established benchmarks for fact-checking-related tasks, and we run statistical tests to analyze and compare the linguistic features of each group. We find that scientific claims on the web use more analytical language, but also more sentiment-laden language and more expressions of physical motion, and exhibit distinct part-of-speech (PoS) and punctuation patterns. We also conduct experiments showing that BERT-based language models perform worse on scientific web claims, by up to 17 F1 points, across several downstream tasks. To understand why, we develop a novel methodology that maps the predictive tokens of language models to explainable linguistic features, and we find that language models fail to detect a specific subset of the features that are predictive of scientific web claims. We conclude that language models aimed at studying scientific web claims ought to be trained on scientific web discourse, rather than only on generic web discourse or only on text from scientific publications.
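To make the group-wise comparison concrete, the sketch below illustrates one way such statistical tests could be run over per-claim linguistic feature scores. The abstract does not name a specific test or feature set, so the CSV layout, the feature names, and the choice of the Mann-Whitney U test are illustrative assumptions rather than the paper's actual procedure.

```python
# Minimal sketch: compare per-claim linguistic features between scientific
# and non-scientific web claims. File name, column names, and feature set
# are hypothetical placeholders.
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical input: one row per annotated claim, with a binary label
# ("scientific" vs. "non-scientific") and numeric feature columns
# (e.g., LIWC-style scores for analytical language, sentiment, motion).
claims = pd.read_csv("annotated_claims.csv")
features = ["analytic", "sentiment", "motion", "punctuation_rate"]

sci = claims[claims["label"] == "scientific"]
non_sci = claims[claims["label"] == "non-scientific"]

for feat in features:
    # Mann-Whitney U makes no normality assumption, which suits the
    # typically skewed distributions of per-claim feature scores.
    stat, p = mannwhitneyu(sci[feat], non_sci[feat], alternative="two-sided")
    print(f"{feat}: U={stat:.1f}, p={p:.4g}")
```

In practice, a correction for multiple comparisons (e.g., Bonferroni) would be applied when testing many features at once.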