Beyond Fake News Detection: a Community-based Study of the Multicultural Nature of Information Disorder

Gemelli, Sara; Giulia Di Cristina; Zhang, Yiran; Md Azizul Hoque; Alberto De La Torre Solís; Mohamad Mojtaba Behboudi Eshkiki; Efimov, Nikolai; Everstova, Mariia; Caterina Maria Cappello; Maziar Kianimoghadam Jouneghani; Latifi, Payam; Mahboudi, Yashar; Mohseni, Farzaneh; Placenti, Dario; Caselli, Tommaso; Sanguinetti, Manuela; Scarpellini, Aurora; Zanchi, Chiara; Naseem, Usman; Marco Antonio Stranisci; Frenda, Simona

doi:10.63317/4iyhqziwo6ri

Recognizing disinformation is a challenging task for both humans and AI systems. News can be false, misleading, or harmful, and its interpretation often depends on the cultural context of the audience. However, existing datasets rarely account for these contextual and cultural differences, as they are typically not designed from the perspective of news consumers. To address this gap, we present the Information Disorder (InDor) corpus, a multilingual dataset of news articles in English, Farsi, Italian, and Russian, annotated for information disorder detection and explanation. The corpus was developed through a participatory process involving contributors from diverse cultural and professional backgrounds, who took part in data collection, annotation, and the evaluation of Large Language Model (LLM) performance on the task. Our findings show that false and manipulated news manifests differently across cultural settings, and that current LLMs fail to adequately capture this complexity. This underscores the need for culturally aware computational approaches to the study of information disorder. Additional material and the InDor dataset are available in the GitHub repository: https://github.com/citizen-dataset/InDor. WARNING: The InDor corpus may contain content that is offensive, including racist, sexist, or violent language.