Ethical Scaling for Content Moderation: Extreme Speech and the (In)Significance of Artificial Intelligence

In this paper, we present new empirical evidence to demonstrate the near impossibility for existing machine learning content moderation methods to keep pace with, let alone stay ahead of, hateful language online. We diagnose the technical shortcomings of the content moderation and natural language processing approach as emerging from a broader epistemological trapping wrapped in the liberal-modern idea of the ‘human,’ and provide the details of the ambiguities and complexities of annotating text as derogatory or dangerous, in a way to demonstrate the need for persistently involving communities in the process. This decolonial perspective of content moderation and the empirical details of the technical difficulties of annotating online hateful content emphasize the need for what we describe as “ethical scaling”. We propose ethical scaling as a transparent, inclusive, reflexive and replicable process of iteration for content moderation that should evolve in conjunction with global parity in resource allocation for moderation and addressing structural issues of algorithmic amplification of divisive content. We highlight the gains and challenges of ethical scaling for AI-assisted content moderation by outlining distinct learnings from our ongoing collaborative project, AI4Dignity.


Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments.

Building on current work on multilingual hate speech (e.g., Ousidhoum et al. (2019)) and hate speech reduction (e.g., Sap et al. (2020)), we present XTREMESPEECH, a new hate speech dataset containing 20,297 social media passages from Brazil, Germany, India and Kenya. The key novelty is that we directly involve the affected communities in collecting and annotating the data – as opposed to giving companies and governments control over defining and combatting hate speech. This inclusive approach results in datasets more representative of actually occurring online speech and is likely to facilitate the removal of the social media content that marginalized communities view as causing the most harm. Based on XTREMESPEECH, we establish novel tasks with accompanying baselines, provide evidence that cross-country training is generally not feasible due to cultural differences between countries and perform an interpretability analysis of BERT’s predictions.


AI4Dignity Policy Brief: Artificial Intelligence, Extreme Speech, and the Challenges of Online Content Moderation.

In this policy brief, we will outline the challenges facing AI-assisted content moderation efforts, and how the collaborative coding framework proposed by the ERC Proof-of-Concept project “AI4Dignity” offers a way to address some of the pertinent issues concerning AI deployment for content moderation.


Artificial intelligence and the cultural problem of online extreme speech.

A short foray into an AI-based platform’s effort to tackle hate speech reveals its promise, but also the enormous inherent challenges of language and context. Debunking the “magic wand” vision of AI moderation, Sahana Udupa calls for a collaborative approach between developers and critical communities.


Decoding Hate podcast

This six-episode podcast series, hosted by Katie Pentney, a Canadian lawyer specializing in human rights, explores the interplay between freedom of expression, hate speech and artificial intelligence (AI). In episode 5, “Moderating Global Voices,” she sits down with Sahana Udupa to talk about the contextual challenges of fighting extreme speech, the need for broader perspectives in content moderation, and her exciting AI4Dignity project.


Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP

Based on our findings, Timo Schick, Sahana Udupa and Hinrich Schütze propose a decoding algorithm that reduces the probability of a model producing problematic text given only a textual description of the undesired behaviour. This algorithm does not rely on manually curated word lists, nor does it require any training data or changes to the model’s parameters. While our approach by no means eliminates the issue of language models generating biased text, we believe it to be an important step in bringing scalability to people-centric moderation.


Click here for more publications from the For Digital Dignity research group.

TV Interview Brazil: Hate speech moderation and social media

In August 2022, Brazilian television channel TV Cultura interviewed Sahana Udupa for its program on hate speech moderation and social media. Udupa highlighted the limitations of AI-assisted content moderation and how culturally coded expressions tend to escape content filters. She discussed the findings of AI4Dignity, a European Research Council-funded proof-of-concept project, which has created a collaborative process model for involving communities in bringing cultural and contextual nuance to machine learning models. The program is available here.

