Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments.

Building on current work on multilingual hate speech (e.g., Ousidhoum et al. (2019)) and hate speech reduction (e.g., Sap et al. (2020)), we present XTREMESPEECH, a new hate speech dataset containing 20,297 social media passages from Brazil, Germany, India and Kenya. The key novelty is that we directly involve the affected communities in collecting and annotating the data – as opposed to giving companies and governments control over defining and combatting hate speech. This inclusive approach results in datasets more representative of actually occurring online speech and is likely to facilitate the removal of the social media content that marginalized communities view as causing the most harm. Based on XTREMESPEECH, we establish novel tasks with accompanying baselines, provide evidence that cross-country training is generally not feasible due to cultural differences between countries and perform an interpretability analysis of BERT’s predictions.


AI4Dignity Policy Brief: Artificial Intelligence, Extreme Speech, and the Challenges of Online Content Moderation.

In this policy brief, we will outline the challenges facing AI-assisted content moderation efforts, and how the collaborative coding framework proposed by the ERC Proof-of-Concept project “AI4Dignity” offers a way to address some of the pertinent issues concerning AI deployment for content moderation.


Artificial intelligence and the cultural problem of online extreme speech.

A short foray into an AI-based platform’s effort to tackle hate speech reveals its promise, but also the enormous inherent challenges of language and context. Debunking the “magic wand” vision of AI moderation, Sahana Udupa calls for a collaborative approach between developers and critical communities.


Decoding Hate podcast

This six-episode podcast series, hosted by Katie Pentney, a Canadian lawyer specializing in human rights, explores the interplay between freedom of expression, hate speech and artificial intelligence (AI). In episode 5, “Moderating Global Voices,” she sits down with Sahana Udupa to talk about the contextual challenges of fighting extreme speech, the need for broader perspectives in content moderation, and Udupa’s AI4Dignity project.


Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP

Based on their findings, Timo Schick, Sahana Udupa and Hinrich Schütze propose a decoding algorithm that reduces the probability of a model producing problematic text given only a textual description of the undesired behaviour. This algorithm does not rely on manually curated word lists, nor does it require any training data or changes to the model’s parameters. While the approach by no means eliminates the issue of language models generating biased text, the authors believe it to be an important step towards bringing scalability to people-centric moderation.

