In this policy brief, we outline the challenges facing AI-assisted content moderation and show how the collaborative coding framework proposed by the ERC Proof-of-Concept project “AI4Dignity” offers a way to address some of the pertinent issues in deploying AI for content moderation.
A short foray into an AI-based platform’s effort to tackle hate speech reveals its promise, but also the enormous inherent challenges of language and context. Debunking the “magic wand” vision of AI moderation, Sahana Udupa calls for a collaborative approach between developers and critical communities.
This six-episode podcast series, hosted by Katie Pentney, a Canadian lawyer specializing in human rights, explores the interplay between freedom of expression, hate speech and artificial intelligence (AI). In episode 5, “Moderating Global Voices”, she sits down with Sahana Udupa to talk about the contextual challenges of fighting extreme speech, the need for broader perspectives in content moderation, and her AI4Dignity project.
Based on our findings, Timo Schick, Sahana Udupa and Hinrich Schütze propose a decoding algorithm that reduces the probability of a model producing problematic text given only a textual description of the undesired behaviour. This algorithm does not rely on manually curated word lists, nor does it require any training data or changes to the model’s parameters. While our approach by no means eliminates the issue of language models generating biased text, we believe it to be an important step in bringing scalability to people-centric moderation.
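
To make the idea of a decoding-time adjustment more concrete, the following is a minimal sketch of one way such a rescaling step could look. It assumes two next-token distributions are already available: one from the plain prompt, and one from the same prompt prefixed with a textual description of the undesired behaviour. The function name, the exponential scaling rule and the `decay` constant are illustrative assumptions and are not taken from the description above.

```python
import math


def rescale_next_token_probs(p_plain, p_with_description, decay=50.0):
    """Down-weight tokens that become MORE likely once the prompt is
    prefixed with a description of the undesired behaviour.

    p_plain:            dict token -> probability given the plain prompt
    p_with_description: dict token -> probability given description + prompt
    decay:              hypothetical strength constant (an assumption)
    """
    adjusted = {}
    for token, p in p_plain.items():
        # Negative delta: the description makes this token more likely,
        # which is treated here as a signal that the token is undesired.
        delta = p - p_with_description.get(token, 0.0)
        scale = 1.0 if delta >= 0 else math.exp(decay * delta)
        adjusted[token] = p * scale

    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("all probability mass was removed")
    # Renormalise so the result is again a probability distribution.
    return {token: value / total for token, value in adjusted.items()}


# Toy distributions over a three-word vocabulary (made-up numbers).
plain = {"kind": 0.5, "rude": 0.3, "okay": 0.2}
described = {"kind": 0.2, "rude": 0.7, "okay": 0.1}
debiased = rescale_next_token_probs(plain, described)
```

In this toy case, "rude" becomes more likely under the description, so its probability is pushed towards zero and the remaining mass is redistributed over "kind" and "okay". No word list, training data or parameter update is involved; only the output distribution is changed at decoding time.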