In an increasingly online world, content moderation on social media has become immensely important. However, existing hate speech detection systems are riddled with racial biases introduced during annotation, which are reinforced and propagated by models trained on such data. In this talk, I will first present the inadequacies of current methods for debiasing hate speech detection. I will show how the subjectivity inherent in this task's design leads to debiasing failures. Next, I will focus on uncovering the origins of bias in toxic language detection. I will demonstrate how annotators' demographics and beliefs influence their toxicity ratings, and how ignoring such societal context can lead to biased outcomes. Overall, I will argue for the value of rethinking the traditional hate speech classification task, and for the need for richer context in hate speech datasets.
Swabha Swayamdipta is an Assistant Professor of Computer Science and a Gabilan Assistant Professor at the University of Southern California. Her research interests lie in natural language processing and machine learning, with a primary focus on estimating dataset quality, semi-automatically collecting impactful data, and evaluating how human biases affect dataset construction and model decisions. At USC, Swabha leads the Data, Interpretability, Language and Learning (DILL) Lab. She received her PhD from Carnegie Mellon University and was subsequently a postdoc at the Allen Institute for AI. Her work has received outstanding paper awards at ICML 2022 and NeurIPS 2021, as well as an honorable mention for the best overall paper at ACL 2020.