University of Waterloo researchers have developed a new machine learning method that can detect hate speech on social media platforms with 88% accuracy, saving employees hundreds of hours of emotionally damaging work.
The method, called the Multi-Modal Discussion Transformer (mDT), differs from previous hate speech detection methods in that it can both understand the relationship between text and images and place comments within the larger context of a discussion. This is especially useful for reducing false positives, in which culturally sensitive language is incorrectly flagged as hate speech.
“We really hope that this technology will help reduce the emotional cost of having humans manually screen hate speech,” said Liam Hebert, a Waterloo computer science PhD student and first author of the study. “We believe that taking a community-centric approach to AI applications can help create safer online spaces for everyone.”
Researchers have been building models to analyze the meaning of human conversation for years, but historically these models have struggled to understand nuanced conversations or contextual statements. Previous models were able to identify hate speech with an accuracy of only about 74%, compared with the 88% achieved in the Waterloo study.
“Context is very important when it comes to understanding hate speech,” Hebert said. “For example, a comment like ‘That’s disgusting!’ may be harmless in itself, but its meaning changes dramatically depending on whether it is a response to a photo of a pineapple pizza or a photo of a person from a marginalized group.
“Understanding these distinctions is easy for humans, but training a model to understand the contextual connections in a discussion, including the images and other multimedia elements within it, is actually a very difficult problem.”
Unlike previous efforts, the Waterloo team built and trained their model on a dataset comprising not only isolated hateful comments but also the context of those comments. The model was trained on 8,266 Reddit discussions containing 18,359 labelled comments from 850 communities.
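The core idea — that the same comment can be harmless or hateful depending on the discussion it replies to — can be sketched in miniature. The toy code below is purely illustrative and is not the mDT architecture: the tiny embeddings, the averaging "fusion," and the linear score are all invented stand-ins for the text, image, and graph transformers the study actually combines.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy fixed-size "embeddings" stand in for the outputs of real pretrained
# text and image encoders; the dimension and values here are arbitrary.
DIM = 4

@dataclass
class Comment:
    text_emb: List[float]                     # embedding of the comment text
    image_emb: Optional[List[float]] = None   # embedding of an attached image, if any
    parent: Optional["Comment"] = None        # link up the discussion tree

def fuse(comment: Comment) -> List[float]:
    """Fuse text and image embeddings for one comment (a simple average
    here; the real model uses transformer-based multi-modal fusion)."""
    if comment.image_emb is None:
        return comment.text_emb
    return [(t + i) / 2 for t, i in zip(comment.text_emb, comment.image_emb)]

def contextual_embedding(comment: Comment) -> List[float]:
    """Average a comment's fused embedding with those of its ancestors,
    loosely mimicking how discussion context shifts a comment's meaning."""
    embs = [fuse(comment)]
    node = comment.parent
    while node is not None:
        embs.append(fuse(node))
        node = node.parent
    return [sum(vals) / len(embs) for vals in zip(*embs)]

def hate_score(emb: List[float], weights: List[float]) -> float:
    """Toy linear score: a positive dot product means 'flag for review'."""
    return sum(e * w for e, w in zip(emb, weights))

# The same reply gets different scores under different parent comments:
weights = [1.0, 0.0, 0.0, 0.0]
reply_to_pizza = Comment(text_emb=[0.2, 0, 0, 0],
                         parent=Comment(text_emb=[-0.5, 0, 0, 0]))
reply_to_person = Comment(text_emb=[0.2, 0, 0, 0],
                          parent=Comment(text_emb=[0.9, 0, 0, 0]))
print(hate_score(contextual_embedding(reply_to_pizza), weights))   # negative
print(hate_score(contextual_embedding(reply_to_person), weights))  # positive
```

The point of the sketch is only the structure: a per-comment fusion step feeding a context-aggregation step, so that classification depends on the whole discussion rather than the comment in isolation.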
“More than 3 billion people use social media every day,” Hebert said. “The influence of these social media platforms has reached unprecedented levels, and there is a great need to detect hate speech at scale to build a respectful and safe space for everyone.”
The study, “Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media,” was recently published in the proceedings of the 38th AAAI Conference on Artificial Intelligence.