An Oregon State University doctoral student and Adobe researchers have developed a new, cost-effective training technique for artificial intelligence systems that aims to reduce social bias.
Eric Slyman of OSU's College of Engineering and Adobe researchers call the new method FairDeDup (short for fair deduplication). Deduplication means removing redundant information from the data used to train AI systems, thereby reducing the high computational cost of training.
Researchers say datasets collected from the internet often contain the biases present in society. When those biases are baked into trained AI models, they can perpetuate unfair ideas and behaviors.
Understanding how deduplication affects the prevalence of bias can help mitigate such negative effects. For example, an AI system asked to show pictures of CEOs or doctors may automatically return only pictures of white men, even when the intended use case is to show a diverse range of people.
“We named it FairDeDup as a pun on SemDeDup, an early, cost-effective method that we improved upon for fairness,” said Slyman. “Previous research has shown that removing this redundant data allows for accurate AI training with fewer resources, but we found that this process can also exacerbate the harmful social biases that AI often learns.”
Slyman presented the FairDeDup algorithm at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle last week.
FairDeDup works by thinning datasets of image captions collected from the web through a process called pruning. Pruning means selecting a representative subset of the data; when done in a content-aware manner, it allows informed decisions about which parts of the data to keep and which to remove.
“FairDeDup removes redundant data while incorporating a controllable, human-defined dimension of diversity to mitigate bias,” Slyman said. “Our approach is not only cost-effective and accurate, but also enables fairer AI training.”
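The released FairDeDup code is not reproduced here, but a minimal, hypothetical sketch can illustrate the general idea described above: embed the image-caption pairs, cluster them to find near-duplicate groups, and keep one representative per group while preferring whichever value of a chosen diversity attribute is currently under-represented. The embeddings, attribute labels, and function names below are illustrative assumptions, not the actual FairDeDup implementation.

# Hypothetical sketch only -- not the released FairDeDup code. Assumes each
# image-caption pair already has a CLIP-style embedding and a (possibly
# inferred) diversity-attribute label, e.g. a perceived demographic group.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

def fairness_aware_prune(embeddings, attributes, n_clusters):
    """Keep one representative per cluster of near-duplicate examples,
    preferring attribute values that are under-represented so far."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    kept = []
    kept_counts = Counter()  # how often each attribute value has been kept
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        # Standard semantic dedup might keep, e.g., the member closest to the
        # cluster centroid; here we instead keep the member whose attribute
        # value is least represented among the examples already kept.
        best = min(members, key=lambda i: kept_counts[attributes[i]])
        kept.append(int(best))
        kept_counts[attributes[best]] += 1
    return kept

# Toy usage: 1,000 random "embeddings", a skewed binary attribute, prune to 100.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))
attrs = list(rng.choice(["group_a", "group_b"], size=1000, p=[0.9, 0.1]))
keep = fairness_aware_prune(emb, attrs, n_clusters=100)
print(len(keep), Counter(attrs[i] for i in keep))

The key design choice in a sketch like this is that the balancing criterion, which attribute to track and how to weigh it against redundancy, is specified by a human, in the spirit of the controllable, human-defined dimension of diversity Slyman describes.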
In addition to occupation, race, and gender, other biases perpetuated during training may include those related to age, geography, and culture.
“By addressing bias during dataset cleaning, we can create more socially just AI systems,” Slyman said. “Our work does not force AI to follow our notions of fairness, but rather creates pathways that guide AI to behave fairly when contextualized within the settings and user bases in which it is deployed. It lets you define what is fair in your own setting instead of having other large datasets decide for you.”
Slyman's collaborators include Stefan Lee, an assistant professor in OSU's College of Engineering, and Scott Cohen and Kushal Kafle of Adobe.