A new way to teach artificial intelligence (AI) to understand human line drawings, even those made by non-artists, has been developed by a team from the University of Surrey and Stanford University.
The new model approaches human-level performance in recognizing scene sketches.
Dr. Yulia Gryaditskaya, lecturer at Surrey's Centre for Vision, Speech and Signal Processing (CVSSP) and Surrey Institute for People-Centred AI (PAI), said:
“Sketching is a powerful language of visual communication – sometimes even more expressive and flexible than spoken language.
“Developing tools to understand sketches is a step toward more powerful human-computer interaction and more efficient design workflows. For example, you can sketch something to retrieve or generate an image.”
People of all ages and backgrounds use drawings to explore and communicate new ideas. However, AI systems have historically had difficulty understanding sketches.
AI systems must first be taught to understand images. Typically, this involves a labor-intensive process of collecting labels for every pixel in an image; the AI then learns from these labels.
Instead, the team taught the AI using a combination of sketches and written descriptions. The model learned to group pixels and match each group to one of the categories mentioned in the description.
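To make that idea concrete, here is a minimal illustrative sketch, not the team's actual code: candidate pixel groups from a sketch are scored against category phrases taken from a written description, using an off-the-shelf vision-language model (CLIP, via the Hugging Face transformers library). The model choice, file names, and categories are all assumptions for illustration.

```python
# Illustrative sketch only: zero-shot matching of sketch regions to
# description categories with CLIP. This is NOT the published method;
# the model, file names, and categories are assumptions.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Categories pulled from a written scene description, e.g.
# "a giraffe standing under a tree while a kite flies overhead".
categories = ["giraffe", "tree", "kite"]
prompts = [f"a sketch of a {c}" for c in categories]

# Each crop is one group of pixels (a candidate region) from the sketch.
crops = [Image.open(p).convert("RGB")
         for p in ("region_0.png", "region_1.png", "region_2.png")]

inputs = processor(text=prompts, images=crops,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    # logits_per_image has shape (num_crops, num_categories)
    logits = model(**inputs).logits_per_image

# Assign every pixel group to its best-matching category.
for i, idx in enumerate(logits.argmax(dim=-1).tolist()):
    print(f"region {i} -> {categories[idx]}")
```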
As a result, the AI demonstrated a much richer, more human-like understanding of these drawings than previous approaches. Kites, trees, giraffes, and other objects were correctly identified and labeled with 85% accuracy, outperforming other models that relied on labeled pixels.
The model not only identifies objects in complex scenes, but can also pick out which pen strokes were used to depict each object. The new method works well on informal sketches made by non-artists, as well as on drawings of objects it was never explicitly trained on.
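Continuing the illustration (again hypothetical, not the published method): once each individual stroke has been given a label, for instance by rendering it and scoring it as in the sketch above, grouping strokes into objects is a small final step.

```python
# Illustrative only: group individual pen strokes by the object label
# each one received. The strokes and labels here are hypothetical.
from collections import defaultdict

# stroke_labels[i] is the category assigned to stroke i,
# e.g. by rendering that stroke and scoring it with CLIP as above.
stroke_labels = ["giraffe", "giraffe", "tree", "kite", "tree"]

objects = defaultdict(list)
for stroke_id, label in enumerate(stroke_labels):
    objects[label].append(stroke_id)

for label, strokes in objects.items():
    print(f"{label}: strokes {strokes}")
# giraffe: strokes [0, 1]
# tree: strokes [2, 4]
# kite: strokes [3]
```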
Professor Judith Fan, Assistant Professor of Psychology at Stanford University, said:
“Drawing and writing are among the most quintessentially human activities and have long been useful in capturing people’s observations and ideas.
“This work represents exciting progress toward AI systems that understand the essence of the ideas people are trying to convey, regardless of whether those ideas are expressed in pictures or in words.”
This research forms part of the Surrey Institute for People-Centred AI, specifically its SketchX programme. SketchX uses AI to try to understand how we see the world through the way we draw it.
Professor Yi-Zhe Song, co-director of the Surrey Institute for People-Centred AI and leader of SketchX, said:
“This research is a prime example of how AI can enhance fundamental human activities like sketching. By understanding rough drawings with near-human accuracy, this technology has tremendous potential to empower people’s natural creativity, regardless of their artistic ability.”
The findings will be presented at the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), held in Seattle from June 17 to 21, 2024.