He then asked me to read a hypothetical YouTuber's script in varying tones, dictating the spectrum of emotions I should convey: first neutral and informative, then encouraging, then annoyed and complaining, and finally excited and persuasive.
“Hello, everyone. Welcome back to Elevate Her with your host, Jess Mars. I'm so glad you're here. We cover topics that are delicate and honestly hit close to home: dealing with criticism in our spiritual journey.” For the complaining version, I tried to visualize myself ranting about something to my partner while reading off a teleprompter. “Don’t you always seem to hear critical voices wherever you look?”
Don't be trash, don't be trash, don't be trash.
“That was really good. Watching it, I thought, ‘Ah, this is true. She’s definitely complaining,’” Oshinyemi said encouragingly. Next time, he suggested, I might add some judgment.
We shoot multiple takes featuring different variations of the script. In some, I'm allowed to move my hands. In others, Oshinyemi asks me to hold a metal pin between my fingers as I do. This, he says, is about testing the “edges” of the technology's capabilities when it comes to communicating with the hands.
David Barber, a professor of machine learning at University College London who was not involved in Synthesia's research, says that making AI avatars look natural and matching mouth movements to speech has historically been a very difficult challenge, because the problem goes beyond mouth movements alone. You have to think about eyebrows, all the muscles in the face, shoulder shrugs, and the many small movements humans use to express themselves.
Synthesia has been working with actors to train its models since 2020, and their doubles make up the 225 stock avatars that customers can animate with their own scripts. But to train its latest generation of avatars, Synthesia needed more data: over the past year, it has worked with around 1,000 professional actors in London and New York. (Synthesia says it does not sell the data it collects, though it does release some of it for academic research purposes.)
Actors were previously paid each time their avatar was used, but now the company pays an upfront fee to train the AI model. Synthesia uses an avatar for three years, at which point the actor is asked whether to renew the contract. If so, they come into the studio to create a new avatar. If not, the company deletes their data. Synthesia's enterprise customers can also send someone into the studio to create their own custom avatar, doing much of what I'm doing.