In Star Trek: The Next Generation, Captain Picard and the crew of the U.S.S. Enterprise prepare for missions and entertain themselves using the holodeck, an empty room capable of generating 3D environments, simulating everything from lush jungles to Sherlock Holmes' London. Deeply immersive and fully interactive, holodeck-created environments are infinitely customizable using nothing but language: the crew simply asks the computer to generate an environment, and that space appears in the holodeck.
Today, virtual interactive environments are also used to train robots prior to real-world deployment, in a process called "Sim2Real." Yet virtual interactive environments have been in surprisingly short supply. "Artists manually create these environments," says Yue Yang, a doctoral student in the labs of Chris Callison-Burch, Associate Professor in Computer and Information Science (CIS), and Mark Yatskar, Assistant Professor in CIS. "Those artists could spend a week building a single environment," Yang adds, noting all the decisions involved, from the spatial layout to the placement of objects to the colors used in rendering.
That scarcity is a problem when trying to train robots to navigate the real world in all its complexity. Neural networks, the systems powering today's AI revolution, require massive amounts of data, which in this case means simulations of the physical world. "Generative AI systems like ChatGPT are trained on trillions of words, and image generators like Midjourney and DALL-E are trained on billions of images," says Callison-Burch. "We only have a fraction of that amount of 3D environments for training so-called 'embodied AI.' If we want to use generative AI techniques to develop robots that can safely navigate real-world environments, we will need to create millions or even billions of simulated environments."
Enter Holodeck, a system for generating interactive 3D environments co-created by Callison-Burch, Yatskar, Yang, and Lingjie Liu, Aravind K. Joshi Assistant Professor in CIS, along with collaborators at Stanford, the University of Washington, and the Allen Institute for Artificial Intelligence (AI2). Named for its Star Trek forebear, Holodeck uses AI to interpret users' requests, generating a virtually limitless range of indoor environments. "We can use language to control it," says Yang. "You can easily describe whatever environment you want and train the embodied AI agents."
Holodeck leverages the knowledge embedded in large language models (LLMs), the systems underlying ChatGPT and other chatbots. "Language is a very concise representation of the entire world," says Yang. Indeed, LLMs turn out to have a surprisingly high degree of knowledge about the design of spaces, thanks to the vast amounts of text they absorb during training. In essence, Holodeck works by engaging an LLM in a conversation, using a carefully structured series of hidden queries to break down user requests into specific parameters.
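The idea of "hidden queries" can be sketched as follows: the system wraps the user's free-form request in a template that asks an LLM for structured scene parameters, then parses the structured reply. This is a minimal illustrative sketch, not Holodeck's actual API; the function names, prompt wording, and JSON fields are all invented for this example, and the LLM call is replaced with a canned reply.

```python
import json

def build_hidden_query(user_request: str) -> str:
    """Wrap the user's request in a template that asks the LLM for JSON.

    The field list here is hypothetical, chosen for illustration.
    """
    return (
        "You are designing an indoor 3D scene.\n"
        f"User request: {user_request}\n"
        "Reply with JSON containing: room_type, floor_size_m, wall_color, "
        "door_count, window_count, objects (list of names)."
    )

def parse_scene_parameters(llm_reply: str) -> dict:
    """Extract scene parameters from the LLM's JSON reply."""
    params = json.loads(llm_reply)
    # Guard against missing fields so downstream modules can rely on them.
    for key in ("room_type", "objects"):
        if key not in params:
            raise ValueError(f"LLM reply missing required field: {key}")
    return params

prompt = build_hidden_query("a 1b1b apartment for a researcher with a cat")

# A real system would send `prompt` to an LLM; we use a canned reply here.
reply = (
    '{"room_type": "apartment", "floor_size_m": [6, 8],'
    ' "wall_color": "white", "door_count": 2, "window_count": 3,'
    ' "objects": ["coffee table", "cat tower", "desk"]}'
)
params = parse_scene_parameters(reply)
print(params["room_type"], params["objects"])
```

Structuring the exchange this way lets each downstream module (floor plan, object selection, layout) consume a predictable schema instead of raw text.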
Just as Captain Picard might ask Star Trek's holodeck to simulate a speakeasy, researchers can ask Penn's Holodeck to create "a 1b1b apartment for a researcher with a cat." The system executes this query in multiple steps: first, the floor and walls are created, then the doorways and windows. Next, Holodeck searches Objaverse, a vast library of premade digital objects, for the sort of furnishings you might expect in such a space, such as coffee tables and cat towers. Finally, Holodeck queries a layout module, which the researchers designed to constrain the placement of objects, so that a toilet doesn't wind up extending horizontally from a wall.
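The layout step above can be illustrated with a toy constraint checker: candidate placements are accepted only if they stay inside the room, sit flush against a wall when required, and avoid overlapping previously placed objects. The data model and room dimensions below are invented for this sketch and are not Holodeck's actual layout module.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    name: str
    x: float  # left edge (meters)
    y: float  # bottom edge (meters)
    w: float  # width
    d: float  # depth
    against_wall: bool = False

ROOM_W, ROOM_D = 6.0, 8.0  # hypothetical room footprint

def overlaps(a: Placement, b: Placement) -> bool:
    """Axis-aligned rectangle overlap test for two object footprints."""
    return not (a.x + a.w <= b.x or b.x + b.w <= a.x or
                a.y + a.d <= b.y or b.y + b.d <= a.y)

def valid(p: Placement, placed: list) -> bool:
    """Accept a placement only if it satisfies the spatial constraints."""
    inside = 0 <= p.x and 0 <= p.y and p.x + p.w <= ROOM_W and p.y + p.d <= ROOM_D
    touches_wall = (p.x == 0 or p.y == 0 or
                    p.x + p.w == ROOM_W or p.y + p.d == ROOM_D)
    if not inside:
        return False
    if p.against_wall and not touches_wall:
        return False  # e.g. a toilet must be mounted against a wall
    return all(not overlaps(p, q) for q in placed)

placed = []
for candidate in [
    Placement("toilet", 0.0, 3.0, 0.7, 0.5, against_wall=True),  # flush to wall
    Placement("coffee table", 2.0, 3.0, 1.0, 0.6),               # free-standing
    Placement("cat tower", 2.5, 3.2, 0.5, 0.5),                  # overlaps table
]:
    if valid(candidate, placed):
        placed.append(candidate)

print([p.name for p in placed])  # → ['toilet', 'coffee table']
```

The cat tower is rejected because its footprint intersects the coffee table's, showing how simple geometric constraints rule out physically implausible scenes.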
To evaluate Holodeck's realism and accuracy, the researchers generated 120 scenes using both Holodeck and ProcTHOR, an earlier tool created by AI2, and asked several hundred Penn Engineering students to indicate their preferred version, without knowing which tool had created which scenes. On every criterion, including asset selection, layout coherence, and overall preference, the students consistently rated the environments generated by Holodeck more favorably.
The researchers also tested Holodeck's ability to generate scenes that are less typical in robotics research and harder to create manually than apartment interiors, such as stores, public spaces, and offices. Comparing Holodeck's outputs with ProcTHOR's, which were generated using human-created rules rather than AI-generated text, the researchers once again found that human evaluators preferred the scenes created by Holodeck. That preference held across a wide range of indoor environments, from science labs to art studios, locker rooms, and wine cellars.
Finally, the researchers used scenes generated by Holodeck to "fine-tune" an embodied AI agent. "The ultimate test of Holodeck is whether it can be used to help robots interact with their environments more safely, by preparing them to inhabit places they've never been before," says Yatskar.
Across a variety of virtual spaces, including offices, daycare centers, gyms, and arcades, Holodeck had a marked, positive impact on the agents' ability to navigate new spaces.
For instance, an agent pre-trained using ProcTHOR (a process involving roughly 400 million virtual steps) successfully found a piano in a music room only about 6% of the time, whereas an agent additionally fine-tuned on more than 100 music rooms generated by Holodeck succeeded substantially more often.
"This field has been stuck exploring residential spaces for a long time," says Yang. "But there are so many other environments out there. Efficiently generating a large number of environments to train robots has always been a challenge, and Holodeck provides that capability."