OpenAI-backed robotics company 1X has released a video showing several of its wheeled humanoid robots, directed through a voice-controlled natural language interface, moving seamlessly from one simple task to the next as they tidy up an office.
Halodi Robotics was founded in 2014 to develop general-purpose robots that can work alongside humans in the workplace. Originally based in Norway, the company set up a second base of operations in California in 2019, when we first spotted a pre-production prototype of a wheeled humanoid named Eve.
Halodi became 1X and partnered with OpenAI in 2022 to “combine robotics and AI and lay the foundation for embodied learning.” Although the company has bipeds in the pipeline as well as human-like hands, much of the current development focus seems to be on training Eve to be useful in the workplace, where she can get real work done in the office and in the world at large.
1X reports that it has now created a natural language interface that allows operators to control multiple humanoids using voice commands, with the robotic assistants completing more complex tasks by chaining together multiple learned skills.
Voice commands and connected actions | 1X AI Update
Last March, the company showed its robots removing items from shopping bags, deciding where to place them, wiping up spills, and folding shirts.
1X points out that within a relatively small multi-task model, improving the behavior on one task can degrade the behavior of the other tasks in that model. The problem can be solved by increasing the number of parameters, but that lengthens training time and slows down development.
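To make that interference concrete, here is a minimal, purely illustrative sketch (not 1X's code) of a multi-task policy in which every task shares one trunk network; the task names, dimensions, and framework choice (PyTorch) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class MultiTaskPolicy(nn.Module):
    """Toy multi-task policy: one shared trunk, one small head per task."""

    def __init__(self, obs_dim: int, act_dim: int, tasks: list):
        super().__init__()
        # Shared trunk: every task reads from, and sends gradients into, these weights.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # One lightweight head per task (hypothetical task names).
        self.heads = nn.ModuleDict({t: nn.Linear(256, act_dim) for t in tasks})

    def forward(self, obs: torch.Tensor, task: str) -> torch.Tensor:
        return self.heads[task](self.trunk(obs))

policy = MultiTaskPolicy(obs_dim=64, act_dim=20,
                         tasks=["fold_shirt", "wipe_spill", "unpack_bag"])

# Fine-tuning on "fold_shirt" data alone still updates the shared trunk,
# which is how improving one task can degrade the others; adding parameters
# eases the competition for capacity but makes training slower.
```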
Instead, 1X built a voice-controlled natural language interface that lets operators chain short-horizon capabilities from multiple smaller models into longer tasks. As development progresses, these single-task models can be merged into goal-conditioned models and eventually into a unified model, with the ultimate goal of using AI to automate the high-level tasks as well.
“Commanding robots with these high-level language interfaces provides a new user experience for data collection,” the company’s Eric Jang said in a blog post. “Instead of using VR to control a single robot, an operator can instruct multiple robots in high-level language and let the low-level policies execute the low-level actions needed to realize those higher-level goals. Because high-level actions are transmitted infrequently, operators can also control the robots remotely.”
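As a rough sketch of the split described above, the hypothetical Python below maps one spoken command to a chain of short-horizon skills, with each skill standing in for a small low-level policy; the skill names, command table, and placeholder policies are invented for illustration and are not 1X's actual interface.

```python
from typing import Callable, Dict, List

# Each "skill" stands in for a small single-task model that drives the robot
# until its sub-goal is reached and then reports success or failure.
Skill = Callable[[], bool]

SKILLS: Dict[str, Skill] = {
    "pick_up_cup": lambda: True,        # placeholder low-level policies
    "open_drawer": lambda: True,
    "place_in_drawer": lambda: True,
}

# Toy command table: in the system described, a voice/language interface
# (and eventually a vision-language model) would choose the chain instead.
COMMANDS: Dict[str, List[str]] = {
    "put the cup away": ["pick_up_cup", "open_drawer", "place_in_drawer"],
}

def execute(command: str) -> bool:
    """Run the chain of short-horizon skills for one high-level command."""
    for name in COMMANDS.get(command, []):
        if not SKILLS[name]():          # low-level policy executes the motion
            return False                # abort the chain if a skill fails
    return True

print(execute("put the cup away"))      # the operator only sends the sentence
```

Because only the short sentence has to reach the robot, the high-level command channel can be low-bandwidth, which is what makes remote operation practical.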
1X states that the Eve humanoids in the video above are not remotely controlled and that all movements are driven by a neural network, with no computer-generated graphics and no “cuts, video speed-ups or scripted trajectory playback.” The next step is to integrate vision-language models such as GPT-4o, VILA, and Gemini Vision into the system.
Source: 1X