OpenAI-Powered Humanoid Robot “Figure 01” Shows More Promise Than Other Humanoid Robots


Figure recently shared a demo of its humanoid robot engaging in real-time conversation, showcasing its ability to perform everyday tasks and household chores. Backed by big tech players such as OpenAI, Microsoft, Nvidia, and Jeff Bezos, the robot can talk in real time and carry out ordinary tasks much as a human would.

‘Figure 01’ performs “speech-to-speech” reasoning using a pre-trained multimodal vision-language model (VLM) from OpenAI. It relies entirely on voice communication for its responses and demonstrates the ability to learn tasks by observing others.

https://www.youtube.com/@figureai
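Neither Figure nor OpenAI has published the exact pipeline behind the demo, but the basic speech-to-speech pattern can be sketched as a simple loop: transcribe the person’s speech, send the transcript together with a camera frame to a multimodal model, and speak the model’s reply. The sketch below uses the OpenAI Python client; the model names (whisper-1, gpt-4o, tts-1) and file paths are illustrative assumptions, not details of Figure 01’s actual stack.

```python
# Minimal sketch of a speech-to-speech loop around a hosted vision-language model.
# Figure has not published its pipeline; model names and paths here are
# illustrative assumptions, not details of Figure 01's real stack.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1. Speech in: transcribe the person's spoken request.
with open("request.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# 2. Reasoning: send the latest camera frame plus the transcript to a VLM.
with open("camera_frame.jpg", "rb") as frame:
    image_b64 = base64.b64encode(frame.read()).decode()

reply = client.chat.completions.create(
    model="gpt-4o",  # assumed multimodal model; Figure's actual model is not public
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": transcript.text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
answer = reply.choices[0].message.content

# 3. Speech out: turn the model's answer back into audio for the robot to play.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.stream_to_file("reply.mp3")
```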

VLM stands for Vision-Language Model. It’s a type of AI model that can process and understand both visual information (like images) and textual information (like text) at the same time. This allows VLMs to perform tasks that require understanding the relationship between what’s seen and what’s said.

Recently, robotics startups have been trying to bring artificial intelligence and robots together, especially using technologies such as VLMs.

Robotics startups are using VLMs to help robots understand the world better. These models improve how robots perceive their surroundings by analyzing visual data from sensors such as cameras and LiDAR, which allows robots to recognize objects more accurately, navigate complex environments, and interact more effectively with the world around them.
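To make that concrete, here is a minimal sketch of zero-shot object recognition on a single camera frame using CLIP, an open-source vision-language model available through the Hugging Face transformers library. The checkpoint, the image path, and the candidate labels are example choices, not what any particular robot actually runs.

```python
# Minimal sketch: zero-shot object recognition on one camera frame with CLIP,
# an open-source vision-language model. Model choice and labels are examples.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("camera_frame.jpg")          # latest RGB frame from the robot
labels = ["a coffee mug", "a dish rack", "an apple", "a trash bag"]

inputs = processor(text=labels, images=frame, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds an image-to-text similarity score for each label;
# softmax turns the scores into a probability over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=1)[0]
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2f}")
```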

With VLMs, robots can also manipulate objects more precisely. By understanding how to interact with what they see, robots can grasp objects with greater accuracy, perform tasks more skillfully, and adapt to different situations more easily.

Another benefit of VLMs is improved collaboration between humans and robots. These models help robots understand human instructions and gestures, enabling them to work alongside people in settings such as assembly lines or search-and-rescue operations.
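One common way to connect a human instruction to what a robot sees is to embed the instruction and the candidate objects in the same vision-language space and pick the closest match. The sketch below does this with CLIP text and image embeddings over crops of already-detected objects; the instruction text and the crop files are illustrative assumptions.

```python
# Minimal sketch: ground a spoken instruction to one of several detected objects
# by comparing CLIP text and image embeddings. The instruction and the
# pre-cropped object images are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

instruction = "hand me the red apple"           # e.g. from speech recognition
crops = [Image.open(p) for p in ["obj_0.jpg", "obj_1.jpg", "obj_2.jpg"]]

text_inputs = processor(text=[instruction], return_tensors="pt", padding=True)
image_inputs = processor(images=crops, return_tensors="pt")

with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)
    image_emb = model.get_image_features(**image_inputs)

# Cosine similarity between the instruction and each object crop.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
scores = (image_emb @ text_emb.T).squeeze(1)

best = scores.argmax().item()
print(f"Target object: crop {best} (similarity {scores[best]:.2f})")
```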

VLMs also enhance robots’ ability to navigate safely. By understanding their environment through vision and language, robots can identify potential hazards and navigate around them more effectively.

Specific applications for robotics startups using VLMs include warehouse robots that identify and pick items from shelves based on size, shape, and fragility; delivery robots that understand traffic signals and navigate around pedestrians and obstacles; domestic robots that recognize objects and clutter so they can clean efficiently; and healthcare robots that assist doctors during surgery by interpreting medical imagery and responding to voice commands.
