Google says Gemini AI is making its robots smarter

Google is training its robots with Gemini AI so they can get better at navigating and completing tasks. The DeepMind robotics team explained in a new research paper how Gemini 1.5 Pro’s long context window (the amount of information an AI model can take in at once) allows users to more easily interact with its RT-2 robots using natural language instructions.

The process starts with researchers filming a video tour of a designated area, such as a home or office, and using Gemini 1.5 Pro to have the robot “watch” the footage and learn about the environment. The robot can then carry out commands based on what it has observed, taking spoken and/or image-based instructions, such as guiding users to a power outlet after being shown a phone and asked “where can I charge this?” DeepMind says its Gemini-powered robot had a 90 percent success rate across more than 50 user instructions given in an operating area of over 9,000 square feet.
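To give a rough sense of what a long-context, video-in workflow looks like in practice, here is a minimal sketch using the publicly available google-generativeai Python SDK. This is not DeepMind’s robot pipeline: the file name, prompt wording, and API key placeholder are stand-ins, and the part that turns the model’s answer into actual robot motion is omitted entirely.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the video tour of the space (file name is hypothetical).
tour = genai.upload_file("office_tour.mp4")

# Uploaded videos are processed asynchronously; wait until the file is ready.
while tour.state.name == "PROCESSING":
    time.sleep(5)
    tour = genai.get_file(tour.name)

model = genai.GenerativeModel("gemini-1.5-pro")

# Ask a navigation-style question grounded in the tour footage.
response = model.generate_content([
    tour,
    "This is a tour of our office. Where can I charge my phone? "
    "Describe the location and how to get there from the entrance.",
])
print(response.text)
```

The point of the long context window is that an entire tour video and the question fit into a single request, so the model can answer about places it only “saw” in the footage.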

Researchers also found “preliminary evidence” that Gemini 1.5 Pro enabled its droids to plan how to fulfill instructions that go beyond navigation. For example, when a user with lots of Coke cans on their desk asked the droid whether their favorite drink was available, the team said Gemini “knows that the robot should navigate to the fridge, inspect if there are Cokes, and then return to the user to report the result.” DeepMind says it plans to investigate these results further.
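A model can be prompted to return an explicit step-by-step plan rather than a single answer, which loosely mirrors the kind of behavior described above. The sketch below is purely illustrative and uses the same public SDK; the prompt and the idea of splitting the reply into discrete steps are assumptions, not the paper’s method.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Ask for an explicit plan instead of a direct answer (illustrative prompt).
response = model.generate_content(
    "You control a mobile robot in an office you have already toured on video. "
    "A user at their desk, which is covered in empty Coke cans, asks: "
    "'Is there any of my favorite drink left?' "
    "List the steps the robot should take, one per line."
)

# Split the reply into individual steps a robot stack could act on.
steps = [line.strip() for line in response.text.splitlines() if line.strip()]
for step in steps:
    print(step)
```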

The video demonstrations provided by Google are impressive, though the obvious cuts after the droid acknowledges each request hide that it takes between 10 and 30 seconds to process these instructions, according to the research paper. It may take some time before we’re sharing our homes with more advanced environment-mapping robots, but at least these might be able to find our missing keys or wallets.