Research from Brown University combines language, human gestures, and computer vision to improve object search by robots, with 89% average success in simulations and inspiration from how dogs interpret pointing, looks, and intentions in interactions with people.
Robots capable of locating objects through language, gestures, and vision achieved 89% average success in simulations at Brown University, in a study accepted for HRI 2026, scheduled for March in Edinburgh.
Robots learn from dogs to interpret human commands
The advancement addresses a common difficulty in the domestic and professional use of machines: understanding incomplete requests. For a person, asking for a key, a cup, or a tool seems simple. For a robotic system, the task involves ambiguity, movement, similar objects, and imperfect clues.
The team at Brown University developed the LEGS-POMDP, a system that combines language, human pointing, and visual observation. The inspiration came from research at the Brown Dog Lab on how dogs interpret gestures and looks, especially when humans point to something.
-
Winter 2026 already has a start date in Brazil, but the advance of El Niño could completely change the expected climate pattern with more rain, less intense cold, and a reduced risk of snow in the South of the country.
-
In Germany, engineers drill kilometers of rock to set up a giant underground radiator that draws the hottest heat ever achieved in a geothermal well.
-
Pancreatic cancer pill surprises oncologists by doubling survival in phase 3 study and turning scientific data into a rare scene of emotion
-
New Fiat EV, priced at R$ 77,000, will bring a reinterpretation of the 147 and a consumption equivalent to 70 km/l.
The proposal does not treat the gesture as an exact line. The pointing is modeled as a probability cone, closer to real human behavior. Thus, the robot estimates a probable target area, rather than assuming the finger indicates a perfectly precise direction.
This detail is central because people rarely communicate like technical manuals. They speak in an abbreviated manner, point approximately, change positions, and may partially hide the object they desire. The system attempts to transform this unstable scenario into calculated decisions.
How the system decides where to search
The name LEGS-POMDP refers to a probabilistic structure based on a partially observable Markov decision process. In practice, it helps the machine act when it does not have all the necessary information about the environment, the object, or human intention.
Instead of deciding too quickly, the system maintains hypotheses about the identity and location of the sought item. These hypotheses are updated as new clues appear, including verbal description, gesture direction, and visual reading of the scene.
The combination allows the robot to better explore the space before concluding the search. It can adjust the viewpoint, review a possibility, and delay the final choice until gathering stronger evidence about where the correct object is.
In the experiments, multimodal integration outperformed approaches based solely on language or gestures. The result reinforces the idea that human communication depends on the sum of signals, not a single isolated instruction.
Tests indicate progress, but still with limits
An average rate of 89% was recorded in simulations described as demanding. The team also conducted tests with a real quadruped robot, used as qualitative validation of the approach. The research will be presented at HRI 2026, from March 16 to 19, 2026.
The use of a vision-language model expands the system’s ability to interpret scenes. Thus, the machine can relate verbal descriptions, spatial constraints, and visible objects, even when there is disorganization, similarity between items, or obstacles in the way.
The suggested applications involve everyday and industrial environments. In a home, robots could search for medications on a cluttered counter or find glasses among scattered items. In a workshop, they could retrieve parts and tools without excessively precise commands.
Even so, the results do not mean that fully intuitive mechanical assistants are already available. The 89% figure comes from simulations, while physical tests indicate robustness but do not eliminate the challenges of real, varied, and unpredictable environments.
The progress helps bring laboratories closer to everyday situations, where simple requests always carry noise, pauses, and inaccuracies.
The main advancement is in the way of dealing with uncertainty. By observing dogs, human gestures, and natural language, robotics gains a path to create machines less dependent on rigid commands and more capable of interpreting intentions in context.
Click here to check the study.

-
1 person reacted to this.