The Chinese model achieved an average score of 96.0 across 50 tasks, surpassed 95.0 in random environments, learns from videos, understands human commands, attempts to correct failures during real-world execution, and is aimed at industrial, commercial, and domestic use
China has introduced an advancement that could change the way robots learn and work. Motubrain is an artificial intelligence model created to function as a single brain for robots, integrating vision, language, and action into the same system.
The report was published by Interesting Engineering, a news site about engineering and technology. The technology was developed by ShengShu Technology and seeks to replace separate systems with a single structure capable of perceiving the environment, understanding orders, and acting.
The practical impact lies in the possibility of robots performing longer and more flexible tasks. The AI model for robots has already been presented with a score of 63.77 on WorldArena, an average of 96.0 across 50 tasks in RoboTwin 2.0, and the ability to execute up to 10 atomic actions in sequence.
Motubrain functions as a general brain for robots
Motubrain was created to combine various functions into a single intelligence. Instead of using one system to see, another to plan, and another to move, the robot now works with an integrated structure, a brain.
This means that the machine can observe the environment, understand an instruction, and choose an action without switching programs at each stage. This union is what makes the model important for robotics with artificial intelligence.
The proposal also seeks to reduce the dependence on systems designed for a single task. Many robots work well in repeated situations but struggle when the scenario changes. Motubrain tries to improve this adaptation.
For companies, this could pave the way for more useful robots in factories, commerce, and homes. The advancement still depends on tests and real application, but it points to machines less limited by rigid commands.
Model learns from videos, commands, and actions simultaneously
The AI model for robots learns from three types of information: video, language, and action. Video helps the system recognize visual patterns, language allows it to understand commands, and action shows how the robot should move.
In practice, the system learns by observing scenes, receiving instructions, and analyzing movements. This combination helps the robot create a broader notion of what is happening around it.
Motubrain also uses unlabeled videos, simulation data, and recordings of tasks performed by various robots. Unlabeled videos are recordings without manual annotations added by people.
This strategy reduces the need for someone to explain every detail to the machine. The system tries to recognize patterns of movement and behavior from the available data.
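The mix of data types described above can be sketched in code. This is purely illustrative: the class and function names are hypothetical and are not part of Motubrain's actual interface; the point is that an unlabeled video still provides a useful training signal, while a full demonstration adds language and action signals on top.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch of the three data types the article describes.
# Names are illustrative, not Motubrain's real API.

@dataclass
class Sample:
    video_frames: List[str]       # sequence of observed images
    instruction: Optional[str]    # language command; None if unlabeled
    actions: Optional[List[str]]  # recorded robot motions, if available

def usable_objectives(sample: Sample) -> List[str]:
    """Pick which training signals a sample can provide."""
    objectives = ["predict_next_frame"]  # works even without labels
    if sample.instruction is not None:
        objectives.append("ground_language_in_video")
    if sample.actions is not None:
        objectives.append("imitate_actions")
    return objectives

unlabeled = Sample(video_frames=["f0", "f1"], instruction=None, actions=None)
demo = Sample(video_frames=["f0", "f1"], instruction="pick up the cup",
              actions=["reach", "grasp"])

print(usable_objectives(unlabeled))  # ['predict_next_frame']
print(usable_objectives(demo))
```

The design choice mirrors the article's claim: data that no person has annotated is not wasted, because pattern prediction needs no labels.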
Tests show 63.77 on WorldArena and 96.0 across 50 tasks
Motubrain's performance drew attention in evaluations used to measure robots and artificial intelligence models. The system achieved 63.77 on WorldArena and an average of 96.0 across 50 tasks in RoboTwin 2.0.
The model was also presented as the only one to surpass 95.0 in random environments. This point is important because random environments are more challenging. In them, the robot needs to deal with changes and less predictable situations.
Interesting Engineering, a news site about engineering and technology, reported the numbers and the key points of the advancement. The publication also highlighted the project's connection with ShengShu Technology's previous experience in generative video, through the platform Vidu.
Generative video is a technology related to the creation and prediction of scenes in video. In Motubrain, this foundation helps the system understand how objects, spaces, and actions can change over time.
Robot can perform up to 10 steps in a single sequence
One of Motubrain's strongest points is its ability to execute multi-phase tasks. The system can perform up to 10 atomic actions in a single sequence.
An atomic action is a simple step within a larger task. Picking up an object, moving a piece, or dropping something in another place are examples of this type of action.
Many current robotic systems usually handle only 2 or 3 actions in sequence. Therefore, reaching 10 steps represents a significant leap for more complex tasks.
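The idea of chaining atomic actions can be shown with a minimal sketch. The task decomposition and the `execute_sequence` helper below are hypothetical examples, not Motubrain's actual control loop; they only illustrate how a 10-step task runs as an ordered list of simple actions.

```python
# Illustrative only: a multi-step task decomposed into atomic actions,
# executed in order until one fails. Names are hypothetical.

def execute_sequence(actions, perform):
    """Run atomic actions in order; stop at the first failure."""
    completed = []
    for action in actions:
        if not perform(action):
            return completed, action  # report where the task broke
        completed.append(action)
    return completed, None

task = ["approach_table", "locate_cup", "reach", "grasp", "lift",
        "move_to_shelf", "align", "lower", "release", "retract"]

# Here every action trivially succeeds; a real robot would report
# success or failure from its sensors.
done, failed_at = execute_sequence(task, perform=lambda a: True)
print(len(done), failed_at)  # 10 None
```

A system limited to 2 or 3 chained actions would have to stop after `locate_cup`; the ability to continue through all 10 is what the benchmark result reflects.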
This capability can bring robots closer to real-world activities. In environments such as factories, stores, and homes, a task rarely depends on just one simple movement.
The AI brain tries to repeat the task when something goes wrong
Motubrain also showed the ability to react during execution. In practical tests, when an attempt failed in the middle of a task, the brain system was able to recognize the problem and try again.
One example involves the act of picking up an object. If the first attempt failed, the robot could adjust the action and repeat the movement without having received specific training for that error.
This point is important because the real world is full of unforeseen events. Objects change places, surfaces hinder movements, and simple tasks can fail due to small details.
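The retry behavior described above can be sketched as a small loop: detect that an attempt failed, adjust the action, and try again. This is a hedged illustration; the `adjust` step stands in for whatever re-planning the real system performs, which the article does not detail.

```python
# Hypothetical sketch of retry-on-failure: if an attempt fails
# mid-task, adjust the action and repeat, up to a limit.

def attempt_with_retry(do_action, adjust, max_tries=3):
    """Return the attempt number that succeeded, or None."""
    action = "grasp"
    for attempt in range(1, max_tries + 1):
        if do_action(action):
            return attempt
        action = adjust(action)  # e.g. re-plan the grasp pose
    return None

# Simulate: the first grasp slips, the adjusted one succeeds.
outcomes = iter([False, True])
tries = attempt_with_retry(lambda a: next(outcomes),
                           adjust=lambda a: a + "_adjusted")
print(tries)  # 2
```

The key property is that the recovery comes from the same loop that executes the task, rather than from separate error-handling code written for each specific failure.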
Jun Zhu, founder of ShengShu Technology, summarized the project’s idea with the phrase: “A true world model must then be able to build a unified representation of the real world and predict how it evolves.”
Robotics companies are already adopting Motubrain
ShengShu Technology states that Motubrain is already being used by robotics companies in active training programs. The environments cited include industrial, commercial, and domestic areas.

The partnerships involve companies such as Astribot, SimpleAI, and Anyverse Dynamics. The intention is to expand the model’s presence in different uses of robotics.
The project also received significant financial support. ShengShu secured a Series B round of US$ 293 million led by Alibaba Cloud.
This amount strengthens the bet on embodied artificial intelligence: AI that operates inside physical machines, such as robots, rather than only on screens or in applications.
Unified architecture aims to replace robots built from separate modules
Motubrain’s proposal is to replace the logic of separate modules with a single brain system. The architecture uses three streams to integrate different kinds of information: image, language, and movement.
In simple terms, these three streams function as paths through which the robot interprets what it sees, what it receives as a command, and what it needs to do.
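A toy sketch can make the contrast with separate modules concrete: one function consumes all three inputs and returns a single decision, instead of three programs passing results to each other. Everything here is illustrative; the function and field names are assumptions, not Motubrain's design.

```python
# Minimal illustration of one system consuming perception, language,
# and motion history together. Purely hypothetical names.

def unified_step(observation: dict, command: str, last_action: str) -> str:
    """One brain deciding from the joint context of all three streams."""
    # A real model would score candidate actions from this joint
    # context; here we just branch on a placeholder condition.
    if "cup" in command and observation.get("cup_visible"):
        return "reach_for_cup"
    return "scan_environment"

print(unified_step({"cup_visible": True}, "pick up the cup", "idle"))
print(unified_step({}, "pick up the cup", "idle"))
```

The point of the unified design is that the decision depends on all inputs at once: the same command produces a different action when the perception stream changes.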
The company also argues that more advanced robots need to unite perception, reasoning, prediction, generation, and action in a single structure. The statement reinforces this vision: “We believe that general world models should not be built as stitched modules, but as a unified architecture that brings together perception, reasoning, prediction, generation, and action into a single system.”
This path can make robots more prepared for varied tasks. Still, large-scale adoption depends on safety, cost, integration with existing machines, and results outside of tests.
Motubrain shows a new phase of robotics with artificial intelligence
Motubrain puts China in the spotlight in the race for more flexible robots. The model combines vision, language, and action, achieves 96.0 across 50 tasks, surpasses 95.0 in random environments, and executes up to 10 steps in sequence.
The promise is not just to create robots that obey orders. The goal is to bring machines closer to real tasks, with more adaptation, longer movement sequences, and greater ability to correct failures.
This advancement could change the relationship between robots and work in factories, commerce, and homes. But what about you, would you trust a robot with this type of intelligence to help with daily tasks, or do you think this technology still needs to mature a lot?
