China creates a single brain for robots that combines vision, language, and action, executes up to ten steps on its own, and promises to change factories, commerce, and homes.

Written by Flavia Marinho

Published on 30/04/2026 at 17:58

The Chinese model averaged 96.0 across 50 tasks, surpassed 95.0 in random environments, learns from videos, understands human commands, attempts to correct failures during real actions, and targets use in industrial, commercial, and domestic environments

China has introduced an advancement that could change the way robots learn and work. Motubrain is an artificial intelligence model created to function as a single brain for robots, integrating vision, language, and action into the same system.

The report was published by Interesting Engineering, a news site about engineering and technology. The technology was developed by ShengShu Technology and seeks to replace separate systems with a single structure capable of perceiving the environment, understanding orders, and acting.

The practical impact lies in the possibility of robots performing longer and more flexible tasks. The model has reported a score of 63.77 on WorldArena, an average of 96.0 across 50 tasks on RoboTwin 2.0, and the ability to execute up to 10 atomic actions in sequence.

Motubrain functions as a general brain for robots

Motubrain was created to combine various functions into a single intelligence. Instead of using one system to see, another to plan, and another to move, the robot works with one integrated structure, a single brain.

This means that the machine can observe the environment, understand an instruction, and choose an action without switching programs at each stage. This union is what makes the model important for AI-driven robotics.

The proposal also seeks to reduce the dependency on systems designed for a single task. Many robots work well in repetitive situations but struggle when the scenario changes. Motubrain tries to improve this adaptation.

For companies, this could pave the way for more useful robots in factories, commerce, and homes. The advancement still depends on tests and real application, but it points to machines less limited by rigid commands.

Model learns from videos, commands, and actions simultaneously

The AI model for robots learns from three types of information: video, language, and action. Video helps the system see patterns. Language lets it understand commands. Action data shows how the robot should move.

In practice, the system learns by observing scenes, receiving instructions, and analyzing movements. This combination helps the robot create a broader notion of what is happening around it.

Motubrain also uses unlabeled videos, simulation data, and recordings of tasks performed by various robots. Unlabeled videos are footage without manual annotations made by people.

This strategy reduces the need for someone to explain every detail to the machine. The system tries to recognize patterns of movement and behavior from the available data.

Tests show 63.77 in WorldArena and 96.0 in 50 tasks

Motubrain's performance drew attention in evaluations used to measure robots and artificial intelligence models. The system scored 63.77 on WorldArena and averaged 96.0 across 50 tasks on RoboTwin 2.0.

The model was also presented as the only one to surpass 95.0 in random environments. This point is important because random environments are more challenging. In them, the robot needs to deal with changes and less predictable situations.

Interesting Engineering reported the figures and the key points of the advancement. The publication also highlighted the project’s connection with ShengShu Technology’s previous experience in generative video, through the platform Vidu.

Generative video is a technology related to the creation and prediction of scenes in video. In Motubrain, this foundation helps the system understand how objects, spaces, and actions can change over time.

Robot can perform up to 10 steps in a single sequence

One of Motubrain’s strongest points is its ability to execute multi-phase tasks. The system can perform up to 10 atomic actions in a single sequence.

An atomic action is a simple step within a larger task. Picking up an object, moving a piece, or dropping something in another place are examples of this type of action.
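As a rough illustration of the idea, a task can be written as an ordered list of atomic actions. The sketch below is hypothetical: the `AtomicAction` class, the `plan_task` helper, and the verb names are invented for this example and are not part of Motubrain. Only the limit of 10 steps comes from the reported figures.

```python
from dataclasses import dataclass

# Sequence length reported for Motubrain; everything else here is illustrative.
MAX_ATOMIC_ACTIONS = 10


@dataclass(frozen=True)
class AtomicAction:
    """One simple step inside a larger task (hypothetical structure)."""
    verb: str    # e.g. "pick", "move", "place"
    target: str  # the object the step acts on


def plan_task(steps):
    """Turn (verb, target) pairs into a sequence, rejecting plans over the limit."""
    if len(steps) > MAX_ATOMIC_ACTIONS:
        raise ValueError(
            f"plan has {len(steps)} steps; limit is {MAX_ATOMIC_ACTIONS}"
        )
    return [AtomicAction(verb, target) for verb, target in steps]


# A three-step task: pick up a cup, carry it, set it down.
plan = plan_task([("pick", "cup"), ("move", "cup"), ("place", "cup")])
```

Here a longer household or factory task would simply be a longer list, up to the 10-step ceiling.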

Many current robotic systems handle only 2 or 3 actions in sequence. Reaching 10 steps therefore represents a significant leap for more complex tasks.

This capability can bring robots closer to real-world activities. In environments such as factories, stores, and homes, a task rarely depends on just one simple movement.

The AI brain tries to repeat the task when something goes wrong

Motubrain also showed the ability to react during execution. In practical tests, when an attempt failed in the middle of a task, the system was able to recognize the problem and try again.

One example involves the act of picking up an object. If the first attempt failed, the robot could adjust the action and repeat the movement without having received specific training for that error.
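The behavior described, detecting a failed attempt, adjusting, and trying again, can be pictured as a simple loop. This is only a hand-written sketch: the report says the model does this without specific training for the error, whereas the code below spells the loop out explicitly. The function names, the `grip_offset_mm` parameter, and the toy grasp are all assumptions made for illustration.

```python
def execute_with_retry(attempt, adjust, max_attempts=3):
    """Run attempt(params); after each failure, adjust the params and retry.

    Returns (succeeded, number_of_attempts_made).
    """
    params = {"grip_offset_mm": 0}  # hypothetical starting parameters
    for tries in range(1, max_attempts + 1):
        if attempt(params):
            return True, tries
        params = adjust(params)
    return False, max_attempts


# Toy stand-ins: the grasp only succeeds once the offset reaches 2 mm.
def toy_grasp(params):
    return params["grip_offset_mm"] >= 2


def nudge(params):
    # Shift the grip by 1 mm after every failed attempt.
    return {"grip_offset_mm": params["grip_offset_mm"] + 1}


succeeded, tries = execute_with_retry(toy_grasp, nudge)
```

In this toy run the first two grasps miss and the third succeeds after two adjustments.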

This point is important because the real world is full of unforeseen events. Objects change places, surfaces hinder movements, and simple tasks can fail due to small details.

Jun Zhu, founder of ShengShu Technology, summarized the project’s idea with the phrase: “A true world model must then be able to build a unified representation of the real world and predict how it evolves.”

Robotics companies are already on Motubrain’s path

ShengShu Technology states that Motubrain is already being used by robotics companies in active training programs. The environments cited include industrial, commercial, and domestic areas.

The partnerships involve companies such as Astribot, SimpleAI, and Anyverse Dynamics. The intention is to expand the model’s presence in different uses of robotics.

The project also received significant financial support. ShengShu secured a Series B round of US$ 293 million led by Alibaba Cloud.

This amount strengthens the bet on embodied artificial intelligence systems. This type of AI functions within physical machines, such as robots, and not just on screens or applications.

Unified architecture attempts to replace robots full of separate parts

Motubrain’s proposal is to replace the logic of separate modules with a single system. The architecture uses three flows to integrate different kinds of information: image, language, and movement.

In simple terms, these three flows function as paths through which the robot interprets what it sees, what it receives as a command, and what it needs to do.

The company also argues that more advanced robots need to unite perception, reasoning, prediction, generation, and action in a single structure. The statement reinforces this vision: “We believe that general world models should not be built as stitched modules, but as a unified architecture that brings together perception, reasoning, prediction, generation, and action into a single system.”

This path can make robots more prepared for varied tasks. Still, large-scale adoption depends on safety, cost, integration with existing machines, and results outside of tests.

Motubrain shows a new phase of robotics with artificial intelligence

Motubrain puts China in the spotlight in the race for more flexible robots. The model combines vision, language, and action, achieves 96.0 across 50 tasks, surpasses 95.0 in random environments, and executes up to 10 steps in sequence.

The promise is not just to create robots that obey orders. The goal is to bring machines closer to real tasks, with more adaptation, longer movement sequences, and greater ability to correct failures.

This advancement could change the relationship between robots and work in factories, commerce, and homes. What about you: would you trust a robot with this type of intelligence to help with daily tasks, or do you think this technology still needs to mature a lot?
