In a Study Presented at a Major Technology Conference in San Diego, Researchers Had 3 AI Models Fight 10-Turn Dungeons & Dragons Combats to Assess Planning, Rule-Following, and Collaboration.
Dungeons & Dragons is often remembered as a tabletop game of dice, characters, and improvised decisions. But it also has traits of great interest to AI research: clear rules, defined objectives, and constant dialogue.
That’s exactly why researchers had artificial intelligence models play Dungeons & Dragons, not for fun, but as a test of strategy and teamwork. The idea is to observe whether these models can plan in multiple steps, follow rules without contradiction, and collaborate with other models and even with humans.
This type of assessment has a very straightforward goal: to understand if AI can operate for longer periods without human intervention, maintaining coherence and reliable decisions, something that requires memory and strategic thinking.
Why Dungeons & Dragons Became a Testing Ground for Long Decisions and Rigid Rules
The researchers argue that Dungeons & Dragons is an almost perfect environment for this type of test because it brings together two things that often conflict.
The first is creativity, since everything happens in dialogue. The second is rigidity, because the game has well-defined rules and limits. To perform well, the model needs to communicate, remember what has been decided, plan, and also perceive the intentions and tactics of the opponent.
The game functions as a bridge between natural language and game mechanics, making it clear when the AI is just speaking nicely and when it is making decisions that truly make sense within a system of rules.
How D&D Agents Works, with a Dungeon Master, Heroes, and a Mix of AI and Humans
The experiment used a framework called D&D Agents. In it, a single model can take on the role of the Dungeon Master, the game master who drives the story and controls the monsters, or the role of a hero.
Each scenario used 1 Dungeon Master and 4 heroes. The format is flexible: models can play alongside other models, and humans can fill any role. One possible setup is a model acting as Dungeon Master while two models and two people play the heroes.
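The article does not publish the framework's code, but the role mix it describes can be sketched as a simple data structure. Everything below is a hypothetical illustration, not the study's actual implementation; the names `Participant` and `Controller` are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class Controller(Enum):
    MODEL = "model"   # an AI model drives this seat
    HUMAN = "human"   # a person drives this seat

@dataclass
class Participant:
    name: str
    role: str              # "dungeon_master" or "hero"
    controller: Controller

# The mixed table described in the article: a model as Dungeon Master,
# two models and two humans as the four heroes.
table = [
    Participant("DM", "dungeon_master", Controller.MODEL),
    Participant("Fighter", "hero", Controller.MODEL),
    Participant("Wizard", "hero", Controller.MODEL),
    Participant("Cleric", "hero", Controller.HUMAN),
    Participant("Rogue", "hero", Controller.HUMAN),
]

heroes = [p for p in table if p.role == "hero"]
print(len(heroes))  # 4 — each scenario uses 1 Dungeon Master and 4 heroes
```

The point of the flexible seat assignment is that the same episode loop can be reused whether a seat is filled by a model or a person.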
This mix matters because the test measures not only individual efficiency but also coordination and communication when there are different voices on the same team.
The Test Was Not a Full Campaign, but Short Combat Across 3 Scenarios and 10 Turns
The system did not attempt to simulate a full campaign, the kind that lasts hours or weeks. The focus was on combat encounters taken from a published adventure, Lost Mine of Phandelver.
To set up each round, the team selected 1 of 3 combat scenarios, defined a set of 4 characters, and adjusted the power level of these characters with three tiers: low, medium, or high.
Each episode lasted 10 turns, and after that, the results were collected to compare performance, choices, and consistency over time.
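The experimental grid described above (3 scenarios, 3 power tiers, 10 turns per episode) can be sketched as a small sweep loop. This is a minimal illustration of the setup as the article describes it, not the study's code; `run_episode` and `take_turn` are hypothetical names.

```python
import itertools

SCENARIOS = ["encounter_1", "encounter_2", "encounter_3"]  # 3 combat scenarios
TIERS = ["low", "medium", "high"]                          # character power tiers
MAX_TURNS = 10                                             # turns per episode

def run_episode(scenario, tier, take_turn):
    """Play one fixed-length combat episode and collect per-turn results."""
    log = []
    for turn in range(1, MAX_TURNS + 1):
        log.append(take_turn(scenario, tier, turn))
    return log

# Sweep every scenario/tier combination, collecting results for comparison.
results = {
    (s, t): run_episode(s, t, lambda s, t, turn: {"turn": turn})
    for s, t in itertools.product(SCENARIOS, TIERS)
}
print(len(results))  # 9 configurations, each logged over 10 turns
```

Fixing the episode length at 10 turns keeps runs comparable: every model faces the same decision horizon, so differences in performance reflect planning rather than episode length.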
Three Models Were Compared, and One of Them Performed Better When the Game Tightened
The researchers tested three models in the simulation: DeepSeek V3, Claude Haiku 3.5, and GPT-4.
The comparison used Dungeons & Dragons as a benchmark to assess long-range planning and the ability to use tools, among other capabilities.
This is related to real-world applications mentioned in the study, such as supply chain optimization and manufacturing line design, as well as scenarios requiring coordination between agents, such as disaster response modeling and search and rescue operations.
Overall, Claude Haiku 3.5 showed the best combat efficiency, particularly in the more challenging scenarios. GPT-4 followed closely behind. DeepSeek V3 struggled the most.
In easier scenarios, resource conservation was similar among the three, which makes sense because the test was isolated combat, without the pressure to save for a long adventure.
When things got tough, Claude Haiku 3.5 showed more willingness to expend resources, leading to better results.
Where Industry Enters: What a Game Reveals About Factories and Supply Chains
The connection to industry lies in the type of skills assessed. Step-by-step planning, coordination between agents, and intelligent resource use are exactly what emerges in tasks like supply chain optimization and manufacturing line design.
The same logic applies to operations requiring teams of agents working together, such as disaster response modeling and search and rescue systems. The game becomes a “mini-world” with clear rules and objectives, where it is possible to measure if AI can remain consistent long enough to be useful outside the lab.
The Most Curious Detail: Measuring Performance and Character Consistency
Besides winning or losing, the study also evaluated how the models stayed in character, with a performance quality metric that looked at consistency and variation of voices throughout the game.
The researchers observed that some models created short speeches and repeated styles, while others adapted their way of speaking better according to the character or monster in the scene.
If an AI can maintain strategy and cooperation over 10 turns of linked decisions, that seems like a good “rehearsal” for long real-world problems, or is it still too early to take the result seriously outside the game? Share in the comments the point that caught your attention the most.
