
Scientists Put Artificial Intelligence to Play Dungeons & Dragons and Use the Results to Measure Strategy, Memory, and Teamwork That Can Serve Real-World Industry and Operations

Written by Flavia Marinho
Published on 10/02/2026 at 19:53
Updated on 10/02/2026 at 19:55

In a Study Presented at a Major Technology Conference in San Diego, Researchers Used Dungeons & Dragons with 3 AI Models in 10-Turn Combats to Assess Planning, Rule-Following, and Collaboration.

Dungeons & Dragons is often remembered as a tabletop game with dice, characters, and improvised decisions. But it has qualities of great interest to technology researchers: clear rules, defined objectives, and constant dialogue.

That’s exactly why researchers had artificial intelligence models play Dungeons & Dragons, not for fun, but as a test of strategy and teamwork. The idea is to observe whether these models can plan in multiple steps, follow rules without contradiction, and collaborate with other models and even with humans.

This type of assessment has a very straightforward goal: to understand if AI can operate for longer periods without human intervention, maintaining coherence and reliable decisions, something that requires memory and strategic thinking.

Why Dungeons & Dragons Became a Testing Ground for Long Decisions and Rigid Rules

The researchers argue that Dungeons & Dragons is an almost perfect environment for this type of test because it brings together two things that often conflict.

The first is creativity, since everything happens in dialogue. The second is rigidity, because the game has well-defined rules and limits. To perform well, the model needs to communicate, remember what has been decided, plan, and also perceive the intentions and tactics of the opponent.

The game functions as a bridge between natural language and game mechanics, making it clear when the AI is merely producing fluent text and when it is making decisions that truly make sense within a system of rules.

How D&D Agents Works, with a Dungeon Master, Heroes, and a Mix of AI and Humans

The experiment used a framework called D&D Agents. In it, a single model can take on the role of the Dungeon Master, the Game Master who drives the story and controls monsters, and can also take on the role of a hero.

In each scenario, the setup consisted of 1 Game Master and 4 heroes. The format is flexible: models can play with other models, and humans can fill any role. One possible example is a model acting as the Game Master, while two models and two people play as heroes.

This mix matters because the test measures not only individual efficiency but also coordination and communication when there are different voices on the same team.
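The table composition described above can be sketched in a few lines of code. This is an illustrative model only: the `Agent` class, the `make_table` helper, and the agent names are assumptions for the example, not identifiers from the D&D Agents framework itself.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    kind: str   # "model" or "human"
    role: str   # "game_master" or "hero"

def make_table(gm, heroes):
    """Assemble one table: exactly 1 Game Master and 4 heroes."""
    assert gm.role == "game_master"
    assert len(heroes) == 4 and all(h.role == "hero" for h in heroes)
    return [gm] + heroes

# The mixed setup from the article: a model as Game Master,
# with two models and two humans playing the heroes.
table = make_table(
    Agent("DM-model", "model", "game_master"),
    [
        Agent("hero-1", "model", "hero"),
        Agent("hero-2", "model", "hero"),
        Agent("hero-3", "human", "hero"),
        Agent("hero-4", "human", "hero"),
    ],
)
print(len(table))  # 5 participants in total
```

The point of the flexibility is that any seat can be filled by either kind of agent, so the same harness measures model-model and model-human coordination.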

The Test Was Not a Full Campaign but Short Combat, with 3 Scenarios and 10 Turns

The system did not attempt to simulate a full campaign, the kind that lasts hours or weeks. The focus was on combat encounters taken from a prepared adventure called Lost Mine of Phandelver.

To set up each round, the team selected 1 of 3 combat scenarios, defined a set of 4 characters, and adjusted the power level of these characters with three tiers: low, medium, or high.

Each episode lasted 10 turns, and after that, the results were collected to compare performance, choices, and consistency over time.
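The experimental grid just described (3 scenarios, 3 power tiers, 10-turn episodes) can be enumerated as follows. Scenario names and the `run_episode` stub are placeholders for illustration; the study's own naming and logging are not public in this article.

```python
import itertools

SCENARIOS = ["scenario_1", "scenario_2", "scenario_3"]
TIERS = ["low", "medium", "high"]
MAX_TURNS = 10

def run_episode(scenario, tier):
    """Placeholder for one combat episode; returns a per-turn log."""
    return [{"turn": t, "scenario": scenario, "tier": tier}
            for t in range(1, MAX_TURNS + 1)]

# One episode per scenario/tier combination.
results = {(s, tier): run_episode(s, tier)
           for s, tier in itertools.product(SCENARIOS, TIERS)}
print(len(results))                          # 9 combinations
print(len(results[("scenario_1", "low")]))   # 10 turns each
```

Capping every episode at 10 turns keeps the runs short and comparable, which is what allows choices and consistency to be measured across models.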

Three Models Were Compared, and One of Them Performed Better When the Game Tightened

The researchers tested three models in the simulation: DeepSeek V3, Claude Haiku 3.5, and GPT-4.

The comparison used Dungeons & Dragons as a benchmark to assess long-range planning and the ability to use tools, among other qualities.

This is related to real-world applications mentioned in the study, such as supply chain optimization and manufacturing line design, as well as scenarios requiring coordination between agents, such as disaster response modeling and search and rescue operations.

Overall, Claude Haiku 3.5 showed the best combat efficiency, particularly in the more challenging scenarios. GPT-4 followed closely behind. DeepSeek V3 struggled the most.

In easier scenarios, resource conservation was similar among the three, which makes sense because the test was isolated combat, without the pressure to save for a long adventure.

When things got tough, Claude Haiku 3.5 showed more willingness to expend resources, leading to better results.
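One way to see why spending resources can pay off is a toy scoring function in which winning the encounter dominates everything else. The formula and numbers below are invented for illustration; the study does not publish this exact metric.

```python
def encounter_score(won, turns_used, resources_spent, max_turns=10):
    """Toy score: winning dominates; faster wins and thrift break ties."""
    if not won:
        return 0.0
    return 1.0 + (max_turns - turns_used) * 0.05 - resources_spent * 0.01

# Spending spell slots and potions to secure the win beats
# hoarding them and losing the encounter.
spent_and_won = encounter_score(won=True, turns_used=8, resources_spent=5)
saved_and_lost = encounter_score(won=False, turns_used=10, resources_spent=1)
print(spent_and_won > saved_and_lost)  # True
```

Under any scoring of this shape, a model that hesitates to commit resources in a hard fight loses more value than it saves, which matches the pattern the researchers observed.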

Where Industry Enters: What a Game Reveals About Factories and Supply Chains

The connection to industry lies in the type of skills assessed. Step-by-step planning, coordination between agents, and intelligent resource use are exactly what emerges in tasks like supply chain optimization and manufacturing line design.

The same logic applies to operations requiring teams of agents working together, such as disaster response modeling and search and rescue systems. The game becomes a “mini-world” with clear rules and objectives, where it is possible to measure if AI can remain consistent long enough to be useful outside the lab.

The Most Curious Detail: Measuring Performance and Character Consistency

Besides winning or losing, the study also evaluated how the models stayed in character, with a performance quality metric that looked at consistency and variation of voices throughout the game.

The researchers observed that some models created short speeches and repeated styles, while others adapted their way of speaking better according to the character or monster in the scene.
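A minimal sketch of how "variation of voices" could be quantified is the vocabulary overlap between two speeches, here via Jaccard similarity. The study's actual performance-quality metric is more elaborate; this only illustrates the idea, and the sample lines are invented.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two speeches."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

goblin = "me smash puny heroes now"
wizard = "I shall weave an arcane ward around the party"
goblin_again = "me smash heroes now good"

# Distinct characters should overlap little; the same character, a lot.
print(jaccard(goblin, wizard) < jaccard(goblin, goblin_again))  # True
```

A model that gives every monster and hero the same repeated style would score high overlap across all pairs, while one that adapts its speech per character would not.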

If an AI can maintain strategy and cooperation over 10 turns of linked decisions, is that a good “rehearsal” for long real-world problems, or is it still too early to take the result seriously outside the game? Share in the comments the point that caught your attention the most.

Flavia Marinho

Flavia Marinho is a postgraduate engineer with extensive experience in the onshore and offshore shipbuilding industry. In recent years, she has dedicated herself to writing articles for news sites covering the military, security, industry, oil and gas, energy, shipbuilding, geopolitics, jobs, and courses. Contact flaviacamil@gmail.com or WhatsApp +55 21 973996379 for corrections, story suggestions, job listings, or advertising proposals for our portal.
