Multi-Agent Factorio

FLE v0.2 Release Notes

May 8, 2025

Factorio Learning Environment Team
Highlights: multi-agent coordination for cooperative factory building, reflection and backtracking that unlock stronger automation, and vision agents for enhanced spatial reasoning.

Hey everyone,

It's Mart, Neel and Jack from the Factorio Learning Environment team.

We believe in evaluating AI agents in unbounded, open-ended and highly dynamic settings, where we can observe a uniquely strong signal of model capability even as models surpass human performance in various domains.

Since our initial release, we have been working hard to expand the environment to support multi-agent scenarios, reasoning models and MCP for human-in-the-loop evals.

We have also spent time experimenting with different ways to elicit more performance out of agents in the game, namely tools for vision and reflection.

Today, we are proud to release v0.2, which includes several exciting new features and improvements.

Thanks for checking this out.

Multi-Agent Support

We now leverage Factorio’s native multiplayer mechanics to support multi-agent scenarios. Agents can communicate in a broadcast or peer-to-peer fashion, and tasks can now be constructed with custom instructions for different agents. These features allow evaluation of cooperation, conflict and collusion between agents with partial observability.


In this release, agents fully yield to each other when planning and taking actions in order to minimize coordination challenges. Despite the lack of true concurrency, we find agents can struggle to fully account for each other's actions, leading to novel errors that can be difficult to recover from.
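To make the coordination pattern concrete, here is a minimal, self-contained sketch of fully-yielded turn-taking with broadcast and peer-to-peer messages. The Message, MailBox and take_turn names are illustrative only and are not the FLE API.

```python
# Illustrative sketch of the coordination pattern described above: agents act
# one at a time (fully yielding) and exchange broadcast or peer-to-peer messages.
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: int
    recipient: int | None        # None => broadcast to every other agent
    body: str

@dataclass
class MailBox:
    queues: dict[int, list[Message]] = field(default_factory=dict)

    def send(self, msg: Message, n_agents: int) -> None:
        targets = range(n_agents) if msg.recipient is None else [msg.recipient]
        for target in targets:
            if target != msg.sender:
                self.queues.setdefault(target, []).append(msg)

    def drain(self, agent: int) -> list[Message]:
        return self.queues.pop(agent, [])

def take_turn(agent: int, mailbox: MailBox, n_agents: int) -> None:
    for msg in mailbox.drain(agent):                 # read updates from the other agents
        print(f"agent {agent} received {msg.body!r} from agent {msg.sender}")
    # ... plan and execute a program against the shared game state here ...
    mailbox.send(Message(agent, None, f"agent {agent} finished its turn"), n_agents)

# Two agents take fully-yielded turns in round-robin order.
mailbox, n_agents = MailBox(), 2
for step in range(4):
    take_turn(step % n_agents, mailbox, n_agents)
```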


We are excited for the possibilities of interesting experiments in capability and safety research that this work opens up.



Multi-agent Support. Here we see agents cooperatively dividing up responsibilities and communicating updates in order to accomplish a common goal.

Reasoning Models + MCP

We now support reasoning models over MCP, with tool invocation inside reasoning chains, and you can connect using the client of your choice. While we prefer Claude Code / Desktop, you can find other compatible clients here. You can now run entire agent trajectories from your terminal, or step in to help agents out when they get stuck. You can also turn on 'research mode' for OpenAI and Anthropic models to allow the agents to plan more deeply.
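As a rough sketch of how an environment action can be exposed as an MCP tool, the snippet below uses the FastMCP server from the official Python MCP SDK; the place_entity_stub tool and its body are purely illustrative and are not wired into a real Factorio instance.

```python
# Minimal MCP server sketch. Clients such as Claude Code / Desktop connect over
# stdio and can invoke the tool from within their reasoning chains.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("factorio-learning-environment")

@mcp.tool()
def place_entity_stub(entity: str, x: int, y: int) -> str:
    """Place an entity at tile (x, y) in the running Factorio instance."""
    # A real server would call into the FLE game instance here; this stub
    # just echoes the request so the sketch stays self-contained.
    return f"placed {entity} at ({x}, {y})"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```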


Reasoning models with MCP integration. The environment now supports deeper planning capabilities through integration with reasoning-enabled models from major providers. This enables more thoughtful factory design and long-term planning, with the option for human intervention when agents encounter difficulties.

Reflection and Backtracking

Agents are unable to reason and act efficiently in broken factory states.


This often happens when an agent program errors out midway through execution, leaving a "half-completed" game state. Continuing from these broken states sends agents down error-correcting rabbit holes, where they unsuccessfully try to fix the game state over a long horizon while making no meaningful progress towards the goal.


To decrease the probability of agents ending up in these intermediate broken factory states, we implemented a simple yet effective backtracking system in which the agent iteratively improves its programs using environment execution feedback until a program executes successfully (or the maximum number of iterations is reached).


If a program errors during execution, the agent synthesises an improved program using the error message and the history of explored programs. Crucially, the improved program is executed from the pre-error game state, and the game state is only advanced once a program executes without errors.
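The control flow looks roughly like the sketch below; the env and agent parameters and the snapshot, restore, execute and revise method names are illustrative assumptions, and only the loop structure mirrors the system described above.

```python
# Illustrative reflection-and-backtracking loop: retry from the pre-error state
# and commit the new game state only when a program runs cleanly.
def run_with_backtracking(env, agent, goal, max_iterations: int = 3):
    attempts = []                                  # history of explored programs and errors
    program = agent.write_program(goal)
    for _ in range(max_iterations):
        checkpoint = env.snapshot()                # pre-execution game state
        result = env.execute(program)
        if result.ok:
            return result                          # state advances only on success
        env.restore(checkpoint)                    # backtrack to the pre-error state
        attempts.append((program, result.error))
        program = agent.revise(goal, attempts)     # reflect on the error + prior attempts
    return None                                    # give up after max_iterations
```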


This backtracking system resulted in a 6 percentage point increase in lab-play performance with Claude 3.5 Sonnet, due both to improved consistency on lower-level tasks and to solving previously-unsolved tasks (e.g. creating an automation science factory / electronic circuit factory). The exact results can be found in the table below, and the implementation of the backtracking system can be found here.


Clean State Advantage

Agents always work from a clean game state, avoiding the compounding errors that previously occurred when working with partially broken factories.

Exploratory Learning

Agents can explore multiple approaches to achieve goals using environmental feedback without having to continue from messy game states.

Improved Context Window

Context windows of agents include only successful traces, thus improving few-shot learning and reducing confusion from failed attempts.

Backtracking and reflection improve reliability on tasks such as "Make 16 electronic circuits per minute" and "Craft 16 automation science per minute". The agent can undo failed programs to ensure that the game state remains relatively clean, which improves agent outcomes and reliability.

Vision Agents

We now include visual agents, which can render simplified snapshots of the game map with annotations for enhanced spatial reasoning - even when running on a headless server where the Factorio client is not connected.
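As a rough illustration of what such a snapshot can look like, the sketch below draws an annotated top-down image from a structured entity list using Pillow; the entity names, colour map and render_snapshot helper are illustrative and are not FLE's actual renderer.

```python
# Illustrative headless renderer: draw a simplified, annotated map snapshot
# from structured game state, with no Factorio client attached.
from PIL import Image, ImageDraw

CELL = 16  # pixels per game tile (arbitrary choice for this sketch)
COLOURS = {"iron-ore": (90, 120, 160), "burner-mining-drill": (200, 140, 60),
           "stone-furnace": (120, 120, 120)}

def render_snapshot(entities, width=32, height=32, path="observation.png"):
    """entities: iterable of (name, x, y) tuples in tile coordinates."""
    img = Image.new("RGB", (width * CELL, height * CELL), (30, 30, 30))
    draw = ImageDraw.Draw(img)
    for name, x, y in entities:
        colour = COLOURS.get(name, (220, 220, 220))
        draw.rectangle([x * CELL, y * CELL, (x + 1) * CELL - 1, (y + 1) * CELL - 1],
                       fill=colour)
        draw.text((x * CELL + 2, y * CELL + 2), name[0].upper(), fill=(0, 0, 0))
    img.save(path)  # the saved image (plus a legend) is what the vision model sees

render_snapshot([("iron-ore", 4, 5), ("burner-mining-drill", 4, 6),
                 ("stone-furnace", 6, 6)])
```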


Anecdotally however, current vision models are not yet strong enough to significantly improve benchmarked performance.


Left: the game state in the Factorio client. Right: the visual observation and legend an agent can use to reason over. Both show a multi-agent team collaborating on resource gathering.

Model Testing Results

We conducted initial lab-play testing with Claude 3.7 Sonnet and Gemini 2.5 Pro in v0.2.0.


Model                               Lab Play (%)
Claude 3.7 Sonnet ⭐                29.1
Claude 3.5 Sonnet w. reflexion ⭐   28.1
Claude 3.5 Sonnet                   21.9
Gemini-2.5-pro (March 3rd) ⭐       18.4
GPT-4o                              16.6
Deepseek-v3                         15.1
Gemini-2-Flash                      13.0
Llama-3.3-70b                        5.2
GPT-4o-Mini                          4.2
⭐ Models highlighted with stars are new additions

What's Next

We are super excited to work on the next phase of this environment, including:

A2A Integration

Agent-to-agent integration with agents from across the internet, allowing diverse models to collaborate and compete in the Factorio environment.

Expanded Multi-Agent Scenarios

Support for countless combinations of multi-agent scenarios, including specialized teams, competitive environments, and mixed-capability collaborations.

Training Infrastructure

Train Paperclip Maximisers on our (modest) cluster with optimized training pipelines for reinforcement learning from human feedback.

Benchmark Challenges

Beat our SOTA results for agents interacting in the game with new optimization techniques and collaborative strategies.

Join Our Team

Join our team and contribute to one of the AI research community's most challenging problems - building open-ended / unsaturateable evals for post-AGI frontier models.

Thanks to Wube and the Factorio team for building such a great game.