Environment Overview
FLE (the Factorio Learning Environment) is an agent evaluation environment built on Factorio, a popular resource-management simulation game.
The REPL Pattern
Agents interact with FLE through code synthesis using a REPL (Read-Eval-Print-Loop) pattern:
Observation: The agent observes the world through the output streams (stdout/stderr) of its last program
Action: The agent generates a Python program to perform their desired action
Feedback: The environment executes the program, assigns variables, adds classes/functions to the namespace, and provides an output stream
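The feedback half of this loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not FLE's interpreter: `run_program` is a hypothetical name, and the real environment also injects the game tools into the namespace before executing agent code.

```python
import contextlib
import io
import traceback
from typing import Any, Dict, Tuple


def run_program(code: str, namespace: Dict[str, Any]) -> Tuple[str, str]:
    """Execute one agent program; return the text it wrote to (stdout, stderr).

    The caller passes the same `namespace` dict on every step, so variables,
    functions and classes created by earlier programs remain available.
    """
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            exec(code, namespace)
        except Exception:
            # Errors do not crash the loop; the traceback becomes feedback on stderr.
            traceback.print_exc(file=err)
    return out.getvalue(), err.getvalue()


namespace: Dict[str, Any] = {}
stdout, stderr = run_program("plates = 10\nprint(plates * 2)", namespace)
print(repr(stdout))  # '20\n'
stdout, stderr = run_program("print(plates + 1)", namespace)  # reuses `plates`
print(repr(stdout))  # '11\n'
stdout, stderr = run_program("print(plats)", namespace)  # typo raises NameError
print("NameError" in stderr)  # True
```

Returning both streams as strings is what lets the next Observation be built purely from text.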
Example Interaction
Action:
# 1. Get iron patch and place mining drill
drill = place_entity(
    entity=Prototype.BurnerMiningDrill,
    position=nearest(Resource.IronOre),
    direction=Direction.NORTH
)
# 2. Add output storage
chest = place_entity_next_to(
    entity=Prototype.IronChest,
    reference_position=drill.drop_position,
    direction=Direction.SOUTH
)
# 3. Verify automation chain and observe entities
sleep(10)  # Sleep for 10 seconds
assert drill.status == EntityStatus.WORKING
print(get_entities())
Feedback:
>>> [ BurnerMiningDrill(fuel=Inventory({'coal': 4}),
>>> name='burner-mining-drill',
>>> direction=Direction.DOWN,
>>> position=Position(x=-28.0, y=-61.0),
>>> energy=2666.6666666667,
>>> tile_dimensions=TileDimensions(tile_width=2.0, tile_height=2.0),
>>> status=EntityStatus.WORKING,
>>> neighbours=[Entity(name='iron-chest', direction=DOWN, position=Position(x=-27.5, y=-59.5))],
>>> drop_position=Position(x=-27.5, y=-59.5),
>>> resources=[Ingredient(name='iron-ore', count=30000, type=None)]),
>>> Chest(name='iron-chest',
>>> direction=Direction.UP,
>>> position=Position(x=-27.5, y=-59.5),
>>> energy=0.0,
>>> tile_dimensions=TileDimensions(tile_width=1.0, tile_height=1.0),
>>> status=EntityStatus.NORMAL,
>>> inventory=Inventory({'iron-ore': 75}))]
Available Tools
Agents are provided with the Python standard library and an API of tools (see Tools Overview) that they can use.
Tools are functions that:
Perform a game action
Return a typed object (e.g. an Inventory)
Return objects that can be stored as named variables in the Python namespace for later use
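To make the "action plus typed return value" idea concrete, here is a toy sketch. `Inventory`, `craft_gears` and the in-memory `_GAME_INVENTORY` are hypothetical stand-ins, not FLE's classes; only the recipe (2 iron plates per iron gear wheel) comes from Factorio itself.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Inventory:
    """Hypothetical typed return value: item name -> count."""
    items: Dict[str, int] = field(default_factory=dict)

    def __getitem__(self, name: str) -> int:
        # Missing items read as 0 rather than raising KeyError.
        return self.items.get(name, 0)


# Hypothetical in-memory stand-in for the player's inventory in the game.
_GAME_INVENTORY: Dict[str, int] = {"iron-plate": 8, "coal": 5}


def craft_gears(count: int) -> Inventory:
    """Hypothetical tool: performs an action (crafting), then returns a typed result."""
    needed = 2 * count  # 2 iron plates per iron gear wheel
    if _GAME_INVENTORY.get("iron-plate", 0) < needed:
        raise ValueError(f"need {needed} iron plates to craft {count} gears")
    _GAME_INVENTORY["iron-plate"] -= needed
    _GAME_INVENTORY["iron-gear-wheel"] = _GAME_INVENTORY.get("iron-gear-wheel", 0) + count
    # Return a copy so the stored variable is independent of later changes.
    return Inventory(dict(_GAME_INVENTORY))


inv = craft_gears(3)  # the typed result is kept as a named variable
print(inv["iron-gear-wheel"])  # 3
print(inv["iron-plate"])  # 2
```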
The Namespace
The namespace acts as an episodic symbolic memory system. Saved objects represent an observation of the environment at the moment of query.
This enables agents to:
Maintain complex state representations
Build hierarchical abstractions as factories scale
Reference previous observations and computations
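Because a saved object records the world at query time, comparing two saved observations is how change over time becomes visible. The sketch below models that with a plain dict; `world` and `inspect_chest` are hypothetical, and the deep copy is an assumption standing in for "the object does not track the live game".

```python
import copy
from typing import Dict

# Hypothetical mutable world state standing in for the live game.
world: Dict[str, Dict[str, int]] = {"chest": {"iron-ore": 75}}


def inspect_chest() -> Dict[str, int]:
    """Hypothetical query tool: returns a snapshot, not a live view."""
    return copy.deepcopy(world["chest"])


before = inspect_chest()  # observation saved in the namespace
world["chest"]["iron-ore"] += 25  # the game keeps running (simulated here)
after = inspect_chest()  # a fresh query sees the new state

print(before["iron-ore"], after["iron-ore"])  # 75 100
print(after["iron-ore"] - before["iron-ore"])  # 25
```

The older variable still reads 75, so an agent has to re-query to refresh its view.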
Observations
Agents observe stdout and stderr, the output streams of their program.
Agents may intentionally:
Print relevant objects to construct observations
Print computations and intermediate results
Use print() strategically to monitor state
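Printing a compact summary instead of whole objects keeps the next observation short and focused. This is a hypothetical agent-side helper with made-up inventory values, shown only to illustrate the pattern.

```python
from typing import Dict

# Hypothetical values an agent might have stored in earlier steps.
inventory: Dict[str, int] = {"iron-plate": 40, "coal": 12, "stone": 0}
target: Dict[str, int] = {"iron-plate": 50, "coal": 10}


def summarize(inv: Dict[str, int], goal: Dict[str, int]) -> str:
    """Build a one-line observation listing only items that are still short."""
    missing = {k: v - inv.get(k, 0) for k, v in goal.items() if inv.get(k, 0) < v}
    return f"missing={missing}" if missing else "all targets met"


print(summarize(inventory, target))  # missing={'iron-plate': 10}
```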
Error Handling
Mistakes in code or invalid operations raise typed exceptions with detailed context that is written to stderr.
This enables agents to:
Reactively debug their programs after execution
Proactively use runtime assertions during execution to self-verify actions
Learn from detailed error messages
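A sketch of both habits, assuming a hypothetical `PlacementError` and `place` tool (FLE's real exception types and messages may differ): an assertion verifies that an action took effect, and a typed exception carries structured context that ends up on stderr.

```python
import sys
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Position:
    x: float
    y: float


class PlacementError(Exception):
    """Hypothetical typed exception that carries debugging context."""

    def __init__(self, entity: str, position: Position, reason: str) -> None:
        super().__init__(f"cannot place {entity} at {position}: {reason}")
        self.entity = entity
        self.position = position
        self.reason = reason


def place(entity: str, position: Position, occupied: Set[Position]) -> Position:
    """Hypothetical tool: rejects placement on an already occupied tile."""
    if position in occupied:
        raise PlacementError(entity, position, "tile is occupied")
    occupied.add(position)
    return position


occupied: Set[Position] = set()
chest_pos = place("iron-chest", Position(0.5, 0.5), occupied)
# Proactive: assert the action had the intended effect before moving on.
assert chest_pos in occupied, "chest was not placed"

try:
    place("iron-chest", Position(0.5, 0.5), occupied)  # same tile again
except PlacementError as exc:
    # Reactive: the typed exception exposes fields the agent can inspect.
    print(f"{type(exc).__name__}: {exc} (reason={exc.reason!r})", file=sys.stderr)
```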
Custom Functions and Classes
Agents can enhance their internal representation of the game state by defining:
Utility functions that encapsulate previously successful logic for reuse throughout an episode
Classes in the namespace to better organize the data retrieved from the game
These definitions persist in the namespace across actions within an episode.
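Persistence across actions can be modelled by executing successive programs against one shared dict. This is a simplification, not FLE's interpreter; `Outpost` and `plates_needed` are hypothetical agent-written definitions.

```python
from typing import Any, Dict

# One dict plays the role of the episode's namespace.
namespace: Dict[str, Any] = {}

step_1 = '''
class Outpost:
    """Agent-defined record grouping related entities."""
    def __init__(self, name, drills):
        self.name = name
        self.drills = drills

def plates_needed(gears):
    # Reusable logic: 2 iron plates per iron gear wheel.
    return 2 * gears

iron_outpost = Outpost("iron-north", drills=4)
'''

# A later action in the same episode uses both definitions directly.
step_2 = "print(iron_outpost.name, plates_needed(10))"

exec(step_1, namespace)
exec(step_2, namespace)  # prints: iron-north 20
```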
Action Space
The action space is defined as:
{
    'agent_idx': Discrete(instance.num_agents),  # Index of the agent taking the action
    'game_state': Text(max_length=1000000),  # Optional: game state to reset to
    'code': Text(max_length=10000)  # Python code to execute
}
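As a rough illustration of what these bounds mean, the sketch below checks an action dictionary against them in plain Python. `Action` and `validate_action` are hypothetical; the environment itself declares the space with `Discrete` and `Text` spaces, not with this code.

```python
from typing import Optional, TypedDict


class Action(TypedDict):
    """Mirror of the documented action fields (illustrative only)."""
    agent_idx: int
    game_state: Optional[str]
    code: str


MAX_CODE_LEN = 10_000  # from Text(max_length=10000)
MAX_GAME_STATE_LEN = 1_000_000  # from Text(max_length=1000000)


def validate_action(action: Action, num_agents: int) -> None:
    """Hypothetical check that an action fits the documented space."""
    # Discrete(n) covers the integers 0 .. n-1.
    if not 0 <= action["agent_idx"] < num_agents:
        raise ValueError("agent_idx out of range")
    if len(action["code"]) > MAX_CODE_LEN:
        raise ValueError("code exceeds 10000 characters")
    state = action.get("game_state")
    if state is not None and len(state) > MAX_GAME_STATE_LEN:
        raise ValueError("game_state exceeds 1000000 characters")


action: Action = {"agent_idx": 0, "game_state": None, "code": "print(get_entities())"}
validate_action(action, num_agents=1)  # passes silently
```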
Observation Space
The observation space includes:
raw_text: Output from the last action
entities: List of entities on the map
inventory: Current inventory state
research: Research progress and technologies
game_info: Game state (tick, time, speed)
score: Current score
flows: Production statistics
task_verification: Task completion status
messages: Inter-agent messages (for multi-agent scenarios)
serialized_functions: Available functions
task_info: Information about the task
map_image: Base64-encoded PNG image
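Most fields are plain text or structured data, but `map_image` needs decoding before use. A minimal sketch, assuming the observation arrives as a Python dict keyed by the field names listed here; the sample observation is hypothetical and its "image" is only the 8-byte PNG signature.

```python
import base64
from typing import Any, Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_map_image(obs: Dict[str, Any]) -> bytes:
    """Decode the Base64 `map_image` field and check that it is a PNG."""
    data = base64.b64decode(obs["map_image"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("map_image is not a PNG")
    return data


# Hypothetical observation with a truncated image, for illustration only.
obs = {
    "raw_text": "[BurnerMiningDrill(...)]",
    "score": 0.0,
    "map_image": base64.b64encode(PNG_SIGNATURE).decode("ascii"),
}
png_bytes = decode_map_image(obs)
print(len(png_bytes))  # 8
```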
Environment Methods
Standard Gym Interface
All FLE environments follow the standard OpenAI Gym interface:
# Reset the environment
# Signature: reset(options: Dict[str, Any], seed: Optional[int] = None) -> Dict[str, Any]
obs = env.reset(options=options, seed=None)
# Take a step (action follows the Action Space above)
obs, reward, terminated, truncated, info = env.step(action)
# Clean up
env.close()
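The loop below sketches how these three calls fit together over an episode. `StubEnv` is a hypothetical stand-in with the same method shapes, not an FLE class; it simply truncates after a fixed number of steps, whereas a real environment decides `terminated` and `truncated` from the task.

```python
from typing import Any, Dict, Optional, Tuple


class StubEnv:
    """Hypothetical stand-in exposing reset/step/close like a Gym env."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self, options: Dict[str, Any], seed: Optional[int] = None) -> Dict[str, Any]:
        self.t = 0
        return {"raw_text": ""}

    def step(self, action: Dict[str, Any]) -> Tuple[Dict[str, Any], float, bool, bool, Dict[str, Any]]:
        self.t += 1
        obs = {"raw_text": f"ran {len(action['code'])} chars"}
        truncated = self.t >= self.max_steps  # step budget exhausted
        return obs, 0.0, False, truncated, {}

    def close(self) -> None:
        pass


env = StubEnv()
obs = env.reset(options={})
steps = 0
terminated = truncated = False
while not (terminated or truncated):
    action = {"agent_idx": 0, "game_state": None, "code": "print(get_entities())"}
    obs, reward, terminated, truncated, info = env.step(action)
    steps += 1
env.close()
print(steps)  # 3
```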
Architecture
 ┌─────────────────┐
 │      Agent      │
 │  (Synthesizes   │
 │  Python Code)   │
 └────────┬────────┘
          │
          ▼
┌─────────────────────────────┐
│  Learning Environment       │
│  ┌─────────────────────┐    │
│  │  Interpreter        │    │
│  │  - Executes code    │    │
│  │  - Manages          │    │
│  │    namespace        │    │
│  └──────┬──────────────┘    │
│         │                   │
│  ┌──────▼──────────────┐    │
│  │  client.py          │    │
│  │  (Tool Interface)   │    │
│  └──────┬──────────────┘    │
└─────────┼───────────────────┘
          │ Remote TCP Call
          ▼
┌─────────────────────────────┐
│  Factorio Server            │
│  ┌─────────────────────┐    │
│  │  server.lua         │    │
│  │  (Game Logic)       │    │
│  └──────┬──────────────┘    │
│         │                   │
│  ┌──────▼──────────────┐    │
│  │  Factorio Engine    │    │
│  │  (Game Simulation)  │    │
│  └─────────────────────┘    │
└─────────────────────────────┘
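The layering can be illustrated in-process: the interpreter runs agent code, a tool function in the namespace turns the call into a request, and a server answers it. Everything here is hypothetical, including the JSON message shape and `fake_server`, which stands in for server.lua; the real wire protocol between client.py and the Factorio server is not described in this section.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical transport: a function standing in for the remote TCP call.
Transport = Callable[[str], str]


def fake_server(request: str) -> str:
    """Stand-in for the server side: decode a request, return a JSON reply."""
    msg = json.loads(request)
    if msg["tool"] == "nearest":
        return json.dumps({"ok": True, "result": {"x": -28.0, "y": -61.0}})
    return json.dumps({"ok": False, "error": f"unknown tool {msg['tool']}"})


def make_tool(name: str, send: Transport) -> Callable[..., Any]:
    """Client side: wrap a remote call as an ordinary Python function."""
    def tool(**kwargs: Any) -> Any:
        reply = json.loads(send(json.dumps({"tool": name, "args": kwargs})))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]
    return tool


# Interpreter side: tools are injected into the namespace agent code runs in.
namespace: Dict[str, Any] = {"nearest": make_tool("nearest", fake_server)}
exec("pos = nearest(resource='iron-ore')\nprint(pos)", namespace)  # {'x': -28.0, 'y': -61.0}
```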
Task Types
FLE provides two main evaluation settings:
Lab-Play
24 structured tasks with fixed resources, testing specific capabilities:
Circuits: Advanced circuits, electronic circuits, processing units
Science Packs: Automation, logistics, chemical, military, production, utility
Components: Batteries, engines, inserters, gears, low density structures
Raw Materials: Iron ore, iron plates, steel plates, plastic bars
Oil & Chemicals: Crude oil, petroleum gas, sulfuric acid, sulfur
Military: Piercing rounds, stone walls
Most tasks require 16 items per 60 seconds; fluid tasks require 250 units per 60 seconds.
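These thresholds are rates, so a measurement over any window can be normalised to a per-60-second figure and compared. The function below is a hypothetical helper showing that arithmetic with made-up measurements; how FLE itself samples production for verification is not specified here.

```python
def meets_quota(produced: float, window_s: float, quota: float, per_s: float = 60.0) -> bool:
    """True if `produced` units over `window_s` seconds meet `quota` per `per_s` seconds."""
    rate = produced * per_s / window_s  # normalise to units per 60 s
    return rate >= quota


# Hypothetical measurements over a 120-second window.
print(meets_quota(40, window_s=120, quota=16))  # 20 items per 60 s -> True
print(meets_quota(400, window_s=120, quota=250))  # 200 units per 60 s -> False
```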
Open-Play
An unbounded task of building the largest possible factory on a procedurally generated map. This tests:
Long-term planning
Resource optimization
Scaling strategies
Error recovery
Next Steps
Explore the Gym Environment Registry to see all available tasks
Learn about Tools Overview available to agents
See Quickstart for usage examples