Environment Overview

FLE (Factorio Learning Environment) is an agent evaluation environment built on Factorio, a popular resource management simulation game.

The REPL Pattern

Agents interact with FLE through code synthesis using a REPL (Read-Eval-Print-Loop) pattern:

Observation: The agent observes the world through the output streams (stdout/stderr) of its last program
Action: The agent generates a Python program to perform its desired action
Feedback: The environment executes the program, assigns variables, adds classes/functions to the namespace, and provides an output stream
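
The loop above can be sketched in plain Python. This is a minimal illustration, not FLE's actual interpreter: `run_program` is a hypothetical helper, and it assumes the environment runs each program with `exec` against one shared dictionary and captures both output streams.

```python
import contextlib
import io
import traceback


def run_program(code: str, namespace: dict) -> str:
    """Execute one agent program and return its combined output streams."""
    stdout, stderr = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
        try:
            # Variables, functions and classes defined by the program
            # are stored in `namespace` and survive to the next call.
            exec(code, namespace)
        except Exception:
            # Errors are reported on stderr rather than crashing the loop.
            traceback.print_exc()
    return stdout.getvalue() + stderr.getvalue()


namespace = {}
first = run_program("x = 21\nprint('x set')", namespace)
second = run_program("print(x * 2)", namespace)           # reuses x from the first program
third = run_program("print(undefined_name)", namespace)   # the NameError ends up in the output
```

Each returned string plays the role of the observation the agent reads before writing its next program.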

Example Interaction

Action:

# 1. Get iron patch and place mining drill
drill = place_entity(
    entity=Prototype.MiningDrill,
    position=nearest(Resource.IronOre),
    direction=Direction.NORTH
)
# 2. Add output storage
chest = place_entity_next_to(
    entity=Prototype.IronChest,
    reference_position=drill.drop_position,
    direction=Direction.SOUTH
)
# 3. Verify automation chain and observe entities
sleep(10) # Sleep for 10 seconds
assert drill.status == EntityStatus.WORKING
print(get_entities())

Feedback:

>>> [ BurnerMiningDrill(fuel=Inventory({'coal': 4}),
>>>                     name='burner-mining-drill',
>>>                     direction=Direction.DOWN,
>>>                     position=Position(x=-28.0, y=-61.0),
>>>                     energy=2666.6666666667,
>>>                     tile_dimensions=TileDimensions(tile_width=2.0, tile_height=2.0),
>>>                     status=EntityStatus.WORKING,
>>>                     neighbours=[Entity(name='iron-chest', direction=DOWN, position=Position(x=-27.5, y=-59.5))],
>>>                     drop_position=Position(x=-27.5, y=-59.5),
>>>                     resources=[Ingredient(name='iron-ore', count=30000, type=None)]),
>>>   Chest(name='iron-chest',
>>>         direction=Direction.UP,
>>>         position=Position(x=-27.5, y=-59.5),
>>>         energy=0.0,
>>>         tile_dimensions=TileDimensions(tile_width=1.0, tile_height=1.0),
>>>         status=EntityStatus.NORMAL,
>>>         inventory=Inventory({'iron-ore': 75}))]

Available Tools

Agents are provided with the Python standard library and an API of tools (see Tools Overview) that they can call.

Tools are functions that:

Perform a game action
Return a typed object (e.g. an Inventory)
Can be stored as a named variable in the Python namespace for later use

The Namespace

The namespace acts as an episodic symbolic memory system. Saved objects represent an observation of the environment at the moment of query.

This enables agents to:

Maintain complex state representations
Build hierarchical abstractions as factories scale
Reference previous observations and computations
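
Because a saved object reflects the moment of the query, it can go stale while the game keeps running. The sketch below illustrates this with a hypothetical `ChestSnapshot` type and `query_chest` function; neither is part of the FLE API, and the `_world` dictionary merely stands in for live game state.

```python
from dataclasses import dataclass, field


@dataclass
class ChestSnapshot:
    """Hypothetical stand-in for a typed object returned by a tool."""
    name: str
    inventory: dict = field(default_factory=dict)


# Hypothetical game state that keeps changing after the query.
_world = {"iron-chest": {"iron-ore": 75}}


def query_chest(name: str) -> ChestSnapshot:
    """Return a copy of the chest's state at the moment of the call."""
    return ChestSnapshot(name=name, inventory=dict(_world[name]))


chest = query_chest("iron-chest")        # observation stored in the namespace
_world["iron-chest"]["iron-ore"] += 25   # the game continues to run
fresh = query_chest("iron-chest")        # re-query to refresh the observation
```

Here `chest` still reports the old ore count while `fresh` reports the new one, which is why an agent may want to re-query before acting on an old variable.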

Observations

Agents observe stdout and stderr, the output streams of their program.

Agents may intentionally:

Print relevant objects to construct observations
Print computations and intermediate results
Use print() strategically to monitor state

Error Handling

Mistakes in code or invalid operations raise typed exceptions with detailed context that is written to stderr.

This enables agents to:

Reactively debug their programs after execution
Proactively use runtime assertions during execution to self-verify actions
Learn from detailed error messages
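
As a rough illustration of both debugging styles, the following sketch combines a runtime assertion with a typed exception whose message ends up on stderr. `EntityPlacementError` and `place_entity_stub` are hypothetical names invented for this example; FLE's real exception classes and tool behaviour may differ.

```python
import io
import traceback
from contextlib import redirect_stderr


class EntityPlacementError(Exception):
    """Hypothetical typed exception; FLE's real exception classes may differ."""


def place_entity_stub(position: tuple, occupied: set) -> tuple:
    """Stand-in for a tool call that fails with context when a tile is taken."""
    if position in occupied:
        raise EntityPlacementError(f"Cannot place entity at {position}: tile is occupied")
    occupied.add(position)
    return position


occupied = {(0, 0)}
stderr = io.StringIO()
with redirect_stderr(stderr):
    try:
        placed = place_entity_stub((1, 0), occupied)
        # Proactive check: verify the action had the intended effect.
        assert placed in occupied, f"expected {placed} to be occupied"
        # This call fails; the typed error and its context go to stderr.
        place_entity_stub((0, 0), occupied)
    except (EntityPlacementError, AssertionError):
        traceback.print_exc()

error_text = stderr.getvalue()
```

The captured `error_text` names the exception type and the offending position, which is the kind of detail an agent can use to correct its next program.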

Custom Functions and Classes

Agents can enhance their internal representation of the game state by defining:

Utility functions for reuse throughout an episode, to encapsulate previously successful logic
Classes in the namespace to better organize the data retrieved from the game

These definitions persist in the namespace across actions within an episode.
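
A small sketch of this persistence, assuming (as an illustration only) that each action is executed against one shared dictionary: the first program defines a class and a helper, and the second uses them without redefining anything. `Patch` and `distance` are hypothetical agent-written definitions.

```python
namespace = {}

# Action 1: the agent defines a helper and a small class.
exec(
    """
class Patch:
    def __init__(self, resource, x, y):
        self.resource, self.x, self.y = resource, x, y

def distance(a, b):
    return abs(a.x - b.x) + abs(a.y - b.y)
""",
    namespace,
)

# Action 2: a later program reuses both definitions without redefining them.
exec(
    """
iron = Patch('iron-ore', 0, 0)
coal = Patch('coal', 3, 4)
result = distance(iron, coal)
""",
    namespace,
)
```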

Action Space

The action space is defined as:

{
    'agent_idx': Discrete(instance.num_agents),  # Index of the agent taking the action
    'game_state': Text(max_length=1000000),      # Optional: game state to reset to
    'code': Text(max_length=10000)               # Python code to execute
}
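
One way to construct an action that respects these bounds is sketched below. `make_action` is a hypothetical helper, not part of FLE; representing the action as a plain dict and using an empty string for "no game state" are both assumptions made for illustration.

```python
MAX_CODE_LENGTH = 10_000           # from 'code': Text(max_length=10000)
MAX_GAME_STATE_LENGTH = 1_000_000  # from 'game_state': Text(max_length=1000000)


def make_action(agent_idx: int, code: str, num_agents: int, game_state: str = "") -> dict:
    """Build an action dict and check it against the documented bounds."""
    if not 0 <= agent_idx < num_agents:
        raise ValueError(f"agent_idx must be in [0, {num_agents}), got {agent_idx}")
    if len(code) > MAX_CODE_LENGTH:
        raise ValueError(f"code is {len(code)} characters; limit is {MAX_CODE_LENGTH}")
    if len(game_state) > MAX_GAME_STATE_LENGTH:
        raise ValueError("game_state exceeds the documented limit")
    # Assumption: an empty game_state means "do not reset".
    return {"agent_idx": agent_idx, "game_state": game_state, "code": code}


action = make_action(agent_idx=0, code="print(get_entities())", num_agents=1)
```

Checking the length before submission lets an agent harness split or shorten an oversized program instead of discovering the limit from a rejected step.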

Observation Space

The observation space includes:

raw_text: Output from the last action
entities: List of entities on the map
inventory: Current inventory state
research: Research progress and technologies
game_info: Game state (tick, time, speed)
score: Current score
flows: Production statistics
task_verification: Task completion status
messages: Inter-agent messages (for multi-agent scenarios)
serialized_functions: Available functions
task_info: Information about the task
map_image: Base64 encoded PNG image
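
For logging, a harness might condense a few of these fields into one line. The sketch below assumes the observation arrives as a dict keyed by the field names above; the value types and the sample values are hypothetical, and `summarize` is not an FLE function.

```python
def summarize(obs: dict) -> str:
    """Condense an observation dict into a one-line status string."""
    return (
        f"score={obs.get('score')} "
        f"entities={len(obs.get('entities', []))} "
        f"output={obs.get('raw_text', '').strip()!r}"
    )


# Hypothetical observation; real values and types may differ.
example_obs = {
    "raw_text": "drill placed\n",
    "entities": ["burner-mining-drill", "iron-chest"],
    "inventory": {"coal": 4},
    "score": 12.5,
}
summary = summarize(example_obs)
```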

Environment Methods

Standard Gym Interface

All FLE environments follow the standard OpenAI Gym interface:

# Reset the environment
# Signature: reset(options: Dict[str, Any], seed: Optional[int] = None) -> Dict[str, Any]
obs = env.reset(options=options, seed=seed)

# Take a step (action is an Action instance)
obs, reward, terminated, truncated, info = env.step(action)

# Clean up
env.close()
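
Put together, a full episode might look like the loop below. To keep the sketch runnable without a Factorio server, it uses `StubEnv`, a hypothetical stand-in that only mimics the shape of the three calls above; its rewards, observations, step budget, and the plain-dict action are invented for illustration.

```python
from typing import Any, Dict, Optional, Tuple


class StubEnv:
    """Hypothetical stand-in that mimics the reset/step/close calls above."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, options: Optional[Dict[str, Any]] = None,
              seed: Optional[int] = None) -> Dict[str, Any]:
        self.steps = 0
        return {"raw_text": "", "score": 0.0}

    def step(self, action: Dict[str, Any]) -> Tuple[Dict[str, Any], float, bool, bool, Dict[str, Any]]:
        self.steps += 1
        obs = {"raw_text": f"ran {len(action['code'])} chars", "score": float(self.steps)}
        truncated = self.steps >= self.max_steps  # step budget exhausted
        return obs, 1.0, False, truncated, {}

    def close(self) -> None:
        pass


env = StubEnv()
obs = env.reset(options={}, seed=0)
total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    # A real agent would synthesize this program from the last observation.
    action = {"agent_idx": 0, "game_state": "", "code": "print(get_entities())"}
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
env.close()
```

The loop ends when either flag is set: `terminated` when the task itself finishes, `truncated` when an external limit such as a step budget cuts the episode short.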

Architecture

┌─────────────────┐
│     Agent       │
│ (Synthesizes    │
│  Python Code)   │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────┐
│  Learning Environment       │
│  ┌─────────────────────┐    │
│  │   Interpreter       │    │
│  │   - Executes code   │    │
│  │   - Manages         │    │
│  │     namespace       │    │
│  └──────┬──────────────┘    │
│         │                   │
│  ┌──────▼──────────────┐    │
│  │   client.py         │    │
│  │   (Tool Interface)  │    │
│  └──────┬──────────────┘    │
└─────────┼───────────────────┘
          │ Remote TCP Call
          ▼
┌─────────────────────────────┐
│   Factorio Server           │
│  ┌─────────────────────┐    │
│  │   server.lua        │    │
│  │   (Game Logic)      │    │
│  └──────┬──────────────┘    │
│         │                   │
│  ┌──────▼──────────────┐    │
│  │  Factorio Engine    │    │
│  │  (Game Simulation)  │    │
│  └─────────────────────┘    │
└─────────────────────────────┘

Task Types

FLE provides two main evaluation settings:

Lab-Play

24 structured tasks with fixed resources, testing specific capabilities:

Circuits: Advanced circuits, electronic circuits, processing units
Science Packs: Automation, logistics, chemical, military, production, utility
Components: Batteries, engines, inserters, gears, low density structures
Raw Materials: Iron ore, iron plates, steel plates, plastic bars
Oil & Chemicals: Crude oil, petroleum gas, sulfuric acid, sulfur
Military: Piercing rounds, stone walls

Most tasks require a throughput of 16 items per 60 seconds; fluid tasks require 250 units per 60 seconds.
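
The quota arithmetic can be checked with a small helper. This is only a sketch: `meets_quota` is a hypothetical function, and it assumes the requirement can be treated as an average rate over any measurement window, which may not match how FLE verifies tasks.

```python
ITEM_QUOTA = 16      # items per window, from the text above
FLUID_QUOTA = 250    # fluid units per window, from the text above
WINDOW_SECONDS = 60


def meets_quota(produced: float, elapsed_seconds: float, fluid: bool = False) -> bool:
    """Return True if the observed average rate reaches the required rate."""
    quota = FLUID_QUOTA if fluid else ITEM_QUOTA
    # Compare produced/elapsed >= quota/window by cross-multiplying,
    # which avoids floating-point division at the boundary.
    return produced * WINDOW_SECONDS >= quota * elapsed_seconds


on_target = meets_quota(produced=8, elapsed_seconds=30)        # 8 items in 30 s equals 16 per 60 s
too_slow = meets_quota(produced=15, elapsed_seconds=60)        # one item short
fluid_ok = meets_quota(produced=250, elapsed_seconds=60, fluid=True)
```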

Open-Play

An unbounded task of building the largest possible factory on a procedurally generated map. This tests:

Long-term planning
Resource optimization
Scaling strategies
Error recovery

Next Steps

Explore the Gym Environment Registry to see all available tasks
Learn about the tools available to agents in the Tools Overview
See Quickstart for usage examples