We present a novel framework for evaluating Large Language Models (LLMs) in unbounded scenarios using Factorio, a game centered on automation and exponential resource production. Unlike traditional environments with fixed reward structures, Factorio's automation mechanics enable truly open-ended growth potential, making it an ideal testbed for studying AI systems pursuing unbounded objectives.
Our framework consists of three core components:
1) a Python-based API that enables LLMs to interact with the game environment through a set of well-defined tools,
2) a self-verification mechanism using runtime assertions to maintain consistency between agent beliefs and game state, and
3) a persistent Python REPL execution environment that allows agents to maintain state and build increasingly complex automation systems.
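The interplay of the three components can be sketched as follows. The abstract does not specify the actual API surface, so every class, method, and entity name below (`FactorioEnv`, `craft`, `place_entity`, and so on) is an illustrative assumption, not the framework's real interface:

```python
# Illustrative sketch only: all names here are hypothetical stand-ins for
# the framework's unspecified Python API (component 1).

class FactorioEnv:
    """Mock of the game-facing API the agent calls through its tools."""
    def __init__(self):
        self.inventory = {"iron-plate": 0}
        self.entities = []

    def craft(self, item, count):
        # Pretend crafting always succeeds and credits the inventory.
        self.inventory[item] = self.inventory.get(item, 0) + count
        return count

    def place_entity(self, name, position):
        entity = {"name": name, "position": position}
        self.entities.append(entity)
        return entity


env = FactorioEnv()

# Component 2: self-verification via runtime assertions. After each action
# the agent asserts that the observed game state matches its belief; a
# failed assertion surfaces the belief/state divergence immediately.
crafted = env.craft("iron-plate", 8)
assert env.inventory["iron-plate"] >= 8, "belief/state mismatch after crafting"

drill = env.place_entity("burner-mining-drill", (10, 5))
assert drill in env.entities, "placement was not registered by the game"

# Component 3: because the REPL is persistent, names like `env` and `drill`
# survive across agent turns, letting later code extend earlier structures.
print(len(env.entities))
```

In a persistent REPL, a later turn could reference `drill` directly to, say, attach an inserter to it, rather than re-querying the world from scratch.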
We demonstrate the framework's capabilities by training an agent through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to maximize resource production growth. Using the canonical Paperclip Maximization scenario as our objective, we provide the first empirical demonstration of how LLMs pursue and optimize unbounded objectives.
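Since the training target is the *growth* of resource production rather than its absolute level, one natural RL reward is the log-ratio of production rates between consecutive evaluation windows. The abstract does not give the actual reward function, so this formulation is an assumption for illustration:

```python
import math

# Hypothetical reward shaping (not the paper's actual formula): reward the
# log-ratio of production rates between consecutive windows, so sustained
# exponential growth yields a constant positive reward per step.
def growth_reward(prev_rate: float, curr_rate: float, eps: float = 1e-9) -> float:
    """Reward = log(curr/prev); eps guards against a zero starting rate."""
    return math.log((curr_rate + eps) / (prev_rate + eps))

# Doubling output each window gives a steady reward of ln(2) per window.
print(round(growth_reward(100.0, 200.0), 4))  # → 0.6931
```

A log-ratio reward is scale-invariant, so the incentive to keep growing does not vanish as absolute production gets large, which matches the unbounded-objective setting described above.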
Most significantly, we observe the emergence of sophisticated automation strategies that mirror theoretical predictions about AI systems pursuing unbounded objectives. As agents become more capable automators, they exhibit predicted instrumental behaviors such as resource hoarding, infrastructure protection, and expansion optimization.
Our findings provide the first empirical grounding for theoretical discussions about AI alignment in unbounded scenarios. By creating a controlled environment where AI systems can autonomously expand their operational capacity, we enable concrete study of alignment challenges that may arise in real-world applications where AI systems have access to self-improvement capabilities. This work bridges the gap between theoretical alignment concerns and empirical observation, offering valuable insights for developing robust alignment strategies.