Thousand Token Wood: Shipping a Multi-Agent Economy on a 3B Model

June 7, 2026 · By Mansa Muhammad

Small models are not just for chatbots; they are the engine for high-frequency, multi-agent simulations where cost and latency dictate the architecture. In the project Thousand Token Wood, a 3B model serves as the backbone for a tiny economy featuring five woodland creatures. These agents, running on Qwen2.5-3B, trade five goods for pebbles, gossip, hoard, and panic.

The technical setup uses vLLM on Modal, with a Gradio app providing the interface. This architecture demonstrates that a 3B model functions as a reliable format generator, even if its reasoning remains unreliable.

The project reveals that emergent systems require engineered scarcity to avoid stagnation. An initial version of the economy failed because production outran consumption, leading to a state where every creature was self-sufficient. To force market activity, specific constraints were implemented:

  • Diet variety: Creatures can eat only one unit of any single food per meal, necessitating the purchase of foods they do not grow.
  • Spoilage: Perishable food rots if hoarded, forcing the sale of surplus.
  • Winter fuel crisis: Every creature must burn firewood each turn, and the amount needed rises over time. Only one creature makes firewood, so the whole economy relies on it.

This scarcity drives economic volatility, creating bubbles, crashes, and a widening wealth gap.
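As a rough illustration, the three constraints could be written as small per-turn rules. This is a minimal sketch, not the project's code: the good names other than acorns and firewood, the `keep` spoilage threshold, and the firewood schedule are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical goods and numbers; the article does not list the real ones.
FOODS = {"acorns", "berries", "mushrooms"}
PERISHABLE = {"berries", "mushrooms"}


@dataclass
class Creature:
    name: str
    inventory: dict = field(default_factory=dict)  # good -> units held


def eat_meal(creature, foods_needed=2):
    """Diet variety: eat at most one unit of any single food per meal."""
    eaten = []
    for food in sorted(FOODS):
        if len(eaten) == foods_needed:
            break
        if creature.inventory.get(food, 0) > 0:
            creature.inventory[food] -= 1
            eaten.append(food)
    # Fewer than foods_needed items means the creature has to buy other foods.
    return eaten


def spoil(creature, keep=2):
    """Spoilage: perishable stock above `keep` units rots at end of turn."""
    lost = {}
    for good in PERISHABLE:
        held = creature.inventory.get(good, 0)
        if held > keep:
            lost[good] = held - keep
            creature.inventory[good] = keep
    return lost


def firewood_needed(turn, base=1, step_every=5):
    """Winter fuel: the per-turn firewood requirement rises over time."""
    return base + turn // step_every


def burn_firewood(creature, turn):
    """Burn what is available and return the shortfall for this turn."""
    need = firewood_needed(turn)
    held = creature.inventory.get("firewood", 0)
    burned = min(held, need)
    creature.inventory["firewood"] = held - burned
    return need - burned
```

Each rule removes a way to sit out the market: a creature with only its own crop cannot complete a meal, a hoarder loses stock, and everyone but the firewood maker runs a fuel shortfall every turn.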

The primary engineering challenge was the gap between structural accuracy and economic logic. The 3B model emitted valid JSON on 100% of calls, but its economic judgment was poor: an acorn producer, for example, would try to buy acorns. The fix was not a larger model but a better prompt. Telling each agent what it produced and must never buy, listing the goods it was short on, and including one worked example improved decision quality.
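A prompt with those three ingredients might be assembled as follows. This is a sketch under assumptions: the wording, the JSON field names (`action`, `good`, `qty`, `max_price`), and the function names are hypothetical. The `is_sensible` check is an extra guard that the article does not describe, included only to show how the "acorn producer buying acorns" failure could be detected.

```python
import json


def build_prompt(name, produces, short_on, prices):
    """Build one agent's prompt: role, hard rule, shortages, worked example."""
    # Hypothetical action schema; the article does not specify the real one.
    example = json.dumps(
        {"action": "buy", "good": "berries", "qty": 1, "max_price": 3}
    )
    return "\n".join([
        f"You are {name}, a creature in a woodland market.",
        f"You produce {produces}. Never buy {produces}.",
        f"You are short on: {', '.join(short_on) or 'nothing'}.",
        f"Current prices in pebbles: {json.dumps(prices, sort_keys=True)}",
        "Reply with one JSON object only. Example:",
        example,
    ])


def is_sensible(action, produces):
    """Economic sanity check on top of JSON validity (not from the article)."""
    return not (
        action.get("action") == "buy" and action.get("good") == produces
    )
```

For example, `build_prompt("Squirrel", "acorns", ["firewood"], {"acorns": 2, "firewood": 6})` yields a prompt that states the no-buy rule and the shortage list before the example, so the model sees its constraints before it sees the output format.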

For developers building agentic systems, the lesson is clear: use small models to make real-time simulation feasible through batched GPU calls, but rely on prompt engineering and designed constraints to manage the inherent limitations of the model's reasoning.
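The batching idea can be sketched as one model call per turn that covers every agent. The code below is an assumption-laden outline, not the project's implementation: `generate_batch` is a hypothetical callable that takes a list of prompts and returns a list of reply strings, and the `{"action": "pass"}` fallback is an illustrative choice for replies that fail to parse.

```python
import json


def run_turn(prompts, generate_batch):
    """Run one simulation turn with a single batched model call.

    prompts: dict mapping agent name -> prompt string.
    generate_batch: hypothetical callable, list[str] -> list[str].
    """
    names = list(prompts)
    # One call for all agents instead of one call per agent.
    replies = generate_batch([prompts[name] for name in names])
    actions = {}
    for name, reply in zip(names, replies):
        try:
            parsed = json.loads(reply)
        except json.JSONDecodeError:
            parsed = None
        # Anything that is not a JSON object becomes a harmless no-op.
        actions[name] = parsed if isinstance(parsed, dict) else {"action": "pass"}
    return actions
```

With vLLM, `generate_batch` could plausibly wrap `LLM.generate`, which accepts a list of prompts, but the exact wiring used in the project is not described here.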

How much of your agentic architecture is dependent on model intelligence versus the structural constraints you have engineered into the environment?
