Recycling Robot
Sutton & Barto's recycling robot — search, wait or recharge under a battery MDP.
The recycling-robot Markov decision process. From a high or low battery state the robot picks an action under a Bold or Timid policy: search (highest reward, may drain the battery), wait (safe, smaller reward) or recharge. Each action transitions to a short-lived outcome state whose probabilities encode the environment's response (finding cans, or being rescued at a steep penalty), so the stochastic transition function is expressed directly in the model.
A minimal, exact MDP for studying policy under risk. The two-stage action→outcome encoding is a reusable pattern for any stochastic transition, and the Bold-versus-Timid reward gap is visible directly in the exported reward and rescue counters.
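The two-stage action→outcome encoding can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scenario's actual model: the transition structure follows the textbook recycling robot, while the parameter values (`ALPHA`, `BETA`, the rewards), the definitions of the Bold and Timid policies, and the names `OUTCOMES`, `POLICIES`, `step` and `run` are all hypothetical.

```python
import random

# Illustrative parameters (assumptions, not taken from the scenario).
ALPHA, BETA = 0.7, 0.4  # P(stay high | search from high), P(stay low | search from low)
R_SEARCH, R_WAIT, R_RESCUE = 2.0, 1.0, -3.0

# Stage 1 picks (state, action); stage 2 samples one outcome from this table.
# Each outcome is (probability, next_state, reward, rescued).
OUTCOMES = {
    ("high", "search"): [(ALPHA, "high", R_SEARCH, False),
                         (1 - ALPHA, "low", R_SEARCH, False)],
    ("low", "search"): [(BETA, "low", R_SEARCH, False),
                        (1 - BETA, "high", R_RESCUE, True)],  # battery died, robot rescued
    ("high", "wait"): [(1.0, "high", R_WAIT, False)],
    ("low", "wait"): [(1.0, "low", R_WAIT, False)],
    ("low", "recharge"): [(1.0, "high", 0.0, False)],
}

# Assumed policy definitions: Bold always searches, Timid recharges when low.
POLICIES = {
    "bold": {"high": "search", "low": "search"},
    "timid": {"high": "search", "low": "recharge"},
}

def step(state, action, rng):
    """Sample one outcome for (state, action) according to its probabilities."""
    outcomes = OUTCOMES[(state, action)]
    weights = [o[0] for o in outcomes]
    _, next_state, reward, rescued = rng.choices(outcomes, weights=weights)[0]
    return next_state, reward, rescued

def run(policy_name, steps=10_000, seed=0):
    """Follow a fixed policy and return (total reward, rescue count)."""
    rng = random.Random(seed)
    policy = POLICIES[policy_name]
    state, total, rescues = "high", 0.0, 0
    for _ in range(steps):
        state, reward, rescued = step(state, policy[state], rng)
        total += reward
        rescues += rescued
    return total, rescues

if __name__ == "__main__":
    for name in POLICIES:
        total, rescues = run(name)
        print(f"{name}: reward={total:.0f} rescues={rescues}")
```

Because every stochastic branch lives in the `OUTCOMES` table, changing the environment means editing probabilities rather than control flow. Which policy earns more depends on the chosen parameters; under these assumed definitions only the Bold policy can ever be rescued.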
A live sample of the dataset this scenario generates.
Linked tables with guaranteed referential integrity.
Generated REST endpoints. Also exposed as MCP tools.
OSI-compatible definition, emitted with the dataset.
# recycling-robot.osi.yaml — emitted automatically
semantic_model:
  name: "recycling-robot"
  source: "duckdb://recycling-robot.db"
  entities:
    - name: robot
      primary_key: id
      dimensions:
        - name: state
          type: categorical
        - name: t
          type: time
      measures:
        - name: row_count
          agg: count
        - name: active
          agg: sum
          filter: "state = 'ACTIVE'"
More worlds.
Game of Life
Conway's automaton as a perfectly observable, deterministic grid world.
London Underground
A live tube graph — eleven lines, hundreds of trains, platforms held as a mutex.
Pac-Man
A self-playing arcade game — ghosts chase a flood-filled distance field.