// philknows.com  /  dev journal

DEVJOURNAL

A running log of sessions, decisions, and breakthroughs building the SNES AI agent — one iteration at a time.

Baron Castle Mapping & WE10

// context
Where We Left Off
WE9 was the last stable agent before this session — a Python loop that captured SNES frames via OpenCV, sent them to a local Ollama LLaVA 13B model, and translated the response into button presses via Arduino Micro. The core problem: Cecil wandered without purpose. The agent had no spatial memory, no sense of where it was, and no understanding of what the game was asking it to do between frames. This session was about fixing that at the architecture level.
// artifacts built
Four Files, Ground Up
The session produced four new artifacts — a complete replacement of WE9, not a patch on top of it.
WholeEnchilada10.py
Main agent loop. Replaced Ollama with Claude Vision API. Full context injection per location.
game_literacy.json
10-location map of Baron Castle. Exits, NPCs, triggers, interactables — built from live narration.
vision_heuristics.md
Visual signatures for game states. Dialog box geometry corrected from prior design.
decision_tree.md
Human decision logic encoded as priority order: dialog → battle → trigger → NPC → explore.
// architecture
Key Changes in WE10
Claude Vision API replaces LLaVA 13B. The local model produced unreliable structured JSON — WE10 drops Ollama entirely and routes vision calls directly to claude-opus-4-6. Reliability for the action schema is non-negotiable.

OpenCV pre-filter for dialog detection. Before making any API call, the Python loop scans the top third of the screen for a white-bordered, dark-blue-filled rectangle. If found, it fires an A press immediately — no Claude call needed. Saves latency on the most common game interaction.
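A minimal sketch of that pre-filter idea, using NumPy only; the real WE10 code runs through OpenCV, and the color thresholds here are illustrative assumptions, not the shipped values:

```python
import numpy as np

# Hypothetical sketch of the dialog pre-filter: scan the top third of the frame
# for a dark-blue fill plus white border pixels. Threshold values are guesses.
def dialog_box_present(frame_bgr: np.ndarray) -> bool:
    h = frame_bgr.shape[0]
    top = frame_bgr[: h // 3]
    b = top[..., 0].astype(int)
    g = top[..., 1].astype(int)
    r = top[..., 2].astype(int)
    dark_blue = (b > 90) & (g < 60) & (r < 60)   # classic FF4 window fill (BGR)
    white = (b > 200) & (g > 200) & (r > 200)    # border pixels
    # Require a sizable blue region plus at least a thin white border.
    return bool(dark_blue.mean() > 0.10 and white.mean() > 0.005)
```

If this returns True, the loop can fire the A press directly and skip the API round trip entirely.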

Location-aware context injection. game_literacy.json is loaded at startup. Every Claude prompt now includes the exits, NPCs, interactables, and story triggers for the agent's current detected location. The model is no longer flying blind.
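A sketch of what that injection step might look like; the key names inside game_literacy.json are guesses based on the fields the journal lists (exits, NPCs, interactables, triggers), not the actual schema:

```python
# Hypothetical context builder; loc dict keys are assumed, not from WE10 source.
def location_context(literacy: dict, location_id: str) -> str:
    """Render one game_literacy.json entry as prompt context for Claude."""
    loc = literacy[location_id]
    return (
        f"LOCATION: {loc['name']}\n"
        f"EXITS: {', '.join(loc['exits'])}\n"
        f"NPCS: {', '.join(loc['npcs'])}\n"
        f"INTERACTABLES: {', '.join(loc['interactables'])}\n"
        f"TRIGGERS: {', '.join(loc['triggers'])}"
    )
```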

Accordion summarization. Every 30 actions, the older portion of the action log compresses into a bullet summary. The full raw log would blow the context window — this keeps recent history sharp while preserving older narrative.
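The accordion can be sketched in a few lines; `SUMMARIZE_EVERY` and the one-bullet compression are stand-ins for whatever WE10 actually does (the journal implies the compression itself would be an LLM summarization call):

```python
SUMMARIZE_EVERY = 30  # window size from the journal; the rest is illustrative

def accordion(action_log: list[str], summary: list[str]) -> tuple[list[str], list[str]]:
    """Keep the most recent actions raw; fold everything older into one bullet."""
    if len(action_log) <= SUMMARIZE_EVERY:
        return action_log, summary
    old, recent = action_log[:-SUMMARIZE_EVERY], action_log[-SUMMARIZE_EVERY:]
    # In WE10 this line would be an LLM summarization; here we just note the range.
    summary = summary + [f"- {len(old)} earlier actions: {old[0]} ... {old[-1]}"]
    return recent, summary
```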

Blocked path memory. Confirmed dead ends are logged per location. The agent won't retry a wall it already found.
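A sketch of that memory; the key shape (tile plus direction, per location) is an assumption, since the journal only says dead ends are logged per location:

```python
from collections import defaultdict

# Hypothetical blocked-path store; structure assumed, not from WE10 source.
blocked: dict[str, set] = defaultdict(set)

def record_dead_end(location: str, tile: tuple, direction: str) -> None:
    """Remember that pressing `direction` from `tile` in `location` hit a wall."""
    blocked[location].add((tile, direction))

def already_tried(location: str, tile: tuple, direction: str) -> bool:
    """Skip a press the agent already knows leads nowhere."""
    return (tile, direction) in blocked[location]
```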
Claude Vision API · OpenCV · game_literacy.json · Accordion Summarization · Blocked Path Memory · NPC Hint Re-injection
// live play discoveries
What the Game Taught Us
Baron Castle was narrated in real time to build the location database. These are the findings that changed how the agent is designed:
Dialog boxes are upper-center, not bottom. The prior design assumed bottom-of-screen placement (JRPG convention). FF4 puts them in the top third. The OpenCV pre-filter was rewritten around this.
Room labels flash on entry. A small text box appears briefly when entering a room. This is the most reliable room identity signal — faster than parsing the background tileset.
Wall switches break tile pattern. Interactive switches are visually out of alignment with surrounding tiles. Press A when the pattern breaks.
Cecil can disappear behind scenery. If the map is still scrolling, he's still moving — keep pressing the direction. Don't mistake occlusion for a wall.
World map: explore all four directions first. Don't commit to a path on first exit. Narrate what's visible in each direction before choosing.
Some NPCs reveal exits after dialog. Talking to certain characters moves them aside, unlocking a previously blocked waypoint. NPC interaction is a navigation tool, not just flavor.
// game_literacy.json
Baron Castle — 10 Locations Mapped
The location database was built entirely from live gameplay narration this session. Each entry encodes exits (with visual descriptors), NPCs, interactables, story triggers, and navigation notes.
Throne Room
1F — Section A
1F — Section B
Baron Exterior East
Baron Exterior West
Gate Area
Cid's Area
NPC Room
Cecil's Bedroom
World Map
// current status
Ready to Run

WE10 is written and waiting. The agent code is complete, Baron Castle interior is fully mapped in game_literacy.json, and all four supporting artifacts are done. The Claude Vision API replaces LLaVA entirely. The OpenCV dialog pre-filter is in place.

Next session: narrate world map exploration, add waypoints and exit descriptors to game_literacy.json, then run WE10 for the first time on real hardware.

// hardware chain
Signal Path — Unchanged
SNES → OSSC 1.8 → Elgato 4K → Python / OpenCV → Claude Vision API → Arduino Micro → SNES Controller Port

First Live Run & A* Pathfinding

// context
It Actually Ran
WE10 hit real hardware for the first time this session. The goal was simple: get the agent running end-to-end, observe what broke, fix it, and add A* pathfinding on top. Three bugs surfaced immediately — all diagnosed and patched mid-session. By the end, Cecil was navigating Baron Castle autonomously on a real SNES.
// bugs fixed
Three Fixes, One Session
Title screen not advancing. The stuck override was replacing A with a directional after just 2 frozen frames — but the title screen is static by design, so no_change_count hit 2 immediately and cancelled every A press before it could register. Fix: raised the override threshold from ≥ 2 to ≥ 5.
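A sketch of the override logic after the fix; the variable names and the unstick direction are assumptions based on the journal's description:

```python
STUCK_THRESHOLD = 5  # was 2: static screens like the title tripped it instantly

def maybe_override(action: str, no_change_count: int) -> str:
    """Swap the chosen action for a directional nudge only after 5 frozen frames."""
    if no_change_count >= STUCK_THRESHOLD:
        return "RIGHT"  # arbitrary unstick direction for this sketch
    return action
```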
A and X buttons swapped in Arduino firmware. The in-game menu kept opening when the agent tried to interact with NPCs — the classic tell for a button mismatch. Arduino had pins 8 and 9 mapped in the wrong order. Fixed directly in the .ino firmware, then added a software safety layer: BUTTON_REMAP = {"A": "X", "X": "A"} in the Python press() function so Claude's logical button names stay correct regardless of firmware quirks.
START leaking through. Claude would occasionally return START despite the prompt explicitly forbidding it. Fix: removed START from VALID_BUTTONS entirely — physically impossible to send now.
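A sketch of that software safety layer; `send` stands in for the actual serial write to the Arduino, and only the `BUTTON_REMAP` table and the START removal come straight from the session notes:

```python
VALID_BUTTONS = {"A", "B", "X", "Y", "UP", "DOWN", "LEFT", "RIGHT"}  # no START
BUTTON_REMAP = {"A": "X", "X": "A"}  # compensate for the swapped firmware pins

def press(send, button: str) -> bool:
    """Filter and remap a logical button name before it reaches the Arduino."""
    button = button.upper()
    if button not in VALID_BUTTONS:  # START is physically impossible to send now
        return False
    send(BUTTON_REMAP.get(button, button))
    return True
```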
Stuck Threshold ≥5 · Arduino Pin Remap · BUTTON_REMAP Safety Layer · START Blocked
// new system
A* Pathfinding Added
Cecil's biggest failure mode was walking directly into walls and retrying endlessly. A* pathfinding was added to WholeEnchilada10.py to give the agent spatial awareness at the tile level — so when Claude picks a direction, the path is actually clear before the button gets pressed.
detect_game_region()
Scans frame for non-black bounding box at startup. Falls back to hardcoded OSSC 4x values if detection fails. Confirmed: 1920×1080, 64px per SNES tile, 16×14 grid.
build_walkability_grid()
Per-tile brightness >60 and Laplacian edge variance <500 = walkable. Dark tiles with high edge density = walls. Cecil's tile (col 8, row 7) always forced walkable.
record_collision() / apply_collision_map()
Screen doesn't change after a directional press → tile recorded as blocked. Persists to game_state.json per location_id across sessions.
astar()
Cecil is always grid center (8,7) — FF4's camera follows him. Claude picks a direction; A* finds a clear path four tiles in that direction, or reroutes to the nearest walkable tile. Logs [PATHFIND] on reroutes.
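A compact version of the idea; the real astar() in WholeEnchilada10.py isn't reproduced in the journal, so treat this as a sketch over a boolean walkability grid with Cecil at the fixed center tile:

```python
import heapq

def astar(grid, start, goal):
    """A* over grid[row][col] booleans; positions are (col, row) tuples."""
    rows, cols = len(grid), len(grid[0])

    def h(p):
        # Manhattan distance: admissible for 4-directional movement
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start, [start])]
    seen = set()
    while open_heap:
        _, g, pos, path = heapq.heappop(open_heap)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        x, y = pos
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < cols and 0 <= ny < rows and grid[ny][nx] and (nx, ny) not in seen:
                heapq.heappush(open_heap,
                               (g + 1 + h((nx, ny)), g + 1, (nx, ny), path + [(nx, ny)]))
    return None  # no route; WE10 would reroute to the nearest walkable tile
```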
// tooling
Live Debug Visualization
An OpenCV overlay window — "WE10 Pathfinding Debug" — renders on top of the live game frame every tick. Color key: green = walkable, red = blocked, yellow = Cecil, blue = A* planned path. Grid lines show tile boundaries. Not yet observed in a live run — that's the next session goal.
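A NumPy-only sketch of the per-tile tinting that overlay performs; the actual window uses cv2 drawing calls with the color key above, and only `TILE = 64` comes from the confirmed capture geometry:

```python
import numpy as np

TILE = 64  # 64 px per SNES tile at OSSC 4x, per the confirmed specs
COLORS = {"walkable": (0, 255, 0), "blocked": (0, 0, 255),
          "cecil": (0, 255, 255), "path": (255, 0, 0)}  # BGR

def tint_tile(frame, col, row, color, alpha=0.35):
    """Blend a translucent color over one tile of the live frame in place."""
    y, x = row * TILE, col * TILE
    region = frame[y:y + TILE, x:x + TILE].astype(float)
    overlay = np.array(color, dtype=float)
    frame[y:y + TILE, x:x + TILE] = ((1 - alpha) * region + alpha * overlay).astype(np.uint8)
```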
// confirmed from live run
What Actually Worked
Agent passed the title screen — A confirmed correct after the threshold fix.
Room label detection working — location context loading correctly from game_literacy.json.
Agent navigated freely inside Baron Castle and attempted NPC interaction.
A/X button confusion identified and fixed mid-session without a full restart.
// current status
Running. Pathfinding Untested.

WE10 runs end-to-end on real hardware. Cecil moves, the vision loop is live, room context loads correctly, and the three launch bugs are patched. Pathfinding code is written but the debug visualization hasn't been observed in a live run yet.

Next session: run with the debug window open, tune walkability thresholds for Baron Castle, verify A* is correctly reading walls vs floors, and watch whether collision learning actually reduces wall-bumping over time.

// hardware chain
Signal Path — Confirmed Specs
SNES → OSSC Kaico 1.8 (4x) → Elgato 4K60 (1920×1080) → Python / OpenCV → Claude Vision API → Arduino Micro (COM5) → SNES Controller Port

The Teacher Awakens

// pivot
A New Architecture
WE10 grew a second brain today. Instead of relying solely on Claude Vision API to figure out FFIV from scratch on real hardware, we introduced a parallel Speedy Teacher Model — a reinforcement learning agent running on emulated hardware that teaches the Vision student how to play. The teacher has ground truth. The student has eyes. Together they close the gap.
// teacher / student
Two Models, One Game
Speedy (Teacher) runs locally on the gaming PC via BizHawk emulator. It reads raw SNES WRAM directly — HP, position, map ID, story flags, terrain adjacency — learns optimal play through self-play at machine speed, and transmits knowledge to Vision via a custom intermediate language called GSL.
Vision (Student) continues running on real 1991 SNES hardware via the existing pipeline. It receives enriched context from Speedy before each decision. Vision still owns screen interpretation. Speedy owns game knowledge.
BizHawk WRAM → GSL Message → SpeedyNet (CUDA) → ACT: Response → BizHawk Joypad
// custom language
GSL — Game State Language
Not built for humans — built for AI. Token-efficient, semantically dense, zero ambiguity. Every 30 frames BizHawk pushes a 4-line GSL block over TCP to Python. Python responds with a single action string. BizHawk executes it.
@S — State
Map ID, plane, overworld flag, X/Y position, facing, vehicle, terrain type, all 4 adjacent tiles.
#N — Navigation
Story progress, map transition flag, movement flag.
%P — Party
All active party members: name, level, HP/MP ratio, status flags.
!F — Flags
Battle, dialog, menu, cutscene — binary game state flags for fast decision branching.
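The journal doesn't show the wire format in detail, so here is a hypothetical `parse_gsl` over an assumed comma/semicolon encoding; the four tags are from the session, everything else is illustrative:

```python
# Hypothetical parser for the 4-line GSL block; field encoding is assumed.
def parse_gsl(text: str) -> dict:
    """Split a GSL payload into its @S / #N / %P / !F sections."""
    sections = {}
    for line in text.strip().splitlines():
        tag, _, body = line.partition(" ")
        if tag == "@S":
            sections["state"] = body.split(",")
        elif tag == "#N":
            sections["nav"] = body.split(",")
        elif tag == "%P":
            sections["party"] = [m.split("/") for m in body.split(";")]
        elif tag == "!F":
            sections["flags"] = {f[1:]: f[0] == "+" for f in body.split(",")}
    return sections
```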
// neural network
SpeedyNet — 52,493 Parameters
A custom policy-gradient network running on RTX 4070 Ti via PyTorch CUDA. Lean but not thin — enough capacity to learn navigation, battle awareness, terrain reading, and story progress simultaneously.
Input (40) → Dense 256 → Dense 128 → Dense 64 → Output (12)
LeakyReLU · Policy + Value Heads · Experience Replay 50K · RTX 4070 Ti CUDA
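The 52,493 figure checks out if the 64-unit layer feeds both the 12-unit policy head and a 1-unit value head; that split is an inference from the stated totals, not from SpeedyNet's source:

```python
# Count dense-layer parameters (weights + biases) for the stated architecture.
def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out

trunk = dense_params(40, 256) + dense_params(256, 128) + dense_params(128, 64)
policy_head = dense_params(64, 12)   # the 12-action output
value_head = dense_params(64, 1)     # inferred: needed to reach the stated total
total = trunk + policy_head + value_head
print(total)  # 52493
```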
// reward model
Custom Reward Table
Designed from scratch — no borrowed assumptions. The reward signal reflects exactly what good FFIV play looks like.
+250
Enter a boss room (one-time)
+100
Enter any new map ID (one-time)
+10
Move toward unvisited exit
+1
Net forward movement
−1
Revisit an already-stepped tile
−5
Standing still for 3+ ticks
−50
Party wipe → auto reset to save state
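The table translates directly into a lookup; event names here are illustrative stand-ins for SpeedyNet's actual identifiers:

```python
# Reward values are from the journal's table; the key names are assumptions.
REWARDS = {
    "boss_room_first_entry": 250,
    "new_map_id": 100,
    "toward_unvisited_exit": 10,
    "net_forward_movement": 1,
    "revisit_tile": -1,
    "standing_still_3_ticks": -5,
    "party_wipe": -50,
}

def score(events: list[str]) -> int:
    """Sum the reward for every event fired this tick."""
    return sum(REWARDS[e] for e in events)
```

Note that `score(["revisit_tile", "standing_still_3_ticks"])` comes out to −6, which matches the reward=−6 training log line from this session.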
// walkthrough rag
355 Sections Indexed
Just like a real player Googling "what do I do next in FFIV" when stuck, Speedy queries a RAG-indexed walkthrough when it detects reward stagnation. 355 sections parsed from General Tips through end-game — location names, party requirements, key items, boss strategies. Read-only. The walkthrough speaks for itself.
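One way to sketch the stagnation trigger; the journal says Speedy queries the walkthrough "when it detects reward stagnation" but doesn't define the rule, so the rolling-window average, window length, and threshold here are all assumptions:

```python
from collections import deque

# Hypothetical stagnation detector over a rolling reward window.
recent_rewards = deque(maxlen=200)

def stagnating(threshold: float = 0.0) -> bool:
    """True once the window is full and average reward has sunk to the threshold."""
    if len(recent_rewards) < recent_rewards.maxlen:
        return False
    return sum(recent_rewards) / len(recent_rewards) <= threshold
```

When this fires, the agent would issue its RAG query against the 355 indexed sections instead of continuing to burn ticks.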
// first training run
It's Training Live
By end of session Speedy was training on the RTX 4070 Ti with Cecil walking around Baron Castle. The loss curve is active, epsilon is decaying, and the reward signal is firing correctly.

[train] step=760 | loss=82.15 | eps=0.684 | reward=−6 | revisit_tile + standing_still

Still deep in random exploration at epsilon 0.68. The reward loop is stuck on revisit and standing still penalties — Speedy hasn't discovered that moving to new tiles feels good yet. Hold frames tuned from 8 → 15 and standing still threshold tightened from 10 → 3 ticks to force more decisive movement per action.

// session screenshot — step 760, ε=0.684, baron castle
Step 760 · ε = 0.684 · 16 Tiles Visited · Weights Saving
// current status
Teacher is Awake. Learning to Walk.

Full pipeline is live: BizHawk → GSL → SpeedyNet (CUDA) → ACT response → joypad injection. The network trains every 10 steps, saves weights every 500, and resets automatically on party wipe.

Next session: wire in BizHawk save state control for autonomous resets, uncap emulator speed for self-play at machine speed, and begin distilling Speedy's learned knowledge into GSL teaching signals for the Vision student.

Only One Ear Was Listening

// wrong assumption
The Architecture Was Backwards
The session started with a broken assumption baked into the architecture. The original setup had Python acting as the TCP client and BizHawk as the server — which is backwards. BizHawk's Lua API exposes comm.socketServerSend() and comm.socketServerResponse(), meaning BizHawk is always the client. It connects out to whatever is listening. Getting BizHawk to open the connection at all required passing both URL flags at launch — get and post independently:
start "BizHawk" EmuHawk.exe --url_get=http://127.0.0.1:9001 --url_post=http://127.0.0.1:9001
// still broken
Empty Strings. Syntax Error. Session Stalls.
Even with the flags, comm.socketServerResponse() kept returning empty strings on every call. A protocol fix was queued in Lua — but a syntax error at line 154 in speedy_sender.lua blocked testing before it could be validated. The raw TCP socket approach was fighting us at every layer: timing races, two-port juggling, response polling that never quite landed. It was the wrong tool for what we were actually doing.
// the real fix
Throw Out the Sockets. Use HTTP.
The breakthrough was stepping back and reading the BizHawk Lua API properly. comm.httpPost(url, body) exists. It's synchronous. BizHawk POSTs the GSL payload, blocks until Python responds, gets the action back in the response body. No socket timing. No polling loop. No two-port juggling. Flask on port 9001 listening at /act — that's the whole server.
// speedy_sender.lua
local URL = "http://127.0.0.1:9001/act"
-- synchronous: BizHawk blocks until Python responds
local ok, response = pcall(comm.httpPost, URL, gsl)
// speedy.py
from flask import Flask, request

app = Flask(__name__)

@app.route("/act", methods=["POST"])
def act():
    state = parse_gsl(request.get_data(as_text=True))
    return process_state(state), 200

app.run(port=9001)
BizHawk doesn't need --socket_ip or --socket_port launch flags for this. Load the ROM, run the script. comm.httpPost handles the rest.
Flask is a lightweight Python web framework — it lets you spin up an HTTP server in a handful of lines, mapping URL routes to Python functions. Here it meant we could replace the entire custom socket protocol with a single decorated function: BizHawk POSTs to /act, Flask calls process_state(), returns the action string. No handshake logic, no buffer management, no read/write timing to get right. HTTP is a solved problem. Flask just exposes it. That's exactly why it helped — we stopped writing plumbing and got back to writing the actual model.
// it works
Cecil Is Moving. The Model Is Learning.
With speedy.py and speedy_sender.lua running against each other over HTTP, the loop closed. BizHawk reads WRAM every 30 frames, builds a GSL payload encoding position, terrain, party state, and story flags, POSTs it to Flask, gets an ACT:DIRECTION back, holds the button for 30 frames, repeats. SpeedyNet is training. Epsilon decays from 1.0 as the replay buffer fills. Loss is stabilizing. Cecil is on screen and moving under model control.

Architecture: HTTP POST/response — fully synchronous, no race conditions. ✓

Training: SpeedyNet live — replay buffer filling, loss stabilizing, epsilon decaying. ✓

Cecil: moving on screen under model control. ✓

Next session: wire in save state control for autonomous resets, uncap emulator speed for self-play at machine speed.

// proof
Cecil On Screen
This is what it looks like when it works. SpeedyNet pushing inputs, BizHawk executing them, Cecil moving. First live run under model control.
// next entry
SPEEDY — SAVE STATE CONTROL & UNCAPPED SPEED