James Thomas // Dev

A Journey Toward Paranoid Physical Agents

The "Simple" Problem

One of my favorite things in engineering is the "simple problem that will be solved before dinner." These are the problems that end up becoming multi-year journeys. Every time I've fallen into one of these rabbit holes, it has forced me to confront reality at a fundamental level I hadn't considered before.

Years ago, I set out to solve one of these: build a simple 4-wheeled platform that could use a single camera to navigate from a charging station, roam the house to take pictures, and return to base autonomously. This was 12 years ago, long before you could just buy a vacuum that does this for $300. I wanted a visual report of my home while I was away.

Naturally, I chose to build rather than buy.

Step 1: Gridify

For simple navigation automation, this is always my first instinct. I created a virtual floor plan made of one-square-foot tiles: just enough room for the platform to turn, with a little margin for error.
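The gridify step can be sketched as a 2D occupancy grid of one-foot tiles. This is a minimal illustration, not my original code; the grid dimensions and the wall layout (a wall with a single doorway) are invented for the example.

```python
# A minimal "gridify" sketch: the floor plan becomes a 2D occupancy grid
# of one-square-foot tiles. Dimensions and walls here are illustrative.

FREE, WALL = 0, 1

def make_grid(width_ft, height_ft, walls):
    """Build a grid of 1-ft tiles; `walls` is a set of (row, col) tiles."""
    return [[WALL if (r, c) in walls else FREE for c in range(width_ft)]
            for r in range(height_ft)]

def neighbors(grid, r, c):
    """Free tiles reachable in one step (no diagonals)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == FREE:
            yield nr, nc

# A 12x8-ft room with a wall across row 3, open only at column 5 (the door).
grid = make_grid(12, 8, walls={(3, c) for c in range(12) if c != 5})
print(list(neighbors(grid, 3, 5)))  # only the tiles through the doorway
```

Once the world is tiles and neighbors, path planning reduces to graph search over this structure.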

Step 2: Plot a Path

Finding an optimal path using Dijkstra’s Algorithm or A* is nothing new. Since my problem space was small, I plotted the route using logic not much different from the Logo Turtle Graphics that I was taught as a kid:

  1. Forward 5 (from base)
  2. Turn left 90°
  3. Forward 10 (through the door)
  4. Turn right 90°
  5. Forward 10
  6. Take picture
  7. (Reverse actions to return to base)
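The step list above amounts to dead reckoning, and a short sketch makes the flaw visible: pose updates are pure bookkeeping, with no feedback from the world. This is a hypothetical reconstruction, not the original rover code.

```python
# Dead-reckoning execution of a turtle-style plan. Note the optimistic
# assumption baked in: issuing a command is treated as achieving it.

import math

def run_plan(commands, pose=(0.0, 0.0, 90.0)):
    """Execute (action, amount) commands; pose is (x, y, heading_deg)."""
    x, y, heading = pose
    for action, amount in commands:
        if action == "forward":
            x += amount * math.cos(math.radians(heading))
            y += amount * math.sin(math.radians(heading))
        elif action == "turn":  # positive = left, negative = right
            heading = (heading + amount) % 360
    return x, y, heading

plan = [("forward", 5), ("turn", 90), ("forward", 10),
        ("turn", -90), ("forward", 10)]
print(run_plan(plan))  # where the rover *believes* it is
```

The returned pose is where the rover thinks it ended up. On carpet, with wheel slip and imperfect turns, the real pose diverges a little more with every step, and nothing in this loop can notice.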

Step 3: Fail Every Time

The rover never made it back to the base. Sometimes it didn't even make it through the first door. I couldn't get it to traverse a single room, let alone the whole house. By the time I gave up, it was well past dinner.

I ended up using nursery cameras I found at a Goodwill to monitor the house. It worked, but it wasn't the point. I threw the autonomous problem onto the "pile of things to think about" and let it sit for a decade.


The Non-Simple Problem: Survival is Paranoid

Looking back, my original approach failed because it was optimistic.

  1. It assumed a command was the same as a result.
  2. It had zero observability (it didn't know where it actually was).
  3. It had no way to adapt when it inevitably drifted off course.

It reminds me of a book I was given when I started at Intel: Only the Paranoid Survive by Andy Grove. While Grove was talking about business strategy and "Strategic Inflection Points," the message is a universal law of nature. In a chaotic world, a rigid plan is a death sentence.

Side note: Between 2007 and 2024, it seems Intel itself forgot about that book.

Lessons from Mars: Visual Odometry

NASA’s Mars rovers don't just trust their wheels. The Curiosity rover famously has a specific pattern of holes in its metallic treads. As it drives, it leaves an imprint in the Martian sand that spells out "JPL" in Morse code. By looking back at those tracks with its cameras, the rover can see exactly how much it slipped.

This is Visual Odometry. It turns a trust-based system into a verification-based system.

This makes sense when we think about how we move, or how an animal moves. In nature, proprioception tells an animal where its limbs are; combined with external stimuli like vision, it closes the loop. An animal isn't just "moving its legs"; it is constantly checking its environment to see if those leg movements actually resulted in progress. In a way, we are all paranoid to some degree every time we move through the world.
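The verification idea can be reduced to a toy sketch: compare how far the wheels claim to have gone with how far a landmark (like those tread marks) appears to have moved, and trust the observation. The numbers are invented; a real system would estimate motion from tracked camera features.

```python
# A toy visual-odometry correction. `commanded_dist` is what the wheel
# encoders report; `observed_dist` is what the camera actually measured.

def slip_ratio(commanded_dist, observed_dist):
    """Fraction of commanded motion lost to slip."""
    return 1.0 - observed_dist / commanded_dist

def corrected_position(position, observed_dist):
    """Update the pose from what was seen, not what was commanded."""
    return position + observed_dist

pos = corrected_position(0.0, observed_dist=0.8)  # commanded 1.0 m
print(pos, slip_ratio(1.0, 0.8))  # moved 0.8 m; roughly 20% slip
```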

Adaptability: Policy over Plan

The final piece of the puzzle clicked for me recently while taking a Master's course with a beautifully long name: Probability, Search, and Reasoning Under Uncertainty.

The shift in thinking was moving from Plans to Policies.

In nature, a squirrel doesn't have a frame-by-frame "plan" to get to a nut. If a dog barked or a branch broke, a rigid plan would fail. Instead, the squirrel has a Policy: a set of rules that tells it the best move to make from any state it finds itself in. If it gets "kidnapped" and moved five feet to the left, it doesn't freeze; it just looks at its new state and follows the policy.

In AI, we implement this using a Markov Decision Process (MDP). We assign a "Value" to every single tile on the floor.

By iterating over the space using the Bellman Equation, we create a "gradient" of value:

\(V(s) = \max_{a} \left( R(s,a) + \gamma \sum_{s'} P(s'|s,a) V(s') \right)\)

The agent no longer needs a line to follow. It simply looks at the tiles around it and moves toward the one with the highest value. If a gust of wind (or my son) pushes it off course, it doesn't matter. It just looks down, asks "Where am I now?", and takes the next best step.
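Here is a minimal value-iteration sketch following the Bellman equation above. I've made movement deterministic (P(s'|s,a) = 1) so the sum collapses to a single term; the grid size, step cost, and gamma are illustrative choices, not tuned values.

```python
# Value iteration on a small grid, per the Bellman equation. Deterministic
# moves collapse the transition sum; goal reward, step cost, and gamma
# are arbitrary illustrative numbers.

def value_iteration(rows, cols, goal, gamma=0.9, step_reward=-0.04, iters=100):
    V = {(r, c): 0.0 for r in range(rows) for c in range(cols)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(iters):
        for s in V:
            if s == goal:
                V[s] = 1.0  # fixed reward for the goal tile
                continue
            r, c = s
            V[s] = max(
                step_reward + gamma * V[(r + dr, c + dc)]
                for dr, dc in moves
                if (r + dr, c + dc) in V
            )
    return V

def policy_step(V, state):
    """From ANY state, just step toward the neighbor of highest value."""
    r, c = state
    options = [(r + dr, c + dc) for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
               if (r + dr, c + dc) in V]
    return max(options, key=lambda t: V[t])

V = value_iteration(4, 4, goal=(3, 3))
print(policy_step(V, (0, 0)))  # the best next tile, even if "kidnapped" here
```

Note that `policy_step` needs no memory of a route: drop the agent anywhere on the grid and it simply climbs the value gradient toward the goal, which is exactly the squirrel's policy.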


Conclusion: The Journey to Autonomy

Autonomy is hard because reality is messy. Transitioning from a wheeled platform on a flat tile floor to a quadruped in the woods is the jump from "predictable" to "chaotic."

This is the core of what I'm exploring at Jaybird Labs. I'm moving my personal area of research into the "Reality Gap": the space between a perfect simulation and a physical agent that has to survive the real world.

Explore the Lab

I’ve started adding simulation experiments to my portfolio. The first one models the very thing I just discussed in a slightly different form: a drone trying to navigate a path using a rigid A* plan versus a resilient MDP policy in a windy environment.

You can run the simulation yourself here: Jaybird Labs: A* vs. MDP Experiment

Building paranoid agents is the only way to ensure they survive. Thanks for joining me on the journey.