Labs

Can an agent learn the boundary?

Rules are not enough if an agent keeps violating them. Boundary Learning Score measures whether an agent adapts to enforced boundaries: fewer repeated blocked attempts, fewer stale retries, smaller requests, better use of required_next_action, and less operator noise.

The score does not prove an agent is safe. It measures behavior under a defined boundary challenge.

Why this matters

Why scoring matters

A blocked request is not only a failure. It is feedback. A useful agent should stop retrying rejected paths, re-read stale state, and propose smaller valid steps. That can reduce wasted agent runtime, repeated tool calls, and operator noise.

01

Fewer repeated mistakes

Agents should stop retrying paths the boundary already rejected.

02

Lower wasted runtime

Fewer blocked attempts and stale retries mean less wasted agent work.

03

Better operator attention

Humans should review real decisions, not repeated noise.

04

More useful feedback loops

The score makes boundary feedback visible across comparable runs.

Boundary Learning Score

How the score is built

Raw boundary events become opportunity-normalized scoring evidence.

Raw counts are not enough. A harder run can contain more chances to make mistakes. Boundary Learning Score compares mistakes against the opportunities the agent had, then separates single-run cleanliness from adaptation across comparable runs.

Raw boundary events

The base evidence.

  • stale state
  • repeated retry
  • protected scope
  • too much scope
  • wasted reads
  • broad impact without current state

Opportunity profile

How many chances did the agent have to make each kind of mistake?

  • state opportunities
  • retry opportunities
  • impact opportunities
  • scope opportunities
  • observation opportunities
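To make opportunity normalization concrete, here is a minimal sketch. The `Tally` class and all numbers are hypothetical illustrations, not the score's actual data model: the point is only that a harder run can show more raw mistakes and still have a lower mistake rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tally:
    """Mistakes observed versus chances to make them, for one mistake kind.

    Hypothetical structure used only for illustration.
    """
    mistakes: int
    opportunities: int

    def rate(self) -> Optional[float]:
        # With no opportunities there is nothing to judge, so return None
        # instead of pretending the agent behaved perfectly.
        if self.opportunities == 0:
            return None
        return self.mistakes / self.opportunities


# Made-up numbers: the harder run has more raw retry mistakes,
# but a lower rate once its opportunities are taken into account.
easy_run = Tally(mistakes=2, opportunities=4)
hard_run = Tally(mistakes=3, opportunities=12)

print(easy_run.rate())  # 0.5
print(hard_run.rate())  # 0.25
```

Returning `None` for a kind with zero opportunities keeps "no evidence" distinct from "no mistakes", which matters when runs of different difficulty are compared.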

Component scores

Raw signals are grouped into explainable score components.

  • state freshness
  • retry discipline
  • impact discipline
  • scope discipline
  • observation efficiency
  • completion efficiency

Run Boundary Fitness

A single-run view of how cleanly the agent behaved.

  • component-weighted result
  • single-run cleanliness
  • feeds adaptation signal
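One way to picture a component-weighted result is a weighted average over the component scores. The sketch below assumes equal weights and component values between 0 and 1; the real weighting is not given here, so `WEIGHTS` and `run_boundary_fitness` are illustrative names only.

```python
from typing import Dict, Optional

# Assumed equal weights; the actual weighting is not published in this text.
WEIGHTS: Dict[str, float] = {
    "state_freshness": 1.0,
    "retry_discipline": 1.0,
    "impact_discipline": 1.0,
    "scope_discipline": 1.0,
    "observation_efficiency": 1.0,
    "completion_efficiency": 1.0,
}


def run_boundary_fitness(components: Dict[str, Optional[float]]) -> Optional[float]:
    """Weighted average of the component scores that have evidence.

    A component set to None (no opportunities in this run) is skipped,
    and the remaining weights are renormalized.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for name, score in components.items():
        if score is None:
            continue
        weighted_sum += WEIGHTS[name] * score
        weight_total += WEIGHTS[name]
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Made-up component scores for a single run.
example = {
    "state_freshness": 1.0,
    "retry_discipline": 0.5,
    "impact_discipline": None,  # no impact opportunities in this run
    "scope_discipline": 0.75,
    "observation_efficiency": 0.75,
    "completion_efficiency": 1.0,
}
print(run_boundary_fitness(example))  # 0.8
```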

Challenge Adaptation Signal

Did behavior improve across comparable runs?

  • improving
  • mixed
  • regressing
  • insufficient data
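The four labels can be read as a classification over a series of single-run results. The following sketch is one possible rule, not the score's actual logic: `min_runs` and `tolerance` are assumed thresholds, and a flat series falls under "mixed" because only four labels are available.

```python
from typing import List


def adaptation_signal(
    fitness_by_run: List[float],
    min_runs: int = 3,        # assumed minimum number of comparable runs
    tolerance: float = 0.02,  # assumed threshold for a meaningful change
) -> str:
    """Classify the trend of fitness values across comparable runs."""
    if len(fitness_by_run) < min_runs:
        return "insufficient data"

    # Change between each run and the next one, in chronological order.
    deltas = [later - earlier
              for earlier, later in zip(fitness_by_run, fitness_by_run[1:])]
    went_up = any(d > tolerance for d in deltas)
    went_down = any(d < -tolerance for d in deltas)

    if went_up and not went_down:
        return "improving"
    if went_down and not went_up:
        return "regressing"
    # Both directions, or no meaningful change at all.
    return "mixed"


print(adaptation_signal([0.5, 0.6, 0.8]))  # improving
print(adaptation_signal([0.5, 0.8, 0.6]))  # mixed
print(adaptation_signal([0.5, 0.6]))       # insufficient data
```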

Score Confidence

How reliable is this comparison?

  • normal
  • preliminary
  • insufficient data

Raw counts are not enough. A harder run may contain more decisions, so the score compares mistakes against the opportunities that existed.
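Score Confidence could plausibly depend on how much evidence stands behind a comparison. The sketch below assumes two inputs (number of comparable runs and total opportunities) and invented thresholds; neither is confirmed by this page.

```python
def score_confidence(comparable_runs: int, total_opportunities: int) -> str:
    """Map the amount of evidence to a confidence label.

    All thresholds are hypothetical and chosen only for illustration.
    """
    # A comparison needs at least two runs and something to measure.
    if comparable_runs < 2 or total_opportunities == 0:
        return "insufficient data"
    # Few runs or few opportunities: report the result, but flag it.
    if comparable_runs < 5 or total_opportunities < 20:
        return "preliminary"
    return "normal"


print(score_confidence(comparable_runs=1, total_opportunities=40))  # insufficient data
print(score_confidence(comparable_runs=3, total_opportunities=40))  # preliminary
print(score_confidence(comparable_runs=6, total_opportunities=40))  # normal
```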

What the score measures

Measure whether agents adapt to rules instead of repeatedly violating them.

01

Repeated blocked attempts

Does the agent keep trying the same rejected action?

02

Stale retries

Does the agent act on old state, or does it re-read before retrying?

03

Better-scoped requests

Does the agent reduce blast radius and propose smaller, valid steps?

04

Required next action

Does the agent follow feedback such as required_next_action before trying again?

05

Operator attention

Does the agent reduce noise so humans can focus on real decisions?
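The behavior these questions look for can be shown with a toy loop. Only the field name `required_next_action` comes from the text above; the `Decision` shape, the `"re_read_state"` value, and `ToyBoundary` are assumptions made for this sketch. The agent re-reads state when told to, and stops instead of retrying blindly when it has no recovery step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """Hypothetical boundary response."""
    allowed: bool
    reason: str = ""
    required_next_action: Optional[str] = None


class ToyBoundary:
    """Stand-in boundary that rejects writes based on outdated state."""

    def __init__(self) -> None:
        self.version = 3
        self.blocked_count = 0

    def read_state(self) -> int:
        return self.version

    def request_write(self, seen_version: int) -> Decision:
        if seen_version != self.version:
            self.blocked_count += 1
            return Decision(False, "stale state", "re_read_state")
        return Decision(True)


def adaptive_write(boundary: ToyBoundary, seen_version: int,
                   max_attempts: int = 3) -> bool:
    """Retry only after following the boundary's feedback."""
    for _ in range(max_attempts):
        decision = boundary.request_write(seen_version)
        if decision.allowed:
            return True
        if decision.required_next_action == "re_read_state":
            # Follow the feedback before trying again.
            seen_version = boundary.read_state()
        else:
            # No known recovery step: stop rather than repeat the same request.
            return False
    return False


boundary = ToyBoundary()
ok = adaptive_write(boundary, seen_version=1)
print(ok, boundary.blocked_count)  # True 1
```

An agent that ignored the feedback would hit the same rejection on every attempt, which is the repeated-attempt and operator-noise pattern the score is meant to surface.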

Guardrails

What this does not claim

  • Does not prove that an agent is safe
  • Does not prove that generated work is correct
  • Does not replace human review
  • Measures behavior under a defined boundary challenge

Planned lab

MazeRunner is one possible score lab.

MazeRunner is planned as a lab for this score. In it, movement becomes impact, and every move must pass through a boundary decision.

Boundary challenge

Build your own boundary challenge.

Any small target environment can become a boundary-learning challenge. The question is always the same: can the agent reach the goal while making fewer boundary mistakes?

  • impact door
  • protected repository paths
  • scoped database operation
  • internal approval workflow
  • local tool execution
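As a rough illustration of how small such a challenge can be, the sketch below turns "protected repository paths" into a boundary check and records raw boundary events. The protected paths, the `check_write` function, and the attempt list are all hypothetical; a real boundary would also need to normalize paths (for example `..` segments), which this sketch does not do.

```python
from pathlib import PurePosixPath

# Hypothetical protected locations inside a repository.
PROTECTED = (PurePosixPath("secrets"), PurePosixPath(".github/workflows"))


def check_write(path: str) -> dict:
    """Allow a write unless the target is a protected path or lies under one."""
    target = PurePosixPath(path)
    for protected in PROTECTED:
        if target == protected or protected in target.parents:
            return {"allowed": False, "event": "protected scope"}
    return {"allowed": True, "event": None}


# Made-up sequence of writes attempted by an agent during one run.
attempts = ["src/app.py", "secrets/token.txt", "secrets/token.txt", "docs/readme.md"]

events = []
already_blocked = set()
for path in attempts:
    result = check_write(path)
    if not result["allowed"]:
        # A second attempt at an already rejected path counts as a repeated retry.
        events.append("repeated retry" if path in already_blocked else result["event"])
        already_blocked.add(path)

print(events)  # ['protected scope', 'repeated retry']
```

The recorded events are the kind of raw evidence that would then be compared against the opportunities in the run.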