Spend governor for AI agents

Stop your agent before it burns the budget.

An AI agent runs in a loop, deciding its own next step. Nothing in that loop forces it to stop: it can run longer, cost more, or repeat itself indefinitely. leash sits in front of the agent's model calls and enforces a hard stop at the limits you set. The accounting survives a crash, so a restart can't reset the budget.

$ go install github.com/sylvester-francis/leash/cmd/leash@latest

one binary // no code change in the simplest mode // Go 1.25+, no C toolchain

leash - demo run, $0.10 budget

The problem

Loops don't know when to quit.

Three gaps leash is built to close. If you've watched a token bill climb or a job run longer than you meant, you've met at least one of them.

Runaway cost & time

A step is not a flat price

Each turn can call a bigger model, send a longer context, or spawn more work. Cost per step is not constant, and a stuck agent can keep paying to get nowhere.

Not durable

"Max steps" resets on restart

A framework's step limit lives in memory. Crash the process and start it again and the counter is back to zero - the budget silently reopens.

Too coarse, too late

Provider caps are per-key

A spending cap on an API key is a monthly total across everything, reported after the fact. It can't stop one runaway run in the moment.

What it enforces

Six boundaries, one fixed order.

You set the limits; leash checks them before every call and refuses the next one the moment the first boundary trips. A zero value turns a boundary off; the kill switch is always on.

kill

leash kill <run>

You stop it by hand, from anywhere.

deadline

--deadline 15m

Wall-clock time since the first call.

cost

--max-cost 5

Token + compute dollars.

calls

--max-calls 100

Number of model calls made.

rate

--rate 100000/1m

Tokens in a trailing window.

stall

--stall 4

Same reply N times in a row.
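
The six boundaries above can be read as one ordered check that runs before each call. A minimal Python sketch of that idea, not leash's own code: the `Limits` and `RunState` names and fields are hypothetical, the order is assumed to be the one listed above, and every reason string except `cost_budget` (which appears in the sample output further down) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    # Hypothetical field names; 0 means "boundary off".
    deadline_s: float = 0
    max_cost: float = 0
    max_calls: int = 0
    rate_tokens: int = 0
    stall: int = 0


@dataclass
class RunState:
    killed: bool = False
    elapsed_s: float = 0.0   # wall-clock time since the first call
    cost: float = 0.0        # dollars spent so far
    calls: int = 0           # model calls made so far
    window_tokens: int = 0   # tokens in the trailing window
    repeats: int = 0         # identical replies in a row


def first_tripped(lim: Limits, st: RunState) -> Optional[str]:
    """Return the first tripped boundary in the assumed order, else None."""
    checks = [
        ("kill", st.killed),  # always on: there is no limit to zero out
        ("deadline", lim.deadline_s > 0 and st.elapsed_s >= lim.deadline_s),
        ("cost_budget", lim.max_cost > 0 and st.cost >= lim.max_cost),
        ("calls", lim.max_calls > 0 and st.calls >= lim.max_calls),
        ("rate", lim.rate_tokens > 0 and st.window_tokens >= lim.rate_tokens),
        ("stall", lim.stall > 0 and st.repeats >= lim.stall),
    ]
    for reason, tripped in checks:
        if tripped:
            return reason
    return None
```

Because the order is fixed, a run that has tripped two boundaries at once always reports the same one.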

The refusal is an ordinary HTTP 429 with a machine-readable reason, so the agent's own loop ends because its next call fails. Token counts come only from what the provider reports on the wire - never estimated.
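
On the agent side, that means the loop only has to treat a 429 as final. A minimal sketch, assuming your agent makes its model calls through one function that returns a status and a body; `run_agent` and `call_model` are hypothetical names, and the shape of the reason in the body is not specified here, so the sketch just prints it.

```python
from typing import Callable, List, Tuple

# A "call" is any function that returns (http_status, body_text).
ModelCall = Callable[[str], Tuple[int, str]]


def run_agent(call_model: ModelCall, prompt: str, max_turns: int = 1000) -> List[str]:
    """Loop until the governor refuses with 429, then stop instead of retrying."""
    replies: List[str] = []
    for _ in range(max_turns):
        status, body = call_model(prompt)
        if status == 429:
            # The body carries the machine-readable reason.
            print("stopped by governor:", body)
            break
        replies.append(body)
        prompt = body  # feed the reply into the next turn
    return replies
```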

How to use it

Three front doors, one engine.

Same governor behind each. Pick the one that fits how your agent runs.

1 - Wrap

Put leash before your command

Zero code change. leash launches your program as a child and points its SDK's base URL at an embedded proxy, so every model call is governed.

$ leash --max-cost 5 -- python my_agent.py

2 - Serve

Run it as a gateway

A standalone proxy for any language, CI, or a shared team endpoint. Point a client's base_url at leash and tag each run with a header. Auth is on by default, and each run is scoped to its credential.

$ export LEASH_AUTH_TOKEN=$(leash gen-token)
$ leash serve --listen :8088 --max-cost 20

3 - Inspect & control

See and stop runs

The ledger is readable from any terminal. List active runs, read one run's history, or stop a runaway; a kill takes effect on that run's next call.

$ leash ps
$ leash kill nightly-batch-7

Examples

Copy, paste, and watch it stop.

Concrete recipes for the common cases. Each shows the command and, in one line, what happens.

No key, no spend

The 60-second demo

A std-lib fake provider plus a curl loop under a 10-cent budget. Nothing real is billed.

# fake provider + your own prices
$ go run ./examples/fakeupstream &
$ echo '{"demo-model":{"input":10,"output":30}}' > prices.json
$ leash --max-cost 0.10 --prices prices.json \
      --upstream http://127.0.0.1:9099 -- ./agent.sh
call 1 -> 200   call 4 -> 200
call 5 -> 429   ... stays stopped

->Each call is 2.5c, so the budget trips on the fifth.
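
The 2.5c figure can be checked by hand. A sketch under two assumptions that are not stated above: the numbers in prices.json are dollars per million tokens, and each fake reply reports 1,000 input and 500 output tokens. Both are chosen because they reproduce 2.5c per call. Amounts are kept as integer millionths of a dollar so the sums stay exact.

```python
PRICES = {"demo-model": {"input": 10, "output": 30}}  # same as prices.json
BUDGET_MICRO = 100_000  # $0.10 in millionths of a dollar


def call_cost_micro(model: str, input_tokens: int, output_tokens: int) -> int:
    """Cost in micro-dollars, assuming prices are USD per million tokens."""
    p = PRICES[model]
    # tokens * (dollars per 1e6 tokens) is already a count of micro-dollars
    return input_tokens * p["input"] + output_tokens * p["output"]


def statuses(n_calls: int) -> list:
    """HTTP status of each call when the budget is checked before the call."""
    spent, out = 0, []
    for _ in range(n_calls):
        if spent >= BUDGET_MICRO:
            out.append(429)
            continue
        out.append(200)
        spent += call_cost_micro("demo-model", 1000, 500)  # assumed usage
    return out


print(call_cost_micro("demo-model", 1000, 500) / 10_000, "cents")  # 2.5 cents
print(statuses(6))  # [200, 200, 200, 200, 429, 429]
```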

Tier 1 - wrap

Guard a Python agent

A dollar budget and a wall-clock deadline around any script. No SDK to adopt.

$ leash --max-cost 5 --deadline 15m \
      --prices prices.json -- python agent.py
... agent runs, every call metered ...
leash: stopped run a3f9 after 18 calls,
  $4.10 tokens + $0.91 compute = $5.01 (cost_budget)
$ echo $?
3

->On a boundary stop, leash exits with code 3, so a script can tell it apart from an ordinary failure.
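
A scheduler or CI step can branch on that exit code. A small sketch, assuming 0 means success and anything other than 0 or 3 is an ordinary failure; `classify` and `run_guarded` are hypothetical helper names.

```python
import subprocess

BOUNDARY_STOP = 3  # the exit code described above


def classify(returncode: int) -> str:
    if returncode == 0:
        return "ok"
    if returncode == BOUNDARY_STOP:
        return "budget-stop"
    return "failure"


def run_guarded(cmd: list) -> str:
    """Run a leash-wrapped command and classify how it ended."""
    return classify(subprocess.run(cmd).returncode)


# Usage, with the command from the example above:
# run_guarded(["leash", "--max-cost", "5", "--deadline", "15m",
#              "--prices", "prices.json", "--", "python", "agent.py"])
```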

Tier 1 - wrap

Cap a loop with no cost meter

When you have no prices, lean on calls, token rate, and repetition instead.

$ leash --max-calls 500 \
      --rate 200000/1m \
      --stall 4 -- ./agent.sh

->Stops at 500 calls, on a token-rate spike, or after 4 identical replies in a row.
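
For intuition, here is a toy model of the two less obvious boundaries. It is a sketch, not leash's code: it assumes `200000/1m` means 200,000 tokens in the trailing 60 seconds and that replies are compared by exact equality, and the class names are hypothetical.

```python
from collections import deque


class TrailingWindow:
    """Sum of the token counts seen in the last `window_s` seconds."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.events = deque()  # (timestamp, tokens), oldest first
        self.total = 0

    def add(self, now: float, tokens: int) -> int:
        self.events.append((now, tokens))
        self.total += tokens
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old = self.events.popleft()
            self.total -= old
        return self.total


class StallCounter:
    """Counts how many times in a row the same reply has been seen."""

    def __init__(self):
        self.last = None
        self.run = 0

    def add(self, reply: str) -> int:
        self.run = self.run + 1 if reply == self.last else 1
        self.last = reply
        return self.run
```

With `--stall 4`, the boundary would trip when `StallCounter.add` returns 4.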

Tier 2 - serve

Shared gateway, per-run budgets

One proxy in front of many agents. Each run carries its own X-Loop-Id and its own budget.

$ export LEASH_AUTH_TOKEN=$(leash gen-token)
$ leash serve --listen :8088 --max-cost 20 \
      --prices prices.json --require-run-id

# the client, any language:
$ curl http://localhost:8088/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "X-Leash-Token: $LEASH_AUTH_TOKEN" \
    -H "X-Loop-Id: nightly-batch-7" \
    -d '{"model":"gpt-4o", ...}'

->Different X-Loop-Id values get independent budgets, stops, and kills.
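
The same request from Python's standard library, as a sketch. `build_request` is a hypothetical helper; the URL and header names are the ones in the curl call above, and the request body is whatever your provider expects.

```python
import json
import os
import urllib.request


def build_request(run_id: str, payload: dict,
                  gateway: str = "http://localhost:8088") -> urllib.request.Request:
    """Build (but do not send) a chat request routed through the gateway."""
    return urllib.request.Request(
        gateway + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ.get("OPENAI_API_KEY", ""),
            "X-Leash-Token": os.environ.get("LEASH_AUTH_TOKEN", ""),
            "X-Loop-Id": run_id,  # one budget per distinct value
        },
    )
```

Send it with `urllib.request.urlopen(req)`. A refusal surfaces as `urllib.error.HTTPError` with `code == 429`.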

Tier 2 - serve

Point an SDK at it

leash speaks the OpenAI, Anthropic, and Gemini wire formats, so only the base URL changes. It keys on the format, not the model name, so any OpenAI-compatible endpoint works too, including Gemini, Ollama, and OpenRouter, and a new model needs no update.

# Python, OpenAI SDK
client = OpenAI(
    base_url="http://localhost:8088/v1",
    api_key="sk-...",  # forwarded untouched
    default_headers={"X-Loop-Id": "run-42"},
)

->Your key is forwarded upstream untouched and never logged or stored.

Resume

Pick a budget back up

Name a run and reuse the name against the same database to continue its account.

$ leash --run nightly --db ./run.db \
      --max-cost 5 -- python agent.py
# later, same name + db: the budget
# continues where it left off, even
# after a crash. It never re-spends.

->Totals are rebuilt from a durable journal, so a restart can't double-count.
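
Why a journal can't double-count, as a sketch. This is an illustration of the idea with SQLite, not leash's actual schema: the table, columns, and function names are hypothetical. Each entry has a unique key, so replaying it after a crash is a no-op, and the total is always recomputed from what is on disk.

```python
import sqlite3


def open_journal(path: str) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS journal ("
        " run TEXT, seq INTEGER, micro_usd INTEGER,"
        " PRIMARY KEY (run, seq))"
    )
    return db


def record(db: sqlite3.Connection, run: str, seq: int, micro_usd: int) -> None:
    # The (run, seq) key turns a replayed entry into a no-op.
    with db:  # commits on success
        db.execute("INSERT OR IGNORE INTO journal VALUES (?, ?, ?)",
                   (run, seq, micro_usd))


def total(db: sqlite3.Connection, run: str) -> int:
    # Recomputed from the journal every time, never held only in memory.
    row = db.execute(
        "SELECT COALESCE(SUM(micro_usd), 0) FROM journal WHERE run = ?",
        (run,),
    ).fetchone()
    return row[0]
```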

Docker

Run the gateway in a container

A small distroless image that runs as a non-root user, with the ledger on a volume.

$ make docker              # distroless, nonroot
$ docker run -p 8088:8088 -v leash-data:/data \
    leash:dev serve --db /data/leash.db --max-cost 20

# or the no-key demo, gateway + fake upstream:
$ docker compose up --build

->Add --admin :9090 for /healthz, /readyz, and Prometheus /metrics.

Observe

Read the ledger, anytime

Every run's account is on disk. Inspect it live, in a table or as JSON.

$ leash ps
RUN              CALLS  TOTAL$  STATUS   REASON
nightly-batch-7  18     5.01    stopped  cost_budget
api-eval-3       6      1.20    running

$ leash inspect nightly-batch-7 --json

->The ledger holds usage numbers and timestamps only - never bodies or secrets.
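
To use the table in a script, a small parser is enough. A sketch that assumes the column layout shown above and run names without spaces; `parse_ps` is a hypothetical helper. The `--json` output is probably the better source for scripts, but its fields are not shown here, so this sticks to the table.

```python
def parse_ps(text: str) -> list:
    """Parse the whitespace-aligned table printed by `leash ps` into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        cells = ln.split()
        # REASON is blank while a run is still going, so pad short rows.
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows
```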

React

Escalate durably when it stops

Turn a stop or a budget warning into a webhook and a command hook that survive a crash and retry until they land.

$ leash serve --max-cost 20 \
      --webhook https://hooks.example/leash \
      --reactions-db ./reactions.db \
      --on-event-exec ./on-event.sh
# off the hot path, retried, resumed after a restart:
#   notify-webhook -> run-command-hook
# event data arrives in LEASH_* env vars

->Delivery is at-least-once and survives crashes. leash ships no connectors; the command hook is how you reach your own.
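
A command hook can be any executable. A Python sketch of one: the individual LEASH_* variable names are not listed above, so it collects every variable with that prefix rather than guessing, and it leaves out LEASH_AUTH_TOKEN so the gateway token never lands in a log. `leash_event` and `handle` are hypothetical names.

```python
import json
import os
import sys


def leash_event(environ=os.environ) -> dict:
    """Collect the LEASH_* event variables, skipping the auth token."""
    return {
        k: v for k, v in environ.items()
        if k.startswith("LEASH_") and k != "LEASH_AUTH_TOKEN"
    }


def handle(environ=os.environ, out=sys.stdout) -> int:
    event = leash_event(environ)
    # At-least-once delivery means this can run twice for one event,
    # so whatever happens here should be safe to repeat.
    out.write(json.dumps(event, sort_keys=True) + "\n")
    return 0
```

Save it as an executable script that ends with `sys.exit(handle())` and pass its path to --on-event-exec.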

Where it fits

Not the same as the caps you already have.

Framework max-steps

Bounds the number of loop iterations, in memory, for this process.

Resets on restart. No dollars, no cross-process view.

Provider key cap

A monthly spend total on an API key, across every run and app that uses it.

Per-key, not per-run. Reported after the fact.

leash

Durable per-run accounting that a restart can't reset, enforced in the moment, on your terms.

Per-run. Survives crashes. Stops the next call.

Get started

Three ways to install.

A single static binary. Grab a release, use the Go toolchain, or build from a checkout.

Prices are yours to supply with --prices; leash ships none and never estimates. With a cost budget and no prices the meter is blind, so leash fails closed by default and refuses what it can't price. Set --on-blind=warn to only warn instead, and lean on calls, deadline, rate, and kill.

Every release is cosign-signed with an SBOM and SLSA build provenance, so you can verify a binary or the container image before you run it. One external dependency; a standard-library core.

# prebuilt binary (linux / mac / windows)
github.com/sylvester-francis/leash/releases

$ go install github.com/sylvester-francis/leash/cmd/leash@latest

$ git clone https://github.com/sylvester-francis/leash
$ cd leash && make build # -> ./leash