Spend governor for AI agents
Stop your agent before it burns the budget.
An AI agent runs in a loop, deciding its own next step. With no natural place to stop, it can run longer, cost more, or repeat itself. leash sits in front of the agent's model calls and enforces a hard stop at a limit you set, and the accounting survives a crash, so a restart can't reset the budget.
one binary // no code change in the simplest mode // Go 1.25+, no C toolchain
The problem
Loops don't know when to quit.
Three gaps leash is built to close. If you've watched a token bill climb or a job run longer than you meant, you've met at least one of them.
A step is not a flat price
Each turn can call a bigger model, send a longer context, or spawn more work. Cost per step is not constant, and a stuck agent can keep paying to get nowhere.
"Max steps" resets on restart
A framework's step limit lives in memory. Crash the process and start it again and the counter is back to zero - the budget silently reopens.
Provider caps are per-key
A spending cap on an API key is a monthly total across everything, reported after the fact. It can't stop one runaway run in the moment.
What it enforces
Six boundaries, one fixed order.
You set the limits; leash checks them before every call and refuses the next one the moment the first boundary trips. A zero value turns a boundary off; the kill switch is always on.
The refusal is an ordinary HTTP 429 with a machine-readable reason, so the agent's own loop ends because its next call fails. Token counts come only from what the provider reports on the wire - never estimated.
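As a rough illustration of "first boundary trips, in a fixed order", here is a minimal sketch, not leash's actual code. It models only three of the six boundaries, and the order, the field names, and every reason string other than cost_budget are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    max_cost: float = 0.0    # dollars; 0 turns the boundary off
    max_calls: int = 0       # calls; 0 turns the boundary off
    deadline_s: float = 0.0  # wall-clock seconds; 0 turns the boundary off


@dataclass
class Usage:
    cost: float = 0.0
    calls: int = 0
    elapsed_s: float = 0.0
    killed: bool = False


def first_tripped(limits: Limits, usage: Usage) -> Optional[str]:
    """Return the reason for the first tripped boundary, or None to allow the call."""
    # The kill switch has no "off" value, so it is checked unconditionally.
    if usage.killed:
        return "killed"  # hypothetical reason string
    # Hypothetical order; a check is skipped when its limit is zero.
    checks = [
        ("cost_budget", limits.max_cost, usage.cost),
        ("max_calls", limits.max_calls, usage.calls),
        ("deadline", limits.deadline_s, usage.elapsed_s),
    ]
    for reason, limit, used in checks:
        if limit > 0 and used >= limit:
            return reason
    return None


# Both cost and calls are over, but only the first boundary in the order is reported.
print(first_tripped(Limits(max_cost=5, max_calls=10), Usage(cost=6, calls=12)))
```

Because the order is fixed, a run that is over on several boundaries at once always stops with the same reason, which keeps the reason stable for anything that reads it.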
How to use it
Three front doors, one engine.
Same governor behind each. Pick the one that fits how your agent runs.
Put leash before your command
Zero code change. leash launches your program as a child and points its SDK's base URL at an embedded proxy, so every model call is governed.
Run it as a gateway
A standalone proxy for any language, CI, or a shared team endpoint. Point a client's base_url at leash and tag each run with a header. Auth is on by default, and each run is scoped to its credential.
$ leash serve --listen :8088 --max-cost 20
See and stop runs
The ledger is readable from any terminal. List active runs, read one run's history, or stop a runaway; the stop takes effect on that run's next call.
$ leash kill nightly-batch-7
Examples
Copy, paste, and watch it stop.
Concrete recipes for the common cases. Each shows the command and, in one line, what happens.
The 60-second demo
A std-lib fake provider plus a curl loop under a 10-cent budget. Nothing real is billed.
# fake provider + your own prices
$ go run ./examples/fakeupstream &
$ echo '{"demo-model":{"input":10,"output":30}}' > prices.json
$ leash --max-cost 0.10 --prices prices.json \
  --upstream http://127.0.0.1:9099 -- ./agent.sh
call 1 -> 200
call 4 -> 200
call 5 -> 429 ... stays stopped
Guard a Python agent
A dollar budget and a wall-clock deadline around any script. No SDK to adopt.
$ leash --max-cost 5 --deadline 15m \
  --prices prices.json -- python agent.py
... agent runs, every call metered ...
leash: stopped run a3f9 after 18 calls, $4.10 tokens + $0.91 compute = $5.01 (cost_budget)
$ echo $?
3
Cap a loop with no cost meter
When you have no prices, lean on calls, token rate, and repetition instead.
$ leash --max-calls 500 \
--rate 200000/1m \
--stall 4 -- ./agent.sh
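To make the token-rate idea concrete, here is a minimal sliding-window sketch. It assumes that 200000/1m means "at most 200,000 tokens in any 60-second window"; that reading, the class name, and the method name are illustrative assumptions, not leash's documented behavior.

```python
from collections import deque


class TokenRate:
    """Sliding-window token counter: trips when the window total exceeds the limit."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.events = deque()  # (timestamp, tokens) pairs, oldest first
        self.total = 0

    def record(self, now: float, tokens: int) -> bool:
        """Add tokens at time `now`; return True while the run is within the rate."""
        self.events.append((now, tokens))
        self.total += tokens
        # Drop entries that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old = self.events.popleft()
            self.total -= old
        return self.total <= self.limit


rate = TokenRate(limit=200_000, window_s=60)
print(rate.record(0, 150_000))   # within the limit
print(rate.record(30, 100_000))  # 250,000 tokens inside one minute: over
print(rate.record(61, 10_000))   # the first entry has aged out: within again
```

Timestamps are passed in rather than read from the clock so the behavior is easy to check by hand.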
Shared gateway, per-run budgets
One proxy in front of many agents. Each run carries its own X-Loop-Id and its own budget.
$ export LEASH_AUTH_TOKEN=$(leash gen-token)
$ leash serve --listen :8088 --max-cost 20 \
  --prices prices.json --require-run-id
# the client, any language:
$ curl http://localhost:8088/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "X-Leash-Token: $LEASH_AUTH_TOKEN" \
  -H "X-Loop-Id: nightly-batch-7" \
  -d '{"model":"gpt-4o", ...}'
Point an SDK at it
leash speaks the OpenAI, Anthropic, and Gemini wire formats, so only the base URL changes. It keys on the format, not the model name, so any OpenAI-compatible endpoint works too - Gemini, Ollama, OpenRouter, and the rest - and a new model needs no update.
# Python, OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8088/v1",
    api_key="sk-...",  # forwarded untouched
    default_headers={"X-Loop-Id": "run-42"},
)
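On the client side, the only thing an agent loop has to do is treat a 429 from the gateway as final. The sketch below uses only the standard library; post_chat and run_agent are hypothetical helper names, and the shape of the refusal body is not assumed, only the status code.

```python
import json
import urllib.error
import urllib.request


def post_chat(base_url: str, api_key: str, run_id: str, payload: dict) -> dict:
    """POST one chat request through the gateway; raises HTTPError on a refusal."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
            "X-Loop-Id": run_id,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_agent(send, max_turns: int = 100) -> str:
    """Call send(turn) each turn; end the loop when a call is refused with 429."""
    for turn in range(max_turns):
        try:
            send(turn)
        except urllib.error.HTTPError as err:
            if err.code == 429:
                # Refused by the governor: stop instead of retrying.
                return "stopped after %d calls" % turn
            raise
    return "finished"
```

A real loop would pass something like lambda turn: post_chat("http://localhost:8088/v1", key, "run-42", payload) as send. Keeping the transport injectable makes the stop behavior easy to exercise without a running gateway.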
Pick a budget back up
Name a run and reuse the name against the same database to continue its account.
$ leash --run nightly --db ./run.db \
  --max-cost 5 -- python agent.py
# later, same name + db: the budget
# continues where it left off, even
# after a crash. It never re-spends.
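Why a named run plus a database survives a restart can be shown with a few lines of SQLite. This is a sketch of the general idea only: the table layout and the spend function are invented for the example and say nothing about how leash stores its ledger.

```python
import os
import sqlite3
import tempfile


def spend(db_path: str, run: str, amount: float, max_cost: float) -> bool:
    """Record `amount` against `run`; return False once the stored total reaches max_cost."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # one transaction: commit on success, roll back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS ledger (run TEXT PRIMARY KEY, total REAL NOT NULL)"
            )
            row = con.execute("SELECT total FROM ledger WHERE run = ?", (run,)).fetchone()
            total = row[0] if row else 0.0
            if total >= max_cost:
                return False  # budget already spent, even if this process is new
            con.execute(
                "INSERT OR REPLACE INTO ledger (run, total) VALUES (?, ?)",
                (run, total + amount),
            )
            return True
    finally:
        con.close()


db = os.path.join(tempfile.mkdtemp(), "run.db")
print(spend(db, "nightly", 3.0, 5.0))  # True: total becomes 3.0
print(spend(db, "nightly", 3.0, 5.0))  # True: total becomes 6.0
print(spend(db, "nightly", 3.0, 5.0))  # False: a fresh connection still sees 6.0
```

Each call opens a new connection, which stands in for a restarted process: the total lives in the file, not in memory, so it cannot fall back to zero.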
Run the gateway in a container
A small distroless image that runs as a non-root user, with the ledger on a volume.
$ make docker    # distroless, nonroot
$ docker run -p 8088:8088 -v leash-data:/data \
  leash:dev serve --db /data/leash.db --max-cost 20
# or the no-key demo, gateway + fake upstream:
$ docker compose up --build
Read the ledger, anytime
Every run's account is on disk. Inspect it live, in a table or as JSON.
$ leash ps
RUN              CALLS  TOTAL$  STATUS   REASON
nightly-batch-7  18     5.01    stopped  cost_budget
api-eval-3       6      1.20    running
$ leash inspect nightly-batch-7 --json
Escalate durably when it stops
Turn a stop or a budget warning into a webhook and a command hook that survive a crash and retry until they land.
$ leash serve --max-cost 20 \
  --webhook https://hooks.example/leash \
  --reactions-db ./reactions.db \
  --on-event-exec ./on-event.sh
# off the hot path, retried, resumed after a restart:
# notify-webhook -> run-command-hook
# event data arrives in LEASH_* env vars
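A command hook can be any executable. The sketch below is a hypothetical Python stand-in for on-event.sh that logs whatever LEASH_* variables it receives. It deliberately names no specific event variable, since those names are not listed here, and it assumes that exit status 0 is what marks the hook as delivered.

```python
import json
import os
import sys


def leash_event(environ=None) -> dict:
    """Collect every LEASH_* variable into a dict, without assuming specific names."""
    if environ is None:
        environ = os.environ
    return {
        key: value
        for key, value in environ.items()
        # Keep the gateway credential out of logs if it happens to be exported.
        if key.startswith("LEASH_") and key != "LEASH_AUTH_TOKEN"
    }


if __name__ == "__main__":
    # One JSON line per event on stderr; falling off the end exits with status 0.
    print(json.dumps(leash_event(), sort_keys=True), file=sys.stderr)
```

Since hooks are retried until they land, a hook like this should be safe to run more than once for the same event.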
Where it fits
Not the same as the caps you already have.
Framework max-steps
Bounds the number of loop iterations, in memory, for this process.
Provider key cap
A monthly spend total on an API key, across every run and app that uses it.
leash
Durable per-run accounting that a restart can't reset, enforced in the moment, on your terms.
Get started
Three ways to install.
A single static binary. Grab a release, use the Go toolchain, or build from a checkout.
Prices are yours to supply with --prices; leash ships none and never estimates. With a cost budget and no prices the meter is blind, so leash fails closed by default and refuses what it can't price. Set --on-blind=warn to only warn and lean on calls, deadline, rate, and kill instead.
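The fail-closed rule can be sketched in a few lines. The example reuses the demo prices.json from above and assumes its numbers are dollars per million tokens; that unit, the function names, and the "refuse" value for the default mode are assumptions made for illustration.

```python
import json

PER = 1_000_000  # assumption: prices are dollars per million tokens


def call_cost(prices: dict, model: str, input_tokens: int, output_tokens: int):
    """Return the dollar cost of one call, or None when the model has no price."""
    entry = prices.get(model)
    if entry is None:
        return None  # unpriceable: the caller decides whether to refuse or warn
    return (input_tokens * entry["input"] + output_tokens * entry["output"]) / PER


def admit(cost, on_blind: str = "refuse") -> bool:
    """Fail closed: refuse an unpriced call unless on_blind is "warn"."""
    if cost is None:
        return on_blind == "warn"
    return True


prices = json.loads('{"demo-model":{"input":10,"output":30}}')

priced = call_cost(prices, "demo-model", 1000, 500)
unpriced = call_cost(prices, "other-model", 1000, 500)
print(priced, admit(priced))                     # priced call is admitted
print(unpriced, admit(unpriced))                 # unpriced call is refused by default
print(unpriced, admit(unpriced, on_blind="warn"))  # admitted, other limits still apply
```

Returning None rather than 0 for an unknown model is the point of the design: a missing price is treated as "unknown", never as "free".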
Every release is cosign-signed with an SBOM and SLSA build provenance, so you can verify a binary or the container image before you run it. One external dependency; a standard-library core.
github.com/sylvester-francis/leash/releases
$ cd leash && make build # -> ./leash