prxy.monster sits in front of Claude, OpenAI, or Bedrock and runs every request through twelve modules that fix the things that keep breaking. Bring your own provider key. One env var. Zero code change.
Free tier · BYOK · MIT self-host · Cancel anytime
Modules run before every call · optimize, cache, remember, and cap spend
Built for what broke this month
| What broke | Where | The module |
|---|---|---|
| Auto-compaction regression dropping user intent mid-session | Issue #36068 · Mar 19, 2026 | Compaction Bridge |
| MCP tool definitions burning 67K–143K tokens before you type | Apideck post · Mar 17, 2026 | MCP Optimizer |
| Uber's $3.4B AI budget exhausted by April | CTO disclosure · Apr 15, 2026 | Cost Guard |
| Claude Code v2.1.89 → 3–50× faster rate-limit drain | March 2026 release | Semantic + Exact Cache |
| Quiet Max-only pricing test on Claude Code | Apr 22, 2026 | MIT self-host |
| Context rot after ~2 hours in a session | Widely reported · Apr 2026 | IPC + Rehydrator |
What you actually get
You send a request to api.prxy.monster with your existing Anthropic, OpenAI, or Bedrock key. The request flows through your configured module pipeline — caching, MCP optimization, pattern injection, cost guards — then hits your provider with your key. The response comes back the same way. Same wire format you already use.
```shell
curl -X POST https://api.prxy.monster/v1/messages \
  -H "Authorization: Bearer $PRXY_KEY" \
  -H "X-Provider-Key: $ANTHROPIC_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "messages": [{"role": "user", "content": "Hello, prxy."}]
  }'
```
SDK drop-in: just swap ANTHROPIC_BASE_URL
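A minimal sketch of the drop-in. The base-URL swap is what the page documents; the comment about the provider-key header mirrors the curl example and is the assumed per-request mechanism:

```shell
# Point the Anthropic SDK at the gateway; no code changes needed.
export ANTHROPIC_BASE_URL=https://api.prxy.monster

# Gateway auth uses your prxy_ key; your Anthropic key rides along
# per request as a header (X-Provider-Key in the curl example above).
```

The OpenAI SDK works the same way via `OPENAI_BASE_URL`.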
```
→ mcp-optimizer    # prune tool defs to what this request needs
→ semantic-cache   # similar requests return cached
→ patterns         # inject relevant past solutions
→ cost-guard       # short-circuit if budget breached
→ your provider    # using your key, billed to your account
```
Toggle modules per key via PRXY_PIPE
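The module names below come from the documented set; the comma-separated `PRXY_PIPE` syntax and ordering are illustrative assumptions, not a confirmed format:

```shell
# Hypothetical per-key pipeline config. Module names match the
# documented list; the exact value syntax is an assumption.
export PRXY_PIPE="mcp-optimizer,semantic-cache,patterns,cost-guard"
```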
Standard Anthropic / OpenAI response shape. SDKs work unchanged. Usage attribution per-request, scoped to your account.
Cache hit? Returns instantly with zero provider tokens consumed.
~23% of real-workload calls are served from cache
prxy.monster does not bill you for tokens. Your provider bills you for tokens. We bill you for the gateway and the module pipeline. We never mark up inference.
Not an inference provider. Not a web proxy. Not a VPN. Not prxy.com.
How it works
Replace one env var. Zero code changes. Every app, every framework, every model — it just works.
Every conversation forges patterns. Outcomes are tracked. Failures retire. Good solutions reinforce.
Patterns inject before each request. Context never resets. Your AI bill goes down over time.
Featured modules
Survives the auto-compaction regression in #36068. Re-injects user intent on every compaction boundary so your agent doesn't drop the thread mid-session.
The 67K-tokens-of-MCP problem. Scores each tool against the request, ships only the relevant ones. ~90% reduction in tool overhead before the model sees your prompt.
Sessions don't have to start from zero. Injects relevant past solutions into the system prompt. Forges new patterns from successful resolutions. Compounds over time.
Repeat questions don't repeat costs. Embeds the request, replays the cached response when similarity clears the threshold. Real workloads see a ~23% hit rate.
Uber's $3.4B AI budget exhausted by April. Per-key, per-day, per-month USD ceilings. Returns 429 before the bill blows up. Stops runaway agents in their tracks.
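A toy sketch of the ceiling arithmetic, client-side. All figures are illustrative, not real limits; actual enforcement happens server-side via the 429:

```shell
# Toy model of a per-day USD ceiling: count calls until estimated
# spend reaches the budget. Figures are made up for illustration.
budget_cents=500     # illustrative $5.00/day ceiling
cost_per_call=90     # illustrative estimated cents per call
spent=0
calls=0
while [ "$spent" -lt "$budget_cents" ]; do
  spent=$((spent + cost_per_call))
  calls=$((calls + 1))
done
echo "$calls calls until the ceiling tripped"
```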
vs the field
| | prxy.monster | OpenRouter | Portkey | Helicone | LiteLLM |
|---|---|---|---|---|---|
| Touches your provider bill | NEVER | YES — markup | YES — resold | YES — proxied | N/A |
| Per-request pricing (not per-token) | YES | NO | NO | NO | N/A |
| Free tier with all base modules | YES — 1k req/mo | NO | NO | limited | YES — self-host |
| Multi-provider routing (BYOK) | YES | YES | YES | YES | YES |
| MCP token optimization | YES | NO | NO | NO | NO |
| Infinite context (compressed) | YES | NO | NO | NO | NO |
| Pattern learning across sessions | YES | NO | NO | NO | NO |
| Semantic cache | YES | NO | YES | NO | NO |
| Self-host (MIT/Apache) | YES — MIT | NO | YES — Apache 2.0 | YES | YES |
| Composable modules | YES | NO | NO | NO | NO |
Most gateways are routers. prxy.monster is the modules.
Plays nice
Same wire format as Anthropic and OpenAI. Most integrations are a single env var. Zero code change for the SDKs you already have wired up.
AI Coding Tools
SDKs
Frameworks
Deploy
Hosted gateway. Zero ops. Account-scoped memory and cache.
Single local gateway. Private data volume. MIT licensed.
Dedicated deployment for teams that need their own account boundary.
Requests, not tokens. Your provider already charges you per token — we don't double-dip.
prxy_FREE
$0 forever
1,000 requests / month
prxy_PRO
$20 / month
100,000 requests / month · then $0.20 per 1k
prxy_TEAM
$99 / month
1,000,000 requests / month · then $0.10 per 1k
One request = one HTTP call into our gateway. Streaming counts as one. Cached hits count as one. Failed-upstream calls don't count. Your provider bill (Anthropic, OpenAI, Bedrock) is paid directly to them at their list rates — we never see it.
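To make the per-request model concrete, a worked example of one Pro month at the rates above. Pure arithmetic, nothing here is an official calculator:

```shell
# prxy_PRO: $20/mo includes 100,000 requests; overage $0.20 per 1k.
requests=130000
included=100000
overage_k=$(( (requests - included) / 1000 ))   # 30 blocks of 1k over
total_cents=$(( 2000 + overage_k * 20 ))        # base + overage, in cents
echo "monthly gateway bill: \$$((total_cents / 100)).$(printf '%02d' $((total_cents % 100)))"
# → monthly gateway bill: $26.00
```

Your provider's token bill is separate and goes straight to them.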
Subscribe to Pro, get an API key by email, then paste this in your terminal. Same wire format as Anthropic, so any SDK works.
Common questions
ANTHROPIC_BASE_URL=https://api.prxy.monster and provide your Anthropic key as a header. Existing code keeps working. OpenAI SDK works the same way via OPENAI_BASE_URL.
mcp-optimizer, semantic-cache, exact-cache, patterns, cost-guard, compaction-bridge, ipc, rehydrator, prompt-optimizer, tool-cache, router, guardrails. You toggle modules per API key via PRXY_PIPE. Same primitives across cloud + self-hosted.
prxy-local runs on your machine with no telemetry; data lives in your local volume.
prxy-monster-local, prxy-module-sdk, and prxy-cli are MIT-licensed on npm (npm i -g prxy-cli). Hosted control plane and any future paid modules are closed source.
Create your account, choose a plan, and continue through Stripe Checkout. When payment succeeds, your prxy_ API key is provisioned and emailed automatically.