prxy.monster sits in front of Claude, OpenAI, or Bedrock and runs every request through twelve modules that fix the things that keep breaking. Bring your own provider key. One env var. Zero code change.
Free tier · BYOK · MIT self-host · Cancel anytime
Modules run before every call · optimize, cache, remember, and cap spend
Built for what broke this month
| What broke | Where | The module |
|---|---|---|
| Auto-compaction regression dropping user intent mid-session | Issue #36068 · Mar 19, 2026 | Compaction Bridge |
| MCP tool definitions burning 67K–143K tokens before you type | Apideck post · Mar 17, 2026 | MCP Optimizer |
| Uber's $3.4B AI budget exhausted by April | CTO disclosure · Apr 15, 2026 | Cost Guard |
| Claude Code v2.1.89 → 3–50× faster rate-limit drain | March 2026 release | Semantic + Exact Cache |
| Quiet Max-only pricing test on Claude Code | Apr 22, 2026 | MIT self-host |
| Context rot after ~2 hours in a session | Widely reported · Apr 2026 | IPC + Rehydrator |
What you actually get
You send a request to api.prxy.monster with your existing Anthropic, OpenAI, or Bedrock key. The request flows through your configured module pipeline — caching, MCP optimization, pattern injection, cost guards — then hits your provider with your key. The response comes back the same way. Same wire format you already use.
```shell
curl -X POST https://api.prxy.monster/v1/messages \
  -H "Authorization: Bearer $PRXY_KEY" \
  -H "X-Provider-Key: $ANTHROPIC_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "messages": [{"role": "user", "content": "Hello, prxy."}]
  }'
```
SDK drop-in: just swap ANTHROPIC_BASE_URL
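A minimal sketch of the drop-in. The base-URL swap is what the page documents; the comment about the provider-key header mirrors the curl example and is the assumed per-request mechanism:

```shell
# Point the Anthropic SDK at the gateway; no code changes needed.
export ANTHROPIC_BASE_URL=https://api.prxy.monster

# Gateway auth uses your prxy_ key; your Anthropic key rides along
# per request as a header (X-Provider-Key in the curl example above).
```

The OpenAI SDK works the same way via `OPENAI_BASE_URL`.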
```
→ mcp-optimizer    # prune tool defs to what this request needs
→ semantic-cache   # similar requests return cached
→ patterns         # inject relevant past solutions
→ cost-guard       # short-circuit if budget breached
→ your provider    # using your key, billed to your account
```
Toggle modules per key via PRXY_PIPE
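The module names below come from the documented set; the comma-separated `PRXY_PIPE` syntax and ordering are illustrative assumptions, not a confirmed format:

```shell
# Hypothetical per-key pipeline config. Module names match the
# documented list; the exact value syntax is an assumption.
export PRXY_PIPE="mcp-optimizer,semantic-cache,patterns,cost-guard"
```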
Standard Anthropic / OpenAI response shape. SDKs work unchanged. Usage attribution per-request, scoped to your account.
Cache hit? Returns instantly with zero provider tokens consumed.
~23% of real-workload calls are served from cache
prxy.monster does not bill you for tokens. Your provider bills you for tokens. We bill you for the gateway and the module pipeline. We never mark up inference.
Not an inference provider. Not a web proxy. Not a VPN. Not prxy.com.
How it works
Replace one env var. Zero code changes. Every app, every framework, every model — it just works.
Every conversation forges patterns. Outcomes are tracked. Failures retire. Good solutions reinforce.
Patterns inject before each request. Context never resets. Your AI bill goes down over time.
Featured modules
Survives the auto-compaction regression in #36068. Re-injects user intent on every compaction boundary so your agent doesn't drop the thread mid-session.
The 67K-tokens-of-MCP problem. Scores each tool against the request, ships only the relevant ones. ~90% reduction in tool overhead before the model sees your prompt.
Sessions don't have to start from zero. Injects relevant past solutions into the system prompt. Forges new patterns from successful resolutions. Compounds over time.
Repeat questions don't repeat costs. Embeds the request, replays the cached response when similarity clears the threshold. Real workloads see a ~23% hit rate.
Uber's $3.4B AI budget exhausted by April. Per-key, per-day, per-month USD ceilings. Returns 429 before the bill blows up. Stops runaway agents in their tracks.
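A toy sketch of the ceiling arithmetic, client-side. All figures are illustrative, not real limits; actual enforcement happens server-side via the 429:

```shell
# Toy model of a per-day USD ceiling: count calls until estimated
# spend reaches the budget. Figures are made up for illustration.
budget_cents=500     # illustrative $5.00/day ceiling
cost_per_call=90     # illustrative estimated cents per call
spent=0
calls=0
while [ "$spent" -lt "$budget_cents" ]; do
  spent=$((spent + cost_per_call))
  calls=$((calls + 1))
done
echo "$calls calls until the ceiling tripped"
```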
vs the field
| | prxy.monster | OpenRouter | Portkey | Helicone | LiteLLM |
|---|---|---|---|---|---|
| Touches your provider bill | NEVER | YES — markup | YES — resold | YES — proxied | N/A |
| Per-request pricing (not per-token) | YES | NO | NO | NO | N/A |
| Free tier with all base modules | YES — 1k req/mo | NO | NO | limited | YES — self-host |
| Multi-provider routing (BYOK) | YES | YES | YES | YES | YES |
| MCP token optimization | YES | NO | NO | NO | NO |
| Infinite context (compressed) | YES | NO | NO | NO | NO |
| Pattern learning across sessions | YES | NO | NO | NO | NO |
| Semantic cache | YES | NO | YES | NO | NO |
| Self-host (MIT/Apache) | YES — MIT | NO | YES — Apache 2.0 | YES | YES |
| Composable modules | YES | NO | NO | NO | NO |
Most gateways are routers. prxy.monster is the modules.
Plays nice
Same wire format as Anthropic and OpenAI. Most integrations are a single env var. Zero code change for the SDKs you already have wired up.
AI Coding Tools
SDKs
Frameworks
Deploy
Hosted gateway. Zero ops. Account-scoped memory and cache.
Single local gateway. Private data volume. MIT licensed.
Dedicated deployment for teams that need their own account boundary.
Requests, not tokens. Your provider already charges you per token — we don't double-dip.
prxy_FREE
$0 forever
1,000 requests / month
prxy_PRO
$20 / month
100,000 requests / month · then $0.20 per 1k
prxy_TEAM
$99 / month
1,000,000 requests / month · then $0.10 per 1k
One request = one HTTP call into our gateway. Streaming counts as one. Cached hits count as one. Failed-upstream calls don't count. Your provider bill (Anthropic, OpenAI, Bedrock) is paid directly to them at their list rates — we never see it.
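To make the per-request model concrete, a worked example of one Pro month at the rates above. Pure arithmetic, nothing here is an official calculator:

```shell
# prxy_PRO: $20/mo includes 100,000 requests; overage $0.20 per 1k.
requests=130000
included=100000
overage_k=$(( (requests - included) / 1000 ))   # 30 blocks of 1k over
total_cents=$(( 2000 + overage_k * 20 ))        # base + overage, in cents
echo "monthly gateway bill: \$$((total_cents / 100)).$(printf '%02d' $((total_cents % 100)))"
# → monthly gateway bill: $26.00
```

Your provider's token bill is separate and goes straight to them.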
Subscribe to Pro, get an API key by email, then paste this in your terminal. Same wire format as Anthropic, so any SDK works.
Common questions
ANTHROPIC_BASE_URL=https://api.prxy.monster and provide your Anthropic key as a header. Existing code keeps working. OpenAI SDK works the same way via OPENAI_BASE_URL.
mcp-optimizer, semantic-cache, exact-cache, patterns, cost-guard, compaction-bridge, ipc, rehydrator, prompt-optimizer, tool-cache, router, guardrails. You toggle modules per API key via PRXY_PIPE. Same primitives across cloud + self-hosted.
prxy-local runs on your machine with no telemetry; data lives in your local volume.
prxy-monster-local, prxy-module-sdk, and prxy-cli are MIT-licensed on npm (npm i -g prxy-cli). Hosted control plane and any future paid modules are closed source.
Create your account, choose a plan, and continue through Stripe Checkout. When payment succeeds, your prxy_ API key is provisioned and emailed automatically.