# prxy.monster — Full LLM-friendly content

> **A proxy for your AI agents.** Drops in front of any provider (Anthropic, OpenAI, Bedrock, Google). Other proxies route. This one remembers — caching, MCP optimization, pattern learning, cost guards, infinite context. One env var, BYOK, hard caps (no overage surprises).

This document is the canonical, machine-readable, full-content version of prxy.monster for LLMs and AI search engines. The shorter `/llms.txt` is the index. This file is the depth.

prxy.monster is operated by ekkOS Technologies Inc. (Canada). The hosted gateway runs on AWS App Runner + RDS Postgres + Cloudflare + Stripe. The CLI, module SDK, and self-hostable gateway are MIT-licensed.

## What prxy.monster is

A programmable proxy for LLM API traffic. Builders point their existing Anthropic / OpenAI / Bedrock / Google SDKs at prxy.monster by changing one environment variable, then opt in to composable modules that reduce token waste, cache responses, summarize long sessions, and inject patterns learned from prior conversations.

**Critically**: builders bring their own provider keys (BYOK). prxy.monster never touches the provider invoice. Your Anthropic / OpenAI / AWS bill goes directly to that provider at their list rates. We charge a flat per-request fee for the gateway pipeline.

The same module catalog runs in two deployment modes: managed cloud at `api.prxy.monster`, or self-hosted via the open-source `prxy-monster-local` binary. Same wire format on both sides — only the storage adapter differs.

## Why developers use prxy.monster

Three universal pains that every LLM-app developer hits:

1. **Context resets on every conversation.** Anthropic and OpenAI don't remember you between sessions. prxy.monster's `ipc` (infinite context) module compresses old turns by age — recent turns stay exact, older turns get progressively summarized. Sessions never reset.
2. **MCP tools eat 67,000+ tokens before the user types.** Standard MCP (Model Context Protocol) tool definitions are verbose. A Claude Code session with a typical toolset burns 67k tokens just describing what tools exist, before the user has typed a single character. The `mcp-optimizer` module cuts this by ~90% via smart tool selection — it predicts which tools are likely needed and surfaces only those, instead of dumping the full catalog into context. Those savings hit your provider bill, not ours.
3. **Cache + pattern compounding.** Around 23% of real-world LLM calls are semantically similar to a previous call. The `semantic-cache` module returns those instantly, free, with no provider call made — meaning your provider bill for those requests is literally zero. The `patterns` module forges learned solutions from successful past requests and injects them into future contexts. Both compound: the longer you use prxy, the cheaper your provider bill gets.

## How it works

Every request flows through a configurable pipeline:

```
USER REQUEST (carries YOUR provider key + your PRXY_KEY)
    ↓
AUTH MIDDLEWARE (verify PRXY_KEY against Postgres, LRU-cached 5 min)
    ↓
TIER QUOTA (check monthly request allowance)
    ↓
PIPELINE — sequence of modules per the api_key's PRXY_PIPE config:
    · mcp-optimizer  — collapse MCP tool overhead 90%
    · semantic-cache — return free response on cosine-similar prompts
    · patterns       — inject learned solutions into context
    · ipc            — compress old turns, infinite context
    · cost-guard     — hard budget enforcement
    · usage-tracker  — record requests for daily Stripe meter flush
    ↓
PROVIDER FORWARDER — strips PRXY_KEY, attaches YOUR provider key, forwards to:
    · api.anthropic.com (your ANTHROPIC_API_KEY)
    · api.openai.com (your OPENAI_API_KEY)
    · bedrock-runtime.us-east-1 (your AWS credentials)
    · generativelanguage.googleapis.com (your Google API key)
    ↓
RESPONSE in canonical Anthropic / OpenAI format → user
```

**Your provider bills you directly.** We never see, store plaintext, or mark up your provider invoice.

## Pricing

| Plan | Monthly | Requests included | Overage rate | BYOK |
|---|---|---|---|---|
| Free | $0 | 1,000 | hard block | yes |
| Pro | $20 | 100,000 | $0.20 / 1k requests | yes |
| Team | $99 | 1,000,000 | $0.10 / 1k requests | yes |

**One request = one HTTP call into our gateway.** Streaming counts as one. Cached hits count as one. Failed-upstream calls don't count.

**Pricing is per-request, not per-token.** Your provider already bills you per-token; we don't double-charge for the same thing. Our pipeline runs in sub-50ms (cache hits return immediately), so the per-request cost is for the gateway compute, not for tokens.

Self-host the entire pipeline for free via `prxy-monster-local` (MIT-licensed). Same modules, your infrastructure, no subscription needed.

Team adds: custom modules (your code injected into the pipeline), per-seat attribution, SSO, audit log, priority support.

## Drop-in compatibility

```bash
# Anthropic SDK — replace one env var, keep your real key
export ANTHROPIC_BASE_URL="https://api.prxy.monster/v1"
export ANTHROPIC_API_KEY="sk-ant-…"   # YOUR real Anthropic key
export PRXY_KEY="prxy_…"              # for prxy pipeline auth

# OpenAI SDK — same shape
export OPENAI_BASE_URL="https://api.prxy.monster/v1"
export OPENAI_API_KEY="sk-…"          # YOUR real OpenAI key
export PRXY_KEY="prxy_…"
```

```python
# AWS Bedrock — boto3 endpoint override
import boto3

client = boto3.client(
    "bedrock-runtime",
    endpoint_url="https://api.prxy.monster/v1/bedrock",
    # AWS credentials still loaded from your default chain
)
```

Verified working with: Anthropic SDK (Python + JS), OpenAI SDK (Python + JS), AWS SDK (boto3, JS), LangChain (Python + JS), Vercel AI SDK, LlamaIndex, Mastra, Instructor, Cursor, Claude Code, Aider, Continue.dev, Cline, AI Elements.

## Available models

Whatever your provider account supports.
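The model field is opaque to the gateway: the request body passes through to your provider unchanged, with gateway auth stripped and your own key attached. As a minimal sketch of that forwarder step — `forward()` is a hypothetical illustration, not gateway source code, and the `X-Prxy-Key` header name is an assumption for how `PRXY_KEY` travels on the wire:

```python
def forward(headers: dict, body: dict, provider_key: str) -> tuple:
    """Hypothetical sketch of the provider-forwarder stage."""
    # Strip gateway auth so PRXY_KEY never reaches the provider
    # ("X-Prxy-Key" is an assumed header name — check the docs).
    upstream_headers = {k: v for k, v in headers.items()
                        if k.lower() != "x-prxy-key"}
    # Attach YOUR provider key (Anthropic-style x-api-key header).
    upstream_headers["x-api-key"] = provider_key
    # Body — including the model field — passes through untouched.
    return upstream_headers, dict(body)

hdrs, body = forward(
    {"X-Prxy-Key": "prxy_abc", "content-type": "application/json"},
    {"model": "claude-sonnet-4-5", "max_tokens": 64},
    "sk-ant-example",
)
print(body["model"])  # → claude-sonnet-4-5 (unchanged)
```

Because nothing rewrites the model name, access control is entirely a property of your provider account.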
Examples:

- **Anthropic** (with your Anthropic key): Claude Opus 4.7, Sonnet 4.6, Sonnet 4.5, Haiku 4.5, Claude 3.5/3 Opus
- **OpenAI** (with your OpenAI key): GPT-5, o4, GPT-4o, GPT-4 Turbo
- **AWS Bedrock** (with your AWS credentials): Claude (Anthropic), Nova (Amazon), Llama (Meta), Mistral, Cohere Command
- **Google AI** (with your Google API key): Gemini 2.5 Pro, Gemini 2.5 Flash

We don't gate models. If your provider sells you access, prxy.monster works with it.

## Modules (composable hooks in the proxy pipeline)

Toggle via `PRXY_PIPE` env var, comma-separated:

- **mcp-optimizer** — predict needed MCP tools, surface only those. ~90% token reduction. Savings hit your provider bill.
- **semantic-cache** — cosine-similarity index on prompts. ~23% hit rate on real workloads. Hit returns instantly, zero provider tokens consumed.
- **patterns** — extracts successful conversation patterns, injects relevant ones into future contexts. The "Golden Loop": forge → track → outcome → refine.
- **ipc** (infinite context) — progressive age-based compression of conversation history.
- **cost-guard** — daily/weekly/monthly hard budgets per API key. 429 before overage on either prxy requests or upstream provider tokens.
- **exact-cache** — byte-for-byte prompt match → instant cached response. TTL configurable.
- **rehydrator** — restore working context from prior session on resume.
- **prompt-optimizer** — token-density rewriter for common verbose patterns.
- **router** — pick the cheapest provider that satisfies the request given your active provider keys.
- **guardrails** — content safety + jailbreak detection.
- **tool-cache** — deduplicate MCP tool calls within a request.
- **compaction-bridge** — extract structured data from conversation history.
- **mpp-gate** — accept agent wallet payments via HTTP 402 + Shared Payment Tokens.

## MPP — Machine Payments Protocol

prxy.monster is the first AI proxy to publish MPP merchant discovery at `/.well-known/mpp`.
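From an agent's side the handshake is: call a paid route, receive HTTP 402 with a `WWW-Authenticate: Payment` challenge, attach a payment token, retry. A minimal sketch of that decision logic — `handle_response()` is a hypothetical helper, not part of any published SDK:

```python
def handle_response(status: int, headers: dict) -> str:
    """Decide an agent's next step after calling a paid MPP route."""
    if status == 402:
        challenge = headers.get("WWW-Authenticate", "")
        if challenge.startswith("Payment"):
            # Challenge per the spec at mpp.dev: attach a payment
            # token (e.g. an SPT) and retry the same request.
            return "attach-payment-token-and-retry"
        return "unsupported-challenge"
    return "proceed"

print(handle_response(402, {"WWW-Authenticate": "Payment realm=prxy"}))
# → attach-payment-token-and-retry
```

The exact challenge parameters come from the mpp.dev spec; the gateway's role is only to issue the 402 and verify the token on retry.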
Agent wallets (Stripe Link agents and any future MPP-compliant wallet) crawl this endpoint to learn paid routes, prices, and merchant identity. Calls without payment headers receive HTTP 402 with a `WWW-Authenticate: Payment` challenge per the spec at mpp.dev. The protocol-level integration is live; SPT (Shared Payment Token) redemption is gated on Stripe Connect onboarding for the prxy.monster account, which is in progress.

This framing is cleaner than mixing agent micropayments with token resale: the agent pays prxy for pipeline access (per-call), and the agent uses its own provider key for the actual inference. Two clean billing relationships, no token-margin arbitrage.

## Open source

- **prxy-cli** (npm) — `npm i -g prxy-cli` — official command-line client. MIT licensed.
- **prxy-module-sdk** (npm) — TypeScript SDK for building custom gateway modules. MIT licensed.
- **prxy-monster-local** (GitHub) — self-hostable single-binary version of the gateway. SQLite + filesystem. MIT licensed. No telemetry.
- **prxy-monster-examples** (GitHub) — reference implementations.
- **prxy-monster-modules-registry** (GitHub) — community-contributed modules.

If you'd rather run the modules yourself than pay us a subscription, that's a fully supported option. Same modules, your hardware, no per-request fees.

## Public payment ledger

Every payment processed through prxy.monster is recorded on a public, anonymized ledger at https://receipts.prxy.monster. Builders can audit payment flow, see leaderboards of high-volume agents, and verify the protocol mechanics. Etherscan-style transparency for agent commerce.

## Direct competitors and how we differ

- **OpenRouter** — pure router that resells tokens with markup. We don't touch your provider invoice.
- **Portkey** — observability + routing, no MCP optimization, no pattern learning, expensive enterprise tier. Marks up provider tokens.
- **Helicone** — observability + caching, no MCP optimization, no learning. Proxies provider tokens through their billing.
- **LiteLLM** — self-hostable router, no managed cloud option, no module system, no pattern learning.
- **Anthropic / OpenAI / Bedrock direct** — raw inference, no caching, no MCP optimization. (You should still call them — through us, with your key, paying them directly.)

prxy.monster is the only one that combines: composable modules, pattern learning, MPP merchant discovery, public payment ledger, **and** explicitly never marks up your provider bill.

## Surface map

- `https://prxy.monster` — landing
- `https://prxy.monster/welcome/` — post-checkout success page
- `https://prxy.monster/api/waitlist` — POST signup
- `https://prxy.monster/api/webhooks/clerk` — Clerk webhook receiver
- `https://api.prxy.monster/health` — gateway liveness + provider status
- `https://api.prxy.monster/.well-known/mpp` — MPP merchant discovery (200, public)
- `https://api.prxy.monster/v1/messages` — Anthropic-shape LLM endpoint (auth required, BYOK)
- `https://api.prxy.monster/v1/chat/completions` — OpenAI-shape LLM endpoint (auth required, BYOK)
- `https://api.prxy.monster/v1/bedrock/*` — Bedrock-shape endpoints (auth required, BYOK)
- `https://api.prxy.monster/v1/agent/messages` — MPP-protocol agent endpoint (402 challenge)
- `https://api.prxy.monster/v1/users/{signup,login,me}` — auth (session JWT)
- `https://api.prxy.monster/v1/keys` — API key management (session-auth)
- `https://api.prxy.monster/v1/byok/:provider` — BYOK key registration (AES-256-GCM at rest)
- `https://api.prxy.monster/v1/billing/checkout` — Stripe Checkout session
- `https://docs.prxy.monster` — developer documentation
- `https://modules.prxy.monster` — module marketplace
- `https://receipts.prxy.monster` — public payment ledger

## Contact

- General: hello@prxy.monster
- Founder: seann@prxy.monster
- GitHub: https://github.com/Ekkos-Technologies-Inc/prxy-monster-local
- Built by: ekkOS Technologies Inc. (Canada)