LLM Proxy, Router, Gateway: How Preto.ai Unifies the Stack

Preto.ai Unifies Proxy, Router, and Gateway Layers to Simplify AI Request Management

Preto.ai combines proxy, router, and gateway layers with cost intelligence behind a single base-URL change, with audit logging, rate limits, and a free tier of up to 10K requests.

The AI stack is often described in shorthand, with vendors and documentation using “gateway,” “proxy,” and “router” interchangeably. Preto.ai starts from a different premise: the three terms name distinct layers, each with concrete responsibilities, that stack together to form the complete request path from an application to a model provider. Knowing how a proxy, a router, and a gateway differ, and when to deploy each, is essential for teams that need cost visibility, per-team governance, or compliance audit trails. This article explains the three layers, walks through the implementation patterns in the source’s Go examples, maps vendor offerings onto the stack, and outlines how teams should choose an approach.

The proxy as the transport layer

A proxy is fundamentally a transport mechanism: it sits in the HTTP request path and forwards calls from your application to an upstream model provider. From the application’s perspective, the only change is the base URL configured in the SDK or HTTP client. The proxy makes no routing or policy decisions; it swaps the target endpoint and relays requests and responses.

The source material demonstrates a minimal proxy that replaces the incoming request’s Authorization header with a provider key and forwards the request to an upstream host such as the OpenAI API. Your application keeps the same SDK and request format; only its base_url changes, for example to https://proxy.your-company.com/v1. All traffic then flows through your own domain, where you can observe it, collect logs, and later add caching without changing application logic.
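A minimal Go sketch of such a transport-only proxy follows, built on the standard library’s httputil.ReverseProxy and its Director hook. The key value, the request path, and the use of an in-process httptest server as a stand-in for the real provider are assumptions for illustration; in production the upstream would be the provider’s URL (for example https://api.openai.com) and the key would come from a secret store.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newProxy forwards every request to upstream unchanged, except that the
// caller's Authorization header is replaced with the provider key.
func newProxy(upstream *url.URL, providerKey string) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	baseDirector := proxy.Director
	proxy.Director = func(r *http.Request) {
		baseDirector(r)        // rewrite scheme, host, and path onto upstream
		r.Host = upstream.Host // send the upstream's Host header, not ours
		r.Header.Set("Authorization", "Bearer "+providerKey)
	}
	return proxy
}

// demo starts a fake upstream that echoes the path and Authorization header
// it receives, puts the proxy in front of it, and returns what it saw.
func demo() (string, error) {
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.URL.Path+" "+r.Header.Get("Authorization"))
	}))
	defer upstream.Close()

	target, err := url.Parse(upstream.URL)
	if err != nil {
		return "", err
	}
	front := httptest.NewServer(newProxy(target, "sk-provider-key")) // hypothetical key
	defer front.Close()

	req, err := http.NewRequest(http.MethodPost, front.URL+"/v1/chat/completions", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer app-key") // the app's own credential
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // /v1/chat/completions Bearer sk-provider-key
}
```

The proxy holds no per-caller state: every request gets the same key and the same upstream, which is exactly what the next two layers change.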

Because a proxy is transport-only, it remains stateless with respect to the caller: it does not track tenant identity, enforce budgets, or decide which model should run a particular prompt. Those responsibilities sit above the proxy, in the router and gateway layers.

The router as the decision layer

The router implements the decision logic that determines which model and which provider should handle a given request. Crucially, the source describes the router as pure business logic: it has no dependency on HTTP transport, so it can be tested and replaced without modifying the proxy.

Three routing patterns appear in the source examples:

Cost-based routing: the router estimates request complexity and assigns models accordingly (for example, routing short, low-complexity tasks to a smaller, cheaper model and reserving larger models for multi-step reasoning). The provided logic maps numerical complexity thresholds to specific models and providers.
Failover routing: the router defines a provider chain and selects the first available provider based on an availability circuit. If the preferred provider is unavailable, the router falls back to the next candidate in the chain.
Metadata-based routing: the application supplies a feature tag, for example in an X-Feature header, and the router routes by that tag. If the tag has no explicit mapping, the router falls back to the default cost-based decision.
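Metadata routing with a cost-based fallback can be written as a single pure function. The Go sketch below assumes a crude, hypothetical complexity heuristic (prompt length plus a few multi-step keywords); the thresholds, provider names, and model names are placeholders rather than the source’s actual values.

```go
package main

import (
	"fmt"
	"strings"
)

// Request is a transport-agnostic view of one LLM call (no net/http types).
type Request struct {
	Prompt  string
	Feature string // e.g. copied from an X-Feature header; "" if absent
}

// Decision is the router's output: which model on which provider.
type Decision struct{ Provider, Model string }

// estimateComplexity is a hypothetical heuristic scoring 0..1: longer
// prompts and multi-step wording push the score up.
func estimateComplexity(r Request) float64 {
	p := strings.ToLower(r.Prompt)
	score := float64(len(p)) / 4000
	for _, hint := range []string{"step by step", "analyze", "compare"} {
		if strings.Contains(p, hint) {
			score += 0.3
		}
	}
	if score > 1 {
		score = 1
	}
	return score
}

// costRoute maps complexity thresholds to models (placeholder values).
func costRoute(r Request) Decision {
	switch c := estimateComplexity(r); {
	case c < 0.3:
		return Decision{"provider-a", "small-model"}
	case c < 0.7:
		return Decision{"provider-a", "medium-model"}
	default:
		return Decision{"provider-b", "large-model"}
	}
}

// featureRoutes is a hypothetical tag-to-model table.
var featureRoutes = map[string]Decision{
	"classify": {"provider-a", "small-model"},
	"codegen":  {"provider-b", "code-model"},
}

// Route applies metadata routing first and falls back to cost-based routing.
func Route(r Request) Decision {
	if d, ok := featureRoutes[r.Feature]; ok {
		return d
	}
	return costRoute(r)
}

func main() {
	fmt.Println(Route(Request{Prompt: "Is this review positive?"}))                           // {provider-a small-model}
	fmt.Println(Route(Request{Prompt: "Write a parser", Feature: "codegen"}))                 // {provider-b code-model}
	fmt.Println(Route(Request{Prompt: "Analyze and compare these two designs step by step"})) // {provider-b large-model}
}
```

Because Route takes and returns plain values, tuning a threshold is a one-line change that never touches transport code.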

Implementing routing as transport-agnostic business logic means you can plug the same Router into different proxies or use it to run simulations and unit tests. The source highlights cost-based routing as particularly valuable because it enables per-request cost attribution and quick cost savings by routing simple tasks to cheaper models.

The gateway as the policy and identity layer

A gateway sits above the router and proxy and adds policy enforcement tied to caller identity. The defining characteristic of a gateway in the source is that it knows who is making each request — the tenant or team — and enforces rules such as authentication, rate limiting, budget controls, and audit logging accordingly.

The provided Go-based design models a gateway as a middleware chain around the proxy. Typical middleware components described include:

Auth middleware: validates an API key, resolves tenant identity, injects the tenant’s provider key into the outbound request, and attaches the tenant object to the request context.
Rate-limit middleware: enforces per-tenant request and token rate limits.
Budget middleware: checks the tenant’s monthly spend against a preset budget and rejects requests when the budget has been exceeded.
Audit middleware: logs each request together with identity and routing decision for later inspection.

Because the gateway maintains state related to who is calling, it enables governance and accountability that a plain proxy cannot provide. A proxy is stateless relative to the caller; a gateway is not.

How commercial products map to proxy, router, and gateway responsibilities

The source maps several vendors to the three-layer model, showing which layers each product covers and whether it provides cost intelligence. The prose below reflects the source’s mapping:

LiteLLM: provides proxy functionality and a router capability that supports 100+ providers; it offers partial gateway features.
Helicone: positions itself as a proxy and supplies some gateway functionality, with basic cost intelligence.
Portkey: offers proxy and router capabilities and a full gateway for enterprise customers; it includes basic cost intelligence.
Langfuse: operates as an async observer rather than a request-path proxy. It does not sit in the live request path, which eliminates proxy latency but also precludes caching, routing, or real-time budget enforcement; it is suitable when post-hoc observability is sufficient.
Preto (Preto.ai): according to the source, implements proxy, router, and gateway together and adds cost intelligence with recommendation features; the source also notes Preto’s free tier for up to 10K requests.

This mapping highlights that vendor descriptions can blur the lines between layers. Some products focus on observation and analytics (Langfuse), while others attempt to cover multiple layers with varying depth of policy and governance features.

When you need each layer: a team-size decision framework

The source offers a practical, team-centered decision framework that ties deployment complexity to organizational needs and spend:

One team, single model, under $2K/month: direct SDK calls are the recommended starting point. Add a proxy only when you have enough traffic to justify centralized logging or caching.
Multiple models and a need for cost visibility: add a proxy plus a router. Switching a single base URL to the proxy gives per-request cost attribution and enables routing of simple tasks to cheaper models. The source reports teams typically see 20–40% cost reductions within the first week of enabling model routing.
Multiple teams with shared access and billing concerns: introduce a gateway. Once teams share an API key and their usage is indistinguishable on invoices, governance and accountability problems arise; a gateway restores per-team visibility and enables enforcement of budgets to prevent untraceable bill spikes.
Compliance requirements (SOC 2, HIPAA, GDPR): the source recommends a gateway with audit logging and PII controls, because a gateway’s identity-aware logging produces the necessary trails for compliance verification.

These recommendations are deliberately conservative: start minimal and introduce layers as governance, cost, or compliance needs emerge.

Developer implementation patterns and testing benefits

The source’s examples emphasize a separation of concerns that is useful for engineering organizations:

Keep the router free of HTTP concerns. Implement routing logic as a pure function (or service) that accepts a structured request representation and returns a routing decision (model + provider). That makes routing logic easier to unit test and evolve independently of transport.
Keep the proxy simple and focused on forwarding and header rewriting. Minimal proxy implementations swap Authorization for a provider key and forward to the chosen upstream host. Adding caching or request/response inspection can come later once traffic patterns are observable.
Implement the gateway as middleware that enriches requests with identity and enforces policies. Middlewares such as Auth, RateLimit, Budget, and Audit form a clear, composable chain around the proxy.

This layered separation makes it possible to iterate on routing policies quickly (for example, adjusting complexity thresholds for cost-based routing) without touching the transport code path. It also enables staged adoption: an organization can add a proxy to observe usage, plug in a router for cost savings, and later fold in gateway policies for team governance.

Cost intelligence and the practical value of routing

The source repeatedly calls out cost intelligence as a high-value outcome of introducing routing. By attributing cost on a per-request basis and routing requests to models that match the task complexity, teams gain two practical benefits:

Immediate cost savings: routing simple classification and extraction tasks to smaller, cheaper models can materially reduce spend. The source indicates a typical 20–40% reduction in the first week after enabling routing.
Visibility for governance: when teams can see who used what model and how much it cost, budget enforcement and chargeback become possible. That visibility is also the prerequisite for automated budget middleware that blocks requests once a team exceeds its monthly allocation.

Cost intelligence does not require a full gateway: a proxy plus a router can surface per-request cost and route accordingly. A gateway is required when teams need per-tenant enforcement and compliance-grade audit logs.

How observability-only approaches differ

Langfuse — described in the source as an async observer — illustrates a deliberate trade-off. Because it does not sit in the live request path, it adds no proxy latency and can record detailed telemetry for post-hoc analysis. The trade-off is explicit: without being in-path, the tool cannot perform caching, routing, or real-time enforcement like budget blocking. For teams that only need retroactive analytics and experimentation data, an observer-only approach is appropriate; for teams that need real-time control and cost enforcement, an in-path proxy or gateway is necessary.

Implications for developers, security, and compliance

The three-layer model has several practical implications that follow directly from the source’s logic:

Engineering ergonomics: Developers can retain existing SDK usage by switching a single base URL to a proxy, lowering friction for centralized control. Decoupling routing logic from transport makes policies easier to iterate and test.
Security and governance: Only the gateway — the identity-aware layer — can reliably enforce team-level rate limits, budgets, and audit trails. When multiple teams share access without identity-aware controls, billing surprises and governance holes become likely.
Compliance: For SOC 2, HIPAA, or GDPR needs, the auditability provided by an identity-aware gateway and explicit PII controls is necessary to demonstrate controls and chain of custody for requests.

These implications underline why organizations should treat proxy, router, and gateway as complementary elements of a security and governance strategy rather than as interchangeable labels.

How products differ in scope and trade-offs

The source’s product mapping reveals common trade-offs vendors make:

Full-stack coverage (proxy + router + gateway + cost intelligence) simplifies adoption, but it brings a broader product surface and more operational complexity.
Proxy-only or observer-only offerings minimize latency and integration friction but require customers to implement routing or enforcement themselves if they need those capabilities.
Partial gateway features (for example, basic audit or rate limits) may suffice for small teams but can fall short for enterprise governance or strict compliance requirements.

Understanding these trade-offs helps teams select a product or build internal tooling aligned to their current needs and growth path.

Preto.ai, as described in the source, positions itself as a unified option that implements the three layers (proxy, router, gateway) together and adds cost intelligence and recommendations behind a single URL change, with a free tier up to 10K requests. That single-URL pattern mirrors the developer ergonomics advantage described earlier and makes it straightforward to adopt centralized control without rewriting SDK usage.

Looking ahead, the layered model — transport (proxy), decision (router), and policy (gateway) — provides a clear engineering roadmap for organizations as their AI usage matures. Teams can start with direct SDK calls, add a proxy to centralize observation, introduce routing for cost savings, and deploy a gateway when multi-team governance or compliance becomes mandatory. Vendors that expose these layers distinctly, or that allow teams to adopt them incrementally, will fit varied adoption curves; observer-only analytics will remain useful for experimentation, while identity-aware gateways will be the default for regulated or multi-tenant environments.

As model choice, cost, and compliance pressures continue to shape operational architectures, expect the distinctions between proxy, router, and gateway to matter more than vendor branding. The practical pattern in the source — lightweight proxy for transport, pure-business-logic router for decisions, and an identity-aware gateway for enforcement — is a clear, implementable blueprint for teams that need control, accountability, and cost visibility when operating multiple models and serving multiple teams.