Why your AI tool should never have a Gemini key in the browser — a server-side proxy walkthrough

May 4, 2026

A few people read my post about how GeminiOmni's first version got its API key scraped in seventeen minutes and asked for a full walkthrough of the proxy architecture. This is that post.

If you're building anything that calls a paid AI API — Gemini, OpenAI, Anthropic, Replicate, anything with a metered key — the pattern below is what stands between you and a Tuesday-morning surprise from your billing alerts. It's not complicated. It is load-bearing.

I'll walk through the architecture, the four concrete things you have to get right, and the edge cases I've hit running this in production for a month.

The core pattern, drawn out

The shape is the same regardless of which AI provider you're using:

┌─────────────────┐       ┌──────────────────────┐       ┌──────────────┐
│                 │       │                      │       │              │
│  Browser        │       │  Your API route      │       │  Provider    │
│  (untrusted)    │  ───→ │  (server, has key)   │  ───→ │  (Gemini /   │
│                 │       │                      │       │   OpenAI...) │
│  - prompt       │       │  - auth check        │       │              │
│  - upload       │       │  - rate limit        │       └──────────────┘
│                 │       │  - quota check       │
│                 │       │  - attach API key    │
│                 │       │  - call provider     │
│                 │       │  - return result     │
│                 │ ←───  │                      │
└─────────────────┘       └──────────────────────┘

The point is that the browser sees a prompt go in and a result come out. It has no idea what model was called, what the API key is, what the upstream provider looks like, or how much the call cost. This is the right amount of information for the browser to have.
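To make that concrete, here's roughly what the browser side reduces to. A minimal sketch, assuming the /api/ai/generate route and the trimmed response shape shown later in this post:

// Browser-side call: the client knows the route and the prompt, nothing else
const res = await fetch('/api/ai/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'a red fox in the snow' }),
});
const { imageUrl } = await res.json(); // the only field the UI needs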

What "right" looks like, concretely

Four things have to be true for this architecture to actually protect you. Miss any one of them and the protection collapses.

1. The API key lives only on the server

In Next.js terms: the env var must NOT start with NEXT_PUBLIC_. In runtime terms: only code that runs in a route handler (app/api/.../route.ts), a server component, or a server action ever touches the key.

// app/api/ai/generate/route.ts — server only
const GEMINI_KEY = process.env.GEMINI_API_KEY; // ✅ server-only
if (!GEMINI_KEY) throw new Error('GEMINI_API_KEY not set');

A startup check that crashes the app if GEMINI_API_KEY is missing is worth its weight in gold. The worst failure mode is the app silently running without a key and falling back to some default that empties your savings.
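One way to do that, sketched below with a hypothetical file name: a module-level check that throws at boot rather than inside the route handler, so a missing key fails the deploy instead of a user's first request.

// lib/env.ts (hypothetical file): import this from your root layout or an
// instrumentation hook so a missing key crashes at startup
export const GEMINI_API_KEY =
  process.env.GEMINI_API_KEY ??
  (() => {
    throw new Error('GEMINI_API_KEY not set; refusing to start');
  })();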

2. The browser never sees raw provider responses

When the provider call returns, you cherry-pick what to send back. The browser does NOT need to know:

  • Which exact model was called (Veo 3.1 Fast vs Veo 3.1 Standard)
  • The provider's request ID (useful for support, useless for clients)
  • Any rate-limit headers from the provider
  • The token count or cost of the call

Why this matters: information disclosure compounds. If your browser can see "model: gemini-2.5-flash, tokens: 4096", a scraper can map your front-end to your provider's pricing page and reverse-engineer your unit economics. That's not catastrophic but it's not great. Strip it.

// Return only what the UI needs
return Response.json({
  imageUrl: result.generated_images[0].url,
  watermarked: !user.isPro,
  // Don't return: model name, tokens used, raw provider response
});

3. Auth, rate limiting, and quota checks happen BEFORE the provider call

The proxy isn't just a passthrough. It's the place where you enforce the rules:

// Inside the API route, in order:
1. Authenticate the request (session cookie, JWT, whatever).
   Anonymous requests are allowed for free-tier features but tracked by IP.

2. Rate-limit by user ID (or IP for anonymous).
   GeminiOmni: 10 req/min anonymous, 60 req/min Pro, 200 req/min Team.

3. Check the user's quota for the specific operation.
   Free users have 5 video generations per month — check the database BEFORE
   the expensive call, not after.

4. Only now — call the provider.

5. Increment usage counters on success.

The order matters. If you call the provider first and then check the quota, you've already paid for the inference whether the user was entitled to it or not. Every quota check happens up front.
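Put together, the handler skeleton looks something like this. A sketch only: getSession, rateLimit, getQuota, callProvider, and incrementUsage are stand-ins for whatever auth, rate-limiting, and database layers you already have, not a specific library's API.

export async function POST(req: Request) {
  // 1. Authenticate; anonymous users fall back to IP-keyed tracking
  const session = await getSession(req);
  const actorId = session?.userId ?? req.headers.get('x-forwarded-for') ?? 'anonymous';

  // 2. Rate-limit by user ID (or IP)
  if (!(await rateLimit(actorId))) {
    return Response.json({ error: 'Too many requests' }, { status: 429 });
  }

  // 3. Quota check BEFORE the expensive call
  const quota = await getQuota(actorId, 'video-generation');
  if (quota.used >= quota.limit) {
    return Response.json({ error: 'Monthly limit reached' }, { status: 402 });
  }

  // 4. Only now: call the provider
  const result = await callProvider(await req.json());

  // 5. Increment usage counters on success
  await incrementUsage(actorId, 'video-generation');
  return Response.json(result);
}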

4. The proxy has its own observability

Log every provider call from the proxy with: timestamp, user ID, route, model, input size, output size, latency, success/failure, and the rough dollar cost. This is your billing receipt and your debugging surface.

// Hoist the values so the cost estimate uses the same numbers that get logged
const model = 'gemini-2.5-flash-image';
const inputBytes = prompt.length + imageBytes;
const outputBytes = result.length;

await logUsage({
  userId,
  route: '/api/ai/generate',
  model,
  inputBytes,
  outputBytes,
  latencyMs: Date.now() - start,
  costUsd: estimateCost(model, inputBytes, outputBytes),
  status: 'success',
});

When your Tuesday-morning billing alert fires because spend doubled overnight, this table is what you SQL into to find out which user, which route, which model. Without it, you're guessing.

Edge cases I've hit running this in production

A month of running the proxy in production has surfaced a few non-obvious things.

Streaming responses need streaming proxies. Gemini's streamGenerateContent endpoint returns a stream of partial generations. If you naively call response.json() in your proxy, you'll buffer the entire stream and lose the latency benefit. The fix is to pipe the upstream stream through your proxy as a ReadableStream, which means your API route returns new Response(stream) instead of Response.json(...). Worth doing — perceived latency on video generation drops from "wait 90 seconds" to "see something happening immediately."
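The piping version looks something like this. A sketch: the route path is illustrative, and it assumes the SSE flavor of the endpoint (alt=sse).

// app/api/ai/stream/route.ts (illustrative path)
export async function POST(req: Request) {
  const { prompt } = await req.json();

  const upstream = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': process.env.GEMINI_API_KEY!,
      },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  );

  if (!upstream.ok || !upstream.body) {
    return Response.json({ error: 'Generation failed' }, { status: 502 });
  }

  // Pass the provider's body through untouched: no buffering, no response.json()
  return new Response(upstream.body, {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}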

Large file uploads need direct-to-storage uploads. If a user uploads a 50MB video for image-to-video generation, you don't want that 50MB transiting through your Next.js API route. The pattern is: client uploads directly to Cloud Storage / R2 via a presigned URL, then sends just the URL to your proxy. The proxy passes the URL to Gemini's "media URI" parameter, and the bytes never touch your server. This drops your egress bill by ~95% on the upload-heavy paths.
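Here's a sketch of the presign endpoint, assuming an R2/S3-compatible bucket; the env var names and route path are illustrative.

// app/api/uploads/presign/route.ts (illustrative path)
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({
  region: 'auto',
  endpoint: process.env.R2_ENDPOINT, // e.g. the R2 account endpoint
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID!,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
  },
});

export async function POST(req: Request) {
  const { contentType } = await req.json();
  const key = `uploads/${crypto.randomUUID()}`;

  // The client PUTs the 50MB file straight to storage; only this URL touches us
  const uploadUrl = await getSignedUrl(
    s3,
    new PutObjectCommand({ Bucket: process.env.R2_BUCKET!, Key: key, ContentType: contentType }),
    { expiresIn: 600 }, // valid for 10 minutes
  );

  return Response.json({ uploadUrl, key });
}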

Errors from upstream need translation. Gemini returns errors like "PROMPT_BLOCKED" or "QUOTA_EXCEEDED" or "INVALID_ARGUMENT". The browser does not need to see these strings — they're confusing and they leak details about the provider. Translate each to a user-facing message: "Your prompt was blocked by safety filters", "You've used your free generations for this month", "That image looks corrupted." Three-line lookup table.
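Something like this; the user-facing strings are the ones above, and the fallback message is an assumption.

// Translate upstream error codes to user-facing messages
const UPSTREAM_ERRORS: Record<string, string> = {
  PROMPT_BLOCKED: 'Your prompt was blocked by safety filters.',
  QUOTA_EXCEEDED: "You've used your free generations for this month.",
  INVALID_ARGUMENT: 'That image looks corrupted. Try a different file.',
};

function toUserMessage(code: string): string {
  return UPSTREAM_ERRORS[code] ?? 'Something went wrong. Please try again.';
}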

Cost estimation is per-call, not per-token. I tried to estimate the dollar cost of each call by counting tokens. Then I noticed that Veo billing is by SECONDS of output video, Imagen is by IMAGE, and Nano Banana 2 has a four-tier price based on resolution. The cost function for the proxy is a switch statement, not a multiplication. There's no clean way around this — every provider bills differently and you have to encode each.
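A sketch of what that switch can look like. The model names are the ones from this post, but the rates are placeholders, not real prices; the Nano Banana resolution tiers would be another case with their own lookup.

type CostInput =
  | { model: 'veo-3.1-fast'; outputSeconds: number }
  | { model: 'imagen'; imageCount: number }
  | { model: 'gemini-2.5-flash'; inputTokens: number; outputTokens: number };

function estimateCostUsd(call: CostInput): number {
  switch (call.model) {
    case 'veo-3.1-fast': // billed per second of output video (placeholder rate)
      return call.outputSeconds * 0.15;
    case 'imagen': // billed per generated image (placeholder rate)
      return call.imageCount * 0.04;
    case 'gemini-2.5-flash': // billed per token, in and out (placeholder rates)
      return (call.inputTokens / 1e6) * 0.3 + (call.outputTokens / 1e6) * 2.5;
  }
}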

What this looks like in the GeminiOmni codebase

Every model call in /tools/text-to-video, /tools/image-to-video, /tools/nano-banana-edit, /tools/pdf-chat, and the live chat at /chat goes through app/api/ai/generate/route.ts and app/api/ai/query/route.ts. Two endpoints. About 400 lines of code together. The four checks above are all enforced before any provider call leaves my server.

I named the model in the user-facing UI but not in the API response. The badge under each generated clip says "Veo 3.1 Fast" — that comes from a server-side lookup at render time, not from the API response. This is the kind of thing that sounds silly until you realize that any string the API returns is one screenshot away from being a target.
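For what it's worth, that lookup is nothing more than a server-side map consulted at render time; the tool names and badge strings below are illustrative.

// Badge text is resolved on the server, never echoed by the API
const MODEL_BADGES: Record<string, string> = {
  'text-to-video': 'Veo 3.1 Fast',
  'nano-banana-edit': 'Nano Banana 2',
};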

The honest summary

The proxy pattern is not exciting. There are no clever algorithms in it. It's just the bare minimum required to run a paid AI API on the public internet without losing money to people who didn't pay.

But it's the bare minimum. Every indie builder I know who launched without it learned the lesson the same way: a billing alert at an awkward hour, a panic-revoke of the key, and a long evening rewriting the data flow. Cheaper to do it on day one.

— Lena

Lena Hoffmann
