WebLLM Self-Hosting Guide

Deploy your own WebLLM gateway for full control over data and infrastructure

Run your own gateway server for complete data sovereignty, custom rate limiting, and integration with your existing infrastructure. Choose the deployment option that best fits your needs.

Deployment Options

Cloudflare Workers (Recommended)

Serverless, globally distributed, automatic scaling. Best for production.

Node.js Server

Traditional server deployment. Good for VPS, EC2, or on-premise.

Docker

Containerized deployment. Ideal for Kubernetes or Docker Compose.

Option 1: Cloudflare Workers

Deploy to Cloudflare's edge network for low-latency, globally distributed inference.

Step 1: Clone and Install

git clone https://github.com/webllm-org/webllm
cd webllm/packages/gateway
npm install

Step 2: Configure wrangler.toml

name = "webllm-gateway"
main = "src/worker/index.ts"
compatibility_date = "2024-01-01"

[[kv_namespaces]]
binding = "TOKEN_USAGE"
id = "your-kv-namespace-id"

[vars]
ENVIRONMENT = "production"
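
For orientation, the bindings declared above surface on the env argument of the Worker's fetch handler; the id for the TOKEN_USAGE namespace comes from running `wrangler kv namespace create TOKEN_USAGE`. Below is a minimal sketch only (types come from @cloudflare/workers-types; the real entry point in src/worker/index.ts does much more):

// Sketch only — shows how wrangler.toml bindings and secrets appear in code.
export interface Env {
  TOKEN_USAGE: KVNamespace       // [[kv_namespaces]] binding
  ENVIRONMENT: string            // [vars]
  GATEWAY_SECRET_KEY: string     // set via `wrangler secret put` (next step)
  OPENAI_API_KEY?: string
  ANTHROPIC_API_KEY?: string
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // KV and secrets are injected at runtime — nothing is hard-coded.
    const usage = await env.TOKEN_USAGE.get('token:example')
    return Response.json({ environment: env.ENVIRONMENT, usage })
  },
}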

Step 3: Set Secrets

# Gateway secret key for signing tokens
wrangler secret put GATEWAY_SECRET_KEY

# Provider API keys
wrangler secret put OPENAI_API_KEY
wrangler secret put ANTHROPIC_API_KEY
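
The gateway secret should be long and random; the configuration table below calls for 64+ characters. One way to generate one, as a throwaway Node/TypeScript script:

// generate-secret.ts — prints a 64-character hex key (32 random bytes)
import { randomBytes } from 'node:crypto'
console.log(randomBytes(32).toString('hex'))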

Step 4: Deploy

npm run deploy

# Your gateway is live at:
# https://webllm-gateway.your-account.workers.dev
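
Once deployed, it's worth a quick smoke test. Assuming your build exposes the same /api/v1/health route as the Node.js server below (adjust the URL to your own Workers subdomain; run with any runtime that supports top-level await):

// check-health.ts — smoke-test the deployed gateway (route path assumed)
const res = await fetch('https://webllm-gateway.your-account.workers.dev/api/v1/health')
console.log(res.status, await res.json())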

Option 2: Node.js Server

Run the gateway as a traditional Node.js server; the example below uses Hono.

Step 1: Install Dependencies

npm install @webllm/server @webllm/gateway-tokens hono @hono/node-server

Step 2: Create Server

// server.ts
import { serve } from '@hono/node-server'
import { Hono } from 'hono'
import { cors } from 'hono/cors'
import { LLMServer } from '@webllm/server'

const app = new Hono()
const llmServer = new LLMServer()

// NOTE: cors() with no options allows any origin.
// Restrict this in production (see the Security Checklist below).
app.use('*', cors())

// Health check
app.get('/api/v1/health', (c) => {
  return c.json({ status: 'healthy', version: '1.0.0' })
})

// Inference endpoint
app.post('/api/v1/inference', async (c) => {
  const body = await c.req.json()
  const result = await llmServer.chat(body)
  return c.json(result)
})

// Read the port from the environment (set in .env below) instead of
// hard-coding it; on Node 20.6+ you can load .env with `node --env-file=.env`.
const port = Number(process.env.PORT ?? 3000)
serve({ fetch: app.fetch, port })
console.log(`Gateway running on http://localhost:${port}`)
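
As written, the inference route is unauthenticated. The sketch below is one way to gate it; note this is a generic HMAC-signed bearer-token check built on node:crypto, not the actual @webllm/gateway-tokens API, and the token format is purely illustrative. Add it to server.ts before the route definitions:

import { createHmac, timingSafeEqual } from 'node:crypto'
import type { MiddlewareHandler } from 'hono'

// Hypothetical token format: "<payload>.<hex signature>",
// signed with GATEWAY_SECRET_KEY.
const requireToken: MiddlewareHandler = async (c, next) => {
  const auth = c.req.header('Authorization') ?? ''
  const token = auth.replace(/^Bearer /, '')
  const [payload, signature] = token.split('.')
  if (!payload || !signature) return c.json({ error: 'unauthorized' }, 401)

  const expected = createHmac('sha256', process.env.GATEWAY_SECRET_KEY!)
    .update(payload)
    .digest('hex')
  const a = Buffer.from(signature)
  const b = Buffer.from(expected)
  // Length check first: timingSafeEqual throws on unequal lengths.
  if (a.length !== b.length || !timingSafeEqual(a, b)) {
    return c.json({ error: 'unauthorized' }, 401)
  }
  await next()
}

app.use('/api/v1/inference', requireToken)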

Step 3: Configure Environment

# .env
GATEWAY_SECRET_KEY=sk-webllm-gateway-...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
PORT=3000

Step 4: Run with PM2

# Install PM2 for production
npm install -g pm2

# Start server (running a .ts entry point directly requires TypeScript
# support such as ts-node; alternatively, compile first and start the
# built dist/server.js)
pm2 start server.ts --name webllm-gateway

# View logs
pm2 logs webllm-gateway
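
For repeatable restarts and clustering, PM2 can also be driven from an ecosystem file. A minimal example is shown below (file name and app name are just conventions; start it with pm2 start ecosystem.config.js):

// ecosystem.config.js — minimal PM2 process file for the gateway
module.exports = {
  apps: [
    {
      name: 'webllm-gateway',
      script: 'dist/server.js',  // start the compiled build
      instances: 2,              // small cluster; size to your CPUs
      exec_mode: 'cluster',
      env: {
        NODE_ENV: 'production',
        PORT: 3000,
      },
    },
  ],
}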

Option 3: Docker

Containerized deployment for Kubernetes, Docker Compose, or any container orchestrator.

Dockerfile

# Build stage — dev dependencies are needed to compile TypeScript
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage — ship production dependencies and compiled output only
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist

EXPOSE 3000
CMD ["node", "dist/server.js"]
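
Since COPY . . pulls in the whole build context, add a .dockerignore alongside the Dockerfile so local artifacts and secrets stay out of the image:

node_modules
dist
.env
.git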

docker-compose.yml

version: '3.8'

services:
  gateway:
    build: .
    ports:
      - "3000:3000"
    environment:
      - GATEWAY_SECRET_KEY=${GATEWAY_SECRET_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      # Points the gateway at the redis service below (variable name is
      # illustrative — check your gateway build's configuration)
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    restart: unless-stopped

volumes:
  redis-data:

Deploy

# Build and run
docker-compose up -d

# View logs
docker-compose logs -f gateway

Configuration

Environment Variable    Required   Description
GATEWAY_SECRET_KEY      Yes        Secret key for signing tokens (64+ chars)
OPENAI_API_KEY          Optional   OpenAI API key for GPT models
ANTHROPIC_API_KEY       Optional   Anthropic API key for Claude models
RATE_LIMIT_PER_MINUTE   Optional   Max requests per minute per token (default: 60)
PORT                    Optional   Server port (default: 3000)

Security Checklist

Use HTTPS

Always deploy behind HTTPS. Use Cloudflare, nginx, or your load balancer's SSL termination.

Rotate Secret Keys

Generate new gateway secret keys periodically. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault).

Enable Rate Limiting

Set appropriate rate limits to prevent abuse. Consider both per-token and global limits (a minimal example follows this checklist).

Monitor and Log

Enable request logging and set up monitoring alerts for unusual traffic patterns.

Restrict CORS Origins

Configure CORS to only allow requests from your domains; don't use a wildcard (*) in production (a minimal example follows this checklist).
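
To make the last two items concrete, here is a minimal Hono sketch. The allowed origin is a placeholder, and the in-memory counter is per-process only — back it with Redis or KV in a real deployment:

import { Hono } from 'hono'
import { cors } from 'hono/cors'

const app = new Hono()

// Restrict CORS to known origins instead of the wildcard default.
app.use('*', cors({ origin: ['https://app.example.com'] }))

// Naive fixed-window rate limiter keyed by bearer token (or client IP).
// Resets on restart and does not share state across instances.
const WINDOW_MS = 60_000
const LIMIT = Number(process.env.RATE_LIMIT_PER_MINUTE ?? 60)
const windows = new Map<string, { start: number; count: number }>()

app.use('/api/v1/inference', async (c, next) => {
  const key =
    c.req.header('Authorization') ??
    c.req.header('x-forwarded-for') ??
    'anonymous'
  const now = Date.now()
  const w = windows.get(key)
  if (!w || now - w.start >= WINDOW_MS) {
    windows.set(key, { start: now, count: 1 })
  } else if (++w.count > LIMIT) {
    return c.json({ error: 'rate limit exceeded' }, 429)
  }
  await next()
})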

Self-Hosting Benefits

Data Sovereignty

All requests and responses stay within your infrastructure. No data sent to third-party gateway services.

Custom Providers

Configure any combination of providers with your own API keys. Add custom providers or private model endpoints.

Custom Auth

Integrate with your existing authentication system. Use SSO, LDAP, or custom token validation.

Cost Control

No per-request gateway fees. Only pay for your infrastructure and the LLM API calls you make.