Self-Hosting Guide
Deploy your own WebLLM gateway for full control over data and infrastructure
Run your own gateway server for complete data sovereignty, custom rate limiting, and integration with your existing infrastructure. Choose the deployment option that best fits your needs.
Deployment Options
- Cloudflare Workers: Serverless, globally distributed, automatic scaling. Best for production.
- Node.js Server: Traditional server deployment. Good for VPS, EC2, or on-premise.
- Docker: Containerized deployment. Ideal for Kubernetes or Docker Compose.
Option 1: Cloudflare Workers
Deploy to Cloudflare's edge network for low-latency, globally distributed inference.
Clone and Install
```bash
git clone https://github.com/webllm-org/webllm
cd webllm/packages/gateway
npm install
```
Configure wrangler.toml
```toml
name = "webllm-gateway"
main = "src/worker/index.ts"
compatibility_date = "2024-01-01"

[[kv_namespaces]]
binding = "TOKEN_USAGE"
id = "your-kv-namespace-id"

[vars]
ENVIRONMENT = "production"
```
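The TOKEN_USAGE binding needs a real KV namespace id. You can create one with Wrangler and paste the printed id into wrangler.toml:

```bash
# Wrangler v3.60+ (older versions use "kv:namespace create")
wrangler kv namespace create TOKEN_USAGE
```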
Set Secrets
```bash
# Gateway secret key for signing tokens
wrangler secret put GATEWAY_SECRET_KEY

# Provider API keys
wrangler secret put OPENAI_API_KEY
wrangler secret put ANTHROPIC_API_KEY
```
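The gateway secret should be long and random; the configuration table below calls for 64+ characters. One way to generate one:

```bash
# 32 random bytes, hex-encoded: a 64-character secret
openssl rand -hex 32
```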
Deploy
```bash
npm run deploy

# Your gateway is live at:
# https://webllm-gateway.your-account.workers.dev
```
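Assuming the Workers build exposes the same health route as the Node.js server shown below, you can smoke-test the deployment with curl:

```bash
curl https://webllm-gateway.your-account.workers.dev/api/v1/health
# {"status":"healthy","version":"1.0.0"}
```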
Option 2: Node.js Server
Run the gateway as a traditional Node.js server. The example below uses Hono with the @hono/node-server adapter.
Install Dependencies
```bash
npm install @webllm/server @webllm/gateway-tokens hono @hono/node-server
```
Create Server
```typescript
// server.ts
import { serve } from '@hono/node-server'
import { Hono } from 'hono'
import { cors } from 'hono/cors'
import { LLMServer } from '@webllm/server'
const app = new Hono()
const llmServer = new LLMServer()
app.use('*', cors())
// Health check
app.get('/api/v1/health', (c) => {
return c.json({ status: 'healthy', version: '1.0.0' })
})
// Inference endpoint
app.post('/api/v1/inference', async (c) => {
const body = await c.req.json()
const result = await llmServer.chat(body)
return c.json(result)
})
serve({ fetch: app.fetch, port: 3000 })
console.log('Gateway running on http://localhost:3000')
```

Configure Environment

```bash
# .env
GATEWAY_SECRET_KEY=sk-webllm-gateway-...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
PORT=3000
```
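Note that nothing in server.ts reads this file on its own. Node 20.6+ can load it natively with the --env-file flag, or you can add dotenv as a dependency and import 'dotenv/config' at the top of the server:

```bash
# Node 20.6+: load environment variables without extra dependencies
node --env-file=.env dist/server.js
```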
Run with PM2
```bash
# Install PM2 for production
npm install -g pm2

# Start server (PM2 runs .ts entry points only if ts-node is installed;
# otherwise build first and start dist/server.js)
pm2 start server.ts --name webllm-gateway

# View logs
pm2 logs webllm-gateway
```
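The install step pulls in @webllm/gateway-tokens, but the server above accepts any request. A minimal sketch of bearer-token validation, assuming the package exports a verifyToken(token, secret) function (the name is illustrative; check the package's actual API):

```typescript
// auth.ts -- hypothetical sketch; verifyToken is an assumed export
import type { MiddlewareHandler } from 'hono'
import { verifyToken } from '@webllm/gateway-tokens'

export const requireToken: MiddlewareHandler = async (c, next) => {
  const header = c.req.header('Authorization') ?? ''
  const token = header.replace(/^Bearer /, '')
  const valid = await verifyToken(token, process.env.GATEWAY_SECRET_KEY!)
  if (!valid) {
    return c.json({ error: 'invalid or missing token' }, 401)
  }
  await next()
}

// In server.ts:
// app.post('/api/v1/inference', requireToken, async (c) => { ... })
```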
Option 3: Docker
Containerized deployment for Kubernetes, Docker Compose, or any container orchestrator.
Dockerfile
```dockerfile
FROM node:20-alpine
WORKDIR /app

# Install all dependencies (the build step below needs devDependencies)
COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build

# Drop devDependencies from the final image
RUN npm prune --omit=dev

EXPOSE 3000
CMD ["node", "dist/server.js"]
```
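To test the image outside of Compose:

```bash
docker build -t webllm-gateway .
docker run --rm -p 3000:3000 --env-file .env webllm-gateway
```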
docker-compose.yml
```yaml
version: '3.8'

services:
  gateway:
    build: .
    ports:
      - "3000:3000"
    environment:
      - GATEWAY_SECRET_KEY=${GATEWAY_SECRET_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    restart: unless-stopped

volumes:
  redis-data:
```

Deploy
```bash
# Build and run
docker-compose up -d

# View logs
docker-compose logs -f gateway
```
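Once the stack is up, you can hit the inference endpoint directly. The request body below assumes an OpenAI-style chat payload; adjust it to whatever schema your LLMServer.chat() expects:

```bash
curl -X POST http://localhost:3000/api/v1/inference \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'
```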
Configuration
| Environment Variable | Required | Description |
|---|---|---|
| GATEWAY_SECRET_KEY | Yes | Secret key for signing tokens (64+ chars) |
| OPENAI_API_KEY | Optional | OpenAI API key for GPT models |
| ANTHROPIC_API_KEY | Optional | Anthropic API key for Claude models |
| RATE_LIMIT_PER_MINUTE | Optional | Max requests per minute per token (default: 60) |
| PORT | Optional | Server port (default: 3000) |
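It's worth failing fast when required configuration is missing. A small startup check you might add to the server (the 64-character minimum matches the table above):

```typescript
// config-check.ts: validate required environment variables at startup
function requireEnv(name: string, minLength = 1): string {
  const value = process.env[name]
  if (!value || value.length < minLength) {
    throw new Error(`${name} is missing or too short (need ${minLength}+ chars)`)
  }
  return value
}

// GATEWAY_SECRET_KEY must be set and at least 64 characters
export const GATEWAY_SECRET_KEY = requireEnv('GATEWAY_SECRET_KEY', 64)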
Security Checklist
Use HTTPS
Always deploy behind HTTPS. Use Cloudflare, nginx, or your load balancer's SSL termination.
Rotate Secret Keys
Generate new gateway secret keys periodically. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault).
Enable Rate Limiting
Set appropriate rate limits to prevent abuse. Consider both per-token and global limits.
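If you're not relying on the gateway's built-in limit (RATE_LIMIT_PER_MINUTE above), a per-token window is straightforward to sketch in Hono. This in-memory version only works for a single process; use Redis (as in the Compose file) for multi-instance deployments:

```typescript
import type { MiddlewareHandler } from 'hono'

const WINDOW_MS = 60_000
const MAX_REQUESTS = 60
const hits = new Map<string, number[]>()

export const rateLimit: MiddlewareHandler = async (c, next) => {
  // Key by token if present, else by forwarded client IP
  const key = c.req.header('Authorization') ?? c.req.header('x-forwarded-for') ?? 'anonymous'
  const now = Date.now()
  const recent = (hits.get(key) ?? []).filter((t) => now - t < WINDOW_MS)
  if (recent.length >= MAX_REQUESTS) {
    return c.json({ error: 'rate limit exceeded' }, 429)
  }
  recent.push(now)
  hits.set(key, recent)
  await next()
}
```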
Monitor and Log
Enable request logging and set up monitoring alerts for unusual traffic patterns.
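Hono ships a basic request logger you can enable in one line; pipe stdout into your log aggregator from there:

```typescript
import { logger } from 'hono/logger'

// Logs method, path, status, and response time for every request
app.use('*', logger())
```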
Restrict CORS Origins
Configure CORS to only allow requests from your domains. Don't use wildcard (*) in production.
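With Hono's cors middleware this means passing an explicit origin list instead of the default wildcard used in the server example above:

```typescript
import { cors } from 'hono/cors'

app.use('*', cors({
  origin: ['https://app.example.com'],  // your domains only
  allowMethods: ['GET', 'POST'],
}))
```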
Self-Hosting Benefits
Data Sovereignty
All requests and responses stay within your infrastructure. No data sent to third-party gateway services.
Custom Providers
Configure any combination of providers with your own API keys. Add custom providers or private model endpoints.
Custom Auth
Integrate with your existing authentication system. Use SSO, LDAP, or custom token validation.
Cost Control
No per-request gateway fees. Only pay for your infrastructure and the LLM API calls you make.