AI Integration Approaches: A Framework for Developers

There are 5 ways to add AI to your app. Here is how to choose.

WebLLM Team

You've decided to add AI to your web app. Now what?

There are multiple approaches, each with different tradeoffs. This guide helps you choose.

The 5 Approaches

1. Direct API Integration

Call AI providers (OpenAI, Anthropic, etc.) directly from your backend.

// Your backend (assumes an Express app with express.json() middleware)
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY });

app.post('/api/chat', async (req, res) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: req.body.messages
  });
  res.json(response);
});

Best for: Quick prototypes, single-provider apps

2. AI Gateway Services

Use managed services like Vercel AI SDK, Cloudflare AI Gateway, or AWS Bedrock.

// Using Vercel AI SDK
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Hello'
});

Best for: Production apps needing reliability, caching, analytics

3. Backend Proxy

Build your own abstraction layer over multiple providers.

// Your custom proxy (schematic; see Approach 3 for a full implementation)
class AIProxy {
  async generate(prompt) {
    try {
      return await this.openai.generate(prompt);
    } catch {
      return await this.anthropic.generate(prompt); // Fallback
    }
  }
}

Best for: Custom routing logic, multi-provider fallback, cost optimization

4. Browser-Native (User-Powered)

Users bring their own AI via browser extension or native browser support.

// Frontend - no backend needed
if ('llm' in navigator) {
  const response = await navigator.llm.prompt('Hello');
}

Best for: Privacy-focused apps, cost-sensitive projects, user control

5. Local Models

Run models on-device using Ollama, LM Studio, or in-browser via WebGPU.

// Connect to local Ollama
const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  body: JSON.stringify({ model: 'llama3', prompt: 'Hello' })
});

Best for: Offline capability, full privacy, air-gapped environments

Comparison Matrix

Factor               | Direct API         | AI Gateway     | Backend Proxy | Browser-Native | Local Models
Setup complexity     | Low                | Medium         | High          | Low            | Medium
Cost to you          | High               | Medium         | High          | Zero           | Zero
Privacy              | Low                | Low            | Medium        | High           | Highest
User control         | None               | None           | None          | Full           | Full
Provider flexibility | Low                | Medium         | High          | High           | Medium
Offline support      | No                 | No             | No            | Possible       | Yes
Reliability          | Provider-dependent | High (managed) | You manage    | User-dependent | You manage

Decision Tree

Do you need offline support?
├── Yes → Local Models (5)
└── No
    │
    Do users need to control their AI provider?
    ├── Yes → Browser-Native (4)
    └── No
        │
        Do you need multi-provider fallback?
        ├── Yes
        │   │
        │   Build vs Buy?
        │   ├── Build → Backend Proxy (3)
        │   └── Buy → AI Gateway (2)
        └── No → Direct API (1)

Approach 1: Direct API Integration

When to Use

  • Rapid prototyping
  • Single provider is acceptable
  • Simple use case
  • Small scale

Implementation

// Backend (Node.js/Express, with express.json() body parsing)
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

app.post('/api/chat', async (req, res) => {
  const message = await anthropic.messages.create({
    model: 'claude-3-sonnet-20240229',
    max_tokens: 1024,
    messages: [{ role: 'user', content: req.body.prompt }]
  });

  res.json({ response: message.content[0].text });
});
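
The frontend side of this approach is just a fetch to your own route; a minimal sketch, assuming the /api/chat endpoint defined above:

// Frontend
const res = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Summarize this page for me.' })
});

const { response } = await res.json();
console.log(response);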

Pros

  • Simplest to implement
  • Direct access to provider features
  • Good documentation

Cons

  • Locked to one provider
  • No fallback
  • You manage everything

Approach 2: AI Gateway Services

When to Use

  • Production applications
  • Need caching, rate limiting, analytics
  • Want managed reliability
  • Multiple models/providers

Implementation

// Using Vercel AI SDK
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

// Easy provider switching
const model = useOpenAI ? openai('gpt-4') : anthropic('claude-3-sonnet-20240229');

const result = await streamText({
  model,
  prompt: input
});

// Streaming built-in
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
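
The same gateway idea also works without an SDK: proxy-style services such as Cloudflare AI Gateway are typically used by keeping your existing provider client and pointing its base URL at the gateway. A minimal sketch (the account ID and gateway name are placeholders you'd replace with your own):

// Route OpenAI calls through a gateway for caching, rate limiting, and analytics
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_NAME/openai' // placeholders
});

const completion = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello' }]
});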

Pros

  • Managed infrastructure
  • Built-in streaming, caching
  • Provider abstraction
  • Good for production

Cons

  • Additional service dependency
  • May have usage limits
  • Less control than custom proxy

Approach 3: Backend Proxy

When to Use

  • Custom routing logic needed
  • Cost optimization across providers
  • Specific compliance requirements
  • Full control required

Implementation

// Custom AI proxy with fallback
class AIService {
  constructor() {
    this.providers = [
      { name: 'groq', client: new Groq(), priority: 1 },
      { name: 'openai', client: new OpenAI(), priority: 2 },
      { name: 'anthropic', client: new Anthropic(), priority: 3 }
    ];
  }

  async generate(prompt, options = {}) {
    const sorted = [...this.providers].sort((a, b) => a.priority - b.priority);

    for (const provider of sorted) {
      try {
        return await this.callProvider(provider, prompt, options);
      } catch (error) {
        console.warn(`${provider.name} failed (${error.message}), trying next...`);
      }
    }

    throw new Error('All providers failed');
  }

  async callProvider(provider, prompt, options) {
    // Provider-specific implementation
    switch (provider.name) {
      case 'groq':
        return this.callGroq(provider.client, prompt, options);
      case 'openai':
        return this.callOpenAI(provider.client, prompt, options);
      case 'anthropic':
        return this.callAnthropic(provider.client, prompt, options);
    }
  }
}
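
The priority field above is static. For the cost-optimization case mentioned under "When to Use", one option is to derive the ordering from price instead; a sketch with placeholder per-million-token figures (not real quotes):

// Order providers by an illustrative cost figure instead of a fixed priority
const providersByCost = [
  { name: 'groq', costPerMTok: 0.10 },
  { name: 'openai', costPerMTok: 5.00 },
  { name: 'anthropic', costPerMTok: 3.00 }
].sort((a, b) => a.costPerMTok - b.costPerMTok);

// Feed this ordering into AIService's fallback loop in place of `priority`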

Pros

  • Full control
  • Custom fallback logic
  • Cost optimization
  • Compliance flexibility

Cons

  • Significant engineering effort
  • You maintain everything
  • Still server-side costs

Approach 4: Browser-Native (User-Powered)

When to Use

  • Privacy is important
  • Don't want to pay AI costs
  • Users likely have AI subscriptions
  • Simple AI features (enhancement, not core)

Implementation

// Frontend - no backend AI needed
class BrowserAI {
  constructor() {
    this.available = 'llm' in navigator;
  }

  async generate(prompt, fallback = null) {
    if (this.available) {
      return await navigator.llm.prompt(prompt);
    }

    if (fallback) {
      return await fallback(prompt);
    }

    throw new Error('AI not available');
  }

  async stream(prompt) {
    if (!this.available) {
      throw new Error('AI not available');
    }

    return navigator.llm.streamPrompt(prompt);
  }
}

// Usage
const ai = new BrowserAI();

if (ai.available) {
  const response = await ai.generate('Improve this text: ' + text);
}

Pros

  • Zero cost to you
  • User controls privacy
  • Simpler architecture
  • No API key management

Cons

  • Users must have AI configured
  • Less control over model
  • Graceful degradation needed (see the sketch below)
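
A minimal sketch of that graceful degradation, assuming a hypothetical #ai-improve button and the same navigator.llm check used throughout this article:

// Hide (or reroute) AI features when no browser AI is available
const improveButton = document.querySelector('#ai-improve'); // hypothetical element

if (!('llm' in navigator)) {
  // Option 1: hide the feature entirely
  improveButton.hidden = true;

  // Option 2: keep the button but send clicks to a backend fallback instead
  // improveButton.addEventListener('click', () => callBackendAI()); // callBackendAI is hypothetical
}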

Approach 5: Local Models

When to Use

  • Offline capability required
  • Maximum privacy needed
  • Air-gapped environments
  • Specific model requirements

Implementation

// Using Ollama
class LocalAI {
  constructor(baseUrl = 'http://localhost:11434') {
    this.baseUrl = baseUrl;
  }

  async generate(prompt, model = 'llama3') {
    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt,
        stream: false
      })
    });

    const data = await response.json();
    return data.response;
  }

  async *stream(prompt, model = 'llama3') {
    const response = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt,
        stream: true
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Ollama streams newline-delimited JSON; buffer partial lines across chunks
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep any trailing partial line for the next read

      for (const line of lines) {
        if (!line.trim()) continue;
        const data = JSON.parse(line);
        if (data.response) yield data.response;
      }
    }
  }
}
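
A usage sketch for the class above, assuming Ollama is running locally with the llama3 model pulled:

const local = new LocalAI();

// One-shot generation
const answer = await local.generate('Explain WebGPU in one sentence.');
console.log(answer);

// Token-by-token streaming
for await (const token of local.stream('Explain WebGPU in one sentence.')) {
  process.stdout.write(token);
}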

Pros

  • Works offline
  • Full privacy
  • No API costs
  • No rate limits

Cons

  • User needs to set up Ollama
  • Hardware-dependent performance
  • Limited model selection
  • No cloud fallback

Hybrid Approaches

Real applications often combine approaches:

Browser-First with Backend Fallback

async function getAIResponse(prompt) {
  // Try browser AI first (free for you)
  if ('llm' in navigator) {
    try {
      return await navigator.llm.prompt(prompt);
    } catch {
      // Fall through to backend
    }
  }

  // Fall back to your backend
  const res = await fetch('/api/ai', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });
  const data = await res.json();
  return data.response; // match the string returned by the browser path
}

Local-First with Cloud Fallback

async function generate(prompt) {
  // Try local Ollama first
  if (await isOllamaRunning()) {
    return await localAI.generate(prompt);
  }

  // Fall back to cloud
  return await cloudAI.generate(prompt);
}
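
Here localAI is a LocalAI instance from Approach 5 and cloudAI is whichever cloud client you use. The isOllamaRunning() check isn't defined above; a minimal sketch, assuming Ollama's default port and its /api/tags endpoint:

// Treat Ollama as available if its HTTP API answers within a short timeout
async function isOllamaRunning(baseUrl = 'http://localhost:11434') {
  try {
    const res = await fetch(`${baseUrl}/api/tags`, {
      signal: AbortSignal.timeout(1000) // don't hang when Ollama isn't running
    });
    return res.ok;
  } catch {
    return false;
  }
}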

Recommendations by Use Case

Use Case              | Recommended Approach
Prototype             | Direct API
Production SaaS       | AI Gateway
Privacy-focused app   | Browser-Native + Local
Cost-sensitive        | Browser-Native
Enterprise/compliance | Backend Proxy
Offline-first         | Local Models
Open source project   | Browser-Native

Conclusion

There's no single "best" approach. The right choice depends on:

  1. Privacy requirements → Browser-Native or Local
  2. Cost constraints → Browser-Native
  3. Control needs → Backend Proxy
  4. Simplicity needs → Direct API or AI Gateway
  5. Offline needs → Local Models

Most production apps benefit from hybrid approaches—browser-native for enhancement features, backend for core functionality.

