Measured by raw parallel throughput, your graphics card is the most powerful processor in your computer, and until recently, web browsers couldn't really use it. WebGL gave us 3D graphics, but the GPU's real potential—massive parallel computation—remained locked away.
WebGPU changes that. It's not just a better WebGL. It's a fundamental shift in what web applications can do.
This matters because the same GPU that renders video games can also run machine learning inference, physics simulations, video encoding, and cryptocurrency mining. When browsers gain full GPU access, web apps gain capabilities that previously required native code.
Let's understand what WebGPU is, why it exists, and what it enables.
The GPU: A Different Kind of Processor
Before diving into WebGPU, we need to understand what makes GPUs special.
CPU vs. GPU: Different Tools for Different Jobs
CPU (Central Processing Unit):
- Few powerful cores (4-16 typical)
- Optimized for sequential tasks
- Great at complex, branching logic
- Handles diverse workloads
GPU (Graphics Processing Unit):
- Many smaller cores (hundreds to thousands)
- Optimized for parallel tasks
- Great at doing the same thing to lots of data
- Handles specific workloads extremely fast
The Parallelism Advantage
Consider calculating brightness for every pixel in a 1920x1080 image:
CPU approach (sequential):
for (let i = 0; i < 2_073_600; i++) {
  // calculateBrightness is a placeholder for any per-pixel function
  pixels[i] = calculateBrightness(pixels[i]);
}
// ~2 million iterations, one at a time
GPU approach (parallel):
// All ~2 million pixels calculated simultaneously
// Each GPU core handles different pixels
The GPU might be 10-100x faster for this workload, not because each core is faster, but because thousands of cores work simultaneously.
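To make the parallel version concrete, here's a rough sketch of that per-pixel kernel as a WebGPU compute shader (WGSL, covered later in this post; the 20% brighten factor and flat buffer layout are illustrative assumptions):

// Hypothetical WGSL kernel: one invocation per pixel value
@group(0) @binding(0) var<storage, read_write> pixels: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3u) {
  // Each of the ~2 million invocations handles one element
  pixels[id.x] = clamp(pixels[id.x] * 1.2, 0.0, 1.0);
}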
What GPUs Excel At
- Graphics rendering: The original use case—transforming 3D geometry into 2D images
- Image/video processing: Filters, transformations, encoding/decoding
- Machine learning inference: Matrix multiplications, neural network layers
- Physics simulation: Particle systems, fluid dynamics
- Scientific computing: Financial modeling, cryptography, simulations
All of these involve doing similar operations on large amounts of data—perfect for parallelization.
The History: From WebGL to WebGPU
WebGL (2011): Graphics for the Web
WebGL brought hardware-accelerated 3D graphics to browsers. It was based on OpenGL ES 2.0, a graphics API designed in 2007 for mobile devices.
// WebGL: Verbose, state-machine based
const gl = canvas.getContext('webgl');
gl.clearColor(0.0, 0.0, 0.0, 1.0);
gl.enable(gl.DEPTH_TEST);
gl.useProgram(shaderProgram);
gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer);
gl.drawArrays(gl.TRIANGLES, 0, vertexCount);
WebGL achievements:
- 3D games in the browser (HexGL, QuakeJS)
- Data visualization (Three.js, deck.gl)
- Creative tools (Figma's rendering)
- Maps (Google Maps 3D, Mapbox)
WebGL limitations:
- State machine API: Error-prone, hard to optimize
- Old design paradigm: Based on 2007-era graphics concepts
- Single-threaded: Can't prepare work on multiple threads
- Limited compute: No general-purpose GPU computing
- Driver overhead: High CPU cost for draw calls
By the mid-2010s, native graphics APIs had evolved dramatically: Metal (2014), DirectX 12 (2015), and Vulkan (2016). WebGL, meanwhile, was stuck in the OpenGL ES 2.0 era.
The Problem With Modernizing WebGL
Browsers couldn't just "update" WebGL because:
- Different native APIs: Windows uses DirectX, macOS uses Metal, Linux uses Vulkan
- Breaking changes: Modern paradigms are fundamentally different from WebGL's model
- Security requirements: Browsers need extra safety layers native apps don't
- Backward compatibility: Millions of WebGL sites must keep working
A new API was needed.
WebGPU: The Modern Solution
In 2017, Apple proposed an API it called WebGPU, while Google had been developing a similar effort called NXT (which became Dawn, Chromium's implementation). These efforts merged into a single specification developed through the W3C's GPU for the Web group.
Design goals:
- Abstract over Vulkan, Metal, and DirectX 12
- Enable general-purpose GPU computing
- Reduce CPU overhead (more draw calls, less state management)
- Support multi-threading (prepare work off main thread)
- Maintain browser security model
In 2023, Chrome shipped WebGPU. Firefox and Safari implementations are in progress.
WebGPU Architecture: The Key Concepts
Adapter and Device
// 1. Request adapter (physical GPU)
const adapter = await navigator.gpu.requestAdapter();
// 2. Request device (logical connection)
const device = await adapter.requestDevice();
Adapter: Represents the physical GPU. You can query capabilities and limits.
Device: Your application's connection to the GPU. All work goes through this.
This separation matters because:
- Multiple tabs/apps share the same physical GPU
- Each gets isolated logical access
- Crashes in one app don't affect others
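The adapter is also where you check what the hardware can do before asking for a device. A quick sketch (the specific feature and limits queried here are just examples):

// Optional features and hardware limits vary by GPU
console.log([...adapter.features]); // e.g. ['timestamp-query', ...]
console.log(adapter.limits.maxBufferSize);
console.log(adapter.limits.maxComputeWorkgroupSizeX);

// Opt into a feature only if the adapter supports it
const device = await adapter.requestDevice({
  requiredFeatures: adapter.features.has('timestamp-query')
    ? ['timestamp-query']
    : [],
});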
Buffers: GPU Memory
// Create buffer for vertex data
const vertexBuffer = device.createBuffer({
  size: vertices.byteLength,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

// Copy data to GPU
device.queue.writeBuffer(vertexBuffer, 0, vertices);
Buffers are explicitly allocated and typed. You declare upfront how they'll be used—vertex data, uniform data, storage, etc. This explicitness enables GPU driver optimizations.
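For example, per-frame parameters typically live in a small uniform buffer that you overwrite each frame rather than reallocate. A sketch (the particular parameters are placeholders):

// A 16-byte uniform buffer, e.g. one vec4f of per-frame parameters
const uniformBuffer = device.createBuffer({
  size: 16,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

// Cheap to update in place every frame
device.queue.writeBuffer(uniformBuffer, 0, new Float32Array([time, width, height, 0]));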
Shaders: GPU Programs
// WGSL shader (WebGPU Shading Language)
const shaderModule = device.createShaderModule({
  code: `
    @vertex
    fn vertexMain(@location(0) position: vec3f) -> @builtin(position) vec4f {
      return vec4f(position, 1.0);
    }

    @fragment
    fn fragmentMain() -> @location(0) vec4f {
      return vec4f(1.0, 0.0, 0.0, 1.0); // Red
    }
  `,
});
WGSL (WebGPU Shading Language) is new. Unlike GLSL (WebGL's shader language), WGSL was designed for the modern GPU model with safety and portability in mind.
Pipelines: Execution Configuration
const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: {
    module: shaderModule,
    entryPoint: 'vertexMain',
    buffers: [
      /* vertex buffer layout */
    ],
  },
  fragment: {
    module: shaderModule,
    entryPoint: 'fragmentMain',
    targets: [{ format: canvasFormat }],
  },
  primitive: {
    topology: 'triangle-list',
  },
});
Pipelines bundle all the state needed for a draw call:
- Which shaders to use
- How to interpret vertex data
- Blending modes, depth testing, etc.
Creating pipelines is expensive, but using them is cheap. You create pipelines once, reuse many times.
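A common pattern is to build every pipeline up front, or cache them by configuration key. A sketch, assuming a makePipeline(device, key) helper you'd write yourself:

// Cache pipelines by a configuration key
const pipelineCache = new Map();

function getPipeline(device, key) {
  let pipeline = pipelineCache.get(key);
  if (!pipeline) {
    pipeline = makePipeline(device, key); // expensive: compile and validate
    pipelineCache.set(key, pipeline);
  }
  return pipeline; // cheap: reuse on every subsequent draw
}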
Command Encoding: Recording Work
// Create command encoder
const encoder = device.createCommandEncoder();

// Begin render pass
const pass = encoder.beginRenderPass({
  colorAttachments: [
    {
      view: context.getCurrentTexture().createView(),
      clearValue: { r: 0, g: 0, b: 0, a: 1 },
      loadOp: 'clear',
      storeOp: 'store',
    },
  ],
});

// Record drawing commands
pass.setPipeline(pipeline);
pass.setVertexBuffer(0, vertexBuffer);
pass.draw(3); // Draw 3 vertices

// End pass
pass.end();

// Submit to GPU
device.queue.submit([encoder.finish()]);
Key insight: You're not drawing immediately. You're recording commands that the GPU will execute later. This enables:
- Batching work efficiently
- Recording on background threads
- Optimizing command sequences
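In a real application this becomes a per-frame loop: the expensive objects (pipelines, buffers) persist, while command encoders are cheap and re-recorded every frame. A minimal sketch reusing the pipeline and vertex buffer from above:

function frame() {
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginRenderPass({
    colorAttachments: [
      {
        // The canvas hands you a fresh texture each frame
        view: context.getCurrentTexture().createView(),
        clearValue: { r: 0, g: 0, b: 0, a: 1 },
        loadOp: 'clear',
        storeOp: 'store',
      },
    ],
  });
  pass.setPipeline(pipeline);
  pass.setVertexBuffer(0, vertexBuffer);
  pass.draw(3);
  pass.end();
  device.queue.submit([encoder.finish()]);
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);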
The Game Changer: Compute Shaders
Here's what truly sets WebGPU apart from WebGL: compute shaders.
What Are Compute Shaders?
Graphics shaders (vertex, fragment) are designed for rendering pipelines. Compute shaders are general-purpose—they just crunch data.
// WGSL compute shader: double every number
const computeShader = device.createShaderModule({
  code: `
    @group(0) @binding(0) var<storage, read_write> data: array<f32>;

    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) id: vec3u) {
      data[id.x] = data[id.x] * 2.0;
    }
  `,
});
This shader runs across thousands of GPU cores simultaneously. Each core processes a different array element.
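The @workgroup_size attribute and the dispatch count together determine how many invocations run. The arithmetic, as a quick worked example:

// @workgroup_size(64) + dispatchWorkgroups(3)
// → 3 workgroups × 64 invocations = 192 invocations total
// → id.x ranges from 0 to 191
// → invocation id.x = 70 is local index 6 of workgroup 1 (64 + 6)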
Creating a Compute Pipeline
const computePipeline = device.createComputePipeline({
  layout: 'auto',
  compute: {
    module: computeShader,
    entryPoint: 'main',
  },
});

// Create buffer and upload data
const dataBuffer = device.createBuffer({
  size: data.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(dataBuffer, 0, data);

// Bind buffer to shader
const bindGroup = device.createBindGroup({
  layout: computePipeline.getBindGroupLayout(0),
  entries: [
    {
      binding: 0,
      resource: { buffer: dataBuffer },
    },
  ],
});

// Dispatch compute work
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(computePipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(data.length / 64));
pass.end();
device.queue.submit([encoder.finish()]);
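One subtlety: when the element count isn't a multiple of the workgroup size, the final workgroup includes invocations past the end of the data. The usual guard is a bounds check against the array's runtime length, sketched here:

// Guard against out-of-range invocations in the shader
@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3u) {
  // arrayLength() returns the runtime size of the bound buffer
  if (id.x >= arrayLength(&data)) {
    return;
  }
  data[id.x] = data[id.x] * 2.0;
}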
Why Compute Shaders Enable ML
Machine learning inference is fundamentally about matrix math:
output = activation(weights * input + bias)
This involves:
- Matrix multiplication (massive parallelism opportunity)
- Element-wise operations (add, multiply)
- Activation functions (apply same function to each element)
All perfect for GPU parallelization. A neural network layer that takes 100ms on CPU might take 1ms on GPU.
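To make this concrete, here's a minimal sketch of a naive matrix multiply as a compute shader: one invocation per output element. Production ML runtimes use heavily tiled and vectorized variants, and the vec4u dims layout here is an assumption for the sketch:

// Naive matmul: a is M×K, b is K×N, out is M×N
const matmulShader = device.createShaderModule({
  code: `
    @group(0) @binding(0) var<storage, read> a: array<f32>;
    @group(0) @binding(1) var<storage, read> b: array<f32>;
    @group(0) @binding(2) var<storage, read_write> out: array<f32>;
    @group(0) @binding(3) var<uniform> dims: vec4u; // (M, N, K, unused)

    @compute @workgroup_size(8, 8)
    fn main(@builtin(global_invocation_id) id: vec3u) {
      let row = id.y;
      let col = id.x;
      if (row >= dims.x || col >= dims.y) {
        return;
      }
      var sum = 0.0;
      for (var k = 0u; k < dims.z; k++) {
        sum += a[row * dims.z + k] * b[k * dims.y + col];
      }
      out[row * dims.y + col] = sum;
    }
  `,
});
// Dispatched as dispatchWorkgroups(ceil(N / 8), ceil(M / 8))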
Before WebGPU: Browser ML used WebGL hacks—encoding matrices as textures, using fragment shaders for computation. It worked, but was awkward and limited.
With WebGPU: Compute shaders directly express ML operations. Libraries like ONNX Runtime Web and Transformers.js can target WebGPU naturally.
What WebGPU Enables
1. Browser-Based ML Inference
// Conceptual: run inference with a WebGPU backend
// (Transformers.js v3, published as @huggingface/transformers,
// exposes this as the `device` option; exact API may change)
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline('sentiment-analysis', null, {
  device: 'webgpu', // use GPU acceleration
});

const result = await classifier('WebGPU is amazing!');
// [{ label: 'POSITIVE', score: 0.9998 }]
Real ML models running at near-native speed in a browser tab:
- Image classification
- Object detection
- Text generation (LLMs!)
- Speech recognition
- Translation
2. Advanced Graphics
// Clustered forward rendering, GPU-driven culling, etc.
// Techniques that were too expensive in WebGL
Modern rendering techniques become practical:
- Deferred rendering
- GPU particle systems
- Volumetric effects
- Real-time global illumination
3. Scientific Simulation
// Fluid simulation with compute shaders
// Each cell updated in parallel
Physics simulations that were previously server-side or native-only:
- Fluid dynamics
- N-body gravity
- Molecular dynamics
- Weather modeling (simplified)
4. Video Processing
// Real-time video effects
// Background removal, style transfer, upscaling
Effects applied per-frame in real-time:
- Background blur (like Zoom/Meet)
- Style transfer
- Super-resolution upscaling
- Real-time color grading
5. Cryptography
// Parallel hash computation
// (Note: Responsible use required)
Cryptographic operations benefit from parallelism:
- Proof-of-work (yes, browser mining exists)
- Bulk encryption/decryption
- Hash verification
WebGPU vs. WebGL: The Real Differences
| Aspect | WebGL | WebGPU |
|---|---|---|
| API Model | State machine | Object-oriented |
| Compute Shaders | No (hacks only) | Yes, first-class |
| Multi-threading | Limited | Full support |
| Draw Call Overhead | High | Low |
| Shader Language | GLSL | WGSL |
| Error Handling | Silent failures | Explicit errors |
| Pipeline State | Global, mutable | Pre-baked objects |
| Browser Support | Universal | Chromium-based (others coming) |
Code Comparison: Drawing a Triangle
WebGL (abridged; a production version with error handling runs considerably longer):
// Initialize WebGL context
const gl = canvas.getContext('webgl');

// Compile vertex shader
const vertexShader = gl.createShader(gl.VERTEX_SHADER);
gl.shaderSource(
  vertexShader,
  `
    attribute vec4 position;
    void main() { gl_Position = position; }
  `
);
gl.compileShader(vertexShader);

// Compile fragment shader
const fragmentShader = gl.createShader(gl.FRAGMENT_SHADER);
gl.shaderSource(
  fragmentShader,
  `
    precision mediump float;
    void main() { gl_FragColor = vec4(1, 0, 0, 1); }
  `
);
gl.compileShader(fragmentShader);

// Link program
const program = gl.createProgram();
gl.attachShader(program, vertexShader);
gl.attachShader(program, fragmentShader);
gl.linkProgram(program);

// Set up buffer
const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(
  gl.ARRAY_BUFFER,
  new Float32Array([0, 0.5, -0.5, -0.5, 0.5, -0.5]),
  gl.STATIC_DRAW
);

// Draw
gl.useProgram(program);
const positionLocation = gl.getAttribLocation(program, 'position');
gl.enableVertexAttribArray(positionLocation);
gl.vertexAttribPointer(positionLocation, 2, gl.FLOAT, false, 0, 0);
gl.drawArrays(gl.TRIANGLES, 0, 3);
WebGPU (still verbose, but more explicit):
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const context = canvas.getContext('webgpu');
const format = navigator.gpu.getPreferredCanvasFormat();
context.configure({ device, format });

const shaderModule = device.createShaderModule({
  code: `
    @vertex fn vs(@builtin(vertex_index) i: u32) -> @builtin(position) vec4f {
      var pos = array<vec2f, 3>(
        vec2f(0, 0.5), vec2f(-0.5, -0.5), vec2f(0.5, -0.5)
      );
      return vec4f(pos[i], 0, 1);
    }
    @fragment fn fs() -> @location(0) vec4f {
      return vec4f(1, 0, 0, 1);
    }
  `,
});

const pipeline = device.createRenderPipeline({
  layout: 'auto',
  vertex: { module: shaderModule, entryPoint: 'vs' },
  fragment: { module: shaderModule, entryPoint: 'fs', targets: [{ format }] },
});

const encoder = device.createCommandEncoder();
const pass = encoder.beginRenderPass({
  colorAttachments: [
    {
      view: context.getCurrentTexture().createView(),
      loadOp: 'clear',
      storeOp: 'store',
    },
  ],
});
pass.setPipeline(pipeline);
pass.draw(3);
pass.end();
device.queue.submit([encoder.finish()]);
WebGPU isn't necessarily shorter, but it's more explicit about what's happening—and that explicitness enables optimizations.
Current Browser Support
As of late 2024:
| Browser | Status |
|---|---|
| Chrome | Shipped (113+) |
| Edge | Shipped (113+) |
| Opera | Shipped |
| Firefox | In development (Nightly) |
| Safari | In development (Technology Preview) |
Practical advice: feature-detect WebGPU and fall back to WebGL for broader compatibility. Note that the presence of navigator.gpu doesn't guarantee a usable GPU; requestAdapter() can still return null.

if ('gpu' in navigator) {
  const adapter = await navigator.gpu.requestAdapter();
  if (adapter) {
    // Use WebGPU
  } else {
    // navigator.gpu exists, but no suitable adapter: fall back to WebGL
  }
} else if ('WebGLRenderingContext' in window) {
  // Fall back to WebGL
} else {
  // No GPU support
}
Performance: Real Numbers
Benchmark comparisons (representative; your mileage may vary):
ML Inference (BERT-base)
- CPU (JavaScript): ~800ms per inference
- WebGL backend: ~150ms
- WebGPU backend: ~30ms
Particle Simulation (1M particles)
- CPU: 15 FPS
- WebGL: 45 FPS
- WebGPU: 60 FPS (compute shader)
Image Processing (4K image, blur)
- CPU: 500ms
- WebGL: 80ms
- WebGPU: 20ms
The gains are most dramatic for compute-heavy workloads that parallelize well.
The ML Connection: Why This Matters for AI in the Browser
WebGPU is the foundation that makes browser-based AI practical:
- LLM inference: Running language models requires matrix operations that benefit from GPU parallelism
- Image AI: Vision models (classification, segmentation, generation) need GPU compute
- Real-time AI: Video effects, pose detection, background removal—all need low-latency GPU access
- Private AI: Running models locally (no server round-trip) requires efficient GPU use
Without WebGPU, browser AI is limited to:
- Small models only
- High latency
- Battery drain (CPU is less efficient)
- Janky user experience
With WebGPU:
- Larger models become practical
- Near-real-time inference
- Efficient power usage
- Smooth 60fps integration
This is why WebGPU is a prerequisite for browser-native AI APIs. You can't have navigator.llm without the GPU infrastructure to make it fast enough to be useful.
Getting Started with WebGPU
Learn the Fundamentals
1. Raw WebGPU: Start with the basics
2. Use a Library: For practical projects
   - Three.js: Now has a WebGPU renderer
   - Babylon.js: WebGPU support
   - wgpu-matrix: Math utilities
Simple Compute Example
Here's a minimal compute shader that adds two arrays:
// Full working example: Add two arrays on GPU
async function gpuAdd(a, b) {
  const adapter = await navigator.gpu.requestAdapter();
  const device = await adapter.requestDevice();

  const shader = device.createShaderModule({
    code: `
      @group(0) @binding(0) var<storage, read> a: array<f32>;
      @group(0) @binding(1) var<storage, read> b: array<f32>;
      @group(0) @binding(2) var<storage, read_write> result: array<f32>;

      @compute @workgroup_size(64)
      fn main(@builtin(global_invocation_id) id: vec3u) {
        result[id.x] = a[id.x] + b[id.x];
      }
    `,
  });

  const pipeline = device.createComputePipeline({
    layout: 'auto',
    compute: { module: shader, entryPoint: 'main' },
  });

  // Create buffers
  const size = a.byteLength;
  const bufferA = device.createBuffer({
    size,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });
  const bufferB = device.createBuffer({
    size,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });
  const bufferResult = device.createBuffer({
    size,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });
  const bufferRead = device.createBuffer({
    size,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  device.queue.writeBuffer(bufferA, 0, a);
  device.queue.writeBuffer(bufferB, 0, b);

  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: bufferA } },
      { binding: 1, resource: { buffer: bufferB } },
      { binding: 2, resource: { buffer: bufferResult } },
    ],
  });

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(a.length / 64));
  pass.end();
  encoder.copyBufferToBuffer(bufferResult, 0, bufferRead, 0, size);
  device.queue.submit([encoder.finish()]);

  await bufferRead.mapAsync(GPUMapMode.READ);
  const result = new Float32Array(bufferRead.getMappedRange().slice(0));
  bufferRead.unmap();
  return result;
}
// Usage
const a = new Float32Array([1, 2, 3, 4]);
const b = new Float32Array([5, 6, 7, 8]);
const result = await gpuAdd(a, b);
console.log(result); // Float32Array [6, 8, 10, 12]
Yes, it's verbose for adding numbers. But when you're processing millions of values, the parallelism makes it worthwhile.
Conclusion
WebGPU represents a fundamental expansion of web platform capabilities. By exposing modern GPU features—especially compute shaders—it enables application categories that were previously native-only:
- Real-time machine learning inference
- Advanced graphics and visualization
- Scientific simulation
- High-performance media processing
For AI in the browser specifically, WebGPU is foundational. The same infrastructure that enables WebGPU-accelerated ML inference will eventually enable native browser AI APIs.
When you hear about running LLMs in the browser or AI-powered web apps, WebGPU is what makes it possible. It's not just a graphics API—it's the compute layer that the next generation of web applications will be built on.
