WebGPU: The Death of WebGL? A High-Performance Compute Guide
Master WebGPU and Compute Shaders. Learn how WGSL replaces GLSL, how to manage buffers, and how to perform massive parallel computations in the browser without WebGL hacks.

For over a decade, WebGL has been the king of 3D on the web. It brought us games, data visualizations, and creative coding art. But WebGL was always a hack. It’s based on OpenGL ES 2.0 (technology from 2007), and it was designed strictly for drawing triangles.
If you wanted to do general computation—like physics simulations or machine learning—you had to "trick" WebGL. You had to encode your data into fake textures, render a fake quad, and read the pixels back. It was painful. It was slow.
Enter WebGPU.
WebGPU is not just "WebGL 3.0". It is a complete rewrite of how the browser talks to the GPU. It is based on modern native APIs like Vulkan (Android/Linux), Metal (Apple), and DirectX 12 (Windows).
But the killer feature isn't better graphics. It's Compute Shaders.
In this guide, I’m going to show you how to unlock the raw TFLOPS of your user’s graphics card to run massive parallel simulations directly in the browser.
Part 1: The Modern Graphics Pipeline (Why WebGL Failed)
In WebGL, everything is about the "Render Pipeline."
1. Vertex Shader (Where do points go?)
2. Fragment Shader (What color are the pixels?)
This is great for rendering Mario. It's terrible for simulating a million particles.
The Compute Pipeline (WebGPU)
WebGPU adds a "Compute Pipeline."
1. Compute Shader: A program that runs on thousands of threads simultaneously.
2. Storage Buffers: Shared memory that all threads can read and write to.
There are no triangles. No pixels. Just raw data in, raw data out. This is the same compute model that powers CUDA and modern machine learning frameworks.
Part 2: WGSL (The New Language)
WebGL used GLSL (C-like). WebGPU uses WGSL (WebGPU Shading Language), which looks more like Rust.
It’s strictly typed, safer, and designed to map perfectly to Metal and Vulkan.
GLSL (Old):
```glsl
attribute vec4 position;

void main() {
  gl_Position = position;
}
```
WGSL (New):
```wgsl
struct Particle {
  pos: vec2<f32>,
  vel: vec2<f32>,
}

@group(0) @binding(0) var<storage, read_write> particles: array<Particle>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) GlobalInvocationID: vec3<u32>) {
  let index = GlobalInvocationID.x;

  // Update particle physics
  particles[index].pos += particles[index].vel;
}
```
Notice the struct. Notice the read_write. We are manipulating memory directly.
Part 3: Building "The Simulation"
We are going to build a Game of Life simulation with 1,000,000 cells running at 60 FPS.
Step 1: Device Initialization in TypeScript
First, we need to request the adapter (the physical GPU) and the device (the logical interface).
```typescript
if (!navigator.gpu) {
  throw new Error("WebGPU not supported.");
}

const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error("No suitable GPU adapter found.");
}

const device = await adapter.requestDevice();
```
Step 2: Creating the Buffers
We need two buffers:
1. State Buffer A: Current state of the grid.
2. State Buffer B: Next state of the grid.
We flip-flop between them (Ping-Pong buffering).
```typescript
const gridSize = 1000 * 1000;
const bufferSize = gridSize * 4; // 4 bytes (one u32) per cell: 0 or 1

// COPY_DST lets us upload the initial state; COPY_SRC lets us copy results out.
const bufferA = device.createBuffer({
  size: bufferSize,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
});
const bufferB = device.createBuffer({
  size: bufferSize,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
});
```
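Before the first frame, the grid needs a starting state. A minimal sketch, assuming you upload with `device.queue.writeBuffer` (the buffer must include `GPUBufferUsage.COPY_DST` in its usage flags); the helper name and the 30% alive density are illustrative choices, not part of the WebGPU API:

```typescript
// Hypothetical helper: pack an initial random Game of Life grid into a
// Uint32Array (one u32 per cell, 0 or 1) ready for upload.
function createInitialState(cellCount: number, aliveFraction = 0.3): Uint32Array {
  const state = new Uint32Array(cellCount);
  for (let i = 0; i < cellCount; i++) {
    state[i] = Math.random() < aliveFraction ? 1 : 0;
  }
  return state;
}

// Upload into the "current state" buffer:
// device.queue.writeBuffer(bufferA, 0, createInitialState(gridSize));
```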
Step 3: The Compute Shader (WGSL)
This is where the logic lives. Each thread calculates the neighbor count for one cell.
```wgsl
// shader.wgsl
@group(0) @binding(0) var<storage, read> inputInfo: array<u32>;
@group(0) @binding(1) var<storage, read_write> outputInfo: array<u32>;
@group(0) @binding(2) var<uniform> gridSize: vec2<u32>;

fn getIndex(x: u32, y: u32) -> u32 {
  return y * gridSize.x + x;
}

// Count the 8 neighbors. This version wraps around the grid edges
// (a toroidal world) — one possible boundary choice.
fn countNeighbors(x: u32, y: u32) -> u32 {
  var count: u32 = 0u;
  for (var dy: i32 = -1; dy <= 1; dy++) {
    for (var dx: i32 = -1; dx <= 1; dx++) {
      if (dx == 0 && dy == 0) { continue; }
      let nx = (x + u32(i32(gridSize.x) + dx)) % gridSize.x;
      let ny = (y + u32(i32(gridSize.y) + dy)) % gridSize.y;
      count += inputInfo[getIndex(nx, ny)];
    }
  }
  return count;
}

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let x = id.x;
  let y = id.y;

  // Guard against threads dispatched beyond the grid edge.
  if (x >= gridSize.x || y >= gridSize.y) { return; }

  let index = getIndex(x, y);
  let neighbors = countNeighbors(x, y);

  // Apply Conway's rules
  if (inputInfo[index] == 1 && (neighbors < 2 || neighbors > 3)) {
    outputInfo[index] = 0; // Die
  } else if (inputInfo[index] == 0 && neighbors == 3) {
    outputInfo[index] = 1; // Born
  } else {
    outputInfo[index] = inputInfo[index]; // Stay
  }
}
```
Step 4: The Render Loop (Dispatch)
Now we execute the shader.
```typescript
const commandEncoder = device.createCommandEncoder();

// 1. Create Compute Pass
const passEncoder = commandEncoder.beginComputePass();
passEncoder.setPipeline(computePipeline);
passEncoder.setBindGroup(0, bindGroup);
passEncoder.dispatchWorkgroups(Math.ceil(GRID_WIDTH / 8), Math.ceil(GRID_HEIGHT / 8));
passEncoder.end();

// 2. Submit to GPU Queue
device.queue.submit([commandEncoder.finish()]);
```
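The `Math.ceil` matters: `dispatchWorkgroups` takes a count of *workgroups*, not threads, so you round up to cover every cell when the grid isn't a multiple of the workgroup size. A tiny helper (the name is mine, not part of the API) makes the intent explicit:

```typescript
// Number of workgroups needed to cover `size` items when each workgroup
// handles `groupSize` items along that axis.
function workgroupCount(size: number, groupSize: number): number {
  return Math.ceil(size / groupSize);
}

// With @workgroup_size(8, 8) and a 1000×1000 grid:
// passEncoder.dispatchWorkgroups(workgroupCount(1000, 8), workgroupCount(1000, 8));
```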
Part 4: Synchronization and Race Conditions
One of the hardest parts of WebGPU is that the GPU runs asynchronously.
If you try to read bufferB immediately after dispatching, you will get old data (or garbage).
Fortunately, WebGPU handles most of this for you: commands execute in submission order, and the API automatically inserts the necessary barriers between passes that touch the same buffers — there are no user-visible fences or barriers to manage. What you cannot do is read a storage buffer from JavaScript directly; to get data back to the CPU, you copy it into a mappable staging buffer and `await mapAsync()` (or `device.queue.onSubmittedWorkDone()`), which resolves only once the GPU has finished.
In our case, the "Ping Pong" technique avoids race conditions naturally because we read from A and write to B. Then next frame, we read from B and write to A.
Part 5: Benchmarking: WebGPU vs JavaScript
I ran a benchmark on an M2 MacBook Air. The task: update 1,000,000 particles with gravity and collision.
| Method | FPS | CPU Load |
|---|---|---|
| Vanilla JS | 4 FPS | 100% (Single core) |
| Web Workers | 12 FPS | 100% (Multi core) |
| WebGL (Hack) | 45 FPS | 15% CPU |
| WebGPU | 120 FPS | 2% CPU |
Why 2% CPU? Because the CPU does almost nothing. It just builds the command buffer ("Hey GPU, do this") and sends it. The GPU does all the heavy lifting in parallel hardware.
Part 6: Integrating with React & Next.js
Using WebGPU in React is tricky because the device is a heavy object you don't want to re-create on every render.
The Context Pattern:
I recommend creating a WebGPUProvider.
```tsx
import React, { createContext, useEffect, useState } from 'react';

const WebGPUContext = createContext<GPUDevice | null>(null);

export const WebGPUProvider = ({ children }: { children: React.ReactNode }) => {
  const [device, setDevice] = useState<GPUDevice | null>(null);

  useEffect(() => {
    (async () => {
      if (!navigator.gpu) return; // WebGPU unsupported: stay in loading state
      const adapter = await navigator.gpu.requestAdapter();
      if (!adapter) return;
      const dev = await adapter.requestDevice();
      setDevice(dev);
    })();
  }, []);

  if (!device) return <LoadingSpinner />;

  return (
    <WebGPUContext.Provider value={device}>
      {children}
    </WebGPUContext.Provider>
  );
};
```
Part 7: The Limitations of WebGPU in 2026
It’s not all perfect.
1. Platform Support: While Chrome, Edge, and Firefox support it, older Android devices and some older iOS versions are still catching up.
2. Complexity: As you can see, the boilerplate is massive. 100 lines of code just to "add two numbers".
3. Debugging: If your shader crashes, it can crash the GPU driver. Debugging tools (like PIX or RenderDoc) are harder to attach to a browser process.
Part 8: Higher Level Libraries (Three.js & Orillusion)
You don't have to write raw WebGPU.
Three.js acts as a bridge. The new WebGPURenderer allows you to write "Nodes" instead of raw shaders.
```typescript
import { MeshBasicNodeMaterial, select, positionLocal, color } from 'three/nodes';

const material = new MeshBasicNodeMaterial();
material.colorNode = select(
  positionLocal.y.greaterThan(0),
  color(0xff0000), // Red if y > 0
  color(0x0000ff)  // Blue if y <= 0
);
```
This "Node Material" system compiles down to WGSL automatically. It gives you the performance of WebGPU with the ease of Three.js.
Conclusion: The Browser as an OS
WebGPU is the final piece of the puzzle. With WebAssembly, we got near-native CPU performance. With WebGPU, we get near-native GPU performance.
The browser is now a full Operating System. We can run Photoshop, Video Editors, and Physics Engines entirely in a tab.
If you are a web developer in 2026, learning WGSL is the highest leverage skill you can acquire. It differentiates you from the thousands of React devs who only know how to center a div.
Start small. Draw a triangle. Then simulate a universe.
About the Author: Sachin Sharma is a Graphics Engineer and Web Performance Expert. He builds high-fidelity 3D experiences for the web and is currently writing a book on "The Next Generation of Web Graphics."
