System Design: Architecting a Real-Time Collaboration Engine (Like Figma)

Master Real-Time System Design. Learn about CRDTs (Conflict-free Replicated Data Types), WebSocket scaling with Redis Pub/Sub, and handling offline-first synchronization.

Sachin Sharma
Feb 14, 2026

Building a Chat App is the "Hello World" of Real-Time. Building a Collaboration Engine (Google Docs, Figma, Trello) is the boss fight.

The challenge isn't just sending messages. The challenge is Consistency.

  • User A types "Hello" (at index 0).
  • User B types "World" (at index 0).
  • Both are offline for 100ms.
  • They sync.

What happens?

  • "HelloWorld"?
  • "WorldHello"?
  • "HWeorllldo"? (This happens if you naively merge indices).

To solve this, we cannot use simple databases. We need CRDTs (Conflict-free Replicated Data Types).
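To make the failure concrete, here is a minimal sketch of index-based merging. The `Op` type and the `typeWord` and `applyOps` helpers are hypothetical names used only for this illustration. Each user's word is sent as per-character inserts at the positions that user saw locally, and two replicas receive the same operations in a different order.

```typescript
// Hypothetical illustration: index-based inserts with no conflict handling.
type Op = { index: number; char: string };

// Turn a word typed at the start of an empty document into per-character inserts.
function typeWord(word: string): Op[] {
  return Array.from(word).map((char, index) => ({ index, char }));
}

// Apply each insert blindly at the index the sender saw.
function applyOps(doc: string, ops: Op[]): string {
  for (const { index, char } of ops) {
    doc = doc.slice(0, index) + char + doc.slice(index);
  }
  return doc;
}

const opsA = typeWord('Hello'); // User A
const opsB = typeWord('World'); // User B

// Replica 1 receives A's ops first, replica 2 receives B's ops first.
const replica1 = applyOps(applyOps('', opsA), opsB);
const replica2 = applyOps(applyOps('', opsB), opsA);

console.log(replica1); // "WorldHello"
console.log(replica2); // "HelloWorld"
console.log(replica1 === replica2); // false: the replicas have diverged
```

Both replicas saw exactly the same operations, yet they disagree on the final text. That divergence is the problem the rest of the design has to eliminate.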

In this deep dive, we will design a multi-user whiteboard system that handles:

  1. Offline editing.
  2. Conflict resolution.
  3. Scaling to 100k concurrent users.

Part 1: The Primitive (CRDT vs OT)

Google Docs uses OT (Operational Transformation). It requires a central server to transform operations (User A: insert at 0, User B: insert at 0 -> transform B to insert at 5). It is complex and centralized.

Figma uses a CRDT-inspired approach (with fractional indexing for ordering objects), although it still relies on a central server. CRDTs are data structures that always merge to the same state, regardless of the order in which updates are applied.

Example: The Sequence CRDT (Yjs)

Instead of "Insert at Index 0", we say: "Insert ID: UserA-1 after ID: root".

When User A and User B both insert after root:

  • User A: "Hello" (ID: A1)
  • User B: "World" (ID: B1)

The system uses the User ID as a tie-breaker. Result: root -> A1 ("Hello") -> B1 ("World"). Every client arrives at this same result mathematically. No central server needed for logic.
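The idea can be sketched with a toy sequence type. This is not how Yjs is implemented (Yjs uses a more involved algorithm and numeric client IDs); `ToySequence` and `Item` are hypothetical names, and the sketch assumes that string IDs compared lexicographically are an acceptable tie-breaker.

```typescript
// Toy "insert after ID" sequence. An illustration only, not the Yjs algorithm.
type Item = { id: string; after: string; text: string };

class ToySequence {
  private items = new Map<string, Item>();

  // Idempotent: receiving the same item twice changes nothing.
  insert(item: Item): void {
    if (!this.items.has(item.id)) this.items.set(item.id, item);
  }

  // Order is derived from the data alone, never from arrival order.
  toString(): string {
    const children = new Map<string, Item[]>();
    for (const item of this.items.values()) {
      const siblings = children.get(item.after) ?? [];
      siblings.push(item);
      children.set(item.after, siblings);
    }
    const out: string[] = [];
    const visit = (parentId: string): void => {
      const kids = children.get(parentId) ?? [];
      // Tie-breaker: concurrent inserts after the same parent are sorted by ID.
      kids.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
      for (const kid of kids) {
        out.push(kid.text);
        visit(kid.id);
      }
    };
    visit('root');
    return out.join('');
  }
}

const hello: Item = { id: 'A1', after: 'root', text: 'Hello' };
const world: Item = { id: 'B1', after: 'root', text: 'World' };

const clientA = new ToySequence();
clientA.insert(hello);
clientA.insert(world);

const clientB = new ToySequence();
clientB.insert(world); // opposite arrival order
clientB.insert(hello);

console.log(clientA.toString()); // "HelloWorld"
console.log(clientB.toString()); // "HelloWorld"
```

Because the final order depends only on the set of items and their IDs, every replica that has received both inserts renders the same text.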


Part 2: The WebSocket Gateway

CRDTs are just the data structure. We need to move the binary updates between clients.

Architecture:

  • Client: React + Yjs (Library).
  • Gateway: Node.js + ws.
  • Bus: Redis Pub/Sub.

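Why Redis? WebSocket connections are stateful: each one is a long-lived TCP connection pinned to a single server process, so a server only sees the clients connected to it.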

  • User A connects to Server 1.
  • User B connects to Server 2.

If User A draws a line, Server 1 receives it. Server 1 must Publish this update to a Redis Channel (room:123). Server 2 Subscribes to room:123 and forwards the update to User B.

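```typescript
// server.ts
import { WebSocketServer, WebSocket } from 'ws';
import { createClient } from 'redis';

// parseRoom and saveToDB are application helpers, assumed to be defined elsewhere.
declare function parseRoom(url?: string): string;
declare function saveToDB(roomId: string, update: Buffer): Promise<void>;

const pub = createClient();
const sub = createClient(); // dedicated connection: a subscriber cannot issue other commands
await Promise.all([pub.connect(), sub.connect()]);

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws, req) => {
  const roomId = parseRoom(req.url);
  const channel = `room:${roomId}`;

  // 1. Subscribe to Redis for this room (buffer mode keeps updates binary)
  const forward = (message: Buffer) => {
    if (ws.readyState === WebSocket.OPEN) ws.send(message); // Forward to client
  };
  sub.subscribe(channel, forward, true).catch(console.error);

  // 2. Publish client updates to Redis
  ws.on('message', (message) => {
    pub.publish(channel, message as Buffer).catch(console.error);
    saveToDB(roomId, message as Buffer).catch(console.error); // Async persistence
  });

  // 3. Remove this socket's listener when it disconnects
  ws.on('close', () => sub.unsubscribe(channel, forward, true).catch(console.error));
});
```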

Part 3: Protocol Buffers & Binary Encodings

JSON is too slow for mouse movements (60 updates per second). Stringifying { "x": 100, "y": 200, "id": "uuid" } creates massive GC pressure.

We use Binary Encodings. Yjs natively encodes document updates as Uint8Array. This typically makes the payload several times smaller than the equivalent JSON.
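For ephemeral data such as cursor positions, a fixed binary layout is easy to write by hand. The sketch below is an assumption-heavy illustration rather than the Yjs wire format: it assumes a numeric 32-bit client ID instead of a UUID string, and coordinates that fit in 16 bits. `encodeCursor` and `decodeCursor` are hypothetical names.

```typescript
// Assumed layout (8 bytes, big-endian): uint32 clientId | uint16 x | uint16 y
function encodeCursor(clientId: number, x: number, y: number): Uint8Array {
  const buffer = new ArrayBuffer(8);
  const view = new DataView(buffer);
  view.setUint32(0, clientId);
  view.setUint16(4, x);
  view.setUint16(6, y);
  return new Uint8Array(buffer);
}

function decodeCursor(bytes: Uint8Array): { clientId: number; x: number; y: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    clientId: view.getUint32(0),
    x: view.getUint16(4),
    y: view.getUint16(6),
  };
}

const cursorBytes = encodeCursor(7, 100, 200);
const cursorJson = JSON.stringify({ x: 100, y: 200, id: 'uuid' });

console.log(cursorBytes.length); // 8
console.log(cursorBytes.length < cursorJson.length); // true
console.log(decodeCursor(cursorBytes)); // { clientId: 7, x: 100, y: 200 }
```

The saving grows with a real UUID in the JSON version, and the binary form also avoids building and parsing a string for every mouse event.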

Optimization: Throttling & Debouncing

Do not send every mouse pixel.

  1. Local: Update UI at 60fps (optimistic).
  2. Network: Send accumulated updates every 50ms (20fps).

This makes the network traffic "bursty" but efficient.
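A minimal sketch of the network side of this split, assuming a generic `send` callback (for example, a function that writes a batch to the WebSocket). `UpdateBatcher` is a hypothetical helper, not part of Yjs or ws.

```typescript
// Collects updates locally and hands them to `send` in batches.
class UpdateBatcher<T> {
  private pending: T[] = [];
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly send: (batch: T[]) => void;
  private readonly intervalMs: number;

  constructor(send: (batch: T[]) => void, intervalMs = 50) {
    this.send = send;
    this.intervalMs = intervalMs;
  }

  // Called on every local change (up to 60 times per second).
  push(update: T): void {
    this.pending.push(update);
  }

  // Sends everything accumulated so far; does nothing when idle.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }

  start(): void {
    if (this.timer === undefined) {
      this.timer = setInterval(() => this.flush(), this.intervalMs);
    }
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
    this.flush(); // do not lose the tail
  }
}

// Usage with manual flushing so the behavior is easy to follow.
const sentBatches: number[][] = [];
const batcher = new UpdateBatcher<number>((batch) => {
  sentBatches.push(batch);
});
batcher.push(1);
batcher.push(2);
batcher.push(3);
batcher.flush(); // one network message carrying three updates
batcher.flush(); // nothing pending, nothing sent
console.log(sentBatches.length); // 1
```

In a browser you would call `start()` once after connecting and `stop()` on disconnect. For cursor positions specifically, keeping only the latest value per flush would be enough.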


Part 4: Persistent Storage (The "Save" Button)

If all users disconnect, the data is lost from RAM. We need a database. But updating Postgres every 50ms is instant death.

Strategy: The Write-Behind Buffer.

  1. All updates go to a Redis Stream.
  2. A separate "Worker" process reads the stream.
  3. The Worker debounces writes to S3/Postgres (e.g., save snapshot every 10 seconds).
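The worker's timing decision can be sketched without any storage code. `SnapshotScheduler` below is a hypothetical helper: it only tracks which documents have unsaved changes and for how long, and it takes the current time as a parameter so the behavior is easy to follow. Reading the stream and writing to S3/Postgres are left out.

```typescript
// Decides when a document's snapshot is due; storage I/O is out of scope.
class SnapshotScheduler {
  private dirtySince = new Map<string, number>();
  private readonly intervalMs: number;

  constructor(intervalMs = 10_000) {
    this.intervalMs = intervalMs;
  }

  // Called for every update read from the stream.
  // Only the first update in a window starts the clock.
  markDirty(docId: string, now: number): void {
    if (!this.dirtySince.has(docId)) this.dirtySince.set(docId, now);
  }

  // Returns documents whose snapshot is due and resets their window.
  takeDue(now: number): string[] {
    const due: string[] = [];
    for (const [docId, since] of this.dirtySince) {
      if (now - since >= this.intervalMs) due.push(docId);
    }
    for (const docId of due) this.dirtySince.delete(docId);
    return due;
  }
}

// Simulate one update every 50ms for 10 seconds (201 updates).
const scheduler = new SnapshotScheduler();
let snapshotWrites = 0;
for (let t = 0; t <= 10_000; t += 50) {
  scheduler.markDirty('doc-123', t);
  snapshotWrites += scheduler.takeDue(t).length;
}
console.log(snapshotWrites); // 1
```

A steady stream of 50ms updates collapses into a single database write per window, which is the whole point of the buffer.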

The Snapshot: Yjs allows us to encode the entire document state into a binary blob. We save this blob to S3: doc-123-snapshot-v45.bin.

On Load:

  1. Server fetches Snapshot from S3.
  2. Sends Snapshot to Client.
  3. Client hydrates CRDT.

Part 5: Handling "The Presence" (Who is here?)

You know those colorful avatars in the top right? That is "Ephemeral State". It doesn't need to be saved to disk.

Implementation: Use "Heartbeats".

  1. Client sends: { type: "ping", user: "Sachin" } every 5 seconds.
  2. Server stores in Redis: SETEX room:123:user:sachin 10 "active".
  3. If no Ping for 10 seconds, Redis key expires.
  4. Server detects the expiry (for example, via Redis keyspace notifications or by periodically scanning the room's keys) and broadcasts "User Left".
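The same TTL logic can be sketched in memory, which is handy for reasoning about the timings before wiring in Redis. `PresenceTracker` is a hypothetical stand-in for the SETEX keys, and time is passed in explicitly so the example is deterministic.

```typescript
// In-memory stand-in for "SETEX key 10 active": each heartbeat renews a deadline.
class PresenceTracker {
  private expiresAt = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs = 10_000) {
    this.ttlMs = ttlMs;
  }

  // Equivalent to re-issuing SETEX with a fresh TTL.
  heartbeat(user: string, now: number): void {
    this.expiresAt.set(user, now + this.ttlMs);
  }

  // Returns users whose deadline has passed, so the caller can broadcast "User Left".
  sweep(now: number): string[] {
    const left: string[] = [];
    for (const [user, deadline] of this.expiresAt) {
      if (now >= deadline) left.push(user);
    }
    for (const user of left) this.expiresAt.delete(user);
    return left;
  }

  active(): string[] {
    return Array.from(this.expiresAt.keys());
  }
}

const presence = new PresenceTracker();
presence.heartbeat('sachin', 0);
presence.heartbeat('sachin', 5_000); // second ping renews the TTL

console.log(presence.sweep(14_999)); // []
console.log(presence.sweep(15_000)); // [ 'sachin' ]
console.log(presence.active()); // []
```

With a 5 second ping and a 10 second TTL, a client can miss one heartbeat without being reported as gone.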

Part 6: Offline Support (IndexedDB)

The beauty of CRDTs is that Offline is trivial. If the WebSocket drops, the user continues editing local Yjs state.

We create a y-indexeddb provider. It syncs the CRDT state to the browser's IndexedDB on every change.

When the internet returns:

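  1. Client connects to WebSocket.
  2. Client sends its state vector (a compact summary of which updates it has already seen, not the document itself).
  3. Server computes the "Diff" (only the changes the client hasn't seen) and sends it, along with its own state vector.
  4. Client replies with the updates the server is missing, including everything typed offline.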
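The exchange can be sketched with a toy replica that stores updates as plain objects. In Yjs itself the corresponding calls are `Y.encodeStateVector(doc)` and `Y.encodeStateAsUpdate(doc, remoteStateVector)`. `ToyReplica`, `Update`, and `StateVector` below are hypothetical names, and the sketch assumes each client's updates arrive in clock order.

```typescript
type Update = { client: string; clock: number; payload: string };
// client → number of updates already seen from that client
type StateVector = Record<string, number>;

class ToyReplica {
  private updates: Update[] = [];

  // Ignores updates this replica has already seen.
  apply(update: Update): void {
    const seen = this.stateVector()[update.client] ?? 0;
    if (update.clock >= seen) this.updates.push(update);
  }

  stateVector(): StateVector {
    const sv: StateVector = {};
    for (const u of this.updates) {
      sv[u.client] = Math.max(sv[u.client] ?? 0, u.clock + 1);
    }
    return sv;
  }

  // Everything this replica has that the remote state vector does not cover.
  diff(remote: StateVector): Update[] {
    return this.updates.filter((u) => u.clock >= (remote[u.client] ?? 0));
  }
}

const server = new ToyReplica();
const client = new ToyReplica();

// History both sides share from before the disconnect.
const shared: Update[] = [
  { client: 'A', clock: 0, payload: 'a0' },
  { client: 'B', clock: 0, payload: 'b0' },
];
for (const u of shared) {
  server.apply(u);
  client.apply(u);
}

server.apply({ client: 'A', clock: 1, payload: 'a1' }); // arrived while B was offline
client.apply({ client: 'B', clock: 1, payload: 'b1' }); // typed offline by B

// Each side sends only what the other is missing.
const forClient = server.diff(client.stateVector());
const forServer = client.diff(server.stateVector());
forClient.forEach((u) => client.apply(u));
forServer.forEach((u) => server.apply(u));

console.log(forClient.map((u) => u.payload)); // [ 'a1' ]
console.log(forServer.map((u) => u.payload)); // [ 'b1' ]
```

Only two small updates cross the wire, regardless of how large the shared history is.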

This is similar in spirit to how Git syncs: both sides work out what the other is missing and exchange only that.


Conclusion: Complexity vs Value

Building a real-time engine is one of the hardest engineering challenges. You have to deal with:

  • Distributed Systems (Redis/PubSub).
  • Binary protocols.
  • Mathematics (CRDTs).
  • Browser Storage (IndexedDB).

But once you build it, you unlock a class of applications that feel "Alive". The static web is dying. The collaborative web is here.


About the Author: Sachin Sharma has architected real-time collaboration tools for enterprise clients, scaling WebSocket clusters to support 50k concurrent sessions.

Software Developer & Mobile Engineer

Building digital experiences at the intersection of design and code. Sharing weekly insights on engineering, productivity, and the future of tech.