SparkLumina

Building ExAgent: A Chat-Driven Canvas Controller

How we designed a streaming AI agent that creates, moves, resizes, and styles Excalidraw elements through natural conversation — with context-aware actions and undo support.

2026-02-20 · 6 min read

What is ExAgent and Why It Exists

ExAgent is a chat-driven canvas controller for SparkLumina. Instead of using menus and toolbars, users describe what they want in plain language: "Create a blue circle in the center," "Move the rectangle to the right," "Resize the ellipse to be larger." The AI interprets these intents and emits structured actions that are executed on the Excalidraw canvas in real time.

The motivation is to lower the barrier between thought and visual output — especially for teachers who want to focus on explaining, not on tool selection. ExAgent also enables complex multi-step workflows (e.g., "Create a flowchart, then align all shapes, then add labels") through a single conversation.

Architecture: ExcalidrawAgent + AgentService + Shared Types

ExAgent is implemented across three layers:

Frontend — ExcalidrawAgent.ts

The main agent class lives in the frontend. It holds a reference to the Excalidraw imperative API (api) and manages:

  • Atoms — Reactive state for chat history ($chatHistory), todo list ($todoList), context items ($contextItems), user action history ($userActionHistory), and request state ($activeRequest, $scheduledRequest). We use a custom atom implementation in stateUtils.ts; the tldraw dependency was removed for a lighter, Excalidraw-only stack.
  • Action utilities — Each action type (create, move, resize, etc.) has an AgentActionUtil that knows how to apply it to the canvas. The agent dispatches completed actions to the correct util via getAgentActionUtil(action._type).
  • Streaming parser — Incoming SSE/JSON streams are parsed incrementally. The agent consumes completed action objects and calls act(action) for each.
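The dispatch step above can be sketched as a small registry keyed by `_type`. This is a minimal illustration (the registry shape and the `registerActionUtil` helper are assumptions, not the actual `ExcalidrawAgent.ts` internals):

```typescript
// Minimal sketch of action-util dispatch, keyed by the action's _type field.
type AgentAction = { _type: string; complete: boolean; [key: string]: unknown };

interface AgentActionUtil {
  applyAction(action: AgentAction): void;
}

const registry = new Map<string, AgentActionUtil>();

// Hypothetical registration helper; the real agent likely wires utils up at construction.
function registerActionUtil(type: string, util: AgentActionUtil): void {
  registry.set(type, util);
}

function getAgentActionUtil(type: string): AgentActionUtil | undefined {
  return registry.get(type);
}

// Example: a "move" util that records the translation it would apply to the canvas.
const applied: string[] = [];
registerActionUtil("move", {
  applyAction(action) {
    applied.push(`move ${String(action.shapeId)} to (${String(action.x)}, ${String(action.y)})`);
  },
});

getAgentActionUtil("move")?.applyAction({
  _type: "move",
  complete: true,
  shapeId: "rect1",
  x: 100,
  y: 50,
});
```

Keeping each action behind a small util interface means adding a new action type is a matter of registering one more util, without touching the dispatch loop.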

Backend — AgentService

The Node.js backend exposes POST /api/agent/stream. It receives the user message, optional context (selected shapes, viewport, screenshot), and optional LLM overrides. It calls the configured LLM (e.g., Claude) with a system prompt built from the action utils’ buildSystemPrompt() methods, then streams back JSON action objects. The response schema is built from the same shared action types used by the frontend.
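The request payload might look roughly like the following. Field names here are assumptions inferred from the description above, not the exact wire format:

```typescript
// Hypothetical shape of the POST /api/agent/stream request body.
interface AgentStreamRequest {
  message: string;
  context?: {
    selectedShapeIds?: string[];
    viewport?: { x: number; y: number; width: number; height: number; zoom: number };
    screenshot?: string; // e.g., a base64-encoded PNG of the canvas
  };
  llmConfigOverride?: { model?: string; temperature?: number };
}

// Build a request, attaching context only when there is something to attach.
function buildRequest(message: string, selectedShapeIds: string[] = []): AgentStreamRequest {
  return {
    message,
    context: selectedShapeIds.length ? { selectedShapeIds } : undefined,
  };
}

const req = buildRequest("Move the rectangle to the right", ["rect1"]);
```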

Shared Types

Action schemas (e.g., CreateAction, MoveAction, ResizeAction) are defined in shared TypeScript modules. Both frontend and backend import them for parsing, validation, and prompt construction. This keeps the protocol consistent and reduces drift.
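A natural way to model this in shared TypeScript is a discriminated union on `_type`, which both sides can narrow on. The field names below are illustrative; the real shared modules surely carry more properties:

```typescript
// Hypothetical shared action schemas, discriminated on _type.
interface BaseAction {
  _type: string;
  complete: boolean;
}
interface MoveAction extends BaseAction {
  _type: "move";
  shapeId: string;
  x: number;
  y: number;
}
interface ResizeAction extends BaseAction {
  _type: "resize";
  shapeId: string;
  width: number;
  height: number;
}
type AgentAction = MoveAction | ResizeAction;

// Both frontend and backend can exhaustively switch on _type:
function describe(action: AgentAction): string {
  switch (action._type) {
    case "move":
      return `move ${action.shapeId} to (${action.x}, ${action.y})`;
    case "resize":
      return `resize ${action.shapeId} to ${action.width}x${action.height}`;
  }
}

const d = describe({ _type: "resize", complete: true, shapeId: "e1", width: 200, height: 120 });
```

Because the union is shared, adding a field to `MoveAction` immediately surfaces as a type error anywhere the frontend or backend handles it inconsistently.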

Supported Action Types

ExAgent supports a wide range of canvas operations:

  • create — Add new shapes (rectangles, ellipses, diamonds, arrows, text, freehand). Shapes can be specified as simple shapes or library shapes (e.g., flowchart symbols).
  • update — Change properties of existing elements (stroke color, fill, text content, etc.).
  • delete — Remove elements by ID or by selection.
  • move — Translate elements to new (x, y) positions.
  • resize — Change width and height.
  • rotate — Rotate elements by angle.
  • pen — Freehand drawing (freedraw) with optional smooth styling.
  • clear — Clear the canvas or a region.
  • message — Chat-only responses (no canvas change).
  • update-todo-list — Add, update, or complete todo items shown in the UI.

Additional actions include place (position shapes at coordinates), align, distribute, bringToFront, sendToBack, label, setMyView (pan/zoom to a region), addDetail (planning), countShapes, and think (internal reasoning). The UnknownActionUtil handles incomplete or unrecognized action types during streaming.
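To make the protocol concrete, here is what a short streamed sequence of actions might look like. The object shapes are assumptions based on the action list above, not a transcript of real output:

```typescript
// Illustrative action payloads as they might arrive over the stream.
const streamed = [
  { _type: "create", complete: true, shape: { type: "ellipse", x: 400, y: 300, backgroundColor: "blue" } },
  { _type: "move", complete: true, shapeId: "rect1", dx: 120, dy: 0 },
  { _type: "message", complete: true, text: "Done! I moved the rectangle to the right." },
];

// message actions are chat-only; everything else mutates the canvas.
const canvasActions = streamed.filter((a) => a._type !== "message");
```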

Context Awareness

ExAgent is context-aware. Before each request, the agent gathers:

  • Selected shapes — Elements the user has selected on the canvas. Passed to the LLM so it can refer to "the blue rectangle" or "these three circles."
  • Encircled shapes — Shapes inside a user-drawn region (picking mode). The agent receives their IDs and properties.
  • Viewport bounds — Current scroll and zoom, so the model knows what is visible and can suggest placements within view.
  • Screenshot — Optional canvas screenshot for visual grounding.
  • User action history — ElementsDiff (added, updated, removed since last request). Helps the model understand recent changes.

All of this is assembled into prompt parts by PromptPartUtil instances (e.g., SelectedShapesPartUtil, UserActionHistoryPartUtil, ViewportBoundsPartUtil). The LLM sees a structured description of the canvas state and can make informed decisions about targets and coordinates.
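The assembly step can be sketched as a pipeline of small part builders. The interface and context shape below are assumptions; only the `PromptPartUtil` class names come from the source:

```typescript
// Minimal sketch of prompt-part assembly from canvas context.
interface CanvasContext {
  selectedShapes: { id: string; type: string }[];
  viewport: { x: number; y: number; width: number; height: number };
}

interface PromptPartUtil {
  // Returns a prompt fragment, or null when this part has nothing to add.
  buildPromptPart(ctx: CanvasContext): string | null;
}

const selectedShapesPart: PromptPartUtil = {
  buildPromptPart: (ctx) =>
    ctx.selectedShapes.length
      ? `Selected: ${ctx.selectedShapes.map((s) => `${s.type}#${s.id}`).join(", ")}`
      : null,
};

const viewportPart: PromptPartUtil = {
  buildPromptPart: (ctx) =>
    `Viewport: ${ctx.viewport.width}x${ctx.viewport.height} at (${ctx.viewport.x}, ${ctx.viewport.y})`,
};

function assemblePrompt(parts: PromptPartUtil[], ctx: CanvasContext): string {
  return parts
    .map((p) => p.buildPromptPart(ctx))
    .filter((s): s is string => s !== null)
    .join("\n");
}

const prompt = assemblePrompt([selectedShapesPart, viewportPart], {
  selectedShapes: [{ id: "r1", type: "rectangle" }],
  viewport: { x: 0, y: 0, width: 1280, height: 720 },
});
```

Parts that return `null` simply drop out, so the prompt stays compact when the canvas is empty or nothing is selected.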

Streaming Action Execution

The backend streams actions as JSON. The frontend uses a streaming JSON parser that yields complete action objects as they become available. Each action has a _type field; the agent looks up the corresponding AgentActionUtil and calls util.applyAction(action, helpers).

Actions can be incomplete while streaming (e.g., create- before the full create-shape payload arrives). The agent checks action.complete; only complete actions are executed. Incomplete ones are ignored to avoid partial or erroneous canvas updates.
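A simplified version of the consumer loop, assuming newline-delimited JSON chunks for clarity (the real parser yields objects from partial JSON mid-line; this sketch focuses on buffering and the `complete` check):

```typescript
// Simplified streaming consumer: buffer chunks, parse full lines,
// and execute only actions flagged complete.
type AgentAction = { _type: string; complete: boolean };

function consumeChunk(buffer: { text: string }, chunk: string): AgentAction[] {
  buffer.text += chunk;
  const actions: AgentAction[] = [];
  let idx: number;
  while ((idx = buffer.text.indexOf("\n")) >= 0) {
    const line = buffer.text.slice(0, idx).trim();
    buffer.text = buffer.text.slice(idx + 1);
    if (!line) continue;
    const action = JSON.parse(line) as AgentAction;
    if (action.complete) actions.push(action); // incomplete actions are ignored
  }
  return actions;
}

const buf = { text: "" };
// First chunk ends mid-object; the fragment stays buffered.
const first = consumeChunk(buf, '{"_type":"create","complete":false}\n{"_type":"crea');
// Second chunk completes the object, which is then parsed and executed.
const second = consumeChunk(buf, 'te","complete":true}\n');
```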

After applying an action, the agent computes an ElementsDiff (before vs. after scene elements) and syncs it for collaboration and recording. Actions that savesToHistory() return true are appended to the chat history with a summary (e.g., "Created 1 rectangle").
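A diff over before/after element maps might be computed like this. The `version` field mirrors how Excalidraw elements track mutations, but the exact `ElementsDiff` shape here is an assumption:

```typescript
// Sketch of an ElementsDiff computed from before/after element maps.
interface Element {
  id: string;
  version: number; // bumped on every mutation
}
interface ElementsDiff {
  added: string[];
  updated: string[];
  removed: string[];
}

function diffElements(before: Map<string, Element>, after: Map<string, Element>): ElementsDiff {
  const diff: ElementsDiff = { added: [], updated: [], removed: [] };
  for (const [id, el] of after) {
    const prev = before.get(id);
    if (!prev) diff.added.push(id);
    else if (prev.version !== el.version) diff.updated.push(id);
  }
  for (const id of before.keys()) {
    if (!after.has(id)) diff.removed.push(id);
  }
  return diff;
}

const before = new Map([["a", { id: "a", version: 1 }], ["b", { id: "b", version: 1 }]]);
const after = new Map([["a", { id: "a", version: 2 }], ["c", { id: "c", version: 1 }]]);
const diff = diffElements(before, after);
```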

State Management with Atoms

ExAgent uses a custom atom-based state system in stateUtils.ts:

atom<T>(name: string, initial: T): Atom<T>

Each atom has get(), set(), update(fn), and subscribe(listener). React components use hooks like useValue(atom) to react to changes. The agent’s $chatHistory, $todoList, $contextItems, and $userActionHistory are all atoms. This replaces the previous tldraw-based state, keeping ExAgent independent of tldraw’s editor model.
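A minimal implementation in the spirit of that API might look like the following. This is a sketch, not the actual `stateUtils.ts` code:

```typescript
// Minimal atom: a mutable value plus a set of change listeners.
type Listener<T> = (value: T) => void;

interface Atom<T> {
  get(): T;
  set(value: T): void;
  update(fn: (prev: T) => T): void;
  subscribe(listener: Listener<T>): () => void; // returns an unsubscribe function
}

function atom<T>(name: string, initial: T): Atom<T> {
  // name is kept for debugging/devtools; it does not affect behavior here.
  let value = initial;
  const listeners = new Set<Listener<T>>();
  const set = (next: T) => {
    value = next;
    listeners.forEach((l) => l(value));
  };
  return {
    get: () => value,
    set,
    update: (fn) => set(fn(value)),
    subscribe: (listener) => {
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
  };
}

const $todoList = atom<string[]>("todoList", []);
const seen: string[][] = [];
const unsubscribe = $todoList.subscribe((v) => seen.push(v));
$todoList.update((prev) => [...prev, "Draw flowchart"]);
unsubscribe();
```

A `useValue(atom)` hook then reduces to subscribing on mount and re-rendering on each notification, which is why React components stay in sync without a store library.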

Collaborative Teaching Mode (scene_1v1_teaching)

ExAgent supports a scene_1v1_teaching mode for collaborative tutoring. When enabled, the backend uses different prompts and templates tuned for step-by-step problem solving. The agent can incorporate lecture-style content, structured board content (板书), and interactive Q&A. The llmConfigOverride allows the host to use the room’s global LLM settings (model, temperature, etc.) instead of the agent’s default, so ExAgent and LectureSidebar can share the same configuration.

Chat History and Todo List Management

Chat history stores each user message and the agent’s response. For canvas-modifying actions, the history entry includes a brief description (e.g., "Moved 2 shapes") and a reference to the action. The ChatHistoryPartUtil includes recent history in the prompt so the model has conversational context.

Todo list is a separate atom ($todoList). The agent can emit update-todo-list actions to add or update items. When all todos are marked done and the user sends a new message, the todo list can be cleared automatically. The TodoListPartUtil adds the current list to the prompt so the model can track multi-step tasks.
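Applying an `update-todo-list` action can be sketched as an upsert by item id. The item shape and the merge-by-id semantics are assumptions for illustration:

```typescript
// Hypothetical todo item and upsert-by-id handling of update-todo-list actions.
interface TodoItem {
  id: string;
  text: string;
  done: boolean;
}

function applyTodoAction(
  list: TodoItem[],
  action: { _type: "update-todo-list"; items: TodoItem[] }
): TodoItem[] {
  const byId = new Map(list.map((t) => [t.id, t]));
  for (const item of action.items) byId.set(item.id, item); // update or append
  return [...byId.values()];
}

let todos: TodoItem[] = [{ id: "t1", text: "Create flowchart", done: false }];
todos = applyTodoAction(todos, {
  _type: "update-todo-list",
  items: [
    { id: "t1", text: "Create flowchart", done: true },
    { id: "t2", text: "Align shapes", done: false },
  ],
});
const allDone = todos.every((t) => t.done);
```

The `allDone` check is what would gate the automatic clear described above once the user sends a new message.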

Summary

ExAgent turns natural language into canvas actions through a streaming pipeline: user message → backend LLM → JSON actions → ExcalidrawAgent → AgentActionUtil.applyAction() → updateScene. Context (selection, viewport, history) is gathered via prompt parts; atoms manage reactive state; and the shared type system keeps frontend and backend in sync. The result is a chat-driven canvas controller that supports create, move, resize, style, and many more operations — all with context awareness and collaborative teaching support.