SparkLumina

How AI Drawing Works: From Text to Canvas

A deep dive into streaming ExcalidrawElementSkeleton generation, audio-visual sync with TTS, and how AI-drawn elements are sanitized and applied to the shared canvas in real time.

2026-02-25 · 5 min read

Overview

AI Smart Drawing in SparkLumina turns natural language descriptions into Excalidraw elements on a shared canvas. The flow involves a streaming LLM, a structured skeleton format, sanitization, conversion, and optional text-to-speech (TTS) narration that stays in sync with each element appearing on the canvas. This post walks through each step.

ExcalidrawElementSkeleton Format

The AI generates elements in a JSON array format we call ExcalidrawElementSkeleton. Each object has a type and the fields required for that type.

Supported types:

  • text — Text labels. Required: text, x, y. Optional: fontSize, strokeColor.
  • rectangle — Rectangles. Required: x, y, width, height. Optional: backgroundColor, strokeColor, strokeWidth, label.
  • ellipse — Ellipses. Same layout as rectangle.
  • diamond — Diamonds. Same layout.
  • arrow — Arrows between shapes. Required: x, y, width, height, plus start and end objects with id references to other elements (e.g., start: { id: "ellipse-1" }, end: { id: "ellipse-2" }).
  • line — Lines. Similar to arrows.
  • image — Images (with src or data URL).

Extended fields:

  • label — For shapes and arrows: { text: string, fontSize?: number }. Renders as bound text.
  • script — TTS narration text. Not part of the Excalidraw API; used only for audio-visual sync. Must be stripped before conversion.

The skeleton format is a minimal description. The backend LLM outputs JSON arrays; the frontend parses, validates, and converts them into full Excalidraw elements.
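
To make the shape of the format concrete, here is a rough TypeScript sketch of the skeleton types and a small example array. The type names (SkeletonLabel, ShapeSkeleton, and so on) and the sample diagram content are made up for illustration, and line and image are left out for brevity; only the field names come from the lists above.

```typescript
// Hypothetical type names; field names follow the lists above.
type SkeletonLabel = { text: string; fontSize?: number };
type SkeletonRef = { id: string };

interface SkeletonBase {
  id?: string;
  x: number;
  y: number;
  script?: string; // TTS narration, stripped before conversion
}

interface TextSkeleton extends SkeletonBase {
  type: "text";
  text: string;
  fontSize?: number;
  strokeColor?: string;
}

interface ShapeSkeleton extends SkeletonBase {
  type: "rectangle" | "ellipse" | "diamond";
  width: number;
  height: number;
  backgroundColor?: string;
  strokeColor?: string;
  strokeWidth?: number;
  label?: SkeletonLabel;
}

interface ArrowSkeleton extends SkeletonBase {
  type: "arrow";
  width: number;
  height: number;
  start: SkeletonRef; // id of the element the arrow leaves
  end: SkeletonRef;   // id of the element the arrow points to
  label?: SkeletonLabel;
}

type Skeleton = TextSkeleton | ShapeSkeleton | ArrowSkeleton;

// Illustrative data only: two labeled ellipses joined by an arrow.
const example: Skeleton[] = [
  { type: "ellipse", id: "ellipse-1", x: 100, y: 100, width: 160, height: 80,
    label: { text: "Evaporation" }, script: "Water evaporates from the sea." },
  { type: "ellipse", id: "ellipse-2", x: 400, y: 100, width: 160, height: 80,
    label: { text: "Condensation" } },
  { type: "arrow", x: 260, y: 140, width: 140, height: 0,
    start: { id: "ellipse-1" }, end: { id: "ellipse-2" } },
];
```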

Subject × Chart Type Two-Level Selector

To steer the AI toward specific styles (e.g., scientific diagrams vs. journal layouts), AISidebar provides a subject × chart type selector:

  • Subject: The domain — science, math, journal, general illustration, etc.
  • Chart type: The layout — flowchart, labeled diagram, decorative elements, etc.

These choices are passed into the system prompt so the model produces content that matches the selected style. This improves consistency and reduces irrelevant outputs.
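
As a rough illustration of how the two selections could end up in the system prompt, consider the sketch below. The function name buildSystemPrompt, the option values, and the prompt wording are all assumptions made for this example; the real prompt is not shown in this post.

```typescript
// Hypothetical option lists; the real selector may offer different values.
type Subject = "science" | "math" | "journal" | "general";
type ChartType = "flowchart" | "labeled-diagram" | "decorative";

// Build a system prompt that pins the model to the chosen style.
function buildSystemPrompt(subject: Subject, chartType: ChartType): string {
  return [
    "You draw on an Excalidraw canvas.",
    "Respond with a JSON array of ExcalidrawElementSkeleton objects only.",
    `Subject: ${subject}.`,
    `Chart type: ${chartType}.`,
  ].join("\n");
}
```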

Streaming Generation Pipeline

The LLM response is streamed token-by-token. The frontend does not wait for the full JSON array; instead, it uses a streaming parser called parseStreamingSkeletonArray (in jsonRepair.ts).

Important constraint: The parser returns only fully closed objects. Incomplete fragments (e.g., an object missing a closing brace) are not emitted. This avoids pushing half-formed elements to the canvas, where they could cause render glitches or vanish once the final structure turns out to differ. The parser tracks object depth, waits for complete objects, and validates each against a minimal schema (e.g., text needs text + x/y; arrows need start/end IDs).
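
The closed-objects-only idea can be sketched in a few lines. This is not the actual parseStreamingSkeletonArray from jsonRepair.ts; the function name parseClosedObjects is hypothetical, and the sketch only shows depth tracking with string and escape handling, without the validation step.

```typescript
// Returns every top-level object in a possibly incomplete JSON array that has
// already been closed. A trailing partial object is ignored until a later call.
function parseClosedObjects(buffer: string): unknown[] {
  const results: unknown[] = [];
  let depth = 0;        // current nesting depth of { }
  let start = -1;       // index where the current top-level object began
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue; // braces inside strings must not affect depth
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0 && start >= 0) {
        try {
          results.push(JSON.parse(buffer.slice(start, i + 1)));
        } catch {
          // Malformed object: skip it rather than emit a broken element.
        }
        start = -1;
      }
    }
  }
  return results;
}
```

A caller would append each streamed chunk to a buffer, call the function again, and apply only the objects beyond the count it has already handled.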

Validation uses SKELETON_TYPES and type-specific checks: hasXY, hasWH, hasStartEndIds, etc. Only objects that pass validation are queued for application.
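
The type-specific checks might look roughly like this. The names SKELETON_TYPES, hasXY, hasWH, and hasStartEndIds come from the text above, but their bodies, the isValidSkeleton wrapper, and the rules for line and image are assumptions for this sketch.

```typescript
type Candidate = Record<string, unknown>;

// Assumed contents; mirrors the supported types listed earlier.
const SKELETON_TYPES = ["text", "rectangle", "ellipse", "diamond", "arrow", "line", "image"];

const isNum = (v: unknown): v is number => typeof v === "number" && isFinite(v);

const hasXY = (o: Candidate): boolean => isNum(o.x) && isNum(o.y);
const hasWH = (o: Candidate): boolean => isNum(o.width) && isNum(o.height);

const hasId = (v: unknown): boolean =>
  typeof v === "object" && v !== null && typeof (v as Candidate).id === "string";
const hasStartEndIds = (o: Candidate): boolean => hasId(o.start) && hasId(o.end);

// Accept an object only if it has a known type and that type's required fields.
function isValidSkeleton(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const o = value as Candidate;
  if (typeof o.type !== "string" || SKELETON_TYPES.indexOf(o.type) === -1) return false;
  if (!hasXY(o)) return false;
  switch (o.type) {
    case "text":
      return typeof o.text === "string";
    case "arrow":
      return hasWH(o) && hasStartEndIds(o);
    case "image":
      return true; // source field checks omitted in this sketch
    default:
      return hasWH(o); // rectangle, ellipse, diamond, line (assumed rule for line)
  }
}
```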

Sanitization (skeletonSanitize.ts)

Raw skeleton objects may include fields that Excalidraw does not accept. skeletonSanitize.ts provides:

  • sanitizeSkeletonElements(elements) — Iterates over the array, shallow-copies each object, and removes keys listed in EXTENDED_KEYS (e.g., script). It preserves type, id, x, y, width, height, text, fontSize, label, start, end, and other standard fields.
  • ensureSkeletonIds(elements) — Ensures each element has a stable id (for regenerateIds: false and for id-based mapping when syncing from canvas).

Sanitization runs before convertToExcalidrawElements. Without it, unknown keys could cause Excalidraw to throw or behave unpredictably.
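
A minimal sketch of the two helpers, assuming EXTENDED_KEYS contains only script (the one key this post names) and using a placeholder type-index scheme for generated ids. The real skeletonSanitize.ts may differ in both respects.

```typescript
type SkeletonLike = { type: string; id?: string; [key: string]: unknown };

// Assumed list; the post names only `script` explicitly.
const EXTENDED_KEYS = ["script"];

// Shallow-copy each element and drop the keys Excalidraw does not know about.
function sanitizeSkeletonElements(elements: SkeletonLike[]): SkeletonLike[] {
  return elements.map((el) => {
    const copy: SkeletonLike = { ...el };
    for (const key of EXTENDED_KEYS) delete copy[key];
    return copy;
  });
}

// Give every element an id, keeping existing ones. The `type-index` scheme is
// a placeholder and does not guard against clashes with model-provided ids.
function ensureSkeletonIds(elements: SkeletonLike[]): SkeletonLike[] {
  return elements.map((el, i) => (el.id ? el : { ...el, id: `${el.type}-${i + 1}` }));
}
```

Because each element is copied before keys are removed, the original object keeps its script, so the TTS pipeline can still read it after sanitization.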

Conversion to Excalidraw Elements

The sanitized array is passed to convertToExcalidrawElements, which is exported by @excalidraw/excalidraw. This function:

  1. Creates proper Excalidraw element instances (with correct boundElements, containerId, etc.).
  2. Resolves arrow start/end references to element IDs.
  3. Regenerates IDs when regenerateIds is true (the default), or preserves the skeleton IDs when it is false, which sync scenarios rely on.

The result is a list of native ExcalidrawElement objects ready for api.updateScene({ elements }).
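
The convert-and-apply step can be sketched as follows. To keep the example self-contained, the converter and the canvas API are passed in as parameters with minimal structural types; in the app, convert would be convertToExcalidrawElements from @excalidraw/excalidraw and api would be the Excalidraw imperative API. The applySkeletons name, the Deps shape, and the choice to append to the existing scene are assumptions.

```typescript
// Minimal structural types so the sketch stands alone.
type Skeleton = { type: string; id?: string; [key: string]: unknown };
type SceneElement = { id: string; type: string; [key: string]: unknown };

interface Deps {
  convert: (skeletons: Skeleton[], opts?: { regenerateIds: boolean }) => SceneElement[];
  api: {
    getSceneElements: () => readonly SceneElement[];
    updateScene: (scene: { elements: SceneElement[] }) => void;
  };
}

// Convert sanitized skeletons and append them to whatever is already drawn.
function applySkeletons(sanitized: Skeleton[], deps: Deps): SceneElement[] {
  // Keep skeleton ids so arrows and later syncs can refer to them.
  const converted = deps.convert(sanitized, { regenerateIds: false });
  const next = [...deps.api.getSceneElements(), ...converted];
  deps.api.updateScene({ elements: next });
  return converted;
}
```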

Audio-Visual Sync

When TTS is enabled, each element should appear on the canvas exactly when its narration plays. The flow is:

  1. Parse the streaming array and collect complete skeleton objects.
  2. For each object: sanitize → convert → apply to the scene.
  3. If the object has a script field, fetch TTS audio (or use a cached blob) and play it.
  4. Wait for the audio to finish (or a minimum duration) before processing the next element.

Because we apply one element at a time and sync with playback, the user sees drawings appear in sync with the spoken explanation. The script is removed during sanitization and never reaches Excalidraw; it is used only for the TTS pipeline.
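
The one-element-at-a-time loop can be sketched like this. The drawWithNarration name, the injected SyncDeps functions, and the 400 ms default pause are assumptions for illustration. The sketch also picks one reading of "or a minimum duration": it waits for both the narration and the minimum pause, and for only the pause when an element has no script.

```typescript
type Skeleton = { type: string; script?: string; [key: string]: unknown };

interface SyncDeps {
  applyElement: (el: Skeleton) => void;     // sanitize, convert, updateScene
  speak: (script: string) => Promise<void>; // resolves when playback ends
  sleep: (ms: number) => Promise<void>;     // e.g. a setTimeout wrapper
}

// Draw elements one at a time, holding each on screen until its narration
// and a minimum pause have both finished.
async function drawWithNarration(
  queue: Skeleton[],
  deps: SyncDeps,
  minMs = 400, // assumed default pause
): Promise<void> {
  for (const el of queue) {
    deps.applyElement(el);
    const waits: Promise<void>[] = [deps.sleep(minMs)];
    if (typeof el.script === "string" && el.script.length > 0) {
      waits.push(deps.speak(el.script));
    }
    await Promise.all(waits);
  }
}
```

Passing the timer and the audio player in as functions keeps the loop easy to test with fakes and leaves caching of TTS blobs to the speak implementation.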

Multi-User Sync

Once elements are applied locally, they must reach other users in the room. SparkLumina uses Socket.IO:

  • The client that applies the change emits canvas-change (or the equivalent event) with the updated elements.
  • The server broadcasts to the room.
  • Other clients receive remote-changes and merge the new elements into their scene.

The same sanitize-and-convert pipeline runs on the applying client; the broadcast carries Excalidraw elements, not raw skeletons. Conflict resolution is version-based so that concurrent edits from multiple users converge correctly.
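
To illustrate what version-based merging can look like on a receiving client, here is a hedged sketch. The version and versionNonce fields exist on Excalidraw elements, but the mergeRemoteElements name and the exact rule (higher version wins, lower versionNonce breaks ties) are assumptions about the approach, not SparkLumina's actual code.

```typescript
type SyncedElement = {
  id: string;
  version: number;
  versionNonce: number;
  [key: string]: unknown;
};

// Merge remote elements into the local scene. The higher `version` wins; on a
// tie the lower `versionNonce` wins so every client picks the same copy.
function mergeRemoteElements(
  local: SyncedElement[],
  remote: SyncedElement[],
): SyncedElement[] {
  const byId = new Map<string, SyncedElement>();
  for (const el of local) byId.set(el.id, el);
  for (const incoming of remote) {
    const current = byId.get(incoming.id);
    const remoteWins =
      current === undefined ||
      incoming.version > current.version ||
      (incoming.version === current.version &&
        incoming.versionNonce < current.versionNonce);
    if (remoteWins) byId.set(incoming.id, incoming);
  }
  return Array.from(byId.values());
}
```

Since the rule depends only on the two copies being compared, clients that receive the same elements in a different order still end up with the same scene.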

Export Capabilities

Users can export the canvas in several formats:

  • PNG — Raster image via Excalidraw’s export API.
  • SVG — Vector export for diagrams and illustrations.
  • .excalidraw — Native Excalidraw file (JSON) for editing later or sharing.

These exports work on the current scene, including both hand-drawn and AI-generated elements.

Summary

AI Drawing in SparkLumina follows a clear pipeline: text prompt → streaming LLM → parseStreamingSkeletonArray (only closed objects) → sanitizeSkeletonElements (remove script etc.) → convertToExcalidrawElements → updateScene + optional TTS. The subject × chart type selector guides style, and Socket.IO keeps collaborators in sync. The result is a fluid, collaborative, and narratively coherent drawing experience.