
Under the Hood: How Smart Fill Runs AI Locally in Your Browser

See how Smart Fill uses TensorFlow.js, BlazeFace, and Web Workers to run local AI photo analysis in your browser without sending images to a server or slowing the editor.

10 min read
Updated March 25, 2026

When you click "Generate Fill Options" in GalleryPlanner, something unusual happens. A lightweight neural network fires up, analyzes dozens of photos for faces and composition, calculates color harmony across every frame, evaluates print quality for each combination, and produces a set of complete wall options—often in under two seconds.

None of that happened on a server. It happened in your browser tab.

Here's a look at how Smart Fill works: why we built it this way, what's running under the hood, and what trade-offs we made along the way.


Why Local-First AI?

When it comes to AI, we decided the best privacy feature is keeping your photos on your device.

Gallery wall photos are personal. They're family portraits, baby pictures, moments from your home. The idea of routing them through a third-party AI API—even a reputable one—felt like the wrong trade-off for a home design tool.

Running inference locally means:

  • Zero server costs for AI: We don't pay per-inference fees, so Smart Fill is included in the base Pro subscription without metering.
  • Works offline: Smart Fill works with no internet connection after the model loads once.
  • Instant results: No roundtrip latency to a remote API.
  • Verifiable privacy: Users can check their browser's network tab and see zero photo uploads.

The downside is real: browser-based ML is constrained. You can't run GPT-4V locally in a browser tab. You have to choose models that are small, fast, and accurate enough for the task.


The Architecture: Three Layers

Smart Fill is split into three logical layers, each doing a specific job.

Layer 1: Isolation via Web Worker

The entire Smart Fill engine runs in a Web Worker—a background thread that's completely separate from the main browser thread.

Main Thread (UI)          Smart Fill Worker (Background)
─────────────────         ──────────────────────────────────
Canvas renders            TensorFlow.js WASM runtime
User interactions    ──►  BlazeFace face detection
Sidebar updates           Image analysis pipeline
                    ◄──   Compatibility scores + assignments

This separation is critical. TensorFlow.js model inference and image processing are computationally expensive. Without a worker, they'd freeze the UI every time analysis runs. With the worker, analysis runs in parallel on a separate thread while the UI stays responsive.
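To make that concrete, here's a minimal sketch of what the message contract between the two threads might look like. This is not GalleryPlanner's actual protocol—the type and field names are hypothetical—but it shows the pattern: keep the worker's handler a pure function so the heavy lifting stays testable and off the main thread.

```typescript
// Hypothetical message contract between the UI thread and the Smart Fill worker.
type AnalyzeRequest = {
  kind: "analyze";
  photoIds: string[];
};

type AnalyzeResponse = {
  kind: "analyze-done";
  // Compatibility scores keyed by photo id, computed off the main thread.
  scores: Record<string, number>;
};

// Keeping the worker's handler a pure function makes it easy to unit-test
// without spinning up a real Worker.
function handleRequest(
  req: AnalyzeRequest,
  score: (photoId: string) => number
): AnalyzeResponse {
  const scores: Record<string, number> = {};
  for (const id of req.photoIds) {
    scores[id] = score(id);
  }
  return { kind: "analyze-done", scores };
}

// In a real worker file this would be wired up roughly like:
//   self.onmessage = (e) => self.postMessage(handleRequest(e.data, scoreFn));
// and on the main thread:
//   worker.postMessage({ kind: "analyze", photoIds });
```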

Layer 2: Face Detection with BlazeFace

Smart Fill uses BlazeFace, a lightweight face detection model developed by Google and trained on short-range frontal face images.

BlazeFace is designed to run in a browser at real-time speeds. It's only ~1.3MB and can process a frame in under 10ms on modern hardware.

What BlazeFace gives us:

  • Face bounding boxes (x, y, width, height)
  • Six keypoints: eyes, ears, nose, and mouth
  • A confidence score (0–1)

We use face detection to:

  1. Mark photos as portrait-type: Images with detected faces are flagged as portraits.
  2. Prioritize prominent placement: Portrait images score higher for larger frames when the "Target Faces" toggle is on.

The B&W edge case: The BlazeFace model is trained on color (RGB) images, so feeding grayscale images directly as single-channel data significantly reduces detection accuracy. We expand grayscale images to 3-channel RGB (by repeating the luma channel) before inference. While this improves results, detection on monochrome photos is still less accurate than on color originals—a limitation of the underlying model.
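The luma-expansion step is simple enough to show in full. This sketch (function name is ours, not the shipped code) repeats each grayscale value into the R, G, and B channels of a new buffer; the result can then be handed to the model as an ordinary 3-channel image.

```typescript
// Expand a single-channel grayscale buffer to interleaved 3-channel RGB by
// repeating the luma value. Illustrative sketch, not the production code.
function expandGrayToRGB(gray: Uint8Array): Uint8Array {
  const rgb = new Uint8Array(gray.length * 3);
  for (let i = 0; i < gray.length; i++) {
    const v = gray[i];
    rgb[i * 3] = v;     // R
    rgb[i * 3 + 1] = v; // G
    rgb[i * 3 + 2] = v; // B
  }
  return rgb;
}
```

In a TensorFlow.js pipeline, a buffer like this would typically be wrapped into a tensor (e.g. via `tf.tensor3d(rgb, [height, width, 3])`) before inference.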

Layer 3: The Compatibility Scoring Engine

Face detection is just one input. For each possible photo-to-frame assignment, Smart Fill calculates a compatibility score (0–100) from five sub-scores:

Sub-Score 1: Aspect Ratio (0–25 points)

The most important check. Placing a tall portrait photo in a wide landscape frame wastes subject matter and creates awkward cropping. The engine looks for matches within 5% of the target ratio to award full points.
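An illustrative version of this sub-score: full points inside the 5% window, tapering to zero as the mismatch grows. The article only specifies the 5% full-points window and the 0–25 range; the linear falloff here is our assumption.

```typescript
// Aspect-ratio sub-score sketch: 25 points within 5% of the frame's ratio,
// linear falloff to 0 by a 50% mismatch. The falloff shape is an assumption.
function aspectRatioScore(photoRatio: number, frameRatio: number): number {
  const mismatch = Math.abs(photoRatio - frameRatio) / frameRatio;
  if (mismatch <= 0.05) return 25; // within 5%: full points
  return Math.max(0, Math.round(25 * (1 - (mismatch - 0.05) / 0.45)));
}
```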

Sub-Score 2: Resolution (0–25 points)

A critical check, because resolution errors are irreversible. We evaluate the megapixel count of the source image to ensure it can hold up at gallery sizes. Photos below a minimum threshold are penalized heavily to prevent blurry prints.
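A sketch of this check under assumed thresholds—the article specifies only the 0–25 range and a heavy penalty below a minimum megapixel count. The 2 MP floor and 12 MP ceiling here are illustrative values, not the real ones.

```typescript
// Resolution sub-score sketch. Thresholds (2 MP floor, 12 MP ceiling) are
// assumptions for illustration.
function resolutionScore(widthPx: number, heightPx: number): number {
  const megapixels = (widthPx * heightPx) / 1_000_000;
  if (megapixels < 2) return 0;                  // too soft for a large print
  if (megapixels >= 12) return 25;               // plenty for gallery sizes
  return Math.round((25 * (megapixels - 2)) / 10); // linear in between
}
```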

Sub-Score 3: Composition & Orientation (0–20 points)

This ensures that portrait-oriented frames get portrait-oriented photos, and vice versa. It prevents the engine from trying to "force" a landscape shot into a vertical frame, which usually results in losing most of the image's context.
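The orientation check can be as simple as comparing which way each rectangle leans. The all-or-nothing split below is our simplification; only the 0–20 range comes from the article.

```typescript
// Orientation sub-score sketch: full points when photo and frame orientation
// agree, zero otherwise. Treats squares as landscape for simplicity.
function orientationScore(
  photoW: number, photoH: number,
  frameW: number, frameH: number
): number {
  const photoPortrait = photoH > photoW;
  const framePortrait = frameH > frameW;
  return photoPortrait === framePortrait ? 20 : 0;
}
```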

Sub-Score 4: Color Harmony (0–15 points)

This is where it gets interesting. We want color to be distributed across the wall. We extract a dominant color profile from each photo. Based on your settings (like "Prefer B&W" or "Prefer Vibrant"), the engine biases the score to help find images that fit the desired mood.
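The article doesn't specify how the dominant color profile is extracted, so here is one simple approximation: average the pixels, then use the spread between the strongest and weakest channel as a crude vibrancy signal for the "Prefer B&W" / "Prefer Vibrant" bias.

```typescript
// Toy dominant-color extraction: average RGB over an interleaved pixel buffer.
// A real engine might use histogram binning or clustering instead.
function averageColor(rgb: Uint8Array): [number, number, number] {
  let r = 0, g = 0, b = 0;
  const n = rgb.length / 3;
  for (let i = 0; i < rgb.length; i += 3) {
    r += rgb[i];
    g += rgb[i + 1];
    b += rgb[i + 2];
  }
  return [Math.round(r / n), Math.round(g / n), Math.round(b / n)];
}

// Max-min channel spread: near 0 means near-grayscale, large means vibrant.
function saturationSpread([r, g, b]: [number, number, number]): number {
  return Math.max(r, g, b) - Math.min(r, g, b);
}
```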

Sub-Score 5: Face Handling (0–15 points)

If "Target Faces" is enabled and the photo contains detected faces, it gets a bonus. The engine also checks if the face orientation matches the frame (e.g., a single portrait face in a vertical frame).
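Sketched out, the face bonus might look like this. The split between the base bonus and the orientation bonus is our assumption; the article specifies only the 0–15 range and the two conditions.

```typescript
// Face-handling sub-score sketch: flat bonus for any detected face when
// "Target Faces" is on, plus extra for a lone face in a vertical frame.
// The 10/5 split is an assumption.
function faceScore(
  targetFaces: boolean,
  faceCount: number,
  framePortrait: boolean
): number {
  if (!targetFaces || faceCount === 0) return 0;
  let score = 10; // base bonus for detected faces
  if (faceCount === 1 && framePortrait) score += 5; // lone face suits a vertical frame
  return score;
}
```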

Total maximum score: 100 points.


The Assignment Problem

Calculating scores for every photo-frame pair is the easy part. The actual assignment—which photo gets which frame—is an optimization problem formally known as the Assignment Problem.

We use a randomized optimization approach:

  1. Generate multiple candidates: The engine doesn't just produce one layout. It generates several distinct "solutions."
  2. Diversity scoring: Each solution is ranked by how different its photo-to-frame assignments are from the other generated options. This ensures that when you click "Generate Fill Options," you see a variety of possible "looks" for your wall.
  3. Greedy matching with a twist: Within each solution, we use a greedy algorithm to match photos to frames, but with a "temperature" setting that allows for some randomness. This prevents the AI from always picking the same "perfect" photo for the same frame every time.

This approach produces excellent results far faster than exact solvers like the Hungarian algorithm, keeping the experience interactive.
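The "greedy matching with a twist" step can be sketched as follows. Frames are filled one at a time with the best-scoring unused photo, except that a temperature parameter occasionally promotes the runner-up so repeated runs produce different layouts. The names and the exact randomization are our assumptions, not the shipped algorithm.

```typescript
// Greedy assignment with temperature. scores[frame][photo] holds the 0–100
// compatibility score; temperature 0 is strictly greedy. Sketch only.
function greedyAssign(
  scores: number[][],
  temperature: number,
  rand: () => number = Math.random
): number[] {
  const used = new Set<number>();
  const assignment: number[] = [];
  for (const row of scores) {
    // Rank unused photos for this frame, best first.
    const candidates = row
      .map((s, photo) => ({ photo, s }))
      .filter((c) => !used.has(c.photo))
      .sort((a, b) => b.s - a.s);
    // With probability `temperature`, take the runner-up instead of the best.
    const pick =
      candidates.length > 1 && rand() < temperature
        ? candidates[1]
        : candidates[0];
    used.add(pick.photo);
    assignment.push(pick.photo);
  }
  return assignment;
}
```

Injecting `rand` as a parameter keeps the function deterministic under test while still allowing real randomness in production.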


Caching with IndexedDB

Model inference is expensive. Running BlazeFace on 50 photos can take several seconds on the first run. To solve this, we cache all analysis results in IndexedDB, keyed by a unique hash of the image data.

This means:

  • First run: Full analysis, a few seconds per photo.
  • Subsequent runs (same session or after reload): Instant lookup from IndexedDB.
  • Edited files: Modifying a photo and re-uploading it changes its hash, so it's re-analyzed rather than served stale results.

The cache persists across sessions—close the tab and come back a week later, and Smart Fill uses cached results instantly.
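The cache-or-analyze flow looks roughly like this. The real store is IndexedDB (asynchronous, persistent across sessions); a `Map` stands in here so the control flow is easy to follow, and the hash function is a toy stand-in, not what GalleryPlanner actually uses.

```typescript
type Analysis = { faces: number; dominantColor: [number, number, number] };

// Toy content hash (FNV-1a style). Illustrative only — any change to the
// image bytes yields a different key, which is what invalidates the cache.
function toyHash(bytes: Uint8Array): string {
  let h = 2166136261;
  for (const b of bytes) {
    h = Math.imul(h ^ b, 16777619);
  }
  return (h >>> 0).toString(16);
}

// Look up cached analysis by content hash; run the full analysis only on miss.
// The production version does this against IndexedDB, asynchronously.
function analyzeWithCache(
  image: Uint8Array,
  cache: Map<string, Analysis>,
  analyze: (img: Uint8Array) => Analysis
): Analysis {
  const key = toyHash(image);
  const hit = cache.get(key);
  if (hit) return hit;              // subsequent runs: instant lookup
  const result = analyze(image);    // first run: full analysis
  cache.set(key, result);
  return result;
}
```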


Limitations We're Honest About

Local-first AI has real constraints:

Model size caps: We're limited to models small enough to download quickly (a few megabytes). Large models that would improve face detection at unusual angles or in poor light simply aren't feasible in a browser bundle.

No retraining on your data: Cloud AI systems can learn from aggregated user behavior to improve. Ours can't. We ship a fixed model version and update it with each app release.

CPU bound on older hardware: On devices without GPU acceleration (older mobile devices, low-end laptops), inference can be noticeably slower. We show a progress indicator and process images in batches to avoid locking the UI.

Color analysis is approximate: Our dominant color extraction is a simplified approximation, not a perceptually accurate color model. For most gallery walls, it works excellently. For a wall where precise color harmony is the top priority, manual curation will outperform the algorithm.


The Performance Numbers

On a mid-range 2023 MacBook Pro:

Operation                                       Time
─────────────────────────────────────────────   ─────────────────────────────────
BlazeFace model load                            ~800ms (first load, cached after)
Per-image face detection                        ~8–15ms
Per-image color extraction                      ~3ms
Assignment algorithm (40 photos, 12 frames)     ~1ms
Total (12-frame wall, 40 photos, first run)     ~2–4 seconds
Total (cached)                                  ~50ms

What's Next

Smart Fill 2.0 is the current foundation. By moving to local inference, Web Worker isolation, and IndexedDB caching, we've built a system that can scale as browser capabilities improve. Specific things we're exploring for the future:

  • Subject recognition: Beyond faces—detecting animals, landscapes, and travel markers.
  • Composition scoring: Analyzing rule-of-thirds compliance and focal point placement.
  • Improved color matching: Using more advanced perceptual color models for even better harmony.

Try Smart Fill for yourself: Open GalleryPlanner →

Transparency Note: This content was drafted with the assistance of AI tools and reviewed by our human design team for accuracy. Videos were generated using NotebookLM.

Ready to Let Smart Fill Help?

Open GalleryPlanner with Smart Fill ready so you can test photo picks directly against your frame layout.

Launch GalleryPlanner