
Under the Hood: How Smart Fill Runs AI Locally in Your Browser

See how Smart Fill uses TensorFlow.js, BlazeFace, and Web Workers to run local AI photo analysis in your browser without sending images to a server or slowing the editor.

10 min read
Updated March 25, 2026

When you click "Generate Fill Options" in GalleryPlanner, something unusual happens. A lightweight neural network fires up, analyzes dozens of photos for faces and composition, calculates color harmony across every frame, evaluates print quality for each combination, and produces a set of complete wall options—often in under two seconds.

None of that happened on a server. It happened in your browser tab.

Here's a look at how Smart Fill works: why we built it this way, what's running under the hood, and what trade-offs we made along the way.


Why Local-First AI?

When it comes to AI, we decided the best privacy feature is keeping your photos on your device.

Gallery wall photos are personal. They're family portraits, baby pictures, moments from your home. The idea of routing them through a third-party AI API—even a reputable one—felt like the wrong trade-off for a home design tool.

Running inference locally means:

  • Zero server costs for AI: We don't pay per-inference fees, so Smart Fill is included in the base Pro subscription without metering.
  • Works offline: Smart Fill works with no internet connection after the model loads once.
  • Instant results: No roundtrip latency to a remote API.
  • Verifiable privacy: Users can check their browser's network tab and see zero photo uploads.

The downside is real: browser-based ML is constrained. You can't run GPT-4V locally in a browser tab. You have to choose models that are small, fast, and accurate enough for the task.


The Architecture: Three Layers

Smart Fill is split into three logical layers, each doing a specific job.

Layer 1: Isolation via Web Worker

The entire Smart Fill engine runs in a Web Worker—a background thread that's completely separate from the main browser thread.

Main Thread (UI)          Smart Fill Worker (Background)
─────────────────         ──────────────────────────────────
Canvas renders            TensorFlow.js WASM runtime
User interactions    ──►  BlazeFace face detection
Sidebar updates           Image analysis pipeline
                    ◄──   Compatibility scores + assignments

This separation is critical. TensorFlow.js model inference and image processing are computationally expensive. Without a worker, they'd freeze the UI every time analysis runs. With the worker, analysis runs in parallel on a separate thread while the UI stays responsive.
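To make that concrete, here's a minimal sketch of what the message contract between the two threads might look like. This is not GalleryPlanner's actual protocol—the type and field names are hypothetical—but it shows the pattern: keep the worker's handler a pure function so the heavy lifting stays testable and off the main thread.

```typescript
// Hypothetical message contract between the UI thread and the Smart Fill worker.
type AnalyzeRequest = {
  kind: "analyze";
  photoIds: string[];
};

type AnalyzeResponse = {
  kind: "analyze-done";
  // Compatibility scores keyed by photo id, computed off the main thread.
  scores: Record<string, number>;
};

// Keeping the worker's handler a pure function makes it easy to unit-test
// without spinning up a real Worker.
function handleRequest(
  req: AnalyzeRequest,
  score: (photoId: string) => number
): AnalyzeResponse {
  const scores: Record<string, number> = {};
  for (const id of req.photoIds) {
    scores[id] = score(id);
  }
  return { kind: "analyze-done", scores };
}

// In a real worker file this would be wired up roughly like:
//   self.onmessage = (e) => self.postMessage(handleRequest(e.data, scoreFn));
// and on the main thread:
//   worker.postMessage({ kind: "analyze", photoIds });
```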

Layer 2: Face Detection with BlazeFace

Smart Fill uses BlazeFace, a lightweight face detection model developed by Google and trained on short-range frontal face images.

BlazeFace is designed to run in a browser at real-time speeds. It's only ~1.3MB and can process a frame in under 10ms on modern hardware.

What BlazeFace gives us:

  • Face bounding boxes (x, y, width, height)
  • Six keypoints: eyes, ears, nose, and mouth
  • A confidence score (0–1)

We use face detection to:

  1. Mark photos as portrait-type: Images with detected faces are flagged as portraits.
  2. Prioritize prominent placement: Portrait images score higher for larger frames when the "Target Faces" toggle is on.

The B&W edge case: The BlazeFace model is trained on color (RGB) images, so feeding grayscale images directly as single-channel data significantly reduces detection accuracy. We expand grayscale images to 3-channel RGB (by repeating the luma channel) before inference. While this improves results, detection on monochrome photos is still less accurate than on color originals—a limitation of the underlying model.
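The luma-expansion step is simple enough to show in full. This sketch (function name is ours, not the shipped code) repeats each grayscale value into the R, G, and B channels of a new buffer; the result can then be handed to the model as an ordinary 3-channel image.

```typescript
// Expand a single-channel grayscale buffer to interleaved 3-channel RGB by
// repeating the luma value. Illustrative sketch, not the production code.
function expandGrayToRGB(gray: Uint8Array): Uint8Array {
  const rgb = new Uint8Array(gray.length * 3);
  for (let i = 0; i < gray.length; i++) {
    const v = gray[i];
    rgb[i * 3] = v;     // R
    rgb[i * 3 + 1] = v; // G
    rgb[i * 3 + 2] = v; // B
  }
  return rgb;
}
```

In a TensorFlow.js pipeline, a buffer like this would typically be wrapped into a tensor (e.g. via `tf.tensor3d(rgb, [height, width, 3])`) before inference.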

Layer 3: The Compatibility Scoring Engine

Face detection is just one input. For each possible photo-to-frame assignment, Smart Fill calculates a compatibility score (0–100) from five sub-scores:

Sub-Score 1: Aspect Ratio (0–25 points)

The most important check. Placing a tall portrait photo in a wide landscape frame wastes subject matter and creates awkward cropping. The engine looks for matches within 5% of the target ratio to award full points.
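An illustrative version of this sub-score: full points inside the 5% window, tapering to zero as the mismatch grows. The article only specifies the 5% full-points window and the 0–25 range; the linear falloff here is our assumption.

```typescript
// Aspect-ratio sub-score sketch: 25 points within 5% of the frame's ratio,
// linear falloff to 0 by a 50% mismatch. The falloff shape is an assumption.
function aspectRatioScore(photoRatio: number, frameRatio: number): number {
  const mismatch = Math.abs(photoRatio - frameRatio) / frameRatio;
  if (mismatch <= 0.05) return 25; // within 5%: full points
  return Math.max(0, Math.round(25 * (1 - (mismatch - 0.05) / 0.45)));
}
```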

Sub-Score 2: Resolution (0–25 points)

A critical check, because resolution errors are irreversible. We evaluate the megapixel count of the source image to ensure it can hold up at gallery sizes. Photos below a minimum threshold are penalized heavily to prevent blurry prints.
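A sketch of this check under assumed thresholds—the article specifies only the 0–25 range and a heavy penalty below a minimum megapixel count. The 2 MP floor and 12 MP ceiling here are illustrative values, not the real ones.

```typescript
// Resolution sub-score sketch. Thresholds (2 MP floor, 12 MP ceiling) are
// assumptions for illustration.
function resolutionScore(widthPx: number, heightPx: number): number {
  const megapixels = (widthPx * heightPx) / 1_000_000;
  if (megapixels < 2) return 0;                  // too soft for a large print
  if (megapixels >= 12) return 25;               // plenty for gallery sizes
  return Math.round((25 * (megapixels - 2)) / 10); // linear in between
}
```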

Sub-Score 3: Composition & Orientation (0–20 points)

This ensures that portrait-oriented frames get portrait-oriented photos, and vice versa. It prevents the engine from trying to "force" a landscape shot into a vertical frame, which usually results in losing most of the image's context.
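The orientation check can be as simple as comparing which way each rectangle leans. The all-or-nothing split below is our simplification; only the 0–20 range comes from the article.

```typescript
// Orientation sub-score sketch: full points when photo and frame orientation
// agree, zero otherwise. Treats squares as landscape for simplicity.
function orientationScore(
  photoW: number, photoH: number,
  frameW: number, frameH: number
): number {
  const photoPortrait = photoH > photoW;
  const framePortrait = frameH > frameW;
  return photoPortrait === framePortrait ? 20 : 0;
}
```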

Sub-Score 4: Color Harmony (0–15 points)

This is where it gets interesting. We want color to be distributed across the wall. We extract a dominant color profile from each photo. Based on your settings (like "Prefer B&W" or "Prefer Vibrant"), the engine biases the score to help find images that fit the desired mood.
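The article doesn't specify how the dominant color profile is extracted, so here is one simple approximation: average the pixels, then use the spread between the strongest and weakest channel as a crude vibrancy signal for the "Prefer B&W" / "Prefer Vibrant" bias.

```typescript
// Toy dominant-color extraction: average RGB over an interleaved pixel buffer.
// A real engine might use histogram binning or clustering instead.
function averageColor(rgb: Uint8Array): [number, number, number] {
  let r = 0, g = 0, b = 0;
  const n = rgb.length / 3;
  for (let i = 0; i < rgb.length; i += 3) {
    r += rgb[i];
    g += rgb[i + 1];
    b += rgb[i + 2];
  }
  return [Math.round(r / n), Math.round(g / n), Math.round(b / n)];
}

// Max-min channel spread: near 0 means near-grayscale, large means vibrant.
function saturationSpread([r, g, b]: [number, number, number]): number {
  return Math.max(r, g, b) - Math.min(r, g, b);
}
```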

Sub-Score 5: Face Handling (0–15 points)

If "Target Faces" is enabled and the photo contains detected faces, it gets a bonus. The engine also checks if the face orientation matches the frame (e.g., a single portrait face in a vertical frame).
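Sketched out, the face bonus might look like this. The split between the base bonus and the orientation bonus is our assumption; the article specifies only the 0–15 range and the two conditions.

```typescript
// Face-handling sub-score sketch: flat bonus for any detected face when
// "Target Faces" is on, plus extra for a lone face in a vertical frame.
// The 10/5 split is an assumption.
function faceScore(
  targetFaces: boolean,
  faceCount: number,
  framePortrait: boolean
): number {
  if (!targetFaces || faceCount === 0) return 0;
  let score = 10; // base bonus for detected faces
  if (faceCount === 1 && framePortrait) score += 5; // lone face suits a vertical frame
  return score;
}
```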

Total maximum score: 100 points.


The Assignment Problem

Calculating scores for every photo-frame pair is the easy part. The actual assignment—which photo gets which frame—is an optimization problem formally known as the Assignment Problem.

We use a randomized optimization approach:

  1. Generate multiple candidates: The engine doesn't just produce one layout. It generates several distinct "solutions."
  2. Diversity scoring: Each solution is ranked by how different its photo-to-frame assignments are from the other generated options. This ensures that when you click "Generate Fill Options," you see a variety of possible "looks" for your wall.
  3. Greedy matching with a twist: Within each solution, we use a greedy algorithm to match photos to frames, but with a "temperature" setting that allows for some randomness. This prevents the AI from always picking the same "perfect" photo for the same frame every time.

This approach produces excellent results far faster than exact solvers like the Hungarian algorithm, keeping the experience interactive.
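The "greedy matching with a twist" step can be sketched as follows. Frames are filled one at a time with the best-scoring unused photo, except that a temperature parameter occasionally promotes the runner-up so repeated runs produce different layouts. The names and the exact randomization are our assumptions, not the shipped algorithm.

```typescript
// Greedy assignment with temperature. scores[frame][photo] holds the 0–100
// compatibility score; temperature 0 is strictly greedy. Sketch only.
function greedyAssign(
  scores: number[][],
  temperature: number,
  rand: () => number = Math.random
): number[] {
  const used = new Set<number>();
  const assignment: number[] = [];
  for (const row of scores) {
    // Rank unused photos for this frame, best first.
    const candidates = row
      .map((s, photo) => ({ photo, s }))
      .filter((c) => !used.has(c.photo))
      .sort((a, b) => b.s - a.s);
    // With probability `temperature`, take the runner-up instead of the best.
    const pick =
      candidates.length > 1 && rand() < temperature
        ? candidates[1]
        : candidates[0];
    used.add(pick.photo);
    assignment.push(pick.photo);
  }
  return assignment;
}
```

Injecting `rand` as a parameter keeps the function deterministic under test while still allowing real randomness in production.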


Caching with IndexedDB

Model inference is expensive. Running BlazeFace on 50 photos can take several seconds on the first run. To solve this, we cache all analysis results in IndexedDB, keyed by a unique hash of the image data.

This means:

  • First run: Full analysis, a few seconds per photo.
  • Subsequent runs (same session or after reload): Instant lookup from IndexedDB.
  • Edited files: Modifying a photo and re-uploading it changes its hash, so it's re-analyzed rather than served stale results.

The cache persists across sessions—close the tab and come back a week later, and Smart Fill uses cached results instantly.
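The cache-or-analyze flow looks roughly like this. The real store is IndexedDB (asynchronous, persistent across sessions); a `Map` stands in here so the control flow is easy to follow, and the hash function is a toy stand-in, not what GalleryPlanner actually uses.

```typescript
type Analysis = { faces: number; dominantColor: [number, number, number] };

// Toy content hash (FNV-1a style). Illustrative only — any change to the
// image bytes yields a different key, which is what invalidates the cache.
function toyHash(bytes: Uint8Array): string {
  let h = 2166136261;
  for (const b of bytes) {
    h = Math.imul(h ^ b, 16777619);
  }
  return (h >>> 0).toString(16);
}

// Look up cached analysis by content hash; run the full analysis only on miss.
// The production version does this against IndexedDB, asynchronously.
function analyzeWithCache(
  image: Uint8Array,
  cache: Map<string, Analysis>,
  analyze: (img: Uint8Array) => Analysis
): Analysis {
  const key = toyHash(image);
  const hit = cache.get(key);
  if (hit) return hit;              // subsequent runs: instant lookup
  const result = analyze(image);    // first run: full analysis
  cache.set(key, result);
  return result;
}
```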


Limitations We're Honest About

Local-first AI has real constraints:

Model size caps: We're limited to models small enough to download quickly (a few megabytes). Large models that would improve face detection at unusual angles or in poor light simply aren't feasible in a browser bundle.

No retraining on your data: Cloud AI systems can learn from aggregated user behavior to improve. Ours can't. We ship a fixed model version and update it with each app release.

CPU bound on older hardware: On devices without GPU acceleration (older mobile devices, low-end laptops), inference can be noticeably slower. We show a progress indicator and process images in batches to avoid locking the UI.

Color analysis is approximate: Our dominant color extraction is a simplified approximation, not a perceptually accurate color model. For most gallery walls, it works excellently. For a wall where precise color harmony is the top priority, manual curation will outperform the algorithm.


The Performance Numbers

On a mid-range 2023 MacBook Pro:

Operation                                       Time
─────────────────────────────────────────────   ─────────────────────────────────
BlazeFace model load                            ~800ms (first load, cached after)
Per-image face detection                        ~8–15ms
Per-image color extraction                      ~3ms
Assignment algorithm (40 photos, 12 frames)     ~1ms
Total (12-frame wall, 40 photos, first run)     ~2–4 seconds
Total (cached)                                  ~50ms

What's Next

Smart Fill 2.0 is the current foundation. By moving to local inference, Web Worker isolation, and IndexedDB caching, we've built a system that can scale as browser capabilities improve. Specific things we're exploring for the future:

  • Subject recognition: Beyond faces—detecting animals, landscapes, and travel markers.
  • Composition scoring: Analyzing rule-of-thirds compliance and focal point placement.
  • Improved color matching: Using more advanced perceptual color models for even better harmony.

Try Smart Fill for yourself: Open GalleryPlanner →

Transparency Note: This content was drafted with the assistance of AI tools and reviewed by our human design team for accuracy. Videos were generated using NotebookLM.

Ready to Let Smart Fill Help?

Open GalleryPlanner with Smart Fill ready so you can test photo picks directly against your frame layout.

Launch GalleryPlanner