High-Performance Media Processing Tool
Native desktop app for video and image optimization
A native macOS application that optimizes media files for e-commerce product pages. Uses hardware acceleration for video encoding and AI-powered upscaling, delivering professional-quality output with minimal file sizes.
The Problem
E-commerce teams uploading product media faced a tedious workflow: videos were too large for web, images needed compression, and low-resolution assets required expensive upscaling services or manual Photoshop work.
- Product videos too large for Shopify (exceeding upload limits)
- Image compression tools produced visible quality loss
- AI upscaling services charged per-image (expensive at scale)
- No unified tool for video + image optimization workflow
Constraints
- Must work offline, no cloud dependencies for sensitive product imagery
- Processing speed must beat cloud services for typical batch sizes
- Output quality must match professional editing software
- App size under 50MB (avoid Electron bloat for simple utility)
The Approach
Built with Tauri 2.0 and Rust for native performance and a small footprint. Leveraged Apple's VideoToolbox for hardware-accelerated encoding and bundled Real-ESRGAN for offline AI upscaling. Result: a 13MB app (before bundled binaries) vs 175MB for an Electron equivalent.
Why Tauri over Electron?
Electron bundles Chromium (175MB+). Tauri uses native WebView (0MB additional). Result: 93% smaller app, 50MB idle memory vs 300-500MB, and native macOS integration (dialogs, permissions).
Why Rust backend instead of Node.js?
Rust provides direct FFmpeg process management with memory safety. No GC pauses during long video encodes. Compile-time safety for complex command construction. Native async for concurrent operations.
Why bundle FFmpeg instead of FFmpeg.wasm?
FFmpeg.wasm can't access VideoToolbox (hardware acceleration). Native FFmpeg with VideoToolbox is 8-15x faster than software encoding. It also avoids WebAssembly's 2GB file size limit.
Why three upscaling tiers instead of one?
Different content needs different algorithms. HQx for pixel art/animation (fast, clean). Lanczos for photos (balanced). Real-ESRGAN for maximum quality (slow but best). Users choose based on content type.
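As a rough illustration of the three tiers, the sketch below maps each tier to the FFmpeg filter that could implement it. FFmpeg ships an `hqx` filter and a `scale` filter with a `lanczos` flag; the `UpscaleTier` enum, the `ffmpeg_filter` function, and the `factor` parameter are hypothetical names, and the source does not say how the app actually wires the tiers up.

```rust
/// The three upscaling tiers (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpscaleTier {
    /// Fast and clean: suited to pixel art and animation.
    Hqx,
    /// Balanced: suited to photos.
    Lanczos,
    /// Slow but highest quality: handled by the separate Real-ESRGAN binary.
    RealEsrgan,
}

/// Returns an FFmpeg `-vf` filter string for the tiers FFmpeg can run itself,
/// or `None` for Real-ESRGAN, which is assumed to run as its own process.
pub fn ffmpeg_filter(tier: UpscaleTier, factor: u32) -> Option<String> {
    match tier {
        // FFmpeg's hqx filter only accepts n = 2, 3, or 4.
        UpscaleTier::Hqx => Some(format!("hqx=n={}", factor.clamp(2, 4))),
        // Scale both dimensions by `factor` using Lanczos resampling.
        UpscaleTier::Lanczos => Some(format!(
            "scale=iw*{f}:ih*{f}:flags=lanczos",
            f = factor
        )),
        UpscaleTier::RealEsrgan => None,
    }
}
```

Keeping the tier as an enum means a new algorithm has to be handled in every `match`, which the compiler enforces.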
Key Tradeoffs
macOS only (no Windows/Linux)
VideoToolbox is Apple-only. Cross-platform would mean software encoding everywhere, losing the performance advantage. Target users (design team) all use Macs.
Real-ESRGAN limited to 150 frames (~5 seconds of video)
Frame-by-frame AI upscaling is extremely slow: even 150 frames take minutes. Longer videos use traditional upscaling, with the option to segment critical scenes for AI enhancement.
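The 150-frame cap can be expressed as a simple planning step. This is a minimal sketch, assuming the frame count is estimated from duration and frame rate; `MAX_AI_FRAMES`, `VideoUpscalePlan`, `estimate_frames`, and `plan_upscale` are hypothetical names, not the app's own.

```rust
/// Frame cap for AI upscaling, taken from the limit stated above.
pub const MAX_AI_FRAMES: u64 = 150;

/// Which upscaling path a clip would take (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum VideoUpscalePlan {
    /// Short enough for frame-by-frame Real-ESRGAN.
    AiFrameByFrame { frames: u64 },
    /// Too long: fall back to traditional (filter-based) upscaling.
    Traditional { frames: u64 },
}

/// Estimates the frame count from duration and frame rate, rounding up.
pub fn estimate_frames(duration_secs: f64, fps: f64) -> u64 {
    (duration_secs * fps).ceil().max(0.0) as u64
}

/// Picks the upscaling path based on the frame cap.
pub fn plan_upscale(duration_secs: f64, fps: f64) -> VideoUpscalePlan {
    let frames = estimate_frames(duration_secs, fps);
    if frames <= MAX_AI_FRAMES {
        VideoUpscalePlan::AiFrameByFrame { frames }
    } else {
        VideoUpscalePlan::Traditional { frames }
    }
}
```

At 30 fps the cap works out to 5 seconds; at 24 fps the same cap allows 6.25 seconds.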
Bundled binaries add 174MB to distribution
FFmpeg (148MB) + Real-ESRGAN (26MB) ensure the tool works without external dependencies. Users don't need Homebrew knowledge. Tradeoff: larger download for a self-contained experience.
Implementation Highlights
VideoToolbox hardware acceleration
Automatic detection of Apple Silicon. Uses h264_videotoolbox encoder when available, falls back to libx264 on Intel. Achieves 8-15x realtime encoding speed on M1/M2/M3/M4 Macs.
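The encoder selection described above could look roughly like the following. `h264_videotoolbox` and `libx264` are real FFmpeg encoder names, and `std::env::consts::{OS, ARCH}` report `"macos"` and `"aarch64"` on Apple Silicon. The function names and the quality settings (`-b:v 4M`, `-crf 23`) are illustrative assumptions, not values from the project.

```rust
/// Picks the H.264 encoder for a platform: VideoToolbox on Apple Silicon,
/// software libx264 everywhere else. In the app this would be called with
/// `std::env::consts::OS` and `std::env::consts::ARCH`.
pub fn choose_encoder(os: &str, arch: &str) -> &'static str {
    if os == "macos" && arch == "aarch64" {
        "h264_videotoolbox"
    } else {
        "libx264"
    }
}

/// Builds an FFmpeg argument list for the chosen encoder.
/// The bitrate and CRF values are placeholders (assumptions).
pub fn encode_args(input: &str, output: &str, encoder: &str) -> Vec<String> {
    let mut args: Vec<String> = ["-y", "-i", input, "-c:v", encoder]
        .iter()
        .map(|s| s.to_string())
        .collect();
    if encoder == "h264_videotoolbox" {
        // The hardware encoder is driven by a target bitrate here.
        args.extend(["-b:v", "4M"].iter().map(|s| s.to_string()));
    } else {
        // libx264 uses constant-rate-factor quality control.
        args.extend(["-crf", "23", "-preset", "medium"].iter().map(|s| s.to_string()));
    }
    args.push(output.to_string());
    args
}
```

Passing the platform in as arguments, rather than reading it inside the function, keeps the fallback path testable on any machine.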
FFmpeg path resolution with fallbacks
Checks bundled binary first, then system PATH, then Homebrew location. Ensures app works out-of-box while allowing users to substitute their own FFmpeg version.
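A minimal sketch of that lookup order follows. The bundled directory and the `candidate_paths` / `resolve_ffmpeg` names are hypothetical; the two Homebrew prefixes are the standard defaults (`/opt/homebrew` on Apple Silicon, `/usr/local` on Intel). The existence check is passed in as a closure so the ordering can be tested without touching the file system.

```rust
use std::path::{Path, PathBuf};

/// Lists FFmpeg candidates in priority order:
/// bundled binary, then each PATH entry, then the default Homebrew prefixes.
pub fn candidate_paths(bundled_dir: &Path, path_var: &str) -> Vec<PathBuf> {
    let mut candidates = vec![bundled_dir.join("ffmpeg")];
    candidates.extend(std::env::split_paths(path_var).map(|dir| dir.join("ffmpeg")));
    candidates.push(PathBuf::from("/opt/homebrew/bin/ffmpeg")); // Apple Silicon
    candidates.push(PathBuf::from("/usr/local/bin/ffmpeg")); // Intel
    candidates
}

/// Returns the first candidate for which `exists` is true.
/// In production `exists` would be `|p| p.is_file()`.
pub fn resolve_ffmpeg<F>(candidates: &[PathBuf], exists: F) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    candidates.iter().find(|p| exists(p.as_path())).cloned()
}
```

Because the bundled path is always first, a system FFmpeg is only picked up when the bundled binary is missing; letting users prefer their own build would need an explicit override in front of this list.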
Real-time progress streaming
Rust parses FFmpeg's -progress pipe output in real-time. Calculates ETA, current speed, and percentage. Progress events stream to React frontend via Tauri IPC.
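To make the parsing step concrete: FFmpeg's `-progress` output is a stream of `key=value` lines (including `out_time_us`, `speed`, and a closing `progress=continue` or `progress=end` per block). The sketch below accumulates those fields and emits a snapshot at the end of each block. The `ProgressParser` and `Progress` types are hypothetical, and the total duration is assumed to be known beforehand (for example from a probe step); forwarding snapshots over Tauri IPC is left out.

```rust
/// One progress snapshot, ready to be sent to the frontend.
#[derive(Debug, Clone, PartialEq)]
pub struct Progress {
    pub percent: f64,
    pub speed: Option<f64>,
    pub eta_secs: Option<f64>,
    pub done: bool,
}

/// Accumulates key=value lines from FFmpeg's -progress output.
pub struct ProgressParser {
    total_us: u64,
    out_time_us: u64,
    speed: Option<f64>,
}

impl ProgressParser {
    pub fn new(total_duration_secs: f64) -> Self {
        ProgressParser {
            total_us: (total_duration_secs.max(0.0) * 1_000_000.0) as u64,
            out_time_us: 0,
            speed: None,
        }
    }

    /// Feeds one line; returns a snapshot when a progress block ends.
    pub fn feed(&mut self, line: &str) -> Option<Progress> {
        let (key, value) = line.trim().split_once('=')?;
        let value = value.trim();
        match key {
            "out_time_us" => {
                // May be "N/A" or negative early on; keep the last good value.
                if let Ok(v) = value.parse::<u64>() {
                    self.out_time_us = v;
                }
                None
            }
            "speed" => {
                // Reported as e.g. "1.5x", or "N/A" before the first frame.
                self.speed = value.trim_end_matches('x').parse::<f64>().ok();
                None
            }
            "progress" => Some(self.snapshot(value == "end")),
            _ => None,
        }
    }

    fn snapshot(&self, done: bool) -> Progress {
        let percent = if done {
            100.0
        } else if self.total_us == 0 {
            0.0
        } else {
            (self.out_time_us as f64 / self.total_us as f64 * 100.0).min(100.0)
        };
        let remaining_secs =
            self.total_us.saturating_sub(self.out_time_us) as f64 / 1_000_000.0;
        let eta_secs = match self.speed {
            _ if done => Some(0.0),
            // Remaining media time divided by the encode speed multiplier.
            Some(s) if s > 0.0 => Some(remaining_secs / s),
            _ => None,
        };
        Progress { percent, speed: self.speed, eta_secs, done }
    }
}
```

The ETA is the remaining media time divided by the reported speed multiplier, so it fluctuates as FFmpeg's speed estimate settles.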
Memory-optimized tiling for large images
Real-ESRGAN processes images in 256px tiles, reducing peak memory by up to ~70% (from 3-5GB to about 1.5GB). Enables upscaling 100MP+ images without OOM crashes.
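The idea behind tiling can be sketched as a grid computation: the image is split into tiles of at most 256px per side, and each tile is upscaled independently so that only one tile's buffers are live at a time. `Tile` and `tile_grid` are hypothetical names. In practice the tiling is likely done inside the Real-ESRGAN binary (the `realesrgan-ncnn-vulkan` build exposes a tile-size option, though the source does not say which build is bundled), and real implementations usually pad tiles with a small overlap to hide seams, which this sketch omits.

```rust
/// A rectangular region of the source image, in pixels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tile {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Splits an image into row-major tiles of at most `tile` pixels per side.
/// Edge tiles are smaller when the image size is not a multiple of `tile`.
pub fn tile_grid(img_w: u32, img_h: u32, tile: u32) -> Vec<Tile> {
    assert!(tile > 0, "tile size must be positive");
    let mut tiles = Vec::new();
    let mut y = 0;
    while y < img_h {
        let height = tile.min(img_h - y);
        let mut x = 0;
        while x < img_w {
            let width = tile.min(img_w - x);
            tiles.push(Tile { x, y, width, height });
            x += width;
        }
        y += height;
    }
    tiles
}
```

Smaller tiles lower peak memory further but increase the number of passes, so the tile size trades memory for throughput.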
Subprocess crash isolation
FFmpeg/Real-ESRGAN run as child processes. If encoder crashes on corrupted input, main app continues. Global PID tracking enables clean cancellation.
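A minimal sketch of this pattern, using only the standard library: each child's PID is recorded while it runs, and a failed spawn or a crash surfaces as an ordinary `Result` or `ExitStatus` rather than taking the app down. `ACTIVE_PIDS`, `run_isolated`, and `cancel_all` are hypothetical names, and cancelling by shelling out to `kill` is an assumption that fits macOS but is not confirmed by the source.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::sync::Mutex;

/// PIDs of child processes that are currently running.
static ACTIVE_PIDS: Mutex<Vec<u32>> = Mutex::new(Vec::new());

/// Returns a copy of the currently tracked PIDs.
pub fn active_pids() -> Vec<u32> {
    ACTIVE_PIDS.lock().unwrap().clone()
}

/// Runs a command as a child process and waits for it to finish.
/// A spawn failure is returned as `Err`; a crash shows up as a
/// non-success `ExitStatus`. Neither affects the calling process.
pub fn run_isolated(cmd: &mut Command) -> io::Result<ExitStatus> {
    let mut child = cmd.spawn()?;
    let pid = child.id();
    // The lock is released at the end of each statement,
    // so it is never held while waiting on the child.
    ACTIVE_PIDS.lock().unwrap().push(pid);
    let status = child.wait();
    ACTIVE_PIDS.lock().unwrap().retain(|p| *p != pid);
    status
}

/// Asks every tracked child to terminate (Unix-only assumption:
/// uses the system `kill` utility). Returns how many were signalled.
pub fn cancel_all() -> usize {
    let pids = active_pids();
    for pid in &pids {
        let _ = Command::new("kill").arg("-TERM").arg(pid.to_string()).status();
    }
    pids.len()
}
```

A caller would check `status.success()` on the returned value and report a failed encode to the user while the rest of the batch carries on.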
Outcomes
What I'd Do Differently
- Add Windows support from the start using different acceleration APIs (NVENC, QSV). macOS-only limits adoption even though it was right for initial use case.
- Implement pause/resume for long operations. Current kill-only approach frustrates users who need to interrupt and continue later.
- Build better preview functionality. Users want to compare before/after before committing to full encode, especially for quality-sensitive content.