Overview

As a Founding Engineer at ngram, I built a web-based video editor with AI capabilities that no competing tool offered at the time. Users can manipulate timelines, apply effects, generate captions, and manage clips through natural language rather than manual, drag-and-drop timeline editing alone.


The main idea: an AI companion that understands what the user wants from a conversational prompt and turns it into editing operations. Instead of dragging clips or tweaking keyframes, you just say "trim the intro to 3 seconds" or "add a fade between scenes 2 and 3," and the AI does it on the timeline in real time.
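
To make that concrete, here is an illustrative sketch (hypothetical names, not the production schema) of the kind of normalized command a prompt like "trim the intro to 3 seconds" compiles down to:

```typescript
// Illustrative sketch: the normalized edit command a conversational prompt
// resolves into. All names here are hypothetical, not the actual schema.
type EditCommand =
  | { kind: 'trim'; clipId: string; startMs: number; endMs: number }
  | { kind: 'addTransition'; afterClipId: string; type: 'fade' | 'wipe'; durationMs: number }
  | { kind: 'addCaption'; clipId: string; text: string; startMs: number; endMs: number };

// "trim the intro to 3 seconds" becomes a deterministic, frame-addressable operation:
const trimIntro: EditCommand = {
  kind: 'trim',
  clipId: 'clip-intro',
  startMs: 0,
  endMs: 3000,
};
```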


Building this from scratch meant tightly integrating browser-based rendering, AI inference, and a state management system that handles undo/redo across both AI-initiated and manual edits.


Tech Stack

React: Editor UI, timeline controls, and property panels, with optimized re-rendering for smooth interactions
TypeScript: Type safety across the editor codebase, keeping the data flow between AI outputs and editing operations reliable
AI/ML: NLU pipeline for interpreting edit commands, plus computer vision models for scene detection and caption generation
WebGL: GPU-accelerated rendering for real-time video preview, effects compositing, and transitions in the browser
FFmpeg: Server-side and WASM-based video processing for transcoding, format conversion, and export rendering (see the transcode sketch below)
Node.js: Backend for AI inference orchestration, media asset management, and export job processing
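
As a taste of the in-browser WASM path, here is a minimal transcode sketch. It assumes the 0.11.x API of the @ffmpeg/ffmpeg package; the production pipeline is more involved than this.

```typescript
// Minimal in-browser transcode sketch (assumes @ffmpeg/ffmpeg 0.11.x).
import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';

const ffmpeg = createFFmpeg({ log: true });

async function transcodeToMp4(file: File): Promise<Uint8Array> {
  if (!ffmpeg.isLoaded()) await ffmpeg.load();
  // Write the source into ffmpeg's virtual filesystem, run the transcode,
  // and read the result back out as bytes.
  ffmpeg.FS('writeFile', 'input', await fetchFile(file));
  await ffmpeg.run('-i', 'input', '-c:v', 'libx264', '-preset', 'fast', 'output.mp4');
  return ffmpeg.FS('readFile', 'output.mp4');
}
```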

Architecture

Layered architecture: a React presentation layer talks to the editing engine through a command bus. The AI companion and the traditional UI sit side by side as parallel input layers, both emitting the same normalized edit commands. WebGL handles real-time preview, FFmpeg handles final exports. State is managed through an immutable operation log, which gives us full undo/redo and collaborative editing.
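
A simplified sketch of the operation-log idea, with hypothetical names standing in for the actual engine code: every edit is an immutable entry paired with its inverse, so undo/redo reduces to moving a cursor.

```typescript
// Sketch of an immutable operation log (illustrative, not production code).
interface TimelineState { clips: { id: string; inMs: number; outMs: number }[] }

interface Operation {
  apply(state: TimelineState): TimelineState;
  invert(state: TimelineState): Operation; // built against the pre-apply state
}

class OperationLog {
  private log: { op: Operation; inverse: Operation }[] = [];
  private cursor = 0; // number of currently applied entries

  constructor(private state: TimelineState) {}

  dispatch(op: Operation): void {
    this.log.length = this.cursor;           // drop any dangling redo branch
    const inverse = op.invert(this.state);   // capture the inverse first
    this.state = op.apply(this.state);
    this.log.push({ op, inverse });
    this.cursor++;
  }

  undo(): void {
    if (this.cursor === 0) return;
    this.cursor--;
    this.state = this.log[this.cursor].inverse.apply(this.state);
  }

  redo(): void {
    if (this.cursor === this.log.length) return;
    this.state = this.log[this.cursor].op.apply(this.state);
    this.cursor++;
  }
}
```

Because both input layers emit the same normalized commands, the log never needs to know whether an edit came from the AI companion or from a manual drag.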

Challenges & Solutions

Bridging AI intent with precise editing operations

Natural language is ambiguous, but video editing needs frame-level precision. I designed an intermediate representation layer that converts AI-parsed intent into deterministic editing commands. This meant building a context engine that tracks the current timeline state (clip positions, applied effects, active selections) so the AI can resolve references like "the last clip" or "the transition I just added" into concrete timeline addresses.
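
Here is a rough, illustrative sketch of that reference resolution, with hypothetical types standing in for the real context engine:

```typescript
// Hypothetical sketch: the context engine keeps a snapshot of timeline
// state and maps fuzzy references from the NLU layer to concrete clip ids.
interface EditorContext {
  clips: { id: string; startMs: number }[];  // in timeline order
  selection: string[];                       // currently selected clip ids
  lastInserted?: string;                     // id of the most recent insert
}

type Reference = 'first-clip' | 'last-clip' | 'selection' | 'last-inserted';

function resolveReference(ref: Reference, ctx: EditorContext): string[] {
  switch (ref) {
    case 'first-clip':
      return ctx.clips.length ? [ctx.clips[0].id] : [];
    case 'last-clip':
      return ctx.clips.length ? [ctx.clips[ctx.clips.length - 1].id] : [];
    case 'selection':
      return ctx.selection;
    case 'last-inserted':
      return ctx.lastInserted ? [ctx.lastInserted] : [];
  }
}
```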

Real-time rendering performance in the browser

Rendering video previews with effects, transitions, and captions in real time inside a browser is demanding. I built a tiered rendering system: a lightweight canvas-based preview for scrubbing, a WebGL-accelerated pipeline for playback with effects, and an FFmpeg-based backend for final exports. Memory management and texture pooling were essential to keep the browser from running out of GPU memory during long editing sessions.
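
A simplified sketch of the texture-pooling approach (illustrative names, not the production renderer): reuse GL textures keyed by their dimensions instead of allocating a fresh one per decoded frame, which is what exhausts GPU memory over a long session.

```typescript
// Sketch of a dimension-keyed texture pool for WebGL2 (illustrative).
class TexturePool {
  private free = new Map<string, WebGLTexture[]>();

  constructor(private gl: WebGL2RenderingContext) {}

  acquire(width: number, height: number): WebGLTexture {
    const key = `${width}x${height}`;
    // Reuse a released texture of the same size, or allocate a new one.
    return this.free.get(key)?.pop() ?? this.create(width, height);
  }

  release(texture: WebGLTexture, width: number, height: number): void {
    const key = `${width}x${height}`;
    if (!this.free.has(key)) this.free.set(key, []);
    this.free.get(key)!.push(texture); // keep it warm for the next frame
  }

  private create(width: number, height: number): WebGLTexture {
    const tex = this.gl.createTexture()!;
    this.gl.bindTexture(this.gl.TEXTURE_2D, tex);
    // Immutable storage: one RGBA8 mip level at the requested size.
    this.gl.texStorage2D(this.gl.TEXTURE_2D, 1, this.gl.RGBA8, width, height);
    return tex;
  }
}
```

Since preview frames tend to arrive in a small set of fixed resolutions, keying the pool by dimensions keeps it compact while still eliminating almost all per-frame allocations.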