Overview

As a Founding Engineer at ngram, I built a web-based video editor with AI capabilities that no competing tool offered at the time. Users can manipulate timelines, apply effects, generate captions, and manage clips through natural language rather than manual, drag-and-drop timeline editing alone.


The main idea: an AI companion that understands what the user wants from a conversational prompt and turns it into editing operations. Instead of dragging clips or tweaking keyframes, you just say "trim the intro to 3 seconds" or "add a fade between scenes 2 and 3," and the AI does it on the timeline in real time.
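
To make that concrete, here is an illustrative sketch (hypothetical names, not the production schema) of the kind of normalized command a prompt like "trim the intro to 3 seconds" compiles down to:

```typescript
// Illustrative sketch: the normalized edit command a conversational prompt
// resolves into. All names here are hypothetical, not the actual schema.
type EditCommand =
  | { kind: 'trim'; clipId: string; startMs: number; endMs: number }
  | { kind: 'addTransition'; afterClipId: string; type: 'fade' | 'wipe'; durationMs: number }
  | { kind: 'addCaption'; clipId: string; text: string; startMs: number; endMs: number };

// "trim the intro to 3 seconds" becomes a deterministic, frame-addressable operation:
const trimIntro: EditCommand = {
  kind: 'trim',
  clipId: 'clip-intro',
  startMs: 0,
  endMs: 3000,
};
```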


Building this from scratch meant tightly integrating browser-based rendering, AI inference, and a state management system that handles undo/redo across both AI-initiated and manual edits.


Tech Stack

React: Editor UI, timeline controls, and property panels, with optimized re-rendering for smooth interactions
TypeScript: Type safety across the editor codebase, keeping the data flow between AI outputs and editing operations reliable
AI/ML: NLU pipeline for interpreting edit commands, plus computer vision models for scene detection and caption generation
WebGL: GPU-accelerated rendering for real-time video preview, effects compositing, and transitions in the browser
FFmpeg: Server-side and WASM-based video processing for transcoding, format conversion, and export rendering (see the transcode sketch below)
Node.js: Backend for AI inference orchestration, media asset management, and export job processing
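
As a taste of the in-browser WASM path, here is a minimal transcode sketch. It assumes the 0.11.x API of the @ffmpeg/ffmpeg package; the production pipeline is more involved than this.

```typescript
// Minimal in-browser transcode sketch (assumes @ffmpeg/ffmpeg 0.11.x).
import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';

const ffmpeg = createFFmpeg({ log: true });

async function transcodeToMp4(file: File): Promise<Uint8Array> {
  if (!ffmpeg.isLoaded()) await ffmpeg.load();
  // Write the source into ffmpeg's virtual filesystem, run the transcode,
  // and read the result back out as bytes.
  ffmpeg.FS('writeFile', 'input', await fetchFile(file));
  await ffmpeg.run('-i', 'input', '-c:v', 'libx264', '-preset', 'fast', 'output.mp4');
  return ffmpeg.FS('readFile', 'output.mp4');
}
```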

Architecture

Layered architecture: a React presentation layer talks to the editing engine through a command bus. The AI companion and the traditional UI sit side by side as parallel input layers, both emitting the same normalized edit commands. WebGL handles real-time preview, FFmpeg handles final exports. State is managed through an immutable operation log, which gives us full undo/redo and collaborative editing.
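
A simplified sketch of the operation-log idea, with hypothetical names standing in for the actual engine code: every edit is an immutable entry paired with its inverse, so undo/redo reduces to moving a cursor.

```typescript
// Sketch of an immutable operation log (illustrative, not production code).
interface TimelineState { clips: { id: string; inMs: number; outMs: number }[] }

interface Operation {
  apply(state: TimelineState): TimelineState;
  invert(state: TimelineState): Operation; // built against the pre-apply state
}

class OperationLog {
  private log: { op: Operation; inverse: Operation }[] = [];
  private cursor = 0; // number of currently applied entries

  constructor(private state: TimelineState) {}

  dispatch(op: Operation): void {
    this.log.length = this.cursor;           // drop any dangling redo branch
    const inverse = op.invert(this.state);   // capture the inverse first
    this.state = op.apply(this.state);
    this.log.push({ op, inverse });
    this.cursor++;
  }

  undo(): void {
    if (this.cursor === 0) return;
    this.cursor--;
    this.state = this.log[this.cursor].inverse.apply(this.state);
  }

  redo(): void {
    if (this.cursor === this.log.length) return;
    this.state = this.log[this.cursor].op.apply(this.state);
    this.cursor++;
  }
}
```

Because both input layers emit the same normalized commands, the log never needs to know whether an edit came from the AI companion or from a manual drag.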

Challenges & Solutions

Bridging AI intent with precise editing operations

Natural language is ambiguous, but video editing needs frame-level precision. I designed an intermediate representation layer that converts AI-parsed intent into deterministic editing commands. This meant building a context engine that tracks the current timeline state (clip positions, applied effects, active selections) so the AI can resolve references like "the last clip" or "the transition I just added" into concrete timeline addresses.
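
Here is a rough, illustrative sketch of that reference resolution, with hypothetical types standing in for the real context engine:

```typescript
// Hypothetical sketch: the context engine keeps a snapshot of timeline
// state and maps fuzzy references from the NLU layer to concrete clip ids.
interface EditorContext {
  clips: { id: string; startMs: number }[];  // in timeline order
  selection: string[];                       // currently selected clip ids
  lastInserted?: string;                     // id of the most recent insert
}

type Reference = 'first-clip' | 'last-clip' | 'selection' | 'last-inserted';

function resolveReference(ref: Reference, ctx: EditorContext): string[] {
  switch (ref) {
    case 'first-clip':
      return ctx.clips.length ? [ctx.clips[0].id] : [];
    case 'last-clip':
      return ctx.clips.length ? [ctx.clips[ctx.clips.length - 1].id] : [];
    case 'selection':
      return ctx.selection;
    case 'last-inserted':
      return ctx.lastInserted ? [ctx.lastInserted] : [];
  }
}
```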

Real-time rendering performance in the browser

Rendering video previews with effects, transitions, and captions in real time inside a browser is demanding. I built a tiered rendering system: a lightweight canvas-based preview for scrubbing, a WebGL-accelerated pipeline for playback with effects, and an FFmpeg-based backend for final exports. Memory management and texture pooling were essential to keep the browser from running out of GPU memory during long editing sessions.
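
A simplified sketch of the texture-pooling approach (illustrative names, not the production renderer): reuse GL textures keyed by their dimensions instead of allocating a fresh one per decoded frame, which is what exhausts GPU memory over a long session.

```typescript
// Sketch of a dimension-keyed texture pool for WebGL2 (illustrative).
class TexturePool {
  private free = new Map<string, WebGLTexture[]>();

  constructor(private gl: WebGL2RenderingContext) {}

  acquire(width: number, height: number): WebGLTexture {
    const key = `${width}x${height}`;
    // Reuse a released texture of the same size, or allocate a new one.
    return this.free.get(key)?.pop() ?? this.create(width, height);
  }

  release(texture: WebGLTexture, width: number, height: number): void {
    const key = `${width}x${height}`;
    if (!this.free.has(key)) this.free.set(key, []);
    this.free.get(key)!.push(texture); // keep it warm for the next frame
  }

  private create(width: number, height: number): WebGLTexture {
    const tex = this.gl.createTexture()!;
    this.gl.bindTexture(this.gl.TEXTURE_2D, tex);
    // Immutable storage: one RGBA8 mip level at the requested size.
    this.gl.texStorage2D(this.gl.TEXTURE_2D, 1, this.gl.RGBA8, width, height);
    return tex;
  }
}
```

Since preview frames tend to arrive in a small set of fixed resolutions, keying the pool by dimensions keeps it compact while still eliminating almost all per-frame allocations.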