The Problem
YouTube videos are full of filler. Intros, sponsor segments, recaps, and off-topic tangents eat up huge chunks of your time. You know the answer to your question is somewhere in that 45-minute video, but you're stuck scrubbing through the timeline trying to find it.
I built SkipVid to fix this. It's a Chrome extension backed by a FastAPI server that uses a combination of machine learning, natural language processing, and large language models to give you intelligent control over any YouTube video.
What It Does
- Auto-skip intros and sponsors - An XGBoost model trained on video transcript patterns detects intro/sponsor segments and skips them automatically
- Keyword search - Jump to exact moments where specific words are spoken (free tier)
- Concept search - Find topics even when the exact words aren't used, powered by spaCy NLP
- Search-AI - Ask natural language questions like "where does he talk about the database setup?" and get timestamps
- Ask-AI - Have a full conversation with the video content using an LLM
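To make the keyword search feature concrete, here is a minimal sketch, assuming the transcript arrives as a list of timed caption segments. The `Segment` type and `find_keyword` function are hypothetical names for illustration, not SkipVid's actual code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the video
    text: str     # caption text spoken during this segment


def find_keyword(segments: list[Segment], keyword: str) -> list[float]:
    """Return start times of segments that contain `keyword` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [seg.start for seg in segments if pattern.search(seg.text)]


transcript = [
    Segment(0.0, "Welcome back to the channel"),
    Segment(12.5, "First we set up the database"),
    Segment(47.0, "Then the Database migrations run"),
]
print(find_keyword(transcript, "database"))  # [12.5, 47.0]
```

Whole-word, case-insensitive matching keeps "data" from matching "database". Concept search needs more than this, since it has to find topics when the exact words never appear.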
Architecture
Backend (FastAPI + Python)
The backend runs on Python 3.11 with FastAPI, deployed on Render via Docker. It handles all the heavy lifting: transcript fetching, ML inference, NLP processing, and LLM orchestration.
# Key backend components
├── app/
│ ├── main.py # FastAPI app, routes, middleware
│ ├── routes_skip.py # /api/skip - ML + SponsorBlock fusion
│ ├── routes_semantic_free.py # Concept search (spaCy)
│ ├── routes_stripe.py # Stripe checkout + webhooks
│ ├── deps.py # Auth dependencies
│ └── security.py # JWT handling
├── intro_detector/
│ └── model/ # XGBoost trained model (92.9% accuracy)
└── Dockerfile
Chrome Extension (MV3)
The extension uses Manifest V3 with a shadow DOM UI injected into YouTube pages. This keeps the extension's styles completely isolated from YouTube's CSS. A service worker (bg.js) handles all API communication, and the content script manages the UI and video player integration.
The Intro Detection Model
The intro detection model is one of the most interesting parts of the project. I trained an XGBoost classifier on transcript patterns to detect intro and sponsor segments. The model looks at features like:
- Position within the video (intros tend to be in the first 10-15%)
- Transcript text patterns (sponsor language, subscribe/like prompts)
- Segment duration and timing characteristics
- SponsorBlock community data for training labels
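As a rough illustration of the feature list above, the sketch below turns one transcript segment into a flat numeric feature dict. The phrase list, the feature names, and the `TimedSegment` type are all assumptions made for this example. The post does not describe the model's real feature set in this detail.

```python
from dataclasses import dataclass

# Hypothetical phrase list; the real model's vocabulary is not described here.
SPONSOR_PHRASES = ("sponsored by", "use code", "link in the description", "subscribe")


@dataclass(frozen=True)
class TimedSegment:
    start: float     # seconds from the beginning of the video
    duration: float  # seconds
    text: str


def extract_features(seg: TimedSegment, video_length: float) -> dict[str, float]:
    """Turn one transcript segment into a flat numeric feature dict."""
    text = seg.text.lower()
    return {
        # Intros tend to sit near 0.0 on this scale.
        "relative_position": seg.start / video_length,
        "duration": seg.duration,
        "sponsor_phrase_hits": float(sum(text.count(p) for p in SPONSOR_PHRASES)),
        "word_count": float(len(text.split())),
    }


seg = TimedSegment(start=30.0, duration=20.0,
                   text="This video is sponsored by Acme, use code SKIP")
features = extract_features(seg, video_length=600.0)
print(features)
```

Rows like this, paired with labels derived from SponsorBlock, are the kind of input a gradient-boosted classifier can be trained on.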
The result: 92.9% accuracy on the test set. At inference time, the model's predictions are fused with live SponsorBlock API data: verified community data is used when available, and ML predictions cover videos without community labels.
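The fusion step could look roughly like the sketch below. It assumes community segments simply take priority when any exist, and that ML predictions are kept only above a confidence threshold. The 0.8 threshold and the function names are illustrative assumptions, not the values used in `/api/skip`.

```python
Interval = tuple[float, float]  # (start_seconds, end_seconds)


def merge_intervals(intervals: list[Interval]) -> list[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fuse_segments(
    community: list[Interval],
    predicted: list[tuple[Interval, float]],
    threshold: float = 0.8,  # assumed cutoff, for illustration only
) -> list[Interval]:
    """Prefer community-verified segments; fall back to confident ML predictions."""
    if community:
        return merge_intervals(community)
    confident = [interval for interval, score in predicted if score >= threshold]
    return merge_intervals(confident)


ml = [((0.0, 14.0), 0.93), ((300.0, 320.0), 0.41), ((10.0, 18.0), 0.85)]
print(fuse_segments([], ml))             # [(0.0, 18.0)]
print(fuse_segments([(0.0, 12.0)], ml))  # [(0.0, 12.0)]
```

Merging overlapping intervals means the player makes one seek per skipped region rather than several back-to-back jumps.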
Auth & Monetization
Authentication flows through Supabase (Google OAuth) with JWTs. The extension stores license tokens locally and validates them against the backend. Payments are handled through Stripe with a freemium model:
Tier Structure
Free: Keyword search + auto-skip
Regular ($5/mo): + Concept search + Search-AI
Pro ($10/mo): + Ask-AI chat
All paid tiers include a 30-day free trial.
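Because each tier adds features on top of the one below it, the gating logic can be expressed as a minimum tier per feature. The sketch below is one way to do that. The feature keys and the `can_use` helper are hypothetical names, and a real backend would derive the tier from the validated license token.

```python
from enum import IntEnum


class Tier(IntEnum):
    FREE = 0
    REGULAR = 1
    PRO = 2


# Minimum tier required for each feature, following the tier list above.
REQUIRED_TIER = {
    "keyword_search": Tier.FREE,
    "auto_skip": Tier.FREE,
    "concept_search": Tier.REGULAR,
    "search_ai": Tier.REGULAR,
    "ask_ai": Tier.PRO,
}


def can_use(tier: Tier, feature: str) -> bool:
    """Tiers are cumulative: a feature is allowed at its minimum tier and above."""
    return tier >= REQUIRED_TIER[feature]


print(can_use(Tier.REGULAR, "search_ai"))  # True
print(can_use(Tier.REGULAR, "ask_ai"))     # False
```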
Challenges & Lessons
Building a Chrome extension that integrates deeply with YouTube taught me a lot about working with third-party platforms:
- YouTube's SPA navigation - YouTube doesn't do full page loads, so the extension needs to detect navigation via the History API and MutationObserver
- Chrome Web Store review process - Updates can take days to get approved, which creates awkward windows where the extension and backend are out of sync
- Shadow DOM isolation - Essential for keeping YouTube's aggressive CSS from breaking the extension UI, but it means you can't use global stylesheets
- Rate limiting LLM calls - DeepSeek API costs add up fast with chatty users, so careful prompt engineering and caching are critical
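For the last point, a minimal sketch of caching plus per-user rate limiting might look like this. The `CachedLLM` class, its default limits, and the prompt format are assumptions made for illustration. The LLM call itself is passed in as a plain function, so nothing here depends on the DeepSeek client.

```python
import hashlib
import time
from typing import Callable


class CachedLLM:
    """Wraps an LLM call with a TTL cache and a sliding-window per-user cap."""

    def __init__(self, call_llm: Callable[[str], str], ttl: float = 3600.0,
                 max_calls: int = 20, window: float = 60.0) -> None:
        self._call_llm = call_llm
        self._ttl = ttl
        self._max_calls = max_calls
        self._window = window
        self._cache: dict[str, tuple[float, str]] = {}
        self._calls: dict[str, list[float]] = {}

    @staticmethod
    def _key(video_id: str, question: str) -> str:
        # Normalize case and whitespace so near-identical questions share an entry.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(f"{video_id}\n{normalized}".encode()).hexdigest()

    def ask(self, user_id: str, video_id: str, question: str) -> str:
        now = time.monotonic()
        key = self._key(video_id, question)
        cached = self._cache.get(key)
        if cached and now - cached[0] < self._ttl:
            return cached[1]  # cache hits do not count against the user's quota

        recent = [t for t in self._calls.get(user_id, []) if now - t < self._window]
        if len(recent) >= self._max_calls:
            raise RuntimeError("rate limit exceeded")
        recent.append(now)
        self._calls[user_id] = recent

        answer = self._call_llm(f"Video {video_id}: {question}")
        self._cache[key] = (now, answer)
        return answer


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call; records each prompt it receives."""
    calls.append(prompt)
    return "around 12:30"


llm = CachedLLM(fake_llm, max_calls=1)
llm.ask("u1", "abc", "Where is the database setup?")
llm.ask("u1", "abc", "  where is the DATABASE setup? ")  # served from cache
print(len(calls))  # 1
```

The cache is keyed by video and question rather than by user, so a popular question about a popular video costs one API call no matter how many people ask it.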
Stack Summary
- Backend: Python 3.11, FastAPI, Docker, Render
- ML: XGBoost, scikit-learn, spaCy
- LLM: DeepSeek API for Search-AI and Ask-AI
- Auth: Supabase (Google OAuth), JWT, license tokens
- Payments: Stripe subscriptions with webhook handling
- Extension: Chrome MV3, shadow DOM, service worker
Links
- SkipVid Website
- GitHub Repository
- Chrome Web Store (search "SkipVid")