Feature: Reel Extraction Cache (Global, Multi-Model, Merged)
ID: 0009
Status: In progress
Owner: @satya
Created: 2026-05-14
Updated: 2026-05-14
Related ADRs: 0003 (Python workers)
Depends on: 0008 (reel video imports)

1. Why
Spec 0008 ships per-import Gemini extractions. Every share of the same reel re-pays the Gemini cost and ~60s of latency, even though the output for a given (URL, model, prompt_version) is identical. This spec turns that one-shot extraction into a globally-shared cache: once any user has imported a reel, every subsequent share of the same reel, by anyone, returns the cached structured data instantly.
It also splits “raw model output” from “the canonical merged view”, so we can re-run with a better model later, keep history, and let users (eventually) hand-merge.
2. Who it is for
P1: Solo planner (PRODUCT.md §2). Power users with 20+ saved reels benefit most: roughly 90% of the inspiration corpus tends to be popular travel reels with high re-share rates.
3. Scope
In scope
- F0009.1 New tables `reel_assets`, `reel_extractions`, `reel_extraction_merges`. Migration 0015.
- F0009.2 URL canonicalizer producing `(platform, external_id)`. Instagram `/p/<code>` and `/reel/<code>` collapse to the same key.
- F0009.3 Thumbnail extraction in the worker via ffmpeg (640px wide, ~25% in to skip title cards).
- F0009.4 Pipeline cache-hit short-circuit: worker checks `reel_extractions` before running yt-dlp or calling Gemini.
- F0009.5 Auto-merge strategy `auto:latest_pro` recomputed after every new extraction insert.
- F0009.6 Backend `POST /trips/:id/imports` short-circuits to the merged view when a cached extraction exists: the `imports` row is born with `status='ready'` and a populated `import_drafts.payload`.
- F0009.7 Backend `POST /imports/:id/retry { model? }` accepts a model override so a user can deliberately re-run with Pro.
- F0009.8 Backend `GET /reels/:hash` exposes the asset, all extractions, and the merged view to any authenticated user.
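A minimal sketch of the F0009.2 canonicalizer, to make the "collapse to the same key" rule concrete. Only the function name `canonicalize_reel_url` and the `(platform, external_id)` shape come from this spec; the host handling, the Instagram-only coverage, and the `("unknown", "")` fallback are assumptions for illustration, not the shipped implementation.

```python
from typing import Tuple
from urllib.parse import urlsplit


def canonicalize_reel_url(url: str) -> Tuple[str, str]:
    """Return (platform, external_id); external_id is '' when it is not known."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Only the path is inspected, so ?igsh= and other query params are dropped.
    segments = [s for s in parts.path.split("/") if s]
    if host == "instagram.com" and len(segments) >= 2 and segments[0] in ("p", "reel"):
        # /p/<code> and /reel/<code> share one key: the shortcode.
        return ("instagram", segments[1])
    # Assumption: anything unrecognised falls back to 'unknown' with no id.
    return ("unknown", "")
```

With this shape, `https://www.instagram.com/p/ABC123/?igsh=xyz` and `https://instagram.com/reel/ABC123/` both map to `("instagram", "ABC123")`. TikTok and YouTube handling is left out of the sketch.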
Out of scope (this milestone)
- `auto:consensus` and `manual` merge strategies (post-v0).
- Mobile thumbnail rendering / “imported N times” badge.
- Takedown / DMCA UI (DB has `deleted_at`; admin tool comes later).
- TikTok short-link resolution (`vm.tiktok.com` 302). Resolve in v1.
4. User stories
- As any user, when I share a reel another user already imported, I see a draft within ~1s and we don’t burn Gemini quota.
- As a user, I can hit “re-extract with Pro” on a Flash-parsed reel and get the higher-quality output without losing the original.
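The second story depends on the `auto:latest_pro` rule (F0009.5, AC-5): prefer the most recent Pro extraction, else the most recent extraction of any model, while every extraction row is kept. The sketch below illustrates that selection under stated assumptions: the `Extraction` dataclass is a hypothetical stand-in for a `reel_extractions` row, and recognising Pro by a `-pro` suffix on the model name is a guess, not something this spec defines.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Extraction:
    """Hypothetical in-memory view of one reel_extractions row."""
    id: str
    model: str
    parsed_at: datetime
    payload: Dict[str, Any]


def is_pro(model: str) -> bool:
    # Assumption: Pro models end in "-pro", e.g. gemini-2.5-pro.
    return model.endswith("-pro")


def merge_latest_pro(extractions: List[Extraction]) -> Dict[str, Any]:
    """Pick the newest Pro extraction, else the newest extraction overall."""
    if not extractions:
        raise ValueError("no extractions to merge")
    pro = [e for e in extractions if is_pro(e.model)]
    chosen = max(pro or extractions, key=lambda e: e.parsed_at)
    return {
        "strategy": "auto:latest_pro",
        "payload": chosen.payload,
        "contributing_extraction_ids": [chosen.id],
    }
```

Because the function is pure and depends only on the current set of extractions, re-running it after every insert yields the same result for the same rows, which is what makes an idempotent upsert straightforward.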
5. UX notes
Worker + backend slice. Mobile picks up the speedup automatically:
`GET /imports/:id` returns a ready draft on first poll instead of
spinning. Thumbnail rendering in the review screen lands in a
follow-up.
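The reason the first poll can already be ready is the cache-hit short-circuit (F0009.4). A rough sketch of that decision follows, using plain dictionaries as hypothetical stand-ins for the `reel_extractions` and `reel_extraction_merges` tables. The function name, the `download_and_extract` callback, and the return shape are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory stand-ins for the two tables; a real worker
# would query the database instead.
ExtractionKey = Tuple[str, str, str]  # (source_url_hash, model, prompt_version)


def process_video_import(
    url_hash: str,
    model: str,
    prompt_version: str,
    extractions: Dict[ExtractionKey, Dict[str, Any]],
    merges: Dict[str, Dict[str, Any]],
    download_and_extract: Callable[[str, str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the import outcome, skipping download and model call on a hit."""
    key = (url_hash, model, prompt_version)
    cache_hit = key in extractions
    if not cache_hit:
        # Cache miss: the only branch that runs the download and the model.
        extractions[key] = download_and_extract(url_hash, model)
        # Stand-in for _recompute_merge(hash); the real rule is auto:latest_pro.
        merges[url_hash] = extractions[key]
    return {
        "status": "ready",
        "cache_hit": cache_hit,
        "draft_payload": merges[url_hash],
    }
```

On a hit the callback is never invoked, so neither yt-dlp nor Gemini is touched and the draft payload comes straight from the merged view.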
6. Acceptance criteria
| AC | Feature | Criterion |
|---|---|---|
| AC-1 | F0009.1 | Migration 0015 creates the three tables with the documented columns + constraints. |
| AC-2 | F0009.2 | `canonicalize_reel_url(...)` returns the same `(platform, external_id)` for `/p/<X>/` and `/reel/<X>/` on Instagram, strips `?igsh=` and other tracking params. |
| AC-3 | F0009.4 | Worker pipeline given a video URL whose `source_url_hash` already has a `reel_extractions` row for the configured `(model, prompt_version)` does NOT call yt-dlp and does NOT call Gemini. |
| AC-4 | F0009.4 | The cache-hit path inserts an `import_drafts` row with the merged payload and marks the `imports` row `status='ready'`. |
| AC-5 | F0009.5 | After a new extraction insert, an upsert on `reel_extraction_merges` runs idempotently and produces a payload sourced from the most recent Pro extraction (else the most recent extraction). |
| AC-6 | F0009.3 | Every `reel_assets` row has a `thumbnail_storage_path` that resolves to a JPEG ≤ 200KB in the bucket. |
| AC-7 | F0009.6 | Backend `POST /trips/:id/imports` with a cached video URL returns 202 with `status='ready'` (or the existing `'queued'` code-path completes within the same request; implementation choice). |
| AC-8 | F0009.7 | `POST /imports/:id/retry { model: 'gemini-2.5-pro' }` forces a new extraction row even if Flash exists, then merge picks Pro on the next read. |
| AC-9 | F0009.8 | `GET /reels/:hash` returns `{ asset, extractions[], merged }` for any authenticated user. |

7. Data model
    reel_assets (1 row per reel)
      source_url_hash          text PK                    sha256(platform || ':' || external_id) or raw url fallback
      source_canonical_url     text
      source_platform          text                       instagram|tiktok|youtube|unknown
      source_external_id       text                       shortcode where known
      video_storage_path       text
      thumbnail_storage_path   text
      duration_sec             int
      bytes                    int
      fetched_at               timestamptz default now()
      last_referenced_at       timestamptz default now()  GC anchor
      deleted_at               timestamptz                soft-delete for takedowns

    reel_extractions (N rows per reel)
      id                       uuid PK
      source_url_hash          text FK reel_assets
      model                    text                       e.g. gemini-2.5-pro
      prompt_version           text                       "v1"
      schema_version           int
      payload                  jsonb                      ItineraryDraft
      tokens_in / tokens_out   int
      cost_usd                 numeric(10,4)
      parsed_at                timestamptz default now()
      extracted_by_user_id     uuid
      triggered_by_import_id   uuid
      unique (source_url_hash, model, prompt_version)

    reel_extraction_merges (1 row per reel)
      source_url_hash               text PK FK reel_assets
      payload                       jsonb                      merged ItineraryDraft
      strategy                      text                       auto:latest_pro|auto:consensus|manual
      contributing_extraction_ids   uuid[]
      merged_at                     timestamptz default now()
      merged_by_user_id             uuid                       null = auto

    import_sources
      + reel_extraction_id     uuid null                  which extraction satisfied this import

Storage layout
    trip-attachments/reels/<source_url_hash>/source.<ext>   mirrored video
    trip-attachments/reels/<source_url_hash>/thumb.jpg      640px thumbnail

We keep them in the existing `trip-attachments` bucket (not a new
public bucket) so RLS stays uniform. Reads use signed URLs minted
by the backend.
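To tie the `source_url_hash` definition to the storage layout, here is a small sketch of how the key and the two object paths could be derived. The hex encoding of the digest, the helper names, and the rule "use the raw URL when there is no external id" are assumptions read off the data model comment, not confirmed behaviour.

```python
import hashlib
from typing import Dict

BUCKET = "trip-attachments"


def source_url_hash(platform: str, external_id: str, raw_url: str) -> str:
    """sha256 over 'platform:external_id', or over the raw URL as a fallback."""
    key = f"{platform}:{external_id}" if external_id else raw_url
    # Assumption: the hash is stored as a lowercase hex digest.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def storage_paths(url_hash: str, ext: str) -> Dict[str, str]:
    """Build the mirrored-video and thumbnail paths for one reel."""
    prefix = f"{BUCKET}/reels/{url_hash}"
    return {
        "video": f"{prefix}/source.{ext}",
        "thumbnail": f"{prefix}/thumb.jpg",
    }
```

Hashing the `(platform, external_id)` pair rather than the shared URL means every tracking-param variant of one reel resolves to the same folder in the bucket.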
8. APIs / contracts
    GET  /reels/:hash                    → { asset, extractions[], merged }   any authed user
    POST /trips/:id/imports              cache-hit short-circuit for video URLs
    POST /imports/:id/retry { model? }   forces a fresh extraction at a chosen model

    # worker-only
    POST /ai/reels/extract { url, model?, prompt_version? }   admin re-extract
    internal _recompute_merge(hash)      after every insert

9. Non-functional requirements
| Aspect | Target |
|---|---|
| Cache hit latency | < 2s end-to-end (no yt-dlp, no Gemini) |
| Cache hit rate at scale | ≥ 30% on travel reels (assumption to validate) |
| Thumbnail size | ≤ 200 KB JPEG |
| Storage growth | bounded by GC on last_referenced_at < now() - 12mo |
| Multi-tenant safety | extractions are global; no per-user data leaks (only extracted_by_user_id audit column, not exposed in API) |
10. Risks & open questions
- Cross-user reuse of AI-derived data. Resolved: global sharing with takedown soft-delete (`reel_assets.deleted_at`). Privacy policy should note “AI-extracted public content may be reused across users.”
- Hot-reel race: two concurrent imports of the same uncached reel pay twice. Mitigation: in-process or Redis 60s lock keyed by `source_url_hash`. Defer to v1; volume is too low to matter now.
- Schema drift on `ItineraryDraft`: every extraction carries `schema_version`; the merge step coerces old extractions to the newest schema. v0 only has schema_version=1, no coercion needed yet.
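For the hot-reel race, the in-process variant of the mitigation could look roughly like the sketch below. It is deferred to v1 in this spec, so treat it as one possible shape only: `extract_once`, the `cache` dictionary, and the choice to proceed anyway once the 60s wait expires are all assumptions, and a multi-process deployment would need the Redis variant instead.

```python
import threading
from typing import Any, Callable, Dict

LOCK_TIMEOUT_SEC = 60.0  # mirrors the 60s lock from the mitigation above

# One lock per source_url_hash; the guard protects the registry itself.
_locks: Dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()


def extract_once(
    url_hash: str,
    cache: Dict[str, Any],
    run_extraction: Callable[[str], Any],
) -> Any:
    """Run the extraction at most once per hash among concurrent callers."""
    with _locks_guard:
        lock = _locks.setdefault(url_hash, threading.Lock())
    # Assumption: after the timeout we proceed rather than fail, accepting
    # a possible duplicate extraction over blocking the import forever.
    acquired = lock.acquire(timeout=LOCK_TIMEOUT_SEC)
    try:
        if url_hash not in cache:
            cache[url_hash] = run_extraction(url_hash)
        return cache[url_hash]
    finally:
        if acquired:
            lock.release()
```

The second caller blocks on the per-hash lock, then finds the cached result and returns without paying again. The lock registry in this sketch grows without bound, which a real implementation would need to prune.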
11. Rollout plan
Migration first, then worker, then backend. All slices are backward-compatible: old (non-cached) flows keep working unchanged. No feature flag: the cache layer is transparent.
12. References
- Parent: 0008 Reel Video Imports
- Pipeline parent: 0003 Trip Imports
- ADR: 0003 Python workers
- Migration: `infra/supabase/migrations/0015_reel_extraction_cache.sql`