Feature: Reel Extraction Cache (Global, Multi-Model, Merged)

ID:           0009
Status:       In progress
Owner:        @satya
Created:      2026-05-14
Updated:      2026-05-14
Related ADRs: 0003 (Python workers)
Depends on:   0008 (reel video imports)

1. Why

Spec 0008 ships per-import Gemini extractions. Every share of the same reel re-pays the Gemini cost and ~60s of latency, even though the output for a given (URL, model, prompt_version) is identical. This spec turns that one-shot extraction into a globally-shared cache: once any user has imported a reel, every subsequent share of the same reel — by anyone — returns the cached structured data instantly.

It also splits “raw model output” from “the canonical merged view”, so we can re-run with a better model later, keep history, and let users (eventually) hand-merge.

2. Who it is for

P1: Solo planner (PRODUCT.md §2). Power users with 20+ saved reels benefit most: an estimated ~90% of the inspiration corpus is popular travel reels with high re-share rates.

3. Scope

In scope

F0009.1 New tables reel_assets, reel_extractions, reel_extraction_merges. Migration 0015.
F0009.2 URL canonicalizer producing (platform, external_id). Instagram /p/<code> and /reel/<code> collapse to the same key.
F0009.3 Thumbnail extraction in the worker via ffmpeg (640px wide, ~25% in to skip title cards).
F0009.4 Pipeline cache-hit short-circuit: worker checks reel_extractions before running yt-dlp or calling Gemini.
F0009.5 Auto-merge strategy auto:latest_pro recomputed after every new extraction insert.
F0009.6 Backend POST /trips/:id/imports short-circuits to the merged view when a cached extraction exists: the imports row is created with status='ready' and a populated import_drafts.payload.
F0009.7 Backend POST /imports/:id/retry { model? } accepts a model override so a user can deliberately re-run with Pro.
F0009.8 Backend GET /reels/:hash exposes the asset, all extractions, and the merged view to any authenticated user.
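
A minimal sketch of the F0009.2 canonicalizer, covering only the two Instagram path shapes named above. The function body, the host check, and the ("unknown", None) fallback are assumptions for illustration; TikTok and YouTube handling is omitted.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def canonicalize_reel_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (platform, external_id); external_id is None when unknown."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Non-empty path segments, e.g. "/reel/ABC123/" -> ["reel", "ABC123"].
    segments = [s for s in parts.path.split("/") if s]
    # The query string and fragment (e.g. ?igsh=...) are never consulted,
    # so tracking params cannot change the result.
    if host == "instagram.com" and len(segments) >= 2 and segments[0] in ("p", "reel"):
        return ("instagram", segments[1])
    # Assumption: anything unrecognized falls through to the raw-URL path.
    return ("unknown", None)
```

With this shape, /p/<X>/ and /reel/<X>/ produce the same pair, so both hash to the same reel_assets key.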

Out of scope (this milestone)

auto:consensus and manual merge strategies (post-v0).
Mobile thumbnail rendering / “imported N times” badge.
Takedown / DMCA UI (DB has deleted_at; admin tool comes later).
TikTok short-link resolution (vm.tiktok.com 302). Resolve in v1.

4. User stories

As any user, when I share a reel another user already imported, I see a draft within ~2s (the section 9 cache-hit target) and we don’t burn Gemini quota.
As a user, I can hit “re-extract with Pro” on a Flash-parsed reel and get the higher-quality output without losing the original.

5. UX notes

Worker + backend slice. Mobile picks up the speedup automatically — GET /imports/:id returns a ready draft on first poll instead of spinning. Thumbnail rendering in the review screen lands in a follow-up.

6. Acceptance criteria

AC-1  F0009.1  Migration 0015 creates the three tables with the
                documented columns + constraints.
AC-2  F0009.2  canonicalize_reel_url(...) returns the same
                (platform, external_id) for /p/<X>/ and /reel/<X>/
                on Instagram, strips ?igsh= and other tracking
                params.
AC-3  F0009.4  Worker pipeline given a video URL whose
                source_url_hash already has a `reel_extractions`
                row for the configured (model, prompt_version)
                does NOT call yt-dlp and does NOT call Gemini.
AC-4  F0009.4  The cache-hit path inserts an import_drafts row
                with the merged payload and marks the imports
                row status='ready'.
AC-5  F0009.5  After a new extraction insert, an upsert on
                reel_extraction_merges runs idempotently and
                produces a payload sourced from the most recent
                Pro extraction (else the most recent extraction).
AC-6  F0009.3  Every reel_assets row has a thumbnail_storage_path
                that resolves to a JPEG ≤ 200KB in the bucket.
AC-7  F0009.6  Backend POST /trips/:id/imports with a cached
                video URL returns 202 with status='ready'.
                Alternatively, the existing 'queued' code path
                may complete within the same request; either
                is acceptable (implementation choice).
AC-8  F0009.7  POST /imports/:id/retry { model: 'gemini-2.5-pro' }
                forces a new extraction row even if Flash exists;
                the merge recompute triggered by that insert
                then picks Pro.
AC-9  F0009.8  GET /reels/:hash returns { asset, extractions[],
                merged } for any authenticated user.
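
The cache-hit behaviour in AC-3 and AC-4 can be sketched as below. This is an illustration, not the worker's code: the dicts stand in for reel_extractions and reel_extraction_merges, the download and extract callables stand in for the yt-dlp and Gemini steps, and run_import is a hypothetical name. The merge on the miss path is simplified to "use the new extraction".

```python
from typing import Callable, Dict, Tuple

# (source_url_hash, model, prompt_version), mirroring the unique constraint.
ExtractionKey = Tuple[str, str, str]


def run_import(
    source_url_hash: str,
    model: str,
    prompt_version: str,
    extractions: Dict[ExtractionKey, dict],  # stand-in for reel_extractions
    merges: Dict[str, dict],                 # stand-in for reel_extraction_merges
    download: Callable[[], bytes],           # stand-in for the yt-dlp step
    extract: Callable[[bytes], dict],        # stand-in for the Gemini call
) -> dict:
    key = (source_url_hash, model, prompt_version)
    if key not in extractions:
        # Cache miss: pay for the download and the model call exactly once.
        video = download()
        extractions[key] = extract(video)
        # Simplified merge: a real recompute would apply auto:latest_pro.
        merges[source_url_hash] = extractions[key]
    # Hit or miss, the import ends up ready with the merged payload.
    return {"status": "ready", "draft_payload": merges[source_url_hash]}
```

Passing the two expensive steps in as callables makes AC-3 directly testable: a test can assert that neither was invoked on the hit path.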

7. Data model

reel_assets               (1 row per reel)
  source_url_hash         text  PK     sha256(platform || ':' || external_id) or raw url fallback
  source_canonical_url    text
  source_platform         text         instagram|tiktok|youtube|unknown
  source_external_id      text         shortcode where known
  video_storage_path      text
  thumbnail_storage_path  text
  duration_sec            int
  bytes                   int
  fetched_at              timestamptz default now()
  last_referenced_at      timestamptz default now()    GC anchor
  deleted_at              timestamptz                  soft-delete for takedowns

reel_extractions          (N rows per reel)
  id                      uuid  PK
  source_url_hash         text  FK reel_assets
  model                   text         e.g. gemini-2.5-pro
  prompt_version          text         "v1"
  schema_version          int
  payload                 jsonb        ItineraryDraft
  tokens_in / tokens_out  int
  cost_usd                numeric(10,4)
  parsed_at               timestamptz default now()
  extracted_by_user_id    uuid
  triggered_by_import_id  uuid
  unique (source_url_hash, model, prompt_version)

reel_extraction_merges    (1 row per reel)
  source_url_hash         text  PK FK reel_assets
  payload                 jsonb        merged ItineraryDraft
  strategy                text         auto:latest_pro|auto:consensus|manual
  contributing_extraction_ids uuid[]
  merged_at               timestamptz default now()
  merged_by_user_id       uuid         null = auto

import_sources
  + reel_extraction_id    uuid null    which extraction satisfied this import
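
For illustration, the reel_assets.source_url_hash key could be derived as follows. Two points are assumptions, not stated in the table above: the fallback hashes the raw URL (rather than storing it verbatim), and the digest is stored as lowercase hex. The function name is hypothetical.

```python
import hashlib
from typing import Optional


def compute_source_url_hash(platform: str, external_id: Optional[str], raw_url: str) -> str:
    # Preferred key: "<platform>:<external_id>", as in the data model.
    # Assumption: with no external_id we hash the raw URL instead.
    key = f"{platform}:{external_id}" if external_id else raw_url
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```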

Storage layout

trip-attachments/reels/<source_url_hash>/source.<ext>     mirrored video
trip-attachments/reels/<source_url_hash>/thumb.jpg        640px thumbnail

We keep them in the existing trip-attachments bucket (not a new public bucket) so RLS stays uniform. Reads use signed URLs minted from the backend.
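
A sketch of how the worker might build the storage keys and the ffmpeg invocation for the thumbnail (F0009.3). The helper names and the -q:v value are assumptions; the quality setting alone does not guarantee the ≤ 200 KB target, so a real worker would still need to check the output size.

```python
from typing import Dict, List


def reel_storage_paths(source_url_hash: str, ext: str) -> Dict[str, str]:
    # Object keys inside the trip-attachments bucket, per the layout above.
    base = f"reels/{source_url_hash}"
    return {"video": f"{base}/source.{ext}", "thumbnail": f"{base}/thumb.jpg"}


def thumbnail_command(video_path: str, out_path: str, duration_sec: float) -> List[str]:
    # Seek ~25% into the video to skip title cards.
    seek = max(duration_sec * 0.25, 0.0)
    return [
        "ffmpeg", "-y",
        "-ss", f"{seek:.2f}",    # input seek, placed before -i
        "-i", video_path,
        "-frames:v", "1",        # emit a single frame
        "-vf", "scale=640:-2",   # 640px wide, height keeps aspect ratio (even)
        "-q:v", "5",             # JPEG quality; value is an assumption
        out_path,
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True) in the worker.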

8. APIs / contracts

GET  /reels/:hash                   → { asset, extractions[], merged }   any authed user
POST /trips/:id/imports             cache-hit short-circuit for video URLs
POST /imports/:id/retry { model? }  forces a fresh extraction at a chosen model

# worker-only
POST /ai/reels/extract { url, model?, prompt_version? }     admin re-extract
internal _recompute_merge(hash)                              after every insert
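
The selection rule behind auto:latest_pro (AC-5) might look like the sketch below. It operates on already-loaded extraction rows rather than a hash, select_merge is a hypothetical name, and identifying a Pro extraction by a "-pro" model-name suffix is an assumption.

```python
from datetime import datetime
from typing import List, Optional, TypedDict


class Extraction(TypedDict):
    id: str
    model: str
    parsed_at: datetime
    payload: dict


def select_merge(extractions: List[Extraction]) -> Optional[dict]:
    if not extractions:
        return None
    # Assumption: a Pro extraction is one whose model name ends in "-pro".
    pro = [e for e in extractions if e["model"].endswith("-pro")]
    # Most recent Pro extraction, else the most recent extraction of any model.
    chosen = max(pro or extractions, key=lambda e: e["parsed_at"])
    return {
        "strategy": "auto:latest_pro",
        "payload": chosen["payload"],
        "contributing_extraction_ids": [chosen["id"]],
    }
```

Because the result depends only on the current set of rows, running it again after the same insert yields the same merge row, which is what makes the upsert idempotent.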

9. Non-functional requirements

Aspect                   Target
Cache hit latency        < 2s end-to-end (no yt-dlp, no Gemini)
Cache hit rate at scale  ≥ 30% on travel reels (assumption to validate)
Thumbnail size           ≤ 200 KB JPEG
Storage growth           bounded by GC on `last_referenced_at < now() - 12mo`
Multi-tenant safety      extractions are global; no per-user data leaks (only `extracted_by_user_id` audit column, not exposed in API)
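
One way to enforce the multi-tenant safety target is to strip the audit column before an extraction row is serialized for GET /reels/:hash. This is a sketch; public_extraction and PRIVATE_COLUMNS are hypothetical names, and treating only extracted_by_user_id as private is an assumption based on the table above.

```python
from typing import Any, Dict

# Assumption: only the audit column named in the table above is private.
PRIVATE_COLUMNS = frozenset({"extracted_by_user_id"})


def public_extraction(row: Dict[str, Any]) -> Dict[str, Any]:
    # Return a copy of the row without private columns; the input is not mutated.
    return {k: v for k, v in row.items() if k not in PRIVATE_COLUMNS}
```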

10. Risks & open questions

Cross-user reuse of AI-derived data. Resolved: global sharing with takedown soft-delete (reel_assets.deleted_at). Privacy policy should note “AI-extracted public content may be reused across users.”
Hot-reel race: two concurrent imports of the same uncached reel pay twice. Mitigation: in-process or Redis 60s lock keyed by source_url_hash. Defer to v1; volume is too low to matter now.
Schema drift on ItineraryDraft: every extraction carries schema_version; the merge step coerces old extractions to the newest schema. v0 only has schema_version=1, no coercion needed yet.
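
For the hot-reel race, the in-process variant of the lock could be sketched as below. It covers a single worker process only and omits the 60s expiry; a Redis-backed lock would be needed across processes. extract_once and the cache dict are hypothetical stand-ins.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict

# One lock per source_url_hash; the guard protects the lock table itself.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def extract_once(source_url_hash: str, cache: Dict[str, dict], extract: Callable[[], dict]) -> dict:
    with _locks_guard:
        lock = _locks[source_url_hash]
    with lock:
        # Re-check under the lock: a concurrent import may have filled the cache.
        if source_url_hash not in cache:
            cache[source_url_hash] = extract()
        return cache[source_url_hash]
```

Concurrent imports of the same hash serialize on the lock, so only the first pays for the extraction; imports of different reels do not block each other.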

11. Rollout plan

Migration first, then worker, then backend. All slices are backward-compatible: old (non-cached) flows keep working unchanged. No feature flag — the cache layer is transparent.

12. References

Parent: 0008 Reel Video Imports
Pipeline parent: 0003 Trip Imports
ADR: 0003 Python workers
Migration: infra/supabase/migrations/0015_reel_extraction_cache.sql