Feature: Reel / Video Imports (Instagram → Itinerary)

ID: 0008
Status: In progress
Owner: @satya
Created: 2026-05-13
Updated: 2026-05-13
Related ADRs: 0003 (Python workers)
Supersedes: —

Travel inspiration on phones today flows through short-form video — Instagram reels, TikTok, YouTube Shorts. Users save dozens of “must visit” reels per trip and re-type the places into their itinerary by hand. We already extract itineraries from PDFs, images, and blog URLs (spec 0003); video is the natural next source.

Persona P1 — Solo planner (PRODUCT.md §2): collects reels while commuting, plans on weekends.

  • F0008.1 Add 'video' to import_source_type enum.
  • F0008.2 Python worker: new video parser using Gemini 2.5 multimodal (video-in, structured JSON-out).
    • F0008.2.a Pipeline dispatch on source_type='video'.
    • F0008.2.b Gemini Files API upload + poll-until-ACTIVE.
    • F0008.2.c Cost ledger entries from Gemini usage_metadata.
    • F0008.2.d source_timestamp_sec per activity (optional).
  • F0008.3 Storage convention reusing trip-attachments bucket at <trip_id>/imports/<import_id>/source.mp4.
  • F0008.4 NestJS imports module accepts source_type='video'.
    • F0008.4.a SignImportUploadDto allows video/mp4, video/quicktime, video/webm, video/3gpp, video/x-matroska.
    • F0008.4.b CreateImportDto / ImportSourceInputDto accept 'video' for source_type (both legacy single-source and combined sources[] shapes).
    • F0008.4.c ImportedActivityPayload surface includes the optional source_timestamp_sec field from the worker.
    • F0008.4.d Custom IsVideoSourceUri DTO validator: video source_uri must be either a storage path or an http(s) URL on the Instagram / TikTok / YouTube host allowlist.
  • F0008.5 Reel URL resolver (worker): reel_fetcher.fetch(url) downloads share URLs via yt-dlp.
    • F0008.5.a Host allowlist (instagram.com, tiktok.com, youtube.com, youtu.be).
    • F0008.5.b SSRF guard rejects private / loopback / link-local / reserved / multicast IPs at DNS time.
    • F0008.5.c Size cap via yt-dlp.max_filesize, plus a post-download belt-and-suspenders check.
    • F0008.5.d Mirror to trip-attachments/<trip>/imports/<id>/source.<ext> so re-extract with a newer model is a single LLM call.
    • F0008.5.e Map known yt-dlp errors (private / unavailable / filesize) to short user-facing strings in imports.error.
    • F0008.5.f Optional cookies file via REEL_COOKIES_FILE env for IG rate-limit relief.

Out of scope:
  • Mobile share-sheet handler (separate slice).
  • Audio-only ingest, multi-reel batch, public-URL fetch (reel scraping).
  • Frame-OCR / Whisper pre-extraction (Gemini handles it natively).

User story:
  • As P1, I can forward a saved Instagram reel into Treeper and get a draft itinerary I can review, so I don’t have to re-type places.
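
The host allowlist (F0008.5.a) and SSRF guard (F0008.5.b) could be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: the function names, the injectable `resolve` parameter, and the decision to accept subdomains such as `www.` are all assumptions.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Allowlist from F0008.5.a. Accepting subdomains (www., m.) is an assumption.
ALLOWED_HOSTS = ("instagram.com", "tiktok.com", "youtube.com", "youtu.be")


def host_allowed(host: str) -> bool:
    """True if host is an allowlisted domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in ALLOWED_HOSTS)


def is_public_ip(ip: str) -> bool:
    """False for private, loopback, link-local, reserved, multicast, unspecified."""
    addr = ipaddress.ip_address(ip)
    return not (
        addr.is_private or addr.is_loopback or addr.is_link_local
        or addr.is_reserved or addr.is_multicast or addr.is_unspecified
    )


def resolve_ips(host: str) -> list:
    """Default resolver: every address the host currently resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def check_reel_url(url: str, resolve=resolve_ips) -> None:
    """Raise ValueError unless the URL is http(s), allowlisted, and public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("unsupported scheme")
    host = parsed.hostname or ""
    if not host_allowed(host):
        raise ValueError("host not on allowlist")
    ips = resolve(host)
    # An empty resolution result is treated as unsafe rather than vacuously safe.
    if not ips or not all(is_public_ip(ip) for ip in ips):
        raise ValueError("host resolves to a non-public address")
```

Note that a check at resolution time does not by itself pin the address used by the later download, so a real guard may also need to constrain the connection itself.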

Worker-only milestone: no UX surface yet. The existing review screen in apps/mobile/lib/features/trips/ will pick up video-sourced drafts unchanged because the output schema (ItineraryDraft) is identical to the one used for PDF and image imports.

AC-1  F0008.1    `import_source_type` enum contains 'video' after migration.
AC-2  F0008.2    Worker.process_import on an imports row with
                 source_type='video' downloads bytes from storage,
                 calls Gemini 2.5, persists an ItineraryDraft, and
                 marks status='ready'.
AC-3  F0008.2.c  imports row has tokens_in / tokens_out / cost_usd
                 populated from Gemini usage_metadata.
AC-4  F0008.2.d  Activities returned from a reel that mentions a place
                 at 0:15 have `source_timestamp_sec ≈ 15`.
AC-5  F0008      Failures (Gemini 4xx, file-state TIMEOUT, oversize)
                 land the row in status='failed' with a non-empty
                 `error` string ≤ 1000 chars.
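
One way to satisfy AC-5 together with F0008.5.e is a single helper that turns any exception into the string stored in imports.error. The sketch below is illustrative only: the helper name, the keyword table, and the user-facing wording are assumptions, and the keywords are not claimed to match yt-dlp's exact messages.

```python
MAX_ERROR_CHARS = 1000  # upper bound from AC-5

# (keyword, user-facing message). Keywords and messages are hypothetical.
_KNOWN_ERRORS = (
    ("private", "This reel is private."),
    ("unavailable", "This reel is unavailable."),
    ("filesize", "This video is too large."),
)


def to_import_error(exc: Exception) -> str:
    """Return a non-empty error string of at most MAX_ERROR_CHARS characters."""
    raw = str(exc).strip()
    lowered = raw.lower()
    for keyword, message in _KNOWN_ERRORS:
        if keyword in lowered:
            return message
    # Fall back to the exception class name so the string is never empty.
    text = raw or type(exc).__name__
    return text[:MAX_ERROR_CHARS]
```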

Migration infra/supabase/migrations/0014_imports_video_source.sql widens the enum only. import_drafts.payload is JSONB so the new optional source_timestamp_sec per activity needs no DDL.
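
Because the timestamp lives inside the JSONB payload, readers only need to tolerate its absence. A minimal sketch of that tolerance, assuming a payload shaped as `{"activities": [{"name": ...}]}` (the key names `activities` and `name`, and the `Activity` class, are hypothetical; only `source_timestamp_sec` comes from this spec):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    name: str
    # Absent for PDF / image imports and for video activities without a cue.
    source_timestamp_sec: Optional[float] = None


def activities_from_payload(payload: dict) -> List[Activity]:
    """Read activities from a decoded payload, old or new shape."""
    return [
        Activity(
            name=item["name"],
            source_timestamp_sec=item.get("source_timestamp_sec"),
        )
        for item in payload.get("activities", [])
    ]
```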

flowchart LR
Row[imports row<br/>source_type=video]
Decide{source_uri<br/>scheme?}
Fetch[reel_fetcher<br/>yt-dlp]
Mirror[upload → trip-attachments]
Storage[supabase Storage<br/>download bytes]
Gemini[GeminiClient<br/>Files API + generate_content]
Draft[(import_drafts.payload)]
Row --> Decide
Decide -- "http(s)://" --> Fetch --> Mirror --> Gemini
Decide -- "path/in/bucket" --> Storage --> Gemini
Gemini --> Draft
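
The decision node in the flowchart can be expressed as a small routing function. This is a sketch under assumptions: the function name and the returned labels `"fetch"` and `"storage"` are hypothetical, and a source_uri without a scheme is taken to be a path in the bucket.

```python
from urllib.parse import urlparse


def source_route(source_uri: str) -> str:
    """Return 'fetch' for http(s) share URLs and 'storage' for bucket paths."""
    scheme = urlparse(source_uri).scheme.lower()
    if scheme in ("http", "https"):
        return "fetch"      # reel_fetcher downloads, then mirrors to storage
    if scheme == "":
        return "storage"    # bytes are already in trip-attachments
    raise ValueError(f"unsupported source_uri scheme: {scheme}")
```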

Reuses existing endpoints, no new routes:

POST /trips/:tripId/imports/sign-upload
  body: { mime_type: 'video/mp4'|..., filename, upload_id }
  → { signed_url, storage_path, ... }

POST /trips/:tripId/imports
  body: { source_type: 'video', source_uri: '<path>',
          source_filename?, source_bytes?, user_context? }
      | { sources: [ { source_type: 'video', ... } ], user_context? }
  → 202 { id, status: 'queued', ... }

The mobile share-sheet handler asks for a signed URL, PUTs the bytes, then posts the imports row pointing at the storage path. The worker pipeline picks the row up via Realtime (or via the HTTP nudge in notifyWorker).
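
To make the storage convention (F0008.3) and the legacy single-source request shape concrete, here is a small sketch. The helper names and the MIME-to-extension mapping are assumptions; in the real flow the storage path is returned by the sign-upload endpoint rather than built by the client.

```python
from typing import Optional

# MIME types accepted by SignImportUploadDto (F0008.4.a).
# The extension chosen for each type is an assumption.
_EXT_BY_MIME = {
    "video/mp4": "mp4",
    "video/quicktime": "mov",
    "video/webm": "webm",
    "video/3gpp": "3gp",
    "video/x-matroska": "mkv",
}


def video_storage_path(trip_id: str, import_id: str, mime_type: str) -> str:
    """Build <trip_id>/imports/<import_id>/source.<ext> inside trip-attachments."""
    if mime_type not in _EXT_BY_MIME:
        raise ValueError(f"unsupported video MIME type: {mime_type}")
    return f"{trip_id}/imports/{import_id}/source.{_EXT_BY_MIME[mime_type]}"


def create_import_body(storage_path: str, user_context: Optional[str] = None) -> dict:
    """Body for POST /trips/:tripId/imports in the single-source shape."""
    body = {"source_type": "video", "source_uri": storage_path}
    if user_context:
        body["user_context"] = user_context
    return body
```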

Pipeline entry unchanged:

process_import(import_id)
  → _parse_one(source_type='video')
    → parsers.video.parse(video_bytes, mime_type, gemini, ledger, user_context)

Gemini surface (new GeminiClient):

class GeminiClient:
    async def video(
        self,
        *, model: str, system: str, user: str,
        video_bytes: bytes, mime_type: str,
        response_model: type[T], ledger: CostLedger | None,
    ) -> T: ...
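
The poll-until-ACTIVE step (F0008.2.b) might look something like the sketch below. To keep it independent of any SDK, the state lookup is passed in as a callable; the helper name, the `FileStateTimeout` exception, and the default timeout and interval are all assumptions. The state names PROCESSING, ACTIVE, and FAILED follow the Gemini Files API.

```python
import asyncio
import time
from typing import Awaitable, Callable


class FileStateTimeout(Exception):
    """Raised when an uploaded file does not become ACTIVE in time."""


async def wait_until_active(
    get_state: Callable[[], Awaitable[str]],
    *,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> None:
    """Poll get_state() until it returns 'ACTIVE', fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = await get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if time.monotonic() >= deadline:
            raise FileStateTimeout(f"file still {state} after {timeout_s}s")
        await asyncio.sleep(interval_s)
```

A timeout raised here would correspond to the "file-state TIMEOUT" failure in AC-5.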

Aspect          Target
Latency         ≤ 30s p50 for a ≤ 60s reel (Gemini upload ~5s + generate ~10-20s).
Cost            ≤ $0.05 per 60s reel using gemini-2.5-pro; cost ceiling reuses existing $0.50 cap.
Video size cap  100 MB (rejected before upload).
Duration cap    300 s (worker rejects with error="video too long").
Privacy         Source video kept in user-scoped bucket; Gemini Files-API upload auto-expires in 48h; we also delete after extraction.

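
The size and duration caps above could be enforced with a check along these lines before any Gemini upload. This is a sketch: the function name is hypothetical, "video too large" is an assumed message (only "video too long" appears in this spec), and reading 100 MB as binary megabytes is also an assumption.

```python
from typing import Optional

MAX_VIDEO_BYTES = 100 * 1024 * 1024  # 100 MB cap; binary megabytes assumed
MAX_DURATION_SEC = 300               # duration cap from the table


def check_video_limits(size_bytes: int, duration_sec: Optional[float]) -> None:
    """Raise ValueError if the video exceeds the size or duration cap."""
    if size_bytes > MAX_VIDEO_BYTES:
        raise ValueError("video too large")
    # Duration may be unknown until the container is probed; skip if so.
    if duration_sec is not None and duration_sec > MAX_DURATION_SEC:
        raise ValueError("video too long")
```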
  • R1 — Gemini 2.5 Pro video billing changes mid-flight. Mitigation: _MODEL_PRICES lookup, easy to bump.
  • R2 — Hallucinated places on heavily-music reels with little verbal/textual content. Mitigation: overall_confidence='low' propagates; review screen surfaces it.
  • Q1 — Should we store extracted frame thumbnails for the review UI? Deferred to mobile slice.
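
The R1 mitigation relies on a price lookup that is cheap to update. A minimal sketch of how a ledger entry could be derived from token counts follows. The prices are placeholders and NOT real Gemini pricing, and the `LedgerEntry` shape, the helper names, and the reading of the $0.50 cap as a sum over entries are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable

# USD per 1M tokens. PLACEHOLDER values for illustration, not real pricing.
_MODEL_PRICES = {"gemini-2.5-pro": {"in": 2.0, "out": 8.0}}
COST_CEILING_USD = 0.50  # existing cap referenced in the targets table


@dataclass
class LedgerEntry:
    model: str
    tokens_in: int
    tokens_out: int
    cost_usd: float


def ledger_entry(model: str, tokens_in: int, tokens_out: int) -> LedgerEntry:
    """Price one call from token counts (e.g. taken from usage_metadata)."""
    price = _MODEL_PRICES[model]  # KeyError signals a model with no price row
    cost = (tokens_in * price["in"] + tokens_out * price["out"]) / 1_000_000
    return LedgerEntry(model, tokens_in, tokens_out, round(cost, 6))


def over_ceiling(entries: Iterable[LedgerEntry]) -> bool:
    """True once the summed cost of the entries exceeds the ceiling."""
    return sum(e.cost_usd for e in entries) > COST_CEILING_USD
```

With this layout, a billing change only requires editing the `_MODEL_PRICES` table.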

The worker slice ships without a feature flag: it is only reachable via an imports row with source_type='video', which nothing inserts yet. Backend and mobile slices land separately; until they do, this code path is dormant.