Feature: Saved Content Library

ID:           0014
Status:       Draft
Owner:        @satya
Created:      2026-06-17
Updated:      2026-06-17
Related ADRs: 0003 (Python workers), 0004 (monorepo)
Depends on:   0001 (trips/activities), 0003 (imports pipeline), 0008 (reels)

1. Why

Instagram saved posts are a mess: restaurant recommendations sit next to multi-day trip guides, aesthetic Reels, and travel blog links, all mixed together. Today there is no way to bring this collection into treeper without adding noise to an actual trip: the previous approach bulk-imported every post as an activity.

The fix: a standalone Saved Content Library where Instagram exports land, get auto-classified by kind, and sit until the user decides to pull them into a trip. Each kind gets the right “add to trip” action — a place becomes an activity, an itinerary triggers the existing import pipeline, a Reel becomes a planning item, an article becomes a reference link.

2. Who it is for

From PRODUCT.md §2:

Solo planner (P1) — primary. Saves 50+ posts while researching a trip; needs to find restaurant recs quickly without wading through unrelated guides.
Inspiration hoarder (P3) — primary. Collects anything travel-related; wants to surface “itineraries I can use” vs. “places I want to eat at” vs. “videos to rewatch for vibes”.

Out of scope for this spec:

Trip-mate (P4) — no sharing of saved content across collaborators.
Curated planner (P5) — no editorial curation or recommendations from saved library.

3. Scope

3.1 In scope

F0014.1 — Instagram ZIP import

F0014.1.a Accept an Instagram data export ZIP file uploaded by the user.
F0014.1.b Sign a Supabase storage URL for the client to PUT the ZIP.
F0014.1.c Trigger the worker to classify each saved post into one of the 4 kinds.
F0014.1.d Return a count of classified items; surface errors (corrupt ZIP, no saved posts) as user-readable messages.
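
A minimal sketch of the ZIP validation behind F0014.1.d, assuming the worker receives the export as bytes. The "Invalid Instagram export format" message comes from AC-2; the InstagramExportError class, the load_saved_posts name, the "No saved posts" message, and the assumption that saved_posts.json may sit in any sub-folder are illustrative, not part of this spec.

```python
import io
import json
import zipfile


class InstagramExportError(ValueError):
    """Raised with a message that is safe to show to the user."""


def load_saved_posts(zip_bytes: bytes) -> list:
    """Return the raw saved-post entries from an Instagram export ZIP."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    except zipfile.BadZipFile as exc:
        raise InstagramExportError("Invalid Instagram export format") from exc

    with archive:
        # Assumption: the file can live in any sub-folder of the export.
        names = [n for n in archive.namelist() if n.endswith("saved_posts.json")]
        if not names:
            raise InstagramExportError("No saved posts found in this export")
        try:
            payload = json.loads(archive.read(names[0]))
        except (json.JSONDecodeError, UnicodeDecodeError, zipfile.BadZipFile) as exc:
            raise InstagramExportError("Invalid Instagram export format") from exc

    # Assumption: posts are either the top-level list or the first list
    # found among the values of a top-level object.
    if isinstance(payload, dict):
        lists = [v for v in payload.values() if isinstance(v, list)]
        payload = lists[0] if lists else []
    if not isinstance(payload, list) or not payload:
        raise InstagramExportError("No saved posts found in this export")
    return payload
```

Keeping both failure modes in one exception type lets the API layer map them to a single 422 response while still showing the specific message.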

F0014.2 — AI classification

F0014.2.a Tier 1: keyword heuristic classifies each post in <1ms at zero cost. Priority order: reel (has video) → itinerary (day-plan keywords) → place (POI keywords) → link_article (caption URL) → ambiguous.
F0014.2.b Tier 2: LLM call for ambiguous posts only (uses existing LLMClient + instructor). Returns place | itinerary | reel | link_article.
F0014.2.c ai_confidence and ai_model are stored per item for telemetry.
F0014.2.d User can override the kind at any time (PATCH /v1/saved-items/:id).
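
The Tier 1 priority order in F0014.2.a can be sketched as a chain of checks. The keyword patterns below are hypothetical placeholders (the spec does not list the real keywords), and the keyword_classify signature is an assumption; only the order reel, itinerary, place, link_article, ambiguous is taken from the text above.

```python
import re
from typing import Optional

# Hypothetical keyword lists; the real ones are not given in this spec.
ITINERARY_RE = re.compile(r"\b(itinerary|day\s*\d+|\d+[- ]day)\b", re.IGNORECASE)
PLACE_RE = re.compile(
    r"\b(restaurant|cafe|café|bar|hotel|beach|temple|museum)\b", re.IGNORECASE
)
URL_RE = re.compile(r"https?://\S+")


def keyword_classify(caption: str, has_video: bool) -> Optional[str]:
    """Return a kind, or None when the post is ambiguous and needs Tier 2."""
    if has_video:
        return "reel"
    if ITINERARY_RE.search(caption):
        return "itinerary"
    if PLACE_RE.search(caption):
        return "place"
    if URL_RE.search(caption):
        return "link_article"
    return None  # ambiguous: only these posts reach the LLM
```

Because the checks run in priority order, a video post with an itinerary caption is still a reel, which matches the order stated in F0014.2.a.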

F0014.3 — Saved Content Library screen

F0014.3.a Dedicated “Saved” tab (web + mobile), separate from Trips.
F0014.3.b Filter bar: All / Places / Itineraries / Reels / Links.
F0014.3.c Card per item showing: kind chip, title, thumbnail (if photo/video), source date, and quick-action button (“Add to trip”).
F0014.3.d Archived items hidden from default view; accessible via toggle.

F0014.4 — Add to trip

F0014.4.a place → creates an activity (kind=sight) in the selected trip/day.
F0014.4.b itinerary → triggers the existing import pipeline on the post URL (or caption text if no URL), opening the draft review screen.
F0014.4.c reel → creates a planning_item (kind=reel) in the trip.
F0014.4.d link_article → creates a planning_item (kind=link) in the trip.
F0014.4.e After “add to trip”, the item is not deleted from the library (it may be added to multiple trips).
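
The per-kind behaviour in F0014.4 amounts to a small dispatch table. The sketch below only plans the row to create and does not touch storage; the plan_add_to_trip name and the shape of the returned dict are assumptions for illustration, while the kind and result_type values come from this spec.

```python
from typing import Optional

# saved item kind -> (result_type, kind of the created row or None)
ADD_TO_TRIP_ACTIONS = {
    "place": ("activity", "sight"),
    "itinerary": ("import", None),
    "reel": ("planning_item", "reel"),
    "link_article": ("planning_item", "link"),
}


def plan_add_to_trip(saved_item: dict, trip_id: str, day_id: Optional[str] = None) -> dict:
    """Describe the row that add-to-trip would create for this saved item."""
    kind = saved_item["kind"]
    if kind not in ADD_TO_TRIP_ACTIONS:
        raise ValueError(f"unsupported saved item kind: {kind!r}")
    result_type, created_kind = ADD_TO_TRIP_ACTIONS[kind]

    plan = {"result_type": result_type, "trip_id": trip_id}
    if result_type == "import":
        # F0014.4.b: prefer the post URL, fall back to the caption text.
        plan["input"] = saved_item.get("url") or saved_item.get("notes")
        plan["status"] = "queued"
    else:
        plan["kind"] = created_kind
        plan["title"] = saved_item.get("title")
    if result_type == "activity":
        plan["day_id"] = day_id  # None means unscheduled
    return plan
```

The saved item itself is never modified here, which is consistent with F0014.4.e: the same item can be planned into several trips.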

F0014.5 — Library management

F0014.5.a Archive an item (hidden from default view, not deleted).
F0014.5.b Delete an item permanently.
F0014.5.c Edit title, tags, and notes on any item.

3.2 Out of scope

Cross-platform imports (TikTok, Google Maps Saved, Pinterest) — separate spec.
Deduplication across multiple Instagram imports (v2).
Sharing saved items with trip collaborators (v2).
Ranking or quality scoring of saved items.
Automatic syncing (requires Instagram API approval; deferred indefinitely).
Converting a saved place into a new trip destination (beyond creating one activity).

4. User stories

As P1, when I upload my Instagram ZIP, I see my saved posts auto-sorted into Places / Itineraries / Reels / Links, so I can immediately filter to just restaurant recs for my trip destination.
As P1, when I tap “Add to trip” on a saved place, I can pick a day and it lands as an activity on that day’s plan, so I don’t have to manually recreate it.
As P3, when I tap “Add to trip” on a saved itinerary, the existing import pipeline opens, extracts the activities from the post, and I review them before committing, so the structured plan ends up in my trip.
As P1, when the AI misclassifies a Reel as a Place, I can tap the kind chip and pick Reel from the bottom sheet, so my library stays clean.

5. UX notes

Library screen layout

[Saved]                             [Import from Instagram] (+)

  [All] [Places ●] [Itineraries] [Reels] [Links]

  ┌─────────────────────────────────────────────┐
  │ 🏠 Place  •  15 Jul 2026                    │
  │ Best café in Ubud — the one with rice       │
  │ field view...                               │
  │ [thumbnail]                  [Add to trip ▸]│
  └─────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────┐
  │ 🗺️ Itinerary  •  16 Jul 2026                │
  │ 5-day Bali itinerary — day 1: arrive...     │
  │                                   [Import ▸]│
  └─────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────┐
  │ 🎬 Reel  •  17 Jul 2026                     │
  │ [video thumbnail]                           │
  │ Watch this sunset reel               [Add ▸]│
  └─────────────────────────────────────────────┘

Kind chip — tap to override

Tapping the kind chip on any card opens a bottom sheet:

Change kind:
○ Place
○ Itinerary
● Reel  ← current
○ Link / Article

Add to trip — bottom sheet

Add to which trip?
  ○ Japan 2026 (current)
  ○ Bali 2026
  ○ New trip...

Add to day:
  ○ Day 1 (15 Jul)
  ● Day 3 (17 Jul)  ← suggested from source_date
  ○ Unscheduled

6. Acceptance criteria

AC-1   F0014.1.a  Given a valid Instagram ZIP with saved_posts.json, POST /v1/saved-items/import-instagram returns { items_created: N }.
AC-2   F0014.1.d  Given a corrupt ZIP, the endpoint returns 422 with error "Invalid Instagram export format".
AC-3   F0014.2.a  A post with video_list present is classified as "reel" without an LLM call.
AC-4   F0014.2.a  A post with caption "5-day Bali itinerary day 1:" is classified as "itinerary" without an LLM call.
AC-5   F0014.2.a  A post with caption "Best restaurant in Ubud!" is classified as "place" without an LLM call.
AC-6   F0014.2.b  A post with empty caption and no video triggers an LLM classification call.
AC-7   F0014.2.d  PATCH /v1/saved-items/:id with { kind: "reel" } updates the item and returns 200.
AC-8   F0014.3.b  GET /v1/saved-items?kind=place returns only items with kind="place".
AC-9   F0014.4.a  POST /v1/saved-items/:id/add-to-trip with a "place" item creates an activity in the target trip with kind="sight".
AC-10  F0014.4.b  POST /v1/saved-items/:id/add-to-trip with an "itinerary" item creates an import with status="queued" in the target trip.
AC-11  F0014.4.c  POST /v1/saved-items/:id/add-to-trip with a "reel" item creates a planning_item with kind="reel" in the target trip.
AC-12  F0014.4.d  POST /v1/saved-items/:id/add-to-trip with a "link_article" item creates a planning_item with kind="link" in the target trip.
AC-13  F0014.5.a  PATCH /v1/saved-items/:id with { archived: true } hides the item from GET /v1/saved-items (default).
AC-14  F0014.5.b  DELETE /v1/saved-items/:id removes the item; subsequent GET returns 404.

7. Data model

-- New enum
saved_item_kind: 'place' | 'itinerary' | 'reel' | 'link_article'

-- New table: saved_items
id              uuid PK
user_id         uuid  → auth.users
kind            saved_item_kind  NOT NULL
title           text  (1–200)
notes           text  (1–4000)     -- full caption
url             text  (1–2048)     -- primary link extracted from caption
image_urls      text[]  default '{}'
video_url       text  (1–2048)
ai_confidence   numeric(3,2)       -- 0.0–1.0; null if keyword-classified
ai_model        text               -- which LLM was used; null if keyword
source          text  default 'instagram_export'
source_date     date               -- taken_at from the post
tags            text[]  default '{}'
archived        boolean  default false
created_at      timestamptz
updated_at      timestamptz

-- Constraint: at least one content field present
CHECK (title IS NOT NULL OR notes IS NOT NULL OR url IS NOT NULL)

-- Indexes
(user_id, kind, created_at DESC)
(user_id, created_at DESC)
(user_id, source_date DESC)

-- RLS: owner-scoped (like liked_itineraries)

No foreign keys to other entity tables — items are self-contained captures. Cross-linking saved places to trip_destinations or itineraries is deferred to v2.
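
As a rough illustration of the column limits and the CHECK constraint above, the same rules can be mirrored in application code before an insert. The validate_saved_item helper is hypothetical; the database constraints remain the source of truth.

```python
KINDS = {"place", "itinerary", "reel", "link_article"}
MAX_LEN = {"title": 200, "notes": 4000, "url": 2048, "video_url": 2048}


def validate_saved_item(row: dict) -> list:
    """Return a list of problems; an empty list means the row looks valid."""
    problems = []
    if row.get("kind") not in KINDS:
        problems.append("kind must be one of " + ", ".join(sorted(KINDS)))
    for field, limit in MAX_LEN.items():
        value = row.get(field)
        if value is not None and not 1 <= len(value) <= limit:
            problems.append(f"{field} must be 1-{limit} characters")
    confidence = row.get("ai_confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append("ai_confidence must be between 0.0 and 1.0")
    # Mirrors the CHECK constraint: at least one content field is present.
    if all(row.get(f) is None for f in ("title", "notes", "url")):
        problems.append("one of title, notes, url is required")
    return problems
```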

8. APIs / contracts

POST /v1/saved-items/sign-upload
  → { url: string, path: string }   // client PUTs ZIP directly to Supabase

POST /v1/saved-items/import-instagram
  body: { storage_path, filename, bytes? }
  → { items_created: number, kinds: { place, itinerary, reel, link_article } }

GET /v1/saved-items
  query: kind?, archived?, limit?, cursor?
  → { items: SavedItem[], next_cursor? }

PATCH /v1/saved-items/:id
  body: { kind?, title?, notes?, tags?, archived? }
  → SavedItem

DELETE /v1/saved-items/:id  → 204

POST /v1/saved-items/:id/add-to-trip
  body: { trip_id, day_id? }
  → { result_type: 'activity'|'planning_item'|'import', result_id: string }

Worker (internal, X-Workers-Token auth):

POST /ai/saved-items/classify-instagram
  body: { zip_storage_path: string, user_id: string }
  → { items: SavedItemCreate[], model_used: string, tokens_in, tokens_out, cost_usd }

8a. Reel → multi-place extraction (extension)

A pasted reel/video URL (Instagram / TikTok / YouTube) is not classified as a single reel. Instead it runs the same rich extraction as trip imports — yt-dlp download → Gemini watches the video → ItineraryDraft with many activities → the region-anchored geocoder — and fans out one saved_item per place (café, beach, sight…), each with its own location_lat/lng/address. The source reel is also kept as one reel item. Every produced row carries source_url = the reel permalink (items are flat in the library, linked only by that URL). Reuses the global reel_extractions cache, so a reel already extracted for a trip is free here.

Because a Gemini video call is slow, this path is async via a job row the client polls. Plain (non-video) URLs keep the synchronous import-url OG-scrape path.

New columns on saved_items: location_lat, location_lng, location_address, source_url. New table saved_item_jobs(id, user_id, source_url, status [queued|extracting|ready|failed], items_created, error, …) (RLS owner-scoped, Realtime-published) — migration 0032.

POST /v1/saved-items/import-reel
  body: { url }                         // 400 if not an IG/TikTok/YouTube URL
  → { job_id: string, status: 'queued' }

GET /v1/saved-items/jobs/:id
  → { id, status, items_created, error }   // client polls until ready|failed
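
The 400 check on import-reel could be a host allowlist along these lines. The exact host list and the is_supported_reel_url name are assumptions; the spec only names the three platforms.

```python
from urllib.parse import urlparse

# Assumed allowlist; subdomains such as www. and m. are accepted as well.
REEL_HOSTS = {"instagram.com", "tiktok.com", "youtube.com", "youtu.be"}


def is_supported_reel_url(url: str) -> bool:
    """True when the URL is http(s) and its host belongs to a supported platform."""
    try:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
    except ValueError:  # malformed URLs, e.g. a broken IPv6 literal
        return False
    if parsed.scheme not in ("http", "https") or not host:
        return False
    return any(host == h or host.endswith("." + h) for h in REEL_HOSTS)
```

Matching on the parsed host rather than on a substring avoids accepting URLs such as https://example.com/instagram.com.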

Worker (internal):

POST /ai/saved-items/import-reel
  body: { url, user_id, job_id }
  → 202 Accepted; runs extraction in the background, writes saved_items +
    updates saved_item_jobs (service-role).

Kind mapping for fanned-out activities: skip transport and non-stops; run the keyword sub-classifier on the title for a fine-grained place kind, else map food→restaurant, lodging→accommodation, sight/freeform→attraction.

9. Non-functional requirements

Aspect                   Target
Import latency           ≤ 45s for 100 posts (synchronous; keyword path fast, LLM batched)
LLM cost                 ≤ $0.05 per import of 100 posts (keyword handles ≥80%; LLM only for ambiguous)
Classification accuracy  ≥ 80% correct kind without user override (measured post-ship on sample)
Offline access           Saved library loads from local cache when offline; add-to-trip requires connection
Privacy                  ZIP is stored in user-private path, deleted after classification
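
The LLM cost target implies a per-post budget. A quick worked example, assuming the worst case where exactly 20% of a 100-post import falls through to Tier 2 and ignoring any savings from batching (llm_budget_per_post is an illustrative helper, not an existing function):

```python
def llm_budget_per_post(posts: int, keyword_pct: int, budget_usd: float) -> float:
    """Return the per-post LLM spend that keeps one import within budget."""
    keyword_posts = posts * keyword_pct // 100  # handled for free by Tier 1
    llm_posts = posts - keyword_posts           # worst case sent to Tier 2
    if llm_posts == 0:
        return float("inf")
    return budget_usd / llm_posts


per_post = llm_budget_per_post(posts=100, keyword_pct=80, budget_usd=0.05)
print(f"{per_post:.4f} USD per ambiguous post")  # prints 0.0025 USD per ambiguous post
```

So each ambiguous post can cost at most about a quarter of a cent for the import to stay within the target.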

10. Risks & open questions

R1 — Caption-only posts (no URL, no POI name) may all fall to LLM, raising cost. Mitigation: Default ambiguous posts to place (most common save type) rather than calling LLM; accuracy will be lower but cost is bounded.
R2 — Instagram export format changes without notice (Meta has done this). Mitigation: ig_parser.py is isolated; format changes only break the parser, not the rest of the pipeline. Monitor for errors in saved_item_import_jobs.
Q1 — Should a link_article with no URL just store the caption? Or is an empty URL a validation error? Resolved: store caption as notes; URL is optional.
Q2 — What happens if the user imports twice (duplicate posts)? Deferred to v2 via source_date + title dedup. For now, duplicate items are allowed.

11. Rollout plan

Worker + classification ships first (no UI dependency).
Backend API ships with worker (can be verified with curl/Postman).
Web library screen — second slice (shows library, filter, add-to-trip).
Mobile library screen — third slice (after web is stable).
No feature flag needed — new endpoints/screen; no impact on existing flows.

12. Extension — reel → multi-place extraction

The base flow assigns ONE kind per saved post from its caption/OG text. This extension runs the same rich extraction the trip importer uses (yt-dlp downloads the video → Gemini watches it → ItineraryDraft with many activities → region-anchored geocode) and fans out one saved_item per location, plus keeps the source reel itself. Items are flat in the library; each carries a source_url reference back to the reel.

Data model (migration 0032_saved_items_locations_jobs.sql):

saved_items gains location_lat, location_lng, location_address, source_url.
saved_item_jobs (id, user_id, source_url, status, items_created, error, …) tracks the async run: queued → extracting → ready | failed. Realtime-published so clients can watch it; clients may also poll.

Flow:

POST /v1/saved-items/import-reel { url }      ← only IG/TikTok/YouTube hosts
   → insert saved_item_jobs (queued)
   → worker POST /ai/saved-items/import-reel { url, user_id, job_id }  (202, bg task)
        extract_reel_draft(url)               ← shared cache-aware helper (ai/reel_extract.py)
          cache HIT  → reuse reel_extractions payload (no fetch/LLM)
          cache MISS → yt-dlp fetch → Gemini → persist extraction + asset
        DEDUP: if user already has reel_extract rows for this source_url → stop (no dupes)
        geocode (region-anchored) + media (per-place photos), best-effort
        fan out: 1 reel item + 1 item per qualifying activity → bulk insert
        job → ready (items_created = N)
GET /v1/saved-items/jobs/:id                   ← client polls until ready/failed
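
On the client side, the final polling step of this flow might look like the sketch below. fetch_job stands in for a GET on the job endpoint and is injected so the loop stays testable; the interval and timeout defaults are assumptions, not values from this spec.

```python
import time
from typing import Callable

TERMINAL = {"ready", "failed"}


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches ready or failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL:
            return job
        # Stop before sleeping if the next attempt would pass the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout_s}s")
        sleep(interval_s)
```

Since saved_item_jobs is also Realtime-published, a client could subscribe instead and keep polling only as a fallback.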

Kind mapping (_activity_to_saved_kind): skip transport and is_stop=false; keyword_classify the title for a fine-grained place sub-type; else fall back food→restaurant, lodging→accommodation, sight/freeform→attraction.
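
A sketch of this mapping, assuming each extracted activity is a dict with kind, is_stop, and title keys and that the keyword sub-classifier is passed in as a callable. The field names and the default of "attraction" for unknown activity kinds are assumptions, not confirmed behaviour of _activity_to_saved_kind.

```python
from typing import Callable, Optional

FALLBACK_KIND = {
    "food": "restaurant",
    "lodging": "accommodation",
    "sight": "attraction",
    "freeform": "attraction",
}


def activity_to_saved_kind(
    activity: dict,
    keyword_classify: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the saved-item kind for one activity, or None to skip it."""
    # Transport legs and non-stops never become library items.
    if activity.get("kind") == "transport" or activity.get("is_stop") is False:
        return None
    fine_grained = keyword_classify(activity.get("title") or "")
    if fine_grained:
        return fine_grained
    # Assumption: unknown activity kinds fall back to "attraction".
    return FALLBACK_KIND.get(activity.get("kind"), "attraction")
```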

Caching & dedup:

Extraction is cached globally in reel_extractions (keyed by canonical URL hash, shared with trip imports) — re-paste never re-runs Gemini.
Fan-out is deduped per (user_id, source='reel_extract', source_url) — a re-paste finishes ready with the existing count and inserts nothing.
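
Both rules can be illustrated with a short sketch. The canonicalisation steps (force https, drop www., drop the fragment and all query parameters except YouTube's v) and the SHA-256 choice are assumptions made for the example; the spec only says the cache is keyed by a canonical URL hash.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

KEEP_PARAMS = {"v"}  # YouTube watch URLs carry the video id in ?v=


def canonical_url_hash(url: str) -> str:
    """Hash a reel URL after removing tracking noise (assumed rules)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")  # path case is kept: shortcodes are case-sensitive
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in KEEP_PARAMS)
    canonical = urlunsplit(("https", host, path, urlencode(kept), ""))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rows_to_insert(existing: list, user_id: str, source_url: str, candidates: list) -> list:
    """Return candidates unless this user already has reel_extract rows for the URL."""
    already = any(
        r["user_id"] == user_id
        and r["source"] == "reel_extract"
        and r["source_url"] == source_url
        for r in existing
    )
    return [] if already else candidates
```

The cache key is global while the fan-out check is per user, so a second user pasting the same reel skips Gemini but still gets their own rows.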

Provider note: video uses the Gemini Files API directly (GeminiClient), NOT LiteLLM/OpenRouter — LiteLLM’s chat-completions interface can’t do the upload-and-poll Files API flow for a ≥20MB reel. Requires GEMINI_API_KEY (AI Studio) or the Vertex backend; MODEL_VIDEO=gemini-2.5-flash keeps cost low. Instagram fetch reliability from a datacenter IP needs REEL_COOKIES_FILE.

Q2 (revisited) — duplicate imports of the same reel are now deduped by source_url. Cross-source duplicates (same place from two different reels) are still allowed by design.