
0003 — Trip Imports (PDF / Image / Blog → Itinerary)

Status: Shipped (v0 ingest pipeline live end-to-end)
Owners: @satya
Updated: 2026-05-12
Depends on: 0001 trip-planner (trips, activities, attachments bucket)

Live surfaces: Python worker pipeline (apps/workers/), NestJS imports module (apps/backend/src/imports/), mobile import sheets + review screen (apps/mobile/lib/features/trips/). See ../progress.md for the milestone log.

Users plan trips by collecting fragments: a PDF voucher from a tour operator, a screenshot of a friend’s reel, a brochure photo, a travel blog with a day-by-day breakdown. Re-typing those into the trip itinerary is the single biggest friction point post-M8.

Let a user attach any of {PDF, image, blog URL} to a trip and get a draft itinerary back within ~30s that they can review, edit, and selectively commit into the trip’s activities table.

Out of scope for v0:

  • YouTube transcripts / Reels / TikTok (M10; needs scraper + Whisper).
  • Audio note imports.
  • Multi-trip batch imports.
  • Cross-user sharing of parsed drafts.
flowchart LR
Flutter[["Flutter<br/>mobile app"]]
Nest[["NestJS backend<br/>(apps/backend)"]]
Worker[["Python workers<br/>(FastAPI · apps/workers)"]]
Parser["parser router<br/>(pdf · image · url)"]
LLM["LLM client<br/>(OpenRouter + instructor)"]
Drafts[("public.import_drafts<br/>service-role writes")]
Realtime{{"Supabase Realtime<br/>postgres_changes"}}
Flutter -- "POST /trips/:id/imports" --> Nest
Nest -- "insert imports row" --> Realtime
Realtime -- "status='queued'" --> Worker
Worker --> Parser --> LLM --> Drafts
Drafts -- "row update" --> Realtime
Realtime -- "subscription" --> Flutter
Flutter -- "commit → POST activities" --> Nest

Why workers in Python: mature pypdf / pdf2image / trafilatura / instructor ecosystem. apps/workers already exists.

Why Realtime over polling: mobile already uses Supabase Realtime for the M7 sync chip, so imports reuse the same channel infra. The worker subscribes via postgres_changes filtered on status = 'queued'.

  • Gateway: OpenRouter. One key, OpenAI-compatible API, fallback chain.
  • SDK layer: openai python client + instructor for Pydantic-typed structured outputs across any backing model.
  • Abstraction: LLMClient protocol in treeper_workers/ai/llm/client.py. OpenRouterClient is the default; future direct providers can drop in.
  • Model routing (env-overridable):
    • Vision (image / scanned PDF): anthropic/claude-sonnet-4.5
    • Text structuring: anthropic/claude-haiku-4.5
    • Fallbacks: google/gemini-2.5-pro, google/gemini-2.5-flash
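
The routing and fallback idea above can be sketched as follows. This is a minimal illustration, not the code in client.py: the `complete` method signature, `FallbackClient`, and `FlakyStub` are hypothetical names, and the stub stands in for a real OpenRouter call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class LLMClient(Protocol):
    """Anything that can turn a prompt into text for a given model id."""

    def complete(self, model: str, prompt: str) -> str: ...


@dataclass
class FallbackClient:
    """Tries each model in order and returns the first successful result."""

    inner: LLMClient
    models: Sequence[str]

    def complete(self, prompt: str) -> tuple[str, str]:
        errors: list[str] = []
        for model in self.models:
            try:
                return model, self.inner.complete(model, prompt)
            except Exception as exc:  # assumption: real code would catch narrower errors
                errors.append(f"{model}: {exc}")
        raise RuntimeError("all models failed: " + "; ".join(errors))


class FlakyStub:
    """Offline stand-in for a provider: fails for the models listed in `down`."""

    def __init__(self, down: set[str]) -> None:
        self.down = down

    def complete(self, model: str, prompt: str) -> str:
        if model in self.down:
            raise TimeoutError("upstream unavailable")
        return f"[{model}] {prompt}"


# Text-structuring chain: primary model first, then the documented fallbacks.
TEXT_CHAIN = [
    "anthropic/claude-haiku-4.5",
    "google/gemini-2.5-pro",
    "google/gemini-2.5-flash",
]

client = FallbackClient(FlakyStub(down={"anthropic/claude-haiku-4.5"}), TEXT_CHAIN)
used_model, output = client.complete("structure this itinerary")
```

Returning the model id alongside the output makes it easy to record which model actually produced a draft.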
| Source | Step 1 | Step 2 | Step 3 |
| --- | --- | --- | --- |
| PDF (text) | pypdf extract | LLM structuring (Haiku) | merge |
| PDF (scanned) | pdf2image → PNG | Vision LLM per page (batched 5) | merge + page confidences |
| Image | (skip OCR) | Vision LLM | |
| Blog URL | trafilatura extract; playwright only if <500 chars | LLM structuring | cache by URL hash 30d |

PDF text-vs-vision split: if avg_chars_per_page < 100, route to vision.
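
A minimal sketch of that split, together with the batches of 5 pages used on the vision path. The function names are hypothetical; treating an empty PDF as "vision" and stripping whitespace before counting characters are assumptions.

```python
from typing import Iterator, Sequence

VISION_THRESHOLD = 100   # avg chars per page below this => treat the PDF as scanned
VISION_BATCH_SIZE = 5    # pages sent to the vision model per request


def choose_pdf_route(page_texts: Sequence[str]) -> str:
    """Return "text" or "vision" based on the average extracted chars per page."""
    if not page_texts:
        return "vision"  # assumption: nothing extractable, so let vision try
    avg_chars_per_page = sum(len(t.strip()) for t in page_texts) / len(page_texts)
    return "vision" if avg_chars_per_page < VISION_THRESHOLD else "text"


def batch_pages(
    page_numbers: Sequence[int], size: int = VISION_BATCH_SIZE
) -> Iterator[list[int]]:
    """Yield consecutive page-number batches of at most `size` pages."""
    for start in range(0, len(page_numbers), size):
        yield list(page_numbers[start:start + size])


route_text = choose_pdf_route(["x" * 400, ""])     # avg 200 chars per page
route_scan = choose_pdf_route(["", "abc"])          # avg 1.5 chars per page
batches = list(batch_pages(list(range(1, 13))))     # 12 pages
```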

See infra/supabase/migrations/0007_imports.sql.

  • imports — one per upload; status machine queued → parsing → ready → committed|discarded, or failed.
  • import_drafts — one per successful parse; JSONB payload + page confidences.
  • activities.source_import_id / source_snippet — provenance on committed rows.
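
The imports status machine can be expressed as a small transition table. This is a sketch under assumptions: the spec does not say which states may move to failed (queued and parsing are assumed here), nor whether a retry moves failed back to queued (not modeled).

```python
from enum import Enum


class ImportStatus(str, Enum):
    QUEUED = "queued"
    PARSING = "parsing"
    READY = "ready"
    COMMITTED = "committed"
    DISCARDED = "discarded"
    FAILED = "failed"


# Allowed next states for each status; terminal states map to an empty set.
ALLOWED_TRANSITIONS: dict[ImportStatus, set[ImportStatus]] = {
    ImportStatus.QUEUED: {ImportStatus.PARSING, ImportStatus.FAILED},
    ImportStatus.PARSING: {ImportStatus.READY, ImportStatus.FAILED},
    ImportStatus.READY: {ImportStatus.COMMITTED, ImportStatus.DISCARDED},
    ImportStatus.COMMITTED: set(),
    ImportStatus.DISCARDED: set(),
    ImportStatus.FAILED: set(),
}


def can_transition(current: ImportStatus, target: ImportStatus) -> bool:
    """True if moving from `current` to `target` follows the status machine."""
    return target in ALLOWED_TRANSITIONS[current]
```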

Storage reuses trip-attachments bucket with key prefix <trip_id>/imports/<import_id>/<filename>. Existing trip-scoped storage RLS applies unchanged.
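
Building that key could look like the sketch below. `import_storage_key` is a hypothetical helper, and reducing the filename to its final path component (so a client-supplied name cannot escape the prefix) is an assumption, not something the spec states.

```python
from pathlib import PurePosixPath


def import_storage_key(trip_id: str, import_id: str, filename: str) -> str:
    """Return <trip_id>/imports/<import_id>/<filename> with a safe basename."""
    # Normalise Windows separators, then keep only the last path component.
    name = PurePosixPath(filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {filename!r}")
    return f"{trip_id}/imports/{import_id}/{name}"


key = import_storage_key("t1", "i1", "../../etc/voucher.pdf")
```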

8. API (NestJS — to be implemented in next slice)

POST /trips/:id/imports body: { source_type, source_uri, filename?, bytes? } → 202 { id }
GET /imports/:id → { status, draft?, error? }
POST /imports/:id/commit body: { activity_indexes: int[] } → { committed: Activity[] }
DELETE /imports/:id → { ok: true } (sets status=discarded)
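
The selection step behind POST /imports/:id/commit is language-neutral, so it is sketched here in Python rather than NestJS. The function name is hypothetical; rejecting out-of-range indexes and silently skipping duplicates are assumptions about behavior the spec leaves open.

```python
from typing import Any


def select_for_commit(
    draft_activities: list[dict[str, Any]], activity_indexes: list[int]
) -> list[dict[str, Any]]:
    """Pick the draft activities named by `activity_indexes`, in request order."""
    seen: set[int] = set()
    picked: list[dict[str, Any]] = []
    for idx in activity_indexes:
        if not 0 <= idx < len(draft_activities):
            raise IndexError(f"activity index {idx} is outside the draft")
        if idx in seen:
            continue  # assumption: a repeated index commits the activity once
        seen.add(idx)
        picked.append(draft_activities[idx])
    return picked


draft = [{"title": "Check in"}, {"title": "Old town walk"}, {"title": "Dinner"}]
committed = select_for_commit(draft, [2, 0, 2])
```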

Mobile gets a signed PUT URL from the existing attachments service to upload bytes, then calls POST /trips/:id/imports with the storage path.

Worker exposes (already mounted, currently stubs):

POST /ai/imports/start body: { import_id } → 202 { accepted: true }

In normal flow the worker discovers jobs via Realtime; the explicit endpoint exists for retries / dev triggers. All worker→DB writes use the service-role key.

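from datetime import date, time
from typing import Literal

from pydantic import BaseModel

class ImportedActivity(BaseModel):
    day_index: int | None   # offset from trip start_date; 0 = first day
    date: date | None
    time: time | None
    title: str
    location: str | None
    notes: str | None
    source_snippet: str     # text the activity was extracted from (provenance)

class PageConfidence(BaseModel):
    page: int
    confidence: Literal['high', 'medium', 'low']

class ItineraryDraft(BaseModel):
    destinations: list[str]
    activities: list[ImportedActivity]
    page_confidences: list[PageConfidence] = []   # empty for non-paged sources
    overall_confidence: Literal['high', 'medium', 'low']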

Undated activities default to day_index=0, which maps to the trip's start_date; the user can move them in the review screen.

  • File size cap: 10MB (signed-URL constraint + worker re-check).
  • Page cap: 20 pages per PDF.
  • URL: http(s) only; SSRF guard rejects private IPs / metadata endpoints.
  • Rate limit: 20 imports/user/day (DB count check in NestJS).
  • Cost ceiling: $0.50/import. The worker tracks cumulative tokens × model price and sets the import to failed if the ceiling is exceeded.
  • LLM keys env-only on worker. Backend never sees them.
  • Auto-delete imports storage after 90 days (cron, M10).
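
As an illustration of the URL guardrail, the sketch below checks the scheme and rejects private, loopback, and link-local addresses (cloud metadata endpoints such as 169.254.169.254 are link-local). The function names are hypothetical, and the caller is assumed to pass in the IPs the host resolved to; a real guard would also need to connect to the same address it checked.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_blocked_ip(ip: str) -> bool:
    """True for addresses an import fetch should never reach."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    return any((
        addr.is_private, addr.is_loopback, addr.is_link_local,
        addr.is_multicast, addr.is_reserved, addr.is_unspecified,
    ))


def is_url_allowed(url: str, resolved_ips: list[str]) -> bool:
    """Check scheme and every resolved address; any failure rejects the URL."""
    try:
        parts = urlsplit(url)
        if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
            return False
        if not resolved_ips:
            return False  # assumption: an unresolvable host is rejected
        return not any(is_blocked_ip(ip) for ip in resolved_ips)
    except ValueError:
        return False  # malformed URL or IP string
```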

Build steps:

  • Migration 0007_imports.sql
  • Spec
  • Worker: LLM client (OpenRouter + instructor) + ItineraryDraft schema
  • Worker: blog parser end-to-end (no upload, easy to validate)
  • Worker: PDF text + vision pipeline
  • Worker: image pipeline
  • Worker: Realtime subscriber main loop
  • Backend: imports module (controller, service, DTOs)
  • Mobile: import sheet on trip detail
  • Mobile: review screen + commit
  • Guardrails (rate limit, SSRF, cost ceiling)
  • E2E tests (one fixture per source)

Each step is independently shippable behind an imports_enabled feature flag on the user row (added later if needed).

  • Pricing data for cost ceiling: hard-code per-model $/Mtok in worker config; revisit when OpenRouter publishes a usage endpoint.
  • Streaming partial results from worker for very long PDFs (M10).
  • Hand-off to “Reels” path: now its own spec — see 0008 Reel / Video Imports. Uses source_type='video' (enum widened in migration 0014) and a Gemini 2.5 multimodal parser.
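
The hard-coded pricing approach from the first item could be sketched as follows. The model ids and prices are placeholders, not real OpenRouter prices, and `CostTracker` and `CostCeilingExceeded` are hypothetical names; only the $0.50 ceiling comes from the guardrails above.

```python
from dataclasses import dataclass

# Placeholder (input, output) prices in USD per million tokens. Not real prices.
PRICE_PER_MTOK: dict[str, tuple[float, float]] = {
    "example/text-model": (1.00, 5.00),
    "example/vision-model": (3.00, 15.00),
}


class CostCeilingExceeded(RuntimeError):
    """Raised when an import's running cost passes the ceiling."""


@dataclass
class CostTracker:
    ceiling_usd: float = 0.50
    spent_usd: float = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Add one LLM call to the running total; raise once over the ceiling."""
        in_price, out_price = PRICE_PER_MTOK[model]
        self.spent_usd += (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        if self.spent_usd > self.ceiling_usd:
            raise CostCeilingExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.ceiling_usd:.2f} ceiling"
            )
        return self.spent_usd


tracker = CostTracker()
after_first = tracker.record("example/text-model", 100_000, 10_000)
try:
    tracker.record("example/vision-model", 100_000, 10_000)
    aborted = False
except CostCeilingExceeded:
    aborted = True  # the worker would mark the import as failed here
```

Checking after every call, rather than once at the end, lets a long scanned PDF stop partway through its page batches.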