Feature: Trip AI Chat — per-trip assistant in the AI Chat tab

ID:           0012
Status:       Draft
Owner:        @satya
Created:      2026-05-15
Updated:      2026-05-15
Related ADRs: 0002 (Supabase + Nest), 0003 (Python workers for AI), 0009 (TBD — streaming over SSE via Nest)
Depends on:   0001 (trip planner), 0008 (reel imports — for trip context), 0011 (notifications — reuses session conventions)
Supersedes:   —

1. Why

The fourth tab on the trip detail PageView is currently a “coming soon” placeholder (apps/mobile/lib/features/trips/view/ai_chat_tab.dart). Once a trip exists, the user has nowhere to ask questions about it (“what’s day 3 looking like?”, “any veg-friendly food spots near Shibuya?”, “summarise the budget”), nor any conversational way to lean on AI for planning help without leaving the screen and going to imports. A per-trip assistant fills that gap. It is intentionally trip-scoped — consistent with the product vision’s “AI as a power-tool, not a chatbot” stance (specs/product-vision.md) — every session is bound to one trip and grounded in that trip’s data.

2. Who it is for

P1 — Solo planner: needs quick answers about their own trip without scrolling tabs (“when do I land in Tokyo?”, “how much have I budgeted for day 4?”).
P3 — Inspiration hoarder: after a reel import (spec 0008), wants to ask follow-ups (“what else is near this café?”) without leaving the trip.
P4 — Active traveller (deferred): on-trip mode benefits later; v0 still works fine in live mode but no special handling.

Out: P2 (group lead) — multi-user shared chat is explicitly out of scope for v0; sessions are per-user.

3. Scope

In scope

F0012.1 Trip-scoped chat surface in the AI Chat tab.
- F0012.1.a Replace placeholder body in ai_chat_tab.dart with a TripChatCubit-backed conversation list + composer.
- F0012.1.b One default chat session per (trip_id, user_id) — auto-created on first visit. Multiple sessions per trip are out of v0 scope (table supports it; UI does not).
- F0012.1.c Composer: multiline text input, send button, attach button (image from camera/gallery), inline cancel for streaming responses.
- F0012.1.d Empty state: 3–5 suggested starter prompts derived from the loaded trip (e.g. “Summarise day 1”, “What’s the food plan?”, “What’s missing?”).
F0012.2 Persistent history in Supabase.
- F0012.2.a trip_chat_sessions and trip_chat_messages tables (see §7), RLS-scoped to trip members.
- F0012.2.b GET /v1/trips/:tripId/chat/sessions/default returns (and lazily creates) the user’s session for that trip.
- F0012.2.c GET /v1/chat/sessions/:id/messages?limit=50&before=… paginates oldest-first within the page, newest page first; client renders most-recent at the bottom.
F0012.3 Streamed responses through Nest → workers → LiteLLM proxy.
- F0012.3.a POST /v1/chat/sessions/:id/messages accepts a user message (+ optional attachment refs), persists it status='sent', then opens a Server-Sent Events stream and relays token deltas from workers as data: events.
- F0012.3.b Workers exposes POST /ai/chat/stream (auth: workers token, called only by Nest). Builds the prompt from trip_context_snapshot + last N messages, calls LiteLLM with streaming enabled, yields SSE delta / done / error frames.
- F0012.3.c On done, workers persists the assistant message (with model, tokens_in, tokens_out, cost_usd) and Nest closes the SSE.
- F0012.3.d Cancellation: client disconnect → Nest aborts upstream request → workers persists a partial assistant message with status='cancelled' and the prefix received so far.
F0012.4 Trip context injection (“grounding”).
- F0012.4.a Workers builds a compact TripContextSnapshot once per request: trip title, dates, destinations, day-by-day activities (kind, title, time, cost), cost rollup, traveller list. Hard cap at ~6k tokens; truncate oldest/lowest-priority activities first.
- F0012.4.b Snapshot is read from Supabase via the user’s JWT (Nest forwards the bearer; workers uses anon client + that JWT) so RLS still applies — never the service-role client for this path.
- F0012.4.c System prompt names the trip and instructs the model to refuse / clarify when asked about anything outside the trip scope; no general-purpose chatbot behaviour.
F0012.5 Attachments (read-only, image-only in v0).
- F0012.5.a POST /v1/trips/:tripId/chat/sign-upload mints a signed PUT URL into trip-attachments bucket under <trip_id>/chat/<session_id>/<message_id>/….
- F0012.5.b Client uploads, then sends the message with attachments: [{path, mime, bytes}]. Workers downloads via service role, base64-encodes, sends as vision parts (reusing LiteLLMClient.vision-style multimodal path adapted for chat).
- F0012.5.c Max 4 images per message, ≤8 MB each, jpeg/png/webp.
F0012.6 Cost + safety guards.
- F0012.6.a Per-message ceiling: $0.02 default, configurable via workers settings.chat_cost_ceiling_usd. Refusal message persisted on breach with status='failed'.
- F0012.6.b Per-session daily ceiling: $0.50; subsequent sends return 429 with a friendly error.
- F0012.6.c Model defaults to claude-haiku-4-5-20251001 for text-only, claude-sonnet-4-5-20250929 when attachments are present. Both routed through LiteLLM proxy (apps/workers/src/treeper_workers/ai/llm.py).

Out of scope (this milestone)

Write-back / tool-calling that mutates the trip. The assistant can suggest, never auto-edit. A future spec (0013-trip-ai-actions) adds tool calls to insert/modify activities with a confirm step.
Shared / multi-user sessions. Each user has their own thread.
Cross-trip chat (“plan a brand new trip for me”) — handled by the prompt-to-itinerary spec (F3.5), not here.
Voice / audio input or TTS playback.
Web search / live retrieval. v0 grounds only on Supabase trip data and the model’s parametric knowledge.
End-of-trip recap generation (F2.6) — owned by tracker spec.

4. User stories

As P1, I open my trip → AI Chat tab → see a fresh thread with starter chips → tap “Summarise day 1” → get a streamed answer grounded in my actual itinerary within seconds.
As P1, I close the app mid-stream; reopening shows my user message preserved and the partial assistant message marked as cancelled, with a “retry” affordance.
As P3, after a reel imports successfully, I switch to AI Chat and ask “any veg restaurants near these spots?” — the assistant uses the activities I just imported, not generic city advice.
As P1, I attach a photo of a paper itinerary and ask “is this already in my plan?” — assistant compares against the trip and flags overlaps.
As P1, I send 100 messages in a day on one trip — the 101st returns a clear “daily AI budget reached, resets at midnight UTC”.

5. UX notes

Layout (replaces current placeholder; same surfaceCream background, design-system tokens):
- Top bar: implicit (uses the existing trip AppBar) — no extra title needed.
- Message list: reverse-chronological in code, visually bottom-anchored. User bubbles right-aligned with accentLime fill; assistant bubbles left-aligned, surface-card. Cancelled / failed bubbles get a muted treatment + small footer line.
- Streaming bubble shows a typing-dot indicator until first delta, then progressively fills.
- Composer pinned to the bottom (above the floating bottom nav — keep 120 px nav clearance pattern already used in the placeholder).
Starter chips (empty state only): wrap-row of 3–5 chips; tap drops the prompt straight into the composer (does not auto-send).
Attachments: bottom-sheet picker reusing activity_attachments_sheet.dart patterns; pill row of thumbnails above the composer before sending.
Error states:
- Network drop mid-stream → snackbar + retry button on the bubble.
- Budget breach → inline assistant bubble with system styling, no retry.
Accessibility: VoiceOver/TalkBack reads new assistant messages on stream completion (not every delta); send button is the primary focus target after send-success.

6. Acceptance criteria

AC-1  F0012.1.a  Visiting the AI Chat tab on a trip with no prior
                 history shows the empty state and starter chips
                 within 200 ms of the cubit's first emit.
AC-2  F0012.1.b  First send creates exactly one row in
                 trip_chat_sessions for (trip_id, user_id); a second
                 visit reuses the same session id.
AC-3  F0012.2.c  GET /v1/chat/sessions/:id/messages returns the user
                 and assistant messages of the prior turn in
                 created_at order; tokens_in/out/cost_usd are
                 populated on the assistant row.
AC-4  F0012.3.a  POST /v1/chat/sessions/:id/messages opens an SSE
                 response; the first `data:` frame arrives within
                 1.5 s p95 on the staging proxy.
AC-5  F0012.3.d  Disconnecting the SSE mid-stream produces an
                 assistant row with status='cancelled' and `content`
                 equal to the prefix already streamed (no nulls, no
                 dupes).
AC-6  F0012.4.b  A user without trip membership receives 403 on the
                 messages endpoint; the workers context-snapshot
                 query returns zero rows under their JWT (RLS proof).
AC-7  F0012.4.c  Asked "what's the weather in Paris today?" on a
                 Tokyo trip, the assistant declines or redirects to
                 the trip — no general-knowledge answer is returned
                 (manual eval, ≥4/5 prompts).
AC-8  F0012.5.b  An image attachment is downloaded via service role,
                 sent to the vision-capable model, and the
                 attachment row links the storage path; assistant
                 references the image content in its reply.
AC-9  F0012.6.a  A message whose run exceeds the per-message ceiling
                 stops streaming, persists status='failed' with the
                 partial content, and returns a 200 SSE terminated
                 by an `error` frame (not 5xx).
AC-10 F0012.6.b  The 51st message in a single UTC day on one
                 session returns 429 with code `daily_budget_reached`.
AC-11 F0012.3.c  Across 20 sequential turns, no orphan rows are
                 produced (every `user` message has either a paired
                 `assistant` row or a `failed`/`cancelled` row).

7. Data model

Table sketch only; real schema lives in infra/supabase/migrations/0020_trip_chat.sql.

trip_chat_sessions
  id              uuid pk
  trip_id         uuid fk → trips(id) on delete cascade
  user_id         uuid fk → auth.users(id)
  title           text                       -- short auto-generated label
  model           text                       -- last model used
  created_at      timestamptz default now()
  updated_at      timestamptz
  unique (trip_id, user_id)                  -- v0: one session per (trip,user)

trip_chat_messages
  id              uuid pk
  session_id      uuid fk → trip_chat_sessions(id) on delete cascade
  role            chat_role enum ('user','assistant','system','tool')
  content         text                       -- final text (may be partial if cancelled/failed)
  parts           jsonb                      -- multimodal/tool parts, future-proof
  status          chat_message_status enum   -- 'sent','streaming','done','cancelled','failed'
  error           text
  model           text
  tokens_in       int
  tokens_out      int
  cost_usd        numeric(10,4)
  created_at      timestamptz default now()

trip_chat_attachments
  id              uuid pk
  message_id      uuid fk → trip_chat_messages(id) on delete cascade
  storage_path    text                       -- 'trip-attachments' bucket key
  mime            text
  bytes           bigint
  created_at      timestamptz default now()

RLS (mirrors existing trip-scoped policies):

trip_chat_sessions: select/insert/update where auth.uid() = user_id AND user is a member of trip_id (reuse is_trip_member(trip_id, auth.uid()) helper from earlier migrations).
trip_chat_messages: select/insert via session membership check.
trip_chat_attachments: same, joined through message → session.
Workers’ service role bypasses RLS as today; user-context calls go through the user JWT (see F0012.4.b).

Storage: reuse the trip-attachments bucket; path prefix <trip_id>/chat/<session_id>/<message_id>/…. Existing bucket policies already grant trip-member read/write on that prefix — no new bucket needed.

8. APIs / contracts

Backend (Nest, JWT-guarded, mounted under existing v1 prefix):

GET    /v1/trips/:tripId/chat/sessions/default
       → { session: TripChatSession }   (creates lazily)

GET    /v1/chat/sessions/:id/messages?limit=50&before=<iso>
       → { messages: TripChatMessage[], hasMore: boolean }

POST   /v1/chat/sessions/:id/messages
       body: { content: string, attachments?: AttachmentRef[] }
       response: text/event-stream
         event: ack       data: { user_message_id, assistant_message_id }
         event: delta     data: { text }
         event: done      data: { tokens_in, tokens_out, cost_usd, model }
         event: error     data: { code, message }

POST   /v1/trips/:tripId/chat/sign-upload
       body: { mime, bytes, message_session_id }
       → { uploadUrl, storagePath, expiresAt }

DELETE /v1/chat/sessions/:id           (purge entire session for the user)

Workers (workers-token-guarded, called only by Nest):

POST   /ai/chat/stream
       body: { session_id, trip_id, user_jwt, history_cutoff, attachments? }
       response: text/event-stream (delta | done | error)

Mobile (Flutter):

TripChatRepository → TripChatRemoteSource (Dio) + TripChatStreamClient (SSE; flutter_client_sse or hand-rolled over http.Client.send).
TripChatCubit states: Initial | Loading | Ready(messages, sending?, streamingId?) | Failure.

9. Non-functional requirements

Concern	Target
Time to first token	≤1.5 s p95 (staging LiteLLM → Haiku); ≤3.0 s p95 with image
Full turn latency	≤6 s p95 text-only at ~300 output tokens
Cost	≤$0.02/message average; ≤$0.50/session/day hard cap
Offline	Tab shows an empty/offline state with composer disabled and a “reconnect” hint; no local Drift cache in v0 (see Q3 / R-deferred)
Privacy	RLS enforced via user JWT for context snapshot read; service role only used for storage download + history writes
Observability	Structlog events `chat.turn.started/streamed/finished/failed` with `session_id`, `trip_id`, `tokens`, `cost`, `model`
Accessibility	Screen-reader announces assistant reply once (on `done`), not per delta
Failure isolation	A failing LiteLLM call must persist a `failed` row + error string; never leave a `streaming` row to rot

10. Risks & open questions

Q1: SSE through Nest vs direct mobile→workers? Going through Nest preserves the single auth boundary and matches every other feature, at the cost of one extra hop. Decision pending; if latency budget is missed, fold into ADR 0009.
Q2: Do we need a true streaming chat method on LiteLLMClient? Current client is instructor-only (structured). Likely add a parallel chat_stream(...) -> AsyncIterator[str] next to complete() rather than refactoring instructor.
Q3: Local Drift schema for cached messages — resolved: v0 skips local caching. History is fetched from Supabase on tab open; offline = read-only empty state. Revisit if users complain.
R1: Token blow-out on long trips. Mitigation: snapshot cap + last N (≤20) message rolling window; older messages summarised by a cheap pre-pass only when the window is full (defer to v0.1 if not needed).
R2: Prompt injection from imported reels’ captions appearing in trip data. Mitigation: snapshot renders trip data inside a clearly fenced block with an instruction to ignore directives within it; add a regression eval set (≥10 prompts) before shipping.
R3: Per-user daily cap is global per session; abusive users could create new trips to multiply quota. Track usage by user_id too (column on user_prefs or a separate ai_usage_daily table) — add in a follow-up if abuse appears.
R4: Cancellation race — client retries while workers still streaming the previous reply. Mitigation: assistant message id is minted up-front (returned in ack) and used as an idempotency key; second POST with the same in-flight assistant id is rejected.

11. Rollout plan

Migration 0020 — tables + RLS, no behaviour change. Ship independent of clients.
Workers — /ai/chat/stream endpoint behind a feature env CHAT_ENABLED=true. Smoke-tested with httpx --stream and a golden trip fixture.
Backend — /v1/chat/... endpoints, SSE relay, sign-upload. Integration tests cover AC-2, AC-4, AC-9, AC-10 with a stubbed workers fake.
Mobile — replace AiChatTab placeholder with the real widget tree behind a remote chat_enabled flag (or local debug toggle if no flag service yet). Internal dogfood first.
GA — flip the flag globally; AppBar / tab icon unchanged.

No data backfill needed (history begins on launch). Rollback = flip flag off + keep tables (no destructive change).

12. References

Placeholder being replaced: apps/mobile/lib/features/trips/view/ai_chat_tab.dart
LiteLLM client to extend: apps/workers/src/treeper_workers/ai/llm.py
Worker config (LiteLLM env): apps/workers/src/treeper_workers/config.py
Reel imports (source of trip context that this assistant grounds on): specs/features/0008-reel-video-imports.md
Notifications outbox + RLS pattern reused for session writes: specs/features/0011-notifications.md
ADR on Supabase + Nest split: specs/adr/0002-supabase-with-nestjs-api.md
ADR on Python workers boundary: specs/adr/0003-python-workers-for-ai-and-scraping.md