ADR-0003: Python (FastAPI) workers for AI ingest and scraping

Status:  Accepted
Date:    2026-05-02
Owner:   @satya

Context

Two product pillars are best served by Python:

AI ingest: Instagram reel and YouTube transcripts, blog parsing, prompt-to-itinerary. The ecosystem (yt-dlp, instaloader, trafilatura, LangChain / LiteLLM, vendor SDKs) is Python-native.
Scrapers: verified guides, day-adventure operators, deal feeds. Same story: the best parsers and headless-browser drivers are in Python.

Doing this in NestJS would mean either shelling out to Python anyway or giving up many of these libraries.

Decision

Run a single Python service, apps/workers, built on FastAPI, that exposes two route groups:

/ai/...       ingest URLs, prompt-to-itinerary, plan rewrites
/scrape/...   directory builders, deal feeds, on-demand fetches

NestJS is the only caller. Service-to-service auth is a shared bearer token (WORKERS_SHARED_SECRET). Long-running jobs run asynchronously and report progress through a job-status endpoint; we add a real queue (Celery / Redis Streams) only when the number of concurrent jobs justifies it.
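
A minimal sketch of the two mechanisms in this decision, assuming an in-memory job registry is acceptable for v0. It uses only the standard library so it stays independent of how the FastAPI routes are wired. The names is_authorized, Job and JobStore are hypothetical and do not come from apps/workers.

```python
import asyncio
import hmac
import uuid
from dataclasses import dataclass
from typing import Any, Coroutine, Dict, Optional, Set


def is_authorized(authorization_header: Optional[str], shared_secret: str) -> bool:
    """Return True only for a header of the form 'Bearer <WORKERS_SHARED_SECRET>'."""
    if not authorization_header or not shared_secret:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(token.encode(), shared_secret.encode())


@dataclass
class Job:
    status: str = "pending"  # pending -> running -> done | failed
    result: Any = None
    error: Optional[str] = None


class JobStore:
    """Hypothetical in-memory registry; all state is lost on restart."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        self._tasks: Set[asyncio.Task] = set()

    def submit(self, work: Coroutine[Any, Any, Any]) -> str:
        """Schedule the coroutine on the running loop and return a job id."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job()
        task = asyncio.create_task(self._run(job_id, work))
        # Keep a reference so the task is not garbage-collected mid-flight.
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return job_id

    async def _run(self, job_id: str, work: Coroutine[Any, Any, Any]) -> None:
        job = self._jobs[job_id]
        job.status = "running"
        try:
            job.result = await work
            job.status = "done"
        except Exception as exc:
            job.error = str(exc)
            job.status = "failed"

    def get(self, job_id: str) -> Optional[Job]:
        """What a job-status endpoint would return; None for unknown ids."""
        return self._jobs.get(job_id)
```

In this sketch a route handler would call is_authorized on the incoming Authorization header, then submit the job and hand the id back to NestJS, which reads JobStore.get through the status endpoint. Moving to Celery or Redis Streams later would replace JobStore without changing that contract.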

Alternatives considered

Option	Why not
Two separate services (AI vs scrape)	Doubles infra and CI for no v0 benefit; revisit when teams split.
Lambda / Cloud Functions per job	Cold starts for Python plus a headless browser are too slow for our use cases.
Run Python as a CLI shelled from NestJS	Hides operations, makes deploys brittle, hard to scale independently.

Consequences

Positive

One Dockerfile, one service to deploy alongside NestJS on Coolify.
Shared auth, shared logging shape, shared config style.
Easy to swap LLM providers behind a thin adapter.
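
One way the thin adapter could look, as a sketch only. LLMProvider, EchoProvider, UppercaseProvider, get_provider and prompt_to_itinerary are hypothetical names, and the two providers are toy stand-ins; a real one would wrap a vendor SDK or LiteLLM behind the same method.

```python
from typing import Callable, Dict, Protocol


class LLMProvider(Protocol):
    """The only surface the rest of the worker depends on."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Toy provider for local runs and tests."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UppercaseProvider:
    """Second toy provider, present only to show that swapping works."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Registry keyed by a config value (for example, an environment variable).
PROVIDERS: Dict[str, Callable[[], LLMProvider]] = {
    "echo": EchoProvider,
    "upper": UppercaseProvider,
}


def get_provider(name: str) -> LLMProvider:
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None


def prompt_to_itinerary(prompt: str, provider: LLMProvider) -> str:
    # Callers never import a vendor SDK directly, only the adapter.
    return provider.complete(f"Draft an itinerary for: {prompt}")
```

Because callers depend only on complete, changing providers is a one-line config change rather than an edit to every /ai route.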

Negative / risks

A bug in scraping can starve AI jobs (and vice-versa) until we add a queue and worker pool. Mitigation: tight per-route timeouts.
A single Python runtime in the worker means CPU-bound code is limited by the GIL. Mitigation: keep handlers I/O-bound and async, and run blocking provider SDK calls in threads.
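
The two mitigations above can be sketched with asyncio alone. The timeout budgets and the names ROUTE_TIMEOUTS_S, RouteTimeout, run_with_timeout, blocking_sdk_call and call_sdk are assumptions for illustration, not values or functions from the service.

```python
import asyncio
import time
from typing import Any, Awaitable, Optional

# Hypothetical per-route-group budgets, in seconds.
ROUTE_TIMEOUTS_S = {"ai": 60.0, "scrape": 30.0}


class RouteTimeout(Exception):
    """Raised when a job exceeds the budget of its route group."""


async def run_with_timeout(
    group: str, work: Awaitable[Any], timeout_s: Optional[float] = None
) -> Any:
    budget = ROUTE_TIMEOUTS_S[group] if timeout_s is None else timeout_s
    try:
        # wait_for cancels the awaited work once the budget is spent.
        return await asyncio.wait_for(work, timeout=budget)
    except asyncio.TimeoutError:
        raise RouteTimeout(f"{group} job exceeded {budget}s") from None


def blocking_sdk_call(text: str) -> str:
    """Stand-in for a synchronous vendor SDK call."""
    time.sleep(0.01)
    return text[::-1]


async def call_sdk(text: str) -> str:
    # Runs the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_sdk_call, text)
```

Two caveats apply to this sketch. A timeout only fires when the job yields to the event loop, so CPU-bound code inside a handler can still stall other routes. Cancelling a call wrapped in asyncio.to_thread stops waiting for it, but the thread itself runs to completion.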

Follow-ups

ADR on LLM provider strategy (single, or LiteLLM-style routing).
ADR on scraping ToS / fair-use stance and rate-limiting policy.
Add a queue once we routinely hit > 5 concurrent jobs.

References

apps/workers/README.md