As a user I want to generate a summary and suggest tags from the transcription so I don't have to do both by hand #310
Goal
Give the user two buttons on a document that already has a transcription: "Zusammenfassung generieren" ("Generate summary") and "Tags vorschlagen" ("Suggest tags"). Both are user-triggered — nothing runs automatically, so there is no background job load and no surprise costs on the NAS. Everything is self-hosted — no external LLM or embedding APIs, for privacy.
Architecture
Two new containers in `docker-compose.yml`:

- `ollama` — stock Ollama image (`ollama/ollama`). Pulls `gemma3:4b` (quantized, ~3 GB) on first start. Exposes `11434` only on the internal Docker network. Model selection via env var so we can swap later (`qwen2.5:7b`, `llama3.1:8b`, etc.). Volume-mount `ollama_models` so the model survives container recreates.
- `lang-service` — new FastAPI app under `lang-service/` (mirrors the layout of `ocr-service/`). Owns the `multilingual-e5-small` sentence-embedding model (~470 MB, CPU, loaded at startup).

Keeping `lang-service` separate from `ocr-service` because (a) the OCR container is already close to its 12 GB memory limit; (b) the two concerns have independent lifecycles (model upgrades, scaling, restarts).

Endpoints — lang-service

- `POST /summarize`
  - Request: `{ "transcription": "…", "scriptType": "TYPEWRITER" | "HANDWRITING_LATIN" | "HANDWRITING_KURRENT" }`
  - Response: `{ "summary": "…" }` — single blocking call (timeout ~120 s). User watches a spinner; no streaming.
- `POST /suggest-tags`
  - Request: `{ "transcription": "…", "tags": [{ "id": "uuid", "name": "Reise" }, …] }` — the backend passes the full tag taxonomy so `lang-service` stays stateless about business data.
  - Response: `{ "suggestions": [{ "tagId": "uuid", "score": 0.71 }, …] }` — top 10, sorted by cosine similarity, score ≥ configurable threshold (default 0.35).
- `GET /health` — 200 once both the embedding model and Ollama are reachable.

Endpoints — backend
- `POST /api/documents/{id}/generate-summary`
  - Requires `WRITE_ALL` (same as editing the summary manually).
  - Requires `transcription_blocks` to exist for the doc; 409 otherwise. Concatenates the blocks by `sortOrder` and sends the result to `lang-service /summarize`.
  - Does not mutate `Document.summary` — returns the generated text so the user can edit before saving via the normal edit form. This keeps the action reversible and matches how manual summaries work.
- `POST /api/documents/{id}/suggest-tags`
  - Requires `READ_ALL` (suggestion is non-mutating; applying tags still needs `WRITE_ALL`).
  - Returns `[{ tagId, tagName, color, score }]` — the backend hydrates the tag names/colors from `lang-service`'s id + score response.

No changes to the `Document` entity or generated API types beyond the two new endpoints.

Frontend
Both buttons live on the document detail page (`/documents/[id]`), disabled with an explanatory tooltip when no transcription exists.

Tag-embedding cache
- On `lang-service` startup, fetch the tag list via a new `GET /api/tags/all` backend endpoint (or have the backend POST a warm-up call).
- Cache `{ tagId: embedding_vector }` in memory.
- When a tag changes in `TagService`, it fires a `POST /internal/refresh-tags` to `lang-service` after the DB write. Fire-and-forget from the backend's perspective; if it fails, `lang-service` falls back to lazy refresh on the next `/suggest-tags` call that sees an unknown `tagId`.

Failure modes
- `lang-service` or `ollama` down → backend returns 503 with a clear `ErrorCode` (`LANG_SERVICE_UNAVAILABLE`); frontend surfaces "KI-Dienst nicht erreichbar, bitte später erneut versuchen" ("AI service unreachable, please try again later").
- Transcription longer than the model's context window → `lang-service` truncates the input and returns `{ summary, truncated: true }`. Frontend shows a small warning.

Testing
- `lang-service` unit tests (pure logic).
- `lang-service` integration test (mocked Ollama via `httpx.MockTransport`).
- Backend slice tests (`@WebMvcTest`) for the two new endpoints with a mocked `LangServiceClient`.
- No CI test against the real `gemma3:4b` — too slow for CI. Manual smoke test documented in the PR description.

Resource budget
Rough RAM on the NAS with the new containers running idle:

- `ollama` (model loaded): ~4 GB
- `lang-service` (embedding model): ~0.7 GB
- Existing stack (`ocr-service` + backend + DB + MinIO + frontend + mail): ~14 GB peak

Needs confirmation that the NAS has the headroom.
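Since the `ollama` container carries the heavy model, `/summarize` inside `lang-service` stays thin: one blocking HTTP call. A minimal sketch, stdlib-only for brevity (the real service would more likely use `httpx`; the prompt wording and the `OLLAMA_URL`/`OLLAMA_MODEL` env-var names are illustrative assumptions):

```python
import json
import os
from urllib import request

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://ollama:11434")  # internal Docker network only
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "gemma3:4b")        # swappable via env var


def build_generate_request(model: str, transcription: str) -> dict:
    # Payload for Ollama's /api/generate; stream=False yields one
    # blocking response instead of a token stream.
    prompt = f"Fasse den folgenden Text kurz zusammen:\n\n{transcription}"
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(transcription: str, timeout: float = 120.0) -> str:
    req = request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_generate_request(OLLAMA_MODEL, transcription)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With `stream: false`, Ollama returns the whole completion in a single JSON object, which fits the spinner-plus-~120 s-timeout UX described above.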
Implementation order
1. Add the `ollama` service to `docker-compose.yml`, pull `gemma3:4b`, verify `curl` works from inside the network.
2. Scaffold `lang-service/` (FastAPI, Dockerfile, `/health`, embedding-model loading).
3. Implement `/suggest-tags` first (simpler, no LLM dependency).
4. Implement `/summarize`.
5. Backend: `LangServiceClient` + two REST endpoints + error codes.
6. Tag-refresh hook in `TagService`.

Each step is an independent commit. No migrations needed.
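The core of step 3 is pure ranking logic, which is what makes it a good first slice. A minimal sketch of that logic, with illustrative names; the vectors are assumed to come from the `multilingual-e5-small` model:

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    # Plain cosine similarity; in the real service this would be a
    # vectorized NumPy call over the cached tag embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_tags(doc_vec: list[float],
                 tag_vecs: dict[str, list[float]],
                 threshold: float = 0.35,  # configurable score cutoff
                 top_k: int = 10) -> list[dict]:
    # Score every cached tag embedding against the document embedding,
    # drop anything under the threshold, return the top 10 by score.
    scored = [{"tagId": tag_id, "score": round(cosine(doc_vec, vec), 2)}
              for tag_id, vec in tag_vecs.items()]
    scored = [s for s in scored if s["score"] >= threshold]
    scored.sort(key=lambda s: s["score"], reverse=True)
    return scored[:top_k]
```

The backend then hydrates each `tagId` into name and color before returning the suggestions to the frontend, as described above.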
Out of scope