- Restore JavaDoc on DocumentSearchResult.of() and .paged() factory methods
- Remove redundant null guards on @Builder.Default collections in toListItem()
- Map DocumentListItem fields explicitly in DocumentMultiSelect before cast
- Add DocumentListItem required fields to docFactory in spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use documentService.getDocumentById() in detail_stillReturnsTrainingLabels
so the Document.full entity graph eager-loads trainingLabels
- Flatten makeItem() factory in DocumentList.svelte.test.ts (nested
document: {} overrides broke item.id / item.documentDate access)
- Remove { document: {} } wrapper from DocumentMultiSelect.svelte.spec.ts
mock responses — component now reads body.items directly as flat items
- Flatten single nested item in page.svelte.test.ts document list test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All components, specs, and the generated API client now use the new
DocumentListItem shape — flat access (item.title, item.sender) instead of
the removed item.document.* nesting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove trainingLabels from Document.list entity graph now that DocumentListItem
does not touch that association. Integration tests guard against future
LazyInitializationException regressions and confirm Document.full still
loads trainingLabels for the detail endpoint.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eliminates excessive data exposure (OWASP API3:2023) — transcription,
filePath, fileHash, thumbnailKey, scriptType and other detail-only fields
are no longer serialised in the list API response.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ADR-024 records the deliberate cross-domain link (obs-grafana joins
archiv-net to query archive-db via the SELECT-only grafana_reader role),
the rejected alternatives (Prometheus exporter, read replica, versioned
migration + flyway repair, hardcoded fallback), and the consequences —
specifically that a Grafana compromise gains TCP reach to archive-db
but is bounded by the role's least-privilege grants.
The DEPLOYMENT.md runbook documents the rotation procedure that
R__grafana_reader_password.sql now enables: bump GRAFANA_DB_PASSWORD,
restart backend (Flyway re-applies because the resolved checksum
changed), restart obs-grafana (datasource picks up the new env var).
Also calls out the fail-closed startup behavior so operators who hit
IllegalStateException know it is deliberate.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The original 4 tests asserted SELECT existed on the three granted tables
and was absent on app_users. That left two gaps a future migration could
slip through silently:
- INSERT/UPDATE/DELETE on the granted tables — if someone GRANTed write
access on, say, documents to grafana_reader, the SELECT positives stay
green and the boundary is breached invisibly.
- Other PII / sensitive tables — the single app_users negative checks
one table; a wildcard "GRANT SELECT ON ALL TABLES IN SCHEMA public"
would still leave it green by accident if app_users wasn't the only
sensitive table.
Switch to a hasPrivilege(table, privilege) helper, add three write-deny
tests (INSERT/UPDATE/DELETE on each granted table), and replace the
single app_users negative with a parameterized sweep over app_users,
user_groups, persons, notifications, document_comments,
document_annotations, geschichten. New sensitive tables get added to
that list as they appear.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
V68 used to set the role's password in a versioned migration, which Flyway
applies exactly once per database. Rotating GRAFANA_DB_PASSWORD therefore
had no effect on the DB role — operators would need a manual ALTER ROLE
or a `flyway repair` that nobody documented. The shape conflated two
lifecycles: schema migration (one-shot, immutable) and credential
provisioning (rotatable).
Split into:
- V68 (versioned, immutable): creates the role and applies SELECT grants
on audit_log, documents, transcription_blocks.
- R__grafana_reader_password.sql (repeatable): issues ALTER ROLE … PASSWORD
with the placeholder. Flyway computes the checksum on the resolved
content, so any change to GRAFANA_DB_PASSWORD changes the checksum and
re-applies the migration on the next boot. Rotation becomes "bump env
var + restart backend".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
FlywayConfig used to fall back to a hardcoded "changeme-grafana-db-password"
string when the env var was missing. That published a known credential for
the grafana_reader role (SELECT on audit_log, documents, transcription_blocks)
into git history and made silent fail-open the default for any deploy that
forgot the secret. Now resolution goes through Spring's Environment and
throws IllegalStateException at startup when the value is unset or blank —
same shape as UserDataInitializer's refusal to seed default admin creds.
Tests inject via the global GRAFANA_DB_PASSWORD entry in test-resources
application.properties so existing Flyway-loading test classes keep
booting without per-class TestPropertySource boilerplate. FlywayConfigTest
covers both branches against MockEnvironment without a Spring context.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wires the new GRAFANA_DB_PASSWORD secret through the deploy pipeline:
- docker-compose.prod.yml: backend env now passes GRAFANA_DB_PASSWORD
through so Flyway V68 can resolve the ${grafanaDbPassword} placeholder
in production and staging (it already worked in local dev via
docker-compose.yml).
- release.yml + nightly.yml: declare GRAFANA_DB_PASSWORD as a required
Gitea secret, write it into .env.production / .env.staging (consumed
by archive-backend), and into /opt/familienarchiv/obs-secrets.env
(consumed by obs-grafana's PostgreSQL datasource).
Operator action before the next deploy: add a GRAFANA_DB_PASSWORD value
to the Gitea repo secrets (openssl rand -hex 32).
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the new read-only connection from Grafana to archive-db (via the
grafana_reader role) introduced by the PO Overview dashboard.
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds GRAFANA_DB_PASSWORD to the observability-stack env-var table, the
Gitea secrets table, and the obs-secrets.env reference, so operators see
the variable wherever they look for related secrets.
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Provisioned dashboard for the product owner's weekly check-in: system
health (Prometheus + Loki), user activity (PostgreSQL audit_log), archive
progress (PostgreSQL transcription_blocks + audit_log), and OCR quality
(Prometheus ocr-service metrics). Default range 7d, manual refresh,
thresholds per the issue spec.
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
.env.example: declare GRAFANA_DB_PASSWORD with an openssl rand -hex 32 hint
so a missing value fails loudly (NFR-OPS-02). obs.env: add a comment
explaining that the real value comes from CI's obs-secrets.env, matching
the pattern used for other secrets in that file.
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a read-only datasource pointing at archive-db using the grafana_reader
role (provisioned by Flyway V68). The password is interpolated from the
GRAFANA_DB_PASSWORD env var passed to obs-grafana, and the connection is
locked to editable: false so the credential cannot be inspected via the UI.
sslmode=disable is intentional: traffic stays inside archiv-net.
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Flyway runs inside the backend container at startup; V68's
${grafanaDbPassword} placeholder is resolved from this env var.
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
obs-grafana now joins archiv-net so it can resolve archive-db:5432 for the
PO Overview dashboard's PostgreSQL datasource, and receives GRAFANA_DB_PASSWORD
so provisioning can interpolate it into the datasource config.
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add Flyway V68 migration that provisions a read-only PostgreSQL role
scoped to audit_log, documents, and transcription_blocks. The role's
password is injected via the new ${grafanaDbPassword} Flyway placeholder,
which FlywayConfig reads from the GRAFANA_DB_PASSWORD env var. The
migration is idempotent: CREATE on first run, ALTER on re-run.
Adds a Testcontainers integration test asserting positive grants on the
three intended tables and a negative grant on app_users (NFR-SEC-01).
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sara's cycle-2 S2: clarify the latent (but not actual) cross-test state
risk on the two metrics tests that hit the global REGISTRY instead of
the per-test fresh_metrics fixture. Migrating them would actually break
them — the /metrics endpoint is served by prometheus-fastapi-instrumentator
which binds to the default REGISTRY at app-construction time, and the
http_requests_total assertion only finds counters on that global
registry. Both tests already assert response shape only (status code,
content-type substring, body substrings), not numeric values, so the
shared-registry caveat is documented for future readers rather than
treated as a bug to fix.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sara's cycle-2 S1: the wall-clock assertion at < 0.05s could trip on a
slow CI runner under load even when the timer correctly excludes
spell-check. Sara's preferred structural fix (patch main.time.monotonic
with a deterministic sequence) proved awkward — the patched attribute is
the *global* time.monotonic which httpx and asyncio consume, exhausting
the sequence before the request reaches the engine loop.
Take the documented fallback: widen the bound to 0.09s and explain why.
The failure mode the test guards against (spell-check inside the timer)
would add 0.1s (2 × 0.05s sleep), so 0.09s catches the bug while leaving
~90ms of headroom for slow CI runners. Verified red→green by temporarily
moving correct_text inside the timer block: bound trips at 0.101s; the
fixed code reads ~0.001s.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ADR-023 captures why prometheus-fastapi-instrumentator was chosen,
the build_metrics(registry) factory pattern, and the test rebinding
seam. The glossary gains four ops-aligned terms — illegible word,
models-ready gauge, recognition vs segmentation accuracy — so the
metrics documentation in OBSERVABILITY.md has a vocabulary to lean on.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- L2 container diagram now shows the Prometheus -> ocr:8000 scrape edge
(plus the previously-undrawn Prometheus -> backend edge for symmetry).
- OBSERVABILITY.md gains a full ocr_* metrics table with labels, units,
and the canonical example queries from issue #652.
- New "Internal-only endpoints" subsection captures the unauthenticated
/metrics caveat and provides the Caddy block snippet for the case
where the service ever gets a host port.
- Explicit note that MetricsPathFilter only quiets uvicorn stdout, and
the OCR metrics must never carry PII or document content.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three small drops that pay back later:
- Note that main.metrics is import-time bound and tests must
monkeypatch `main.metrics`, not the registry.
- Flag the /metrics endpoint as unauthenticated and cross-link the
Caddy-block snippet in docs/OBSERVABILITY.md.
- Pin prometheus-client to the exact 0.25.0 patch version already
resolved by prometheus-fastapi-instrumentator 7.0.0, so an upstream
bump cannot silently slip in.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each metrics test was repeating the same five-line block — patch
kraken_engine.load_models, patch load_spell_checker, instantiate the
AsyncClient, force _models_ready True, restore it. Lift the lot into a
single async context manager so each test body shrinks to its real
arrange / act / assert intent.
Tests that drive the lifespan directly (models_ready gauge) or stub
asyncio.to_thread for /train (which already patches _models_ready) stay
unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The /train, /train-sender, and /segtrain endpoints each duplicated the
same eight-line try/except + counter + gauge block around the
asyncio.to_thread call. Lift it into _record_training(runner, kind),
which accepts a sync- or async-returning callable for flexibility.
Each endpoint now ends with a single return line. Behaviour preserved —
status codes, error propagation, and metric labels stay identical.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The two block-iteration loops (/ocr and /ocr/stream's standard generator)
both ran the same word-total and illegible-word increments. Lift them
into a single helper so each call site becomes one line and the counter
intent reads cleanly. Pure refactor — no behavior change, tests stay green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The asyncio.to_thread patch stubbed out the entire _run_training call,
hiding the real error path. Replacing it with a failing CompletedProcess
from subprocess.run exercises the actual ketos-failed branch and keeps
the test's intent — error counter bumps, 500 surfaces — intact.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If uvicorn's access log format ever changes (args=None, or shorter
than 3 elements), the filter must keep forwarding records rather than
silently dropping them. Two extra LogRecords cover both edge cases.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two regression tests:
- /train-sender hitting the success path bumps the recognition counter
(previously only /train and /segtrain were covered).
- A successful run whose result.accuracy is None must not call set() on
ocr_model_accuracy — the gauge stays at its default 0.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the guided generator's page_started timer wrapped the entire
region loop including the synchronous correct_text() call, inflating
ocr_processing_seconds with spell-check latency. Sum the per-region
engine.extract_region_text durations instead so the histogram matches
the unguided stream's "engine only" semantic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Locks in the post-download placement of the counter increment so a
regression that moves it back above _download_and_convert_pdf would fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The TODO was a placeholder for this work — the OCR service now exposes
/metrics so the target will flip from DOWN to UP on next image rebuild.
Refs #652
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a logging.Filter on uvicorn.access that drops records whose request
path is /metrics or /health. Each is hit on a tight schedule (Prometheus
scrape interval and Docker healthcheck), so unfiltered they dominate the
access log without carrying any information about real traffic.
Refs #652 (Nora's recommendation)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirrors the existing _models_ready bool so Prometheus has a time-series
liveness/readiness signal for future alerting rules (e.g.
ocr_models_ready < 1 for 2m).
Refs #652 (AC7)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hits /train then /segtrain through the same test, each with a distinct
mocked accuracy, and asserts the labelled gauges reflect the two values.
Locks down the kind-label separation between recognition and segmentation
accuracy (decision #2).
Refs #652 (AC6)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wraps the await asyncio.to_thread(_run_*) calls in /train, /train-sender,
and /segtrain with try/except. Recognition training (/train, /train-sender)
shares kind="recognition"; /segtrain uses kind="segmentation". The
ocr_model_accuracy gauge is set per kind on success.
Refs #652 (AC6, decision #2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wraps every asyncio.to_thread(engine.extract_*) call with time.monotonic()
deltas in /ocr (per document) and in both /ocr/stream generators (per page).
Streaming buckets are the useful operational signal; the non-streaming
observation is a bonus.
Refs #652 (AC5)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walks block["words"] before apply_confidence_markers strips the list, then
increments ocr_words_total by len(words) and ocr_illegible_words_total by
the count below threshold. Same pattern in both /ocr and /ocr/stream so the
ratio illegible/words is a faithful quality signal across endpoints.
Refs #652 (AC4)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bumps the counter in both /ocr/stream except blocks (standard and guided
generators) so the existing skipped_pages local variable now also flows
into Prometheus.
Refs #652 (AC3b)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bumps the counter inside both the standard and guided /ocr/stream
generators after a page yields its blocks, before the per-page json line is
emitted. Also moves the ocr_jobs_total increment for /ocr/stream right after
engine selection so the counter still fires when a page later errors out.
Refs #652 (AC3a)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Locks down AC2 for the non-Kurrent path. The same code branch in /ocr that
sets engine_name from script_type now has explicit coverage for both
HANDWRITING_KURRENT → kraken and TYPEWRITER → surya.
Refs #652 (AC2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pick engine="kraken" for HANDWRITING_KURRENT, engine="surya" otherwise,
then increment after the blocks have been extracted.
Refs #652 (AC2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Encapsulates every custom OCR metric in an OcrMetrics frozen dataclass and
exposes a `build_metrics(registry)` factory. Production main.py binds against
the default REGISTRY; tests construct a fresh CollectorRegistry per case and
monkeypatch main.metrics, so counter values stay isolated between tests
(decision #3 on issue #652, Option A).
Refs #652
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Locks down AC1: prometheus-fastapi-instrumentator must keep auto-exposing
http_requests_total and http_request_duration_seconds for application
traffic, not just register the /metrics endpoint.
Refs #652
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mount the instrumentator immediately after FastAPI app creation, excluding
/health and /metrics from request metrics to keep http_requests_total focused
on real application traffic.
Refs #652
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A symlink placed inside importDir pointing to a file outside it would pass
isValidImportFilename (no forbidden chars in the symlink name) and be found
by Files.walk. Now checks candidate.getCanonicalPath() against
baseDir.getCanonicalPath() — if the resolved path escapes importDir,
throws DomainException.internal and aborts the import. Adds regression
test using @TempDir + Files.createSymbolicLink.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces MassImportService.SkipReason with all five values —
INVALID_FILENAME_PATH_TRAVERSAL, INVALID_PDF_SIGNATURE, FILE_READ_ERROR,
ALREADY_EXISTS, S3_UPLOAD_FAILED — making the full set of reasons greppable
and type-safe. SkippedFile.reason changes from String to SkipReason;
importSingleDocument return type updated accordingly. JSON serialisation
is unchanged (Jackson serialises enums by name). All tests updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents that .hidden.pdf and "Brief an Oma.pdf" correctly pass the
isValidImportFilename guard — both are valid basenames common in the archive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds checks for U+2215 DIVISION SLASH (∕), U+FF0F FULLWIDTH SOLIDUS (/),
and U+29F5 REVERSE SOLIDUS OPERATOR (⧵) — all of which bypass the existing
ASCII separator checks on Linux path resolution. Adds a clarifying comment on
the Paths.get().isAbsolute() call explaining its InvalidPathException safety
boundary. Adds 3 regression tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rejects path-traversal filenames before findFileRecursive runs.
Guard runs on the derived filename (after the ternary) as specified.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>