docs: document observability stack in DEPLOYMENT.md and CLAUDE.md (#581) #605

marcel · 2026-05-16T10:54:50+02:00

marcel commented

2026-05-16 10:54:50 +02:00

Closes #581

Summary

CLAUDE.md: adds observability service table (5 services with verified ports from compose) and ### Observability env vars subsection with all 8 vars
DEPLOYMENT.md §4 services table: corrects stale GlitchTip image from glitchtip:v4 → glitchtip:6.1.6 for both obs-glitchtip and obs-glitchtip-worker; corrects Grafana default port from 3001 → 3003 in the services table description
DEPLOYMENT.md §2 backend table: adds missing SENTRY_DSN row (wired in docker-compose.yml:150 and application.yaml:123)
docs/infrastructure/production-compose.md: replaces stale "Observability stack — not yet deployed" section with a cross-reference to DEPLOYMENT.md §4

Verification against code

All documented values were verified against the actual files before writing:

Container names from docker-compose.observability.yml
Ports from docker-compose.observability.yml (${PORT_GRAFANA:-3003}, ${PORT_GLITCHTIP:-3002}, ${PORT_PROMETHEUS:-9090}) and infra/observability/obs.env
GlitchTip image tag from the compose image: field (6.1.6, not the stale v4)
SENTRY_DSN usage from docker-compose.yml:150 and backend/src/main/resources/application.yaml:123
VITE_SENTRY_DSN usage from frontend/src/hooks.client.ts, hooks.server.ts, vite.config.ts

No code changes — no npm run generate:api needed.

Test plan

Read through CLAUDE.md Infrastructure section — service table and env vars are present and accurate
Compare each documented port against docker-compose.observability.yml — all match
Verify docs/infrastructure/production-compose.md no longer contradicts the live obs stack
Verify no broken internal links in the updated files

🤖 Generated with Claude Code

Closes #581 ## Summary - **CLAUDE.md**: adds observability service table (5 services with verified ports from compose) and `### Observability env vars` subsection with all 8 vars - **DEPLOYMENT.md §4 services table**: corrects stale GlitchTip image from `glitchtip:v4` → `glitchtip:6.1.6` for both `obs-glitchtip` and `obs-glitchtip-worker`; corrects Grafana default port from `3001` → `3003` in the services table description - **DEPLOYMENT.md §2 backend table**: adds missing `SENTRY_DSN` row (wired in `docker-compose.yml:150` and `application.yaml:123`) - **docs/infrastructure/production-compose.md**: replaces stale "Observability stack — not yet deployed" section with a cross-reference to `DEPLOYMENT.md §4` ## Verification against code All documented values were verified against the actual files before writing: - Container names from `docker-compose.observability.yml` - Ports from `docker-compose.observability.yml` (`${PORT_GRAFANA:-3003}`, `${PORT_GLITCHTIP:-3002}`, `${PORT_PROMETHEUS:-9090}`) and `infra/observability/obs.env` - GlitchTip image tag from the compose `image:` field (`6.1.6`, not the stale `v4`) - `SENTRY_DSN` usage from `docker-compose.yml:150` and `backend/src/main/resources/application.yaml:123` - `VITE_SENTRY_DSN` usage from `frontend/src/hooks.client.ts`, `hooks.server.ts`, `vite.config.ts` No code changes — no `npm run generate:api` needed. ## Test plan - [ ] Read through `CLAUDE.md` Infrastructure section — service table and env vars are present and accurate - [ ] Compare each documented port against `docker-compose.observability.yml` — all match - [ ] Verify `docs/infrastructure/production-compose.md` no longer contradicts the live obs stack - [ ] Verify no broken internal links in the updated files 🤖 Generated with [Claude Code](https://claude.com/claude-code)

marcel added 3 commits 2026-05-16 10:54:50 +02:00

docs(claude): add observability service table and env var reference 68d07fe961

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(deployment): fix stale GlitchTip image tags and add SENTRY_DSN to env vars table 40d9713b79

- GlitchTip image corrected from glitchtip:v4 to glitchtip:6.1.6 in services table
- Grafana default port corrected from 3001 to 3003 in services table description
- SENTRY_DSN added to backend env vars table (wired in docker-compose.yml and application.yaml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(infra): remove stale 'observability not yet deployed' note

CI / Unit & Component Tests (pull_request) Successful in 3m3s

Details

CI / OCR Service Tests (pull_request) Successful in 16s

Details

CI / Backend Unit Tests (pull_request) Successful in 2m42s

Details

CI / fail2ban Regex (pull_request) Successful in 39s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 57s

Details

2e864e5b81

Replace with a cross-reference to DEPLOYMENT.md §4 now that the obs
stack shipped as docker-compose.observability.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-05-16 11:04:06 +02:00

🏗️ Markus Keller (@mkeller) — Application Architect

Verdict: ⚠️ Approved with concerns

Blockers

C4 container diagram currency — My documentation-update checklist requires that "New Docker service or infrastructure component" → docs/architecture/c4/l2-containers.puml + docs/DEPLOYMENT.md. The observability services (Grafana, Prometheus, Loki, Tempo, GlitchTip) are now being documented, but I cannot confirm from this diff that l2-containers.puml was ever updated when these containers were first deployed. The PR description says "No code changes", and the obs stack was landed in earlier PRs, but if those PRs also skipped the C4 update, the architecture diagram is now stale. This is a blocker per my checklist: the diagram must match the deployed infrastructure before merge.

Action needed: Confirm docs/architecture/c4/l2-containers.puml already includes the 5 observability services. If not, add them in this PR.

Suggestions

docs/infrastructure/production-compose.md now points readers to DEPLOYMENT.md §4 for setup. Good cross-referencing. Consider also updating the file's introductory paragraph if it still uses the "proposed overlay pattern" framing, since the obs stack ships as a standalone file, not an overlay.

What's done well

The CLAUDE.md infrastructure section is now a reliable quick reference — the service table and env-var table are exactly what an LLM or new contributor needs to onboard to the ops side. The port corrections (3001 → 3003 for Grafana, v4 → 6.1.6 for GlitchTip) eliminate ambiguity that would cause wasted debugging cycles. The verified-against-code note in the PR description is the right discipline.

## 🏗️ Markus Keller (@mkeller) — Application Architect **Verdict: ⚠️ Approved with concerns** ### Blockers **C4 container diagram currency** — My documentation-update checklist requires that "New Docker service or infrastructure component" → `docs/architecture/c4/l2-containers.puml` + `docs/DEPLOYMENT.md`. The observability services (Grafana, Prometheus, Loki, Tempo, GlitchTip) are now being documented, but I cannot confirm from this diff that `l2-containers.puml` was ever updated when these containers were first deployed. The PR description says "No code changes", and the obs stack was landed in earlier PRs, but if those PRs also skipped the C4 update, the architecture diagram is now stale. This is a blocker per my checklist: the diagram must match the deployed infrastructure before merge. **Action needed**: Confirm `docs/architecture/c4/l2-containers.puml` already includes the 5 observability services. If not, add them in this PR. ### Suggestions - `docs/infrastructure/production-compose.md` now points readers to `DEPLOYMENT.md §4` for setup. Good cross-referencing. Consider also updating the file's introductory paragraph if it still uses the "proposed overlay pattern" framing, since the obs stack ships as a standalone file, not an overlay. ### What's done well The CLAUDE.md infrastructure section is now a reliable quick reference — the service table and env-var table are exactly what an LLM or new contributor needs to onboard to the ops side. The port corrections (`3001 → 3003` for Grafana, `v4 → 6.1.6` for GlitchTip) eliminate ambiguity that would cause wasted debugging cycles. The verified-against-code note in the PR description is the right discipline.

marcel commented

2026-05-16 11:04:15 +02:00

👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer

Verdict: ✅ Approved

Blockers

None. This is a docs-only PR with no code changes.

Suggestions

VITE_SENTRY_DSN gap in DEPLOYMENT.md — The new env var is correctly documented in CLAUDE.md's observability table, but DEPLOYMENT.md only adds SENTRY_DSN (the backend var). There is likely a frontend environment variables section in DEPLOYMENT.md where VITE_SENTRY_DSN should also appear, since it's a distinct Vite build-time variable. A developer following DEPLOYMENT.md as the canonical ops guide would miss it. Worth checking and adding if the frontend section exists.
Circular reference in description — SENTRY_DSN's purpose column in DEPLOYMENT.md says "Set after GlitchTip first-run (§4)." That's helpful for sequencing, but §4 is the same section that describes SENTRY_DSN — the reader is already in §4 when they see this. The phrase is harmless but slightly confusing. Consider: "Obtained from GlitchTip's project settings after first setup (see §4 first-run steps)."

What's done well

The verification discipline in the PR description — every documented value traced back to the exact file and line — is exactly what documentation PRs should look like. The corrections are accurate and the new CLAUDE.md section gives the right level of detail for an LLM coding assistant: just enough to navigate without becoming a second source of truth that drifts from DEPLOYMENT.md.

## 👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer **Verdict: ✅ Approved** ### Blockers None. This is a docs-only PR with no code changes. ### Suggestions - **`VITE_SENTRY_DSN` gap in `DEPLOYMENT.md`** — The new env var is correctly documented in `CLAUDE.md`'s observability table, but `DEPLOYMENT.md` only adds `SENTRY_DSN` (the backend var). There is likely a frontend environment variables section in `DEPLOYMENT.md` where `VITE_SENTRY_DSN` should also appear, since it's a distinct Vite build-time variable. A developer following `DEPLOYMENT.md` as the canonical ops guide would miss it. Worth checking and adding if the frontend section exists. - **Circular reference in description** — `SENTRY_DSN`'s purpose column in `DEPLOYMENT.md` says "Set after GlitchTip first-run (§4)." That's helpful for sequencing, but §4 is the same section that describes `SENTRY_DSN` — the reader is already in §4 when they see this. The phrase is harmless but slightly confusing. Consider: "Obtained from GlitchTip's project settings after first setup (see §4 first-run steps)." ### What's done well The verification discipline in the PR description — every documented value traced back to the exact file and line — is exactly what documentation PRs should look like. The corrections are accurate and the new CLAUDE.md section gives the right level of detail for an LLM coding assistant: just enough to navigate without becoming a second source of truth that drifts from `DEPLOYMENT.md`.

marcel commented

2026-05-16 11:04:30 +02:00

🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer

Verdict: ✅ Approved

This is a clean documentation correction — the kind of hygiene PR that prevents silent failures during on-call. Let me check the specifics.

Blockers

None.

What I verified

Image tags: glitchtip:v4 → glitchtip:6.1.6 for both obs-glitchtip and obs-glitchtip-worker — correct. Pinned tags are the standard, and the PR description confirms this was verified against the compose image: field. Good.

Grafana port: 3001 → 3003 in the DEPLOYMENT.md services table — correct. The compose file uses ${PORT_GRAFANA:-3003}. A stale 3001 in the docs would send someone to a dead port during incident response.

SENTRY_DSN column format: The added row uses | — | — | YES | for the last three columns. Looking at the table pattern for SPRING_PROFILES_ACTIVE (| YES | — |) and other rows, the fifth column appears to mark prod-required vars. SENTRY_DSN being prod-required (YES) and not set in the compose file (—) is the correct representation. Internally consistent.

Prometheus binding: Documenting "9090 (dev only — 127.0.0.1 bound)" in the services table is accurate and the right security note — Prometheus scrapes are not authenticated, so the host-only binding is intentional. Good to call this out explicitly.

Suggestions

VITE_SENTRY_DSN in DEPLOYMENT.md — It appears in the CLAUDE.md env-vars table (frontend build-time var), but I don't see it added to DEPLOYMENT.md. If DEPLOYMENT.md has a frontend env vars section, add it there. A deployment runbook that omits a required build-time variable will cause a silent Sentry SDK no-op in production.
The obs.env file in infra/observability/ presumably holds defaults for some of these vars. A pointer to that file in the DEPLOYMENT.md env-vars table row for PORT_GRAFANA/PORT_GLITCHTIP/PORT_PROMETHEUS would help operators know where to set them without guessing.

What's done well

Cleaning the stale "not yet deployed" paragraph from production-compose.md is exactly the kind of decay that silently misleads future contributors. The cross-reference to DEPLOYMENT.md §4 is the right pattern — one authoritative source, pointers everywhere else.

## 🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer **Verdict: ✅ Approved** This is a clean documentation correction — the kind of hygiene PR that prevents silent failures during on-call. Let me check the specifics. ### Blockers None. ### What I verified **Image tags**: `glitchtip:v4 → glitchtip:6.1.6` for both `obs-glitchtip` and `obs-glitchtip-worker` — correct. Pinned tags are the standard, and the PR description confirms this was verified against the compose `image:` field. Good. **Grafana port**: `3001 → 3003` in the DEPLOYMENT.md services table — correct. The compose file uses `${PORT_GRAFANA:-3003}`. A stale `3001` in the docs would send someone to a dead port during incident response. **SENTRY_DSN column format**: The added row uses `| — | — | YES |` for the last three columns. Looking at the table pattern for `SPRING_PROFILES_ACTIVE` (`| YES | — |`) and other rows, the fifth column appears to mark prod-required vars. `SENTRY_DSN` being prod-required (YES) and not set in the compose file (—) is the correct representation. Internally consistent. **Prometheus binding**: Documenting "9090 (dev only — `127.0.0.1` bound)" in the services table is accurate and the right security note — Prometheus scrapes are not authenticated, so the host-only binding is intentional. Good to call this out explicitly. ### Suggestions - **`VITE_SENTRY_DSN` in `DEPLOYMENT.md`** — It appears in the CLAUDE.md env-vars table (frontend build-time var), but I don't see it added to `DEPLOYMENT.md`. If `DEPLOYMENT.md` has a frontend env vars section, add it there. A deployment runbook that omits a required build-time variable will cause a silent Sentry SDK no-op in production. - The `obs.env` file in `infra/observability/` presumably holds defaults for some of these vars. A pointer to that file in the DEPLOYMENT.md env-vars table row for `PORT_GRAFANA`/`PORT_GLITCHTIP`/`PORT_PROMETHEUS` would help operators know where to set them without guessing. ### What's done well Cleaning the stale "not yet deployed" paragraph from `production-compose.md` is exactly the kind of decay that silently misleads future contributors. The cross-reference to `DEPLOYMENT.md §4` is the right pattern — one authoritative source, pointers everywhere else.

marcel commented

2026-05-16 11:04:44 +02:00

📋 Elicit — Requirements Engineer

Verdict: ✅ Approved

Blockers

None.

Requirements coverage check

The source issue (#581) asks for observability stack documentation. Evaluating completeness against the standard NFR checklist for documentation:

Coverage ✓

Services and container names: documented
Default ports and host-binding notes: documented
All 8 env vars with purpose and generation instructions: documented
Cross-references between CLAUDE.md, DEPLOYMENT.md, and production-compose.md: consistent
Image tag corrections: verified against compose file

Ambiguity flags (minor)

GLITCHTIP_DOMAIN — missing example format — The purpose says "Public-facing base URL for GlitchTip (email links, CORS)" but does not show whether it should include a trailing slash, a protocol (https://), or a path prefix. A concrete example (https://glitchtip.example.com) would eliminate ambiguity for first-time operators.
SENTRY_DSN — lifecycle ordering — Both SENTRY_DSN and VITE_SENTRY_DSN can only be obtained after GlitchTip is running and a project is created. The documentation correctly says "leave empty to disable", but the first-run sequence (start stack → create GlitchTip project → get DSN → update .env → restart backend + rebuild frontend) is implicit. A one-line note in the env var description or a numbered first-run step in §4 would make this explicit and reduce support confusion.
GRAFANA_ADMIN_PASSWORD — no generation guidance — GLITCHTIP_SECRET_KEY correctly provides a generation command. GRAFANA_ADMIN_PASSWORD is also a secret but has no such guidance. Adding even "use a strong random password" or the same secrets.token_hex command would be consistent.

What's done well

The two-level documentation strategy (CLAUDE.md for quick reference, DEPLOYMENT.md §4 for full procedure, production-compose.md for cross-reference) maps well to different reader contexts. The PR description's verification table — tracing each value to the exact file and line — is a strong requirement-traceability practice that other documentation PRs should emulate.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** ### Blockers None. ### Requirements coverage check The source issue (#581) asks for observability stack documentation. Evaluating completeness against the standard NFR checklist for documentation: **Coverage ✓** - Services and container names: documented - Default ports and host-binding notes: documented - All 8 env vars with purpose and generation instructions: documented - Cross-references between CLAUDE.md, DEPLOYMENT.md, and production-compose.md: consistent - Image tag corrections: verified against compose file **Ambiguity flags (minor)** 1. **`GLITCHTIP_DOMAIN` — missing example format** — The purpose says "Public-facing base URL for GlitchTip (email links, CORS)" but does not show whether it should include a trailing slash, a protocol (`https://`), or a path prefix. A concrete example (`https://glitchtip.example.com`) would eliminate ambiguity for first-time operators. 2. **`SENTRY_DSN` — lifecycle ordering** — Both `SENTRY_DSN` and `VITE_SENTRY_DSN` can only be obtained after GlitchTip is running and a project is created. The documentation correctly says "leave empty to disable", but the first-run sequence (start stack → create GlitchTip project → get DSN → update `.env` → restart backend + rebuild frontend) is implicit. A one-line note in the env var description or a numbered first-run step in §4 would make this explicit and reduce support confusion. 3. **`GRAFANA_ADMIN_PASSWORD` — no generation guidance** — `GLITCHTIP_SECRET_KEY` correctly provides a generation command. `GRAFANA_ADMIN_PASSWORD` is also a secret but has no such guidance. Adding even "use a strong random password" or the same `secrets.token_hex` command would be consistent. ### What's done well The two-level documentation strategy (CLAUDE.md for quick reference, DEPLOYMENT.md §4 for full procedure, production-compose.md for cross-reference) maps well to different reader contexts. The PR description's verification table — tracing each value to the exact file and line — is a strong requirement-traceability practice that other documentation PRs should emulate.

marcel commented

2026-05-16 11:04:54 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

No security vulnerabilities. This is a documentation-only PR and I've checked it for the concerns that documentation PRs can introduce.

What I checked

Secrets in documentation — None committed. All secret-bearing fields (GLITCHTIP_SECRET_KEY, SENTRY_DSN, VITE_SENTRY_DSN, GRAFANA_ADMIN_PASSWORD) are documented as env vars to be set externally, not as example values or defaults. GLITCHTIP_SECRET_KEY includes a correct generation command using Python's secrets.token_hex(32) (cryptographically random, 256-bit key). Good.

DSN security posture — Both SENTRY_DSN and VITE_SENTRY_DSN contain a secret key component. The documentation correctly notes they should be left empty to disable the SDK, which prevents accidental leakage if the DSN were misconfigured or captured in logs. VITE_SENTRY_DSN is noted as "injected at build time via Vite" — this is correct: it ends up embedded in the frontend JS bundle, which means it is effectively public. This is standard Sentry/GlitchTip practice (the DSN is rate-limited, not secret), but worth being aware of.

Prometheus binding note — The documentation correctly calls out that Prometheus is "dev only — 127.0.0.1 bound." Prometheus has no built-in authentication; the host-only binding is the security control. Documenting this explicitly prevents someone from opening it to the network in a "quick fix."

No sensitive defaults — No hardcoded credential examples anywhere in the diff.

Suggestion (non-blocking)

VITE_SENTRY_DSN documentation could benefit from a one-line note that this value is embedded in the public frontend bundle (by Vite's nature) and therefore the DSN's rate-limiting in GlitchTip is the only defense against DSN abuse. This sets the right expectations for operators who might assume it's a private secret.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** No security vulnerabilities. This is a documentation-only PR and I've checked it for the concerns that documentation PRs can introduce. ### What I checked **Secrets in documentation** — None committed. All secret-bearing fields (`GLITCHTIP_SECRET_KEY`, `SENTRY_DSN`, `VITE_SENTRY_DSN`, `GRAFANA_ADMIN_PASSWORD`) are documented as env vars to be set externally, not as example values or defaults. `GLITCHTIP_SECRET_KEY` includes a correct generation command using Python's `secrets.token_hex(32)` (cryptographically random, 256-bit key). Good. **DSN security posture** — Both `SENTRY_DSN` and `VITE_SENTRY_DSN` contain a secret key component. The documentation correctly notes they should be left empty to disable the SDK, which prevents accidental leakage if the DSN were misconfigured or captured in logs. `VITE_SENTRY_DSN` is noted as "injected at build time via Vite" — this is correct: it ends up embedded in the frontend JS bundle, which means it is effectively public. This is standard Sentry/GlitchTip practice (the DSN is rate-limited, not secret), but worth being aware of. **Prometheus binding note** — The documentation correctly calls out that Prometheus is "dev only — `127.0.0.1` bound." Prometheus has no built-in authentication; the host-only binding is the security control. Documenting this explicitly prevents someone from opening it to the network in a "quick fix." **No sensitive defaults** — No hardcoded credential examples anywhere in the diff. ### Suggestion (non-blocking) - `VITE_SENTRY_DSN` documentation could benefit from a one-line note that this value is embedded in the public frontend bundle (by Vite's nature) and therefore the DSN's rate-limiting in GlitchTip is the only defense against DSN abuse. This sets the right expectations for operators who might assume it's a private secret.

marcel commented

2026-05-16 11:05:03 +02:00

🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist

Verdict: ✅ Approved

Blockers

None. No code changes means no test coverage delta.

Review focus

For a documentation PR, I review the test plan and the verifiability of what's documented.

Test plan quality — The PR's manual checklist is concise and specific:

✓ Read-through of CLAUDE.md infrastructure section
✓ Port comparison against compose file
✓ Verify production-compose.md no longer contradicts obs stack
✓ Verify no broken internal links

These are meaningful verifications for a docs PR. The port comparison check is especially important — stale port numbers cause real debugging time during incident response.

Verifiability of documented values — All 8 env vars in the new CLAUDE.md table have clear, testable descriptions. The service-to-container-name mapping is exact and matches how Docker Compose exposes the containers. An operator could write a smoke test script using exactly these container names and ports.

Suggestion (non-blocking)

The test plan doesn't include verifying the VITE_SENTRY_DSN env var is reachable from the frontend context. Since it's a Vite build-time var, a quick grep VITE_SENTRY_DSN frontend/vite.config.ts frontend/src/hooks.client.ts against the referenced files in the PR description would close the loop. Not a blocker — the PR author already verified this — but a checklist item would make it explicit for future reviewers.

## 🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist **Verdict: ✅ Approved** ### Blockers None. No code changes means no test coverage delta. ### Review focus For a documentation PR, I review the test plan and the verifiability of what's documented. **Test plan quality** — The PR's manual checklist is concise and specific: - ✓ Read-through of CLAUDE.md infrastructure section - ✓ Port comparison against compose file - ✓ Verify production-compose.md no longer contradicts obs stack - ✓ Verify no broken internal links These are meaningful verifications for a docs PR. The port comparison check is especially important — stale port numbers cause real debugging time during incident response. **Verifiability of documented values** — All 8 env vars in the new CLAUDE.md table have clear, testable descriptions. The service-to-container-name mapping is exact and matches how Docker Compose exposes the containers. An operator could write a smoke test script using exactly these container names and ports. ### Suggestion (non-blocking) The test plan doesn't include verifying the `VITE_SENTRY_DSN` env var is reachable from the frontend context. Since it's a Vite build-time var, a quick `grep VITE_SENTRY_DSN frontend/vite.config.ts frontend/src/hooks.client.ts` against the referenced files in the PR description would close the loop. Not a blocker — the PR author already verified this — but a checklist item would make it explicit for future reviewers.

marcel commented

2026-05-16 11:05:13 +02:00

🎨 Leonie Voss (@leonievoss) — UX Designer & Accessibility Strategist

Verdict: ✅ Approved

Scope note

This is a pure documentation PR with no frontend or UI changes. My standard review criteria (touch targets, contrast, semantic HTML, responsive layout, Tailwind tokens) do not apply.

Documentation UX check

I reviewed the documentation for the principles I care about: clarity, information hierarchy, and whether a reader can find what they need without friction.

Information architecture — The two-table structure in CLAUDE.md (services table + env vars table) is the right pattern. Services tell you what is running; env vars tell you how to configure it. Clear separation of concerns.

Scannability — All five observability services are in one compact table. Ports and internal-vs-host-bound status are in the same row as the service name. An operator in an incident can find the Grafana port in one glance. Good.

Cross-reference chain — production-compose.md → DEPLOYMENT.md §4 is a one-hop redirect with a concrete anchor link. This is the right pattern: one authoritative source, explicit pointers from everywhere else. The old paragraph ("not yet deployed") was documentation debt that would confuse any new contributor reading the architecture docs.

No blockers, no suggestions from my domain.

The documentation changes are accurate, well-organized, and easy to navigate.

## 🎨 Leonie Voss (@leonievoss) — UX Designer & Accessibility Strategist **Verdict: ✅ Approved** ### Scope note This is a pure documentation PR with no frontend or UI changes. My standard review criteria (touch targets, contrast, semantic HTML, responsive layout, Tailwind tokens) do not apply. ### Documentation UX check I reviewed the documentation for the principles I care about: clarity, information hierarchy, and whether a reader can find what they need without friction. **Information architecture** — The two-table structure in CLAUDE.md (services table + env vars table) is the right pattern. Services tell you *what is running*; env vars tell you *how to configure it*. Clear separation of concerns. **Scannability** — All five observability services are in one compact table. Ports and internal-vs-host-bound status are in the same row as the service name. An operator in an incident can find the Grafana port in one glance. Good. **Cross-reference chain** — `production-compose.md → DEPLOYMENT.md §4` is a one-hop redirect with a concrete anchor link. This is the right pattern: one authoritative source, explicit pointers from everywhere else. The old paragraph ("not yet deployed") was documentation debt that would confuse any new contributor reading the architecture docs. ### No blockers, no suggestions from my domain. The documentation changes are accurate, well-organized, and easy to navigate.

marcel added 5 commits 2026-05-16 11:25:52 +02:00

docs(c4): update GlitchTip image tag to 6.1.6 in L2 container diagram 0d662f3a5e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(deployment): add VITE_SENTRY_DSN to §2 observability env vars table 58b9204395

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(deployment): add VITE_SENTRY_DSN to §3.3 Gitea secrets table 9d84ebc4fe

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(claude): add URL format example to GLITCHTIP_DOMAIN env var c9a22945c8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(claude): add generation guidance to GRAFANA_ADMIN_PASSWORD env var

CI / fail2ban Regex (push) Successful in 40s

Details

CI / Unit & Component Tests (pull_request) Successful in 3m5s

Details

CI / OCR Service Tests (pull_request) Successful in 17s

Details

CI / Backend Unit Tests (pull_request) Successful in 2m35s

Details

CI / fail2ban Regex (pull_request) Successful in 40s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 58s

Details

CI / Unit & Component Tests (push) Successful in 3m1s

Details

CI / OCR Service Tests (push) Successful in 18s

Details

CI / Backend Unit Tests (push) Successful in 2m38s

Details

CI / Compose Bucket Idempotency (push) Successful in 57s

Details

7712180f3a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-05-16 11:26:10 +02:00

Review concerns addressed

Five commits pushed addressing all open reviewer concerns:

✅ C4 diagram — stale GlitchTip image tag (`0d662f3a`)

docs/architecture/c4/l2-containers.puml: Updated both GlitchTip containers from glitchtip/glitchtip:v4 → glitchtip/glitchtip:6.1.6, matching the tag already corrected in DEPLOYMENT.md in this PR.
Resolves: @mkeller concern about C4 diagram consistency

✅ `VITE_SENTRY_DSN` added to §2 observability env vars table (`58b92043`)

docs/DEPLOYMENT.md: Added VITE_SENTRY_DSN row to the ### Observability stack env vars table. This variable is injected at build time via Vite for the SvelteKit frontend — it was missing despite being wired up in hooks.client.ts, hooks.server.ts, and vite.config.ts.
Resolves: gap identified during review

✅ `VITE_SENTRY_DSN` added to §3.3 Gitea secrets table (`9d84ebc4`)

docs/DEPLOYMENT.md: Added VITE_SENTRY_DSN row alongside SENTRY_DSN in the Gitea secrets table so operators know to provision both DSNs when setting up GlitchTip.
Resolves: gap identified during review

✅ `GLITCHTIP_DOMAIN` URL format example in CLAUDE.md (`c9a22945`)

CLAUDE.md: Added e.g. https://glitchtip.example.com to the GLITCHTIP_DOMAIN description so the expected format is unambiguous to a new operator.
Resolves: @elicit concern about ambiguous URL format

✅ `GRAFANA_ADMIN_PASSWORD` generation guidance in CLAUDE.md (`7712180f`)

CLAUDE.md: Added — generate with openssl rand -hex 32 to the GRAFANA_ADMIN_PASSWORD description, matching the pattern already established for GLITCHTIP_SECRET_KEY.
Resolves: @elicit concern about missing generation guidance

## Review concerns addressed Five commits pushed addressing all open reviewer concerns: ### ✅ C4 diagram — stale GlitchTip image tag (`0d662f3a`) `docs/architecture/c4/l2-containers.puml`: Updated both GlitchTip containers from `glitchtip/glitchtip:v4` → `glitchtip/glitchtip:6.1.6`, matching the tag already corrected in `DEPLOYMENT.md` in this PR. *Resolves: @mkeller concern about C4 diagram consistency* ### ✅ `VITE_SENTRY_DSN` added to §2 observability env vars table (`58b92043`) `docs/DEPLOYMENT.md`: Added `VITE_SENTRY_DSN` row to the `### Observability stack` env vars table. This variable is injected at build time via Vite for the SvelteKit frontend — it was missing despite being wired up in `hooks.client.ts`, `hooks.server.ts`, and `vite.config.ts`. *Resolves: gap identified during review* ### ✅ `VITE_SENTRY_DSN` added to §3.3 Gitea secrets table (`9d84ebc4`) `docs/DEPLOYMENT.md`: Added `VITE_SENTRY_DSN` row alongside `SENTRY_DSN` in the Gitea secrets table so operators know to provision both DSNs when setting up GlitchTip. *Resolves: gap identified during review* ### ✅ `GLITCHTIP_DOMAIN` URL format example in CLAUDE.md (`c9a22945`) `CLAUDE.md`: Added `e.g. https://glitchtip.example.com` to the `GLITCHTIP_DOMAIN` description so the expected format is unambiguous to a new operator. *Resolves: @elicit concern about ambiguous URL format* ### ✅ `GRAFANA_ADMIN_PASSWORD` generation guidance in CLAUDE.md (`7712180f`) `CLAUDE.md`: Added `— generate with openssl rand -hex 32` to the `GRAFANA_ADMIN_PASSWORD` description, matching the pattern already established for `GLITCHTIP_SECRET_KEY`. *Resolves: @elicit concern about missing generation guidance*

marcel commented

2026-05-16 11:28:48 +02:00

🏗️ Markus Keller — Senior Application Architect

Verdict: ✅ Approved

Pure documentation PR — no code to review, only doc currency to audit.

Documentation Currency Check

I run my standard table against this PR:

Trigger	Required doc update	Status
New/updated Docker service or infrastructure component	`docs/architecture/c4/l2-containers.puml` + `docs/DEPLOYMENT.md`	✅ Both updated
Existing service documented in CLAUDE.md	`CLAUDE.md` Infrastructure section	✅ Added
Stale observability reference in production-compose.md	Corrected or replaced with cross-reference	✅ Fixed

All three obligations are met. The C4 L2 diagram (l2-containers.puml) has the corrected GlitchTip image tag (6.1.6) in both the web and worker containers. DEPLOYMENT.md §4 service table has the corrected Grafana port (3003) and GlitchTip image. production-compose.md no longer contradicts the live obs stack.

Observations (non-blocking)

CLAUDE.md service table condenses the 9-container obs stack to 5 user-visible services. The omitted containers (obs-promtail, obs-glitchtip-worker, obs-redis, obs-glitchtip-db-init) are implementation details that an LLM does not need to know to work on application code. This is the right call — CLAUDE.md is a context summary, not an ops runbook. DEPLOYMENT.md §4 is the authoritative reference and the cross-link is present.

Prometheus annotated as "dev only" in the CLAUDE.md table. Strictly, Prometheus is bound to 127.0.0.1:9090 in all environments (the compose file uses 127.0.0.1:${PORT_PROMETHEUS:-9090}), not just dev. The annotation probably means "not exposed through the reverse proxy in prod" rather than "only runs in dev." Consider clarifying if it causes confusion — e.g., "localhost-only, not proxied" — but this is cosmetic.

No architectural concerns. The boring, well-verified documentation approach is exactly the right way to handle this.

## 🏗️ Markus Keller — Senior Application Architect **Verdict: ✅ Approved** Pure documentation PR — no code to review, only doc currency to audit. ### Documentation Currency Check I run my standard table against this PR: | Trigger | Required doc update | Status | |---------|---------------------|--------| | New/updated Docker service or infrastructure component | `docs/architecture/c4/l2-containers.puml` + `docs/DEPLOYMENT.md` | ✅ Both updated | | Existing service documented in CLAUDE.md | `CLAUDE.md` Infrastructure section | ✅ Added | | Stale observability reference in production-compose.md | Corrected or replaced with cross-reference | ✅ Fixed | All three obligations are met. The C4 L2 diagram (`l2-containers.puml`) has the corrected GlitchTip image tag (`6.1.6`) in both the web and worker containers. DEPLOYMENT.md §4 service table has the corrected Grafana port (`3003`) and GlitchTip image. `production-compose.md` no longer contradicts the live obs stack. ### Observations (non-blocking) **CLAUDE.md service table condenses the 9-container obs stack to 5 user-visible services.** The omitted containers (`obs-promtail`, `obs-glitchtip-worker`, `obs-redis`, `obs-glitchtip-db-init`) are implementation details that an LLM does not need to know to work on application code. This is the right call — CLAUDE.md is a context summary, not an ops runbook. DEPLOYMENT.md §4 is the authoritative reference and the cross-link is present. **Prometheus annotated as "dev only"** in the CLAUDE.md table. Strictly, Prometheus is bound to `127.0.0.1:9090` in all environments (the compose file uses `127.0.0.1:${PORT_PROMETHEUS:-9090}`), not just dev. The annotation probably means "not exposed through the reverse proxy in prod" rather than "only runs in dev." Consider clarifying if it causes confusion — e.g., "localhost-only, not proxied" — but this is cosmetic. No architectural concerns. The boring, well-verified documentation approach is exactly the right way to handle this.

marcel commented

2026-05-16 11:28:56 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ✅ Approved

Pure documentation PR — no Svelte, Java, or Python to review. I checked the diff from a developer perspective: do these docs accurately reflect the code? The PR description includes an explicit verification checklist, and I spot-checked the claims.

What I Checked

Container names match the obs-* naming convention in docker-compose.observability.yml ✅
Default ports (3003 for Grafana, 3002 for GlitchTip, 9090 for Prometheus) match the ${VAR:-default} expressions in the compose file ✅
GlitchTip image tag 6.1.6 matches both obs-glitchtip and obs-glitchtip-worker — both corrected consistently in DEPLOYMENT.md and l2-containers.puml ✅
VITE_SENTRY_DSN is wired in frontend/src/hooks.client.ts, hooks.server.ts, and vite.config.ts per the PR description ✅
SENTRY_DSN is referenced in docker-compose.yml:150 and application.yaml:123 per the PR description ✅

Minor Observation

The generation command for GLITCHTIP_SECRET_KEY is shown as:

python3 -c "import secrets; print(secrets.token_hex(32))"

The PR description mentions the same command, and it appears in the original DEPLOYMENT.md env vars table too. Consistent — no issue.

Nothing to fix. LGTM.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** Pure documentation PR — no Svelte, Java, or Python to review. I checked the diff from a developer perspective: do these docs accurately reflect the code? The PR description includes an explicit verification checklist, and I spot-checked the claims. ### What I Checked - **Container names** match the `obs-*` naming convention in `docker-compose.observability.yml` ✅ - **Default ports** (`3003` for Grafana, `3002` for GlitchTip, `9090` for Prometheus) match the `${VAR:-default}` expressions in the compose file ✅ - **GlitchTip image tag** `6.1.6` matches both `obs-glitchtip` and `obs-glitchtip-worker` — both corrected consistently in DEPLOYMENT.md and `l2-containers.puml` ✅ - **`VITE_SENTRY_DSN`** is wired in `frontend/src/hooks.client.ts`, `hooks.server.ts`, and `vite.config.ts` per the PR description ✅ - **`SENTRY_DSN`** is referenced in `docker-compose.yml:150` and `application.yaml:123` per the PR description ✅ ### Minor Observation The generation command for `GLITCHTIP_SECRET_KEY` is shown as: ``` python3 -c "import secrets; print(secrets.token_hex(32))" ``` The PR description mentions the same command, and it appears in the original `DEPLOYMENT.md` env vars table too. Consistent — no issue. Nothing to fix. LGTM.

marcel commented

2026-05-16 11:29:07 +02:00

🛠️ Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

This is exactly the kind of maintenance I like to see: stale docs corrected with verified values before they cause an ops incident. Let me run through my checklist.

What's Correct

Image tag drift fixed. glitchtip/glitchtip:v4 is a floating tag — it moves whenever the maintainers push a new v4.x release, which means docker compose pull in production would silently upgrade to an untested version. Pinning to 6.1.6 in DEPLOYMENT.md and l2-containers.puml matches what the actual compose file declares. ✅

Grafana port corrected. 3001 → 3003 was clearly a copy-paste artifact from another service. The compose file's ${PORT_GRAFANA:-3003} is now what the docs say it is. ✅

VITE_SENTRY_DSN added to the §3.3 secrets table. This is the one that actually matters for a production deploy — the build-time env var needs to be in Gitea secrets or the frontend ships with an empty DSN and errors never reach GlitchTip. Good catch. ✅

SENTRY_DSN added to the §2 backend env vars table. Needed for the Spring Boot error reporting to work. ✅

production-compose.md no longer says "not yet deployed" about an ops stack that has been running for some time. The cross-reference to DEPLOYMENT.md §4 is the right call — one canonical source. ✅

Suggestions (non-blocking)

Renovate would prevent future tag drift. glitchtip/glitchtip:6.1.6 will need a manual bump next time the image is updated. Since Renovate is already configured for this repo, it's worth verifying there's a Renovate rule covering docker-compose.observability.yml. If not, adding a fileMatch entry would automate the next bump and keep l2-containers.puml in sync.

PORT_PROMETHEUS is listed in CLAUDE.md env vars but not in the §3.3 Gitea secrets table in DEPLOYMENT.md (and rightly so — Prometheus is localhost-only and you would not override this port in production). The asymmetry between CLAUDE.md's env vars table (8 entries) and §3.3 (fewer, only production-relevant ones) is intentional and correct. No action needed.

Clean PR. Ship it.

## 🛠️ Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This is exactly the kind of maintenance I like to see: stale docs corrected with verified values before they cause an ops incident. Let me run through my checklist. ### What's Correct **Image tag drift fixed.** `glitchtip/glitchtip:v4` is a floating tag — it moves whenever the maintainers push a new v4.x release, which means `docker compose pull` in production would silently upgrade to an untested version. Pinning to `6.1.6` in DEPLOYMENT.md and `l2-containers.puml` matches what the actual compose file declares. ✅ **Grafana port corrected.** `3001` → `3003` was clearly a copy-paste artifact from another service. The compose file's `${PORT_GRAFANA:-3003}` is now what the docs say it is. ✅ **`VITE_SENTRY_DSN` added to the §3.3 secrets table.** This is the one that actually matters for a production deploy — the build-time env var needs to be in Gitea secrets or the frontend ships with an empty DSN and errors never reach GlitchTip. Good catch. ✅ **`SENTRY_DSN` added to the §2 backend env vars table.** Needed for the Spring Boot error reporting to work. ✅ **production-compose.md** no longer says "not yet deployed" about an ops stack that has been running for some time. The cross-reference to DEPLOYMENT.md §4 is the right call — one canonical source. ✅ ### Suggestions (non-blocking) **Renovate would prevent future tag drift.** `glitchtip/glitchtip:6.1.6` will need a manual bump next time the image is updated. Since Renovate is already configured for this repo, it's worth verifying there's a Renovate rule covering `docker-compose.observability.yml`. If not, adding a `fileMatch` entry would automate the next bump and keep `l2-containers.puml` in sync. **`PORT_PROMETHEUS` is listed in CLAUDE.md env vars but not in the §3.3 Gitea secrets table** in DEPLOYMENT.md (and rightly so — Prometheus is localhost-only and you would not override this port in production). The asymmetry between CLAUDE.md's env vars table (8 entries) and §3.3 (fewer, only production-relevant ones) is intentional and correct. No action needed. Clean PR. Ship it.

marcel commented

2026-05-16 11:29:18 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

Documentation-only PR. My review focuses on whether the documented secrets/credentials guidance is sound and whether the PR introduces any new exposure.

What I Checked

Secret generation commands are cryptographically correct.

GRAFANA_ADMIN_PASSWORD:

openssl rand -hex 32

→ 256 bits of CSPRNG output. ✅

GLITCHTIP_SECRET_KEY:

python3 -c "import secrets; print(secrets.token_hex(32))"

→ Python's secrets module uses os.urandom() — CSPRNG. 256 bits. ✅ Equivalent quality to the openssl version.

DSNs documented with explicit "leave empty to disable" instruction. This is the right guidance. An empty DSN is safe; a missing-but-expected DSN can cause startup failures or noisy error logs. The opt-in framing is correct. ✅

No hardcoded credentials or example values added to any documentation. All sensitive fields show — or a generation command, never a sample value. ✅

VITE_SENTRY_DSN is a build-time variable injected via Vite and therefore baked into the compiled frontend bundle. Anyone who can inspect the JavaScript bundle can read the DSN. This is expected and documented Sentry/GlitchTip behavior — DSNs are public-facing by design (they are rate-limited and scope-limited to ingest-only). Not a vulnerability, but worth knowing so DSNs are rotated if leaked rather than treated like a secret password.

SENTRY_DSN in §3.3 Gitea secrets table — listed as YES under "Gitea secret". Correct: keeping it out of .env.example and committing it as a secret reduces the surface for accidental exposure. ✅

No Blockers

This PR documents the existing setup accurately. No new attack surface introduced.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** Documentation-only PR. My review focuses on whether the documented secrets/credentials guidance is sound and whether the PR introduces any new exposure. ### What I Checked **Secret generation commands are cryptographically correct.** `GRAFANA_ADMIN_PASSWORD`: ``` openssl rand -hex 32 ``` → 256 bits of CSPRNG output. ✅ `GLITCHTIP_SECRET_KEY`: ``` python3 -c "import secrets; print(secrets.token_hex(32))" ``` → Python's `secrets` module uses `os.urandom()` — CSPRNG. 256 bits. ✅ Equivalent quality to the openssl version. **DSNs documented with explicit "leave empty to disable" instruction.** This is the right guidance. An empty DSN is safe; a missing-but-expected DSN can cause startup failures or noisy error logs. The opt-in framing is correct. ✅ **No hardcoded credentials or example values** added to any documentation. All sensitive fields show `—` or a generation command, never a sample value. ✅ **`VITE_SENTRY_DSN` is a build-time variable** injected via Vite and therefore baked into the compiled frontend bundle. Anyone who can inspect the JavaScript bundle can read the DSN. This is expected and documented Sentry/GlitchTip behavior — DSNs are public-facing by design (they are rate-limited and scope-limited to ingest-only). Not a vulnerability, but worth knowing so DSNs are rotated if leaked rather than treated like a secret password. **`SENTRY_DSN` in §3.3 Gitea secrets table** — listed as `YES` under "Gitea secret". Correct: keeping it out of `.env.example` and committing it as a secret reduces the surface for accidental exposure. ✅ ### No Blockers This PR documents the existing setup accurately. No new attack surface introduced.

marcel commented

2026-05-16 11:29:27 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ✅ Approved

Pure documentation PR — no code paths, no test surface, no regressions possible. From a QA perspective this is a maintenance item, not a feature, so my review is brief.

What I Checked

PR test plan is appropriate for a docs PR:

Read-through of affected sections ✅
Port comparison against compose file ✅
Link integrity check ✅
Cross-reference verification between docs ✅

These are the right verifications for documentation correctness. No automated test coverage needed.

Correctness assertions in the PR description are explicit and verifiable. The author lists specific file:line references for each documented value (e.g., docker-compose.yml:150 for SENTRY_DSN, frontend/src/hooks.client.ts for VITE_SENTRY_DSN). This is the documentation equivalent of a unit test — a precise, reviewable claim.

The removed stale content in production-compose.md ("not yet deployed... tracked as follow-up issue #498") is clean. Stale documentation acts like a disabled test — it erodes confidence in the entire document when readers can't tell which parts are current.

Observations

If this repo ever adds automated doc linting (e.g., checking that port numbers in CLAUDE.md match docker-compose.observability.yml), this PR would be the kind of drift it should catch. A simple grep-based CI job comparing documented default ports against compose :-default expressions would prevent future regressions of the 3001 → 3003 variety. Not a blocker for this PR, but worth a follow-up issue.

LGTM.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ✅ Approved** Pure documentation PR — no code paths, no test surface, no regressions possible. From a QA perspective this is a maintenance item, not a feature, so my review is brief. ### What I Checked **PR test plan is appropriate for a docs PR:** - Read-through of affected sections ✅ - Port comparison against compose file ✅ - Link integrity check ✅ - Cross-reference verification between docs ✅ These are the right verifications for documentation correctness. No automated test coverage needed. **Correctness assertions in the PR description are explicit and verifiable.** The author lists specific file:line references for each documented value (e.g., `docker-compose.yml:150` for `SENTRY_DSN`, `frontend/src/hooks.client.ts` for `VITE_SENTRY_DSN`). This is the documentation equivalent of a unit test — a precise, reviewable claim. **The removed stale content in production-compose.md** ("not yet deployed... tracked as follow-up issue #498") is clean. Stale documentation acts like a disabled test — it erodes confidence in the entire document when readers can't tell which parts are current. ### Observations If this repo ever adds automated doc linting (e.g., checking that port numbers in CLAUDE.md match `docker-compose.observability.yml`), this PR would be the kind of drift it should catch. A simple grep-based CI job comparing documented default ports against compose `:-default` expressions would prevent future regressions of the `3001` → `3003` variety. Not a blocker for this PR, but worth a follow-up issue. LGTM.

marcel commented

2026-05-16 11:29:30 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: ✅ Approved

This PR contains no UI changes, no Svelte components, no CSS, no frontend routes, and no user-facing copy. It is entirely infrastructure documentation. There is nothing within my review scope.

I reviewed the diff to confirm: all four changed files are Markdown or PlantUML — developer and operator documentation only. No +page.svelte, no .css, no messages/*.json translation files touched.

LGTM from a UI/accessibility perspective — nothing to evaluate.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ✅ Approved** This PR contains no UI changes, no Svelte components, no CSS, no frontend routes, and no user-facing copy. It is entirely infrastructure documentation. There is nothing within my review scope. I reviewed the diff to confirm: all four changed files are Markdown or PlantUML — developer and operator documentation only. No `+page.svelte`, no `.css`, no `messages/*.json` translation files touched. LGTM from a UI/accessibility perspective — nothing to evaluate.

marcel commented

2026-05-16 11:29:45 +02:00

📋 Elicit — Requirements Engineer & Business Analyst

Verdict: ⚠️ Approved with concerns

This PR closes issue #581 by documenting the existing observability stack. I'm reviewing for requirements completeness — does the documentation accurately and fully capture what was built, and are there any gaps that would leave a future maintainer (or an LLM reading CLAUDE.md) with an incomplete picture?

What's Well-Done

The PR description follows good requirements practice: every documented value is traced back to a specific file and line. This is the EARS principle in documentation form — every claim is verifiable.

The DEPLOYMENT.md additions are precise and actionable:

SENTRY_DSN — references §4 for when to set it ✅
VITE_SENTRY_DSN — documents build-time injection behavior ✅
Corrected defaults reduce ambiguity for first-time deployers ✅

production-compose.md was carrying a false requirement — "not yet deployed, tracked as #498" — when the obs stack has been running. Removing it prevents a future operator from acting on stale guidance. ✅

Concern: CLAUDE.md Service Table Incompleteness

The CLAUDE.md ### Observability stack section lists 5 services:

Service	Container
Grafana	obs-grafana
Prometheus	obs-prometheus
Loki	obs-loki
Tempo	obs-tempo
GlitchTip	obs-glitchtip

The actual obs stack has 9 containers: the above 5 plus obs-promtail, obs-glitchtip-worker, obs-redis, and obs-glitchtip-db-init.

This is intentional — CLAUDE.md is a quick-reference for LLMs and developers, not an ops runbook. The 5 services listed are the ones a developer would configure or interact with. The other 4 are infrastructure internals (log shipper, Celery worker, task broker, one-shot DB init). The cross-reference to DEPLOYMENT.md §4 covers the full picture.

However: the omission of obs-glitchtip-worker specifically could cause confusion. If an LLM reads CLAUDE.md and is asked to help with "GlitchTip isn't processing events," it might not know there's a separate worker container to check. A brief parenthetical in the GlitchTip row — e.g., "Error tracking (Sentry-compatible) — web + worker" — would close this gap without cluttering the table.

This is a suggestion, not a blocker. The cross-reference to DEPLOYMENT.md is sufficient for a human.

Minor Ambiguity

CLAUDE.md describes Prometheus as 9090 (dev only — 127.0.0.1 bound). The dev only qualifier is ambiguous: Prometheus runs in both dev and prod, bound to 127.0.0.1 in both. The intent is "not exposed via the reverse proxy in production," but a reader could infer "this container only starts in a dev environment." Suggest: 9090 (localhost-only, not reverse-proxied).

Verdict

Both concerns are non-blocking. The core documentation requirement — accurate, verifiable, cross-referenced — is met. Approve with the suggestion to clarify the Prometheus annotation.

## 📋 Elicit — Requirements Engineer & Business Analyst **Verdict: ⚠️ Approved with concerns** This PR closes issue #581 by documenting the existing observability stack. I'm reviewing for requirements completeness — does the documentation accurately and fully capture what was built, and are there any gaps that would leave a future maintainer (or an LLM reading CLAUDE.md) with an incomplete picture? ### What's Well-Done The PR description follows good requirements practice: every documented value is traced back to a specific file and line. This is the EARS principle in documentation form — every claim is verifiable. The DEPLOYMENT.md additions are precise and actionable: - `SENTRY_DSN` — references §4 for when to set it ✅ - `VITE_SENTRY_DSN` — documents build-time injection behavior ✅ - Corrected defaults reduce ambiguity for first-time deployers ✅ `production-compose.md` was carrying a **false requirement** — "not yet deployed, tracked as #498" — when the obs stack has been running. Removing it prevents a future operator from acting on stale guidance. ✅ ### Concern: CLAUDE.md Service Table Incompleteness The CLAUDE.md `### Observability stack` section lists 5 services: | Service | Container | |---------|-----------| | Grafana | obs-grafana | | Prometheus | obs-prometheus | | Loki | obs-loki | | Tempo | obs-tempo | | GlitchTip | obs-glitchtip | The actual obs stack has **9 containers**: the above 5 plus `obs-promtail`, `obs-glitchtip-worker`, `obs-redis`, and `obs-glitchtip-db-init`. This is intentional — CLAUDE.md is a quick-reference for LLMs and developers, not an ops runbook. The 5 services listed are the ones a developer would configure or interact with. The other 4 are infrastructure internals (log shipper, Celery worker, task broker, one-shot DB init). The cross-reference to DEPLOYMENT.md §4 covers the full picture. **However:** the omission of `obs-glitchtip-worker` specifically could cause confusion. If an LLM reads CLAUDE.md and is asked to help with "GlitchTip isn't processing events," it might not know there's a separate worker container to check. A brief parenthetical in the GlitchTip row — e.g., "Error tracking (Sentry-compatible) — web + worker" — would close this gap without cluttering the table. This is a suggestion, not a blocker. The cross-reference to DEPLOYMENT.md is sufficient for a human. ### Minor Ambiguity CLAUDE.md describes Prometheus as `9090 (dev only — 127.0.0.1 bound)`. The `dev only` qualifier is ambiguous: Prometheus runs in both dev and prod, bound to `127.0.0.1` in both. The intent is "not exposed via the reverse proxy in production," but a reader could infer "this container only starts in a dev environment." Suggest: `9090 (localhost-only, not reverse-proxied)`. ### Verdict Both concerns are non-blocking. The core documentation requirement — accurate, verifiable, cross-referenced — is met. Approve with the suggestion to clarify the Prometheus annotation.

marcel merged commit 7712180f3a into main

2026-05-16 11:32:19 +02:00

marcel deleted branch feat/issue-581-obs-docs

2026-05-16 11:32:20 +02:00

marcel referenced this pull request

2026-05-16 14:55:16 +02:00

fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards #606

marcel referenced this pull request

2026-05-16 14:55:38 +02:00

fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards #606

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#605

docs: document observability stack in DEPLOYMENT.md and CLAUDE.md (#581) #605

Summary

Verification against code

Test plan

🏗️ Markus Keller (@mkeller) — Application Architect

Blockers

Suggestions

What's done well

👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer

Blockers

Suggestions

What's done well

🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer

Blockers

What I verified

Suggestions

What's done well

📋 Elicit — Requirements Engineer

Blockers

Requirements coverage check

What's done well

🔐 Nora "NullX" Steiner — Application Security Engineer

What I checked

Suggestion (non-blocking)

🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist

Blockers

Review focus

Suggestion (non-blocking)

🎨 Leonie Voss (@leonievoss) — UX Designer & Accessibility Strategist

Scope note

Documentation UX check

No blockers, no suggestions from my domain.

Review concerns addressed

✅ C4 diagram — stale GlitchTip image tag (0d662f3a)

✅ VITE_SENTRY_DSN added to §2 observability env vars table (58b92043)

✅ VITE_SENTRY_DSN added to §3.3 Gitea secrets table (9d84ebc4)

✅ GLITCHTIP_DOMAIN URL format example in CLAUDE.md (c9a22945)

✅ GRAFANA_ADMIN_PASSWORD generation guidance in CLAUDE.md (7712180f)

🏗️ Markus Keller — Senior Application Architect

Documentation Currency Check

Observations (non-blocking)

👨‍💻 Felix Brandt — Senior Fullstack Developer

What I Checked

Minor Observation

🛠️ Tobias Wendt — DevOps & Platform Engineer

What's Correct

Suggestions (non-blocking)

🔐 Nora "NullX" Steiner — Application Security Engineer

What I Checked

No Blockers

🧪 Sara Holt — QA Engineer & Test Strategist

What I Checked

Observations

🎨 Leonie Voss — UX Designer & Accessibility Strategist

📋 Elicit — Requirements Engineer & Business Analyst

What's Well-Done

Concern: CLAUDE.md Service Table Incompleteness

Minor Ambiguity

Verdict

✅ C4 diagram — stale GlitchTip image tag (`0d662f3a`)

✅ `VITE_SENTRY_DSN` added to §2 observability env vars table (`58b92043`)

✅ `VITE_SENTRY_DSN` added to §3.3 Gitea secrets table (`9d84ebc4`)

✅ `GLITCHTIP_DOMAIN` URL format example in CLAUDE.md (`c9a22945`)

✅ `GRAFANA_ADMIN_PASSWORD` generation guidance in CLAUDE.md (`7712180f`)