feat(observability): add GlitchTip error tracking infrastructure #590
Summary
- Adds four services to docker-compose.observability.yml: obs-glitchtip-db-init, obs-redis, obs-glitchtip, obs-glitchtip-worker
- Reuses the existing archive-db PostgreSQL (dedicated glitchtip database, created automatically by init container)
- Redis (redis:7-alpine) used as Celery task queue, internal to obs-net only
- GlitchTip UI at http://localhost:${PORT_GLITCHTIP:-3002}, bound to 127.0.0.1
- Updates docs/DEPLOYMENT.md with first-run steps and env var table
- Updates docs/architecture/c4/l2-containers.puml with GlitchTip + Redis containers
Closes #578
🤖 Generated with Claude Code
🏗️ Markus Keller — Senior Application Architect
Verdict: 🚫 Changes requested
Blocker — C4 diagram references undefined PlantUML alias
In docs/architecture/c4/l2-containers.puml, the diff adds a relationship that references obs_glitchtip_worker, but obs_glitchtip_worker is never defined as a Container() element. PlantUML will either silently drop the relationship or render a dangling node, so the C4 diagram is broken.
Fix (choose one of):
1. Add a second container for the worker, and update the existing glitchtip container description to clarify it's the web process only.
2. Collapse web + worker into one logical container and remove the obs_glitchtip_worker Rel (documenting the internal relationship in the description instead). This is defensible since both run the same Docker image.
What's Done Right
- Reusing archive-db instead of adding a new PostgreSQL container is correct: one DB per operator, not one per service.
- obs-glitchtip-db-init as a one-shot idempotent init container is the right pattern (not a startup script, not a migration hook in the app).
- condition: service_completed_successfully on the init container is the correct use of the compose depends_on conditions for one-shot jobs.
Doc Table Pass
l2-containers.puml + DEPLOYMENT.md
🔧 Tobias Wendt — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
Concerns (non-blocking, but worth addressing before prod)
1. No healthcheck on obs-glitchtip
GlitchTip's REST API at /api/0/ returns HTTP 200 when ready. Without a healthcheck, operators can't distinguish "container started" from "Django app is accepting requests," and docker compose ps shows no health status. The worker (obs-glitchtip-worker) also has no healthcheck. Celery workers are harder to health-check, but the web process definitely should have one. The suggested fix is a healthcheck on the web process that probes /api/0/.
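A minimal sketch of what such a healthcheck could look like in docker-compose.observability.yml. It assumes a python executable is on the image's PATH (GlitchTip is a Django app, but the exact binary name is an assumption) and that the web process listens on container port 8080, as in the 127.0.0.1:${PORT_GLITCHTIP:-3002}:8080 mapping. The timing values are illustrative, not taken from the PR:

```yaml
services:
  obs-glitchtip:
    healthcheck:
      # urlopen raises on connection errors and on HTTP status >= 400,
      # so the probe exits non-zero until /api/0/ answers successfully.
      test:
        - CMD
        - python
        - -c
        - "import urllib.request; urllib.request.urlopen('http://localhost:8080/api/0/', timeout=5)"
      interval: 30s      # illustrative values
      timeout: 10s
      retries: 5
      start_period: 60s  # grace period for first-start initialisation
```

With a probe like this in place, docker compose ps would report a health status for obs-glitchtip instead of only "running".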
2. glitchtip/glitchtip:v4 is a major version tag, not a patch-pinned version
v4 will advance as v4.x.y releases ship. This is better than :latest but still moves. In production I'd prefer glitchtip/glitchtip:v4.1.4 (or whatever the current patch is). For now it's acceptable given this is the same version constraint the issue spec requested, and Renovate can track it. Flagging for awareness.
What's Done Right
- redis:7-alpine pinned ✓
- postgres:16-alpine pinned (matches the main stack's postgres version) ✓
- Redis healthcheck via redis-cli ping ✓
- condition: service_healthy on Redis dependency ✓
- condition: service_completed_successfully on db-init dependency ✓
- restart: "no" on the init container (correct) ✓
- Port bound to 127.0.0.1 ✓
- glitchtip_data named volume for Redis persistence ✓
🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ⚠️ Approved with concerns
Security Observations
1. GlitchTip on archiv-net expands attack surface (informational)
obs-glitchtip and obs-glitchtip-worker are both on archiv-net, which is necessary to reach archive-db and mailpit. However, this also means a compromised GlitchTip container can reach every other container on archiv-net (backend, frontend, MinIO, etc.). This is an acceptable trade-off for a single-operator self-hosted stack, but worth documenting as a known risk.
If this becomes a concern later: the DB access could be isolated by creating a dedicated glitchtip-net that has only GlitchTip ↔ archive-db connectivity, instead of the full archiv-net.
2. POSTGRES_USER shell substitution in db-init command (very low risk, informational)
The db-init command substitutes POSTGRES_USER into a shell command line. If POSTGRES_USER contained shell metacharacters, this could be a command injection. In practice, POSTGRES_USER is operator-controlled and contains only safe identifiers. The actual risk is negligible; noted for completeness.
3. No GlitchTip registration restriction set
GlitchTip (Django) allows open user registration by default unless REGISTRATION_OPEN=False is set. On a family archive where only the admin should access the error tracker, consider setting it. This prevents anyone who discovers the port from creating an account.
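A sketch of how the restriction could be set in the service's environment block. The variable name below is the one given in this review; GlitchTip's installation docs, as far as I recall, name the equivalent setting ENABLE_USER_REGISTRATION, so check the name against the deployed version before relying on it:

```yaml
services:
  obs-glitchtip:
    environment:
      # Name as given in the review; verify against the GlitchTip docs
      # (ENABLE_USER_REGISTRATION may be the setting your version reads).
      REGISTRATION_OPEN: "False"
```

The value is quoted so that YAML passes it through as the literal string rather than a boolean.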
What's Done Right
- SECRET_KEY: ${GLITCHTIP_SECRET_KEY} (no default value, fail-closed) ✓
- Port bound to 127.0.0.1 (not internet-reachable) ✓
- DATABASE_URL uses env vars for credentials, no hardcoded secrets ✓
- EMAIL_URL: smtp://mailpit:1025 (dev mail catcher, no credential exposure) ✓
- GLITCHTIP_MAX_EVENT_LIFE_DAYS: 90 (data retention limit set, good privacy practice) ✓
👨‍💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
What I Checked
Observations
Pure infrastructure — no Java, Svelte, or Python code changed. YAML is clean and readable.
The obs-* naming prefix is consistent with the rest of the observability stack. The four service names (obs-redis, obs-glitchtip, obs-glitchtip-worker, obs-glitchtip-db-init) are self-documenting.
The init container command is a clean idiom: the idempotency guard (grep -q 1 || ...) means re-running the stack doesn't fail on an existing database.
No TDD concerns: this is infrastructure configuration, not application logic.
🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ✅ Approved
What I Checked
Observations
The obs-glitchtip-db-init init container is the correct pattern for one-time database setup:
- restart: "no" (won't loop) ✓
- condition: service_completed_successfully (web container only starts after init exits 0) ✓
- Idempotency check (SELECT 1 FROM pg_database WHERE datname = 'glitchtip') makes it safe to re-run ✓
This is much better than a startup script in the application container or a manual step: it's reproducible and testable.
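The pattern described above can be sketched as a compose service. This is an illustration assembled from the details mentioned in the reviews, not the PR's actual definition: the POSTGRES_PASSWORD variable, the -d postgres maintenance database, and the network attachment are assumptions:

```yaml
services:
  obs-glitchtip-db-init:
    image: postgres:16-alpine
    restart: "no"            # one-shot job: never restarted after it exits
    networks:
      - archiv-net           # assumed: needed to reach archive-db
    environment:
      PGPASSWORD: ${POSTGRES_PASSWORD}   # assumed variable name
    command:
      - sh
      - -c
      # Create the database only if the lookup returns no row.
      - |
        psql -h archive-db -U "${POSTGRES_USER}" -d postgres -tAc \
          "SELECT 1 FROM pg_database WHERE datname = 'glitchtip'" | grep -q 1 \
          || psql -h archive-db -U "${POSTGRES_USER}" -d postgres -c "CREATE DATABASE glitchtip"
```

Because the CREATE DATABASE branch only runs when grep finds no match, a second docker compose up exits 0 without touching the existing database. The fragment omits the top-level networks declaration.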
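Startup ordering chain is correct: obs-glitchtip waits for obs-redis to be healthy and for obs-glitchtip-db-init to complete successfully.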
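In compose terms, this kind of startup ordering could look roughly like the fragment below. The healthcheck timings are illustrative, and giving the worker the same dependencies as the web process is an assumption:

```yaml
services:
  obs-redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s   # illustrative values
      timeout: 3s
      retries: 5

  obs-glitchtip:
    depends_on:
      obs-redis:
        condition: service_healthy                  # Redis answers PING
      obs-glitchtip-db-init:
        condition: service_completed_successfully   # init container exited 0

  obs-glitchtip-worker:
    depends_on:   # assumed to mirror the web process
      obs-redis:
        condition: service_healthy
      obs-glitchtip-db-init:
        condition: service_completed_successfully
```

If the init container exits non-zero, compose refuses to start the dependent services, which is what makes the chain fail loudly instead of booting GlitchTip against a missing database.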
Missing: no healthcheck on obs-glitchtip. This means docker compose ps can't tell you if GlitchTip's web process is actually serving requests. Non-blocking for merge but worth adding (Tobias has the suggested test command).
First-run documentation in DEPLOYMENT.md is complete: the createsuperuser command and project creation steps are clearly documented. ✓
🎨 Leonie Voss — UX Design Lead
Verdict: ✅ Approved
What I Checked
Result
This PR modifies only docker-compose.observability.yml, docs/DEPLOYMENT.md, and docs/architecture/c4/l2-containers.puml. No Svelte components, frontend routes, or UI code were changed.
GlitchTip's own UI is a third-party application and out of scope for this review. The DEPLOYMENT.md documents the first-run setup steps clearly so operators know what to do.
No UX concerns. ✅
📋 Elicit — Requirements Engineer
Verdict: ✅ Approved
Requirements Alignment
Issue #578 called for GlitchTip + worker + Redis, using the existing archive-db PostgreSQL, with a one-shot db-init container. This PR delivers all of it:
- obs-redis with healthcheck
- obs-glitchtip on port 3002
- obs-glitchtip-worker
- obs-glitchtip-db-init
- DATABASE_URL points to archive-db:5432
- 127.0.0.1:${PORT_GLITCHTIP:-3002}:8080
- GLITCHTIP_SECRET_KEY and GLITCHTIP_DOMAIN from .env.example
- createsuperuser, org + 2 project creation
Observation
The acceptance criteria in issue #578 include verifying that curl -s http://localhost:3002/api/0/ returns HTTP 200. This is a manual verification step that can't be automated in CI (requires a running stack). The DEPLOYMENT.md documents the first-run steps but doesn't explicitly list this verification, which is acceptable since it's operator-facing infrastructure.
The two-project setup (familienarchiv-frontend + familienarchiv-backend) maps directly to the subsequent issues #579 and #580. The handoff is clean.