devops(observability): add GlitchTip error tracking infrastructure (GlitchTip + worker + Redis) #578

Open
opened 2026-05-14 15:05:09 +02:00 by marcel · 9 comments
Owner

Context

GlitchTip is a lightweight, actively maintained Sentry-compatible error tracker. It receives error events from the SvelteKit frontend (browser exceptions) and the Spring Boot backend (unhandled exceptions), groups them by fingerprint, and provides an issue-list UI with stack traces.

GlitchTip connects to the existing archive-db PostgreSQL instance using a dedicated glitchtip database — no new database container needed. It needs Redis only for its Celery task queue.

Depends on: scaffold issue (compose file must exist); archive-db must be running (main stack must be up)

Services to Add

# docker-compose.observability.yml

redis:
  image: redis:7-alpine
  container_name: obs-redis
  restart: unless-stopped
  expose:
    - "6379"
  networks:
    - obs-net

glitchtip:
  image: glitchtip/glitchtip:latest
  container_name: obs-glitchtip
  restart: unless-stopped
  depends_on:
    - redis
    - glitchtip-db-init
  environment:
    DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/glitchtip
    REDIS_URL: redis://redis:6379/0
    SECRET_KEY: ${GLITCHTIP_SECRET_KEY}
    GLITCHTIP_DOMAIN: ${GLITCHTIP_DOMAIN:-http://localhost:3002}
    DEFAULT_FROM_EMAIL: ${APP_MAIL_FROM:-noreply@familienarchiv.local}
    EMAIL_URL: smtp://mailpit:1025
    GLITCHTIP_MAX_EVENT_LIFE_DAYS: 90
  ports:
    - "${PORT_GLITCHTIP:-3002}:8080"
  networks:
    - archiv-net   # reach archive-db and mailpit by container name
    - obs-net

glitchtip-worker:
  image: glitchtip/glitchtip:latest
  container_name: obs-glitchtip-worker
  restart: unless-stopped
  command: ./bin/run-celery-with-beat.sh
  depends_on:
    - redis
  environment:
    DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/glitchtip
    REDIS_URL: redis://redis:6379/0
    SECRET_KEY: ${GLITCHTIP_SECRET_KEY}
  networks:
    - archiv-net
    - obs-net

glitchtip-db-init:
  image: postgres:16-alpine
  container_name: obs-glitchtip-db-init
  restart: "no"
  environment:
    PGPASSWORD: ${POSTGRES_PASSWORD}
  command: >
    psql -h db -U ${POSTGRES_USER} -tc
    "SELECT 1 FROM pg_database WHERE datname = 'glitchtip'" |
    grep -q 1 ||
    psql -h db -U ${POSTGRES_USER} -c "CREATE DATABASE glitchtip;"
  networks:
    - archiv-net

Acceptance Criteria

  • docker compose -f docker-compose.observability.yml up -d redis glitchtip glitchtip-worker starts all containers without error
  • docker exec archive-db psql -U $POSTGRES_USER -c '\l' shows a glitchtip database
  • GlitchTip UI accessible at http://localhost:3002
  • curl -s http://localhost:3002/api/0/ returns HTTP 200
  • A superuser can be created: docker exec -it obs-glitchtip ./manage.py createsuperuser
  • After logging in, a new Organization and a new Project (type: Django) can be created and a DSN copied from project settings — this DSN will be used in the backend error tracking issue
  • A second Project (type: JavaScript) can be created and a DSN copied — this DSN will be used in the frontend error tracking issue

First-Run Steps (document in commit message or PR body)

# 1. Start the stack
docker compose -f docker-compose.observability.yml up -d

# 2. Create superuser
docker exec -it obs-glitchtip ./manage.py createsuperuser

# 3. Open http://localhost:3002, log in, create org + two projects (backend + frontend)

# 4. Copy both DSNs into .env:
#    SENTRY_DSN=http://<key>@localhost:3002/<project-id>
#    VITE_SENTRY_DSN=http://<key>@localhost:3002/<project-id>

Definition of Done

  • All acceptance criteria checked
  • Committed on a feature branch, PR opened against main
marcel added this to the Observability Stack — Grafana LGTM + GlitchTip milestone 2026-05-14 15:05:09 +02:00
marcel added the P2-medium, devops, phase-7: monitoring labels 2026-05-14 15:06:13 +02:00
Author
Owner

🔧 Tobias Wendt — DevOps & Platform Engineer

Observations

  • :latest tag on GlitchTip: the line image: glitchtip/glitchtip:latest appears twice (web and worker; db-init uses postgres:16-alpine). This is a production-bound service in a milestone called "Observability Stack". The ToBI persona rule is clear: :latest is not a version. GlitchTip tags releases as v4.x.y, so pin to a specific version now.
  • glitchtip-db-init uses postgres:16-alpine: the version is pinned here (good), but the container runs a psql command against archive-db, and that command must succeed before GlitchTip can start. Short-form depends_on: glitchtip-db-init only waits for the init container to start, not for it to exit 0; gating on completion requires condition: service_completed_successfully. The init container is on archiv-net only, which is correct since archive-db is also on archiv-net. It still relies on the main stack being up, a runtime dependency that the compose file cannot enforce.
  • No healthcheck on GlitchTip — both glitchtip and glitchtip-worker lack healthchecks. GlitchTip exposes /api/0/ which returns 200. A simple curl -sf http://localhost:8080/api/0/ covers it. The worker has no HTTP interface — a celery inspect ping would work but adds celery-cli overhead; pg_isready against the DB is acceptable as a proxy.
  • Redis has no healthcheck — a redis-cli ping check is trivial and prevents glitchtip from starting before Redis is ready. depends_on: redis without a condition is a startup race.
  • No named volume for GlitchTip — GlitchTip stores attachments and avatars locally by default. If the container is recreated, that data is lost. The issue says "connected to archive-db PostgreSQL" — verify that GlitchTip is configured with DISABLE_COLLECTSTATIC=1 or has an explicit MEDIA_ROOT volume if attachments need to survive.
  • obs-net isn't declared: the compose snippet references two networks (archiv-net and obs-net) and places services on both, but it doesn't include a top-level networks: declaration. If obs-net is a new network it must be declared. If it's an external network shared across stacks, it needs external: true.
  • Port binding is 0.0.0.0: "${PORT_GLITCHTIP:-3002}:8080" binds to all interfaces. In production this should be 127.0.0.1:${PORT_GLITCHTIP:-3002}:8080 (behind Caddy). This is a dev-first issue, so 0.0.0.0 is acceptable for now, but a note in the commit message or PR body to restrict this before the prod compose file is written would prevent it being copy-pasted as-is.
  • ADR-009 pattern — the project uses standalone compose files, not overlays (ADR-009). docker-compose.observability.yml is a standalone file — consistent with the pattern. Good.
  • restart: "no" on db-init: the issue sets restart: "no", which is correct. One-shot containers should not restart on failure.

Recommendations

  • Pin the GlitchTip image: image: glitchtip/glitchtip:v4.0.4 (or whatever the current stable release is). This is not negotiable for a service that stores error telemetry.
  • Add a healthcheck to the redis service: test: ["CMD", "redis-cli", "ping"], interval: 5s, timeout: 3s, retries: 5. Then update glitchtip.depends_on.redis to condition: service_healthy.
  • Add a healthcheck to the glitchtip service: test: ["CMD-SHELL", "curl -sf http://localhost:8080/api/0/ || exit 1"]. Then add glitchtip to glitchtip-worker.depends_on with condition: service_healthy (the worker currently depends only on redis).
  • Declare obs-net at the top level of the compose file. If it's meant to be the new observability-stack-internal network, add obs-net: driver: bridge. If GlitchTip only needs to reach archive-db and mailpit, it only needs archiv-net — the obs-net is redundant.
  • Add a volume for GlitchTip media if attachments are expected: glitchtip_media:/code/uploads (check GlitchTip docs for the exact mount path).
  • Document in the PR description that PORT_GLITCHTIP should be bound to 127.0.0.1 in the production compose file.
  • Update docs/architecture/c4/l2-containers.puml and docs/DEPLOYMENT.md — a new container was added (GlitchTip). Per the CLAUDE.md doc update table, this is required.
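The healthcheck and startup-ordering recommendations above could look roughly like the fragment below. It is a sketch, not a complete file: the v4.0.4 tag is the placeholder from the recommendation, the glitchtip interval/timeout/retries values are illustrative, and it assumes curl is available inside the GlitchTip image, which should be verified before relying on it.

```yaml
# Fragment of docker-compose.observability.yml (sketch under the assumptions above).
services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  glitchtip:
    image: glitchtip/glitchtip:v4.0.4   # placeholder: pin to the current stable tag
    depends_on:
      redis:
        condition: service_healthy                  # wait until redis-cli ping succeeds
      glitchtip-db-init:
        condition: service_completed_successfully   # wait until the init container exits 0
    healthcheck:
      # Assumption: curl exists in the image; timing values are illustrative.
      test: ["CMD-SHELL", "curl -sf http://localhost:8080/api/0/ || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5

  glitchtip-worker:
    image: glitchtip/glitchtip:v4.0.4   # keep identical to the web service tag
    depends_on:
      redis:
        condition: service_healthy
      glitchtip:
        condition: service_healthy
```

The long-form depends_on is what turns the healthchecks into an actual startup gate; the short list form only orders container start.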
Author
Owner

🏛️ Markus Keller — Application Architect

Observations

  • Redis addition is the real cost of this issue — GlitchTip itself is a single-container Django app that could run without a broker if using database-backed task queues. However, GlitchTip's architecture requires Celery, and Celery requires a broker. The issue correctly identifies this. From an architecture standpoint, this is a new infrastructure dependency — Redis — that the current stack does not have. The question is whether the team is comfortable owning Redis long-term, or whether a database-backed alternative exists.
  • "Two networks" design is overcomplicated — the compose snippet puts services on both archiv-net and obs-net. Looking at the actual dependencies: GlitchTip needs to reach archive-db (on archiv-net) and mailpit (on archiv-net). Redis is internal to the observability stack. The worker needs Redis and the DB. The clearest design is: put everything on archiv-net. obs-net adds isolation that serves no present purpose and adds mental overhead when reading the file.
  • Shared database instance design — using the existing archive-db PostgreSQL container for the glitchtip database is the architecturally sound choice for this scale. The self-hosted-catalogue.md explicitly endorses this pattern. The glitchtip-db-init container handles the CREATE DATABASE idempotently. This is consistent with the project's "boring technology wins" principle.
  • ADR needed: adding Redis is a significant infrastructure addition. ADR-009 covers the standalone compose pattern. ADR-010 covers MinIO retention. This decision ("add Redis as a Celery broker for GlitchTip, accept Redis as a new runtime dependency") warrants an ADR. The alternatives to document: (a) in-memory broker (not durable, data loss on restart), (b) database-backed broker on PostgreSQL (no Redis needed, but GlitchTip doesn't support this out of the box; note that django-celery-results is a result backend, not a broker), (c) Redis as chosen (standard, operationally simple). The next ADR number is 015.
  • glitchtip-db-init external dependency: the init container depends on archive-db by DNS name, but archive-db is in the main stack (docker-compose.yml). This creates a cross-stack runtime dependency with no compose-level enforcement. The issue's "Depends on" line states that the main stack must be up, so the precondition is documented, which is acceptable. But it means docker compose -f docker-compose.observability.yml up will not fail with a clear error if the main stack is down; the failure only shows up in the init container's logs. Worth noting in the PR.
  • l2-containers.puml update required — per CLAUDE.md's doc update table, adding a new Docker service requires updating docs/architecture/c4/l2-containers.puml. GlitchTip is a new external-facing container.

Recommendations

  • Write ADR-015 before implementing: "Add Redis as Celery broker for GlitchTip". Document: context (GlitchTip requires Celery, Celery needs a broker), decision (Redis 7-alpine, minimal config), alternatives considered (in-memory broker: rejected — not durable; PostgreSQL broker: rejected — not supported by GlitchTip), consequences (Redis adds ~30MB RAM, 1 new failure mode, but is operationally trivial and broadly understood).
  • Collapse obs-net and use only archiv-net. There's little isolation benefit to a second network here. Redis uses expose (not ports), so it is never published on the host. One caveat: on a shared archiv-net, every container on that network can reach Redis, not only glitchtip and glitchtip-worker.
  • Verify whether GlitchTip's Celery beat (the run-celery-with-beat.sh command) needs a persistent volume for its beat schedule. If the worker is recreated, the schedule resets. For error tracking this is usually acceptable, but it's worth checking.
  • Update docs/architecture/c4/l2-containers.puml to add GlitchTip as a new container in the system boundary, and update docs/DEPLOYMENT.md with the startup sequence.
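Whichever network layout is chosen, the snippet needs a top-level networks: block. A minimal sketch follows, assuming archiv-net is created by the main stack under exactly that name; if the main stack lets Compose prefix the network with the project name, a name: key would be needed here, and that detail is not stated in the issue.

```yaml
# Top-level network declarations for docker-compose.observability.yml (sketch).
networks:
  archiv-net:
    external: true   # assumption: created by the main stack; this file must not create it
  obs-net:
    driver: bridge   # only needed if the second network is kept; drop it when collapsing
```

With external: true, Compose reports a missing network when the main stack has never been started, which gives an earlier and clearer error than a failed DNS lookup inside the init container.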
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Observations

  • GLITCHTIP_SECRET_KEY — hardcoded fallback risk — the compose snippet uses SECRET_KEY: ${GLITCHTIP_SECRET_KEY} with no default. This is correct — do not add a default. If the env var is missing, Django should fail to start rather than use a weak key. Verify that GlitchTip does indeed fail loudly when SECRET_KEY is empty (most Django apps do — confirm by checking GlitchTip's startup behavior). If it silently uses an empty string, that's a critical signing-key vulnerability: session cookies and CSRF tokens would be forgeable.
  • Port binding exposes GlitchTip directly: "${PORT_GLITCHTIP:-3002}:8080" binds to 0.0.0.0:3002 by default. GlitchTip's admin panel is at /admin/ and gives full access to all error events, which include stack traces, request parameters, and potentially session tokens or credentials that appear in error reports. In a dev environment this is acceptable; in production it must be 127.0.0.1:${PORT_GLITCHTIP}:8080 behind Caddy with authentication. The issue's ACs don't mention access control; this should be an explicit AC for the production phase.
  • Error events may contain sensitive data — stack traces from the Spring Boot backend can include database connection strings, user data from request parameters, and internal service URLs. GlitchTip should be configured with GLITCHTIP_MAX_EVENT_LIFE_DAYS: 90 (already in the issue — good) and data scrubbing rules. The official Sentry/GlitchTip SDK supports before_send hooks to strip PII before transmission. This is a follow-up concern, but it's worth noting now: the DSN issue (the next issue that uses these DSNs) should include PII scrubbing in its ACs.
  • Redis has no authentication configured — the snippet uses redis://redis:6379/0 with no password. Since Redis is only on the internal Docker network (expose, not ports), this is acceptable for a dev setup. In production, Redis should either have a password (redis://:password@redis:6379/0) or be network-isolated. The current expose-only approach is the right minimal fix — but document it explicitly in the ADR or PR so a future operator doesn't accidentally add ports: to Redis.
  • glitchtip-db-init runs psql with PGPASSWORD — the environment variable PGPASSWORD: ${POSTGRES_PASSWORD} is the standard way to pass PostgreSQL passwords to psql. This is fine. The alternative (-W interactive prompt) doesn't work in Docker. No issue here.
  • DATABASE_URL contains credentials in plaintext: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/glitchtip is a URL with embedded credentials. This is the standard GlitchTip configuration. The risk is that if these values appear in container inspect output or logs, credentials leak. Mitigation: use Docker secrets in production. For the scope of this issue (dev stack), using env vars is acceptable.
  • GlitchTip Django admin — the first-run step creates a superuser via ./manage.py createsuperuser. This creates a Django admin account separate from the GlitchTip UI account. The Django admin at /admin/ exposes raw database access. The PR body should note that the Django admin path should be blocked at the Caddy level in production (respond 404 to glitchtip.example.com/admin/ from external access) — only the GlitchTip UI at / needs to be publicly accessible.

Recommendations

  • Add an explicit note in the issue or PR body: "In production, bind PORT_GLITCHTIP to 127.0.0.1 and front with Caddy; block /admin/ at the proxy level."
  • Confirm that GlitchTip fails loudly on empty SECRET_KEY — check the Docker startup logs when GLITCHTIP_SECRET_KEY is unset. If it starts with an empty key, the GLITCHTIP_SECRET_KEY env var needs a mandatory check (same pattern as IMPORT_HOST_DIR in the prod compose: ${GLITCHTIP_SECRET_KEY:?Set GLITCHTIP_SECRET_KEY}).
  • Add to the follow-up DSN integration issues: implement before_send PII scrubbing in both the Sentry-Java SDK (Spring Boot) and @sentry/sveltekit to strip email addresses, user IDs, and request bodies from error events before they're stored in GlitchTip.
  • No Redis password is needed for this dev-scope issue, but document in the ADR that production should either add requirepass to Redis config or keep it expose-only (not ports-exposed) behind the Docker network boundary.
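The mandatory-variable check and the loopback binding mentioned above can be expressed directly in the compose file. The fragment below is a sketch of the production-oriented variant; whether the dev file should already use it is a judgment call left to the PR.

```yaml
# Fragment for the glitchtip service (sketch; production-oriented settings).
services:
  glitchtip:
    environment:
      # ":?" makes Compose abort with this message when the variable is unset or empty,
      # so the container can never start with a blank signing key.
      SECRET_KEY: ${GLITCHTIP_SECRET_KEY:?Set GLITCHTIP_SECRET_KEY}
    ports:
      # Publish on loopback only; the reverse proxy (Caddy) is the sole external entry point.
      - "127.0.0.1:${PORT_GLITCHTIP:-3002}:8080"
```

The same SECRET_KEY line would apply to glitchtip-worker, since both containers must use the same key.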
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

This issue is pure infrastructure — no application code changes. From a developer ergonomics perspective, the main concern is: does this compose file integrate cleanly into the dev workflow, and does it set up the right plumbing for the backend and frontend error tracking issues that will follow?

  • glitchtip-db-init command is fragile — the multi-line command: block in the compose snippet has a YAML quoting issue. The > block scalar folds newlines into spaces, which means the grep -q 1 || check and the final psql call become a single space-separated string. This may work in some shells but is brittle. A cleaner approach: use a dedicated sh -c entrypoint with explicit semicolons, or use the command: ["sh", "-c", "..."] list form which is unambiguous.
  • No .env.example addition shown — the issue introduces three new required env vars: GLITCHTIP_SECRET_KEY, GLITCHTIP_DOMAIN, and PORT_GLITCHTIP. The project almost certainly has a .env.example file (or equivalent). These vars need to be added there, otherwise the next developer who clones the repo and runs the observability stack will get a confusing failure with no guidance.
  • First-run steps belong in docs/DEPLOYMENT.md — the issue says "document in commit message or PR body." The createsuperuser step is a persistent operational procedure, not a one-time commit note. It should live in docs/DEPLOYMENT.md under an "Observability Stack" section, not buried in git history.
  • docker exec -it obs-glitchtip requires an interactive terminal — the first-run step uses -it. In automated contexts (CI, scripted deploys) this fails. For the createsuperuser step, this is acceptable since it's a one-time interactive procedure. But the acceptance criterion says "a superuser can be created" — not that it happens automatically. That's fine for this scope.
  • Acceptance criterion completeness — the ACs are concrete and verifiable. The curl -s http://localhost:3002/api/0/ check is a good smoke test. The DSN copy steps are documented as manual procedures. The ACs are well-structured for a DevOps issue.

Recommendations

  • Fix the glitchtip-db-init command using the list form to avoid YAML folding ambiguity:
    command:
      - sh
      - -c
      - |
        psql -h db -U ${POSTGRES_USER} -tc "SELECT 1 FROM pg_database WHERE datname = 'glitchtip'" | grep -q 1 || psql -h db -U ${POSTGRES_USER} -c "CREATE DATABASE glitchtip;"
    
  • Add GLITCHTIP_SECRET_KEY, GLITCHTIP_DOMAIN, and PORT_GLITCHTIP to .env.example with documented values (e.g. GLITCHTIP_SECRET_KEY=change-me-in-production, PORT_GLITCHTIP=3002).
  • Move the first-run steps from "commit message or PR body" to docs/DEPLOYMENT.md. Commit messages are not searchable operational runbooks.
  • No new Spring Boot or SvelteKit code in this issue — this is correct scope. The SDK wiring belongs in the follow-up issues that reference the DSNs.
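One way to turn a missing secret into a failure with guidance is Compose's required-variable interpolation. The fragment below is a minimal sketch, assuming the variable names from this issue; the error text is illustrative and not taken from the actual compose file.

```yaml
services:
  glitchtip:
    environment:
      # Abort `docker compose up` with a readable message when the secret is unset or empty.
      SECRET_KEY: "${GLITCHTIP_SECRET_KEY:?Set GLITCHTIP_SECRET_KEY in .env (see .env.example)}"
      # Optional values keep the default already used in the issue's snippet.
      GLITCHTIP_DOMAIN: "${GLITCHTIP_DOMAIN:-http://localhost:3002}"
    ports:
      - "${PORT_GLITCHTIP:-3002}:8080"
```

With the `:?` form, `docker compose config` stops with the given message when the variable is missing, instead of only printing a warning and substituting an empty string.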
Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Observations

This is an infrastructure-only issue with no application code to test at the unit or integration layer. The relevant test layer is smoke/E2E verification of the running stack.

  • Acceptance criteria are verifiable but not automated — the ACs are good for manual sign-off. None of them are tied to an automated CI check. For a dev-environment observability setup, that's acceptable — you don't run GlitchTip in CI. But the ACs should be explicit about being manual verification steps.
  • curl -s http://localhost:3002/api/0/ returning HTTP 200 is the right smoke test — this is testable and unambiguous. However, the existing E2E test suite (npm run test:e2e) does not start the observability stack. If GlitchTip integration (DSN wiring) is added to the backend/frontend in follow-up issues, those integration tests must not depend on GlitchTip being up — the DSN env vars should be optional/no-op when absent. Otherwise CI breaks when docker-compose.observability.yml isn't running.
  • The "depends on main stack must be up" precondition is untested — there's no test or CI gate that verifies glitchtip-db-init succeeds when archive-db is running and fails gracefully when it's not. This is a runtime risk: if someone runs docker compose -f docker-compose.observability.yml up without the main stack, the failure message from psql will be a connection-refused error with no indication of why. A comment in the compose file explaining the dependency is the minimum mitigation.
  • No healthcheck means ACs can pass prematurely — the curl -s http://localhost:3002/api/0/ AC can be checked while GlitchTip is still running database migrations, and requests may return 500s until they finish. The AC should specify "after docker compose reports all containers healthy" — but without a healthcheck on the glitchtip service, docker compose up --wait only waits for the container to be running, not for GlitchTip to be ready to serve requests.
  • DSN verification is manual — the ACs require a human to log in and copy a DSN. This is appropriate for a one-time infrastructure setup step. No automation is needed here.

Recommendations

  • Add to the ACs: "Smoke test passes only after all services report healthy (docker compose -f docker-compose.observability.yml ps shows no containers in Starting state)."
  • Add a note to the compose file: # Requires the main stack (docker-compose.yml) to be running — archive-db must be reachable on archiv-net. This makes the precondition explicit at the point where it matters.
  • Ensure that follow-up issues (backend and frontend DSN wiring) include an AC: "DSN env var is optional — when absent, error tracking is silently disabled (no startup failure, no test failures in CI)." The Sentry SDK supports SENTRY_DSN="" to disable itself — verify this works for GlitchTip's DSN format.
  • The test pyramid impact of this issue is zero (no new unit/integration tests needed). The observability stack is a dev-only tool. Mark the ACs as manual verification — that's the right test strategy here.
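A healthcheck along the following lines would give `docker compose up --wait` something to wait for. This is a sketch under assumptions: it presumes a `python` interpreter is on the PATH inside the GlitchTip image (not verified here), it reuses the `/api/0/` endpoint from the smoke-test AC, and the interval and retry values are placeholders to tune.

```yaml
# Requires the main stack (docker-compose.yml) to be running; archive-db must be reachable on archiv-net.
services:
  glitchtip:
    healthcheck:
      # urlopen raises on connection errors and on 4xx/5xx responses,
      # so the probe exits non-zero until the API answers successfully.
      test:
        - CMD
        - python
        - -c
        - "import urllib.request; urllib.request.urlopen('http://localhost:8080/api/0/', timeout=5)"
      interval: 15s
      timeout: 10s
      retries: 10
      start_period: 60s   # assumed allowance for first-run migrations
```

The probe targets port 8080 because it runs inside the container, where the published host port (`PORT_GLITCHTIP`) does not apply.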
Author
Owner

📋 Elicit — Requirements Engineer

Observations

The issue is well-structured for a DevOps infrastructure ticket. The body contains a Docker Compose snippet, acceptance criteria, and first-run steps. From a requirements completeness standpoint:

  • Context is clear and scoped — the issue correctly identifies GlitchTip as a Sentry-compatible error tracker, describes the Redis/Celery dependency, and links to the shared PostgreSQL instance pattern. The "Depends on" section references a scaffold issue — good dependency tracking.
  • "Type: Django" project in the ACs is misleading — GlitchTip works with any backend SDK that speaks the Sentry protocol. The Django project type in GlitchTip's UI is for GlitchTip's own organization metadata, not a constraint on which SDK you can use. The Spring Boot backend should use sentry-spring-boot-starter, which connects to GlitchTip via DSN regardless of the project type set in the UI. The AC says "type: Django" — this will cause confusion when the developer sees the project type list and wonders if they should choose "Java" or "Spring." Recommend changing the AC to: "Create a new Project (any type, e.g. Django) — the project type is metadata only; the DSN works with any Sentry-compatible SDK."
  • "First-Run Steps" scope — the issue says "document in commit message or PR body." As noted elsewhere, commit messages are ephemeral documentation. The first-run procedure is a permanent operational runbook entry. It belongs in docs/DEPLOYMENT.md under a new "Observability Stack" section.
  • Missing non-functional requirements — the issue doesn't specify:
    • Resource constraints — Redis and GlitchTip together add ~200-400MB RAM to the dev environment. On a developer machine with 16GB this is fine. On the CI runner or VPS it may matter.
    • Data retention — GLITCHTIP_MAX_EVENT_LIFE_DAYS: 90 is set. This is a good default. But the database growth implications of 90 days of error events are not discussed. For a family archive with low traffic this is negligible, but it should be acknowledged.
    • Startup order — the dependency on the main stack being up is a hard precondition. It's mentioned in the "Depends on" section but not encoded in the ACs.
  • AC for "a DSN copied from project settings" — the AC is correct but the phrasing "a DSN copied" implies a manual action. The Definition of Done says "all acceptance criteria checked" — these ACs are checked manually, not by CI. That's appropriate for this type of issue, but it should be made explicit.

Recommendations

  • Amend the AC about "Project (type: Django)" to clarify that the project type is GlitchTip metadata, not an SDK constraint. Suggested wording: "A new Project is created (project type is GlitchTip UI metadata only; any type works with a Sentry-compatible SDK) and a DSN is copied."
  • Add an AC: "The main stack (docker-compose.yml) is confirmed to be running before starting the observability stack."
  • Move first-run steps to docs/DEPLOYMENT.md as a persistent section rather than relying on PR/commit body.
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

This is a pure infrastructure issue — no frontend components, no UI changes, no user-facing flows. From my angle, the relevant concern is downstream: what gets built when these DSNs are wired into the frontend and backend.

  • No UI concerns in this issue — the GlitchTip UI is a third-party admin interface, not part of the Familienarchiv product. No brand, accessibility, or responsive design concerns apply here.
  • GlitchTip error grouping quality depends on what's sent — the DSN wiring issue (which follows this one) will determine what the @sentry/sveltekit SDK captures. By default, Sentry captures full page URLs including query parameters. For the Familienarchiv, document URLs contain UUIDs (/documents/{id}) which are non-sensitive, but user search queries (?q=...) may contain personal names. The frontend DSN integration issue should include PII filtering as an AC.
  • No concerns from my angle for this specific issue — the compose file, Redis service, and GlitchTip containers have no UX implications. The acceptance criteria are infrastructure-only. I'll review the DSN wiring issues when they arrive.

No blocking concerns from the UX/accessibility perspective.

Author
Owner

🗳️ Decision Queue — Action Required

3 decisions need your input before implementation starts.

Architecture

  • ADR-015: Redis as Celery broker — GlitchTip requires Celery, and Celery requires a message broker. Redis 7-alpine is the proposed choice. The alternatives are: (a) an in-memory broker — fast but not durable, events lost on restart; (b) a PostgreSQL-backed broker — no new service, but GlitchTip doesn't support this out of the box (note that django-celery-results is a result backend, not a broker). Write ADR-015 before implementing. (Raised by: Markus)

  • obs-net vs archiv-net — collapse to one network? — The compose snippet puts all observability services on both archiv-net and a new obs-net. GlitchTip only needs to reach archive-db and mailpit, which are already on archiv-net. Redis can be expose-only and still reachable by GlitchTip on the same network. The obs-net adds no isolation benefit at this scale. Decision: use only archiv-net for everything, or keep obs-net for future Prometheus/Loki/Alertmanager services in this milestone? (Raised by: Markus, Tobias)

Infrastructure / Security

  • Production port binding for PORT_GLITCHTIP — the dev compose binds to 0.0.0.0:3002. GlitchTip's Django admin at /admin/ gives raw database access to error events containing stack traces and potentially session data. Two sub-decisions: (1) When the prod compose is written, confirm 127.0.0.1:${PORT_GLITCHTIP}:8080 binding and Caddy fronting. (2) Should the Caddy config for production block the /admin/ path entirely, or is it acceptable behind basic auth? (Raised by: Nora, Tobias)
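Sub-decision (1) could look roughly like the fragment below. It is a sketch only, assuming a separate production compose file (the file name is hypothetical) and a reverse proxy such as Caddy running on the same host.

```yaml
# docker-compose.prod.yml (hypothetical file name)
services:
  glitchtip:
    ports:
      # Publish on the loopback interface only, so the port is reachable
      # by a reverse proxy on the host but not from other machines.
      - "127.0.0.1:${PORT_GLITCHTIP:-3002}:8080"
  redis:
    # `expose` only documents the port for other containers;
    # nothing is published on the host.
    expose:
      - "6379"
```

Whether `/admin/` is blocked or placed behind authentication is a proxy-level decision and is not expressed in the compose file.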
Author
Owner

Implementation complete on branch feat/issue-578-glitchtip.

Two commits:

  1. feat(observability): add GlitchTip error tracking to observability stack — adds four services to docker-compose.observability.yml:

    • obs-glitchtip-db-init — one-shot postgres:16-alpine container that creates the glitchtip database on archive-db if it doesn't already exist
    • obs-redis — Redis 7 Alpine broker for Celery, obs-net-internal only
    • obs-glitchtip — main web process, bound to 127.0.0.1:${PORT_GLITCHTIP:-3002}:8080, on both archiv-net (reaches archive-db and mailpit) and obs-net
    • obs-glitchtip-worker — Celery + beat worker (./bin/run-celery-with-beat.sh)
  2. docs(observability): document GlitchTip services in DEPLOYMENT.md and C4 diagram — extends the observability env var table with PORT_GLITCHTIP, GLITCHTIP_DOMAIN, GLITCHTIP_SECRET_KEY; adds service rows to the services table; adds a GlitchTip first-run subsection (superuser creation, org + two projects); updates docs/architecture/c4/l2-containers.puml with the GlitchTip and Redis containers and their relationships.

Validation: docker compose -f docker-compose.observability.yml config exits cleanly (expected warning for GLITCHTIP_SECRET_KEY having no default — it's a required secret by design).

PR-ready — no breaking changes, additive-only.

Reference: marcel/familienarchiv#578