obs.env documents POSTGRES_HOST but does not set a value, so obs-secrets.env does not 'override' it: it is the only source. Reword the carried-over comment to match reality. Raised in review (Tobias).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
128 lines
6.2 KiB
YAML
name: Deploy observability stack
description: >-
  Deploy observability configs + secrets to /opt/familienarchiv, validate the
  compose config, start the stack, and assert the five healthchecked services
  are healthy. Per-environment values arrive as inputs.

inputs:
  grafana_admin_password:
    description: Grafana admin password (secret)
    required: true
  grafana_db_password:
    description: Read-only grafana_reader DB role password (secret, issue #651)
    required: true
  glitchtip_secret_key:
    description: GlitchTip Django secret key (secret)
    required: true
  postgres_password:
    description: PostgreSQL password for the environment (secret)
    required: true
  postgres_host:
    description: >-
      Compose project + service hostname, e.g. archiv-staging-db-1. Derived
      from the Compose project name and service name, so a project rename
      requires updating the caller's value. Plain input, not a secret.
    required: true

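# Example caller step (a sketch, not taken from this repository: the action
# path and the secret names are assumptions and will differ per setup).
# The workflow reads secrets.* and passes each value in as an input:
#
#   - uses: ./.gitea/actions/deploy-observability
#     with:
#       grafana_admin_password: ${{ secrets.GRAFANA_ADMIN_PASSWORD }}
#       grafana_db_password: ${{ secrets.GRAFANA_DB_PASSWORD }}
#       glitchtip_secret_key: ${{ secrets.GLITCHTIP_SECRET_KEY }}
#       postgres_password: ${{ secrets.STAGING_POSTGRES_PASSWORD }}
#       postgres_host: archiv-staging-db-1
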
runs:
  using: composite
  steps:
    - name: Deploy observability configs
      shell: bash
      # Copies the compose file and config tree from the workspace checkout
      # into /opt/familienarchiv/, the permanent location that persists
      # between CI runs. Containers started in the Start step below bind-mount
      # from there, so a future workspace wipe cannot corrupt a running
      # config file.
      #
      # obs-secrets.env is written fresh from Gitea secrets on every run so
      # Gitea is always the single source of truth for secret rotation.
      # Non-secret config lives in infra/observability/obs.env (tracked in git).
      #
      # secrets.* is NOT available inside a composite action, so the values
      # arrive as inputs mapped to env: below and are referenced as $VAR in
      # the heredoc. The delimiter MUST stay unquoted (<<EOF, not <<'EOF') so
      # the shell expands $VAR. A quoted delimiter would write the literal
      # string "$GRAFANA_ADMIN_PASSWORD", and both `config --quiet` and the
      # non-empty guard below would still pass (the var is present, just
      # wrong). Do not stage these into intermediate variables either, or
      # Gitea log masking can be lost.
      env:
        GRAFANA_ADMIN_PASSWORD: ${{ inputs.grafana_admin_password }}
        GRAFANA_DB_PASSWORD: ${{ inputs.grafana_db_password }}
        GLITCHTIP_SECRET_KEY: ${{ inputs.glitchtip_secret_key }}
        POSTGRES_PASSWORD: ${{ inputs.postgres_password }}
        POSTGRES_HOST: ${{ inputs.postgres_host }}
      run: |
        set -euo pipefail
        rm -rf /opt/familienarchiv/infra/observability
        mkdir -p /opt/familienarchiv/infra/observability
        cp -r infra/observability/. /opt/familienarchiv/infra/observability/
        cp docker-compose.observability.yml /opt/familienarchiv/
        # Create the secrets file owner-only from the start: under the default
        # umask a new file would be world-readable until the chmod below.
        umask 077
        cat > /opt/familienarchiv/obs-secrets.env <<EOF
        GRAFANA_ADMIN_PASSWORD=$GRAFANA_ADMIN_PASSWORD
        GRAFANA_DB_PASSWORD=$GRAFANA_DB_PASSWORD
        GLITCHTIP_SECRET_KEY=$GLITCHTIP_SECRET_KEY
        POSTGRES_PASSWORD=$POSTGRES_PASSWORD
        POSTGRES_HOST=$POSTGRES_HOST
        EOF
        # Five-key non-empty guard: a bare presence check matches an empty
        # `KEY=` line, so assert each key has a value. Fail loudly on any
        # missing/empty key rather than starting the stack with broken auth.
        for key in GRAFANA_ADMIN_PASSWORD GRAFANA_DB_PASSWORD GLITCHTIP_SECRET_KEY POSTGRES_PASSWORD POSTGRES_HOST; do
          grep -Eq "^${key}=.+" /opt/familienarchiv/obs-secrets.env \
            || { echo "::error::obs-secrets.env missing or empty: ${key}"; exit 1; }
        done
        # umask 077 above covers first creation; chmod 600 covers a file left
        # over from an earlier run with a looser mode, which umask does not
        # change. Keep chmod 600 as the final operation.
        chmod 600 /opt/familienarchiv/obs-secrets.env

    - name: Validate observability compose config
      shell: bash
      # Dry-run: resolves all variable substitutions and reports any missing
      # required keys before containers start. Catches undefined required
      # variables and YAML errors in the compose file copied by the previous
      # step.
      # --env-file order: obs.env first (git-tracked defaults), obs-secrets.env
      # second (CI-written secrets). Later files win on duplicate keys. POSTGRES_HOST
      # is environment-specific and supplied only by obs-secrets.env; obs.env
      # documents it but deliberately does not set a value.
      run: |
        docker compose \
          -f /opt/familienarchiv/docker-compose.observability.yml \
          --env-file /opt/familienarchiv/infra/observability/obs.env \
          --env-file /opt/familienarchiv/obs-secrets.env \
          config --quiet

    - name: Start observability stack
      shell: bash
      # Runs with absolute paths so bind mounts resolve to stable host paths
      # that survive workspace wipes between runs (see ADR-016).
      # Non-secret config from obs.env (git-tracked); secrets from obs-secrets.env
      # (written fresh from Gitea secrets above). --env-file order: obs.env first,
      # obs-secrets.env second; the later file wins on duplicate keys.
      run: |
        docker compose \
          -f /opt/familienarchiv/docker-compose.observability.yml \
          --env-file /opt/familienarchiv/infra/observability/obs.env \
          --env-file /opt/familienarchiv/obs-secrets.env \
          up -d --wait --remove-orphans

    - name: Assert observability stack health
      shell: bash
      # docker compose up --wait covers services WITH healthcheck directives only.
      # obs-promtail, obs-cadvisor, obs-node-exporter, and obs-glitchtip-worker have
      # no healthcheck, so they are considered "started" as soon as the process runs.
      # This step explicitly asserts the five healthchecked critical services are
      # healthy before the smoke test proceeds.
      run: |
        set -e
        unhealthy=""
        for svc in obs-loki obs-prometheus obs-grafana obs-tempo obs-glitchtip; do
          status=$(docker inspect "$svc" --format '{{.State.Health.Status}}' 2>/dev/null || echo "missing")
          if [ "$status" != "healthy" ]; then
            echo "::error::$svc is not healthy (status: $status)"
            unhealthy="$unhealthy $svc"
          fi
        done
        [ -z "$unhealthy" ] || exit 1
        echo "All critical observability services are healthy"
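
The heredoc comment in the first step is easiest to check by running it. The following is a minimal sketch and not part of the action: `DEMO_SECRET`, the temporary directory, and the `has_value` helper are hypothetical names chosen for illustration. It writes the same line with an unquoted and a quoted delimiter, then applies a non-empty guard of the same shape as the one in the action.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory so the demo never touches /opt/familienarchiv.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for a secret that the real action receives through env:.
export DEMO_SECRET='s3cr3t-value'

# Unquoted delimiter: the shell expands $DEMO_SECRET while writing.
cat > "$workdir/expanded.env" <<EOF
DEMO_SECRET=$DEMO_SECRET
EOF

# Quoted delimiter: the text is written verbatim, dollar sign included.
cat > "$workdir/literal.env" <<'EOF'
DEMO_SECRET=$DEMO_SECRET
EOF

# Empty value: what the file looks like when an input was never supplied.
printf 'DEMO_SECRET=\n' > "$workdir/empty.env"

# Same shape as the action's guard: the key must be followed by
# at least one character on the same line.
has_value() {
  grep -Eq "^${1}=.+" "$2"
}

expanded_line=$(cat "$workdir/expanded.env")
literal_line=$(cat "$workdir/literal.env")
echo "unquoted delimiter wrote: $expanded_line"
echo "quoted delimiter wrote:   $literal_line"

for f in expanded literal empty; do
  if has_value DEMO_SECRET "$workdir/$f.env"; then
    echo "$f: guard passes"
  else
    echo "$f: guard fails"
  fi
done
```

The guard rejects only the empty file. The quoted-delimiter file passes it even though its value is the literal text `$DEMO_SECRET`, which is why the guard cannot stand in for keeping the delimiter unquoted.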