The job label (derived from the Docker Compose service name) is what
powers {job="backend"} queries in Loki dashboards and populates the
Grafana "App" variable dropdown. Operators need to know this mapping
when writing custom Loki queries.
Addresses @markus non-blocker suggestion from PR #606 review.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the architectural decision behind the dedicated management
SecurityFilterChain, the discovery that SB4+Jetty removed the isolated
management child-context security, and the consequences for actuator
endpoint exposure.
Addresses @markus blocker from PR #606 review.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add @Order(1) managementFilterChain scoped to /actuator/** with explicit
401 entry point, blocking all non-public actuator paths without the
form-login redirect that the main chain uses for browser clients.
- Split single combined test into two focused assertions
(prometheus_endpoint_returns_200_without_credentials,
prometheus_endpoint_returns_jvm_metrics).
- Add negative regression test: actuator_metrics_requires_authentication
verifies that /actuator/metrics returns 401 without credentials.
Addresses reviewer concerns from @sara (missing negative test, split
assertions) and @nora (dedicated management security layer).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four Spring Boot 4.0-specific issues prevented /actuator/prometheus from working:
1. spring-boot-starter-micrometer-metrics missing — Spring Boot 4.0 splits
Micrometer metrics export (including the Prometheus scrape endpoint) out of
spring-boot-starter-actuator into its own starter. Added dependency.
2. management.prometheus.metrics.export.enabled not set — Spring Boot 4.0
defaults metrics export to false (opt-in). Added the property to
application.yaml.
3. SecurityConfig did not permit /actuator/prometheus — Spring Boot 4.0
with Jetty serves the management port (8081) via the same security filter
chain as the main port (8080). The previous commit's exclusion of
ManagementWebSecurityAutoConfiguration was a no-op (that class no longer
exists in Spring Boot 4.0); removed it and added the correct permitAll()
rule. Updated the architecture comment in application.yaml to reflect the
true filter-chain behaviour.
4. Reverted invalid FamilienarchivApplication.java change from the prior
commit (ManagementWebSecurityAutoConfiguration import compiled against a
class that does not exist in the Spring Boot 4.0 BOM).
Also adds ActuatorPrometheusIT — an integration test that asserts the
/actuator/prometheus endpoint returns 200 with jvm_memory_used_bytes without
credentials, serving as regression protection against future Spring Boot
upgrades silently breaking metrics collection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three root causes confirmed via live server investigation (issue #604):
1. ManagementWebSecurityAutoConfiguration applied HTTP Basic auth to the
management port (8081), causing Prometheus to receive 401 HTML responses
instead of metrics. Excluded the auto-config — the Docker network
(archiv-net) provides the security boundary for this internal port.
2. promtail-config.yml had no `job` relabel rule. Grafana's Loki dashboards
query {job="$app"} which matched nothing; logs were in Loki under
compose_service but invisible to every dashboard panel.
3. prometheus.yml had a stale comment claiming the spring-boot target would
be DOWN until micrometer-registry-prometheus was added — it has been
present in pom.xml for some time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GlitchTip 6.x moved its internal listen port from 8080 to 8000.
The ports mapping was forwarding to the wrong port (host traffic
never reached the app), and the healthcheck was probing 8080 with
wget (not present in the image), causing the container to stay
permanently unhealthy.
Fix: map to port 8000, check with bash /dev/tcp (no external tools
needed, available in the Python base image).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The live runner config was missing /opt/familienarchiv in valid_volumes
and options, so deploy steps wrote files into the ephemeral job
container rather than the host — silently discarded on exit.
Updated /root/docker/gitea/runner-config.yaml on the server and
restarted gitea-runner. Repo file now matches the server exactly,
including the network: gitea_gitea setting that was previously
only on the server.
DEPLOYMENT.md: clarifies that /opt/familienarchiv does not need to be
in the runner container's own volumes (DooD spawns job containers from
the host daemon directly); updates restart command from systemctl to
docker restart; narrows the cp-r stale-file note to manual ops only
(CI uses rm -rf before copying).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rsync is not present in the act_runner job container image. rm -rf +
cp -r gives identical semantics (including removal of deleted files)
using only coreutils, which are always available.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move POSTGRES_USER to obs.env (non-secret, constant across envs)
- Replace cp -r with rsync -a --delete so removed config files are
purged from /opt/familienarchiv on next deploy instead of lingering
- Document --env-file ordering contract in validate + start steps:
obs.env first (defaults), obs-secrets.env second (wins on dupes)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI uses 'cp -r' which does not remove deleted files. Documents the
manual cleanup step for config files removed from git.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same fix as nightly.yml: prevents shell expansion of '$' in secret
values after Gitea renders them. Keep in sync with nightly.yml.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents shell from expanding '$' in Gitea-rendered secret values.
Without the quote, a password like 'P@$s5w0rd' has '$s5w0rd' silently
expanded to '' — writing a truncated value to obs-secrets.env.
'<<'EOF'' suppresses shell expansion; Gitea's '${{ }}' template
rendering already ran before the shell sees the script.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The container names archiv-staging-db-1 and archiv-production-db-1 are
derived from the Compose project name + service name. A project rename
silently breaks the obs stack DB connection. Add a comment at the point
of definition so the dependency is obvious when someone changes it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The heredoc creates the file with default umask permissions (644 —
world-readable). Setting 600 immediately after creation prevents other
processes on the host from reading the Grafana, GlitchTip, and Postgres
credentials. Defence-in-depth for the single-tenant VPS.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nightly.yml had two observability gates that release.yml lacked:
- "Validate observability compose config" (docker compose config --quiet)
catches missing env vars and YAML errors before any containers start
- "Assert observability stack health" checks obs-loki/prometheus/grafana/tempo
are healthy after up --wait, covering services without healthcheck directives
Mirrors the nightly.yml steps verbatim so the production deploy path is at
least as well-verified as the nightly staging path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Decision section described an operator-managed /opt/familienarchiv/.env
that CI does not touch. The actual implementation is a two-source model:
obs.env (git-tracked, non-secret config) + obs-secrets.env (CI-written
fresh from Gitea secrets on every deploy). Also updates the Consequences
bullet that incorrectly stated secrets are decoupled from CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The observability stack's bind-mount sources pointed to workspace-relative
paths. When CI wiped the workspace between runs, containers kept running but
their config files disappeared — causing Docker to auto-create directories
at the missing paths and crash the services on next restart.
Fix: mount /opt/familienarchiv/ into CI job containers via runner-config.yaml,
then copy infra/observability/ and docker-compose.observability.yml there before
docker compose up. Compose runs from the permanent path, so bind mounts resolve
to stable host paths that survive workspace wipes.
Docker Compose reads /opt/familienarchiv/.env automatically (no --env-file flag),
which is managed on the server and persists between CI runs.
Closes#601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POSTGRES_HOST variable (default: archive-db) lets the observability stack
connect to a different Postgres container — needed when only the staging
stack is running (container name: archiv-staging-db-1).
PORT_GRAFANA default changed from 3001 to 3003 to avoid collision with
the staging frontend which occupies 3001.
Closes#601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tempo 2.7.2 removed `processors` from the top-level metrics_generator
config; the field is only valid under `overrides.defaults.metrics_generator`.
The setting was already present there, so this only removes the now-rejected
duplicate at the top level.
Closes part of #601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5646e739 added svelte-kit sync before lint so .svelte-kit/tsconfig.json
always exists. This activated projectService: true for every run, which
builds the full TypeScript language service for all .svelte files and
caused CI lint to take 7+ minutes.
None of the rules in the Svelte-specific block need type information —
they are all AST-selector-based no-restricted-syntax checks. Removing
projectService restores the previous fast path without losing any lint
coverage.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add the /srv/gitea-workspace prerequisite step to DEPLOYMENT.md §3.1
and a new "Workspace bind-mount setup" subsection plus failure mode 4
to ci-gitea.md, covering the root cause, one-time host setup, disk
management, and troubleshooting for the bind-mount resolution fix
introduced in ADR-015.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the decision to use workdir_parent + identical host<->container
path instead of the overlay2 MergedDir sync that was in the initial fix.
Captures the alternatives (nsenter sync, image-baked configs, path mismatch)
and the operational consequences (prereq directory, out-of-band compose.yaml).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
runner-config.yaml: correct path to /srv/gitea-workspace (VPS, not Synology).
docker-compose.observability.yml: revert 5 bind mounts to plain relative paths;
OBS_CONFIG_DIR variable is no longer needed.
nightly.yml / release.yml: remove OBS_CONFIG_DIR env injection and the
"Sync observability configs to host" step from both workflows.
With workdir_parent=/srv/gitea-workspace and an identical host<->container
bind mount, $(pwd) inside job containers resolves to a real host path the
daemon can find — no privileged container, no overlay2 inspection, no nsenter.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set workdir_parent to /volume1/gitea-workspace so act_runner stores job
workspaces at a real NAS path. Mounting that path at the same absolute
location in job containers means $(pwd) inside any job container resolves
to a host path the daemon can find — no overlay2 tricks needed.
Prerequisite (NAS): mkdir -p /volume1/gitea-workspace and add
- /volume1/gitea-workspace:/volume1/gitea-workspace
to the runner service volumes in gitea's docker-compose.yml, then restart
the runner.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DooD runner only shares /var/run/docker.sock — no workspace directory is
mapped to the host daemon. Relative bind mounts in
docker-compose.observability.yml resolved to paths that didn't exist on
the host; Docker auto-created directories in their place, causing
'not a directory' mount failures for all five config files.
Fix:
- docker-compose.observability.yml: replace hardcoded ./infra/observability/
prefix with ${OBS_CONFIG_DIR:-./infra/observability} so the path is
configurable while remaining backwards-compatible for local use.
- nightly.yml / release.yml: add a 'Sync observability configs to host'
step that finds the job container's overlay2 MergedDir (the container's
full filesystem as seen from the host mount namespace), then uses the
existing nsenter/alpine pattern to cp the config tree into a stable host
path (/srv/familienarchiv-{staging,production}/obs-configs).
OBS_CONFIG_DIR is injected into the env file so Compose picks it up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
glitchtip/glitchtip:v4 is not a real tag — GlitchTip does not use a
v-prefix in its Docker image versioning. Latest stable release is 6.1.6.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The endpoint belongs in the compose file (hardcoded to the in-network
Tempo service) rather than per-environment workflow files. This covers
both staging (nightly.yml) and production (release.yml) with a single
change and removes the duplicate from the nightly env-file block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs introduced when the management port was split from the app port:
1. Backend healthcheck hit localhost:8080/actuator/health (app port) —
actuator is on management.server.port=8081, so every probe got a 404
from the main MVC dispatcher, marking the container permanently unhealthy.
Fix: change the probe to localhost:8081.
2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter
fell back to http://localhost:4317 (the CI-safe default) instead of
http://tempo:4317 (the in-network Tempo service). Fix: inject the correct
endpoint in the nightly env-file generation step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The name: archiv-net declaration (needed so docker-compose.observability.yml
can join the network as external: true) caused the compose-idempotency CI job
to collide with any archiv-net left on the runner from staging or a previous
run. mc would resolve 'minio' to the wrong container and fail with a signature
mismatch.
Make the network name interpolable via COMPOSE_NETWORK_NAME (default: archiv-net
so production/staging behaviour is unchanged). Inject COMPOSE_NETWORK_NAME=
test-idem-archiv-net into the stub env file so the idempotency test always
gets a fully isolated network.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
opentelemetry-spring-boot-starter:2.27.0 was built against
opentelemetry-api:1.61.0. Spring Boot 4.0.0 only manages 1.55.0,
which is missing GlobalOpenTelemetry.getOrNoop(). The backend crashed
at startup with NoSuchMethodError on the first staging nightly.
Add a <dependencyManagement> import of opentelemetry-bom:1.61.0 before
the Spring Boot BOM applies, so all OTel core artifacts resolve to the
version the instrumentation starter actually requires.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both nightly and release workflows were missing --remove-orphans on the
observability compose up, while the main app deploy step already had it.
Without it, containers removed from docker-compose.observability.yml
linger as unnamed orphans until manually pruned.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GlitchTip vhost only had a manual HSTS header; the rest of the
(security_headers) snippet (X-Content-Type-Options, Referrer-Policy,
Permissions-Policy, -Server removal) was missing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, and SENTRY_DSN were
referenced in the workflow env files but absent from the secrets table,
leaving the first-run operator without a complete checklist.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds SENTRY_DSN as an optional secret (empty by default) so it can be
set after GlitchTip first-run without requiring another code change.
Backend reads it via application.yaml; empty value keeps Sentry disabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prometheus, Loki, Tempo, and Grafana all define healthchecks in
docker-compose.observability.yml. Without --wait, the step exits 0
as soon as containers are created, masking startup failures silently.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Caddy does not set Strict-Transport-Security on GlitchTip because the
full security_headers snippet is intentionally omitted (Permissions-Policy
interferes with the Sentry SDK CORS). Adding HSTS alone guarantees
HTTPS enforcement at the Caddy layer without breaking SDK ingestion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
grafana.archiv.raddatz.cloud → 127.0.0.1:3003 (with security headers)
glitchtip.archiv.raddatz.cloud → 127.0.0.1:3002 (no security headers —
GlitchTip manages its own; the Sentry SDK also POSTs here)
Requires A records for both subdomains pointing at the server before
the next `systemctl reload caddy`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose.prod.yml: add `name: archiv-net` so the network has a
stable Docker name regardless of compose project name (-p flag).
Both staging and production share the same host-level network, which
is correct since the observability stack is a single shared instance.
- nightly.yml / release.yml: add observability env vars (POSTGRES_USER,
PORT_GRAFANA=3003, PORT_GLITCHTIP=3002, PORT_PROMETHEUS=9090,
GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, GLITCHTIP_DOMAIN) to the
env file, then `docker compose -f docker-compose.observability.yml up -d`
after the app deploy step. PORT_GRAFANA=3003 avoids collision with
staging frontend on 3001.
Requires two new Gitea secrets: GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4 integration test classes were restarting the full Spring context (and a new
Postgres Testcontainer, ~75s each) after every test method — 10 unnecessary
container startups adding ~12 minutes to CI. Fixed by:
- PersonServiceIntegrationTest, DocumentSearchPagedIntegrationTest,
GeschichteServiceIntegrationTest: swap to @Transactional so each test
rolls back instead of destroying the context.
- AuditServiceIntegrationTest: cannot use @Transactional (logAfterCommit
hooks into AFTER_COMMIT which requires a real commit); reset state with
@BeforeEach deleteAll() instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sentry-spring-boot-starter-jakarta 8.5.0 does not support Spring Boot 4.0 —
it logs an "Incompatible Spring Boot Version" warning and its SentryAutoConfiguration
crashes SF7 bean-name generation. sentry-spring-boot-4 (added in 8.21.0) is the
dedicated Spring Boot 4 module with a fixed auto-configuration class.
- Replace sentry-spring-boot-starter-jakarta:8.5.0 with sentry-spring-boot-4:8.41.0
- Delete SentryConfig.java — workaround no longer needed, auto-config handles init
- Remove spring.autoconfigure.exclude from application.yaml + application-test.yaml
- Delete SentryConfigTest.java — tested the deleted workaround class
- Update ApplicationContextTest: assert Sentry.isEnabled() is false when no DSN set
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAutoConfiguration$HubConfiguration$SentrySpanRestClientConfiguration is a triply-
nested @Configuration class conditionally loaded when RestClient is on the classpath
(always true on Spring Framework 7). Spring Framework 7's bean name generator fails
on such deeply-nested @Import-ed classes, crashing every @SpringBootTest context.
Replace the broken auto-configuration with a minimal SentryConfig bean that calls
Sentry.init() with the same properties (DSN, environment, sample rate, PII guard,
DomainException filter). Unexpected 5xx exceptions are forwarded to Sentry via
Sentry.captureException() in GlobalExceptionHandler.handleGeneric().
Also add management.server.port=0 to application-test.yaml to eliminate TIME_WAIT
conflicts from @DirtiesContext restarts on the fixed management port 8081 (see #593).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port 8081 was fixed by #576. With four @DirtiesContext(AFTER_EACH_TEST_METHOD)
classes (22 context restarts total), the OS TIME_WAIT state holds port 8081
for ~45-60s per cycle — adding ~17 min overhead. All 1601 tests pass but
surefire's 10-min timeout fires before the suite finishes.
Fixes#593.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the sentrySvelteKit() Vite plugin as the first plugin in vite.config.ts.
When SENTRY_AUTH_TOKEN is set at build time, source maps are uploaded to
GlitchTip so error stack traces show original TypeScript source and line number.
When SENTRY_AUTH_TOKEN is absent (CI, dev builds), upload is disabled via
autoUploadSourceMaps: false — the build succeeds normally.
Resolves Felix's review blocker on PR #591.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds @sentry/sveltekit to hooks.client.ts and hooks.server.ts.
When VITE_SENTRY_DSN is unset (default), Sentry is fully disabled.
When set to a GlitchTip JavaScript project DSN, browser exceptions
and SSR handleError events are forwarded automatically.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds GlitchTip env vars to the observability env var table, extends the
services table, and adds a first-run section with superuser creation and
project setup steps. Updates the C4 L2 container diagram with GlitchTip
and Redis containers and their relationships.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds obs-glitchtip, obs-glitchtip-worker, obs-redis, and obs-glitchtip-db-init
services to docker-compose.observability.yml. The one-shot db-init container
creates the dedicated glitchtip database on the existing archive-db PostgreSQL
instance automatically on first stack start.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add Grafana row to the observability services table, Grafana access details
(URL, credentials, auto-provisioned datasources, pre-loaded dashboards), and
GRAFANA_ADMIN_PASSWORD to the env vars table in DEPLOYMENT.md.
Update C4 l2-containers.puml: replace placeholder Grafana entry with pinned
image version, expand observability boundary with node_exporter and cadvisor
containers, and add Rel() edges for Grafana → Prometheus, Loki, and Tempo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add obs-grafana service (grafana/grafana-oss:11.6.1) to docker-compose.observability.yml.
Datasources (Prometheus, Loki, Tempo) are auto-provisioned via
infra/observability/grafana/provisioning/datasources/datasources.yml with
cross-datasource linking (Loki traceId → Tempo, Tempo → Loki, service map via Prometheus).
Three dashboards are pre-loaded: Node Exporter Full (1860), Spring Boot Observability (17175),
Loki Logs (13639) — datasource template variables replaced with provisioned UIDs.
GRAFANA_ADMIN_PASSWORD added to .env.example.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
opentelemetry-spring-boot-starter:2.27.0 pulls in AzureAppServiceResourceProvider which
references ServiceAttributes.SERVICE_INSTANCE_ID — a field absent from the semconv version
used by this project. This caused every integration test to fail with NoSuchFieldError during
Spring context startup.
Fix 1 (application-test.yaml): set otel.sdk.disabled=true so the OTel auto-configuration
never runs during tests at all.
Fix 2 (pom.xml): exclude opentelemetry-azure-resources from the starter dependency to remove
the problematic provider from the dependency graph entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add micrometer-registry-prometheus (BOM-managed) to expose /actuator/prometheus
- Add micrometer-tracing-bridge-otel (BOM-managed) for Micrometer → OTel tracing bridge
- Add opentelemetry-spring-boot-starter 2.27.0 (pinned — not in Spring Boot BOM)
- Move management to port 8081 so Prometheus scrapes directly inside archiv-net,
bypassing both Caddy and Spring Security's session-authenticated filter chain
- Configure otel.service.name and OTLP endpoint (default localhost:4317 for CI safety)
- Set tracing sampling probability to 1.0 in base config; override via env var in compose
- Add OTEL_EXPORTER_OTLP_ENDPOINT + MANAGEMENT_TRACING_SAMPLING_PROBABILITY to docker-compose.yml
- Expose management port 8081 inside archiv-net for Prometheus scraping
- Disable trace export in application-test.yaml (probability: 0.0) for deterministic CI
OTLP export failures are non-fatal; app starts cleanly without Tempo running.
Closes#576
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the stale "no monitoring infrastructure in place yet" note in
§4 with a brief description of the observability compose file and a
pointer to issue #581 for full docs.
Add a placeholder System_Boundary block for Prometheus + Loki + Grafana
to l2-containers.puml, showing the stack joins archiv-net.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Creates the skeleton observability stack (no running services yet) that all
subsequent Grafana LGTM + GlitchTip issues depend on:
- docker-compose.observability.yml: external archiv-net join, obs-net bridge,
named volumes for all five services, placeholder comments for each service
group (Metrics/Logs/Traces/Dashboards/Error Tracking), startup-order note
- infra/observability/{prometheus,loki,promtail,tempo,grafana/provisioning/{datasources,dashboards}}/.gitkeep
- .env.example: new # --- Observability --- section with PORT_GRAFANA,
PORT_GLITCHTIP, PORT_PROMETHEUS, GLITCHTIP_DOMAIN, GLITCHTIP_SECRET_KEY
(with generation hint), SENTRY_DSN, VITE_SENTRY_DSN
Verified: docker compose -f docker-compose.observability.yml config exits 0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The production stage runs npm ci --omit=dev to install runtime deps for
the pre-built SvelteKit app. The postinstall script calls patch-package,
which is a devDependency, so it is absent and causes exit code 127.
--ignore-scripts is the correct npm-native fix: no lifecycle scripts are
needed when installing into a pre-built image.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use untrack() for intentional one-time prop seed in UserGroupsSection.
Add explicit LoadData type alias in page.server.test to avoid void|Record<string,any> union.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When groups load successfully but the list is empty, render a quiet
"Keine Gruppen vorhanden." message rather than a blank section that
leaves users uncertain whether groups failed to load.
Adds admin_new_invite_no_groups i18n key to de/en/es.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spring Framework 7 prohibits constructor injection cycles. InviteService
already injects UserService, so UserService cannot inject InviteService
for the deleteGroup guard — repository injection is the correct workaround.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Client-submitted duplicate UUIDs were causing a false GROUP_NOT_FOUND:
size(deduplicated_db_result)==1 != size(submitted)==2. Deduplicate input
with HashSet before calling findGroupsByIds so the size comparison is
always against unique IDs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- legend uses m.admin_new_invite_groups() instead of hardcoded "Gruppen"
so screen readers announce the correct string in en/es locales
- label gets min-h-[44px] for WCAG 2.2 touch target compliance
- add test asserting fieldset accessible name comes from i18n key
- add test documenting empty-groups-no-error renders no checkboxes/banner
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace hand-copied load/action replicas with direct imports of the
real module. Mock $env/dynamic/private so the tests cover the actual
production code paths, not a duplicate that can drift.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Screen readers now announce the amber warning when it appears after
the form expands, without requiring the user to navigate to it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bind:group requires a writable $state variable; $derived is read-only
in Svelte 5, so every click was silently reset to unchecked, making
the group picker non-functional.
Also wraps checkboxes in <fieldset>/<legend> for WCAG 1.3.1 compliance.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- load() fetches /api/groups in parallel with /api/invites; returns
sorted groups array and groupsLoadError for partial failures
- create action forwards groupIds[] to POST /api/invites so invited
users are placed in the selected groups on registration
- +page.svelte: group checkboxes via UserGroupsSection inside the form;
amber warning banner when groups could not be loaded
- page.svelte.test.ts: groups checkboxes + warning banner tests
- page.server.test.ts: parallel fetch, sorting, error fallback,
groupIds in POST body
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the error code to the ErrorCode union and getErrorMessage() switch.
Adds admin_new_invite_groups, admin_invite_groups_load_error, and
error_group_has_active_invites to all three locale files (de/en/es).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds GROUP_HAS_ACTIVE_INVITES error code and guards UserService.deleteGroup()
with a 409 conflict when any active (non-revoked, non-expired, non-exhausted)
invite token still holds the group UUID.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract ImportStatus type to types.ts — removes duplication across
+page.svelte, ImportStatusCard.svelte, and test file (Felix blocker)
- Fix H2 to match CLAUDE.md card pattern: text-xs uppercase tracking-widest
text-ink-3 mb-5 (Leonie blocker 1)
- Add font-sans to RUNNING and DONE status labels (Leonie blocker 2)
- Add data-testid="processed-count" to count elements in both states
- Replace document.querySelector with locator API in spinner tests
- Tighten getByText('7') to getByTestId('processed-count') (Felix/Sara)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers the success path — previously untested per Sara's review.
Creates a minimal empty XLSX via XSSFWorkbook so processRows returns 0.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
toBeAttached() is not in the vitest-browser matcher set; toBeVisible() was
previously ruled out because the spinner is 0x0 px. Mirror the querySelector
pattern already used for the negative case in the same file.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI Chromium runs with German locale so hardcoded English strings like
'No spreadsheet file found.' never matched. Use m.admin_system_import_*()
to assert whatever locale the browser resolves to.
Spinner test used toBeVisible() on an empty <span> whose dimensions come
entirely from Tailwind CSS. Without layout CSS the span is 0×0 and fails
the visibility check; toBeAttached() asserts DOM presence, which is the
right semantic here.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three test files were written against the old API shape (raw `message` field) before
the statusCode i18n field was introduced, or used the wrong `expect` import path:
- ImportStatusCard.svelte.test.ts: `@vitest/browser/context` does not export `expect`
in this project's Vitest setup — use `vitest` like every other test file.
- page.svelte.spec.ts: FAILED mock lacked `statusCode`; assertion matched old German
raw message instead of the i18n string for IMPORT_FAILED_NO_SPREADSHEET.
- page.svelte.test.ts: same pattern — mock lacked `statusCode`; assertion checked for
raw backend string "database error" instead of the rendered i18n text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove dead `message` field from both frontend ImportStatus types
(field is now @JsonIgnore'd on the backend)
- Extract failure message ternary into `$derived` — business logic off
the template (Felix)
- Add motion-reduce:animate-none to spinner — WCAG 2.1 SC 2.3.3 (Leonie)
- Replace text-green-600 with text-green-800 — WCAG AA contrast 6.1:1
on bg-green-50 (Leonie)
- Add min-h-[44px] to all three buttons — WCAG 2.2 44px touch target (Leonie)
- Add 6 missing tests: IMPORT_FAILED_INTERNAL path, IDLE state text,
null importStatus, ontrigger called on DONE/FAILED/IDLE buttons (Sara)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- @JsonIgnore on ImportStatus.message — stops internal directory paths and
raw exception text leaking through the admin import-status endpoint (CWE-209)
- Add importStatus_messageField_notPresentInApiResponse test (red/green verified)
- Add importStatus_returns401/403 auth boundary tests — documents and guards
the @RequirePermission(ADMIN) protection against configuration drift
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts the mass-import block from +page.svelte into ImportStatusCard.svelte.
Changes per the three UX fixes from issue #533:
- RUNNING: animated spinner (animate-spin) + processed count at text-base;
auto-poll at 2 s was already in place
- DONE: processed count at text-base, label at text-xs uppercase tracking-widest
- FAILED: maps statusCode (IMPORT_FAILED_NO_SPREADSHEET / IMPORT_FAILED_INTERNAL)
to Paraglide messages — no raw German backend string rendered
Adds vitest-browser tests covering spinner visibility, count display,
and per-statusCode FAILED message selection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the {message} interpolation (raw German backend string) with
two distinct error keys: IMPORT_FAILED_NO_SPREADSHEET and
IMPORT_FAILED_INTERNAL. Also removes the {count} parameter from the
done message and adds admin_system_import_status_done_label so the
processed count can be rendered separately at text-base size.
All three locales (de / en / es) updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a statusCode field (IMPORT_IDLE / IMPORT_RUNNING / IMPORT_DONE /
IMPORT_FAILED_NO_SPREADSHEET / IMPORT_FAILED_INTERNAL) to ImportStatus.
The frontend will map these codes to localized strings via Paraglide
instead of rendering the backend's German message verbatim.
NoSpreadsheetException distinguishes a missing spreadsheet from other
I/O failures so the frontend can show a specific error without raw text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
testTimeout: 30_000 causes Vitest to fail a hanging browser test
within 30 s when Chromium crashes mid-load instead of silently
occupying the CI slot for 14+ min.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Captures all 102 test results independent of log verbosity.
if: always() ensures reports are available on failure — exactly
when they're needed most.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forkedProcessTimeoutInSeconds=120 caps the JVM on catastrophic hangs.
junit.jupiter.execution.timeout.default=90s times out each hanging
JUnit 5 test individually, letting healthy tests continue — replaces
the deprecated <timeout> alias that conflicted with the JVM ceiling.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set logging.level.root=WARN + logging.level.org.raddatz=INFO in
backend/src/test/resources/application.properties to keep the full
test run under Gitea's 1.4 MB log cap.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use [key: string]: unknown index signature so TS does not reject the
extra fields (location, status) passed to the redirect/failure result
in the spec helpers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror the groups/new fix: replace inline beforeNavigate/isDirty with
createUnsavedWarning() + UnsavedWarningBanner and add an enhance callback
that calls clearOnSuccess() before update() on redirect results.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use createUnsavedWarning() + UnsavedWarningBanner to replace the inline
beforeNavigate/isDirty pattern, and add an enhance callback that calls
clearOnSuccess() before update() so the guard is disarmed before
SvelteKit's internal goto() fires on a redirect result.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the node_modules cache hits, npm ci is skipped and the prepare
lifecycle (svelte-kit sync) never runs. frontend/tsconfig.json extends
.svelte-kit/tsconfig.json which only exists after svelte-kit sync —
so ESLint fails at tsconfig resolution on every cache-warm run.
Adding an unconditional svelte-kit sync step after Paraglide compile
and before Lint ensures .svelte-kit/tsconfig.json is always present
regardless of cache state.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace && with ; in test:coverage so the client vitest run is not
short-circuited when the server run exits non-zero (e.g. threshold
violation or test failure). Without this the upload-artifact step
only ever sees coverage/server.
Also updates the stale CLAUDE.md comment that said server-only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CDP-based Playwright clicks (locator.click()) do not reliably trigger
Svelte 5 onclick handlers — documented in commit 0c765d81 which fixed
13 other specs. The layout dropdown tests were missed in that pass.
Applies the same pattern: ((await locator.element()) as HTMLElement).click()
for button interactions, and native KeyboardEvent dispatch for the Escape
test (dispatched on the button so it bubbles to the parent div's onkeydown).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Docker Compose interpolates all variables in the full file even when
only a subset of services is requested. The backend service uses
IMPORT_HOST_DIR with :? (hard-required), causing the idempotency job
to abort before any container starts. A dummy path satisfies the parser;
the backend service is never started in this job so the path need not exist.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous self-test proved the regex catches @v5 (positive case).
This adds a negative case proving @v3 is NOT flagged — guards against
a false-positive that would break every CI run permanently.
Suggested by Sara Holt in review of PR #558.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Lines 203, 230, and 332 carried comments that actively encouraged
the regression (they read as if v4 is the canonical target). Replaced
with the correct pinned-at-v3 comment referencing ADR-014.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the three-incident history, the enforcement layers (inline
comments + grep guard + ADR), how to spot the symptom, and the explicit
upgrade trigger (act_runner v4 protocol support OR v3 CVE).
Cross-references ADR-011 (single-tenant Gitea runner) and #557.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reverts the re-regression introduced in 410b91e2. Gitea Actions
(act_runner) does not implement the v4 artifact protocol — jobs report
failure even when all tests pass. Pins all three call sites back to @v3
and adds load-bearing inline comments pointing to ADR-014 / #557.
This commit makes the grep guard added in the previous commit GREEN.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a repo-invariant check in the same 'Assert' block as the ADR-012
birpc guard. Anchored to YAML `uses:` lines so the inline self-test
fixture does not false-positive. Fails with an actionable error
referencing ADR-014 / #557.
Guard is intentionally RED at this commit — the three v4 call sites
are downgraded in the next commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Branches gate was blocking CI at 75% measured coverage. The 80% floor
suffers Istanbul parent/child denominator coupling (long-tail grind, per
#496) that makes the remaining gap disproportionately costly to close.
Drop branches to 75 to match current state; leave lines/functions/
statements at 80. ADR-013 documents the rationale and the ratchet rule
for raising the gate back incrementally.
Closes#556
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a second binding invariant section to ADR-012 covering the
duplicate-id mechanism named in #553's follow-up investigation: same
resolved module URL referenced via two distinct vi.mock id strings →
@vitest/browser-playwright leaks an orphan Playwright route → birpc-closed
crash in the next session.
Records the rule (one canonical id per mocked module, prefer the spelling
production uses, no-extension for .svelte rune modules), the in-suite
detector (no-duplicate-mock-ids.test.ts), and the patch-package backport
of vitest PR #10267 with its removal trigger.
Extends the existing Consequences enforcement list from four layers to
six, adding the duplicate-id detector and the patch-package layer.
Refs: #553 · vitest-dev/vitest#9957 · vitest-dev/vitest#10267
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Installs patch-package (^8.0.0) and a postinstall script, then applies
the diff from vitest PR #10267 against @vitest/browser-playwright@4.1.0.
What the patch changes (in dist/index.js):
- createPredicate(sessionId, url) → createPredicate(url): factory becomes
pure, returns { url, predicate } instead of mutating sessionIds /
idPreficates as a side-effect.
- sessionIds value type: array → Set (deduplicates resolved URLs).
- register handler now looks up any existing predicate for the
(sessionId, resolvedUrl) pair and unroutes it BEFORE installing the
new route. This is the actual race fix: without it, the second
vi.mock for a duplicate-id leaks an orphan Playwright route that
fires after birpc closes.
- clear handler iterates the Set via spread.
Why this matters even though Layer 1 normalised the only known duplicate
in our suite: every future vi.mock call is a class of race we shouldn't
have to think about. The patch closes the upstream gap at the
route-handler level, so a contributor reintroducing the duplicate-id
pattern can't reopen the race.
When to remove: when @vitest/browser-playwright ships a release
containing PR #10267. Delete patches/@vitest+browser-playwright+4.1.0.patch
and the postinstall hook (or keep the hook if other patches accumulate).
Refs: #553 · vitest-dev/vitest#9957 · vitest-dev/vitest#10267
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scans every src/**/*.svelte.{spec,test}.ts file for vi.mock first-arg
strings, canonicalises each by stripping a trailing .js/.ts after
.svelte, groups by canonical id, and fails if any canonical id is
referenced under two or more distinct raw spellings.
Mirrors the shape of src/__meta__/no-async-mock-factories.test.ts:
source-text regex scan (no AST parser dependency), red/green self-test
fixtures inline, then one corpus assertion that the whole suite is
clean.
This is the in-suite defence-in-depth layer for the duplicate-id birpc
race named in ADR-012 / #553 and fixed upstream by vitest PR #10267.
Harder to disable than ESLint (cross-file invariant ESLint cannot
express anyway) and harder to scope around than a CI grep.
Refs: #553
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five test files mocked $lib/shared/services/confirm.svelte under BOTH
spellings (.svelte and .svelte.js) within the same file; two more mocked
only the .svelte.js form. Both resolve to the same module URL but register
two distinct Playwright route handlers in @vitest/browser-playwright. The
cleanup logic only removes one, leaving an orphan that fires when the next
session loads the module — crashing the run with
"[birpc] rpc is closed, cannot call resolveManualMock".
This is the exact trigger fixed upstream by vitest PR #10267 (issue #9957).
Normalise every confirm.svelte mock to the no-extension form, matching
production imports and the source file basename (confirm.svelte.ts).
After this commit: 8 confirm.svelte mocks across 8 spec files, all under
one canonical ID. A meta-test (next commit) prevents the duplicate-id
pattern from reappearing.
Refs: #553 · vitest-dev/vitest#9957 · vitest-dev/vitest#10267
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production code referenced $lib/shared/services/confirm.svelte under two
spellings — 4 files with the .js extension and one without. Standardise on
the no-extension form to match Svelte 5 rune-module convention and the
source file basename (confirm.svelte.ts).
Why this matters: vitest browser mode's @vitest/browser-playwright resolves
both spellings to the same module URL but registers a separate Playwright
route per spelling. The route-cleanup logic only unregisters the latest,
leaving an orphan that crashes the next session with
"[birpc] rpc is closed, cannot call resolveManualMock". Fixed upstream in
vitest PR #10267 (merged, not yet released). Normalising the spelling
removes the trigger from our side.
Refs: #553. Companion test-file changes follow in the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hover-prefetch has two surfaces in SvelteKit:
- data-sveltekit-preload-data (route loader data)
- data-sveltekit-preload-code (route JS chunks)
The original fix turned off only the loader-data side. Route-code chunks
prefetched on hover can also include manually-mocked module URLs; an
in-flight code prefetch landing after iframe teardown hits the same
Playwright route handler that resolves manual mocks, raising the
unhandled rejection. Disable both surfaces.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous revision allowed vi.mock for virtual modules on the "consumer
import is static" argument. #553 proved that argument wrong: a statically-
imported module with an async factory body whose dynamic import landed
after teardown still produced the race. The factory body — not the
consumer — is the failure surface.
- Drop the "residual exceptions" table.
- Add the binding invariant: factory bodies under `**/*.svelte.{test,spec}.ts`
must be synchronous (no `await`, no `import(...)`).
- Document the canonical vi.hoisted + getter pattern, with file references.
- Record the $app/stores → $app/state architectural call (Markus's
recommendation), removing one of the last two deprecated-import
outliers.
- Record the preload-data=off hardening (Tobias's recommendation) as a
pattern note.
- Update the Enforcement section to list all four defence layers (ESLint,
CI grep, in-suite meta-test, CI birpc assert) and the coverage-flake-
probe verification workflow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verification mechanism for the 20-run acceptance criterion of issue #553.
Triggered manually via workflow_dispatch, runs the full coverage suite 20×
in parallel against a single SHA, asserts zero `[birpc] rpc is closed`
lines in every cell.
One fire, parallel cost (~one main-job's wall-clock), deterministic signal
for the teardown race. Cheaper than 20 sequential push events and tests
the same property the AC names.
Closes the verification gap raised by Tobias and Elicit in the issue
discussion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pdfjs-dist literal grep added in 9260866f only caught one named
trigger of the birpc teardown race; the underlying mechanism (ADR 012 /
#553) is any async vi.mock factory whose body performs `await import(...)`.
Add a second PCRE-multiline grep matching that shape. Scoped to
**/*.{spec,test}.ts under frontend/src/, excluding __meta__ (which holds
the fixture strings exercising the meta-test). Defence in depth pairs with
the ESLint rule (saves at edit time) and the in-suite meta-test (catches
when tests run).
Verified locally with real GNU grep against a planted synthetic offender.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generalise the no-restricted-syntax rule from the literal pdfjs-dist
selector (added in #535) to also catch the underlying mechanism named in
ADR-012 / #553: any `vi.mock(..., async () => { ... await import(...)
... })` produces a late birpc roundtrip during worker teardown.
Selector: vi.mock CallExpression whose second argument is an
ArrowFunctionExpression with async=true and whose subtree contains an
AwaitExpression > ImportExpression. Both rules coexist — the literal
pdfjs-dist rule still enforces the libLoader prop injection pattern
(catches sync forms too); the new rule enforces the sync-factory
invariant universally.
Demonstrated by planting a synthetic offender locally and watching
ESLint flag it with the new rule's message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In-suite belt-and-braces detector for the birpc teardown race named in
ADR-012 / #553. Catches `vi.mock(<arg>, async ... { ... await import(...)
... })` in any browser spec on every vitest invocation — the layer hardest
to disable or scope around (ESLint can be silenced; CI grep runs only in
CI; this test runs whenever the suite runs).
Demonstrated red→green by planting a synthetic offender locally and
watching the live-scan assertion fail; removing the offender returned it
to green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hover-prefetch fires real fetch requests for route loader chunks; those
requests go through the same Playwright route handler that serves mocked
modules. An in-flight prefetch landing after iframe teardown can hit the
handler with a closed birpc channel, raising an unhandled rejection that
exits the run with code 1 even when every individual test was green.
Add `src/test-setup.ts` that sets `document.body.dataset.sveltekitPreloadData
= 'off'` and wire it via `setupFiles` in both `vite.config.ts` (client
project) and `vitest.client-coverage.config.ts` (Istanbul coverage config).
Add `src/__meta__/browser-preload-disabled.svelte.test.ts` asserting the
setup ran. Zero production impact.
Issue #553 secondary trigger.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The async vi.mock factory in EnrichmentBlock.svelte.spec.ts performed an
`await import(...)` in its body — the same mechanism #535/#546 fixed for
pdfjs-dist. Issue #553: when Chromium's playwright route handler fetches
the mocked module after the worker's birpc channel has closed, the
factory's RPC roundtrip raises `[birpc] rpc is closed, cannot call
"resolveManualMock"` and the run exits 1.
Migrate EnrichmentBlock from the deprecated `$app/stores.navigating`
(store) to the modern `$app/state.navigating` (reactive proxy). The
spec uses vi.hoisted + a sync vi.mock factory with a getter that defers
the read — no dynamic import in the factory body. Delete the now-unused
__mocks__/navigatingStore.ts.
Fix path applied: $app/state migration (Markus's recommendation /
Felix's Path 2). See ADR-012.
Refs #553
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The role=link override on a <button> creates a WCAG 4.1.2 keyboard-contract
mismatch: ARIA role=link tells AT users "press Enter to activate (Space does
nothing)", but the native <button> responds to both Enter and Space. Removes
the override so the element is announced as "button" (accurate).
Test selectors updated from getByRole('link') to getByRole('button')
accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SvelteKit's capture-phase link interceptor fires before the component's
onclick handler, so e.preventDefault() was structurally too late to stop
iframe navigation in vitest-browser. Replacing the <a href> with a
<button type="button"> removes the href entirely — the interceptor never
fires — and the existing goto() mock in tests is sufficient.
Also splits the single view-all test into two focused it() blocks and
clears mocks in afterEach to prevent cross-test mock leakage.
Fixes#551
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract makeFakePdfjsLib / makeFakeLibLoader to testHelpers.ts — single
source of truth used by both PdfViewer.svelte.test.ts and
usePdfRenderer.svelte.test.ts; removes the diverging-fidelity DRY violation
flagged by @felixbrandt and @saraholt in the PR review
- Add 'loadDocument sets error and loading=false when getDocument().promise
rejects' test to usePdfRenderer.svelte.test.ts — closes the error-path gap
flagged by @felixbrandt and @saraholt
- Replace toBeInTheDocument() with toBeVisible() in the three absorbed
spec-file tests — uniform assertion style across the loaded-state describe
block, as flagged by @felixbrandt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five tests in usePdfRenderer.svelte.test.ts called createPdfRenderer() without
a libLoader, causing init() to dynamically import pdfjs-dist in the browser.
Every dynamic import goes through Playwright's route handler, which calls
resolveManualMock via birpc to check for mocks. If the RPC closes during
teardown while one of these imports is in flight, the birpc race fires —
even though pdfjs-dist was never explicitly vi.mock()-ed.
Replace all bare createPdfRenderer() calls that invoke init() with
createPdfRenderer(makeFakeLibLoader()), identical to the pattern already
used in PdfViewer.svelte.test.ts. No real module loads, no route-handler
calls, no birpc exposure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a static grep step that runs after Lint and before the test suite.
Fails in ~1 s if any file under frontend/src/ contains the banned
vi.mock('pdfjs-dist' pattern, catching the regression before Playwright
spins up. Belt-and-suspenders with the ESLint rule (ADR 012).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a no-restricted-syntax rule scoped to *.spec.ts / *.test.ts that
flags any vi.mock call whose first argument starts with 'pdfjs-dist'.
Turns the ~2-min CI wait into an immediate lint error on save.
Updates ADR 012 Enforcement section to document the rule.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Absorbs the three tests from PdfViewer.svelte.spec.ts (nav buttons, zoom
controls, page counter) into the loaded-state describe in test.ts, then
deletes the now-empty spec file. One spec file per component.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes both vi.mock('pdfjs-dist', …) calls that caused the birpc teardown
race (ADR 012). Replaces with static import + makeFakeLibLoader() helper
injected via the libLoader prop on every render() call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two root causes:
1. In-flight test: resolveFetch() was the last line, leaving the async
finally-block writing `training = false` after cleanup destroyed the
component. Awaiting the button becoming re-enabled ensures the finally
block settles before cleanup runs.
2. Success-dismiss test: startTraining() schedules setTimeout(5000) which
fired after cleanup destroyed the component. vi.useFakeTimers() +
vi.runAllTimers() scoped to the describe block drains the timer while
the component is still alive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Svelte defers DOM updates to microtasks; .query() is a synchronous
snapshot that can fire before the element disappears — making the
absence assertions in AnnotationShape and AnnotationLayer non-deterministic.
Sweeps all 4 instances across both spec files (Sara's ≤5 threshold).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Explicitly states no lint rule is planned; CI guard is the backstop
(addresses Elicit OQ-001 from PR #536 round 4)
- Adds a "when to revisit" note: extract shared DynamicImportLoader<T>
if 3+ components adopt the libLoader pattern
(addresses Markus Keller round-4 observation on PR #536)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a comment above the assertion step so a future developer diagnosing
a birpc-related failure in `npm test` knows where to find the diagnostic.
Addresses Sara Holt + Tobias Wendt round-4 observation on PR #536.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prototype-style assignment was a vi.mock hoisting artifact from the old
version of the file. Rest of the codebase uses class syntax — aligning.
Addresses Felix Brandt round-4 suggestion on PR #536.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds idempotency test: calling init() twice must invoke libLoader only once.
Adds `if (pdfjsReady) return;` guard to satisfy the contract.
Addresses Felix Brandt round-4 suggestion on PR #536.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
-F (fixed string) matches the literal pattern [birpc] rpc is closed
without relying on BRE bracket escaping, making the intent explicit
and immune to accidental regex interpretation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The birpc guard step writes to /tmp/coverage-test-<run_id>.log and exits 1
when a race is detected. Without this file in the artifact, the evidence
disappears when the runner tears down — only the exit code remained visible.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
.catch(()=>{}) swallowed the rejection, so the test passed vacuously even
if a future refactor silently caught the error. rejects.toThrow() proves
the propagation contract holds before asserting pdfjsReady stays false.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add explicit set -eo pipefail so npm test:coverage exit code
propagates through the pipe (not just tee's always-0 exit)
- Scope log file to github.run_id to prevent stale-log false positives
on retried steps sharing the same runner /tmp
- Tighten grep pattern to \[birpc\] rpc is closed to avoid matching
unrelated log lines that happen to contain "rpc is closed"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regression-protection test: init() propagates the loader rejection
before pdfjsReady is set, so the renderer stays in a safe unready state.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without untrack, a reactive libLoader prop reference change would
reinitialise the whole renderer and lose all loaded state.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Exporting LibLoader gives the type a stable, named identity.
PdfViewer.svelte and PdfViewer.svelte.spec.ts now import it directly
instead of using Parameters<typeof createPdfRenderer>[0].
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- removes unreachable `; exit ${PIPESTATUS[0]}` — already covered by pipefail (Tobias)
- adds explicit `shell: bash` to both new steps for clarity (Tobias)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- import PdfViewer left mid-file from vi.mock hoisting — no longer needed (Sara/Felix)
- adds one-line comment explaining as unknown as cast is an intentional partial fake (Felix)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents why vi.mock(module, factory) races with birpc teardown for
dynamically-imported modules, the libLoader injection pattern used to fix
#535, and the residual exceptions ($app/*, $env/*) that are safe to keep
as vi.mock because they are resolved statically before any test runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Captures npm run test:coverage output with tee and adds an always-run step
that greps for the teardown-race fingerprint. Any future regression where a
vi.mock factory races with birpc teardown will now surface as an explicit CI
failure rather than a silent exit-1 after all tests report green (#535).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes both vi.mock('pdfjs-dist', factory) and
vi.mock('pdfjs-dist/build/pdf.worker.min.mjs?url', factory) from
PdfViewer.svelte.spec.ts — the ManualMockedModule registrations that were
racing with vitest-browser-playwright's birpc teardown channel.
PdfViewer.svelte now accepts an optional libLoader prop (typed as
Parameters<typeof createPdfRenderer>[0]) that is passed untracked to
createPdfRenderer(). Tests supply a vi.fn() fake loader directly as a prop;
production code uses the default loader that imports the real pdfjs-dist.
The birpc route handler for pdfjs-dist is never registered, so no teardown
race is possible. Fixes#535.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds an optional LibLoader parameter (defaults to the real pdfjs-dist dynamic
imports) and a failing test that verified the loader is called during init().
This is the first step toward removing ManualMockedModule registrations that
race with vitest-browser-playwright's birpc teardown (#535).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`ip route` (iproute2) is not installed in the Gitea runner container,
causing the smoke test step to exit 127. /proc/net/route is a kernel
virtual file that is always present on Linux; awk decodes the
little-endian hex gateway field to dotted-decimal without any external
binary dependency.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Unquoted variable expansion is safe here since the value contains
no spaces or glob characters, but quoting is the correct default
and keeps the script consistent with surrounding style.
Addresses review suggestion by Felix Brandt and Tobias Wendt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If `ip route show default` returns no output the old code passed
an empty string to curl --resolve, producing a confusing error 6
("couldn't resolve host") with no indication that gateway detection
had failed. The new guard exits immediately with a clear message.
Addresses review concern raised by Tobias Wendt.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Job containers run in bridge network mode (runner-config.yaml). Inside
a bridge-networked container 127.0.0.1 is the container's own loopback;
Caddy on the host is unreachable there, causing an immediate ECONNREFUSED.
Use the Docker bridge gateway IP instead — the host's docker0 interface
where Caddy (bound on 0.0.0.0:443) is reachable from the container.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a packageRule matching .gitea/workflows/** digest updates with
automerge: false. Digest bumps for images running --privileged --pid=host
have root-equivalent host access and must not be auto-merged.
Addresses Nora's review concern on #537.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers the three failure modes Sara flagged: Caddy stopped (explicit
systemctl error), symlink missing/mis-pointed (silent reload, stale
smoke test), and Docker socket / nsenter unavailable (container error).
Each failure mode includes symptoms and recovery steps.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the stale generic runner provisioning docs with an accurate
description of the actual two-container setup on the Hetzner VPS.
Document the nsenter pattern for running host-level commands (systemctl)
from containerised CI steps, and the Caddyfile symlink contract that the
reload step depends on.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same gap as nightly.yml: production deploys also need Caddy to reload
the updated Caddyfile before the smoke test validates the public surface.
Uses the same nsenter pattern introduced in the previous commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`sudo systemctl reload caddy` does not work from inside a DooD job
container: `systemctl` is absent from Ubuntu container images and
container processes cannot reach the host systemd without entering its
namespaces. Replace with `docker run --privileged --pid=host ubuntu:22.04
nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy`, which uses
the already-mounted Docker socket to spin up a privileged sibling
container that enters the host PID namespace via nsenter. Tested live on
the Hetzner VPS. No sudoers entry required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a `sudo systemctl reload caddy` step between the docker compose
deploy and the smoke test. This ensures any committed Caddyfile changes
are applied before the public surface is verified.
Previously the workflow had no mechanism to push Caddyfile changes to
the running host daemon. A Caddyfile edit would land in the repo but
Caddy would keep serving the previous config, causing the smoke test to
catch a stale header or still-proxied /actuator route rather than the
intended current config.
This step also surfaces the root cause of today's port-443 failure
explicitly: if Caddy is not running, the step fails with a clear service
error rather than a misleading "Failed to connect to port 443" from curl.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also replaces a vacuous expect(true).toBe(true) with a real behavioral
assertion that both block texts remain rendered after rerender.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
waitForSource() helper polls for the EventSource constructor effect
to register the mock; assertion blocks use vi.waitFor on the progress
bar / heading / button changes after each SSE event dispatch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 15 setTimeout sleeps with vi.waitFor on the actual signal
(fetch URL recorded, banner appears, status text rendered) and
switches the default fetch mock from mockResolvedValue to
mockImplementation so each call yields a fresh Response — no more
"body stream already read" unhandled rejections.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 16 setTimeout(350ms / 30ms / 50ms) sleeps with vi.waitFor on
the actual signal — popup listbox appearance/disappearance, option
aria-selected state — so the test no longer races the 200ms internal
debounce against the real clock under CI load.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the vacuous expect(true).toBe(true) sleep test with a real
flyout-open assertion (role=dialog appears after trigger click) and
turns the Escape-keydown smoke test into a full open→Escape→closed
behavioral test. Routes the Escape event through document (matches
the svelte:document binding) instead of window.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 2 setTimeout-based wait() helpers with vi.useFakeTimers() +
vi.advanceTimersByTimeAsync() so the polling-loop tests no longer
race against the real clock under CI load — they instead deterministically
advance the setInterval by the exact poll interval and let microtasks
flush. Also converts the destroy() .not.toThrow smoke into a direct
expect(job.destroy()).toBeUndefined() check.
Per Sara: polling-loop tests are the legitimate case for fake timers
(time progression matters) — exactly the pattern she requested.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 3 setTimeout sleeps with vi.waitFor on document.activeElement
during keyboard nav, and converts 2 .not.toThrow smoke tests on the
prev/next buttons into no-op assertions: with a single file in the
strip the active chip stays selected and onSelect is not invoked.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 3 setTimeout sleeps with vi.waitFor on listbox / aria-expanded
state and converts 2 .not.toThrow smoke tests + 1 vacuous expect(true)
into assertions about the input remaining usable after fetch errors
and Escape on a closed dropdown being a no-op.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 8 setTimeout sleeps with vi.waitFor on the actual signal
(textarea value, fetch URL recorded, onCountChange call) and converts
3 .not.toThrow smoke tests into behavioural assertions:
- "no onCountChange wired" → asserts initial comment text still renders
- "network error during reload" → asserts empty-hint state is shown
- "non-OK reload" → asserts empty-hint state is shown
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 5 setTimeout sleeps with vi.waitFor on the actual class
transition, and converts 6 .not.toThrow smoke tests into assertions
that the validation guard surfaces the expected error message (or
absence thereof). Tightens the dragging-state regex to bg-accent-bg
so it cannot match the idle hover:border-primary substring.
Runtime: faster + deterministic.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 6 setTimeout sleeps with vi.waitFor and expect.element
auto-wait, and converts 9 .not.toThrow smoke tests into assertions
on the rendered PDF nav controls (Zurück/Weiter/Vergrößern/Verkleinern)
and the conditional outdated-annotation notice / annotation visibility
toggle. transcribeMode test now mocks the annotations fetch so the
toggle button is actually rendered (annotationCount > 0 guard).
Runtime: 33s → 4.5s.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 3 setTimeout sleeps with click + auto-wait / vi.waitFor on
the bulk-edit-all flow, and converts 14 .not.toThrow smoke tests into
behavioral assertions:
- Advanced-filter labels (Schlagworte/Absender/Empfänger/Von/Bis) for
every hasAdvancedFilters() branch (senderId, from, to, tags)
- Collapsed advanced section when all filters are at falsy defaults
- Search input value reflected via two-way binding
- BulkSelectionBar surfaces count when store has entries
- bulk-edit-all populates selection store on success
Runtime: 48s → 3.8s. Addresses Sara's blockers on PR #505.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces 13 setTimeout sleeps with vi.waitFor and expect.element
auto-wait, and converts 17 .not.toThrow smoke tests into behavioral
assertions that verify what each scenario actually exposes:
- topbar mount + svelte:head title for prop pass-through cases
- Edit anchor surfaced when canWrite=true
- Details drawer open + sender displayName visible for sender data
- panel-close testid for transcribe-mode entry
- OCR progress heading 'OCR läuft' for RUNNING + jobId
- OCR spinner absent for 500 / DONE / PENDING-without-jobId / network-error
Runtime: 34s → 3.5s, no sleeps. Addresses Sara's "118 setTimeout" and
"74 .not.toThrow" blockers on PR #505.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes Sara's .spec.ts outlier concern on PR #505 — every other new
test file in the coverage push uses .svelte.test.ts.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The test:coverage step runs the full suite under Istanbul; running
`npm test` first executes every test twice for no extra signal.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pulls the transcription-block state (load, save, delete, reviewToggle,
markAllReviewed, createFromDraw, toggleTrainingLabel, deleteAnnotation
+ derived blockNumbers / hasBlocks / lastEditedAt / annotationReloadKey)
out of documents/[id]/+page.svelte into a reusable factory in
lib/document/transcription/useTranscriptionBlocks.svelte.ts.
The page now reads transcription.blocks / .blockNumbers / .hasBlocks /
.lastEditedAt / .annotationReloadKey reactively and delegates writes
to transcription.{load, save, delete, reviewToggle, markAllReviewed,
createFromDraw, toggleTrainingLabel, deleteAnnotation,
findByAnnotationId, bumpAnnotationReloadKey}. The confirm-then-delete
dialog stays in the page; the hook only handles the data ops.
24 unit tests cover initial state, load (success / non-OK / network /
empty-id), derived state (blockNumbers in sortOrder, lastEditedAt
recent-pick, lastEditedAt-null fallback), delete (success bumps key /
non-OK throws), reviewToggle (success updates / non-OK no-op), markAll
(success / non-OK), createFromDraw (success / non-OK / network all
return correct shape), toggleTrainingLabel (200 / 500), deleteAnnotation
(linked-block path / orphan-annotation path / orphan-fail throw),
findByAnnotationId match + miss, bumpAnnotationReloadKey.
Also bumps the polling-loop test waits in useOcrJob.svelte.test.ts to
150-200ms (from 60-80ms) so the suite is reliable when run in parallel.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pulls the trigger/poll/check-status state out of documents/[id]/+page.svelte
into a pure factory in lib/ocr/useOcrJob.svelte.ts that takes documentId,
fetchImpl, and onJobFinished callback as injected dependencies.
The page now delegates to ocrJob.triggerOcr / ocrJob.checkStatus /
ocrJob.destroy and reads ocrJob.running / .progressMessage / .errorMessage /
.skippedPages reactively.
Test discipline reset: 22 unit tests cover initial state, triggerOcr 200/
4xx-with-code/4xx-without-code/5xx/network-error paths, useExistingAnnotations
flag round-trip, checkStatus PENDING/RUNNING/DONE/no-jobId/empty-id/5xx/network
paths, polling progressMessage / skippedPages updates, DONE/FAILED → onJobFinished
callback, polling-error swallow, and destroy mid-poll cleanup.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
renderCurrentPage early-returns when canvasEl/textLayerEl null,
init() idempotent on second call, zoomIn after floor, goToPage(1)
no-op.
5 new tests covering ~6 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Whitespace-only quotedText not seeded, no onCountChange not provided,
fetch network error during reload, non-OK reload response, own
comment with edit/delete affordances.
5 new tests covering ~10 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
authorName email fallback when no first/last names, undefined-author
empty result, publishedAt missing, body empty no-excerpt, single
person filter render-without-throw.
5 new tests covering ~10 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
use:enhance vs callback form variant rendering, self-relation
error, submit disabled on missing related person, submit disabled
on yearError.
5 new tests covering ~10 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ArrowUp wrap-around, Escape close, Enter without selection no-op,
keydown without dropdown no-throw, Enter with active selection
selects, excludeIds filter works, parentId fallback as subtitle.
7 new tests covering ~12 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds selected-node primary fill, birth/death year combinations,
node click and Enter/Space/other-key handling, dashed/solid spouse
line, single-parent connector, focus ring on focus + blur, aria
labels and aria-expanded reflection, accent stripe on selected node.
13 new tests covering ~30 branches in the node-render path.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Status color paths (exhausted/expired/revoked), new-invite form
toggle, loadError banner.
5 new tests covering ~10 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Color dot hidden at depth>0 and when color is null, document count
badge omitted at 0, toggle click mutates collapseMap.
4 new tests covering ~6 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds single-word name (one-initial) and leading-space edge cases
for the initials function.
2 new tests covering ~4 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds banner-hidden defaults (success/error), empty groups list,
groups field undefined fallback to [].
4 new tests covering ~6 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds unsaved-warning hidden by default, oninput dirty marker, form
error banner hidden when form is undefined.
3 new tests covering ~6 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds two tests that pass all filter props as truthy and as falsy
defaults, covering the seed-from-data-or-default branches.
2 new tests covering ~14 branches (all data.X || '' chains).
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds receiver-focus triggers correspondents fetch, advanced-filter
chevron rotation in both states.
3 new tests covering ~6 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds fake-timer tests for morning (h<12), day (12<=h<18), and
evening (h>=18) branches plus the empty-firstName fallback.
4 new tests covering the greeting time-of-day branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sender/receivers populated, filePath set, full user object,
Escape vs other keys keydown handler, deep-link comment query.
6 new tests targeting ~14 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds backfill-versions and backfill-file-hashes click handlers,
verifies initial fetch hits import-status and thumbnail-status.
3 new tests targeting ~10 branches in the page component.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty state when url is empty (no controls, placeholder shown),
loaded state with controls, annotationsDimmed branch, transcribeMode
flag, documentFileHash filtering branch.
6 tests covering ~10 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-correspondents derived from received-document senders, self-skip
branch when sender == current person, GeschichtenCard rendered when
geschichten array is non-empty, 5-entry cap on co-correspondents.
4 new tests covering ~10 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two persontypeaheads + two date inputs, swap button visible/invisible
based on both persons set, sort label DESC vs ASC, chevron rotation,
onapplyFilters / ontoggleSort / onswapPersons callbacks fire.
11 tests covering ~20 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds direct-relationship sorting, yearRange formatting (both years,
only fromYear), inferred-relationships disclosure rendering, 5-item
cap on derived relationships.
5 new tests targeting ~15 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds password show/hide toggle (independent for both fields), pwHint
visible after typing, pwValid green hint for 8+ chars, pwMismatch
red hint, pwMatch green hint, form.error rendering, notifyOnMention
checkbox toggle.
7 new tests targeting ~25 branches in the register flow.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Textarea props (placeholder, rows, disabled), popup not shown
initially, popup opens on @ + query, empty results from API,
HTTP error → empty popup, Enter submits when popup closed,
Shift+Enter does not submit, Escape closes popup, Arrow{Up,Down}
navigation, Enter with no results.
12 tests covering ~30 branches in MentionEditor.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds drag-over and drag-leave styling, drop with no files, multiple
invalid files, mixed valid+invalid files, non-Enter keydown ignore,
window-level dragenter/dragleave with and without 'Files' types,
counter underflow guard.
16 tests, +9 covered branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds keyboard navigation (Arrow{Up,Down,Left,Right}, shiftKey step,
non-arrow no-op, edge clamping at all four sides), pointer drag
flows (move-area + each of the 8 handles), early-return branches
for non-primary pointers and pointer events without active drag.
28 tests, +20 covered branches over previous 7-test version.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three variants (first-run, filter-empty, inbox-zero), title vs body
visibility, data-variant attribute, accent vs ink-3 icon coloring.
5 tests, ~15 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Radiogroup with label, all five filter pills, aria-checked for active
filter, tabindex matrix (0 active vs -1 inactive), onChange callback
when clicked.
5 tests, ~15 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty state (default + term-specific), error banner, year groups
default sort, sender-group sort, undated/unknown-sender labels, total
count display. Mocks $app/navigation since the empty-state CTA calls
goto.
8 tests covering ~30 of DocumentList's branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty placeholder, all four status pill branches (QUEUED/DONE/FAILED/
RUNNING), error-detail disclosure on FAILED, Personalisiert vs Basis
type label, COLLAPSED_COUNT visible runs, person columns visibility
toggle, em-dash CER fallback.
11 tests covering ~25 of TrainingHistory's branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty list early return, heading + per-doc row rendering, title link
href, date visibility tied to updatedAt, stats footnote presence
toggled by stats.totalDocuments.
7 tests covering ~16 of the dashboard section's branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty list early return, populated section, write-action link gated on
canWrite, visible-cap of 3, footer show-all link visibility based on
overflow, author name vs email fallback.
9 tests covering ~25 of GeschichtenCard's branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hidden when totalPages <= 1, prev/next disabled state matrix at
boundaries, link form when in range, aria-current for active page,
mobile page label, left ellipsis / right ellipsis branches based on
window position, custom ariaLabel.
11 tests covering ~30 of Pagination's branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Avatar with initials vs question-mark fallback, for-you marker
visibility, data-variant matrix (simple/for-you/rollup/comment),
count badge for rollup, comment preview rendering with fallback,
document title link, default vs comment-deep-link href, time-range
label for rollup with happenedAtUntil.
11 tests covering ~40 of ChronikRow's high-uncovered branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Title rendering with originalFilename fallback, sender vs unknown
placeholder, tag buttons per document tag, bulk-select checkbox gated
on canWrite, archive chips visibility, snippet/summary visibility,
em-dash for missing date.
11 tests covering ~30 of the row's branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Toggle button, form open on click, all relationship type options,
year-error alert when toYear < fromYear, no-error path when equal,
cancel button closes form, onSubmit prop wiring.
7 tests covering ~20 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prev/next nav buttons, chip count per file, aria-current matrix for
active id, error-state data attribute, onSelect callback, onRemove
callback, sr-only announcer for active title.
7 tests covering ~25 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop hint + accepted types render, default no-progress state, invalid
MIME-type rejection, valid PDF acceptance, no-files early return,
click + Enter open the file input, multi-file accept whitelist
attributes.
8 tests covering ~25 of DropZone's 46 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Heading with tag name, name input hydration, color picker visible only
for top-level tags, color swatch grid (10 entries), aria-pressed for
active color, success banner branch, error banner branch, merge-success
banner branch.
8 tests covering ~30 branches in the tag-edit page.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mounts the page, renders the orchestrator, exposes the hidden skip-form,
and renders the three submit-action buttons (skip, save, save+review).
4 tests covering the orchestration entry path of enrich/[id].
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renders the document edit page with mocked confirm service. Verifies
DocumentEditLayout mounts, both hidden submit-target forms (review and
delete) exist, and the delete button is present in the action bar.
3 tests covering the orchestration entry path of documents/[id]/edit.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mounts the aktivitaeten page with mocks for the notification SSE
singleton (init/destroy/markRead/markAllRead) and $app/state. Verifies
heading renders, error state renders main element, empty state renders
main, and a non-default filter renders without crashing.
4 tests covering the orchestration entry path.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mounts the page with mocked $app/state, $app/navigation, and confirm
service. Verifies the top bar renders, the viewer container exists, and
the last-visited localStorage write happens onMount.
3 tests covering the orchestration entry path of the 558-line
documents/[id]/+page.svelte.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
admin/ocr index: heading, sender-models heading, global-history link,
defensive defaults for missing trainingInfo fields.
admin/ocr/[personId]: person name from personNames lookup, Unknown
fallback when not found, back-link href, missing-personNames defensive
handling.
8 tests across two pages.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three admin/index pages (groups/tags/users) — each renders a single
"Wähle X aus der Liste" prompt for the desktop split-view layout.
AuthHeader: brand link href + wordmark.
PersonsEmptyState: empty heading + explanation text.
6 tests across five small files.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Listbox label, empty-state placeholder, create-new escape hatch with
noopener target, populated list, default aria-selected on first item,
life-date range visibility, position fallback when clientRect is null,
positioning from clientRect.
8 tests covering ~25 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tag link href, document-count visibility branch, color-dot at depth 0
vs deeper, aria-current matrix, children list rendering, collapse-map
hides children, expand/collapse toggle for nodes with children.
9 tests covering ~30 branches in the recursive tree-node component.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backfill cards rendered, both backfill buttons enabled by default,
no success banner before any action. Smoke-level coverage of the
admin maintenance page.
5 tests covering basic render branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Year divider rendering, distinct-year branch, no-duplicate consecutive
years, no-divider for documents without documentDate, canWrite-gated
new-document link with senderId-only and senderId+receiverId href
variants.
7 tests covering ~20 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All-sections render when full permissions, users/invites hidden when
!canManageUsers, groups hidden when !canManagePermissions, tags hidden
when !canManageTags, system/ocr hidden when !canRunMaintenance,
flyout closed by default.
6 tests covering ~30 branches in the permission matrix.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hero state when no senderId set, results card when senderId set,
SinglePersonHintBar gating on senderId × !receiverId, empty-results
message branch.
5 tests covering ~15 branches in the orchestrator.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
One bar per filled bucket, singular vs plural aria-label, aria-pressed
matrix, drag-window visibility tied to isDragging, onbarclick callback,
minimum-height handling for zero-count buckets.
8 tests covering ~25 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dialog with bell label, empty state vs populated list, mark-all-read
visibility branch, REPLY vs MENTION text, unread-dot rendering, all
three callback wirings (onMarkRead, onMarkAllRead, onClose).
10 tests covering the notification dropdown surface.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
register page (350 lines): hero render when no codeError, NO_INVITE_CODE
vs other-codeError card branches, form hidden when codeError set,
back-to-login link, form section rendering, prefill hydration of
firstName/lastName/email, prefill-hint visibility branch, hidden
code input with code-null fallback.
admin/users/new: heading, three card sections, group checkboxes
rendered, form-error banner branch, cancel link, submit button.
17 tests across two pages.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brand link, four primary nav links, admin link gated on isAdmin,
hamburger menu open/close state via aria-expanded. Mocks $app/state
so the page URL drives the active-route highlighting.
6 tests, ~30 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Title, all four section headings, secure Wikipedia link rel
attributes, five rule cards rendered, four klaerung chips rendered.
7 tests covering the static help page.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Heading with email, three card sections (profile/groups/password),
success vs error form banners, group preselection from editUser.groups,
cancel link, delete button. Mocks the confirm service.
7 tests, ~25 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TimelineControls: empty render when neither flag is set, reset button
gated on isZoomed, clear button gated on hasSelection, both-on, both
callback wirings.
TimelineXAxis: empty filled → no ticks, populated → ticks render,
omit-year branch when all buckets share a year, show-year branch
across multiple years, length-4 bucket-string fallback.
11 tests across two timeline primitives.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
UserPasswordSection: input rendering, type=password attribute,
required-prop propagation in both directions.
CorrespondenzFilterControls: dual date label rendering, both DateInput
ids, value hydration from fromDate/toDate, change-event smoke check.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four tests: discard link href, save button label, form attribute
wiring, formaction. Small focused component.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
enrich/done: heading, body, both CTA links.
documents/bulk-edit: empty-store onMount redirect to /documents,
loading spinner during in-flight fetch, error banner on backend error
code, error banner on fetch rejection. Mocks fetch via vi.spyOn so the
async branches are exercised without a real backend.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BackButton: visible vs aria-only label branches, custom class
application, history.back() click handler.
OverflowPillButton: +N pill render, aria-expanded matrix
(closed default → open after click), per-person link rendering with
correct href, Escape closes the dropdown.
Both are reused widely; their coverage closes the line and function gap
left after the DocumentTopBar split inflated the denominator.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sixteen tests covering the four-column drawer: details column always
renders, persons column branches (no-persons placeholder vs sender
vs receivers), receiver overflow + show-all toggle, tags column
branches (placeholder vs anchor list with /?tag href encoding),
geschichten column visibility (hidden by default, shown for
canBlogWrite, attach link gated on canBlogWrite + documentId, list
rendering, show-all overflow), inferred-relationship pill on the
single-receiver branch.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
admin/groups/new: heading, both permission group renderings (4 standard
+ 4 administrative checkboxes), form-error banner branch, cancel link
href, submit button form-attribute wiring, name input requiredness.
Mocks $app/navigation so beforeNavigate doesn't crash the test runner.
enrich/+: heading, empty placeholder vs populated count + start CTA,
start CTA href derived from documents[0].id, per-row title rendering,
bulk-select checkbox gated on canWrite.
16 tests across two files.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
users/[id]: full-name derivation across all four branches
(both/firstName-only/lastName-only/email fallback), avatar initials
matrix, email/contact row visibility tied to data presence.
admin/ocr/global: heading + back link, runs prop pass-through,
defensive default for missing history fields.
geschichten/[id]: title rendering, author full-name vs email fallback
vs null, publishedAt suffix conditional, persons and documents sections
gated on array length, edit/delete actions gated on canBlogWrite. Mocks
the confirm service since it requires a ConfirmDialog mounted in layout.
26 tests across three files.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PersonEditForm: PERSON vs INSTITUTION/GROUP visibility matrix (firstName,
title, alias, birth/deathYear toggle), lastName label switch, prop
hydration of all populated fields, fallback to PERSON for unknown type,
empty-string handling for null fields. 10 tests, ~30 branches.
SegmentationTrainingCard: trainingInfo null vs populated, block count
display, button disabled-state matrix (training × tooFewBlocks ×
serviceDown), too-few-blocks and service-down hints, success message
after a mocked fetch, training history heading. 10 tests, ~25 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Third Phase 5 split. The desktop action buttons — transcribe,
transcribe-stop, edit link, download link — become their own component
with a focused props interface (documentId, canWrite, isPdf,
transcribeMode bindable, filePath, originalFilename, fileUrl).
TDD: 8 tests covering empty render, transcribe button gating
(canWrite × isPdf × transcribeMode), stop-transcribe rendering, edit
link with documentId href, download link with filePath gating, all
hidden when in transcribe mode. After the test was red the component
was created.
DocumentTopBar dropped from 303 lines to 166. The orchestrator now
just composes BackButton, DocumentTopBarTitle, PersonChipRow,
OverflowPillButton, the details toggle, DocumentTopBarActions,
DocumentMobileMenu, and DocumentMetadataDrawer — each visual region
named in one or two words.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Second step of the Phase 5 split. The kebab dropdown — including
clickOutside handling and its own mobileMenuOpen state — becomes its
own component named after its visual region. The mobile snippet
duplication inside DocumentTopBar is removed; the component owns its
mobile-specific markup.
TDD: DocumentMobileMenu.svelte.test.ts (7 tests) was red first. The
component then made it green (kebab trigger, dropdown open/close on
click, transcribe button gated on canWrite × isPdf × !transcribeMode,
download link gated on filePath). DocumentTopBar wraps the new
component in a md:hidden div so responsive behaviour is unchanged.
Existing 18-test DocumentTopBar suite still passes.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
First step of the Phase 5 split plan from issue #496. The 14-line title
+ date block becomes its own component named after the visual region.
TDD red/green: DocumentTopBarTitle.svelte.test.ts written first
(7 tests covering title, originalFilename fallback, empty-string
fallback, short-date rendering, no-date branch, title attribute
sourcing). After the test was red the component was created.
DocumentTopBar.svelte updated to use it; the existing 18-test suite
still passes.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eighteen tests covering the user-observable matrix without yet splitting
the component (Phase 5 of the plan): title vs originalFilename fallback,
short-date rendering and absence, transcribe-button gating
(canWrite × isPdf × transcribeMode), edit-link gating, download-link
gating on filePath, kebab-menu visibility on (canWrite & isPdf) || filePath,
details drawer toggle, mobile menu open/close.
The 83 raw branches in the source map mostly to combinations of the
above flags — each test isolates one branch. Per Sara's guidance the
test names read as sentences and verify what the user sees, not internal
state.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each status (active / exhausted / revoked / expired) maps to a distinct
visual treatment via statusColor() — one focused test per branch
asserts the correct background class on a tbody element so the test
verifies user-observable behaviour rather than the internal switch.
Also covers: empty placeholder, loadError banner, filter chip
selection state, new-invite form toggle on button click, createError
message visibility inside the open form, created-invite success card
with shareable URL, revoke button gating to active invites only,
unlimited-uses display, no-expiry display.
16 tests, ~50 branches covered.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
empty state vs. populated, zoom controls visibility tied to node count,
URL ?focus= preselection (matching id selects, missing id does not),
zoom-out clamping safety. $app/state mocked at module boundary so the
test can drive page.url and page.data.canWrite without a SvelteKit
runtime.
Six tests focused on user-observable behaviour — one logical behaviour
per test (Sara's guidance).
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DocumentViewer: loading / error / no-scan / image rendering branches.
filePath conditionally drives the direct-download link in the error
state; fileUrl + non-PDF contentType drives the <img> render.
PersonalInfoForm: default render, prop hydration including the German
date conversion path, success/error banner branches, form action wiring.
profile/+page: notification-checkbox enabled/disabled depending on
hasEmail, no-email hint visibility, prefsSuccess/prefsError banners,
fallback when notificationPrefs is null.
20 tests across three files.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PersonDocumentList: empty/populated, year-range derivation across
no-date/single-year/multi-year inputs, sort toggle visibility (>1 doc),
sort-direction round trip, preview-limit + show-more expansion,
title→originalFilename fallback, no-date and no-location branches.
persons/new: PERSON vs INSTITUTION/GROUP visibility matrix
(firstName/alias/life-year fields toggle), lastName label switching
between Vorname/Nachname/Name, form-error banner, prior-form hydration,
cancel link href, fallback to PERSON for unknown personType.
24 tests across two files, hitting the 32+28 = 60 branches at the top
of the issue's leverage list.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CorrespondentSuggestionsDropdown: empty list still renders the static
heading and 'Alle Korrespondenten' row, populated rows when not loading,
loading hides correspondent rows, initials fallback (lastName-only when
firstName is null), click + keyboard selection, Escape closes.
PersonCard: full matrix of conditional UI — title visibility for PERSON
vs non-PERSON, avatar initials path (firstName+lastName vs lastName-only
fallback), PersonTypeBadge presence for non-PERSON types, alias, life
dates, notes, and the canWrite=true/false branches that gate the edit
link (Nora's authorization-rendering rule).
21 tests covering ~50 branches.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PersonTypeBadge: one test per switch arm (INSTITUTION, GROUP, UNKNOWN)
plus the two no-render branches (unrecognised type, empty type).
ExpandableText: clamp detection, toggle visibility logic, expand →
collapse round-trip, default maxLines fallback.
PersonChipRow: sender-only, sender+arrow, abbreviated naming, max-two
visible receivers, +N overflow pill presence/absence, receivers-only
case (no sender → no arrow).
19 tests across three files. Each file uses afterEach(cleanup) and
queries via getByRole/getByText so tests stay decoupled from CSS.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+error.svelte: vi.mock('$app/state') drives the page state so each test
can assert one of the three rendering branches — populated error message,
distinct status code, and the 'Internal Error' fallback when page.error
is null.
forgot-password/+page.svelte: prop-driven tests for the four states —
default form, success banner, error message inside the form, and the
back-to-login link href.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PasswordChangeForm: tests the null/success/error/mismatch banner branches
plus the form action wiring.
FileSectionNew: tests the no-file/file-selected toggle, onfileParsed
callback invocation with the parsed metadata, the early-return when no
file is in the change event, and the suggestedTitle fallback path.
Eleven tests across two files. Both follow the UploadZone template (props,
File API synthetic input, vi.fn() callback spies).
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers the abbreviated/full name branches, the firstName-null fallback
path, link href derivation from person id, initials rendering, and the
deterministic avatar palette colour. Six tests, six branches hit.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds DocumentStatusChip.svelte.test.ts asserting one branch per
DocumentStatus value (PLACEHOLDER, UPLOADED, TRANSCRIBED, REVIEWED,
ARCHIVED) plus the title/aria-label exposure. Each test queries the
element via getByTitle so the component's accessibility surface is
verified at the same time as its branch logic.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
UploadZone is the canonical browser-test template referenced from issue #496
implementation guidance. Adding afterEach(cleanup) makes it match the
TranscriptionPanelHeader pattern and prevents cross-test DOM leakage as more
tests are added in this branch.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per Felix's review on issue #496, tests should query observable behaviour via
ARIA roles, not test-only data-testid attributes. Replaces every
'document.querySelector([data-testid=...])' with 'page.getByRole(...)'.
The disabled-button click test uses force: true so Playwright bypasses its
enabled-check — the behaviour under test is precisely that the click is
ignored.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes scaffolding pages from initial Paraglide setup that were never
navigated to in production. Shrinks the measured coverage surface and
removes dead code from the production bundle. CLAUDE.md route tables
updated to drop the demo/ entry.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sara flagged that a future "compose cleanup" PR could silently drop the
backend volumes block and CI would happily pass while mass import on
staging silently broke. Adds a pre-build step that renders the staging
compose config and fails the deploy if `target: /import` or
`read_only: true` is missing.
Local verification of the guard:
- Volumes block removed → `grep -q 'target: /import'` exits 1 → step fails
- Volumes block present → both greps match → step passes
Addresses Sara's review on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the staging change. The host directory does not yet exist on
the production server — first production release that consumes this
will create an empty bind source via Docker's auto-create behaviour;
mass import then reports "no spreadsheet found" until an operator
pre-stages a payload there.
Addresses Tobias's review on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The compose file now requires IMPORT_HOST_DIR or refuses to start
(#526). Without this line the next nightly deploy would fail with a
clear interpolation error, but it should not fail — the staging
import payload already lives at this host path (rsync'd in #526).
Addresses Tobias's review on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DEPLOYMENT.md line 81 declares any compose env var missing from §2 a
blocking review comment. IMPORT_HOST_DIR (added on this branch) was
unmentioned. Adds the row and rewrites §6.4 so the staging/prod operator
workflow (rsync host → set env → trigger import) is in the runbook,
not just buried in compose comments.
Addresses review feedback from Markus and Tobias on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tobias and Markus both flagged that a shared default (/srv/familienarchiv/
import) invites silent collision when staging and prod cohabit one host.
Switch to ${IMPORT_HOST_DIR:?...} so compose refuses to start without an
explicit per-env path — collision becomes structurally impossible.
The error message points operators at docs/DEPLOYMENT.md so the recovery
step is one click away. IMPORT_HOST_DIR moves from "Optional" to the
main required-env-vars block in the header.
Addresses review feedback from Markus, Tobias, and Nora on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The hardcoded `static final String IMPORT_DIR = "/import"` was the only
non-`@Value` configurable input in MassImportService — every column
index next to it is wired through `app.import.col.*`. Lifts the
contract from infrastructure (compose bind mount) into application
config (`app.import.dir`), with `/import` as the default so the existing
bind-mount path keeps working.
Addresses review feedback from Markus and Felix on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`MassImportService` reads the ODS spreadsheet and referenced PDFs from a
hardcoded `/import` path inside the backend container. Dev compose
already bind-mounts `./import:/import`, but the prod compose had no
equivalent, so `POST /api/admin/import` would always fail on staging/prod
with "no spreadsheet found".
Mount strategy:
- Source path is env-driven (`IMPORT_HOST_DIR`), defaulting to
`/srv/familienarchiv/import` so the host path is stable across CI
deploys (the compose working dir is recreated each run, so `./import`
would not persist).
- Read-only — `MassImportService` only reads (`Files.list` /
`Files.walk`), never writes. Read-only mount makes that contract
explicit and prevents the backend container from mutating the source
PDFs.
- Empty / missing path is harmless: the import API just returns the
existing "no spreadsheet found" error rather than crashing the
container.
To use on staging: rsync the import folder to
`/srv/familienarchiv-staging/import/` on the host, set
`IMPORT_HOST_DIR=/srv/familienarchiv-staging/import` in `.env.staging`,
redeploy, trigger import from `/admin/system`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The new alpine-based frontend production image (`node:20.19.0-alpine3.21`)
resolves `localhost` only to `::1` in /etc/hosts. SvelteKit's adapter-node
binds to 0.0.0.0 (IPv4 only), so `wget http://localhost:3000/login` from
inside the container connects to ::1 and gets "Connection refused" every
15s. Container goes unhealthy → `docker compose up --wait` fails → nightly
staging deploy fails. The app itself is fine.
Switching to 127.0.0.1 bypasses /etc/hosts and matches what Node actually
listens on.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- frontend/login: derive cookie `secure` flag from request URL protocol.
Pre-PR the cookie was only read by SSR so the flag didn't matter; now
the cookie IS the API credential and must be Secure on HTTPS or it
leaks a 24h Basic token on plaintext networks. Dev runs over HTTP and
would silently lose the cookie if we hardcoded `secure: true`, so the
flag follows `event.url.protocol === 'https:'`.
- SecurityConfig: rewrite the CSRF-disabled comment. The old
"browsers block cross-origin custom headers" justification no longer
holds once /api/* is authenticated via the cookie. Make the
load-bearing dependencies explicit: SameSite=strict on the auth_token
cookie + Spring's default CORS rejection.
- AuthTokenCookieFilter:
- Scope to /api/* only. /actuator/health and similar must not be
cookie-authenticated.
- Refuse malformed percent-encoding (URLDecoder throws); forward the
request without a promoted Authorization rather than crash.
- Use isBlank() instead of isEmpty() per Nora.
- Javadoc warning: getHeaderNames/getHeaders exposes the Basic
credential; any future header-iterating logger must scrub
Authorization before logging.
- Tests: add `passes_through_unchanged_when_request_is_outside_api_scope`
(/actuator/health with cookie should NOT be wrapped) and
`passes_through_unchanged_when_cookie_value_is_malformed_percent_encoding`.
Tighten the explicit-header test to verify same-instance forwarding
rather than just header equality.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#520.
The login action stores `Basic <base64>` in an HttpOnly `auth_token`
cookie. SSR fetches from hooks.server.ts explicitly set the
Authorization header. Vite's dev proxy does the same on every
/api/* request. Caddy in production does NOT. So browser-side
fetch() and EventSource() calls reach the backend without auth,
get 401 + WWW-Authenticate: Basic, and the browser pops a native
auth dialog over the SPA.
Add AuthTokenCookieFilter (Ordered.HIGHEST_PRECEDENCE, before any
Spring Security filter) that promotes the cookie to a request
header when no explicit Authorization is present. URL-decodes the
cookie value because SvelteKit URL-encodes spaces ("Basic " ->
"Basic%20") when serializing the cookie. Works the same for REST,
SSE (/api/notifications/stream, /api/ocr/jobs/.../progress), and
any other browser-direct backend call.
5 tests in AuthTokenCookieFilterTest cover: URL-decoded promotion,
explicit-Authorization-wins precedence, no-cookies pass-through,
absent-auth-token pass-through, empty-value pass-through.
Also: add `@ActiveProfiles("test")` to ThumbnailServiceIntegrationTest,
the one remaining @SpringBootTest in the suite that wasn't annotated.
After #516 made UserDataInitializer fail-closed outside dev/test/e2e,
this test's context load was throwing. Restores green main.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#518.
UserDataInitializer.initAdminUser was doing groupRepository.save(adminGroup)
unconditionally. If a previous boot had seeded the group but failed
before creating the admin user (or if the operator deleted just the
admin row to retry with a corrected APP_ADMIN_USERNAME), the next
seed attempt violated user_groups_name_key and aborted the context.
Switch to the same findByName(...).orElseGet(...) pattern initE2EData
already uses for the "Leser" group.
Tests in AdminSeedFailClosedTest:
- reuses_existing_Administrators_group_when_seeding_a_new_admin
- creates_Administrators_group_when_seeding_admin_on_a_fresh_database
Plus updated existing tests to stub groupRepository.save now that the
seed path also exercises it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#512.
The previous `(block_actuator)` snippet emitted `respond @actuator 404`
at the top level of each archive vhost. But each vhost also has a
catch-all `handle { reverse_proxy ... }` that matches /actuator/*
too. Caddy's `handle` blocks are mutually exclusive — once one matches,
the request never reaches a top-level `respond`. So /actuator/health
was being proxied to the backend, which 302s to /login.
Wrap the actuator response in its own `handle /actuator/*` block.
Caddy sorts `handle` blocks by path specificity, so /actuator/* wins
over the catch-all and the 404 is actually returned.
Verified with `caddy validate` against the caddy:2 image.
Also unblocks the nightly.yml smoke test's `/actuator/health → 404`
assertion, which has been failing since the first staging deploy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses Nora's review concern on #513/#516.
The previous fix only made env-vars take effect — it did NOT close the
fail-open default path. If an operator forgets APP_ADMIN_USERNAME /
APP_ADMIN_PASSWORD on first prod boot, the seeded admin is the
well-known `admin@familienarchiv.local` / `admin123` and is permanently
locked (UserDataInitializer only seeds when the row is missing).
Refuse to seed outside dev/test/e2e profiles when either credential
matches the documented default. The startup fails fast with a clear
message pointing at the env-var names and the permanence trap.
Also adds Markus/Felix/Sara's "pin the Java side" coverage: a
reflection test on the @Value placeholder catches a future rename
of `${app.admin.email:...}` back to `${app.admin.username:...}`,
which would otherwise pass the yaml-side test but silently break
the binding.
Tests:
- AdminSeedFailClosedTest pins fail-closed for non-local profiles
and verifies the dev/test/e2e bypass.
- AdminSeedPropertyKeyTest now also asserts the @Value placeholder
string on UserDataInitializer.adminEmail/adminPassword.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#513.
UserDataInitializer reads `@Value("${app.admin.email:...}")` but
application.yaml mapped APP_ADMIN_USERNAME to `app.admin.username`.
The keys never connected — env vars APP_ADMIN_USERNAME and
APP_ADMIN_PASSWORD were silently ignored and the admin user got
seeded with the hardcoded defaults admin@familyarchive.local /
admin123.
For production this is HIGH severity: DEPLOYMENT.md §3.5 documents
the admin password as permanently locked on first deploy. The
bug locked the lock-in to dev defaults, not to whatever an operator
set in PROD_APP_ADMIN_PASSWORD.
Rename yaml key from `username:` to `email:` so the Spring property
`app.admin.email` actually exists. Keep env-var name
APP_ADMIN_USERNAME (matches the already-set Gitea secrets and
DEPLOYMENT.md §3.3). Default value updated to an email-shape.
Added AdminSeedPropertyKeyTest (Binder pattern, no Spring context):
verifies both `app.admin.email` and `app.admin.password` resolve
from the yaml. Confirmed red without the fix, green with it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 17:12:36 +02:00
270 changed files with 40889 additions and 1105 deletions
@@ -159,7 +159,7 @@ Input DTOs live flat in the domain package. Response types are the model entitie
→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) mirror in `frontend/src/lib/shared/errors.ts`, (3) add i18n keys in `messages/{de,en,es}.json`.
**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`.
@@ -24,4 +24,7 @@ public interface InviteTokenRepository extends JpaRepository<InviteToken, UUID>
@Query("SELECT t FROM InviteToken t ORDER BY t.createdAt DESC")
List<InviteToken>findAllOrderedByCreatedAt();
@Query("SELECT CASE WHEN COUNT(t) > 0 THEN true ELSE false END FROM InviteToken t JOIN t.groupIds g WHERE g = :groupId AND t.revoked = false AND (t.expiresAt IS NULL OR t.expiresAt > CURRENT_TIMESTAMP) AND (t.maxUses IS NULL OR t.useCount < t.maxUses)")
# :ro restricts file-system access but NOT Docker API permissions — a compromised Promtail has full daemon access. Accepted risk on single-operator self-hosted archive.
@@ -63,7 +63,7 @@ Members of the cross-cutting layer have no entity of their own, no user-facing C
| `audit` | Append-only event store (`audit_log`) for all domain mutations. Feeds the activity feed and Family Pulse dashboard. | Consumed by 5+ domains; no user-facing CRUD of its own |
| `config` | Infrastructure bean definitions: `MinioConfig`, `AsyncConfig`, `WebConfig` | Framework infra; no business logic |
| `dashboard` | Stats aggregation for the admin dashboard and Family Pulse widget | Aggregates from 3+ domains; no owned entities |
| `exception` | `DomainException`, `ErrorCode` enum, `GlobalExceptionHandler` | Framework infra; consumed by every controller and service |
| `exception` | `DomainException`, `ErrorCode` enum, `GlobalExceptionHandler` | Framework infra; consumed by every controller and service. Adding a new `ErrorCode` requires matching updates in `frontend/src/lib/shared/errors.ts` and all three `messages/*.json` locale files. |
| `filestorage` | `FileService` — MinIO/S3 upload, download, presigned-URL generation | Generic service; consumed by `document` and `ocr` |
- SSE notifications transit Caddy (browser → Caddy → backend); the backend is never reachable directly from the public internet. The SvelteKit SSR layer is bypassed for SSE, but Caddy is not.
- The Caddyfile responds `404` on `/actuator/*` (defense in depth). Internal monitoring scrapes the backend on the docker network, not through Caddy.
- Production and staging cohabit on the same host via docker compose project names: `archiv-production` (ports 8080/3000) and `archiv-staging` (ports 8081/3001).
- An optional observability stack (Prometheus, Node Exporter, cAdvisor, Loki, Tempo, Grafana, GlitchTip) runs as a separate compose file. Configuration lives under `infra/observability/`. In production and CI, the stack is managed from `/opt/familienarchiv/` (CI copies it there on every nightly run) so bind mounts survive workspace wipes — see §4 for the ops procedure.
### OCR memory requirements
@@ -97,6 +98,7 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `APP_OCR_BASE_URL` | Internal URL of the OCR service | — | YES | — |
| `APP_OCR_TRAINING_TOKEN` | Secret token for OCR training endpoints | — | YES (prod) | YES |
| `IMPORT_HOST_DIR` | Absolute host path holding the ODS spreadsheet + PDFs for the `/admin/system` mass-import card. Mounted read-only at `/import` inside the backend (compose-only — backend reads via `app.import.dir`). Compose refuses to start when unset, so staging and prod cannot accidentally share the source. Convention: `/srv/familienarchiv-staging/import` and `/srv/familienarchiv-production/import` | — | YES (prod compose) | — |
| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on CX32 hosts; leave unset on CX42+ to use the 12g default | `12g` (prod compose default) | — | — |
| `PORT_PROMETHEUS` | Host port for the Prometheus UI (bound to `127.0.0.1` only) | `9090` | — | — |
| `PORT_GRAFANA` | Host port for the Grafana UI (bound to `127.0.0.1` only) | `3003` | — | — |
| `POSTGRES_HOST` | PostgreSQL hostname for GlitchTip's db-init job and workers. Override when only the staging stack is running and `archive-db` is not resolvable by that name. | `archive-db` | — | — |
Phase 7 of the Production v1 milestone adds Prometheus + Loki + Grafana. No monitoring infrastructure is in place yet.
An observability stack is available via `docker-compose.observability.yml`. Configuration lives under `infra/observability/`.
#### Dev — start from the workspace
```bash
docker compose up -d # creates archiv-net
docker compose -f docker-compose.observability.yml up -d
```
#### Why the obs stack is managed differently from the main app stack
The main app stack (`docker-compose.prod.yml`) has no config-file bind mounts — its containers read config from env vars and image defaults. The workspace is wiped after each CI run but that does not affect running containers, because they hold no references to workspace paths.
The obs stack is different: `prometheus.yml`, `tempo.yml`, Loki config, Grafana provisioning files, and Promtail config are all bind-mounted from the host filesystem into their containers. If those source paths disappear (workspace wipe), the containers can restart fine until a `docker compose up` is run again — at that point Docker tries to re-resolve the bind-mount source and fails because the workspace path no longer exists.
The fix is to keep the obs compose file and config tree at a **permanent path** that CI copies to on every run but which survives between runs: `/opt/familienarchiv/` (see ADR-016).
#### Production — managed from `/opt/familienarchiv/`
Every CI run (nightly + release) copies `docker-compose.observability.yml` and `infra/observability/` to `/opt/familienarchiv/` before starting the stack. Bind mounts then resolve to `/opt/familienarchiv/infra/observability/…` — a stable path that outlasts any workspace wipe.
**Environment variables** follow the same two-source model as the main stack:
| Source | What it contains | Managed by |
|---|---|---|
| `infra/observability/obs.env` | All non-secret config (ports, URLs, hostnames) | Git — reviewed in PRs |
| `/opt/familienarchiv/obs-secrets.env` | Passwords and secret keys only | CI — written fresh from Gitea secrets on every deploy |
Both files are passed explicitly via `--env-file` to the compose command, so there is no implicit auto-read `.env` and no operator-managed file to keep in sync.
> **Note (manual ops only):** CI clears the destination with `rm -rf` before copying, so deleted files are removed automatically on the next run. If you copy manually with `cp -r` without first removing the directory, stale files from deleted configs will persist until cleaned up:
| `obs-loki` | `grafana/loki:3.4.2` | Log aggregation — receives log streams from Promtail. Port 3100 is `expose`-only (not host-bound). |
| `obs-promtail` | `grafana/promtail:3.4.2` | Log shipping agent — reads all Docker container logs via the Docker socket and forwards them to Loki with `container_name`, `compose_service`, `compose_project`, and `job` labels. The `job` label is mapped from the Docker Compose service name (`com.docker.compose.service`) so that Grafana Loki dashboard queries (`{job="backend"}`, `{job="frontend"}`) work out of the box and the "App" variable dropdown is populated. |
| `obs-tempo` | `grafana/tempo:2.7.2` | Distributed trace storage — OTLP gRPC receiver on port 4317, OTLP HTTP on port 4318 (both `archiv-net`-internal). Grafana queries traces on port 3200 (`obs-net`-internal). All ports are `expose`-only (not host-bound). |
| `obs-grafana` | `grafana/grafana-oss:11.6.1` | Unified observability UI — metrics dashboards, log exploration, trace viewer. Bound to `127.0.0.1:${PORT_GRAFANA:-3001}` on the host. |
| `obs-glitchtip` | `glitchtip/glitchtip:v4` | Sentry-compatible error tracker. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces. Bound to `127.0.0.1:${PORT_GLITCHTIP:-3002}`. |
| `obs-redis` | `redis:7-alpine` | Celery task broker for GlitchTip. Internal to `obs-net`; no host port exposed. |
| `obs-glitchtip-db-init` | `postgres:16-alpine` | One-shot init container. Creates the `glitchtip` database on the existing `archive-db` PostgreSQL instance if it does not already exist. Runs at stack startup; exits cleanly once done. |
**Prefer `compose_service` over `container_name` in LogQL queries** — `container_name` differs between dev (`archive-backend`) and prod (`archiv-production-backend-1`), while `compose_service` is stable (`backend`, `db`, `minio`, etc.).
Prometheus port `9090` and Grafana port `3003` (default; configurable via `PORT_GRAFANA`) are bound to `127.0.0.1` on the host. No other observability ports are host-bound.
GLITCHTIP_DOMAIN=http://localhost:3002 # change to your public URL in prod
PORT_GLITCHTIP=3002# optional, defaults to 3002
```
**Database:** GlitchTip shares the existing `archive-db` PostgreSQL instance. The `obs-glitchtip-db-init` one-shot container creates a dedicated `glitchtip` database on first stack start — no manual step required.
**First-run steps** (one-time, after `docker compose -f docker-compose.observability.yml up -d`):
2. Make sure `IMPORT_HOST_DIR=<host-path>` is set in `.env.staging` / `.env.production` (the nightly/release workflows already write this — see §3). Compose refuses to start without it.
3. Redeploy the stack so the bind mount picks up — or, if the mount is already in place, skip to step 4.
4. Call `POST /api/admin/trigger-import` (requires `ADMIN` permission), or click the "Import starten" button on `/admin/system`.
5. The import runs asynchronously — poll `GET /api/admin/import-status`, watch `/admin/system`, or tail the backend logs.
**Issues:** [#535 — original incident](https://git.raddatz.cloud/marcel/familienarchiv/issues/535) · [#553 — revision](https://git.raddatz.cloud/marcel/familienarchiv/issues/553)
---
## Context
Vitest browser-mode tests (the `client` project, run with `@vitest/browser-playwright` / Chromium) use a different module resolution path than Node-environment tests. When a spec calls `vi.mock('some-module', factory)`, vitest registers a `ManualMockedModule`. At runtime, every time Chromium requests that module, a playwright route handler intercepts the request and calls the Node worker over **birpc** (`resolveManualMock`) to evaluate the factory and return the module body.
This is safe for modules that are imported **statically** at spec module-eval time (e.g. `$app/navigation`, `$env/static/public`): those requests resolve before the first test runs and well before any teardown occurs.
It is **unsafe** for modules that are imported **dynamically** (e.g. inside an `async onMount`, inside a lazy-loaded chunk): Chromium may fetch the module after the worker's birpc channel has already closed, producing:
```
Error: [birpc] rpc is closed, cannot call "resolveManualMock"
This raises an unhandled rejection that exits the vitest process with code 1, even though every test in the run reported green.
`pdfjs-dist` and `pdfjs-dist/build/pdf.worker.min.mjs?url` are loaded via `await Promise.all([import('pdfjs-dist'), import('pdfjs-dist/build/pdf.worker.min.mjs?url')])` inside `usePdfRenderer.svelte.ts::init()`, which is called from `onMount`. These dynamic imports triggered the race.
---
## Decision
**Prefer prop injection over `vi.mock(module, factory)` for any module that is loaded dynamically in browser-mode specs.**
### The libLoader pattern (for external rendering libraries)
When a component depends on a large external library loaded via dynamic import, extract the import into an injectable loader function with a production default:
### The test-host pattern (for component behaviour)
For components that fetch data or call services, the `*.test-host.svelte` pattern threads the dependency as a prop rather than mocking the module. See `PersonMentionEditor.test-host.svelte` for the canonical example.
---
## Binding invariant: factory bodies must be synchronous (#553)
The original revision of this ADR allowed `vi.mock(virtualModule, factory)` for SvelteKit/Vite virtual modules on the argument that their consumer imports were resolved at static-import time. **That reasoning is wrong.** What matters is what the **factory body** does, not where the mocked module is consumed.
`EnrichmentBlock.svelte.spec.ts` (issue #553) was statically imported and still produced the race: its `vi.mock('$app/stores', async () => { const mod = await import(...); return mod; })` factory performed a dynamic import in its body, and that body was invoked asynchronously when Chromium fetched the manually-mocked module — sometimes after the worker's birpc channel had already closed.
**Therefore: under `**/*.svelte.{test,spec}.ts`, every `vi.mock` factory body must be synchronous. No `await`, no `import(...)`.**
If a factory needs to share state with the spec (a mutable ref, a `vi.fn`, a writable store), use `vi.hoisted()` to lift the reference above `vi.mock`'s implicit hoist:
```ts
const { mockNavigating } = vi.hoisted(() => ({
mockNavigating: { type: null as string | null }
}));
vi.mock('$app/state', () => ({
get navigating() {
return mockNavigating;
}
}));
```
The getter defers the read until consumption time; `vi.hoisted` guarantees the reference is initialised before the (also hoisted) `vi.mock` factory runs. See `DropZone.svelte.spec.ts:9`, `NotificationBell.svelte.spec.ts:6-10`, and `EnrichmentBlock.svelte.spec.ts` for canonical examples.
### Architectural follow-on: prefer `$app/state` over `$app/stores`
`$app/stores` is the deprecated subscription-based store API; `$app/state` is the modern reactive proxy. New components should import from `$app/state`. As part of #553 we migrated `EnrichmentBlock.svelte` from `$app/stores.navigating` to `$app/state.navigating` with `!!navigating.type` — matching the pattern already established in `routes/aktivitaeten/+page.svelte:117` and `routes/documents/+page.svelte:261`. Migration eliminated the *need* to mock a store at all in that spec.
**Pattern note:** When an overlay or dropdown triggers a navigation action, use `<button type="button">` with an `onclick` handler that calls `goto(path)` — do **not** use `<a href="…">` with `e.preventDefault()`. SvelteKit registers its link interceptor as a capture-phase `document` listener, so it fires before the component's bubble-phase `onclick`. By the time `e.preventDefault()` runs the router has already initiated navigation, which tears down the vitest-browser Playwright orchestrator iframe. A `<button>` carries no `href`, so the capture-phase interceptor never fires. See `NotificationDropdown.svelte` for the canonical example.
**Pattern note (#553):** Browser-mode tests run with `data-sveltekit-preload-data="off"` (set in `src/test-setup.ts` via the client project's `setupFiles`). Hover-prefetch otherwise fires real fetch requests for route loader chunks; those requests go through the same Playwright route handler that serves mocked modules. An in-flight prefetch landing after iframe teardown can hit the handler with a closed birpc channel, raising an unhandled rejection.
---
## Binding invariant: one canonical ID per mocked module (#553 — duplicate-id hazard)
The sync-factory invariant above closes one named trigger of the `[birpc] rpc is closed` race. Investigation of a follow-up flake revealed a second, independent trigger: **the same resolved module URL mocked under two distinct ID strings** across or within spec files.
`@vitest/browser-playwright` registers a Playwright `page.context().route(...)` handler per `vi.mock` call. The predicate matches on the module's resolved URL. When two `vi.mock` calls reference the same module under different IDs — for example `'$lib/foo.svelte'` and `'$lib/foo.svelte.js'` (both resolve to the same Svelte rune-module URL) — the registry stores both predicates but the cleanup map only tracks the latest. The orphan route survives session teardown. When the next session loads the same module, the orphan fires, calls `await module.resolve()` against a closed birpc channel, and crashes the run.
This is fixed upstream in [vitest PR #10267](https://github.com/vitest-dev/vitest/pull/10267) (issue [#9957](https://github.com/vitest-dev/vitest/issues/9957)). Until that fix reaches a published `@vitest/browser-playwright` release, we close the gap from two sides:
**The rule.** Every mocked module must be referenced under exactly one ID string across the entire client test suite. Pick the spelling production code uses. For Svelte 5 rune modules (`*.svelte.ts`), the canonical form is the no-extension import (`'$lib/foo.svelte'`) — matches the source file basename and matches Svelte 5 convention. Never mix `.svelte.js` and `.svelte` for the same module across specs.
**Enforcement layers** (added in #553's second cycle, extending the four-layer chain above):
5. **In-suite meta-test** at `frontend/src/__meta__/no-duplicate-mock-ids.test.ts` globs `src/**/*.svelte.{test,spec}.ts`, extracts every `vi.mock` first-arg string, canonicalises by stripping a trailing `.js`/`.ts` after `.svelte`, and fails if any canonical ID is referenced under two or more distinct spellings. Same shape as `no-async-mock-factories.test.ts`.
6. **`patch-package` backport** of PR #10267 at `frontend/patches/@vitest+browser-playwright+4.1.0.patch`. Applied automatically by the `postinstall` hook. Closes the race at the route-handler level — even if a contributor reintroduces a duplicate-ID, the patched `register` handler unroutes the existing predicate before installing the new one.
**When to remove the patch.** Once `@vitest/browser-playwright` ships a release containing PR #10267, delete `patches/@vitest+browser-playwright+4.1.0.patch`. Bump the dependency to the version containing the fix. The in-suite meta-test stays — it's a cheap permanent guard against the contributor-facing pattern, independent of upstream library version.
---
## Consequences
- New browser-mode specs that need to stub an external library **must not** use `vi.mock(externalLib, factory)`. Add a loader/factory parameter to the underlying hook or service instead.
- The CI `unit-tests` job includes a permanent grep guard that fails the build if `rpc is closed` appears in any coverage run log. This catches regressions before they reach the acceptance criterion.
- Acceptance criterion for #535: 60 consecutive green `workflow_dispatch` CI runs against `main` after the fix is merged, with zero `rpc is closed` lines in any log.
- **Enforcement (six layers, defence in depth):**
1. **ESLint `no-restricted-syntax`** in `eslint.config.js` (scoped to `**/*.{spec,test}.ts`) flags two patterns: (a) the literal `vi.mock('pdfjs-dist', ...)` — enforces the libLoader pattern — and (b) any `vi.mock(..., async () => { ... await import(...) ... })` — enforces the synchronous-factory invariant. Both messages point at this ADR. Failure surfaces at save time.
2. **CI grep guard** in `.gitea/workflows/ci.yml` runs before the test suite launches. Mirrors the ESLint patterns with `grep -Pzn`. ~10s round-trip.
3. **In-suite meta-test** at `frontend/src/__meta__/no-async-mock-factories.test.ts` globs `src/**/*.svelte.{test,spec}.ts` and asserts none match the banned pattern. Catches at every vitest invocation — the layer hardest to disable.
4. **CI birpc assert** runs after the coverage step and fails the build if `[birpc] rpc is closed` appears in any log line. Catches the symptom even if all the upstream layers were bypassed.
5. **In-suite duplicate-ID meta-test** at `frontend/src/__meta__/no-duplicate-mock-ids.test.ts` enforces the one-canonical-ID-per-module rule from the duplicate-id-hazard section above.
6. **`patch-package` backport** at `frontend/patches/@vitest+browser-playwright+4.1.0.patch` closes the upstream race itself, applied via `postinstall`. To be removed when `@vitest/browser-playwright` releases [vitest PR #10267](https://github.com/vitest-dev/vitest/pull/10267).
- **Acceptance verification:** `coverage-flake-probe.yml` is a `workflow_dispatch`-triggered matrix workflow that runs the coverage suite 20× in parallel against a single SHA and asserts zero birpc lines. One fire, parallel cost, deterministic signal — replaces accumulating 20 sequential push events.
- **When to revisit the LibLoader home:** If three or more components adopt this pattern, consider extracting a shared `$lib/types/lib-loader.ts` or a generic `DynamicImportLoader<T>` type to avoid parallel type definitions across modules.
# ADR-012: nsenter via privileged sibling container for host service management in CI
## Status
Accepted
## Context
The deploy workflows (`.gitea/workflows/nightly.yml`, `release.yml`) run job steps inside Docker containers under a Docker-out-of-Docker (DooD) setup: the Gitea runner container mounts the host Docker socket, and act_runner spawns a sibling container for each job. That job container also gets the Docker socket mounted (via `valid_volumes` in `runner-config.yaml`).
This architecture has one significant limitation: **job containers cannot manage host services**. Specifically:
- Job containers are not in the host's PID, mount, UTS, network, or IPC namespaces.
- There is no systemd PID 1 inside a job container — `systemctl` has nothing to talk to.
-`sudo` is not present in standard container images; even if it were, it would not help.
- Caddy runs as a **host systemd service** (not a Docker container), managing TLS certificates via Let's Encrypt. It must be running on the host to serve port 443.
The deploy workflows need to tell Caddy to reload its config after each deploy so that committed Caddyfile changes are applied before the smoke test validates the public surface. Without a reload step, Caddy silently serves the previous config and the smoke test may pass against stale configuration.
## Decision
Use the host Docker socket (already mounted in every job container via `runner-config.yaml`) to spin up a **privileged sibling container** in the host PID namespace, then use `nsenter` to enter all host namespaces and call `systemctl reload caddy`:
`nsenter -t 1 -m -u -n -p -i` enters the init process's mount, UTS, IPC, network, PID, and cgroup namespaces, giving `systemctl` a view of the real host systemd daemon.
**Alpine is used** instead of Ubuntu: ~5 MB vs ~70 MB pull size, no unnecessary tooling. `util-linux` (which ships `nsenter`) is installed at run time; apk add takes ~1 s on the warm VPS cache. The image digest is pinned so any upstream change requires an explicit Renovate bump PR.
**`reload` not `restart`**: reload sends SIGHUP so Caddy re-reads its config in-process without dropping TLS connections or in-flight requests.
**No sudoers entry is required**: the Docker socket already grants root-equivalent host access. This pattern makes existing implicit privileges explicit rather than introducing new ones.
This decision applies the same pattern to both `nightly.yml` and `release.yml` since both deploy the app stack and must apply Caddyfile changes before smoke-testing the public surface.
## Alternatives Considered
| Alternative | Why rejected |
|---|---|
| `sudo systemctl reload caddy` in the job container | No systemd PID 1 inside the container — `systemctl` has nothing to connect to. `sudo` is not present in container images and would not help even if it were. |
| Caddy admin API (`curl localhost:2019/load`) | Job containers do not share the host network namespace; `localhost:2019` on the host is unreachable. Exposing `:2019` on a host-bound port would add a network attack surface with no benefit over the current approach. |
| SSH from the job container to the VPS host | Requires storing an SSH private key as a CI secret, managing authorized_keys on the host, and opening an inbound SSH path from the container. Adds key management overhead for a pattern that the Docker socket already enables more directly. |
| Running Caddy as a Docker container (instead of host service) | Caddy manages TLS certificates via Let's Encrypt; running it in Docker complicates certificate persistence and renewal. As a host service, cert storage is straightforward and restarts do not risk rate-limit issues. This would be a larger infrastructure change unrelated to the CI gap. |
## Consequences
- The runner host's Docker socket access is now a capability relied upon for host service management, not just for running `docker compose` commands. This is stated explicitly in the YAML comment so future reviewers understand the trust boundary.
- The Caddyfile symlink on the VPS (`/etc/caddy/Caddyfile → /opt/familienarchiv/infra/caddy/Caddyfile`) is a required contract for CI to succeed. It is documented in `docs/DEPLOYMENT.md §3.1` and `docs/infrastructure/ci-gitea.md`. If the symlink is absent or mis-pointed, `systemctl reload caddy` succeeds but Caddy serves stale config.
- Renovate will create bump PRs when a new Alpine 3.21 digest is published. Because the container runs `--privileged --pid=host`, these bump PRs must be reviewed manually and must not be auto-merged. A `packageRule` in `renovate.json` enforces this.
- The step is duplicated between `nightly.yml` and `release.yml` (tracked in issue #539 for extraction into a composite action).
- If Caddy is not running when the step executes, `systemctl reload` exits non-zero and the workflow aborts before the smoke test — preventing a misleading "port 443 refused" curl error.
## References
-`docs/infrastructure/ci-gitea.md` §"Running host-level commands from CI (nsenter pattern)" — full operational context, troubleshooting guide
The browser-mode component test suite (`vitest.client-coverage.config.ts`) enforces Istanbul coverage thresholds across `lines`, `functions`, `branches`, and `statements`. The `branches` metric was set to 80%, but the codebase sits at **75%** — below the gate — causing every CI run of `unit-tests` and `coverage-flake-probe` to fail on this check alone, even when all tests are green.
**Measured baseline (2026-05-14, branch `feat/issue-553-birpc-async-mock-factory`, head `2e6cc346`):**
```
branches: 75% (below the 80% gate — reason for this ADR)
lines: ≥ 80%
functions: ≥ 80%
statements: ≥ 80%
```
Reproducer:
```bash
cd frontend && npm ci && npx vitest run -c vitest.client-coverage.config.ts --coverage
```
### The long-tail-grind problem
In Istanbul's branch accounting, when a child component gains test coverage its branches are added to the parent's denominator. A child moving from 40% → 80% coverage can drag a parent from 78% → 72% because more branches in the call graph become reachable and must be covered. This is not a bug — it is how branch accounting works — but it means that on a large SvelteKit application the denominator grows with every coverage improvement, making an arbitrary 80% ceiling a constant grind. Per #496, the expected cost to reach 80% branches from 75% is 30–100+ commits with no guarantee of stability.
### Why this layer is different
The 80% branch floor used for backend unit/integration tests is appropriate for Java service code and permission logic. Browser-mode component coverage measures Svelte template branches: conditional class bindings, `{#if}` blocks, empty/loaded/error state guards. These branches have a fundamentally different accounting model and a higher inherent denominator. This ADR **only** lowers the browser-mode component gate; the backend test coverage gates are unaffected.
### Security-relevant uncovered components
The following auth/permission-boundary components currently have low or zero branch coverage. When ratchet-up work begins (see below), these are the highest-priority targets:
-`src/routes/login/+page.svelte`
-`src/routes/forgot-password/+page.svelte`
-`src/routes/reset-password/+page.svelte`
-`src/routes/register/+page.svelte`
Note: the 75% figure already reflects the absence of coverage on these files. Lowering the gate does not create this gap — it makes the existing state legible.
---
## Decision
Drop the `branches` threshold from `80` → `75` in `frontend/vitest.client-coverage.config.ts`. Leave `lines`, `functions`, and `statements` at `80`.
The 75% figure matches the measured current state, allowing CI to pass while deliberate coverage improvement work (tracked in #496) continues without blocking other PRs. The asymmetry in the thresholds block is intentional and documented with an inline comment pointing here.
---
## Ratchet Rule
The branches threshold ratchets **up by 3 percentage points** when the rolling 3-PR-average client-project branches figure on `main` stays at or above `threshold + 3pp` for ≥ 30 consecutive days. Direction is **up-only** — never lower the floor below 75 without a new ADR superseding this one. Manual today (verify before any `vitest.client-coverage.config.ts` edit); a future automation issue may codify the check.
Concretely:
- When `main` sustains ≥ 78% branches across 3 consecutive PRs for 30 days → raise gate to 78%
- When `main` sustains ≥ 81% branches across 3 consecutive PRs for 30 days → raise gate back to 80%
---
## Non-goals
- **Not** raising actual branch coverage — that is #496's job, tracked separately.
- **Not** touching the server-project coverage configuration (`vitest.config.ts`) — only the client project hits the long-tail-grind pattern.
- **Not** removing or relaxing any existing test files, `skipIf` guards, or axe-playwright accessibility runs.
---
## Consequences
**Easier:**
- CI unblocked — `unit-tests` and `coverage-flake-probe` jobs pass when all tests are green
- The ratchet rule creates a concrete, observable path back to 80%
**Harder:**
- The gate now has near-zero headroom — any branch regression that drops below 75% will fail CI immediately
- The 75% floor must not be treated as a permanent ceiling; the ratchet discipline requires active attention
# ADR-015: DooD workspace bind mount for Compose file bind-mount resolution
## Status
Accepted
## Context
The deploy workflows (`.gitea/workflows/nightly.yml`, `release.yml`) run job steps inside Docker containers via Docker-out-of-Docker (DooD): the Gitea runner mounts the host Docker socket, and act_runner spawns sibling containers for each job.
When a job step calls `docker compose -f docker-compose.observability.yml up`, Docker Compose resolves relative bind-mount sources against `$(pwd)` inside the job container and passes the resulting absolute paths to the **host** daemon. For example, `./infra/observability/prometheus/prometheus.yml` becomes `/some/path/infra/observability/prometheus/prometheus.yml`, and the host daemon tries to bind-mount that path from the **host filesystem**.
In the default DooD setup (`runner-config.yaml` with only `valid_volumes: ["/var/run/docker.sock"]`), job container workspaces live in the act_runner overlay2 layer. The host has no corresponding directory at the job container's `$(pwd)` path, so the daemon auto-creates an empty directory in its place. The container then fails to start because the mount target was expected to be a file, not a directory:
```
error mounting "…/prometheus/prometheus.yml" to rootfs at "/etc/prometheus/prometheus.yml": not a directory
```
This affected all five config file bind mounts in `docker-compose.observability.yml`.
## Decision
Configure act_runner to store job workspaces on a real host path (`/srv/gitea-workspace`) and mount that path into both the runner container and every job container at the **same absolute path**. The identity of the host path and container path is the key constraint: Compose resolves to an absolute path and hands it to the host daemon, which looks for that exact path on the host filesystem.
**Runner compose.yaml change** (host side — not in this repo):
```yaml
runner:
volumes:
- /srv/gitea-workspace:/srv/gitea-workspace
```
With this in place, `$(pwd)` inside a job container resolves to `/srv/gitea-workspace/<owner>/<repo>/`, which is a real directory on the host. Compose-managed bind mounts from that directory work without any additional steps.
## Alternatives Considered
| Alternative | Why rejected |
|---|---|
| **overlay2 `MergedDir` sync via privileged nsenter** (the previous approach, see PR #599 v1) | Required `--privileged --pid=host` (effective root on the host) plus fragile overlay2 driver assumption. Introduced stale-file risk on the host and a second stable path (`/srv/familienarchiv-*/obs-configs`) to maintain separately from the source tree. Replaced by this ADR. |
| **Build configs into a dedicated Docker image** (pattern used for MinIO bootstrap, see `infra/minio/Dockerfile`) | Viable for static files that change infrequently. Requires a build step and an image rebuild every time a config changes. Appropriate for bootstrap scripts; too heavy for frequently-tuned observability configs. |
| **Add workspace directory to runner-config `valid_volumes` only** (without `workdir_parent`) | `valid_volumes` whitelists paths that workflow steps may reference, but does not change where act_runner stores workspaces. Without `workdir_parent`, the workspace would still be in overlay2 and the bind-mount resolution problem would remain. |
| **Map workspace under a different host path than container path** (e.g. host `/srv/workspace`, container `/workspace`) | Compose resolves to the container-internal path (e.g. `/workspace/…`) and passes that to the host daemon. The host daemon interprets the source as a host path. If host `/workspace` does not exist, the daemon creates an empty directory — the original bug. The paths must be identical. |
## Consequences
-`/srv/gitea-workspace` must exist on the VPS before the runner starts. The directory was created as part of this change; it is not created automatically.
- The runner container's `compose.yaml` (maintained outside this repo at `~/docker/gitea/compose.yaml` on the VPS) must include the `- /srv/gitea-workspace:/srv/gitea-workspace` volume line. This is an out-of-band operational dependency; the prerequisite is documented in `runner-config.yaml`.
-`workdir_parent` applies to all jobs on this runner. Any future workflow that calls `docker compose` with relative bind mounts benefits automatically without further configuration.
- Job workspaces persist across runs under `/srv/gitea-workspace`. act_runner manages per-run subdirectory cleanup. Orphaned directories from interrupted runs should be cleaned up manually if disk space becomes a concern.
- Workflows that previously relied on `OBS_CONFIG_DIR` env var or the `obs-configs` stable path on the host no longer need those. Both were removed in this PR.
- This pattern does **not** apply to the `nsenter`-based Caddy reload step (ADR-012), which manages a host systemd service — a different problem class with no bind-mount equivalent.
## References
- ADR-011 — single-tenant runner trust model
- ADR-012 — nsenter via privileged container for host service management
- Issue #598 — original observability stack bind-mount failure
# ADR-016: Observability stack co-location at `/opt/familienarchiv/` with CI-push config sync
## Status
Accepted
## Context
Issue #601 established that the observability stack must survive Gitea CI workspace wipes between nightly runs. When the nightly job completes, act_runner deletes the job workspace. Any Docker container that bind-mounts a config file from a workspace path (`/srv/gitea-workspace/…/infra/observability/prometheus/prometheus.yml`) then references a path that no longer exists on the host. On the next nightly run, Docker Compose either auto-creates an empty directory in its place (causing the container to fail to start because a file mount receives a directory) or finds a stale file from a previous run if the workspace happened to land at the same path.
ADR-015 solved the workspace bind-mount resolution problem: job workspaces are stored at `/srv/gitea-workspace` so `$(pwd)` inside the job container maps to a real host path. But it did not address persistence: the workspace is still wiped after the job, so bind mounts from workspace-relative paths remain fragile across runs.
### Decision drivers
1. Bind-mount sources must point to a host path that persists indefinitely, not to a path that disappears after each CI run.
2. Config files must reflect the committed state of the repo after every nightly run (no manual sync steps).
3. Secrets must not be written to the workspace or to any path managed by CI; they must survive independently of deployments.
4. The solution must not introduce new infrastructure dependencies (no SSH access from CI, no external registry, no additional server-side daemon).
### Alternatives considered
**A: Server-pull model** — a systemd timer or cron job on the server does `git pull` from the repo into `/opt/familienarchiv/` and then runs `docker compose up`. Rejected because: (1) requires git credentials on the server and a registered deploy key, (2) adds a second deployment mechanism that diverges from the CI-push model used for the main app stack, (3) timing coupling — the server pull must complete before CI's health checks run, requiring polling or a webhook.
**B: Separate directory (e.g. `/opt/obs/`)** — keeps obs configs isolated from the app stack. Rejected because: (1) the main app compose files are already in `/opt/familienarchiv/` (managed the same way), and (2) GlitchTip shares the `archive-db` PostgreSQL instance and `archiv-net` Docker network — it is architecturally part of the same deployment unit, not a separate one. Co-location reflects the actual coupling.
**C: Named Docker configs (Swarm)** — Docker Swarm supports first-class config objects that persist in the cluster. Rejected because the project does not use Swarm and introducing it solely for config persistence is a disproportionate dependency.
## Decision
The observability stack is co-located with the main application deployment at `/opt/familienarchiv/`:
Both the nightly CI job (`nightly.yml`) and the release job (`release.yml`) copy these files from the workspace checkout to `/opt/familienarchiv/` using `cp -r` on every run (CI-push model). Containers always read config from the permanent location; a workspace wipe has no effect on running containers.
Environment variables follow a two-source model:
-`infra/observability/obs.env` (git-tracked, non-secret): all non-sensitive config — host ports, public URLs (`GLITCHTIP_DOMAIN`, `GF_SERVER_ROOT_URL`), and the default `POSTGRES_HOST`. Changes go through PR review. No credentials.
-`/opt/familienarchiv/obs-secrets.env` (CI-written, per-deploy): passwords and secret keys only (`GRAFANA_ADMIN_PASSWORD`, `GLITCHTIP_SECRET_KEY`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`), injected fresh from Gitea secrets on every nightly and release deploy. Gitea is the single source of truth for secrets — rotating a secret takes effect on the next deploy without manual server action.
Both files are passed explicitly via `--env-file` to every obs compose command (config dry-run and `up`). There is no implicit auto-read `.env`. The required key inventory is documented in `docs/DEPLOYMENT.md §4`.
The CI runner mounts `/opt/familienarchiv` as a bind mount into job containers (see `runner-config.yaml`). This requires a one-time `mkdir -p /opt/familienarchiv/infra` on the server and a runner restart after updating `runner-config.yaml` (see ADR-015 and `docs/DEPLOYMENT.md §3.1`).
## Consequences
**Positive:**
- Bind-mount sources survive workspace wipes by definition — they are on a persistent host path.
- Config is always in sync with the repo after each nightly run.
- No new infrastructure dependencies; the CI-push model mirrors how the main app stack is deployed.
- Secret rotation requires no manual server action — Gitea secrets are the authoritative store; `obs-secrets.env` is rewritten from scratch on every deploy so a secret change takes effect on the next nightly or release run.
**Negative:**
-`cp -r` does not remove deleted files; a config file removed from the repo persists in `/opt/familienarchiv/infra/observability/` until manually deleted. Acceptable for this project's change frequency. A `rsync -a --delete` would give a clean mirror if this becomes a problem.
- Mounting `/opt/familienarchiv/` into CI job containers expands the blast radius of a compromised workflow step — a malicious step could overwrite app compose files and Caddy config. Acceptable because the runner is single-tenant (trusted code only). See `runner-config.yaml` security comment.
- Runner must be restarted (`systemctl restart gitea-runner`) after any change to `runner-config.yaml` for the new mount to take effect.
# ADR-017: Spring Boot 4.0 management port shares the main security filter chain
## Status
Accepted
## Context
The Familienarchiv backend runs Spring Boot Actuator on a dedicated management port (8081) so that Caddy never proxies `/actuator/*` requests and Prometheus can reach the scrape endpoint directly inside `archiv-net`.
In earlier Spring Boot versions (< 4.0), the management server ran in an isolated child application context whose security was governed independently by `ManagementWebSecurityAutoConfiguration`. The main app's `SecurityConfig` filter chain (port 8080) never intercepted requests arriving on port 8081.
In Spring Boot 4.0 with Jetty, this isolation was removed. The management server now traverses the **same** Spring Security `FilterChainProxy` as the main application. Concretely:
- Any `SecurityFilterChain` bean in the application context is evaluated for requests arriving on the management port.
- There is no longer a separate "management security" child context.
This was discovered when Prometheus began receiving HTTP 401 responses from `/actuator/prometheus` despite the endpoint being exposed and the `micrometer-registry-prometheus` dependency being present. Prometheus rejected these responses with `received unsupported Content-Type "text/html"` because the main filter chain's form-login `DelegatingAuthenticationEntryPoint` was redirecting unauthenticated requests to `/login` (302 → HTML).
A secondary issue: Spring Boot 4.0 no longer auto-enables Prometheus metrics export — `management.prometheus.metrics.export.enabled` must be set explicitly, and the Prometheus scrape endpoint requires `spring-boot-starter-micrometer-metrics` (a new starter that was split out in Spring Boot 4.0).
## Decision
1.**Dedicated management `SecurityFilterChain`** scoped to `/actuator/**` at `@Order(1)` (highest precedence). This chain:
-`permitAll()` for `/actuator/health` and `/actuator/prometheus` — required for Docker health checks and unauthenticated Prometheus scraping.
-`authenticated()` for all other actuator endpoints — blocks `/actuator/metrics`, `/actuator/info`, etc. without credentials.
- Uses an explicit `401` entry point (not form-login redirect) so that API clients — including Prometheus — receive a machine-readable status code rather than an HTML redirect.
- No CSRF, no form login.
2.**Belt-and-suspenders `permitAll()` in the main `SecurityFilterChain`** for `/actuator/health` and `/actuator/prometheus`, in case a future configuration change causes these paths to escape the management chain's `securityMatcher`.
3.**Network isolation as the outer defense boundary.** Port 8081 is not published in `docker-compose.yml` and is not routed through Caddy. Only services inside `archiv-net` (primarily Prometheus and the Docker health checker) can reach the management port.
## Alternatives rejected
- **Exclude `ManagementWebSecurityAutoConfiguration`:** This auto-configuration no longer exists in Spring Boot 4.0. Exclusion is not applicable.
- **Keep `SecurityConfig` as the sole filter chain without `@Order(1)` management chain:** The main chain's form-login `DelegatingAuthenticationEntryPoint` redirects browser-like clients to `/login` (302). Prometheus and automated health check clients cannot follow this redirect, so the endpoint would be unreachable without a dedicated chain that returns plain 401 or 200.
- **Per-endpoint `@Order(1)` filter chain using `EndpointRequest.toAnyEndpoint()`:** The `spring-boot-security` artifact that provides `EndpointRequest` is not a transitive dependency of `spring-boot-starter-actuator` in Spring Boot 4.0. Using a path-based `securityMatcher("/actuator/**")` achieves the same scoping without an extra dependency.
## Consequences
- All actuator endpoints on port 8081 that are not explicitly `permitAll()`-ed require HTTP Basic credentials. Without valid credentials, the response is 401 (not a redirect).
- Adding a new actuator endpoint to `management.endpoints.web.exposure.include` implicitly protects it via `anyRequest().authenticated()` in the management chain — no additional `permitAll()` needed unless intentional.
- A regression test (`ActuatorPrometheusIT`) verifies:
-`/actuator/prometheus` returns 200 without credentials.
-`/actuator/metrics` returns 401 without credentials.
- Prometheus metric names are present in the response body.
- If port 8081 is ever accidentally published in `docker-compose.yml`, actuator endpoints other than health and prometheus are still protected by HTTP Basic. This reduces (but does not eliminate) the risk of inadvertent exposure.
Container(mc, "Bucket / Service-Account Init", "MinIO Client (mc)", "One-shot container on startup. Idempotent: creates the archive bucket, the archiv-app service account, and attaches the readwrite policy.")
Container(loki, "Loki", "grafana/loki:3.4.2", "Stores log streams from all containers.")
Container(promtail, "Promtail", "grafana/promtail:3.4.2", "Ships Docker container logs to Loki via Docker SD.")
Container(tempo, "Tempo", "grafana/tempo:2.7.2", "Distributed trace storage. OTLP gRPC receiver on port 4317 (archiv-net). Grafana queries traces on port 3200 (obs-net). All ports internal only.")
Container(grafana, "Grafana", "grafana/grafana-oss:11.6.1", "Unified observability UI — dashboards, logs, traces. Datasources (Prometheus, Loki, Tempo) and three dashboards are auto-provisioned.")
Container(glitchtip, "GlitchTip", "glitchtip/glitchtip:v4", "Sentry-compatible error tracker — web process. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces.")
| `gitea-runner` (Docker container) | Runs all CI and deploy jobs | `infra/gitea/docker-compose.yml` + `/root/docker/gitea/runner-config.yaml` |
Both containers live in the `gitea_gitea` Docker network on the VPS. The runner connects to Gitea via the LAN IP so job containers (which don't share the `gitea_gitea` network) can also reach it.
### Docker-out-of-Docker (DooD)
The `gitea-runner` container mounts the host Docker socket (`/var/run/docker.sock`). When a workflow job runs, act_runner spawns a **sibling container** for each job. That job container also gets the Docker socket mounted (via `valid_volumes` in `runner-config.yaml`), enabling `docker compose` calls in workflow steps.
When a workflow step calls `docker compose up` with relative bind-mount sources (e.g. `./infra/observability/prometheus/prometheus.yml`), Compose resolves them against `$(pwd)` inside the job container and passes the resulting **absolute path** to the host Docker daemon. The host daemon then tries to bind-mount that path from the **host filesystem**.
In the default DooD setup the job container's workspace lives in the act_runner overlay2 layer — the host has no directory at that path, auto-creates an empty one, and the container fails with:
```
error mounting "…/prometheus/prometheus.yml" to rootfs at "/etc/prometheus/prometheus.yml": not a directory
```
The runner label `ubuntu-latest` maps to the Docker image it uses -- this is how `runs-on: ubuntu-latest` in the workflow YAML continues to work unchanged.
**Solution (ADR-015):** store job workspaces on a real host path and mount it at the **same absolute path** inside the runner and every job container. `runner-config.yaml` configures this via `workdir_parent`, `valid_volumes`, and `options`.
**One-time host setup** (required on any fresh VPS):
```bash
mkdir -p /srv/gitea-workspace
# Then add to the runner service in ~/docker/gitea/compose.yaml:
# volumes:
# - /srv/gitea-workspace:/srv/gitea-workspace
# Restart the runner container for the change to take effect.
```
The path `/srv/gitea-workspace` is the canonical workspace root. It must be identical on the host and inside job containers — if the paths differ, Compose still resolves to the container-internal path, which the host daemon cannot find (the original bug).
**Disk management:** act_runner cleans per-run subdirectories on completion. Orphaned directories from interrupted runs accumulate under `/srv/gitea-workspace` and should be pruned manually if disk space becomes a concern:
```bash
# List workspace directories older than 7 days
find /srv/gitea-workspace -mindepth 3 -maxdepth 3 -type d -mtime +7
```
---
### Running host-level commands from CI (nsenter pattern)
Job containers are unprivileged and do not share the host's PID/mount/network namespaces. Commands like `systemctl` that target the host daemon are therefore unavailable by default. When a workflow step needs to manage a host service (e.g. `systemctl reload caddy`), it uses the Docker socket to spin up a **privileged sibling container** in the host PID namespace:
`nsenter -t 1 -m -u -n -p -i` enters the init process's mount, UTS, IPC, network, PID, and cgroup namespaces, giving `systemctl` a view of the real host systemd. No sudoers entry is required — the Docker socket already grants root-equivalent host access.
Alpine is used instead of Ubuntu: ~5 MB vs ~70 MB, and the digest is pinned to a specific sha256 so any upstream change requires an explicit Renovate bump PR. `util-linux` (which ships `nsenter`) is not part of the Alpine base image but is installed at run time in ~1 s from the warm VPS cache.
#### Why not `sudo systemctl` in the job container?
Job containers run as root inside an unprivileged Docker namespace. There is no systemd PID 1 inside the container — `systemctl` would attempt to reach a socket that does not exist. `sudo` is not present in container images and would not help even if it were.
#### Why not Caddy's admin API?
Caddy ships a localhost admin API at `:2019` by default. Job containers do not share the host network namespace, so they cannot reach `localhost:2019` on the host. Exposing `:2019` on a host-bound port to make it reachable would add a network attack surface with no benefit over the current approach.
### Caddyfile symlink contract
The deploy workflows reload Caddy to pick up committed Caddyfile changes. This relies on a symlink that must exist on the VPS:
Failed to reload caddy.service: Unit caddy.service is not active.
```
Recovery:
```bash
ssh root@<vps>
systemctl start caddy
systemctl status caddy # confirm Active: active (running)
```
Re-run the workflow via Gitea Actions → "Re-run workflow".
**Failure mode 2 — Caddyfile symlink is missing or mis-pointed**
This failure is silent — `systemctl reload caddy` exits 0 but Caddy reloads whatever `/etc/caddy/Caddyfile` currently resolves to. The smoke test may then pass against stale config.
Symptom: smoke test fails on the HSTS value or the `/actuator/health → 404` check despite the Reload Caddy step succeeding.
Diagnosis:
```bash
ssh root@<vps>
ls -la /etc/caddy/Caddyfile
# Should be: lrwxrwxrwx ... /etc/caddy/Caddyfile -> /opt/familienarchiv/infra/caddy/Caddyfile
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
```
or
```
nsenter: failed to execute /bin/systemctl: No such file or directory
```
The first error means the Docker socket is not mounted into the job container — check `valid_volumes` in `/root/docker/gitea/runner-config.yaml` on the VPS. The second means the Alpine image is running but cannot enter the host mount namespace; verify `--privileged` and `--pid=host` are both present in the workflow step.
**Failure mode 4 — workspace bind-mount not configured (observability stack or any compose-with-file-mounts job)**
Symptom in CI log:
```
Error response from daemon: error while creating mount source path "…/prometheus/prometheus.yml": mkdir …: not a directory
```
Or the service starts but immediately crashes because a config file was mounted as an empty directory.
Cause: `/srv/gitea-workspace` does not exist on the host, or the runner container's `compose.yaml` is missing the `- /srv/gitea-workspace:/srv/gitea-workspace` volume line.
Diagnosis:
```bash
ssh root@<vps>
ls -la /srv/gitea-workspace # must exist and be a directory
docker inspect gitea-runner | grep -A5 Mounts # must show /srv/gitea-workspace
```
Recovery:
```bash
mkdir -p /srv/gitea-workspace
# Add volume line to runner compose.yaml, then:
docker compose -f ~/docker/gitea/compose.yaml up -d gitea-runner
```
See `docs/DEPLOYMENT.md §3.1` and ADR-015 for the full setup rationale.
---
@@ -107,7 +260,7 @@ jobs:
working-directory:frontend
- name:Upload screenshots
if:always()
uses:actions/upload-artifact@v4 # ← upgraded from v3
uses:actions/upload-artifact@v3# pinned per ADR-014 — Gitea Actions does not implement v4 protocol. Do NOT upgrade.
with:
name:unit-test-screenshots
path:frontend/test-results/screenshots/
@@ -134,7 +287,7 @@ jobs:
working-directory:backend
- name:Upload test results
if:always()
uses:actions/upload-artifact@v4 # ← upgraded from v3
uses:actions/upload-artifact@v3# pinned per ADR-014 — Gitea Actions does not implement v4 protocol. Do NOT upgrade.
with:
name:backend-test-results
path:backend/target/surefire-reports/
@@ -236,7 +389,7 @@ jobs:
E2E_BACKEND_URL:http://localhost:8080
- name:Upload E2E results
if:always()
uses:actions/upload-artifact@v4 # ← upgraded from v3
uses:actions/upload-artifact@v3# pinned per ADR-014 — Gitea Actions does not implement v4 protocol. Do NOT upgrade.
"Banned: vi.mock('pdfjs-dist', factory) causes a birpc teardown race in browser-mode specs — see ADR 012. Use the libLoader prop injection pattern instead."
},
{
// ADR 012 / #553. The named mechanism: an async vi.mock factory whose
// body performs `await import(...)` produces a late birpc roundtrip
// during worker teardown. The factory body must be synchronous; if
// you need to share state between the spec and the mock, use
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.