feat(auth): defense-in-depth — CSRF, session revocation, login rate limit #524

Open
opened 2026-05-11 18:50:20 +02:00 by marcel · 8 comments
Owner

Context

This is Phase 2 of 2 splitting #522. Blocked by #523 (Phase 1 — server-side session model).

#523 replaces the cookie-as-credential model with opaque server-side sessions. With sessions in place, several defenses become possible that were inexpressible before:

  1. CSRF can be re-enabled with Spring Security's synchronizer-token pattern (#523 keeps the existing csrf.disable() so the cutover stays scoped).
  2. Session revocation becomes a real lever — password change, password reset, and admin force-logout can all delete the relevant spring_session rows.
  3. Login rate limiting at the application layer (Bucket4j) becomes the right place for per-(IP, email) buckets instead of the coarse Caddy/fail2ban approach.

This issue lands all three. Unlike #523, none of these are breaking — they are additive defenses on top of a working session model.

⚠ Hard dependency

Blocked by #523. Do not start until that ships. The CSRF synchronizer token cookie depends on HttpSession being a real thing; the session-deletion paths depend on spring_session existing; the rate-limit error UX depends on INVALID_CREDENTIALS already being a frontend-mapped code.

User need

As an operator
I want state-changing endpoints to require a server-issued CSRF token in addition to the SameSite cookie
so that a misconfiguration of cookie attributes or CORS allowed-origins does not silently open every POST/PUT/PATCH/DELETE to forgery.

As an operator
I want to be able to kick a compromised account's sessions
so that I can recover from a credential leak in seconds rather than waiting up to 24 h for cookie expiry.

As a user
I want changing my password to log my other devices out
so that I can react to "I think someone stole my session" by changing my password.

As an operator
I want a clear ceiling on brute-force login attempts
so that a stolen email address cannot be paired with a wordlist hammered against /api/auth/login at network speeds.

Functional requirements

FR-AUTH-004 — Whenever an authenticated user changes their own password, all spring_session rows for that user OTHER than the current session SHALL be deleted. The current session SHALL be preserved (do not force the user to re-login on the device where they initiated the change).

FR-AUTH-005 — There SHALL be an admin-only endpoint POST /api/admin/users/{userId}/force-logout (or equivalent — final path is the implementer's call) that deletes all spring_session rows for the target user. Permission required: ADMIN_USER (existing permission — no new permission needed unless the implementer disagrees). UI surface is deferred — backend capability + audit log entry only in this issue.

FR-AUTH-008 — A successful password reset via POST /api/auth/reset-password SHALL delete all spring_session rows for the affected user. Unlike FR-AUTH-004, there is no "current session to preserve" — the password-reset flow is unauthenticated.

FR-AUTH-009 (CSRF) — Spring Security's CSRF filter SHALL be re-enabled in SecurityConfig using CookieCsrfTokenRepository.withHttpOnlyFalse() (XSRF-TOKEN cookie pattern — decided in #522 review). State-changing requests (POST/PUT/PATCH/DELETE) without a valid X-XSRF-TOKEN header echoing the cookie's token SHALL return 403 with ErrorCode.CSRF_TOKEN_MISSING. GET/HEAD/OPTIONS/TRACE are exempt (Spring default).

FR-AUTH-010 (rate limit)POST /api/auth/login SHALL enforce a per-(IP, email) rate limit at the application layer using Bucket4j (or equivalent in-process token bucket). After 5 failed attempts within a 15-minute window, further attempts from that (IP, email) pair SHALL return HTTP 429 with ErrorCode.TOO_MANY_LOGIN_ATTEMPTS. The limit SHALL be independent of any reverse-proxy-layer protection (fail2ban via Caddy). A successful login SHALL reset the bucket.

Non-functional requirements

NFR-SEC-103 — CSRF protection on state-changing endpoints SHALL be enforced by two independent mechanisms:

  1. The session cookie's SameSite=strict attribute (from #523)
  2. A server-issued CSRF token in an XSRF-TOKEN cookie that must be echoed in X-XSRF-TOKEN on writes

Either alone is insufficient. Both SHALL pass for a state-changing request to succeed.

NFR-SEC-104 — Login attempts SHALL be rate-limited at the application layer per the FR-AUTH-010 thresholds. The 5/15min window MAY be tuned by configuration property without code change.

NFR-OBSV-101 (completion of #523's partial) — Admin force-logout SHALL emit AuditKind.ADMIN_FORCE_LOGOUT audit entries with payload { adminUserId, targetUserId, sessionsRevokedCount, ip, ua }. Bulk session deletion via password change/reset SHALL emit a single LOGOUT entry per session deleted, with payload { userId, ip, ua, reason: "password_change" | "password_reset" | "admin_force_logout" }.

NFR-OBSV-102 (new) — Rate-limited login attempts SHALL emit AuditKind.LOGIN_RATE_LIMITED entries with payload { email, ip, ua, attemptsInWindow }. This is distinct from LOGIN_FAILED — the latter means "credentials wrong"; the former means "stop knocking."

NFR-COMPAT-101 (extends #523's) — Existing dev tooling SHALL continue to work without code changes:

  • Vite proxy continues to forward XSRF-TOKEN and X-XSRF-TOKEN headers in both directions
  • E2E tests handle the CSRF token transparently (Playwright reads the cookie before the first write)
  • The e2e profile's admin seed is unaffected

Acceptance criteria

# CSRF — FR-AUTH-009 / NFR-SEC-103

Given a state-changing endpoint (POST /api/persons, PUT /api/documents/{id}, DELETE /api/tags/{id}, etc.)
When a request arrives with a valid session cookie but NO X-XSRF-TOKEN header
Then the response is 403 with ErrorCode.CSRF_TOKEN_MISSING

Given a state-changing endpoint
When a request arrives with a valid session cookie and a stale or forged X-XSRF-TOKEN header
Then the response is 403 with ErrorCode.CSRF_TOKEN_MISSING

Given a successful login
When the response is returned
Then an XSRF-TOKEN cookie is present (NOT HttpOnly — JS must read it)

Given a parameterized test enumerating every write endpoint in the OpenAPI spec
When each is called with valid session + no CSRF token
Then every endpoint returns 403 CSRF_TOKEN_MISSING (no endpoint is silently exempt — Sara's @ParameterizedTest pattern)

# Session revocation on password change — FR-AUTH-004

Given a user with N>1 active sessions, currently authenticated on session S_current
When they successfully change their password via /api/users/me/change-password (or whichever endpoint)
Then session S_current is preserved
And  all OTHER N-1 spring_session rows for the user are deleted
And  N-1 audit entries of type LOGOUT with reason=password_change are recorded

# Session revocation on password reset — FR-AUTH-008

Given a user with N active sessions
When a password reset succeeds via /api/auth/reset-password
Then all N spring_session rows for the user are deleted
And  N audit entries of type LOGOUT with reason=password_reset are recorded

# Admin force-logout — FR-AUTH-005

Given an admin (has ADMIN_USER permission) and a target user with N active sessions
When the admin calls POST /api/admin/users/{targetId}/force-logout
Then the response is 200 (or 204)
And  all N spring_session rows for the target user are deleted
And  an audit entry of type ADMIN_FORCE_LOGOUT is recorded with {adminUserId, targetUserId, sessionsRevokedCount=N}
And  the target user's next request returns 401

Given a non-admin user (no ADMIN_USER permission)
When they call POST /api/admin/users/{anyId}/force-logout
Then the response is 403

Given an unauthenticated request
When it calls POST /api/admin/users/{anyId}/force-logout
Then the response is 401

# Rate limit — FR-AUTH-010 / NFR-SEC-104

Given a clean rate-limit bucket for (IP=1.2.3.4, email=foo@bar)
When 5 failed login attempts arrive within 15 minutes
Then attempt #6 returns 429 with ErrorCode.TOO_MANY_LOGIN_ATTEMPTS
And  no LOGIN_FAILED audit entry is written for attempt #6 (no extra noise)
And  a LOGIN_RATE_LIMITED audit entry IS written with {email, ip, ua, attemptsInWindow=6}

Given a rate-limited bucket
When 15 minutes have passed
Then the bucket refills and the next attempt is evaluated normally

Given a (IP, email) pair with 4 failed attempts
When the 5th attempt is a SUCCESS
Then the bucket resets immediately (subsequent failures restart the count from 0)

# Forced disabled user (Sara's amendment)

Given a user with an active session
When their AppUser record is disabled by an admin (existing flow)
Then their next request bearing the session cookie returns 401
And  all spring_session rows for them are deleted in the same transaction as the disable
And  an audit entry of type ADMIN_FORCE_LOGOUT is recorded

Resolved decisions (carried from #522 review)

OQ Decision Rationale
CSRF token transport XSRF-TOKEN cookie (Spring default) One less moving part; works with use:enhance; cleaner than threading a JSON field through the login response
Force-logout UI Backend + audit only UI deferred to a separate v1.2.0 issue — keep this PR bounded
Force-logout CLI break-glass Yes — small bash script Tobi: useful when frontend is broken; cost is ~20 lines
Logout-vs-password-change scope Three distinct paths (logout=current only; password change=others; password reset / force-logout=all) Threat-model differentiation per Nora

Implementation hints (non-binding)

  • Bucket4j: add com.bucket4j:bucket4j-core (pin version). In-process buckets keyed by (InetAddress, email-lowercased). Bucket configuration: 5 tokens, refill 5 per 15 min. Wire as a filter or interceptor in front of AuthController.login.
  • Spring Security CSRF:
    http.csrf(c -> c
        .csrfTokenRepository(CookieCsrfTokenRepository.withHttpOnlyFalse())
        .csrfTokenRequestHandler(new CsrfTokenRequestAttributeHandler())
    );
    
    Remove the comment in SecurityConfig:54 ("CSRF disabled because SameSite holds the fort").
  • Session revocation API: Spring Session exposes FindByIndexNameSessionRepository.findByPrincipalName(email) and SessionRepository.deleteById(id). Wrap in a small SessionRegistryService inside the auth package (created in #523).
  • Service boundary: AuthService.invalidateAllSessionsForUser(userId, reason) and AuthService.invalidateAllSessionsExceptCurrent(userId, currentSessionId, reason). Called by:
    • UserService.changeOwnPassword(...) → exceptCurrent
    • PasswordResetService.resetPassword(...) → all
    • UserService.disableUser(...) → all (Sara's amendment)
    • AdminController.forceLogout(userId) → all
  • CSRF token failure UX: SvelteKit error boundary catches CSRF_TOKEN_MISSING 403s and shows:
    • de: "Diese Seite wurde zu lange offen gelassen. Bitte neu laden und erneut versuchen."
    • en: "This page has been open too long. Please reload and try again."
    • es: "Esta página ha estado abierta demasiado tiempo. Por favor, recarga e inténtalo de nuevo."
    • Plus a "Seite neu laden" / "Reload" button — don't show a raw 403 to non-technical users.
  • Login rate-limit UX:
    • de: "Zu viele Anmeldeversuche. Bitte warte 15 Minuten und versuche es erneut."
    • en: "Too many login attempts. Please wait 15 minutes and try again."
    • es: "Demasiados intentos de inicio de sesión. Por favor, espera 15 minutos."
    • Do NOT show the exact bucket configuration — text says "15 minutes" generically.
  • Password-change toast: when the password change succeeds and FR-AUTH-004 fires, the success toast SHALL say: "Passwort geändert. Alle anderen Geräte wurden abgemeldet." (Leonie — security signal masquerading as a UX message).
  • scripts/force-logout-user.sh: small bash helper that POSTs to the force-logout endpoint with admin credentials from env vars. Document in docs/infrastructure/.
  • CLAUDE.md update: add the rate-limit pattern (@RequirePermission table already covers admin force-logout if ADMIN_USER is reused).
  • Grafana panels (Tobi):
    • Active session count: SELECT COUNT(*) FROM spring_session (gauge)
    • Login rate split by success/failure/rate-limited (timeseries from audit log)
    • Logout rate by reason (manual / password-change / password-reset / admin-force) (timeseries)
    • Loki alert: rate({app="backend"} |= "LOGIN_FAILED" [5m]) > 10 — likely brute force
  • pg_dump exclusion: --exclude-table=spring_session* in the backup script. Document in docs/infrastructure/. Restores lose all sessions, which is acceptable.

New ErrorCodes (this issue)

Code Frontend message keys Notes
CSRF_TOKEN_MISSING error.csrf_token_missing de/en/es per Leonie above
TOO_MANY_LOGIN_ATTEMPTS error.too_many_login_attempts de/en/es per Leonie above

Both must be added to ErrorCode.java, mirrored in frontend/src/lib/shared/errors.ts, given a case in getErrorMessage(), and have i18n keys in all three locale files (per CLAUDE.md error-handling reminder).

New AuditKinds (this issue)

  • ADMIN_FORCE_LOGOUT
  • LOGIN_RATE_LIMITED
  • (extend LOGOUT payload with optional reason field — schema-level)

Out of scope

  • Force-logout admin UI (a separate v1.2.0 issue — I'll draft it once this is closer)
  • 2FA / MFA
  • Step-up auth for admin operations
  • SSO
  • Remember-me checkbox (defer — threat model change)
  • Public API tokens (machine-to-machine)

Priority & sizing

  • MoSCoW: Must for v1.1.0
  • Priority: P1-high (CSRF currently undefended; rate-limit absent)
  • Linked NFRs: NFR-SEC-103, NFR-SEC-104, NFR-OBSV-101, NFR-OBSV-102, NFR-COMPAT-101
  • Depends on: #523 (Phase 1)

Test strategy

Layer Tests
Unit AuthService.invalidateAllSessionsForUser, invalidateAllSessionsExceptCurrent; Bucket4j wrapper logic; CSRF token validation edge cases
Slice AuthController.login rate-limit path (429 + audit); admin force-logout 200/403/401; CSRF 403 paths
Integration (Testcontainers + real Spring Session + Flyway) password change → other sessions gone, current preserved; password reset → all sessions gone; admin force-logout → all gone + audit; disable user → all gone + audit
Parameterized regression @ParameterizedTest @MethodSource("writeEndpoints") — every POST/PUT/PATCH/DELETE in the OpenAPI spec returns 403 without CSRF (Sara's pattern — catches future controllers added without thinking)
Frontend CSRF token attached automatically by SvelteKit fetch wrapper; error-boundary handles CSRF_TOKEN_MISSING with the reload UX; rate-limit error shows the localized message
E2E (Playwright, multi-context) Two browser.newContext() instances logged in as same user; change password in one; verify the OTHER context's next request gets 401
Load (k6 smoke, optional) Hammer /api/auth/login with wrong password from one IP — verify 429 kicks in at attempt 6, audit log has the expected rows

Don't mock SessionRepository in integration tests — Spring Session JDBC semantics depend on the real Postgres-backed implementation (Sara).

Definition of Done

  • All ACs pass; tests start red, go green
  • Parameterized CSRF regression covers every write endpoint
  • Multi-context E2E test for password-change session invalidation
  • ErrorCode.CSRF_TOKEN_MISSING, TOO_MANY_LOGIN_ATTEMPTS added everywhere (Java + TS + i18n × 3)
  • AuditKind.ADMIN_FORCE_LOGOUT, LOGIN_RATE_LIMITED added; LOGOUT payload extended with reason
  • scripts/force-logout-user.sh lands + documented in docs/infrastructure/
  • pg_dump excludes spring_session* (backup script + doc)
  • Grafana panels + Loki alert provisioned
  • OpenAPI types regenerated
  • Coverage gate held
  • ADR appendix or follow-up ADR documenting the CSRF + rate-limit choices

— Filed as part of splitting #522 (replaces it together with #523). Closes the Phase-2 portion of the auth-model rewrite.

## Context This is **Phase 2 of 2** splitting #522. Blocked by #523 (Phase 1 — server-side session model). #523 replaces the cookie-as-credential model with opaque server-side sessions. With sessions in place, several defenses become possible that were inexpressible before: 1. **CSRF** can be re-enabled with Spring Security's synchronizer-token pattern (#523 keeps the existing `csrf.disable()` so the cutover stays scoped). 2. **Session revocation** becomes a real lever — password change, password reset, and admin force-logout can all delete the relevant `spring_session` rows. 3. **Login rate limiting** at the application layer (Bucket4j) becomes the right place for per-(IP, email) buckets instead of the coarse Caddy/fail2ban approach. This issue lands all three. Unlike #523, none of these are breaking — they are additive defenses on top of a working session model. ## ⚠ Hard dependency **Blocked by #523.** Do not start until that ships. The CSRF synchronizer token cookie depends on `HttpSession` being a real thing; the session-deletion paths depend on `spring_session` existing; the rate-limit error UX depends on `INVALID_CREDENTIALS` already being a frontend-mapped code. ## User need > **As an** operator > **I want** state-changing endpoints to require a server-issued CSRF token in addition to the SameSite cookie > **so that** a misconfiguration of cookie attributes or CORS allowed-origins does not silently open every POST/PUT/PATCH/DELETE to forgery. > **As an** operator > **I want** to be able to kick a compromised account's sessions > **so that** I can recover from a credential leak in seconds rather than waiting up to 24 h for cookie expiry. > **As a** user > **I want** changing my password to log my other devices out > **so that** I can react to "I think someone stole my session" by changing my password. > **As an** operator > **I want** a clear ceiling on brute-force login attempts > **so that** a stolen email address cannot be paired with a wordlist hammered against `/api/auth/login` at network speeds. ## Functional requirements **FR-AUTH-004** — Whenever an authenticated user changes their own password, all `spring_session` rows for that user OTHER than the current session SHALL be deleted. The current session SHALL be preserved (do not force the user to re-login on the device where they initiated the change). **FR-AUTH-005** — There SHALL be an admin-only endpoint `POST /api/admin/users/{userId}/force-logout` (or equivalent — final path is the implementer's call) that deletes all `spring_session` rows for the target user. Permission required: `ADMIN_USER` (existing permission — no new permission needed unless the implementer disagrees). UI surface is **deferred** — backend capability + audit log entry only in this issue. **FR-AUTH-008** — A successful password reset via `POST /api/auth/reset-password` SHALL delete all `spring_session` rows for the affected user. Unlike FR-AUTH-004, there is no "current session to preserve" — the password-reset flow is unauthenticated. **FR-AUTH-009 (CSRF)** — Spring Security's CSRF filter SHALL be re-enabled in `SecurityConfig` using `CookieCsrfTokenRepository.withHttpOnlyFalse()` (XSRF-TOKEN cookie pattern — decided in #522 review). State-changing requests (POST/PUT/PATCH/DELETE) without a valid `X-XSRF-TOKEN` header echoing the cookie's token SHALL return 403 with `ErrorCode.CSRF_TOKEN_MISSING`. GET/HEAD/OPTIONS/TRACE are exempt (Spring default). **FR-AUTH-010 (rate limit)** — `POST /api/auth/login` SHALL enforce a per-(IP, email) rate limit at the application layer using Bucket4j (or equivalent in-process token bucket). After 5 failed attempts within a 15-minute window, further attempts from that (IP, email) pair SHALL return HTTP 429 with `ErrorCode.TOO_MANY_LOGIN_ATTEMPTS`. The limit SHALL be independent of any reverse-proxy-layer protection (fail2ban via Caddy). A successful login SHALL reset the bucket. ## Non-functional requirements **NFR-SEC-103** — CSRF protection on state-changing endpoints SHALL be enforced by **two independent mechanisms**: 1. The session cookie's `SameSite=strict` attribute (from #523) 2. A server-issued CSRF token in an `XSRF-TOKEN` cookie that must be echoed in `X-XSRF-TOKEN` on writes Either alone is insufficient. Both SHALL pass for a state-changing request to succeed. **NFR-SEC-104** — Login attempts SHALL be rate-limited at the application layer per the FR-AUTH-010 thresholds. The 5/15min window MAY be tuned by configuration property without code change. **NFR-OBSV-101 (completion of #523's partial)** — Admin force-logout SHALL emit `AuditKind.ADMIN_FORCE_LOGOUT` audit entries with payload `{ adminUserId, targetUserId, sessionsRevokedCount, ip, ua }`. Bulk session deletion via password change/reset SHALL emit a single `LOGOUT` entry per session deleted, with payload `{ userId, ip, ua, reason: "password_change" | "password_reset" | "admin_force_logout" }`. **NFR-OBSV-102 (new)** — Rate-limited login attempts SHALL emit `AuditKind.LOGIN_RATE_LIMITED` entries with payload `{ email, ip, ua, attemptsInWindow }`. This is distinct from `LOGIN_FAILED` — the latter means "credentials wrong"; the former means "stop knocking." **NFR-COMPAT-101 (extends #523's)** — Existing dev tooling SHALL continue to work without code changes: - Vite proxy continues to forward `XSRF-TOKEN` and `X-XSRF-TOKEN` headers in both directions - E2E tests handle the CSRF token transparently (Playwright reads the cookie before the first write) - The `e2e` profile's admin seed is unaffected ## Acceptance criteria ``` # CSRF — FR-AUTH-009 / NFR-SEC-103 Given a state-changing endpoint (POST /api/persons, PUT /api/documents/{id}, DELETE /api/tags/{id}, etc.) When a request arrives with a valid session cookie but NO X-XSRF-TOKEN header Then the response is 403 with ErrorCode.CSRF_TOKEN_MISSING Given a state-changing endpoint When a request arrives with a valid session cookie and a stale or forged X-XSRF-TOKEN header Then the response is 403 with ErrorCode.CSRF_TOKEN_MISSING Given a successful login When the response is returned Then an XSRF-TOKEN cookie is present (NOT HttpOnly — JS must read it) Given a parameterized test enumerating every write endpoint in the OpenAPI spec When each is called with valid session + no CSRF token Then every endpoint returns 403 CSRF_TOKEN_MISSING (no endpoint is silently exempt — Sara's @ParameterizedTest pattern) # Session revocation on password change — FR-AUTH-004 Given a user with N>1 active sessions, currently authenticated on session S_current When they successfully change their password via /api/users/me/change-password (or whichever endpoint) Then session S_current is preserved And all OTHER N-1 spring_session rows for the user are deleted And N-1 audit entries of type LOGOUT with reason=password_change are recorded # Session revocation on password reset — FR-AUTH-008 Given a user with N active sessions When a password reset succeeds via /api/auth/reset-password Then all N spring_session rows for the user are deleted And N audit entries of type LOGOUT with reason=password_reset are recorded # Admin force-logout — FR-AUTH-005 Given an admin (has ADMIN_USER permission) and a target user with N active sessions When the admin calls POST /api/admin/users/{targetId}/force-logout Then the response is 200 (or 204) And all N spring_session rows for the target user are deleted And an audit entry of type ADMIN_FORCE_LOGOUT is recorded with {adminUserId, targetUserId, sessionsRevokedCount=N} And the target user's next request returns 401 Given a non-admin user (no ADMIN_USER permission) When they call POST /api/admin/users/{anyId}/force-logout Then the response is 403 Given an unauthenticated request When it calls POST /api/admin/users/{anyId}/force-logout Then the response is 401 # Rate limit — FR-AUTH-010 / NFR-SEC-104 Given a clean rate-limit bucket for (IP=1.2.3.4, email=foo@bar) When 5 failed login attempts arrive within 15 minutes Then attempt #6 returns 429 with ErrorCode.TOO_MANY_LOGIN_ATTEMPTS And no LOGIN_FAILED audit entry is written for attempt #6 (no extra noise) And a LOGIN_RATE_LIMITED audit entry IS written with {email, ip, ua, attemptsInWindow=6} Given a rate-limited bucket When 15 minutes have passed Then the bucket refills and the next attempt is evaluated normally Given a (IP, email) pair with 4 failed attempts When the 5th attempt is a SUCCESS Then the bucket resets immediately (subsequent failures restart the count from 0) # Forced disabled user (Sara's amendment) Given a user with an active session When their AppUser record is disabled by an admin (existing flow) Then their next request bearing the session cookie returns 401 And all spring_session rows for them are deleted in the same transaction as the disable And an audit entry of type ADMIN_FORCE_LOGOUT is recorded ``` ## Resolved decisions (carried from #522 review) | OQ | Decision | Rationale | |---|---|---| | CSRF token transport | XSRF-TOKEN cookie (Spring default) | One less moving part; works with `use:enhance`; cleaner than threading a JSON field through the login response | | Force-logout UI | Backend + audit only | UI deferred to a separate v1.2.0 issue — keep this PR bounded | | Force-logout CLI break-glass | **Yes** — small bash script | Tobi: useful when frontend is broken; cost is ~20 lines | | Logout-vs-password-change scope | Three distinct paths (logout=current only; password change=others; password reset / force-logout=all) | Threat-model differentiation per Nora | ## Implementation hints (non-binding) - **Bucket4j**: add `com.bucket4j:bucket4j-core` (pin version). In-process buckets keyed by `(InetAddress, email-lowercased)`. Bucket configuration: 5 tokens, refill 5 per 15 min. Wire as a filter or interceptor in front of `AuthController.login`. - **Spring Security CSRF**: ```java http.csrf(c -> c .csrfTokenRepository(CookieCsrfTokenRepository.withHttpOnlyFalse()) .csrfTokenRequestHandler(new CsrfTokenRequestAttributeHandler()) ); ``` Remove the comment in `SecurityConfig:54` ("CSRF disabled because SameSite holds the fort"). - **Session revocation API**: Spring Session exposes `FindByIndexNameSessionRepository.findByPrincipalName(email)` and `SessionRepository.deleteById(id)`. Wrap in a small `SessionRegistryService` inside the `auth` package (created in #523). - **Service boundary**: `AuthService.invalidateAllSessionsForUser(userId, reason)` and `AuthService.invalidateAllSessionsExceptCurrent(userId, currentSessionId, reason)`. Called by: - `UserService.changeOwnPassword(...)` → exceptCurrent - `PasswordResetService.resetPassword(...)` → all - `UserService.disableUser(...)` → all (Sara's amendment) - `AdminController.forceLogout(userId)` → all - **CSRF token failure UX**: SvelteKit error boundary catches `CSRF_TOKEN_MISSING` 403s and shows: - de: "Diese Seite wurde zu lange offen gelassen. Bitte neu laden und erneut versuchen." - en: "This page has been open too long. Please reload and try again." - es: "Esta página ha estado abierta demasiado tiempo. Por favor, recarga e inténtalo de nuevo." - Plus a "Seite neu laden" / "Reload" button — don't show a raw 403 to non-technical users. - **Login rate-limit UX**: - de: "Zu viele Anmeldeversuche. Bitte warte 15 Minuten und versuche es erneut." - en: "Too many login attempts. Please wait 15 minutes and try again." - es: "Demasiados intentos de inicio de sesión. Por favor, espera 15 minutos." - Do NOT show the exact bucket configuration — text says "15 minutes" generically. - **Password-change toast**: when the password change succeeds and FR-AUTH-004 fires, the success toast SHALL say: "Passwort geändert. Alle anderen Geräte wurden abgemeldet." (Leonie — security signal masquerading as a UX message). - **`scripts/force-logout-user.sh`**: small bash helper that POSTs to the force-logout endpoint with admin credentials from env vars. Document in `docs/infrastructure/`. - **CLAUDE.md update**: add the rate-limit pattern (`@RequirePermission` table already covers admin force-logout if `ADMIN_USER` is reused). - **Grafana panels** (Tobi): - Active session count: `SELECT COUNT(*) FROM spring_session` (gauge) - Login rate split by success/failure/rate-limited (timeseries from audit log) - Logout rate by reason (manual / password-change / password-reset / admin-force) (timeseries) - Loki alert: `rate({app="backend"} |= "LOGIN_FAILED" [5m]) > 10` — likely brute force - **`pg_dump` exclusion**: `--exclude-table=spring_session*` in the backup script. Document in `docs/infrastructure/`. Restores lose all sessions, which is acceptable. ## New ErrorCodes (this issue) | Code | Frontend message keys | Notes | |---|---|---| | `CSRF_TOKEN_MISSING` | `error.csrf_token_missing` | de/en/es per Leonie above | | `TOO_MANY_LOGIN_ATTEMPTS` | `error.too_many_login_attempts` | de/en/es per Leonie above | Both must be added to `ErrorCode.java`, mirrored in `frontend/src/lib/shared/errors.ts`, given a `case` in `getErrorMessage()`, and have i18n keys in all three locale files (per `CLAUDE.md` error-handling reminder). ## New AuditKinds (this issue) - `ADMIN_FORCE_LOGOUT` - `LOGIN_RATE_LIMITED` - (extend `LOGOUT` payload with optional `reason` field — schema-level) ## Out of scope - Force-logout admin UI (a separate v1.2.0 issue — I'll draft it once this is closer) - 2FA / MFA - Step-up auth for admin operations - SSO - Remember-me checkbox (defer — threat model change) - Public API tokens (machine-to-machine) ## Priority & sizing - MoSCoW: **Must** for v1.1.0 - Priority: **P1-high** (CSRF currently undefended; rate-limit absent) - Linked NFRs: NFR-SEC-103, NFR-SEC-104, NFR-OBSV-101, NFR-OBSV-102, NFR-COMPAT-101 - Depends on: #523 (Phase 1) ## Test strategy | Layer | Tests | |---|---| | Unit | `AuthService.invalidateAllSessionsForUser`, `invalidateAllSessionsExceptCurrent`; Bucket4j wrapper logic; CSRF token validation edge cases | | Slice | `AuthController.login` rate-limit path (429 + audit); admin force-logout 200/403/401; CSRF 403 paths | | Integration (Testcontainers + real Spring Session + Flyway) | password change → other sessions gone, current preserved; password reset → all sessions gone; admin force-logout → all gone + audit; disable user → all gone + audit | | Parameterized regression | `@ParameterizedTest @MethodSource("writeEndpoints")` — every POST/PUT/PATCH/DELETE in the OpenAPI spec returns 403 without CSRF (Sara's pattern — catches future controllers added without thinking) | | Frontend | CSRF token attached automatically by SvelteKit fetch wrapper; error-boundary handles `CSRF_TOKEN_MISSING` with the reload UX; rate-limit error shows the localized message | | E2E (Playwright, multi-context) | Two `browser.newContext()` instances logged in as same user; change password in one; verify the OTHER context's next request gets 401 | | Load (k6 smoke, optional) | Hammer `/api/auth/login` with wrong password from one IP — verify 429 kicks in at attempt 6, audit log has the expected rows | **Don't mock `SessionRepository`** in integration tests — Spring Session JDBC semantics depend on the real Postgres-backed implementation (Sara). ## Definition of Done - [ ] All ACs pass; tests start red, go green - [ ] Parameterized CSRF regression covers every write endpoint - [ ] Multi-context E2E test for password-change session invalidation - [ ] `ErrorCode.CSRF_TOKEN_MISSING`, `TOO_MANY_LOGIN_ATTEMPTS` added everywhere (Java + TS + i18n × 3) - [ ] `AuditKind.ADMIN_FORCE_LOGOUT`, `LOGIN_RATE_LIMITED` added; `LOGOUT` payload extended with `reason` - [ ] `scripts/force-logout-user.sh` lands + documented in `docs/infrastructure/` - [ ] `pg_dump` excludes `spring_session*` (backup script + doc) - [ ] Grafana panels + Loki alert provisioned - [ ] OpenAPI types regenerated - [ ] Coverage gate held - [ ] ADR appendix or follow-up ADR documenting the CSRF + rate-limit choices — Filed as part of splitting #522 (replaces it together with #523). Closes the Phase-2 portion of the auth-model rewrite.
marcel added this to the v1.1.0 milestone 2026-05-11 18:50:20 +02:00
marcel added the P1-highfeaturesecurity labels 2026-05-11 18:50:27 +02:00
Author
Owner

🏛️ Markus Keller — Senior Application Architect

Observations

  • Phase split between #523 and #524 is clean — #523 is the architectural cutover (breaking, one-shot), #524 is purely additive. Right call to keep them as separate PRs.
  • The service boundary in the implementation hints (AuthService.invalidateAllSessionsForUser / invalidateAllSessionsExceptCurrent, both living in the new auth package from #523) is correct. Three callers from three different domains — UserService.changeOwnPassword, PasswordResetService.resetPassword, and the new force-logout controller — all going through the published AuthService interface. Exactly the cross-domain pattern the layering rule wants.
  • The issue specifies POST /api/admin/users/{userId}/force-logout with permission ADMIN_USER. But the existing AdminController (backend/.../user/AdminController.java) is class-level @RequirePermission(Permission.ADMIN) and hosts infra ops (mass-import, backfills, thumbnails). The /api/admin/* namespace currently means "needs Permission.ADMIN". User-admin operations live in UserController under ADMIN_USER. The proposed endpoint mixes those two surfaces.

Recommendations

  • Land the force-logout endpoint on UserController, not AdminController. URL POST /api/users/{userId}/force-logout, permission @RequirePermission(Permission.ADMIN_USER). Reasons: (a) keeps the /api/admin/* namespace pure Permission.ADMIN — no exceptions to grep through during an audit; (b) UserController already owns user-admin endpoints under ADMIN_USER; (c) the layering rule says cross-domain data access goes through services — UserController.forceLogout calls userService.findById(...) then authService.invalidateAllSessionsForUser(...), the textbook cross-package boundary.
  • Sara's amendment ("disabled user → sessions deleted in same transaction") presumes a flow that doesn't exist yet. UserService.disableUser(...) is not in the code — AppUser.enabled is set to true on create at UserService.java:70/106/128 and never flipped. Either land a minimal disable endpoint here (own FR + AC + Permission.ADMIN_USER + audit) or split that AC out into a follow-up issue. I'd split — keeps this PR scoped to the three things its title promises.
  • Write the ADR before the code lands, as a separate file under docs/adr/. Suggested title: ADR-NNNN — defense-in-depth additions: CSRF tokens, session revocation, login rate limit. Context = post-#523 baseline. Decision = three additions. Alternatives = (1) relying on SameSite + CORS alone for CSRF, (2) keeping rate-limiting at fail2ban only. Consequences = in-process bucket state (no cluster support), CSRF cookie complexity in tests.
  • Docs cascade list to put in the PR description so reviewers can verify:
    • CLAUDE.md — error-code section gets two new entries (CSRF_TOKEN_MISSING, TOO_MANY_LOGIN_ATTEMPTS)
    • docs/ARCHITECTURE.md — permission table updated if ADMIN_USER's effective scope is meaningfully expanded
    • docs/architecture/c4/seq-auth-flow.puml — add the CSRF token cookie round-trip arrows (was already rewritten in #523; this is an amendment)
    • The audit-payload schema reference for LOGOUT (now carries reason) and the two new AuditKinds

Open Decisions

  • Endpoint surface for force-logout. /api/admin/users/{id}/force-logout (as written in the issue, under AdminController requiring ADMIN_USER) vs /api/users/{id}/force-logout (under UserController, matches the existing ADMIN_USER pattern there). The former matches admin-tool clustering; the latter keeps each controller's class-level permission consistent. I recommend the latter for the consistency reason.
## 🏛️ Markus Keller — Senior Application Architect ### Observations - Phase split between #523 and #524 is clean — #523 is the architectural cutover (breaking, one-shot), #524 is purely additive. Right call to keep them as separate PRs. - The service boundary in the implementation hints (`AuthService.invalidateAllSessionsForUser` / `invalidateAllSessionsExceptCurrent`, both living in the new `auth` package from #523) is correct. Three callers from three different domains — `UserService.changeOwnPassword`, `PasswordResetService.resetPassword`, and the new force-logout controller — all going through the published `AuthService` interface. Exactly the cross-domain pattern the layering rule wants. - The issue specifies `POST /api/admin/users/{userId}/force-logout` with permission `ADMIN_USER`. But the existing `AdminController` (`backend/.../user/AdminController.java`) is class-level `@RequirePermission(Permission.ADMIN)` and hosts infra ops (mass-import, backfills, thumbnails). The `/api/admin/*` namespace currently means "needs `Permission.ADMIN`". User-admin operations live in `UserController` under `ADMIN_USER`. The proposed endpoint mixes those two surfaces. ### Recommendations - **Land the force-logout endpoint on `UserController`, not `AdminController`.** URL `POST /api/users/{userId}/force-logout`, permission `@RequirePermission(Permission.ADMIN_USER)`. Reasons: (a) keeps the `/api/admin/*` namespace pure `Permission.ADMIN` — no exceptions to grep through during an audit; (b) `UserController` already owns user-admin endpoints under `ADMIN_USER`; (c) the layering rule says cross-domain data access goes through services — `UserController.forceLogout` calls `userService.findById(...)` then `authService.invalidateAllSessionsForUser(...)`, the textbook cross-package boundary. - **Sara's amendment ("disabled user → sessions deleted in same transaction") presumes a flow that doesn't exist yet.** `UserService.disableUser(...)` is not in the code — `AppUser.enabled` is set to `true` on create at `UserService.java:70/106/128` and never flipped. Either land a minimal disable endpoint here (own FR + AC + `Permission.ADMIN_USER` + audit) or split that AC out into a follow-up issue. I'd split — keeps this PR scoped to the three things its title promises. - **Write the ADR before the code lands**, as a separate file under `docs/adr/`. Suggested title: `ADR-NNNN — defense-in-depth additions: CSRF tokens, session revocation, login rate limit`. Context = post-#523 baseline. Decision = three additions. Alternatives = (1) relying on SameSite + CORS alone for CSRF, (2) keeping rate-limiting at fail2ban only. Consequences = in-process bucket state (no cluster support), CSRF cookie complexity in tests. - **Docs cascade list to put in the PR description** so reviewers can verify: - `CLAUDE.md` — error-code section gets two new entries (`CSRF_TOKEN_MISSING`, `TOO_MANY_LOGIN_ATTEMPTS`) - `docs/ARCHITECTURE.md` — permission table updated if `ADMIN_USER`'s effective scope is meaningfully expanded - `docs/architecture/c4/seq-auth-flow.puml` — add the CSRF token cookie round-trip arrows (was already rewritten in #523; this is an amendment) - The audit-payload schema reference for `LOGOUT` (now carries `reason`) and the two new AuditKinds ### Open Decisions - **Endpoint surface for force-logout.** `/api/admin/users/{id}/force-logout` (as written in the issue, under `AdminController` requiring `ADMIN_USER`) vs `/api/users/{id}/force-logout` (under `UserController`, matches the existing `ADMIN_USER` pattern there). The former matches admin-tool clustering; the latter keeps each controller's class-level permission consistent. I recommend the latter for the consistency reason.
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

  • Red/green discipline maps cleanly onto every FR — there are obvious first failing tests for each one.
  • UserService.changePassword(UUID, ChangePasswordDTO) at UserService.java:236 does NOT currently take the current session ID. To preserve the current session per FR-AUTH-004 it has to learn about the session at call time. Service stays servlet-free; the controller does the extraction.
  • Implementation hint says Bucket4j is keyed by (InetAddress, email-lowercased). InetAddress is a poor map key — .equals() can trigger DNS lookups and the contract is brittle across IPv4/IPv6 dual-stack hosts. Bucket4j's own docs use String keys.
  • The implementation hints are detailed enough that a TDD-driven implementer can start writing failing tests today (assuming #523 is merged first).

Recommendations

  • AuthService shape (both methods @Transactional, both return the revoked count so callers can emit the audit payload):
    int invalidateAllSessionsForUser(UUID userId, String reason);
    int invalidateAllSessionsExceptCurrent(UUID userId, String currentSessionId, String reason);
    
    Audit emission happens INSIDE the service. Callers cannot forget.
  • Inject the current session id at the controller layer. Pull request.getSession(false).getId() in the controller, pass through to the service:
    @PutMapping("/me/change-password")
    @RequirePermission(Permission.WRITE_ALL)  // verify which fits — likely a new "self" permission or none
    public void changePassword(@RequestBody ChangePasswordDTO dto, HttpServletRequest request) {
        UUID userId = currentUser().getId();
        String currentSessionId = request.getSession(false).getId();
        userService.changeOwnPassword(userId, currentSessionId, dto);
    }
    
  • Bucket key = String not InetAddress. Canonical form: ip.toLowerCase() + ":" + email.toLowerCase().strip(). Never call InetAddress.getByName(...) on user input — that would fire DNS lookups on attacker-controlled strings.
  • Bucket reset on success — order matters.
    if (!bucket.tryConsume(1)) {
        auditService.logRateLimited(email, ip, ua, attemptsInWindow);
        throw DomainException.tooManyRequests(ErrorCode.TOO_MANY_LOGIN_ATTEMPTS, ...);
    }
    AppUser user = authenticate(email, password);  // throws on bad creds
    bucketCache.invalidate(key);  // success → reset
    return user;
    
    Consume → authenticate → if success, invalidate. The bucket is consumed for the failed path; the reset only happens on success.
  • First failing tests, in TDD order (write these before any production code):
    1. AuthServiceTest.invalidateAllSessionsExceptCurrent_deletes_only_other_sessions — Mockito, findByPrincipalName returns 3 session ids, assert deleteById called with exactly the two NOT matching currentSessionId.
    2. LoginRateLimitFilterTest.returns_429_on_attempt_6_with_TOO_MANY_LOGIN_ATTEMPTS — drive 5×401, then assert 6×429.
    3. CsrfRegressionTest.every_write_endpoint_returns_403_without_csrf_token@ParameterizedTest, source = OpenAPI write endpoints, single session, no X-XSRF-TOKEN header.
    4. UserServiceIntegrationTest.changeOwnPassword_preserves_current_session_invalidates_others — Testcontainers, real spring_session, three sessions, change password from session A, assert A still present, B + C deleted, two LOGOUT audit rows written.
  • Extend the SvelteKit fetch wrapper centrally. Add the X-XSRF-TOKEN header injection in frontend/src/hooks.server.ts (the same handle layer that today proxies the auth_token cookie — will be re-shaped by #523). Do NOT sprinkle .set('X-XSRF-TOKEN', ...) across every +page.server.ts.
  • When the PR rips csrf().disable() out of SecurityConfig, delete the entire surrounding comment block. The comment explaining "CSRF defence is LOAD-BEARING on SameSite + CORS" is wrong once Spring Security CSRF is on — leaving it as a relic confuses future readers.
## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Observations - Red/green discipline maps cleanly onto every FR — there are obvious first failing tests for each one. - `UserService.changePassword(UUID, ChangePasswordDTO)` at `UserService.java:236` does NOT currently take the current session ID. To preserve the current session per FR-AUTH-004 it has to learn about the session at call time. Service stays servlet-free; the controller does the extraction. - Implementation hint says Bucket4j is keyed by `(InetAddress, email-lowercased)`. `InetAddress` is a poor map key — `.equals()` can trigger DNS lookups and the contract is brittle across IPv4/IPv6 dual-stack hosts. Bucket4j's own docs use `String` keys. - The implementation hints are detailed enough that a TDD-driven implementer can start writing failing tests today (assuming #523 is merged first). ### Recommendations - **`AuthService` shape (both methods `@Transactional`, both return the revoked count so callers can emit the audit payload):** ```java int invalidateAllSessionsForUser(UUID userId, String reason); int invalidateAllSessionsExceptCurrent(UUID userId, String currentSessionId, String reason); ``` Audit emission happens INSIDE the service. Callers cannot forget. - **Inject the current session id at the controller layer.** Pull `request.getSession(false).getId()` in the controller, pass through to the service: ```java @PutMapping("/me/change-password") @RequirePermission(Permission.WRITE_ALL) // verify which fits — likely a new "self" permission or none public void changePassword(@RequestBody ChangePasswordDTO dto, HttpServletRequest request) { UUID userId = currentUser().getId(); String currentSessionId = request.getSession(false).getId(); userService.changeOwnPassword(userId, currentSessionId, dto); } ``` - **Bucket key = `String` not `InetAddress`.** Canonical form: `ip.toLowerCase() + ":" + email.toLowerCase().strip()`. Never call `InetAddress.getByName(...)` on user input — that would fire DNS lookups on attacker-controlled strings. - **Bucket reset on success — order matters.** ```java if (!bucket.tryConsume(1)) { auditService.logRateLimited(email, ip, ua, attemptsInWindow); throw DomainException.tooManyRequests(ErrorCode.TOO_MANY_LOGIN_ATTEMPTS, ...); } AppUser user = authenticate(email, password); // throws on bad creds bucketCache.invalidate(key); // success → reset return user; ``` Consume → authenticate → if success, invalidate. The bucket is consumed for the failed path; the reset only happens on success. - **First failing tests, in TDD order (write these before any production code):** 1. `AuthServiceTest.invalidateAllSessionsExceptCurrent_deletes_only_other_sessions` — Mockito, `findByPrincipalName` returns 3 session ids, assert `deleteById` called with exactly the two NOT matching `currentSessionId`. 2. `LoginRateLimitFilterTest.returns_429_on_attempt_6_with_TOO_MANY_LOGIN_ATTEMPTS` — drive 5×401, then assert 6×429. 3. `CsrfRegressionTest.every_write_endpoint_returns_403_without_csrf_token` — `@ParameterizedTest`, source = OpenAPI write endpoints, single session, no `X-XSRF-TOKEN` header. 4. `UserServiceIntegrationTest.changeOwnPassword_preserves_current_session_invalidates_others` — Testcontainers, real `spring_session`, three sessions, change password from session A, assert A still present, B + C deleted, two `LOGOUT` audit rows written. - **Extend the SvelteKit fetch wrapper centrally.** Add the `X-XSRF-TOKEN` header injection in `frontend/src/hooks.server.ts` (the same handle layer that today proxies the `auth_token` cookie — will be re-shaped by #523). Do NOT sprinkle `.set('X-XSRF-TOKEN', ...)` across every `+page.server.ts`. - **When the PR rips `csrf().disable()` out of `SecurityConfig`, delete the entire surrounding comment block.** The comment explaining "CSRF defence is LOAD-BEARING on SameSite + CORS" is wrong once Spring Security CSRF is on — leaving it as a relic confuses future readers.
Author
Owner

⚙️ Tobias Wendt — DevOps & Platform Engineer

Observations

  • In-process Bucket4j is fine for the current deployment (single Hetzner VPS, single backend container, Caddy in front). Worth noting: it's a deployment-shape lock-in — the moment a second backend replica exists, a brute-forcer whose attempts hit instance A and then rotate to instance B gets a fresh bucket on B. Not a blocker today; document it next to the bucket config.
  • The Grafana panels listed in the implementation hints (active session count, login rate split, logout rate by reason) all derive from data this issue mandates be written.
  • pg_dump --exclude-table=spring_session* does glob both spring_session and spring_session_attributes — verified that pg_dump's pattern is a shell-style glob. Document explicitly that restores lose all sessions and that's acceptable (users re-login post-restore, which is the desired behaviour anyway after a DR event).
  • Caddy/fail2ban backstop remains valuable. Application-layer Bucket4j is per-(IP, email); fail2ban is per-IP across all endpoints — complementary, not redundant.

Recommendations

  • Pin bucket4j-core explicitly in backend/pom.xml — latest stable 8.x. Don't rely on a parent BOM. Renovate keeps it current.
  • Loki alert depends on log-mirroring of audit events. The proposed alert rate({app="backend"} |= "LOGIN_FAILED" [5m]) > 10 only fires if AuditService.logAfterCommit() ALSO writes a logger line, not just a DB row. Today's AuditService (backend/src/main/java/.../audit/AuditService.java) needs verification — if it's DB-only, the alert won't ever trigger. Two-line fix: add log.info("audit kind={} payload={}", kind, payload) next to the DB insert. That keeps Loki as the alerting plane and avoids adding a postgres exporter container on a CX32.
  • Add LOGIN_RATE_LIMITED as a third series on the login-rate Grafana panel. Three lines: success / failed / rate-limited. The slope of the rate-limited line is the brute-force signal; without it, you lose visibility precisely when an attack is most active.
  • scripts/force-logout-user.sh distribution & contract.
    • Document in docs/infrastructure/ next to s3-migration.md and friends.
    • Dependencies: curl + jq only — no bash 5 features.
    • Read APP_ADMIN_USERNAME and APP_ADMIN_PASSWORD from env — never accept on command line (leaks in ps).
    • Exit codes: 0 success, 1 auth failed, 2 target user not found, 3 network/HTTP error. Operator can wire it into a runbook.
  • Spring Boot 4 may ship a Spring Session JDBC starter — prefer that over hand-importing spring-session-jdbc + spring-session-core separately. Same reasoning as #523's implementation hints. Save one explicit version pin.
  • Health-check verification post-CSRF. /actuator/health is permitAll'd in SecurityConfig and is a GET — Spring's CSRF filter exempts GET by default, so the Docker Compose healthcheck: curl /actuator/health should keep passing. Confirm in CI before merge; if it breaks for any reason, that's a release blocker.
  • Renovate config update. When bucket4j-core lands, add a packageRule so its patch releases auto-merge like other dependencies.

Open Decisions

  • Brute-force alerting wiring. Loki + log.info mirror inside AuditService (two lines added, slight extra log volume) vs Postgres-side via a pg_exporter container (cleaner separation, adds a container to operate). Lean Loki + mirror for cost reasons; flag because some operators prefer to avoid touching AuditService for observability concerns.
## ⚙️ Tobias Wendt — DevOps & Platform Engineer ### Observations - In-process Bucket4j is fine for the current deployment (single Hetzner VPS, single backend container, Caddy in front). Worth noting: it's a deployment-shape lock-in — the moment a second backend replica exists, a brute-forcer whose attempts hit instance A and then rotate to instance B gets a fresh bucket on B. Not a blocker today; document it next to the bucket config. - The Grafana panels listed in the implementation hints (active session count, login rate split, logout rate by reason) all derive from data this issue mandates be written. - `pg_dump --exclude-table=spring_session*` does glob both `spring_session` and `spring_session_attributes` — verified that pg_dump's pattern is a shell-style glob. Document explicitly that restores lose all sessions and that's acceptable (users re-login post-restore, which is the desired behaviour anyway after a DR event). - Caddy/fail2ban backstop remains valuable. Application-layer Bucket4j is per-(IP, email); fail2ban is per-IP across all endpoints — complementary, not redundant. ### Recommendations - **Pin `bucket4j-core` explicitly in `backend/pom.xml`** — latest stable 8.x. Don't rely on a parent BOM. Renovate keeps it current. - **Loki alert depends on log-mirroring of audit events.** The proposed alert `rate({app="backend"} |= "LOGIN_FAILED" [5m]) > 10` only fires if `AuditService.logAfterCommit()` ALSO writes a logger line, not just a DB row. Today's `AuditService` (`backend/src/main/java/.../audit/AuditService.java`) needs verification — if it's DB-only, the alert won't ever trigger. Two-line fix: add `log.info("audit kind={} payload={}", kind, payload)` next to the DB insert. That keeps Loki as the alerting plane and avoids adding a postgres exporter container on a CX32. - **Add `LOGIN_RATE_LIMITED` as a third series on the login-rate Grafana panel.** Three lines: success / failed / rate-limited. The slope of the rate-limited line is the brute-force signal; without it, you lose visibility precisely when an attack is most active. - **`scripts/force-logout-user.sh` distribution & contract.** - Document in `docs/infrastructure/` next to `s3-migration.md` and friends. - Dependencies: `curl` + `jq` only — no bash 5 features. - Read `APP_ADMIN_USERNAME` and `APP_ADMIN_PASSWORD` from env — never accept on command line (leaks in `ps`). - Exit codes: `0` success, `1` auth failed, `2` target user not found, `3` network/HTTP error. Operator can wire it into a runbook. - **Spring Boot 4 may ship a Spring Session JDBC starter — prefer that over hand-importing `spring-session-jdbc` + `spring-session-core` separately.** Same reasoning as #523's implementation hints. Save one explicit version pin. - **Health-check verification post-CSRF.** `/actuator/health` is permitAll'd in `SecurityConfig` and is a GET — Spring's CSRF filter exempts GET by default, so the Docker Compose `healthcheck: curl /actuator/health` should keep passing. Confirm in CI before merge; if it breaks for any reason, that's a release blocker. - **Renovate config update.** When `bucket4j-core` lands, add a packageRule so its patch releases auto-merge like other dependencies. ### Open Decisions - **Brute-force alerting wiring.** Loki + `log.info` mirror inside `AuditService` (two lines added, slight extra log volume) vs Postgres-side via a `pg_exporter` container (cleaner separation, adds a container to operate). Lean Loki + mirror for cost reasons; flag because some operators prefer to avoid touching `AuditService` for observability concerns.
Author
Owner

🛡️ Nora "NullX" Steiner — Application Security Engineer

Observations

  • The defense-in-depth framing (SameSite + XSRF token) is correct. SameSite=strict is the primary CSRF control; the token is the belt-and-suspenders that catches misconfiguration regressions (cookie attribute flip to lax, CORS allowlist widening, browser-specific edge cases). Either alone is insufficient — this is exactly right.
  • Suppressing LOGIN_FAILED and emitting LOGIN_RATE_LIMITED instead on attempt #6 is correct — avoids log amplification under a sustained brute-force burst.
  • CookieCsrfTokenRepository.withHttpOnlyFalse() is the canonical Spring pattern. The token is a per-session nonce, not a secret — JS reading it is necessary for the double-submit pattern.
  • Two real concerns with the rate-limit design as currently specified:
    1. Per-(IP, email) keying is permissive against credential stuffing. With a leaked email wordlist of N entries, a single attacker IP can make 5×N attempts every 15 minutes before any application-layer control kicks in. The only thing stopping that is fail2ban at the network layer, and fail2ban thresholds default to noisier signals than application-layer audit emits.
    2. Bucket existence is a side channel. If the code path is "look up user → if user exists, consume bucket; if not, return generic error", an attacker can timing-probe email validity. The 204 on /api/auth/forgot-password is the gold standard for this; the rate-limit code path needs to match it.

Recommendations

  • Add a per-IP backstop bucket alongside the per-(IP, email) bucket. Suggested config: 20 failures / 15 min per IP across all (email) values. Both buckets must succeed for the attempt to be evaluated. This kills credential stuffing — even with a wordlist, the attacker hits the IP cap after 20 attempts regardless of how many distinct emails they cycle.

  • Always consume the bucket BEFORE looking up the user. Order:

    1. Check both buckets have tokens available.
    2. Consume one token from each.
    3. Look up email, validate password.
    4. On success, invalidate the per-(IP, email) bucket.

    This eliminates the "email existence" timing/state side channel: bucket behaviour is identical for valid and invalid emails. An attacker can no longer probe the user table by counting allowed attempts.

  • Lock down the IP source. Pick exactly one of X-Forwarded-For (when Caddy strips and rewrites) or request.getRemoteAddr() (raw). Document it in SecurityConfig and verify Caddy strips client-sent X-Forwarded-For before forwarding — otherwise an attacker can rotate buckets by spoofing the header. The bucket's security guarantee is only as strong as the IP source identity.

  • CSRF token rotation on login — Spring Security's CsrfAuthenticationStrategy rotates the token on SessionFixationProtectionStrategy.onAuthentication. Verify this still fires after #523's new /api/auth/login succeeds. Without it, a token pre-seeded before authentication remains valid post-login. One-line config check; must be in the test suite.

  • Cap LOGIN_RATE_LIMITED audit emission. An attacker who continues hammering after lockout writes one audit row per attempt, unbounded — that's an audit-table DoS vector. Debounce in-memory: emit one LOGIN_RATE_LIMITED per (ip, email) per window. Detection signal lost: none — one entry per window is enough for an alert. Implementation: a Caffeine cache with 15-min TTL keyed by the same bucket key.

  • The XSRF-TOKEN cookie attributes must mirror the session cookie's risk posture. Explicitly set SameSite=strict, Secure=true in non-dev profiles, Path=/. The token is not a secret, but downgrading attributes here widens the cross-site surface for token theft via subdomain MITM.

  • Force-logout response must not echo session IDs. Return { sessionsRevokedCount: N }. Session IDs are sensitive identifiers — minimum disclosure principle.

  • Sara's "disabled user → all sessions invalidated in same transaction" amendment is correct, and the test must assert atomicity. If the disable commits before session deletion, there's a window where a still-cookie-authenticated user makes one more request. Integration test with @Transactional rollback proves it.

Open Decisions

  • Rate-limit threshold: 5/15min (issue) vs 10/15min (safer for senior audience). The 60+ transcriber audience will mistype passwords routinely. 5 is the right ceiling for a public-facing service; 10 is more humane for a closed family archive of <20 users. The per-IP backstop above is what catches actual attacks, so the per-(IP, email) ceiling can be loosened without weakening the security posture. NFR-SEC-104 says configurable — pick a sensible default.
  • Per-IP backstop scope. Limit to /api/auth/login only (simpler), or extend to /api/auth/login + /api/auth/forgot-password + /api/auth/reset-password (catches enumeration via the reset flow too). Broader scope is the more defensive choice; narrower is easier to reason about.
## 🛡️ Nora "NullX" Steiner — Application Security Engineer ### Observations - The defense-in-depth framing (SameSite + XSRF token) is correct. SameSite=strict is the primary CSRF control; the token is the belt-and-suspenders that catches misconfiguration regressions (cookie attribute flip to `lax`, CORS allowlist widening, browser-specific edge cases). Either alone is insufficient — this is exactly right. - Suppressing `LOGIN_FAILED` and emitting `LOGIN_RATE_LIMITED` instead on attempt #6 is correct — avoids log amplification under a sustained brute-force burst. - `CookieCsrfTokenRepository.withHttpOnlyFalse()` is the canonical Spring pattern. The token is a per-session nonce, not a secret — JS reading it is necessary for the double-submit pattern. - Two real concerns with the rate-limit design as currently specified: 1. **Per-(IP, email) keying is permissive against credential stuffing.** With a leaked email wordlist of N entries, a single attacker IP can make `5×N` attempts every 15 minutes before any application-layer control kicks in. The only thing stopping that is fail2ban at the network layer, and fail2ban thresholds default to noisier signals than application-layer audit emits. 2. **Bucket existence is a side channel.** If the code path is "look up user → if user exists, consume bucket; if not, return generic error", an attacker can timing-probe email validity. The 204 on `/api/auth/forgot-password` is the gold standard for this; the rate-limit code path needs to match it. ### Recommendations - **Add a per-IP backstop bucket alongside the per-(IP, email) bucket.** Suggested config: `20 failures / 15 min` per IP across all (email) values. Both buckets must succeed for the attempt to be evaluated. This kills credential stuffing — even with a wordlist, the attacker hits the IP cap after 20 attempts regardless of how many distinct emails they cycle. - **Always consume the bucket BEFORE looking up the user.** Order: 1. Check both buckets have tokens available. 2. Consume one token from each. 3. Look up email, validate password. 4. On success, invalidate the per-(IP, email) bucket. This eliminates the "email existence" timing/state side channel: bucket behaviour is identical for valid and invalid emails. An attacker can no longer probe the user table by counting allowed attempts. - **Lock down the IP source.** Pick exactly one of `X-Forwarded-For` (when Caddy strips and rewrites) or `request.getRemoteAddr()` (raw). Document it in `SecurityConfig` and verify Caddy strips client-sent `X-Forwarded-For` before forwarding — otherwise an attacker can rotate buckets by spoofing the header. The bucket's security guarantee is only as strong as the IP source identity. - **CSRF token rotation on login** — Spring Security's `CsrfAuthenticationStrategy` rotates the token on `SessionFixationProtectionStrategy.onAuthentication`. Verify this still fires after #523's new `/api/auth/login` succeeds. Without it, a token pre-seeded before authentication remains valid post-login. One-line config check; must be in the test suite. - **Cap `LOGIN_RATE_LIMITED` audit emission.** An attacker who continues hammering after lockout writes one audit row per attempt, unbounded — that's an audit-table DoS vector. Debounce in-memory: emit one `LOGIN_RATE_LIMITED` per (ip, email) per window. Detection signal lost: none — one entry per window is enough for an alert. Implementation: a `Caffeine` cache with 15-min TTL keyed by the same bucket key. - **The `XSRF-TOKEN` cookie attributes must mirror the session cookie's risk posture.** Explicitly set `SameSite=strict`, `Secure=true` in non-dev profiles, `Path=/`. The token is not a secret, but downgrading attributes here widens the cross-site surface for token theft via subdomain MITM. - **Force-logout response must not echo session IDs.** Return `{ sessionsRevokedCount: N }`. Session IDs are sensitive identifiers — minimum disclosure principle. - **Sara's "disabled user → all sessions invalidated in same transaction" amendment is correct, and the test must assert atomicity.** If the disable commits before session deletion, there's a window where a still-cookie-authenticated user makes one more request. Integration test with `@Transactional` rollback proves it. ### Open Decisions - **Rate-limit threshold: 5/15min (issue) vs 10/15min (safer for senior audience).** The 60+ transcriber audience will mistype passwords routinely. 5 is the right ceiling for a public-facing service; 10 is more humane for a closed family archive of <20 users. The per-IP backstop above is what catches actual attacks, so the per-(IP, email) ceiling can be loosened without weakening the security posture. NFR-SEC-104 says configurable — pick a sensible default. - **Per-IP backstop scope.** Limit to `/api/auth/login` only (simpler), or extend to `/api/auth/login` + `/api/auth/forgot-password` + `/api/auth/reset-password` (catches enumeration via the reset flow too). Broader scope is the more defensive choice; narrower is easier to reason about.
Author
Owner

🧪 Sara Holt — Senior QA Engineer

Observations

  • The test pyramid in this issue is well-shaped — unit / slice / integration / parameterized regression / frontend / E2E / k6 — and it puts the right test at the right layer.
  • "Don't mock SessionRepository in integration tests" is correct. Spring Session JDBC's SQL behaviour (spring_session row deletion with cascade to spring_session_attributes) only manifests against real Postgres. H2 would hide it.
  • The @ParameterizedTest @MethodSource("writeEndpoints") CSRF regression is the highest-value test in the whole package — it catches every future controller someone adds without thinking about CSRF.
  • Multi-context Playwright E2E for password-change session invalidation is the right canonical test for FR-AUTH-004. Two browser.newContext(), login in both, change password in one, assert the OTHER returns 401 on next request.
  • "Forced disabled user" AC presumes a flow that doesn't exist (grep confirms UserService has no disableUser method). My amendment, but it's blocked on a precondition this PR doesn't deliver.

Recommendations

  • Inject a TimeMeter for rate-limit tests. Never Thread.sleep. Bucket4j supports custom time sources:

    TimeMeter clock = new TimeMeter() {
        private final AtomicLong nanos = new AtomicLong();
        public long currentTimeNanos() { return nanos.get(); }
        void advance(Duration d) { nanos.addAndGet(d.toNanos()); }
        public boolean isWallClockBased() { return false; }
    };
    Bucket bucket = Bucket.builder()
        .addLimit(Bandwidth.simple(5, Duration.ofMinutes(15)))
        .withCustomTimePrecision(clock)
        .build();
    

    Test runs in milliseconds. No flakiness. No 15-minute test executions.

  • CSRF parameterized regression — drive the seed list from the generated OpenAPI spec, not a hand-maintained list:

    static Stream<Arguments> writeEndpoints() {
        OpenAPI api = new OpenAPIV3Parser().read(
            "target/generated-resources/openapi.json");
        return api.getPaths().entrySet().stream()
            .flatMap(p -> p.getValue().readOperationsMap().entrySet().stream()
                .filter(e -> Set.of(POST, PUT, PATCH, DELETE).contains(e.getKey()))
                .map(e -> Arguments.of(e.getKey().name(), p.getKey())));
    }
    

    Zero maintenance when a new endpoint ships. The test list grows with the spec.

  • Audit-row assertions need Awaitility because they're written from an @TransactionalEventListener(AFTER_COMMIT). If a test assertion runs before the after-commit phase fires, it flakes. Wrap each audit-presence assertion in await().atMost(5, SECONDS).untilAsserted(() -> assertThat(...)).

  • Permission-boundary tests must include both 401 and 403 for every new endpoint:

    • 401 (no auth): no with(user(...)), expect status().isUnauthorized()
    • 403 (wrong perm): with(user("u").authorities(new SimpleGrantedAuthority("READ_ALL"))), expect status().isForbidden()

    These are different security failures; the AC list already has them for force-logout — keep the pattern for any new endpoint.

  • Coverage gate: hold the 88% branch line in pom.xml. New code in auth package and the new code in UserService.changeOwnPassword must each carry ≥80% own-branch coverage. If this PR dips the global line below 88%, it doesn't merge.

  • The multi-context E2E should assert audit + UI together. In one test: change password in context A, verify B's request returns 401 AND the audit table contains exactly N-1 LOGOUT rows with reason=password_change. If audit is missing, observability is silently broken — that's exactly the failure mode Nora's findings call out.

  • The "forced disabled user" AC can't have a green test until the disable flow exists. Split it into a follow-up issue (#525) and remove from #524's scope, or land a minimal disable endpoint in this PR. I'd vote for the split — keeps the test pyramid for #524 honest.

  • k6 smoke is optional per the issue text — keep it optional. A 5-minute k6 run on the merge step is cheap insurance against the rate-limit filter regressing latency. If the implementer skips it, that's fine, but having the script committed makes the future check trivial.

Open Decisions (none from my angle — recommendations above are concrete)

## 🧪 Sara Holt — Senior QA Engineer ### Observations - The test pyramid in this issue is well-shaped — unit / slice / integration / parameterized regression / frontend / E2E / k6 — and it puts the right test at the right layer. - "Don't mock `SessionRepository` in integration tests" is correct. Spring Session JDBC's SQL behaviour (`spring_session` row deletion with cascade to `spring_session_attributes`) only manifests against real Postgres. H2 would hide it. - The `@ParameterizedTest @MethodSource("writeEndpoints")` CSRF regression is the highest-value test in the whole package — it catches every future controller someone adds without thinking about CSRF. - Multi-context Playwright E2E for password-change session invalidation is the right canonical test for FR-AUTH-004. Two `browser.newContext()`, login in both, change password in one, assert the OTHER returns 401 on next request. - "Forced disabled user" AC presumes a flow that doesn't exist (grep confirms `UserService` has no `disableUser` method). My amendment, but it's blocked on a precondition this PR doesn't deliver. ### Recommendations - **Inject a `TimeMeter` for rate-limit tests. Never `Thread.sleep`.** Bucket4j supports custom time sources: ```java TimeMeter clock = new TimeMeter() { private final AtomicLong nanos = new AtomicLong(); public long currentTimeNanos() { return nanos.get(); } void advance(Duration d) { nanos.addAndGet(d.toNanos()); } public boolean isWallClockBased() { return false; } }; Bucket bucket = Bucket.builder() .addLimit(Bandwidth.simple(5, Duration.ofMinutes(15))) .withCustomTimePrecision(clock) .build(); ``` Test runs in milliseconds. No flakiness. No 15-minute test executions. - **CSRF parameterized regression — drive the seed list from the generated OpenAPI spec, not a hand-maintained list:** ```java static Stream<Arguments> writeEndpoints() { OpenAPI api = new OpenAPIV3Parser().read( "target/generated-resources/openapi.json"); return api.getPaths().entrySet().stream() .flatMap(p -> p.getValue().readOperationsMap().entrySet().stream() .filter(e -> Set.of(POST, PUT, PATCH, DELETE).contains(e.getKey())) .map(e -> Arguments.of(e.getKey().name(), p.getKey()))); } ``` Zero maintenance when a new endpoint ships. The test list grows with the spec. - **Audit-row assertions need `Awaitility`** because they're written from an `@TransactionalEventListener(AFTER_COMMIT)`. If a test assertion runs before the after-commit phase fires, it flakes. Wrap each audit-presence assertion in `await().atMost(5, SECONDS).untilAsserted(() -> assertThat(...))`. - **Permission-boundary tests must include both 401 and 403** for every new endpoint: - 401 (no auth): no `with(user(...))`, expect `status().isUnauthorized()` - 403 (wrong perm): `with(user("u").authorities(new SimpleGrantedAuthority("READ_ALL")))`, expect `status().isForbidden()` These are different security failures; the AC list already has them for force-logout — keep the pattern for any new endpoint. - **Coverage gate: hold the 88% branch line in `pom.xml`.** New code in `auth` package and the new code in `UserService.changeOwnPassword` must each carry ≥80% own-branch coverage. If this PR dips the global line below 88%, it doesn't merge. - **The multi-context E2E should assert audit + UI together.** In one test: change password in context A, verify B's request returns 401 AND the audit table contains exactly N-1 `LOGOUT` rows with `reason=password_change`. If audit is missing, observability is silently broken — that's exactly the failure mode Nora's findings call out. - **The "forced disabled user" AC can't have a green test until the disable flow exists.** Split it into a follow-up issue (#525) and remove from #524's scope, or land a minimal disable endpoint in this PR. I'd vote for the split — keeps the test pyramid for #524 honest. - **k6 smoke is optional per the issue text — keep it optional.** A 5-minute k6 run on the merge step is cheap insurance against the rate-limit filter regressing latency. If the implementer skips it, that's fine, but having the script committed makes the future check trivial. ### Open Decisions _(none from my angle — recommendations above are concrete)_
Author
Owner

🎨 Leonie Voss — UI/UX Design Lead

Observations

  • The CSRF error text is calm, three-locale, and ends in a reload CTA — exactly right. Not a raw 403.
  • The rate-limit message correctly hides bucket internals from the user. "15 minutes" is a user-facing word, not a reveal.
  • The password-change toast ("Passwort geändert. Alle anderen Geräte wurden abgemeldet.") is a strong UX-as-security pattern — the user gets a confirmation AND the security signal in the same touchpoint. Quietly excellent.
  • All three locales (de/en/es) are present in the implementation hints for CSRF + rate-limit. No English-only oversight.
  • The 60+ transcriber audience is most likely to encounter the CSRF error — they leave a transcription page open for hours during Kurrent decipherment. The UX has to be calm and obvious.

Recommendations

  • CSRF reload button: min-h-[44px] min-w-[44px] per WCAG 2.2 SC 2.5.8. Critical for the senior audience using a stylus or imprecise touch. Use the <button> element, not an <a>.
  • CSRF error container needs aria-live="polite" so screen-reader users get the message announced. Not a modal, not a banner — a polite live region above the page content.
  • Rate-limit message: drop the hard "15 minutes" claim. With a rolling token-bucket window (the Bucket4j idiom), the bucket may refill in 1 minute, not 15. Two options:
    • Static (recommended for first pass):

      • de: "Zu viele Anmeldeversuche. Bitte versuche es später erneut."
      • en: "Too many login attempts. Please try again later."
      • es: "Demasiados intentos de inicio de sesión. Por favor, inténtalo más tarde."

      Friendlier for the 60+ audience, gives up no useful info, no extra response field needed.

    • Dynamic countdown: backend returns retryAfterSeconds in the 429 body, frontend renders a live-updating "Try again in MM:SS". More work, helpful for stuck admins, distracting for seniors. Defer until support friction proves it's needed.

  • Password-change toast also needs en/es. The hints only show the German. Add:
    • en: "Password changed. All other devices have been signed out."
    • es: "Contraseña cambiada. Los demás dispositivos han sido desconectados."
  • Login form rate-limit response must not blank the email field. When the 429 renders, the email input stays populated. Otherwise a senior user re-types their email and loses context. Survives via SvelteKit's use:enhance + form return value — make sure the form-action fail(429, { email }) carries it back.
  • Color is not the only cue on the 429. Pair the red text/border with an icon (<svg> warning) and aria-invalid="true" on the email input. Project convention; restate here so the implementer doesn't miss it on this specific surface.
  • Force-logout admin UI is correctly deferred to v1.2.0 — but the audit row from this PR feeds aktivitaeten/ (the unified activity feed). Verify admins will see their own force-logout actions there post-merge; if not, file as a v1.2.0 follow-up alongside the UI.
  • The CSRF "Seite neu laden" / "Reload" button should be the only CTA, not paired with a "Cancel" or "Go back" — the user has no meaningful "back" from a CSRF failure. Single primary action, no decision burden.

Open Decisions

  • Rate-limit error text: static "later" vs dynamic countdown. Static is faster to ship, friendlier to the 60+ audience, leaks no internals. Dynamic countdown helps power users but adds a response field and frontend timer logic. I recommend static for v1.1.0 and revisit only if support traffic shows confusion.
## 🎨 Leonie Voss — UI/UX Design Lead ### Observations - The CSRF error text is calm, three-locale, and ends in a reload CTA — exactly right. Not a raw 403. - The rate-limit message correctly hides bucket internals from the user. "15 minutes" is a user-facing word, not a reveal. - The password-change toast ("Passwort geändert. Alle anderen Geräte wurden abgemeldet.") is a strong UX-as-security pattern — the user gets a confirmation AND the security signal in the same touchpoint. Quietly excellent. - All three locales (de/en/es) are present in the implementation hints for CSRF + rate-limit. No English-only oversight. - The 60+ transcriber audience is most likely to encounter the CSRF error — they leave a transcription page open for hours during Kurrent decipherment. The UX has to be calm and obvious. ### Recommendations - **CSRF reload button: `min-h-[44px] min-w-[44px]`** per WCAG 2.2 SC 2.5.8. Critical for the senior audience using a stylus or imprecise touch. Use the `<button>` element, not an `<a>`. - **CSRF error container needs `aria-live="polite"`** so screen-reader users get the message announced. Not a modal, not a banner — a polite live region above the page content. - **Rate-limit message: drop the hard "15 minutes" claim.** With a rolling token-bucket window (the Bucket4j idiom), the bucket may refill in 1 minute, not 15. Two options: - **Static (recommended for first pass):** - de: "Zu viele Anmeldeversuche. Bitte versuche es später erneut." - en: "Too many login attempts. Please try again later." - es: "Demasiados intentos de inicio de sesión. Por favor, inténtalo más tarde." Friendlier for the 60+ audience, gives up no useful info, no extra response field needed. - **Dynamic countdown:** backend returns `retryAfterSeconds` in the 429 body, frontend renders a live-updating "Try again in MM:SS". More work, helpful for stuck admins, distracting for seniors. Defer until support friction proves it's needed. - **Password-change toast also needs en/es.** The hints only show the German. Add: - en: "Password changed. All other devices have been signed out." - es: "Contraseña cambiada. Los demás dispositivos han sido desconectados." - **Login form rate-limit response must not blank the email field.** When the 429 renders, the email input stays populated. Otherwise a senior user re-types their email and loses context. Survives via SvelteKit's `use:enhance` + `form` return value — make sure the form-action `fail(429, { email })` carries it back. - **Color is not the only cue on the 429.** Pair the red text/border with an icon (`<svg>` warning) and `aria-invalid="true"` on the email input. Project convention; restate here so the implementer doesn't miss it on this specific surface. - **Force-logout admin UI is correctly deferred to v1.2.0** — but the audit row from this PR feeds `aktivitaeten/` (the unified activity feed). Verify admins will see their own force-logout actions there post-merge; if not, file as a v1.2.0 follow-up alongside the UI. - **The CSRF "Seite neu laden" / "Reload" button should be the only CTA**, not paired with a "Cancel" or "Go back" — the user has no meaningful "back" from a CSRF failure. Single primary action, no decision burden. ### Open Decisions - **Rate-limit error text: static "later" vs dynamic countdown.** Static is faster to ship, friendlier to the 60+ audience, leaks no internals. Dynamic countdown helps power users but adds a response field and frontend timer logic. I recommend static for v1.1.0 and revisit only if support traffic shows confusion.
Author
Owner

📋 Elicit — Requirements Engineer

Observations

  • This issue is unusually well-structured: FR/NFR IDs, EARS-flavoured wording on NFRs, Gherkin acceptance criteria, MoSCoW + priority + linked NFRs, explicit Out-of-Scope, Definition of Done, and a Resolved-Decisions carry-forward table from #522. Better than most professional teams ship. An LLM implementer can act on this with minimal interpretation.
  • The hard-dependency declaration on #523 (⚠ block at top) is loud and explicit — won't get acted on prematurely.
  • A few latent ambiguities and one buried inconsistency are worth resolving before implementation begins.

Recommendations

  • Disambiguate the token-bucket model in FR-AUTH-010 / NFR-SEC-104. "5 tokens, refill 5 per 15 min" can mean (a) rolling token bucket — 1 token refills every 3 minutes, the Bucket4j idiom — or (b) fixed 15-minute window — all 5 reset at once at the window boundary. The AC "When 15 minutes have passed, the bucket refills" reads like (b) but Bucket4j defaults to (a). Pick one and amend the FR + the AC. Recommend (a): fairer to a user who has been waiting, and standard Bucket4j shape.
  • Define attemptsInWindow precisely (NFR-OBSV-102 audit payload). Is it the cumulative number of attempts that have entered the bucket since it started filling, or the count at the moment of the rejected request (i.e. always ≥6)? Either is fine; the schema needs to say which.
  • Surface the AdminController/UserController endpoint inconsistency as a TBD. FR-AUTH-005 places force-logout at POST /api/admin/users/{userId}/force-logout with permission ADMIN_USER. Today's AdminController is class-level @RequirePermission(Permission.ADMIN) and hosts infra ops. Either the new endpoint sits awkwardly under the wrong permission namespace, or it has to land on UserController at POST /api/users/{userId}/force-logout. Markus flagged the same. Park it in TBD so the implementer can't pick wrong silently.
  • The "Forced disabled user" AC references a flow that does not exist. Grep confirms UserService has no disableUser method and AppUser.enabled is never set to false anywhere in production code. Two options:
    • (a) Add FR-AUTH-011 — Admin can disable a user account here, with its own AC (PATCH /api/users/{userId}/disable requiring ADMIN_USER, sets enabled=false, invalidates sessions in the same transaction, emits ADMIN_FORCE_LOGOUT). Adds one endpoint to scope.
    • (b) Split into a follow-up issue (#525), remove the AC from #524.
    • Recommend (b) — preserves the Phase-1/Phase-2 split discipline. The disable flow is its own story.
  • Verify NFR-COMPAT-101's Vite-proxy claim. "Vite proxy continues to forward XSRF-TOKEN and X-XSRF-TOKEN headers in both directions" — vite.config.ts currently only injects Authorization from the auth_token cookie (vite.config.ts:23–31). After #523 strips that block, the proxy is back to passing headers through unchanged. That MAY be sufficient for CSRF (Vite's proxy is transparent for non-injected headers by default), but it warrants explicit verification with one cURL through the running dev server. Add a footnote that says "verify" so it doesn't get assumed.
  • Add to Definition of Done: "docs/ARCHITECTURE.md permission table updated to reflect that ADMIN_USER now governs session revocation in addition to user CRUD." Even though no new Permission enum value is added, the scope of an existing one expands meaningfully.
  • Open a TBD register at the bottom of the issue body, with the items I and the other personas have raised. Suggested IDs: OQ-524-01OQ-524-06. Pre-#522, this issue assumed all questions were resolved — but the cross-persona review surfaces new ones, and a register prevents them from being silently re-resolved during implementation.

Open Decisions

  • Token-bucket semantics: rolling refill (1 token / 3 min) vs fixed window (all 5 reset at 15-min mark). Pick one before coding to avoid re-doing the rate-limit tests.
  • Scope of the disable-user flow: keep in #524 as added FR-AUTH-011 vs split to #525. Cleaner is split.
## 📋 Elicit — Requirements Engineer ### Observations - This issue is unusually well-structured: FR/NFR IDs, EARS-flavoured wording on NFRs, Gherkin acceptance criteria, MoSCoW + priority + linked NFRs, explicit Out-of-Scope, Definition of Done, and a Resolved-Decisions carry-forward table from #522. Better than most professional teams ship. An LLM implementer can act on this with minimal interpretation. - The hard-dependency declaration on #523 (⚠ block at top) is loud and explicit — won't get acted on prematurely. - A few latent ambiguities and one buried inconsistency are worth resolving before implementation begins. ### Recommendations - **Disambiguate the token-bucket model in FR-AUTH-010 / NFR-SEC-104.** "5 tokens, refill 5 per 15 min" can mean (a) rolling token bucket — 1 token refills every 3 minutes, the Bucket4j idiom — or (b) fixed 15-minute window — all 5 reset at once at the window boundary. The AC "When 15 minutes have passed, the bucket refills" reads like (b) but Bucket4j defaults to (a). Pick one and amend the FR + the AC. Recommend (a): fairer to a user who has been waiting, and standard Bucket4j shape. - **Define `attemptsInWindow` precisely** (NFR-OBSV-102 audit payload). Is it the cumulative number of attempts that have entered the bucket since it started filling, or the count at the moment of the rejected request (i.e. always `≥6`)? Either is fine; the schema needs to say which. - **Surface the AdminController/UserController endpoint inconsistency as a TBD.** FR-AUTH-005 places force-logout at `POST /api/admin/users/{userId}/force-logout` with permission `ADMIN_USER`. Today's `AdminController` is class-level `@RequirePermission(Permission.ADMIN)` and hosts infra ops. Either the new endpoint sits awkwardly under the wrong permission namespace, or it has to land on `UserController` at `POST /api/users/{userId}/force-logout`. Markus flagged the same. Park it in TBD so the implementer can't pick wrong silently. - **The "Forced disabled user" AC references a flow that does not exist.** Grep confirms `UserService` has no `disableUser` method and `AppUser.enabled` is never set to `false` anywhere in production code. Two options: - **(a)** Add `FR-AUTH-011 — Admin can disable a user account` here, with its own AC (`PATCH /api/users/{userId}/disable` requiring `ADMIN_USER`, sets `enabled=false`, invalidates sessions in the same transaction, emits `ADMIN_FORCE_LOGOUT`). Adds one endpoint to scope. - **(b)** Split into a follow-up issue (#525), remove the AC from #524. - Recommend (b) — preserves the Phase-1/Phase-2 split discipline. The disable flow is its own story. - **Verify NFR-COMPAT-101's Vite-proxy claim.** "Vite proxy continues to forward `XSRF-TOKEN` and `X-XSRF-TOKEN` headers in both directions" — `vite.config.ts` currently only injects `Authorization` from the `auth_token` cookie (`vite.config.ts:23–31`). After #523 strips that block, the proxy is back to passing headers through unchanged. That MAY be sufficient for CSRF (Vite's proxy is transparent for non-injected headers by default), but it warrants explicit verification with one cURL through the running dev server. Add a footnote that says "verify" so it doesn't get assumed. - **Add to Definition of Done:** "`docs/ARCHITECTURE.md` permission table updated to reflect that `ADMIN_USER` now governs session revocation in addition to user CRUD." Even though no new `Permission` enum value is added, the scope of an existing one expands meaningfully. - **Open a TBD register at the bottom of the issue body**, with the items I and the other personas have raised. Suggested IDs: `OQ-524-01` … `OQ-524-06`. Pre-#522, this issue assumed all questions were resolved — but the cross-persona review surfaces new ones, and a register prevents them from being silently re-resolved during implementation. ### Open Decisions - **Token-bucket semantics:** rolling refill (1 token / 3 min) vs fixed window (all 5 reset at 15-min mark). Pick one before coding to avoid re-doing the rate-limit tests. - **Scope of the disable-user flow:** keep in #524 as added FR-AUTH-011 vs split to #525. Cleaner is split.
Author
Owner

🗳️ Decision Queue — Action Required

7 decisions need your input before implementation starts. Grouped by theme; deduplicated where multiple personas raised the same item.

Architecture

  • Force-logout endpoint surface. Two options:

    • (A) POST /api/admin/users/{userId}/force-logout (as written in the issue) — fits the "admin tools" clustering, but the existing AdminController is class-level @RequirePermission(Permission.ADMIN); either you weaken that class annotation or split it into two controllers with different permissions in one namespace.
    • (B) POST /api/users/{userId}/force-logout under UserController with @RequirePermission(Permission.ADMIN_USER) — matches the existing user-admin pattern (InviteController, UserController write endpoints all use ADMIN_USER), keeps /api/admin/* a pure Permission.ADMIN namespace.

    Raised by: Markus, Elicit

  • Disable-user flow scope. Sara's "forced disabled user" amendment references a flow (UserService.disableUser, PATCH /api/users/{id}/disable) that does not exist in the codebase. Two options:

    • (A) Land it inside #524 as a new FR-AUTH-011: new endpoint + Permission.ADMIN_USER + audit kind + AC. Adds one controller method and a service method to scope.
    • (B) Split into a follow-up issue (#525), remove the "Forced disabled user" AC from #524's scope. Preserves the Phase-1/Phase-2 split discipline.

    Raised by: Markus, Sara, Elicit

Security

  • Token-bucket semantics for FR-AUTH-010. "5 tokens, refill 5 per 15 min" is ambiguous. Either:

    • (A) Rolling token bucket — 1 token refills every 3 minutes (Bucket4j idiom; fairer to a user who has waited).
    • (B) Fixed 15-minute window — all 5 tokens reset at the window boundary.

    Pick before the rate-limit tests get written. The implementation cost is identical; the user-visible behaviour differs.

    Raised by: Elicit

  • Per-(IP, email) rate-limit threshold. Issue says 5/15min, configurable. For the closed family archive (<20 users, mostly 60+ transcribers who mistype passwords), 5 is tight and false-lockouts will dominate support traffic. Options:

    • (A) 5/15min (issue default — appropriate for a public-facing service).
    • (B) 10/15min (humane for the senior audience — the per-IP backstop, if added, catches real attacks).
    • (C) Configurable, default 10 — matches NFR-SEC-104's "MAY be tuned by configuration property".

    Raised by: Nora

  • Add a per-IP backstop bucket? Per-(IP, email) keying alone allows 5 × N attempts per IP given a credential-stuffing wordlist of N emails. A per-IP backstop (e.g. 20 failures/15min across all emails) closes that gap. Options:

    • (A) Skip it — fail2ban at Caddy is the network-layer backstop.
    • (B) Add it, login only — simplest. 20/15min per IP across /api/auth/login.
    • (C) Add it, broader scope — extend to /api/auth/forgot-password + /api/auth/reset-password to catch enumeration via the reset flow too.

    Raised by: Nora

Infrastructure

  • Brute-force alerting wiring. Loki alert rate({app="backend"} |= "LOGIN_FAILED" [5m]) > 10 only fires if audit events are mirrored to the logger. Today's AuditService may be DB-only. Options:

    • (A) Mirror in AuditService — add log.info("audit kind={} ...", kind) next to the DB insert. Two lines of code, slight extra log volume, Loki stays the alerting plane.
    • (B) Postgres-side exporter — add a pg_exporter container to surface audit-row metrics to Prometheus. Cleaner separation, one more container to operate on the CX32.

    Raised by: Tobias

UX

  • Rate-limit error text. Issue's current text says "Please wait 15 minutes". With a rolling token bucket (Decision 3 option A), the bucket may refill in 1 minute — the literal "15 minutes" claim is then misleading. Options:

    • (A) Static, vague — "Too many login attempts. Please try again later." Friendly for the 60+ audience, leaks no internals, no extra response field.
    • (B) Dynamic countdown — backend returns retryAfterSeconds in the 429 body; frontend renders a live "Try again in MM:SS". Better for power users, distracting for seniors, more code.

    Raised by: Leonie


These are the only items that need your call before implementation can start. All other persona output is concrete recommendations — those can be picked up directly by the implementer.

## 🗳️ Decision Queue — Action Required _7 decisions need your input before implementation starts. Grouped by theme; deduplicated where multiple personas raised the same item._ ### Architecture - **Force-logout endpoint surface.** Two options: - **(A)** `POST /api/admin/users/{userId}/force-logout` (as written in the issue) — fits the "admin tools" clustering, but the existing `AdminController` is class-level `@RequirePermission(Permission.ADMIN)`; either you weaken that class annotation or split it into two controllers with different permissions in one namespace. - **(B)** `POST /api/users/{userId}/force-logout` under `UserController` with `@RequirePermission(Permission.ADMIN_USER)` — matches the existing user-admin pattern (`InviteController`, `UserController` write endpoints all use `ADMIN_USER`), keeps `/api/admin/*` a pure `Permission.ADMIN` namespace. _Raised by: Markus, Elicit_ - **Disable-user flow scope.** Sara's "forced disabled user" amendment references a flow (`UserService.disableUser`, `PATCH /api/users/{id}/disable`) that does not exist in the codebase. Two options: - **(A)** Land it inside #524 as a new FR-AUTH-011: new endpoint + `Permission.ADMIN_USER` + audit kind + AC. Adds one controller method and a service method to scope. - **(B)** Split into a follow-up issue (#525), remove the "Forced disabled user" AC from #524's scope. Preserves the Phase-1/Phase-2 split discipline. _Raised by: Markus, Sara, Elicit_ ### Security - **Token-bucket semantics for FR-AUTH-010.** "5 tokens, refill 5 per 15 min" is ambiguous. Either: - **(A) Rolling token bucket** — 1 token refills every 3 minutes (Bucket4j idiom; fairer to a user who has waited). - **(B) Fixed 15-minute window** — all 5 tokens reset at the window boundary. Pick before the rate-limit tests get written. The implementation cost is identical; the user-visible behaviour differs. _Raised by: Elicit_ - **Per-(IP, email) rate-limit threshold.** Issue says 5/15min, configurable. For the closed family archive (<20 users, mostly 60+ transcribers who mistype passwords), 5 is tight and false-lockouts will dominate support traffic. Options: - **(A) 5/15min** (issue default — appropriate for a public-facing service). - **(B) 10/15min** (humane for the senior audience — the per-IP backstop, if added, catches real attacks). - **(C) Configurable, default 10** — matches NFR-SEC-104's "MAY be tuned by configuration property". _Raised by: Nora_ - **Add a per-IP backstop bucket?** Per-(IP, email) keying alone allows `5 × N` attempts per IP given a credential-stuffing wordlist of N emails. A per-IP backstop (e.g. 20 failures/15min across all emails) closes that gap. Options: - **(A) Skip it** — fail2ban at Caddy is the network-layer backstop. - **(B) Add it, login only** — simplest. 20/15min per IP across `/api/auth/login`. - **(C) Add it, broader scope** — extend to `/api/auth/forgot-password` + `/api/auth/reset-password` to catch enumeration via the reset flow too. _Raised by: Nora_ ### Infrastructure - **Brute-force alerting wiring.** Loki alert `rate({app="backend"} |= "LOGIN_FAILED" [5m]) > 10` only fires if audit events are mirrored to the logger. Today's `AuditService` may be DB-only. Options: - **(A) Mirror in `AuditService`** — add `log.info("audit kind={} ...", kind)` next to the DB insert. Two lines of code, slight extra log volume, Loki stays the alerting plane. - **(B) Postgres-side exporter** — add a `pg_exporter` container to surface audit-row metrics to Prometheus. Cleaner separation, one more container to operate on the CX32. _Raised by: Tobias_ ### UX - **Rate-limit error text.** Issue's current text says "Please wait 15 minutes". With a rolling token bucket (Decision 3 option A), the bucket may refill in 1 minute — the literal "15 minutes" claim is then misleading. Options: - **(A) Static, vague** — "Too many login attempts. Please try again later." Friendly for the 60+ audience, leaks no internals, no extra response field. - **(B) Dynamic countdown** — backend returns `retryAfterSeconds` in the 429 body; frontend renders a live "Try again in MM:SS". Better for power users, distracting for seniors, more code. _Raised by: Leonie_ --- These are the only items that need your call before implementation can start. All other persona output is concrete recommendations — those can be picked up directly by the implementer.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#524