feat(audit): domain-level audit log for archive activity #274

Closed
opened 2026-04-19 11:44:00 +02:00 by marcel · 9 comments
Owner

Problem

Several upcoming features — the dashboard Family Pulse (#271), the activity feed (#271), and future user contribution stats — need to answer questions like:

  • How many documents were transcribed this week, and by whom?
  • What did user X contribute this month?
  • When was the last time this document was touched, and who touched it?

The current model cannot answer these questions reliably. Document.updatedAt is bumped on every field change (metadata, status, system updates) and carries no actor context. There is no way to distinguish "user added a sender" from "OCR service updated the transcription field."

Solution

A dedicated audit_log table, populated by the service layer via an @Async AuditService. Domain-level semantic events ("TEXT_SAVED", "METADATA_UPDATED") rather than raw column diffs.

Data Model

-- V46__add_audit_log.sql
CREATE TABLE audit_log (
    id          UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    happened_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    actor_id    UUID        REFERENCES app_users(id) ON DELETE SET NULL,
    kind        VARCHAR(50) NOT NULL,
    document_id UUID        REFERENCES documents(id) ON DELETE CASCADE,
    entity_type VARCHAR(30) NOT NULL,
    entity_id   UUID        NOT NULL,
    payload     JSONB
);

CREATE INDEX idx_audit_log_happened_at  ON audit_log (happened_at DESC);
CREATE INDEX idx_audit_log_document_id  ON audit_log (document_id);
CREATE INDEX idx_audit_log_actor_id     ON audit_log (actor_id);
CREATE INDEX idx_audit_log_kind         ON audit_log (kind);
  • document_id is denormalized on every row so "all events for document X" is a single index scan.
  • kind is a semantic application label — queryable and human-readable.
  • payload JSONB is optional per event type (e.g. { "oldStatus": "UPLOADED", "newStatus": "TRANSCRIBED" } for status changes).

AuditKind Enum

public enum AuditKind {
    FILE_UPLOADED,       // document transitioned PLACEHOLDER → UPLOADED
    STATUS_CHANGED,      // document status changed (payload: oldStatus, newStatus)
    METADATA_UPDATED,    // sender, receivers, tags, date, location changed
    TEXT_SAVED,          // transcription block text saved/updated
    BLOCK_REVIEWED,      // transcription block marked reviewed = true
    ANNOTATION_CREATED,  // new polygon annotation added to document
}

Instrumentation Points (5 service methods)

| Service | Method | Event |
|---|---|---|
| DocumentService | updateDocument() | METADATA_UPDATED or STATUS_CHANGED (check which fields changed) |
| DocumentService | upload transition | FILE_UPLOADED on PLACEHOLDER → UPLOADED |
| TranscriptionBlockService | save() / update() | TEXT_SAVED only when text field actually changes |
| TranscriptionBlockService | markReviewed() | BLOCK_REVIEWED when reviewed flips to true |
| AnnotationService | create() | ANNOTATION_CREATED |
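The first row leaves the METADATA_UPDATED / STATUS_CHANGED discrimination implicit. One possible rule — a status change wins, otherwise metadata changes, otherwise no event — can be sketched as a pure helper. This is a hypothetical illustration, not the real DocumentService API; the class and method names are made up here:

```java
import java.util.Objects;

// Hypothetical helper for updateDocument(): decide which audit kind to emit.
// Rule sketched here: a status change takes precedence; otherwise a metadata
// change; otherwise nothing is logged.
public final class AuditKindResolver {

    public enum Kind { STATUS_CHANGED, METADATA_UPDATED, NONE }

    public static Kind resolve(String oldStatus, String newStatus, boolean metadataChanged) {
        if (!Objects.equals(oldStatus, newStatus)) {
            return Kind.STATUS_CHANGED;
        }
        return metadataChanged ? Kind.METADATA_UPDATED : Kind.NONE;
    }
}
```

Whether both events should fire when status and metadata change in the same call is an open spec question; the sketch picks precedence only to make the guard testable.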

AuditService

@Service
@RequiredArgsConstructor
@Slf4j
public class AuditService {
    private final AuditLogRepository auditLogRepository;

    @Async
    public void log(AuditKind kind, UUID actorId, UUID documentId,
                    String entityType, UUID entityId, Map<String, Object> payload) {
        try {
            auditLogRepository.save(AuditLog.builder()
                .kind(kind.name())
                .actorId(actorId)
                .documentId(documentId)
                .entityType(entityType)
                .entityId(entityId)
                .payload(payload)
                .build());
        } catch (Exception e) {
            // Audit failure must never propagate to the caller
            log.error("Audit log write failed: kind={}, document={}", kind, documentId, e);
        }
    }
}

@Async is critical: audit writes must never slow down the user-facing operation. Fire-and-forget. Failures are logged but not rethrown.

How This Unlocks the Dashboard (#271)

Once this is in, all three dashboard endpoints become simple queries:

-- Family Pulse: transcribed this week
SELECT kind, COUNT(DISTINCT document_id)
FROM audit_log
WHERE happened_at >= date_trunc('week', now())
GROUP BY kind;

-- Activity feed
SELECT * FROM audit_log
WHERE happened_at >= now() - interval '7 days'
ORDER BY happened_at DESC
LIMIT 10;

-- Resume card: my last touched document
SELECT document_id FROM audit_log
WHERE actor_id = :me AND kind = 'TEXT_SAVED'
ORDER BY happened_at DESC
LIMIT 1;

Effort Estimate

| Task | Effort |
|---|---|
| V46 migration (table + indexes) | 0.5 days |
| AuditLog entity, AuditLogRepository, AuditService | 0.5 days |
| Instrumentation of 5 service methods | 1 day |
| Tests (unit + integration per instrumented method) | 1 day |
| **Total** | **~3 days** |

Out of Scope

  • Audit log UI (viewing history per document) — separate issue
  • Login/logout events — separate issue
  • Retention policy / log pruning — can be added later with a simple DELETE FROM audit_log WHERE happened_at < now() - interval '1 year'

Prerequisite for

  • #271 — Dashboard Family Pulse and activity feed
marcel added the collaboration and feature labels 2026-04-19 11:44:07 +02:00
Author
Owner

Design decision: pageNumber in audit event payload

Following discussion on #271, the Family Pulse headline stat is "N Seiten bearbeitet" — counting distinct pages worked on this week. To avoid a join at query time, pageNumber must be written into the payload JSONB at event creation time.

Affected event kinds:

| Kind | Where pageNumber comes from |
|---|---|
| ANNOTATION_CREATED | DocumentAnnotation.pageNumber — available directly at creation time |
| TEXT_SAVED | TranscriptionBlock.annotationId → DocumentAnnotation.pageNumber — must be resolved in the service before logging |

Payload shape for these two events:

{ "pageNumber": 3 }

Query this enables (no joins):

SELECT COUNT(DISTINCT (document_id, payload->>'pageNumber'))
FROM audit_log
WHERE kind IN ('ANNOTATION_CREATED', 'TEXT_SAVED')
  AND happened_at >= :weekStart;

Implementation note: when logging TEXT_SAVED, the service must load the parent DocumentAnnotation to resolve pageNumber before calling AuditService.log(). This is a single lookup by annotationId and can be done within the same transaction as the block save.
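Under those assumptions, the resolution step can be sketched as a small pure helper. The lookup function stands in for the annotation repository (e.g. a findById followed by reading pageNumber); the class and method names are hypothetical:

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical sketch of the TEXT_SAVED logging path: resolve pageNumber from
// the parent annotation, then build the payload. The lookup is injected so the
// logic stays testable without a database; an empty Optional models a missing
// parent annotation (no payload is built, nothing is logged).
public final class TextSavedPayloadResolver {

    public static Optional<Map<String, Object>> resolve(
            UUID annotationId,
            Function<UUID, Optional<Integer>> pageNumberLookup) {
        // Single lookup by annotationId, done inside the same transaction
        // as the block save (per the implementation note above).
        return pageNumberLookup.apply(annotationId)
                .map(page -> Map.<String, Object>of("pageNumber", page));
    }
}
```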

Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

  • Wrong service name in the spec. The issue references TranscriptionBlockService throughout, but the actual class is TranscriptionService (service/TranscriptionService.java). The instrumentation table needs updating before implementation begins.

  • reviewBlock() toggles, it doesn't set. TranscriptionService.reviewBlock() does block.setReviewed(!block.isReviewed()) — it flips both ways. The BLOCK_REVIEWED event must only fire when transitioning from false → true. Implementation needs to capture the old value before the flip:

    boolean wasReviewed = block.isReviewed();
    block.setReviewed(!wasReviewed);
    TranscriptionBlock saved = blockRepository.save(block);
    if (!wasReviewed && saved.isReviewed()) {
        auditService.log(AuditKind.BLOCK_REVIEWED, actorId, documentId, ...);
    }
    
  • TEXT_SAVED "only when text actually changes" needs an explicit diff. updateBlock() always calls block.setText(text) without comparing old vs. new. The implementation must capture String previousText = block.getText() before the update and skip the audit call if unchanged.
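The two guards above (text diff and reviewed transition) reduce to two predicates, sketched here as a standalone helper. The class name is illustrative; in the real services the checks would live inline in updateBlock() and reviewBlock():

```java
import java.util.Objects;

// Minimal sketch of the two audit guards: capture old state before mutating,
// emit only on a real change.
public final class AuditGuards {

    /** TEXT_SAVED should fire only when the text actually changed (null-safe). */
    public static boolean textChanged(String previousText, String newText) {
        return !Objects.equals(previousText, newText);
    }

    /** BLOCK_REVIEWED should fire only on the false -> true transition. */
    public static boolean reviewedTransition(boolean wasReviewed, boolean nowReviewed) {
        return !wasReviewed && nowReviewed;
    }
}
```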

  • Two annotation creation paths. AnnotationService has both createAnnotation() (user-drawn) and createOcrAnnotation() (OCR-generated). The issue references create() which matches neither. Clarify: should OCR-generated annotations emit ANNOTATION_CREATED? My recommendation: no — OCR annotations are system-generated, not user contributions. Only createAnnotation() should instrument ANNOTATION_CREATED.

  • Map<String, Object> payload → JPA JSONB mapping not specified. The AuditLog entity will need @JdbcTypeCode(SqlTypes.JSON) (Hibernate 6 / Spring Boot 4 style) on the payload field — not @Column(columnDefinition = "jsonb"). The issue omits this detail.

  • Store AuditKind directly on the entity. Using kind.name() in AuditService is fragile — it bypasses enum type safety. Use @Enumerated(EnumType.STRING) on AuditLog.kind and store the enum directly.
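Putting the last two points together, the entity mapping might look like the sketch below. Field list abbreviated; beyond the column names from V46, everything here (class shape, annotations placement) is an assumption, not the actual implementation:

```java
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.EnumType;
import jakarta.persistence.Enumerated;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;
import java.util.Map;
import java.util.UUID;

@Entity
@Table(name = "audit_log")
public class AuditLog {

    @Id
    @GeneratedValue
    private UUID id;

    // Stored as the enum's name by JPA — no manual .name() call in AuditService
    @Enumerated(EnumType.STRING)
    @Column(nullable = false, length = 50)
    private AuditKind kind;

    // Hibernate 6 JSON mapping; no columnDefinition = "jsonb" needed
    @JdbcTypeCode(SqlTypes.JSON)
    private Map<String, Object> payload;

    // actorId, documentId, entityType, entityId, happenedAt omitted for brevity
}
```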

Recommendations

  • Fix service name references: TranscriptionService.updateBlock() and TranscriptionService.reviewBlock() are the actual instrumentation points.
  • Capture old state before mutating for both TEXT_SAVED and BLOCK_REVIEWED guards — one line each, before the mutation.
  • Add @JdbcTypeCode(SqlTypes.JSON) to the payload field in the entity.
  • Use @Enumerated(EnumType.STRING) on AuditLog.kind — drop the manual .name() call in AuditService.log().
Author
Owner

🏛️ Markus Keller — Application Architect

Observations

  • @Async within @Transactional has a timing hazard — the biggest architectural risk in this issue. When auditService.log() is called from inside a @Transactional service method, Spring dispatches the audit write to the thread pool immediately — but the calling transaction may not have committed yet. The async thread opens its own transaction and tries to insert a row with a FK reference to document_id. If the calling transaction commits after the async thread reads, the FK lookup succeeds. If the async thread runs first (plausible under load), it sees a non-existent document_id and the FK constraint fires — the audit write fails and is silently swallowed by the catch block. This is not a theoretical edge case; it is the standard problem with @Async called from @Transactional.

    The established Spring fix is afterCommit synchronization:

    TransactionSynchronizationManager.registerSynchronization(
        new TransactionSynchronization() {
            @Override
            public void afterCommit() {
                auditService.log(kind, actorId, documentId, entityType, entityId, payload);
            }
        }
    );
    

    This queues the async dispatch for after the parent transaction commits, eliminating the hazard. For new document creation, this is essential; for updates to existing documents, it still prevents phantom reads.
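One caveat with the raw pattern: registerSynchronization() throws if no transaction is active. A small wrapper that falls back to immediate execution keeps callers outside @Transactional working — sketched below; the class name is illustrative:

```java
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;

// Hedged sketch wrapping the pattern above: defer to afterCommit when a
// transaction is active, run immediately otherwise.
public final class AfterCommitExecutor {

    public static void run(Runnable action) {
        if (TransactionSynchronizationManager.isSynchronizationActive()) {
            TransactionSynchronizationManager.registerSynchronization(
                new TransactionSynchronization() {
                    @Override
                    public void afterCommit() {
                        action.run(); // dispatches the @Async audit call post-commit
                    }
                });
        } else {
            action.run(); // no transaction in progress: no commit to wait for
        }
    }
}
```

Note afterCommit synchronizations that throw do not roll anything back, which matches the fire-and-forget intent here.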

  • entity_type / entity_id columns are premature complexity. Looking at the three dashboard queries in the issue, none of them use entity_type or entity_id — they query by kind, document_id, actor_id, and happened_at. These columns add EAV-style flexibility without a current consumer. Drop them from V46 and add them in a future migration when a concrete use case exists.

  • Dedicated audit/ package. AuditLog, AuditKind, AuditService, and AuditLogRepository should live in org.raddatz.familienarchiv.audit/ — not scattered across model/, service/, and repository/. Audit is a bounded domain. It owns its repository, its entity, and its service. Other services call AuditService by injection (service → service, never service → repository across domains).

  • The @Async executor is shared with all existing async work (MassImport, OCR progress notifications). See Tobias's note on pool sizing. From an architecture perspective, a dedicated executor bean for audit is correct — AuditService should declare @Async("auditExecutor") and AsyncConfig should register a second small pool.
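A minimal sketch of that second pool, assuming the existing AsyncConfig can host another executor bean; the pool sizes are placeholders, not tuned values:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
public class AsyncConfig {

    // Dedicated pool so audit writes never queue behind MassImport / OCR work.
    @Bean(name = "auditExecutor")
    public Executor auditExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(1);
        executor.setMaxPoolSize(2);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("audit-");
        // CallerRunsPolicy degrades to a synchronous write under saturation
        // instead of dropping the event (contrast: AbortPolicy on the shared pool).
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}
```

The CallerRunsPolicy choice here is a suggestion: it trades a small latency hit under extreme load for never silently losing an audit row, which also addresses Nora's AbortPolicy concern below.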

Recommendations

  • Use TransactionSynchronizationManager.registerSynchronization() + afterCommit() to defer async dispatch until the parent transaction is committed. This is the correct pattern — not a workaround.
  • Remove entity_type / entity_id from V46. The current query patterns don't need them. Add them later if a concrete audit-by-entity-type query is needed.
  • Create an audit/ package. No other package should import from audit/ repositories directly.

Open Decisions

  • entity_type / entity_id — keep or drop? The columns add flexibility (future query: "all events on block X") at the cost of schema noise and non-FK referential integrity. Options: (a) drop now and add in a future migration when needed — zero maintenance cost today; (b) keep — some implementation complexity, no FK guarantees, no current consumer. I lean toward (a), but if the block-level history UI is coming soon, keeping them avoids a migration later.
Author
Owner

🔒 Nora Steiner — Security Engineer

Observations

  • ON DELETE SET NULL on actor_id silently drops audit attribution. The schema uses actor_id REFERENCES app_users(id) ON DELETE SET NULL. When a user is deleted, every audit row they created becomes actor_id = NULL. This is GDPR-correct for right-to-erasure, but it means "what did this user do?" becomes unanswerable after deletion. This is a deliberate trade-off and should be documented explicitly in the migration file — not left as an implicit schema behavior.

  • The audit log must be append-only, enforced at the database layer. The application role (app_user) should have INSERT but not UPDATE or DELETE on audit_log. Without this, a bug (or compromised service account) could silently modify or purge audit events. Add this to V46:

    REVOKE UPDATE, DELETE ON audit_log FROM app_user;
    

    This is free defense and costs nothing to implement.

  • payload JSONB is an open-ended PII surface. The payload is untyped — any service method can pass any Map<String, Object>. If a developer accidentally passes the full transcription text (e.g., Map.of("text", block.getText())) in a TEXT_SAVED event, that content is now stored indefinitely in the audit log. The issue should enumerate, as explicit constants or Javadoc on AuditKind, exactly what payload fields are permitted for each event kind. This prevents accidental PII logging and makes future GDPR audits tractable.

  • Fire-and-forget + AbortPolicy = silently dropped security events. The existing AsyncConfig uses ThreadPoolExecutor.AbortPolicy, which throws RejectedExecutionException when the queue (capacity 10) is full. AuditService.log() catches Exception — so rejected audit tasks are caught, logged, and lost. For a security audit trail, silent gaps under load are a known risk. The issue acknowledges that "audit failure must never propagate to the caller" — which is correct — but the trade-off should be explicit: this audit log provides activity visibility for the dashboard, not tamper-evident security forensics. That's fine for this use case, but it should be stated.

  • No injection risk from JSONB. The payload is a Map<String, Object> serialized by Hibernate/Jackson — not user-controlled SQL strings injected into queries. The three example dashboard queries use parameterized JPQL / native queries. No injection surface here. ✓

Recommendations

  • Add to V46 migration: REVOKE UPDATE, DELETE ON audit_log FROM app_user; — makes the audit log append-only at the DB layer.
  • Add a comment to the migration explaining the ON DELETE SET NULL intent: "By design — GDPR right-to-erasure. Deleted users' events retain the event timestamp and kind but lose actor attribution."
  • Add Javadoc to AuditKind enum values specifying the allowed payload shape for each kind. Example:
    /** Payload: {@code {"oldStatus": "UPLOADED", "newStatus": "TRANSCRIBED"}} */
    STATUS_CHANGED,
    /** Payload: {@code {"pageNumber": 3}} */
    TEXT_SAVED,
    
    This makes the contract visible and prevents future PII leaks via payload.
Author
Owner

🧪 Sara Holt — QA Engineer

Observations

  • The issue's test estimate ("1 day for unit + integration per instrumented method") is plausible but needs a concrete test plan. Without specified test cases, this will likely produce happy-path-only coverage and miss the conditional logic that makes this feature correct.

  • @Async makes integration test synchronization non-trivial. Unit tests mock AuditService directly — no sync issues. Integration tests need to verify that the async write completes before asserting on the DB. Thread.sleep() is wrong here. Use Awaitility:

    await().atMost(5, SECONDS).until(() ->
        auditLogRepository.countByDocumentIdAndKind(docId, AuditKind.TEXT_SAVED.name()) == 1
    );
    
  • @Transactional on integration test methods will cause false positives for async writes. The standard @Transactional rollback pattern on @SpringBootTest tests will roll back the test's transaction — but the async audit write runs on a separate thread in a separate transaction and will NOT be rolled back. This means rows accumulate across tests, causing count-based assertions to fail on the second run. Use @Sql with an explicit DELETE FROM audit_log script (executionPhase = BEFORE_TEST_METHOD) instead of relying on rollback.
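The cleanup wiring could look like this sketch. The script path and class name are assumptions; the script would contain `DELETE FROM audit_log;`. Note that @Sql is processed by Spring's TestContext framework on test classes and test methods — it is not honored on @BeforeEach lifecycle methods:

```java
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.jdbc.Sql;

// Class-level @Sql with BEFORE_TEST_METHOD runs the cleanup script before
// every test, covering rows written by the async audit thread that survive
// transactional rollback.
@SpringBootTest
@Sql(scripts = "classpath:cleanup-audit-log.sql",
     executionPhase = Sql.ExecutionPhase.BEFORE_TEST_METHOD)
class AuditLogIntegrationTest {
    // each test starts from an empty audit_log, regardless of async writes
}
```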

  • Three conditional invariants need explicit test coverage:

    1. TEXT_SAVED fires when text changes — and does NOT fire when text is unchanged
    2. BLOCK_REVIEWED fires when reviewed flips from false → true — and does NOT fire on true → false
    3. METADATA_UPDATED vs. STATUS_CHANGED split in updateDocument — the correct kind is emitted for each scenario
  • The dashboard queries should be tested now, not deferred. The three example SQL queries in the issue are the whole point of this feature. Write integration tests against a real Postgres container that seed audit rows and verify the query results — proving the schema actually supports the dashboard before #271 implementation begins.

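The Awaitility pattern above boils down to re-checking a condition until a deadline. A minimal plain-Java sketch of that pattern (the `PollUntil` helper is hypothetical; real tests should depend on the Awaitility library itself):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;

// Minimal stand-in for Awaitility's await().atMost(...).until(...), shown
// only to make the polling pattern concrete. PollUntil is a hypothetical
// helper name, not part of the project.
public final class PollUntil {

    public static void pollUntil(Callable<Boolean> condition, Duration timeout)
            throws Exception {
        Instant deadline = Instant.now().plus(timeout);
        while (!condition.call()) {                 // re-check until true
            if (Instant.now().isAfter(deadline)) {
                throw new AssertionError("condition not met within " + timeout);
            }
            Thread.sleep(50);                       // poll interval
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Simulated async write: the condition becomes true after ~200 ms,
        // like an audit row appearing from the @Async thread.
        pollUntil(() -> System.currentTimeMillis() - start > 200,
                  Duration.ofSeconds(5));
        System.out.println("condition met");
    }
}
```

Unlike a fixed `Thread.sleep()`, polling returns as soon as the row appears and only fails after the full timeout, which keeps the test both fast and non-flaky.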
### Recommendations

**Test plan:**

| Layer | Test class | What it covers |
|---|---|---|
| Unit | `AuditServiceTest` | `log()` saves to repo; caught exceptions don't propagate |
| Unit | `TranscriptionServiceTest` | `TEXT_SAVED` only on text change; `BLOCK_REVIEWED` only on false→true |
| Unit | `DocumentServiceTest` | `METADATA_UPDATED` vs. `STATUS_CHANGED` discrimination |
| Integration | `AuditLogIntegrationTest` | Async write completes (Awaitility); FK constraints hold; append-only (no update/delete) |
| Integration | `AuditDashboardQueryTest` | Family Pulse, Activity Feed, and Resume Card queries return correct results |
- Use `@Sql("classpath:cleanup-audit-log.sql")` on `@BeforeEach` to handle async cross-test contamination.
- Verify the `REVOKE UPDATE, DELETE` constraint in `AuditLogIntegrationTest` by asserting that an attempt to update an audit row throws `DataAccessException`.
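The first invariant in the test plan (`TEXT_SAVED` only on an actual text change) reduces to a change check before emitting. A minimal sketch, with a hypothetical `saveText` signature rather than the project's real service code:

```java
import java.util.Objects;
import java.util.function.Consumer;

// Sketch of the conditional invariant the unit tests must pin down:
// the audit event is emitted only when the text actually changed.
// saveText and the Consumer stand-in for AuditService are illustrative,
// not the project's real signatures.
public class SaveTextSketch {

    static String saveText(String current, String incoming, Consumer<String> audit) {
        if (!Objects.equals(current, incoming)) {
            audit.accept("TEXT_SAVED");     // fires only on a real change
        }
        return incoming;
    }

    public static void main(String[] args) {
        final int[] events = {0};
        Consumer<String> audit = kind -> events[0]++;

        saveText("old text", "new text", audit);   // change    -> one event
        saveText("new text", "new text", audit);   // no change -> no event

        System.out.println("events emitted: " + events[0]);   // prints 1
    }
}
```

A unit test then asserts both branches: one invocation with changed text records exactly one event, and a second invocation with identical text records none.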
Author
Owner

## 🚀 Tobias Wendt — DevOps & Platform Engineer

### Observations

- **The existing async executor is shared and undersized for audit load.** `AsyncConfig` configures a single `taskExecutor` with `maxPoolSize(2)` and `queueCapacity(10)`, using `AbortPolicy`. This pool already serves `MassImportService` and OCR progress notifications. Adding high-frequency audit writes (every `TEXT_SAVED`, every `BLOCK_REVIEWED`) to this pool means:

  - Under a transcription session where a user saves 15 blocks, the queue fills up and additional `TEXT_SAVED` events are rejected
  - `RejectedExecutionException` is caught and merely logged — audit events are dropped without any aggregate, visible signal
  - One slow async import job effectively blocks all audit writes

  The fix is a dedicated named executor for audit:

  ```java
  @Bean("auditExecutor")
  public Executor auditExecutor() {
      ThreadPoolTaskExecutor e = new ThreadPoolTaskExecutor();
      e.setCorePoolSize(1);
      e.setMaxPoolSize(2);
      e.setQueueCapacity(50);
      e.setThreadNamePrefix("Audit-");
      e.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
      return e;
  }
  ```

  And on `AuditService`: `@Async("auditExecutor")`. Note `CallerRunsPolicy` rather than `AbortPolicy` — under queue saturation, the caller thread handles the write synchronously rather than dropping the event. This is an acceptable degradation for a non-critical audit trail.

- **`audit_log` will grow unboundedly — estimate row volume now.** The issue mentions a retention policy as future work. Before shipping, it's worth a rough estimate: 5 active users × 20 transcription saves/day × 365 days ≈ 36k rows/year. At ~500 bytes/row (UUID columns + JSONB payload), that's ~18MB/year — entirely negligible. The retention policy can stay as future work; the table won't cause operational problems for years.

- **No observability for dropped events.** Currently, if `AuditService.log()` catches a `RejectedExecutionException` (dropped by `AbortPolicy`), it logs at `ERROR` level but nothing counts it. Adding a simple Micrometer counter would make this visible in Grafana:

  ```java
  meterRegistry.counter("audit.events.dropped", "kind", kind.name()).increment();
  ```

  Not essential for V1, but worth a follow-up issue.

- **Migration numbering is correct.** The latest migration is V45 (`add_invite_tokens`), so V46 is the right version number. ✓

- **No Docker Compose or CI changes needed for this feature.** The audit log is PostgreSQL-native — no new service, no new port, no infrastructure change. ✓

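The `CallerRunsPolicy` degradation described above can be demonstrated with plain `java.util.concurrent`. This is a self-contained sketch, not the Spring bean; pool and queue sizes are shrunk to force saturation quickly:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Shows the behaviour the recommendation relies on: once the queue is
// full, CallerRunsPolicy makes the submitting thread execute the task
// itself instead of the task being rejected and lost.
public class CallerRunsDemo {

    static int runOnCallerCount() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(2),               // tiny queue to force saturation
                new ThreadPoolExecutor.CallerRunsPolicy());

        AtomicInteger ranOnCaller = new AtomicInteger();
        Thread caller = Thread.currentThread();

        for (int i = 0; i < 20; i++) {
            pool.execute(() -> {
                if (Thread.currentThread() == caller) {
                    ranOnCaller.incrementAndGet();         // fell back to the caller
                }
                try { Thread.sleep(10); } catch (InterruptedException ignored) { }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return ranOnCaller.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // With AbortPolicy some of these submissions would throw
        // RejectedExecutionException; with CallerRunsPolicy none are lost.
        System.out.println("tasks run on caller thread: " + runOnCallerCount());
    }
}
```

For audit writes this is the right trade-off: an occasional synchronous write on the request thread beats silently losing events.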
### Recommendations

- Add a dedicated `@Bean("auditExecutor")` in `AsyncConfig` with `queueCapacity(50)` and `CallerRunsPolicy` — keeps audit writes isolated from import/OCR async work and degrades gracefully under load instead of silently dropping events.
- File a follow-up issue for the Micrometer dropped-event counter — not a blocker, but worth tracking.
Author
Owner

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist

No UX concerns from my angle. This is a pure backend data infrastructure issue — no UI surface is affected, and the issue correctly scopes out both the audit log UI and the dashboard itself as separate issues.

One observation worth capturing for when the dashboard (#271) is designed: the `AuditKind` vocabulary (`TEXT_SAVED`, `BLOCK_REVIEWED`, `ANNOTATION_CREATED`) is developer-facing. The dashboard UI will need to translate these into human-readable, audience-appropriate phrasing — "Marcel hat 3 Seiten transkribiert" ("Marcel transcribed 3 pages") rather than "3 TEXT_SAVED events". That translation layer belongs in the dashboard issue, not here, but the semantic event model in this issue (domain events rather than raw column diffs) is exactly the right foundation for it. Good call on that design choice.

Author
Owner

## 🗳️ Decision Queue — Action Required

_1 decision needs your input before implementation starts._

### Architecture

- **`entity_type` / `entity_id` columns — keep or drop from V46?** None of the three dashboard queries use these columns; they query by `kind`, `document_id`, `actor_id`, and `happened_at` only. Keeping them adds EAV-style flexibility (future: "all events on block X") but without FK guarantees and with no current consumer. Options: (a) drop them now — zero cost, add in a future migration when a concrete query needs them; (b) keep them — avoids a future migration if the block-level history UI (#274 is already scoped as "Audit log UI — separate issue") comes soon. _(Raised by: Markus)_
Author
Owner

We drop the columns
Reference: marcel/familienarchiv#274