feat(audit): domain-level audit log for archive activity #274
## Problem
Several upcoming features — the dashboard Family Pulse (#271), the activity feed (#271), and future user contribution stats — need to answer questions like "who worked on which document, and when?"
The current model cannot answer these questions reliably.
`Document.updatedAt` fires on every field change (metadata, status, system updates) and carries no actor context. There is no way to distinguish "user added a sender" from "OCR service updated the transcription field."

## Solution
A dedicated `audit_log` table, populated by the service layer via an `@Async` `AuditService`. Domain-level semantic events (`TEXT_SAVED`, `METADATA_UPDATED`) rather than raw column diffs.

## Data Model
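The migration itself was not captured in this export; the following is a sketch of what V46 might look like, inferred from the columns discussed in this issue (the `documents` table name and index names are assumptions):

```sql
-- V46 (sketch): domain-level audit log
CREATE TABLE audit_log (
    id          UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id UUID        NOT NULL REFERENCES documents(id),
    actor_id    UUID        REFERENCES app_users(id) ON DELETE SET NULL,
    kind        VARCHAR(64) NOT NULL,   -- semantic AuditKind name
    payload     JSONB,                  -- optional, shape varies per event kind
    happened_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_audit_log_document ON audit_log (document_id);
CREATE INDEX idx_audit_log_happened ON audit_log (happened_at);
```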
- `document_id` is denormalized on every row so "all events for document X" is a single index scan.
- `kind` is a semantic application label — queryable and human-readable.
- `payload JSONB` is optional per event type (e.g. `{ "oldStatus": "UPLOADED", "newStatus": "TRANSCRIBED" }` for status changes).

## AuditKind Enum
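The six kinds named in this issue suggest an enum along these lines (a sketch; the actual enum may define more values):

```java
public enum AuditKind {
    FILE_UPLOADED,
    METADATA_UPDATED,
    STATUS_CHANGED,
    TEXT_SAVED,
    BLOCK_REVIEWED,
    ANNOTATION_CREATED
}
```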
## Instrumentation Points (5 service methods)
| Service | Method | Event |
| --- | --- | --- |
| `DocumentService` | `updateDocument()` | `METADATA_UPDATED` or `STATUS_CHANGED` (check which fields changed) |
| `DocumentService` | (not specified) | `FILE_UPLOADED` on PLACEHOLDER → UPLOADED |
| `TranscriptionBlockService` | `save()` / `update()` | `TEXT_SAVED` only when the `text` field actually changes |
| `TranscriptionBlockService` | `markReviewed()` | `BLOCK_REVIEWED` when `reviewed` flips to `true` |
| `AnnotationService` | `create()` | `ANNOTATION_CREATED` |

## AuditService
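The service code was not captured in this export; a minimal sketch of the fire-and-forget shape described in this section (the method signature, field names, and constructor are assumptions, not the actual code):

```java
// Sketch only — assumes Spring's @Async and a standard Spring Data repository.
@Service
public class AuditService {

    private static final Logger log = LoggerFactory.getLogger(AuditService.class);
    private final AuditLogRepository repository;

    public AuditService(AuditLogRepository repository) {
        this.repository = repository;
    }

    @Async
    public void log(AuditKind kind, UUID documentId, UUID actorId, Map<String, Object> payload) {
        try {
            repository.save(new AuditLog(kind, documentId, actorId, payload));
        } catch (Exception e) {
            // Fire-and-forget: audit failures are logged, never rethrown to the caller.
            log.error("Audit write failed for {} on document {}", kind, documentId, e);
        }
    }
}
```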
`@Async` is critical: audit writes must never slow down the user-facing operation. Fire-and-forget. Failures are logged but not rethrown.

## How This Unlocks the Dashboard (#271)
Once this is in, all three dashboard endpoints become simple queries:
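The three queries were not captured in this export; for illustration, one of them might look like this (an assumption, not the actual endpoint query):

```sql
-- e.g. activity feed: most recent events across the archive
SELECT kind, document_id, actor_id, happened_at
FROM audit_log
ORDER BY happened_at DESC
LIMIT 20;
```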
## Effort Estimate
- `AuditLog` entity, `AuditLogRepository`, `AuditService`

## Out of Scope
- Retention policy, e.g. `DELETE WHERE happened_at < now() - interval '1 year'`

## Prerequisite for
## Design decision: `pageNumber` in audit event payload

Following discussion on #271, the Family Pulse headline stat is "N Seiten bearbeitet" ("N pages worked on") — counting distinct pages worked on this week. To avoid a join at query time, `pageNumber` must be written into the `payload` JSONB at event creation time.

Affected event kinds:

| Event kind | Where `pageNumber` comes from |
| --- | --- |
| `ANNOTATION_CREATED` | `DocumentAnnotation.pageNumber` — available directly at creation time |
| `TEXT_SAVED` | `TranscriptionBlock.annotationId` → `DocumentAnnotation.pageNumber` — must be resolved in the service before logging |

Payload shape for these two events:
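The payload example itself was not captured in this export; from the description above it is presumably just:

```json
{ "pageNumber": 3 }
```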
Query this enables (no joins):
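A sketch of the no-join weekly stat (column names per this issue; the exact week-window definition is an assumption):

```sql
SELECT COUNT(DISTINCT (document_id, payload->>'pageNumber')) AS pages_worked_on
FROM audit_log
WHERE kind IN ('TEXT_SAVED', 'ANNOTATION_CREATED')
  AND happened_at >= date_trunc('week', now());
```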
Implementation note: when logging `TEXT_SAVED`, the service must load the parent `DocumentAnnotation` to resolve `pageNumber` before calling `AuditService.log()`. This is a single lookup by `annotationId` and can be done within the same transaction as the block save.

## 👨💻 Felix Brandt — Senior Fullstack Developer
### Observations
**Wrong service name in the spec.** The issue references `TranscriptionBlockService` throughout, but the actual class is `TranscriptionService` (`service/TranscriptionService.java`). The instrumentation table needs updating before implementation begins.

**`reviewBlock()` toggles, it doesn't set.** `TranscriptionService.reviewBlock()` does `block.setReviewed(!block.isReviewed())` — it flips both ways. The `BLOCK_REVIEWED` event must only fire when transitioning from `false → true`. Implementation needs to capture the old value before the flip (e.g. `boolean wasReviewed = block.isReviewed()` before the toggle, then audit only if `!wasReviewed`).

**`TEXT_SAVED` "only when text actually changes" needs an explicit diff.** `updateBlock()` always calls `block.setText(text)` without comparing old vs. new. The implementation must capture `String previousText = block.getText()` before the update and skip the audit call if unchanged.

**Two annotation creation paths.** `AnnotationService` has both `createAnnotation()` (user-drawn) and `createOcrAnnotation()` (OCR-generated). The issue references `create()`, which matches neither. Clarify: should OCR-generated annotations emit `ANNOTATION_CREATED`? My recommendation: no — OCR annotations are system-generated, not user contributions. Only `createAnnotation()` should instrument `ANNOTATION_CREATED`.

**`Map<String, Object>` payload → JPA JSONB mapping not specified.** The `AuditLog` entity will need `@JdbcTypeCode(SqlTypes.JSON)` (Hibernate 6 / Spring Boot 4 style) on the `payload` field — not `@Column(columnDefinition = "jsonb")`. The issue omits this detail.

**Store `AuditKind` directly on the entity.** Using `kind.name()` in `AuditService` is fragile — it bypasses enum type safety. Use `@Enumerated(EnumType.STRING)` on `AuditLog.kind` and store the enum directly.

### Recommendations
- `TranscriptionService.updateBlock()` and `TranscriptionService.reviewBlock()` are the actual instrumentation points.
- Add the `TEXT_SAVED` and `BLOCK_REVIEWED` guards — one line each, before the mutation.
- Add `@JdbcTypeCode(SqlTypes.JSON)` to the `payload` field in the entity.
- Use `@Enumerated(EnumType.STRING)` on `AuditLog.kind` — drop the manual `.name()` call in `AuditService.log()`.

## 🏛️ Markus Keller — Application Architect
### Observations
**`@Async` within `@Transactional` has a timing hazard** — the biggest architectural risk in this issue. When `auditService.log()` is called from inside a `@Transactional` service method, Spring dispatches the audit write to the thread pool immediately — but the calling transaction may not have committed yet. The async thread opens its own transaction and tries to insert a row with a FK reference to `document_id`. If the calling transaction commits before the async thread reads, the FK lookup succeeds. If the async thread runs first (plausible under load), it sees a non-existent `document_id` and the FK constraint fires — the audit write fails and is silently swallowed by the catch block. This is not a theoretical edge case; it is the standard problem with `@Async` called from `@Transactional`.

The established Spring fix is `afterCommit` synchronization. This queues the async dispatch for after the parent transaction commits, eliminating the hazard. For new document creation, this is essential; for updates to existing documents, it still prevents phantom reads.
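A sketch of that pattern, using Spring's `TransactionSynchronizationManager` (the `log(...)` arguments are placeholders):

```java
// Inside the @Transactional service method: defer the async audit dispatch
// until the surrounding transaction has committed.
TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
    @Override
    public void afterCommit() {
        auditService.log(kind, documentId, actorId, payload);
    }
});
```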
**`entity_type` / `entity_id` columns are premature complexity.** Looking at the three dashboard queries in the issue, none of them use `entity_type` or `entity_id` — they query by `kind`, `document_id`, `actor_id`, and `happened_at`. These columns add EAV-style flexibility without a current consumer. Drop them from V46 and add them in a future migration when a concrete use case exists.

**Dedicated `audit/` package.** `AuditLog`, `AuditKind`, `AuditService`, and `AuditLogRepository` should live in `org.raddatz.familienarchiv.audit/` — not scattered across `model/`, `service/`, and `repository/`. Audit is a bounded domain. It owns its repository, its entity, and its service. Other services call `AuditService` by injection (service → service, never service → repository across domains).

**The `@Async` executor is shared** with all existing async work (MassImport, OCR progress notifications). See Tobias's note on pool sizing. From an architecture perspective, a dedicated executor bean for audit is correct — `AuditService` should declare `@Async("auditExecutor")` and `AsyncConfig` should register a second small pool.

### Recommendations
- Use `TransactionSynchronizationManager.registerSynchronization()` + `afterCommit()` to defer async dispatch until the parent transaction is committed. This is the correct pattern — not a workaround.
- Drop `entity_type` / `entity_id` from V46. The current query patterns don't need them. Add them later if a concrete audit-by-entity-type query is needed.
- Dedicated `audit/` package. No other package should import from `audit/` repositories directly.

### Open Decisions
**`entity_type` / `entity_id` — keep or drop?** The columns add flexibility (future query: "all events on block X") at the cost of schema noise and non-FK referential integrity. Options: (a) drop now and add in a future migration when needed — zero maintenance cost today; (b) keep — some implementation complexity, no FK guarantees, no current consumer. I lean toward (a), but if the block-level history UI is coming soon, keeping them avoids a migration later.

## 🔒 Nora Steiner — Security Engineer
### Observations
**`ON DELETE SET NULL` on `actor_id` silently drops audit attribution.** The schema uses `actor_id REFERENCES app_users(id) ON DELETE SET NULL`. When a user is deleted, every audit row they created becomes `actor_id = NULL`. This is GDPR-correct for right-to-erasure, but it means "what did this user do?" becomes unanswerable after deletion. This is a deliberate trade-off and should be documented explicitly in the migration file — not left as an implicit schema behavior.

**The audit log must be append-only, enforced at the database layer.** The application role (`app_user`) should have `INSERT` but not `UPDATE` or `DELETE` on `audit_log`. Without this, a bug (or compromised service account) could silently modify or purge audit events. Add this to V46; it is free defense and costs nothing to implement.
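Concretely, the V46 addition is a one-liner:

```sql
-- Deliberately append-only: the application role can add audit rows but never change them.
REVOKE UPDATE, DELETE ON audit_log FROM app_user;
```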
**`payload JSONB` is an open-ended PII surface.** The payload is untyped — any service method can pass any `Map<String, Object>`. If a developer accidentally passes the full transcription text (e.g., `Map.of("text", block.getText())`) in a `TEXT_SAVED` event, that content is now stored indefinitely in the audit log. The issue should enumerate, as explicit constants or Javadoc on `AuditKind`, exactly what payload fields are permitted for each event kind. This prevents accidental PII logging and makes future GDPR audits tractable.

**Fire-and-forget + `AbortPolicy` = silently dropped security events.** The existing `AsyncConfig` uses `ThreadPoolExecutor.AbortPolicy`, which throws `RejectedExecutionException` when the queue (capacity 10) is full. `AuditService.log()` catches `Exception` — so rejected audit tasks are caught, logged, and lost. For a security audit trail, silent gaps under load are a known risk. The issue acknowledges that "audit failure must never propagate to the caller" — which is correct — but the trade-off should be explicit: this audit log provides activity visibility for the dashboard, not tamper-evident security forensics. That's fine for this use case, but it should be stated.

**No injection risk from JSONB.** The `payload` is a `Map<String, Object>` serialized by Hibernate/Jackson — not user-controlled SQL strings injected into queries. The three example dashboard queries use parameterized JPQL / native queries. No injection surface here. ✓

### Recommendations
- `REVOKE UPDATE, DELETE ON audit_log FROM app_user;` — makes the audit log append-only at the DB layer.
- Document the `ON DELETE SET NULL` intent in the migration: "By design — GDPR right-to-erasure. Deleted users' events retain the event timestamp and kind but lose actor attribution."
- Add Javadoc to the `AuditKind` enum values specifying the allowed payload shape for each kind, e.g. on `STATUS_CHANGED`: payload keys `oldStatus` and `newStatus` only.

## 🧪 Sara Holt — QA Engineer
### Observations
The issue's test estimate ("1 day for unit + integration per instrumented method") is plausible but needs a concrete test plan. Without specified test cases, this will likely produce happy-path-only coverage and miss the conditional logic that makes this feature correct.
**`@Async` makes integration test synchronization non-trivial.** Unit tests mock `AuditService` directly — no sync issues. Integration tests need to verify that the async write completes before asserting on the DB. `Thread.sleep()` is wrong here; use Awaitility.

**`@Transactional` on integration test methods will cause false positives for async writes.** The standard `@Transactional` rollback pattern on `@SpringBootTest` tests will roll back the test's transaction — but the async audit write runs on a separate thread in a separate transaction and will NOT be rolled back. This means rows accumulate across tests, causing count-based assertions to fail on the second run. Use `@Sql` with an explicit `DELETE FROM audit_log` in `@BeforeEach` instead of relying on rollback.

**Three conditional invariants need explicit test coverage:**
- `TEXT_SAVED` fires when text changes — and does NOT fire when text is unchanged
- `BLOCK_REVIEWED` fires when reviewed flips from `false → true` — and does NOT fire on `true → false`
- The `METADATA_UPDATED` vs. `STATUS_CHANGED` split in `updateDocument` — the correct kind is emitted for each scenario

**The dashboard queries should be tested now, not deferred.** The three example SQL queries in the issue are the whole point of this feature. Write integration tests against a real Postgres container that seed audit rows and verify the query results — proving the schema actually supports the dashboard before #271 implementation begins.
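The Awaitility wait mentioned above might look like this (repository method, AssertJ matcher, and timeout are assumptions):

```java
// Wait for the @Async audit write to land before asserting on the DB.
await().atMost(Duration.ofSeconds(2))
       .untilAsserted(() ->
           assertThat(auditLogRepository.findAll()).hasSize(1));
```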
### Recommendations
Test plan:
| Test class | Covers |
| --- | --- |
| `AuditServiceTest` | `log()` saves to repo; caught exceptions don't propagate |
| `TranscriptionServiceTest` | `TEXT_SAVED` only on text change; `BLOCK_REVIEWED` only on false→true |
| `DocumentServiceTest` | `METADATA_UPDATED` vs. `STATUS_CHANGED` discrimination |
| `AuditLogIntegrationTest` | async write lands in the DB |
| `AuditDashboardQueryTest` | dashboard queries against seeded rows in a real Postgres container |

- Use `@Sql("classpath:cleanup-audit-log.sql")` on `@BeforeEach` to handle async cross-test contamination.
- Verify the `REVOKE UPDATE, DELETE` constraint in `AuditLogIntegrationTest` by asserting that an attempt to update an audit row throws `DataAccessException`.

## 🚀 Tobias Wendt — DevOps & Platform Engineer
### Observations
**The existing async executor is shared and undersized for audit load.** `AsyncConfig` configures a single `taskExecutor` with `maxPoolSize(2)` and `queueCapacity(10)`, using `AbortPolicy`. This pool already serves `MassImportService` and OCR progress notifications. Adding high-frequency audit writes (every `TEXT_SAVED`, every `BLOCK_REVIEWED`) to this pool means:

- under load, `TEXT_SAVED` events are rejected
- the `RejectedExecutionException` is caught silently — audit events are dropped without any visible signal

The fix is a dedicated named executor for audit.
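A sketch of such a bean in `AsyncConfig` (pool sizes other than the queue capacity of 50 are assumptions):

```java
@Bean("auditExecutor")
public Executor auditExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(1);      // assumption: audit writes are small and quick
    executor.setMaxPoolSize(2);
    executor.setQueueCapacity(50);
    executor.setThreadNamePrefix("audit-");
    // Under saturation, the caller runs the write synchronously instead of dropping it.
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    executor.initialize();
    return executor;
}
```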
On `AuditService`: `@Async("auditExecutor")`. Note `CallerRunsPolicy` rather than `AbortPolicy` — under queue saturation, the caller thread handles the write synchronously rather than dropping the event. This is an acceptable degradation for a non-critical audit trail.

**`audit_log` will grow unboundedly — estimate row volume now.** The issue mentions a retention policy as future work. Before shipping, it's worth a rough estimate: 5 active users × 20 transcription saves/day × 365 days ≈ 36,500 rows/year. At ~500 bytes/row (UUID columns + JSONB payload), that's ~18 MB/year — entirely negligible. The retention policy can stay as future work; the table won't cause operational problems for years.

**No observability for dropped events.** Currently, if `AuditService.log()` catches a `RejectedExecutionException` (dropped by `AbortPolicy`), it logs at `ERROR` level but nothing counts it. Adding a simple Micrometer counter would make this visible in Grafana. Not essential for V1, but worth a follow-up issue.
**Migration numbering is correct.** Latest migration is V45 (`add_invite_tokens`), so V46 is the right version number. ✓

**No Docker Compose or CI changes needed for this feature.** The audit log is PostgreSQL-native — no new service, no new port, no infrastructure change. ✓
### Recommendations
- Add a dedicated `@Bean("auditExecutor")` in `AsyncConfig` with `queueCapacity(50)` and `CallerRunsPolicy` — keeps audit writes isolated from import/OCR async work and degrades gracefully under load instead of silently dropping events.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist
No UX concerns from my angle. This is a pure backend data infrastructure issue — no UI surface is affected, and the issue correctly scopes out both the audit log UI and the dashboard itself as separate issues.
One observation worth capturing for when the dashboard (#271) is designed: the `AuditKind` vocabulary (`TEXT_SAVED`, `BLOCK_REVIEWED`, `ANNOTATION_CREATED`) is developer-facing. The dashboard UI will need to translate these into human-readable, audience-appropriate phrasing — "Marcel hat 3 Seiten transkribiert" ("Marcel transcribed 3 pages") rather than "3 TEXT_SAVED events". That translation layer belongs in the dashboard issue, not here, but the semantic event model in this issue (domain events rather than raw column diffs) is exactly the right foundation for it. Good call on that design choice.

## 🗳️ Decision Queue — Action Required
1 decision needs your input before implementation starts.
### Architecture
**`entity_type` / `entity_id` columns — keep or drop from V46?** None of the three dashboard queries use these columns; they query by `kind`, `document_id`, `actor_id`, and `happened_at` only. Keeping them adds EAV-style flexibility (future: "all events on block X") but without FK guarantees and with no current consumer. Options: (a) drop them now — zero cost, add in a future migration when a concrete query needs them; (b) keep them — avoids a future migration if the block-level history UI comes soon (#274 is already scoped as "Audit log UI — separate issue"). (Raised by: Markus)

**Decision:** we drop the columns.