From 823735b09aff3be4427d15f63449c07fe50ac863 Mon Sep 17 00:00:00 2001
From: Marcel
Date: Tue, 5 May 2026 13:26:14 +0200
Subject: [PATCH] chore: remove .agent planning docs from branch

These are LLM-generated planning documents for a different issue (import pipeline work), unrelated to the domain packaging refactor.

Co-Authored-By: Claude Sonnet 4.6
---
 .agent/PLAN.md         | 274 ------------------------------------
 .agent/current-plan.md | 305 -----------------------------------------
 2 files changed, 579 deletions(-)
 delete mode 100644 .agent/PLAN.md
 delete mode 100644 .agent/current-plan.md

diff --git a/.agent/PLAN.md b/.agent/PLAN.md
deleted file mode 100644
index 844307f4..00000000
--- a/.agent/PLAN.md
+++ /dev/null
@@ -1,274 +0,0 @@
-# Import Pipeline: ODS Alignment Plan
-
-## Context
-
-The real data source is an ODS spreadsheet (`zzfamilienarchiv Walter und Eugenie 2025-04-10.ods`) with 1,508 rows and 14 columns, living alongside PDF files (`W-0001.pdf`, `C-0451.pdf`, etc.) in `familienarchiv_raw/`. The existing import pipeline was built speculatively without seeing the actual data. It has several structural mismatches that need to be resolved before any real import can run.
-
-`ExcelService` (the web-upload import path) will be **deleted entirely**. The only import path is `MassImportService`, which reads an ODS file from the `/import` directory on the filesystem. This simplifies the scope significantly.
-
----
-
-## What the ODS Actually Contains
-
-| Col | Header               | Example value                            | Action          |
-|-----|----------------------|------------------------------------------|-----------------|
-| 0   | Index                | `W-0001`                                 | → `originalFilename` (+ `.pdf`) |
-| 1   | Box                  | `V`                                      | → `archiveBox` (new field) |
-| 2   | Mappe                | `1`                                      | → `archiveFolder` (new field) |
-| 3   | Von                  | `Walter de Gruyter`                      | → `sender` (Person) |
-| 4   | BriefeschreiberIn    | `Walter de Gruyter`                      | Ignored (redundant with col 3) |
-| 5   | An                   | `Eugenie de Gruyter geb. Müller`         | → `receivers` (Person, parse multi) |
-| 6   | EmpfängerIn          | `Eugenie Müller`                         | Ignored (redundant with col 5) |
-| 7   | Datum                | `1888-02-15` (ISO date string)           | → `documentDate` |
-| 8   | Datum Originalformat | `15.2.1888`                              | Ignored |
-| 9   | Ort                  | `Rotterdam`                              | → `location` |
-| 10  | Schlagwort           | `Brautbriefe`                            | → `tags` |
-| 11  | Inhalt               | `Geschäftsreise`                         | → `summary` |
-| 12  | Zeitlicher Kontext   | `Brautbriefe von Walter...`              | Skipped (no clear mapping) |
-| 13  | Transkript           | (mostly empty for now)                   | → `transcription` |
-
----
-
-## Changes
-
-### 1. Delete ExcelService
-
-`ExcelService.java` is deleted. All references to it (in `AdminController` or wherever it is injected) are removed. Going forward, `MassImportService` is the sole import mechanism. The web-upload flow that previously called `ExcelService` is removed from the controller.
-
-**Why:** The user confirmed the ODS-from-filesystem path is the only import workflow. Keeping dead code would create maintenance confusion.
-
----
-
-### 2. File Format: ODS support via WorkbookFactory
-
-**Current behaviour:** `MassImportService` constructs `new XSSFWorkbook(inputStream)`, which only handles `.xlsx`. The ODS file throws immediately.
-
-**Fix:** Replace with `WorkbookFactory.create(fis)`. Apache POI 5.x's `WorkbookFactory` auto-detects the format and handles `.xlsx`, `.xls`, and `.ods` without any extra dependencies. Also update `findExcelFile()` which currently filters by `.endsWith(".xlsx")` — change the filter to accept `.ods`, `.xlsx`, and `.xls`.
-
-**Why not add `odftoolkit`?** We already have `poi` and `poi-ooxml` at 5.5.0. `WorkbookFactory` covers this case. A second spreadsheet library would be redundant.
-
----
-
-### 3. Column Index Defaults
-
-**Current defaults (wrong):**
-```
-app.import.excel.col.filename=0 date=1 location=2 transcription=3
-```
-
-**Correct indices:**
-```
-filename=0 box=1 folder=2 sender=3 receivers=5 date=7 location=9 tags=10 summary=11 transcription=13
-```
-
-**Fix:** Update `@Value` defaults in `MassImportService` and set explicit values in `application.properties`. Remove the old defaults from `ExcelService` (which is deleted). Rename the property prefix from `app.import.excel.col.*` to `app.import.col.*` since the format is no longer Excel-specific.
-
----
-
-### 4. Filename Resolution: Index → PDF
-
-**Current behaviour:** Cell value used directly as `originalFilename`.
-
-**Actual situation:** Col 0 is the bare index (e.g., `W-0001`). PDF files are named `W-0001.pdf`. The import must append `.pdf`.
-
-**Fix:** After reading col 0, append `.pdf` if the value contains no `.`:
-```java
-if (!filename.contains(".")) filename = filename + ".pdf";
-```
-
----
-
-### 5. Document Title: German Date Format
-
-**Current behaviour:** Title is set to the raw filename, e.g. `W-0001.pdf`.
-
-**Fix:** Build title from `{Index} – {date in German format} – {location}`. Use `DateTimeFormatter` with locale `de`:
-```
-W-0001 – 15. Februar 1888 – Rotterdam
-```
-If date is missing, omit date segment. If location is missing, omit location segment. The index alone is acceptable as a minimum title.
-
-**German month formatting:** Use `DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN)`.
-
----
-
-### 6. Date Parsing: Add String Fallback
-
-**Current behaviour:** Only handles numeric date-formatted cells (`DateUtil.isCellDateFormatted()`).
-
-**Actual data:** Col 7 contains ISO date strings (`1888-02-15`) stored as text in LibreOffice ODS. These have `CellType.STRING`, so the existing code silently produces `null` dates for every row.
-
-**Fix:** Extract a helper method `parseDate(Cell)`:
-```java
-private LocalDate parseDate(Cell cell) {
-    if (cell == null) return null;
-    if (cell.getCellType() == CellType.NUMERIC && DateUtil.isCellDateFormatted(cell))
-        return cell.getDateCellValue().toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
-    if (cell.getCellType() == CellType.STRING) {
-        try { return LocalDate.parse(cell.getStringCellValue().trim()); }
-        catch (DateTimeParseException e) { return null; }
-    }
-    return null;
-}
-```
-
----
-
-### 7. Sender: Text → Person (lookup-or-create)
-
-**Current behaviour:** Sender is never set.
-
-**Actual data:** Col 3 (`Von`) is always a single name string, e.g. `Walter de Gruyter`, `Eugenie de Gruyter geb. Müller`.
-
-**Fix:** Extract a `findOrCreatePerson(String rawName)` helper:
-1. Look up by `alias` exact match (case-insensitive). Use a new repository method `findByAliasIgnoreCase(String)` on `PersonRepository`.
-2. If not found, create with:
-   - `alias` = full raw string
-   - `firstName` / `lastName` = best-effort split (see §9 below)
-3. Return the `Person` and set on `document.setSender(...)`.
-
----
-
-### 8. Receivers: Text → Person(s) with Normalization
-
-**Current behaviour:** Receivers are never set.
-
-**Actual data (exhaustive set of multi-receiver patterns):**
-```
-'Clara Cram u Ellen B-M'
-'Clara u Familie'
-'Clara u Herbert Cram'
-'Ella u Walter Dieckmann'
-'Eugenie u Walter de Gruyter'
-'Hedi und Tutu (Gruber)'
-'Herbert und Clara Cram'
-'Walter und Eugenie'
-'Walter und Eugenie de Gruyter'
-```
-
-**Parsing algorithm for col 5 (`An`):**
-
-1. **Strip `geb.` clauses** — remove ` geb. \w+` from the string (maiden name annotations are not useful for matching).
-2. **Extract parenthesised last name** — if the string ends with `(Something)`, capture `Something` as the shared last name and strip it.
-3. **Split on separator** — split on ` und ` or ` u ` (whole-word match with `\s+u\s+` or `\s+und\s+`).
-4. **Filter** — discard any segment that is exactly `Familie` (it's not a person).
-5. **Distribute shared last name** — find the last name in the rightmost segment. Known multi-word last name particles: `de Gruyter`. Known single-word last names: `Cram`, `Dieckmann`, `Gruber`, `Müller`, `Wolff`. These are hardcoded as a lookup list. If the last segment ends with a known last name and an earlier segment has no last name (i.e., it is a single token), append that last name to the earlier segment.
-6. **Handle no-last-name cases** — if no last name can be determined at all (e.g., `Walter und Eugenie`), proceed with just the first name; `lastName` will be set to `""` (empty string — tolerated since the model has `nullable = false` and we need something; using `"?"` as placeholder is clearer).
-7. **findOrCreatePerson** for each resulting name segment, then add all to `document.getReceivers()`.
-
-**Examples:**
-| Raw | Result |
-|-----|--------|
-| `Walter und Eugenie de Gruyter` | [Walter de Gruyter, Eugenie de Gruyter] |
-| `Herbert und Clara Cram` | [Herbert Cram, Clara Cram] |
-| `Hedi und Tutu (Gruber)` | [Hedi Gruber, Tutu Gruber] |
-| `Clara Cram u Ellen B-M` | [Clara Cram, Ellen B-M] |
-| `Clara u Familie` | [Clara] |
-| `Walter und Eugenie` | [Walter (?), Eugenie (?)] |
-| `Eugenie de Gruyter geb. Müller` | [Eugenie de Gruyter] |
-
-**Why normalise?** Without normalisation, `Herbert und Clara Cram` would become one person with a nonsensical name and would never match separate `Herbert Cram` or `Clara Cram` entries from other rows. Normalisation means subsequent rows referencing the same individual will reuse the same `Person` record.
-
-**Why hardcode the last names?** There are only 6 known family names in this archive. Adding a configurable list would be over-engineering for a one-family archive. If the archive expands, the list can be extended.
-
----
-
-### 9. Name Splitting Helper (firstName / lastName)
-
-Used when creating a new `Person` who cannot be found by alias.
-
-**Algorithm:**
-1. Strip any ` geb. \w+` suffix.
-2. Check if the string ends with a known last name (from the list in §8). If yes, everything before it is `firstName`, and that is `lastName`.
-3. If `de Gruyter` is detected as the last name, it is multi-word — `firstName` is everything before `de Gruyter`.
-4. Otherwise, split on the last space: `firstName` = everything before, `lastName` = last word.
-5. If only one token (no space), `firstName` = token, `lastName` = `"?"`.
-
-This logic lives in a single static utility method `PersonNameParser.split(String)` returning a record `SplitName(String firstName, String lastName)`. Keeping it static and pure makes it straightforward to unit-test without a Spring context.
-
----
-
-### 10. Tags: Lookup-or-Create
-
-**Current behaviour:** Tags are never imported.
-
-**Fix:** Read col 10 (`Schlagwort`). If non-blank:
-```java
-Tag tag = tagRepository.findByNameIgnoreCase(value)
-    .orElseGet(() -> tagRepository.save(Tag.builder().name(value).build()));
-document.getTags().add(tag);
-```
-
-Tags are imported as-is. The `TagRepository` already has `findByNameIgnoreCase`, so deduplication is free.
-
----
-
-### 11. Summary: Map "Inhalt" (Col 11)
-
-Read col 11 (`Inhalt`) and set on `document.setSummary(...)`. Short content keywords (`Geschäftsreise`, `Reisepläne`) are useful for full-text search even if they're terse.
-
-Col 12 (`Zeitlicher Kontext`) is skipped — it is often a duplicate of context already encoded in sender/receiver/tags.
-
----
-
-### 12. New Model Fields: archiveBox and archiveFolder
-
-Cols 1 and 2 (`Box`, `Mappe`) identify the physical storage location of the original document. They have no counterpart in the model today.
-
-**Changes:**
-1. Add to `Document.java`:
-   ```java
-   @Column(name = "archive_box")
-   private String archiveBox;
-
-   @Column(name = "archive_folder")
-   private String archiveFolder;
-   ```
-2. Flyway migration `V4__add_archive_fields_to_documents.sql`:
-   ```sql
-   ALTER TABLE documents ADD COLUMN archive_box VARCHAR(255);
-   ALTER TABLE documents ADD COLUMN archive_folder VARCHAR(255);
-   ```
-3. Import logic reads col 1 → `archiveBox`, col 2 → `archiveFolder`.
-
----
-
-### 13. PersonRepository: Add findByAliasIgnoreCase
-
-Add one method to `PersonRepository`:
-```java
-Optional<Person> findByAliasIgnoreCase(String alias);
-```
-Spring Data generates the query automatically. No other repository changes are needed.
-
----
-
-## Overwrite Behaviour (No Change)
-
-The existing skip logic stays: if a document already exists in the DB and its status is not `PLACEHOLDER`, it is skipped. This prevents accidental data loss on re-runs. The assumption is that if someone has manually enriched a document beyond placeholder stage, that work should not be overwritten by a re-import.
-
----
-
-## Summary of All File Changes
-
-| File | Change |
-|------|--------|
-| `ExcelService.java` | **Deleted** |
-| `AdminController.java` (or wherever ExcelService is injected) | Remove ExcelService injection and its endpoint |
-| `MassImportService.java` | `WorkbookFactory`, new column indices, `.ods` discovery, filename fix, title, date parsing, sender, receivers, tags, summary, archiveBox/archiveFolder |
-| `PersonNameParser.java` (new) | Static utility: `split(String)` → `SplitName`, `parseReceivers(String)` → `List` |
-| `PersonRepository.java` | Add `findByAliasIgnoreCase(String)` |
-| `Document.java` | Add `archiveBox`, `archiveFolder` fields |
-| `V4__add_archive_fields_to_documents.sql` (new) | `ALTER TABLE` for both new columns |
-| `application.properties` | Update/add `app.import.col.*` properties |
-
----
-
-## What We Are Not Changing
-
-- **Col 4 (`BriefeschreiberIn`)** — redundant with col 3.
-- **Col 6 (`EmpfängerIn`)** — redundant with col 5.
-- **Col 8 (`Datum Originalformat`)** — ISO date in col 7 is strictly better.
-- **Col 12 (`Zeitlicher Kontext`)** — no clear mapping, often duplicates other fields.
-- **`persons` table schema** — `alias` serves as the full-name store without a schema change.
-- **`TagRepository`** — existing `findByNameIgnoreCase` is sufficient.
diff --git a/.agent/current-plan.md b/.agent/current-plan.md
deleted file mode 100644
index 525abf87..00000000
--- a/.agent/current-plan.md
+++ /dev/null
@@ -1,305 +0,0 @@
-# Plan: Notifications (#71) + @mentions (#72)
-
-## Context
-
-### Existing code that matters
-- `DocumentComment` — entity with `id`, `documentId`, `annotationId`, `parentId`, `authorId`, `authorName`, `content`, `replies` (transient). No mention storage yet.
-- `CommentService` — `postComment`, `replyToComment`, `editComment`, `deleteComment`. Returns `DocumentComment` directly (no response DTO).
-- `CreateCommentDTO` — only has `content`. Needs `mentionedUserIds` added.
-- `AppUser` — has `id`, `username`, `firstName`, `lastName`, `email`. No notification preferences yet.
-- `PasswordResetService` — uses `JavaMailSender` (`@Autowired(required = false)`) + `SimpleMailMessage`. `NotificationService` follows the exact same pattern.
-- Latest migration: `V13__add_file_hash.sql`.
-- `CommentThread.svelte` — uses fetch-based API calls (not SvelteKit form actions), plain `