docs(legibility): migrate CLAUDE.md rules into human docs — DOC-7

Processes all 7 CLAUDE.md files according to the 3-bucket classification.
Migration targets (CONTRIBUTING.md, docs/ARCHITECTURE.md, docs/DEPLOYMENT.md,
domain READMEs) are introduced by DOC-2/4/5/6 — this PR must merge last.

### scripts/CLAUDE.md → scripts/README.md
New `scripts/README.md` with full script documentation (preserving the
⚠️ destructive-operation warning on reset-db.sh). `scripts/CLAUDE.md`
reduced to a pointer + "document new scripts in README.md" reminder.

### .devcontainer/CLAUDE.md → .devcontainer/README.md
New `.devcontainer/README.md` with all configuration, usage, and limitations.
`.devcontainer/CLAUDE.md` reduced to a single pointer line.

### docs/CLAUDE.md → docs/README.md
New `docs/README.md` covering the folder structure, ADR guide, infrastructure
docs, and specs folder. `docs/CLAUDE.md` reduced to pointer + ADR reminder.

### ocr-service/CLAUDE.md
Reduced to pointer to `ocr-service/README.md` (content migrated in DOC-6).
Kept LLM reminders: single-node constraint, ALLOWED_PDF_HOSTS SSRF risk.

### backend/CLAUDE.md
- Layering Rules → pointer to docs/ARCHITECTURE.md
- Error Handling → pointer to CONTRIBUTING.md + reminder
- Security/Permissions → pointer to docs/ARCHITECTURE.md + reminder
- Package Structure → tagged TODO post-REFACTOR-1
- Fixed errors.ts path to frontend/src/lib/shared/errors.ts
- Added ANNOTATE_ALL + BLOG_WRITE to permission list
- Key Entities, Entity Code Style, Services → kept (Bucket-2)

### root CLAUDE.md
- Stack, Infrastructure, Dev Container → pointers
- Layering Rules, Error Handling, Security, OpenAPI, API Client,
  Date Handling, UI Components, Frontend Error Handling → pointers + reminders
- Package Structure → tagged TODO post-REFACTOR-1
- Domain Model, Entity Code Style, Form Actions, Styling → kept (Bucket-2)

### frontend/CLAUDE.md
- API Client Pattern, Date Handling → pointers + reminders
- Key UI Components → pointer to domain READMEs
- Styling, Form Actions, How to Run, Vite Proxy, i18n → kept (Bucket-2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Commit 86c13a230c (parent 513fda2888)
Authored by Marcel, 2026-05-05 23:33:41 +02:00; committed by marcel.
11 changed files with 452 additions and 732 deletions

scripts/README.md (new file, 161 lines)
# scripts/
Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.
## Scripts
### `reset-db.sh`
**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.
**Usage:**
```bash
./scripts/reset-db.sh
# Type 'yes' to confirm
```
**What it truncates:**
- `transcription_block_versions`
- `transcription_blocks`
- `comment_mentions`
- `document_comments`
- `document_annotations`
- `document_versions`
- `notifications`
- `documents`
- `person_name_aliases`
- `persons`
- `tag`
> ⚠️ **Destructive operation — only for development!** This wipes ALL data. Not reversible without a backup.
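The truncation step can be sketched as a single statement assembled from the table list above — a minimal sketch assuming a plain `psql`-style reset; the actual `reset-db.sh` may use different options or per-table statements:

```shell
#!/bin/sh
# Sketch only: build one TRUNCATE over the tables listed above.
TABLES="transcription_block_versions transcription_blocks comment_mentions \
document_comments document_annotations document_versions notifications \
documents person_name_aliases persons tag"

SQL="TRUNCATE"
sep=" "
for t in $TABLES; do
  SQL="$SQL$sep$t"
  sep=", "
done
# CASCADE follows foreign keys; RESTART IDENTITY resets sequences.
SQL="$SQL RESTART IDENTITY CASCADE;"
echo "$SQL"
```

Piping this statement into `psql` inside the database container would perform the wipe in one shot, which is exactly why the confirmation prompt matters.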
---
### `rebuild-frontend.sh`
**Purpose**: Force a clean rebuild of the frontend Docker container.
**Usage:**
```bash
./scripts/rebuild-frontend.sh
```
---
### `download-kraken-models.sh`
**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
**Usage:**
```bash
./scripts/download-kraken-models.sh
```
Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100–500 MB each.
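The target-directory choice can be sketched like this — the environment variable name is an assumption for illustration, not the script's actual interface:

```shell
#!/bin/sh
# Sketch: allow an override via env var, falling back to the local
# folder mentioned above. KRAKEN_MODEL_DIR is hypothetical.
MODEL_DIR="${KRAKEN_MODEL_DIR:-./ocr-service/models}"
mkdir -p "$MODEL_DIR"
echo "Kraken models will be stored in: $MODEL_DIR"
```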
---
### `download-paperless.sh`
**Purpose**: Download exported documents from a Paperless-ngx instance.
**Usage:**
```bash
./scripts/download-paperless.sh
```
Requires environment variables or config for the Paperless API endpoint and token.
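A minimal sketch of how such a script might assemble its API request — the variable names and the endpoint path are assumptions for illustration, not the script's contract:

```shell
#!/bin/sh
# Hypothetical request construction; PAPERLESS_URL / PAPERLESS_TOKEN and
# the /api/documents/ path are illustrative assumptions.
PAPERLESS_URL="${PAPERLESS_URL:-http://localhost:8000}"
PAPERLESS_TOKEN="${PAPERLESS_TOKEN:-replace-me}"

REQUEST="curl -fsS -H 'Authorization: Token ${PAPERLESS_TOKEN}' ${PAPERLESS_URL}/api/documents/"
echo "$REQUEST"
```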
---
### `flatten-paperless.sh`
**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.
**Usage:**
```bash
./scripts/flatten-paperless.sh
```
---
### `generate_data.py`
**Purpose**: Generate synthetic test data for development.
**Usage:**
```bash
python scripts/generate_data.py
```
Generates fake documents, persons, and tags suitable for load testing or UI development.
---
### `prepare_historical_dict.py`
**Purpose**: Build a historical German word dictionary for the OCR spell-checker.
**Usage:**
```bash
python scripts/prepare_historical_dict.py
```
Processes raw word lists into the format expected by `ocr-service/spell_check.py`.
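The kind of normalization involved can be approximated with standard tools — a sketch assuming one word per line of input (file names are illustrative; the real script likely applies more elaborate rules):

```shell
#!/bin/sh
# Illustrative normalization: lowercase, deduplicate, sort.
# Note: tr's [:upper:]/[:lower:] handling of umlauts is locale-dependent.
printf 'Thaler\nGulden\nthaler\n' > /tmp/raw_words.txt
tr '[:upper:]' '[:lower:]' < /tmp/raw_words.txt | sort -u > /tmp/historical_dict.txt
cat /tmp/historical_dict.txt
```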
---
### `schema.sql`
**Purpose**: Complete database schema dump for reference.
**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.
---
### `large-data.sql`
**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.
**Usage:**
```bash
# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
```
## How to Use
Most scripts should be run from the **repository root**:
```bash
# Database reset
./scripts/reset-db.sh
# Model download
./scripts/download-kraken-models.sh
# Data generation
python scripts/generate_data.py
```
Ensure scripts are executable:
```bash
chmod +x scripts/*.sh
```
## Adding New Scripts
1. Place the script in `scripts/`
2. Add a header comment describing purpose and usage
3. Make it executable (`chmod +x`)
4. Document it in this `README.md`
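Step 2's header comment might look like this — a hypothetical template, not an enforced format:

```shell
#!/usr/bin/env bash
# example-script.sh — one-line purpose of the script (hypothetical template)
#
# Usage:
#   ./scripts/example-script.sh [args]
#
# Notes: document required env vars and any destructive behavior here.
set -euo pipefail
MSG="example-script: replace this with real logic"
echo "$MSG"
```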