Processes all 7 CLAUDE.md files according to the 3-bucket classification. Migration targets (CONTRIBUTING.md, docs/ARCHITECTURE.md, docs/DEPLOYMENT.md, domain READMEs) are introduced by DOC-2/4/5/6 — this PR must merge last.

### scripts/CLAUDE.md → scripts/README.md

New `scripts/README.md` with full script documentation (preserving the ⚠️ destructive-operation warning on reset-db.sh). `scripts/CLAUDE.md` reduced to a pointer + "document new scripts in README.md" reminder.

### .devcontainer/CLAUDE.md → .devcontainer/README.md

New `.devcontainer/README.md` with all configuration, usage, and limitations. `.devcontainer/CLAUDE.md` reduced to a single pointer line.

### docs/CLAUDE.md → docs/README.md

New `docs/README.md` covering the folder structure, ADR guide, infrastructure docs, and specs folder. `docs/CLAUDE.md` reduced to pointer + ADR reminder.

### ocr-service/CLAUDE.md

Reduced to pointer to `ocr-service/README.md` (content migrated in DOC-6). Kept LLM reminders: single-node constraint, ALLOWED_PDF_HOSTS SSRF risk.
### backend/CLAUDE.md

- Layering Rules → pointer to docs/ARCHITECTURE.md
- Error Handling → pointer to CONTRIBUTING.md + reminder
- Security/Permissions → pointer to docs/ARCHITECTURE.md + reminder
- Package Structure → tagged TODO post-REFACTOR-1
- Fixed errors.ts path to frontend/src/lib/shared/errors.ts
- Added ANNOTATE_ALL + BLOG_WRITE to permission list
- Key Entities, Entity Code Style, Services → kept (Bucket-2)

### root CLAUDE.md

- Stack, Infrastructure, Dev Container → pointers
- Layering Rules, Error Handling, Security, OpenAPI, API Client, Date Handling, UI Components, Frontend Error Handling → pointers + reminders
- Package Structure → tagged TODO post-REFACTOR-1
- Domain Model, Entity Code Style, Form Actions, Styling → kept (Bucket-2)

### frontend/CLAUDE.md

- API Client Pattern, Date Handling → pointers + reminders
- Key UI Components → pointer to domain READMEs
- Styling, Form Actions, How to Run, Vite Proxy, i18n → kept (Bucket-2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# scripts/
Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.
## Scripts

### reset-db.sh
Purpose: Hard-reset the development database, wiping all documents, persons, tags, and related data.
Usage:

```bash
./scripts/reset-db.sh
# Type 'yes' to confirm
```
What it truncates:

- `transcription_block_versions`
- `transcription_blocks`
- `comment_mentions`
- `document_comments`
- `document_annotations`
- `document_versions`
- `notifications`
- `documents`
- `person_name_aliases`
- `persons`
- `tag`
⚠️ Destructive operation — only for development! This wipes ALL data. Not reversible without a backup.
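The table list above suggests the reset boils down to one cascading truncation. As a hedged sketch (the exact SQL in reset-db.sh may differ, e.g. it might issue per-table statements), the statement could be assembled like this:

```python
# Hypothetical sketch of the TRUNCATE statement reset-db.sh presumably
# issues; the real script's SQL may differ.
TABLES = [
    "transcription_block_versions", "transcription_blocks",
    "comment_mentions", "document_comments", "document_annotations",
    "document_versions", "notifications", "documents",
    "person_name_aliases", "persons", "tag",
]

def build_truncate_sql(tables):
    """Build one TRUNCATE covering all tables, resetting sequences
    and cascading through foreign keys."""
    return f"TRUNCATE {', '.join(tables)} RESTART IDENTITY CASCADE;"

print(build_truncate_sql(TABLES))
```

A single multi-table `TRUNCATE ... CASCADE` avoids ordering problems between tables linked by foreign keys.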
### rebuild-frontend.sh
Purpose: Force a clean rebuild of the frontend Docker container.
Usage:

```bash
./scripts/rebuild-frontend.sh
```

### download-kraken-models.sh
Purpose: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
Usage:

```bash
./scripts/download-kraken-models.sh
```
Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100–500 MB each.
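Because the models are large, a download script typically skips files that are already present. A minimal sketch of that guard (the model filenames below are placeholders, not the actual Kraken model names the script fetches):

```python
from pathlib import Path

# Hypothetical model filenames -- the real script downloads specific
# Kraken HTR models for Kurrent and Suetterlin.
MODELS = ["german_kurrent.mlmodel", "suetterlin.mlmodel"]

def missing_models(models_dir: Path) -> list[str]:
    """Return the model files that still need to be downloaded."""
    return [m for m in MODELS if not (models_dir / m).exists()]
```

Re-running the script is then cheap: already-downloaded models are left untouched.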
### download-paperless.sh
Purpose: Download exported documents from a Paperless-ngx instance.
Usage:

```bash
./scripts/download-paperless.sh
```
Requires environment variables or config for the Paperless API endpoint and token.
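For orientation, a hedged sketch of how an authenticated Paperless-ngx API request might be built (the env var names `PAPERLESS_URL` and `PAPERLESS_TOKEN` are assumptions; check the script for the real configuration):

```python
import os
import urllib.request

def paperless_request(path: str) -> urllib.request.Request:
    """Build an authenticated request against a Paperless-ngx instance.
    Env var names here are illustrative, not necessarily what
    download-paperless.sh reads."""
    base = os.environ["PAPERLESS_URL"].rstrip("/")
    token = os.environ["PAPERLESS_TOKEN"]
    return urllib.request.Request(
        f"{base}{path}",
        headers={"Authorization": f"Token {token}"},
    )
```

Paperless-ngx uses token authentication via the `Authorization: Token ...` header, so the token must never be committed to the repo — keep it in the environment or an ignored config file.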
### flatten-paperless.sh
Purpose: Flatten nested Paperless export directories into a single import-ready structure.
Usage:

```bash
./scripts/flatten-paperless.sh
```

### generate_data.py
Purpose: Generate synthetic test data for development.
Usage:

```bash
python scripts/generate_data.py
```
Generates fake documents, persons, and tags suitable for load testing or UI development.
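The generation pattern can be sketched roughly as follows — a seeded RNG producing records with the right shape. The field names and ranges here are illustrative; generate_data.py defines the real schema:

```python
import random
import string

def fake_person(rng: random.Random) -> dict:
    """Generate one synthetic person record. Field names and value
    ranges are illustrative assumptions, not the script's actual schema."""
    first = "".join(rng.choices(string.ascii_lowercase, k=6)).title()
    last = "".join(rng.choices(string.ascii_lowercase, k=8)).title()
    return {
        "first_name": first,
        "last_name": last,
        "birth_year": rng.randint(1800, 1950),
    }

rng = random.Random(42)  # fixed seed -> reproducible test data
people = [fake_person(rng) for _ in range(100)]
```

Seeding the RNG makes runs reproducible, which matters when comparing load-test results across runs.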
### prepare_historical_dict.py
Purpose: Build a historical German word dictionary for the OCR spell-checker.
Usage:

```bash
python scripts/prepare_historical_dict.py
```
Processes raw word lists into the format expected by `ocr-service/spell_check.py`.
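The core of such processing is normalization and de-duplication. A minimal sketch, assuming the output format is one lowercase word per line (the actual format spell_check.py expects may differ):

```python
def build_dictionary(raw_lines):
    """Normalize raw word-list lines into a sorted, de-duplicated list.
    The one-lowercase-word-per-line output format is an assumption,
    not necessarily what prepare_historical_dict.py actually emits."""
    words = set()
    for line in raw_lines:
        word = line.strip().lower()
        if word and word.isalpha():
            words.add(word)
    return sorted(words)
```

Lower-casing and filtering non-alphabetic tokens keeps the dictionary small and avoids false matches during spell-checking.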
### schema.sql
Purpose: Complete database schema dump for reference.
Note: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.

### large-data.sql
Purpose: Pre-seeded dataset with a large number of documents for performance testing.
Usage:

```bash
# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
```
## How to Use
Most scripts should be run from the repository root:
```bash
# Database reset
./scripts/reset-db.sh

# Model download
./scripts/download-kraken-models.sh

# Data generation
cd scripts && python generate_data.py
```
Ensure scripts are executable:

```bash
chmod +x scripts/*.sh
```
## Adding New Scripts
- Place the script in `scripts/`
- Add a header comment describing purpose and usage
- Make it executable (`chmod +x`)
- Document it in this README.md