# Scripts: Familienarchiv

## Overview

Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.

## Scripts

### `reset-db.sh`

**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.

**Usage:**

```bash
./scripts/reset-db.sh
# Type 'yes' to confirm
```

**What it truncates:**

- `transcription_block_versions`
- `transcription_blocks`
- `comment_mentions`
- `document_comments`
- `document_annotations`
- `document_versions`
- `notifications`
- `documents`
- `person_name_aliases`
- `persons`
- `tag`

> ⚠️ **Destructive operation**: only for development!

---

### `rebuild-frontend.sh`

**Purpose**: Force a clean rebuild of the frontend Docker container.

**Usage:**

```bash
./scripts/rebuild-frontend.sh
```

---

### `download-kraken-models.sh`

**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.

**Usage:**

```bash
./scripts/download-kraken-models.sh
```

Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100-500 MB each.

---

### `download-paperless.sh`

**Purpose**: Download exported documents from a Paperless-ngx instance.

**Usage:**

```bash
./scripts/download-paperless.sh
```

Requires environment variables or config for the Paperless API endpoint and token.

---

### `flatten-paperless.sh`

**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.

**Usage:**

```bash
./scripts/flatten-paperless.sh
```

---

### `generate_data.py`

**Purpose**: Generate synthetic test data for development.

**Usage:**

```bash
python scripts/generate_data.py
```

Generates fake documents, persons, and tags suitable for load testing or UI development.

---

### `prepare_historical_dict.py`

**Purpose**: Build a historical German word dictionary for the OCR spell-checker.

**Usage:**

```bash
python scripts/prepare_historical_dict.py
```

Processes raw word lists into the format expected by `ocr-service/spell_check.py`.

---

### `schema.sql`

**Purpose**: Complete database schema dump for reference.

**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.

---

### `large-data.sql`

**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.

**Usage:**

```bash
# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
```

## How to Use

Most scripts should be run from the **repository root**:

```bash
# Database reset
./scripts/reset-db.sh

# Model download
./scripts/download-kraken-models.sh

# Data generation
cd scripts && python generate_data.py
```

Ensure scripts are executable:

```bash
chmod +x scripts/*.sh
```

## Adding New Scripts

1. Place the script in `scripts/`
2. Add a header comment describing purpose and usage
3. Make it executable (`chmod +x`)
4. Document it in this `CLAUDE.md`
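
The steps above could start from a template along these lines. This is only a sketch: the file name `example-task.sh` and the `usage` and `main` functions are hypothetical placeholders, not existing scripts or conventions in this repository.

```bash
#!/usr/bin/env bash
# example-task.sh (hypothetical name, not an existing script in this repo)
#
# Purpose: One-sentence description of what the script does.
# Usage:   ./scripts/example-task.sh [--help]
#
# Run from the repository root, like the other scripts in scripts/.

# Exit on the first failing command and on use of unset variables.
set -eu

# Print the usage text; keep it in sync with the header comment above.
usage() {
  echo "Usage: ./scripts/example-task.sh [--help]"
}

# Entry point: handle --help, reject unknown arguments,
# otherwise run the (placeholder) work.
main() {
  case "${1:-}" in
    -h|--help)
      usage
      ;;
    "")
      echo "example-task: nothing to do yet (template)"
      ;;
    *)
      echo "example-task: unknown argument: $1" >&2
      usage >&2
      return 2
      ;;
  esac
}

main "$@"
```

After saving it as `scripts/example-task.sh`, run `chmod +x scripts/example-task.sh` and add a matching section to this file.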