Scripts — Familienarchiv
Overview
Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.
Scripts
reset-db.sh
Purpose: Hard-reset the development database, wiping all documents, persons, tags, and related data.
Usage:
./scripts/reset-db.sh
# Type 'yes' to confirm
What it truncates:
- transcription_block_versions
- transcription_blocks
- comment_mentions
- document_comments
- document_annotations
- document_versions
- notifications
- documents
- person_name_aliases
- persons
- tag
⚠️ Destructive operation — only for development!
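The confirm-then-truncate flow described for reset-db.sh could look roughly like the sketch below. It is not the actual contents of the script: the function names are hypothetical, the use of CASCADE is an assumption, and the container, user, and database names are borrowed from the large-data.sql import command in this file on the assumption that the reset targets the same database.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the real reset-db.sh. Function names are hypothetical.

# Tables to wipe, in the order this document lists them.
TABLES="
transcription_block_versions
transcription_blocks
comment_mentions
document_comments
document_annotations
document_versions
notifications
documents
person_name_aliases
persons
tag
"

# Print a single TRUNCATE statement covering every table.
# CASCADE is an assumption; the real script may handle foreign keys differently.
build_truncate_sql() {
  local joined="" t
  for t in $TABLES; do
    joined="${joined:+$joined, }$t"
  done
  printf 'TRUNCATE TABLE %s CASCADE;\n' "$joined"
}

# Succeed only if the answer read from stdin is exactly 'yes'.
confirm_reset() {
  local answer
  printf "This wipes the development database. Type 'yes' to confirm: " >&2
  read -r answer || return 1
  [ "$answer" = "yes" ]
}

# Assumed container, user, and database names (same as the large-data.sql import).
reset_db() {
  confirm_reset || { echo "Aborted." >&2; return 1; }
  build_truncate_sql | docker exec -i archive-db psql -U archive_user -d family_archive_db
}

# To run the sketch against a development database: reset_db
```

Issuing one TRUNCATE for all tables keeps the reset atomic: either every table is emptied or none is.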
rebuild-frontend.sh
Purpose: Force a clean rebuild of the frontend Docker container.
Usage:
./scripts/rebuild-frontend.sh
download-kraken-models.sh
Purpose: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
Usage:
./scripts/download-kraken-models.sh
Downloads models into ./ocr-service/models/ or the ocr_models Docker volume. Models are ~100-500 MB each.
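Because each model is several hundred megabytes, a download script benefits from skipping files that are already present. The sketch below shows one way to do that; it is not the actual download-kraken-models.sh. The function names and the MODEL_DIR variable are hypothetical, and the URL in the closing comment is a placeholder, not a real model location.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the real download-kraken-models.sh.
# MODEL_DIR is a hypothetical variable; its default follows the path named in this document.
MODEL_DIR="${MODEL_DIR:-./ocr-service/models}"

# Return 0 if the file is missing or empty and therefore needs fetching.
needs_download() {
  [ ! -s "$1" ]
}

# Fetch one model into MODEL_DIR unless a non-empty copy already exists.
fetch_model() {
  local url="$1" target
  target="$MODEL_DIR/$(basename "$url")"
  mkdir -p "$MODEL_DIR"
  if needs_download "$target"; then
    # Download under a temporary name so an interrupted transfer never looks complete.
    curl -fL --retry 3 -o "$target.part" "$url" && mv "$target.part" "$target"
  else
    echo "skip: $target already present"
  fi
}

# Placeholder call; substitute a real model URL:
# fetch_model "https://example.org/models/german_kurrent.mlmodel"
```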
download-paperless.sh
Purpose: Download exported documents from a Paperless-ngx instance.
Usage:
./scripts/download-paperless.sh
Requires environment variables or config for the Paperless API endpoint and token.
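A script like this is easier to use if it fails early when its configuration is missing. The sketch below checks the environment before making a request; it is not the actual download-paperless.sh. PAPERLESS_URL and PAPERLESS_TOKEN are hypothetical variable names, as are the function names. The request itself uses Paperless-ngx token authentication and its per-document download endpoint.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the real download-paperless.sh.
# PAPERLESS_URL and PAPERLESS_TOKEN are hypothetical variable names.

# Fail with a readable message if any named variable is unset or empty.
require_env() {
  local name value missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Download one document by id. The file extension is unknown up front,
# so the output is saved under a neutral name.
download_document() {
  local id="$1" out_dir="$2"
  require_env PAPERLESS_URL PAPERLESS_TOKEN || return 1
  mkdir -p "$out_dir"
  curl -fsS -H "Authorization: Token $PAPERLESS_TOKEN" \
    -o "$out_dir/document-$id" \
    "${PAPERLESS_URL%/}/api/documents/$id/download/"
}

# Example call (hypothetical id and directory):
# download_document 42 ./paperless-export
```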
flatten-paperless.sh
Purpose: Flatten nested Paperless export directories into a single import-ready structure.
Usage:
./scripts/flatten-paperless.sh
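Flattening a nested export mostly comes down to copying every file into one directory and deciding what to do when two files share a name. The sketch below illustrates one possible rule; it is not the actual flatten-paperless.sh. The function name and the parent-directory prefix used on a name clash are assumptions, and file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the real flatten-paperless.sh.
# The function name and the collision rule are assumptions.

# Copy every regular file under src into dest, with no subdirectories.
# On a name clash, prefix the later file with its parent directory name.
flatten_dir() {
  local src="$1" dest="$2" file base parent
  mkdir -p "$dest"
  # Sort the paths so the outcome does not depend on filesystem order.
  find "$src" -type f | sort | while IFS= read -r file; do
    base=$(basename "$file")
    if [ -e "$dest/$base" ]; then
      parent=$(basename "$(dirname "$file")")
      base="${parent}_${base}"
    fi
    cp -- "$file" "$dest/$base"
  done
}

# Example call (hypothetical directories):
# flatten_dir ./paperless-export ./import-ready
```

The destination should sit outside the source tree; otherwise find would pick up files that were just copied.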
generate_data.py
Purpose: Generate synthetic test data for development.
Usage:
python scripts/generate_data.py
Generates fake documents, persons, and tags suitable for load testing or UI development.
prepare_historical_dict.py
Purpose: Build a historical German word dictionary for the OCR spell-checker.
Usage:
python scripts/prepare_historical_dict.py
Processes raw word lists into the format expected by ocr-service/spell_check.py.
schema.sql
Purpose: Complete database schema dump for reference.
Note: Flyway migrations in backend/src/main/resources/db/migration/ are the source of truth for schema evolution. schema.sql is a snapshot for quick reference only.
large-data.sql
Purpose: Pre-seeded dataset with a large number of documents for performance testing.
Usage:
# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
How to Use
Most scripts should be run from the repository root:
# Database reset
./scripts/reset-db.sh
# Model download
./scripts/download-kraken-models.sh
# Data generation
python scripts/generate_data.py
Ensure scripts are executable:
chmod +x scripts/*.sh
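To see which scripts are missing the executable bit before changing anything, a small check like the following may help. It is a sketch with a hypothetical function name, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only; the function name is hypothetical.

# Print every *.sh file in the given directory that lacks the executable bit.
list_non_executable() {
  local dir="${1:-scripts}" f
  for f in "$dir"/*.sh; do
    [ -e "$f" ] || continue   # the glob matched nothing
    [ -x "$f" ] || echo "$f"
  done
}

# Example call from the repository root:
# list_non_executable scripts
```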
Adding New Scripts
- Place the script in scripts/
- Add a header comment describing purpose and usage
- Make it executable (chmod +x)
- Document it in this CLAUDE.md
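A new script following these steps might start from a skeleton like the one below. The file name example-task.sh, the header layout, and the function names are all hypothetical; the existing scripts may use a different convention.

```shell
#!/usr/bin/env bash
#
# example-task.sh (hypothetical name)
#
# Purpose: One-sentence description of what the script does.
# Usage:   ./scripts/example-task.sh [--help]
#
# Fail fast on errors and on unset variables.
set -eu

usage() {
  echo "Usage: ./scripts/example-task.sh [--help]"
}

main() {
  if [ "${1:-}" = "--help" ]; then
    usage
    return 0
  fi
  echo "example-task: nothing to do yet"
}

main "$@"
```

After saving the file, run chmod +x on it and add a matching entry to the Scripts section of this file.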