familienarchiv/scripts/CLAUDE.md
2026-05-05 12:39:20 +02:00

Scripts — Familienarchiv

Overview

Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.

Scripts

reset-db.sh

Purpose: Hard-reset the development database, wiping all documents, persons, tags, and related data.

Usage:

./scripts/reset-db.sh
# Type 'yes' to confirm

What it truncates:

  • transcription_block_versions
  • transcription_blocks
  • comment_mentions
  • document_comments
  • document_annotations
  • document_versions
  • notifications
  • documents
  • person_name_aliases
  • persons
  • tag

⚠️ Destructive operation: run it only against a development database.
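
To make the scope of the reset concrete, the following minimal Python sketch builds a single PostgreSQL TRUNCATE statement from the table list above. It is an illustration only: the actual reset-db.sh may issue separate statements or use other options, and `RESTART IDENTITY CASCADE` is an assumption, not something this document confirms. The function name `build_truncate_sql` is hypothetical.

```python
# Hypothetical sketch: one TRUNCATE statement covering the tables listed above.
# The real reset-db.sh may work differently; RESTART IDENTITY CASCADE is assumed.

TABLES = [
    "transcription_block_versions",
    "transcription_blocks",
    "comment_mentions",
    "document_comments",
    "document_annotations",
    "document_versions",
    "notifications",
    "documents",
    "person_name_aliases",
    "persons",
    "tag",
]


def build_truncate_sql(tables):
    """Return a single PostgreSQL TRUNCATE statement for all given tables."""
    if not tables:
        raise ValueError("no tables given")
    return "TRUNCATE TABLE " + ", ".join(tables) + " RESTART IDENTITY CASCADE;"


if __name__ == "__main__":
    # Print the statement instead of executing it, so nothing is deleted here.
    print(build_truncate_sql(TABLES))
```

Printing the statement rather than executing it keeps the sketch safe to run anywhere.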


rebuild-frontend.sh

Purpose: Force a clean rebuild of the frontend Docker container.

Usage:

./scripts/rebuild-frontend.sh

download-kraken-models.sh

Purpose: Download Kraken HTR models for German Kurrent and Sütterlin scripts.

Usage:

./scripts/download-kraken-models.sh

Downloads the models into ./ocr-service/models/ or the ocr_models Docker volume. Each model is roughly 100 to 500 MB.


download-paperless.sh

Purpose: Download exported documents from a Paperless-ngx instance.

Usage:

./scripts/download-paperless.sh

Requires environment variables or config for the Paperless API endpoint and token.
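
As a rough illustration of the endpoint and token requirement, the Python sketch below builds (but does not send) an authenticated request for the Paperless-ngx document list. The variable names `PAPERLESS_URL` and `PAPERLESS_TOKEN` and the function `build_documents_request` are assumptions made for this example; download-paperless.sh may read its configuration differently. The `Authorization: Token <token>` header and the `/api/documents/` path follow the Paperless-ngx REST API.

```python
import os
import urllib.request


def build_documents_request(env=os.environ):
    """Build (without sending) an authenticated request for the document list.

    PAPERLESS_URL and PAPERLESS_TOKEN are hypothetical variable names.
    """
    base_url = env.get("PAPERLESS_URL", "").rstrip("/")
    token = env.get("PAPERLESS_TOKEN", "")
    if not base_url or not token:
        raise RuntimeError("PAPERLESS_URL and PAPERLESS_TOKEN must both be set")
    return urllib.request.Request(
        f"{base_url}/api/documents/",
        headers={
            # Paperless-ngx token authentication header.
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
    )
```

Failing early when either value is missing avoids a confusing authentication error halfway through a long download.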


flatten-paperless.sh

Purpose: Flatten nested Paperless export directories into a single import-ready structure.

Usage:

./scripts/flatten-paperless.sh
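
The flattening idea can be sketched in Python as follows. This is not the logic of flatten-paperless.sh itself: the function name `flatten_tree` and the collision rule (appending `_1`, `_2`, ... before the extension) are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path


def flatten_tree(src, dest):
    """Copy every file under src into dest without subdirectories.

    dest should lie outside src. Returns the list of created paths.
    """
    src, dest = Path(src), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materializes the file list before any copying starts.
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        target = dest / path.name
        counter = 1
        # On a name collision, append _1, _2, ... before the extension.
        while target.exists():
            target = dest / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

Copying instead of moving leaves the original export untouched, so a failed import can simply be retried.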

generate_data.py

Purpose: Generate synthetic test data for development.

Usage:

python scripts/generate_data.py

Generates fake documents, persons, and tags suitable for load testing or UI development.
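
For orientation, here is a minimal sketch of seeded synthetic data generation using only the standard library. The field names, value ranges, and the function `generate_dataset` are hypothetical and are not taken from generate_data.py.

```python
import random

# Illustrative sample values only; the real script may use other sources.
FIRST_NAMES = ["Anna", "Friedrich", "Greta", "Karl", "Luise"]
LAST_NAMES = ["Becker", "Hoffmann", "Schmidt", "Weber"]
TAGS = ["letter", "photo", "certificate", "diary"]


def generate_dataset(n_documents, n_persons, seed=0):
    """Return fake persons and documents; the same seed gives the same data."""
    if n_persons < 1:
        raise ValueError("at least one person is required")
    rng = random.Random(seed)
    persons = [
        {"id": i, "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"}
        for i in range(1, n_persons + 1)
    ]
    documents = [
        {
            "id": i,
            "title": f"Document {i}",
            "year": rng.randint(1850, 1950),
            # Every document references an existing person.
            "person_id": rng.choice(persons)["id"],
            "tags": rng.sample(TAGS, k=rng.randint(1, 2)),
        }
        for i in range(1, n_documents + 1)
    ]
    return {"persons": persons, "documents": documents}
```

A fixed seed makes the generated data reproducible, which helps when comparing load-test runs.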


prepare_historical_dict.py

Purpose: Build a historical German word dictionary for the OCR spell-checker.

Usage:

python scripts/prepare_historical_dict.py

Processes raw word lists into the format expected by ocr-service/spell_check.py.
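
The kind of processing involved might look like the sketch below. The format expected by ocr-service/spell_check.py is not described here, so the tab-separated `word<TAB>count` output, the mapping of the long s (ſ) to s, and the function name `build_dictionary` are all assumptions.

```python
import unicodedata
from collections import Counter


def build_dictionary(raw_lines):
    """Turn raw word lines into sorted 'word<TAB>count' entries (assumed format)."""
    counts = Counter()
    for line in raw_lines:
        # Normalize to NFC and map the historical long s to a round s.
        word = unicodedata.normalize("NFC", line.strip()).replace("ſ", "s")
        # Keep purely alphabetic entries; drop blanks, numbers, and mixed tokens.
        if word and word.isalpha():
            counts[word] += 1
    # Most frequent first, then alphabetical for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word}\t{count}" for word, count in ordered]
```

Capitalization is preserved on purpose, since German nouns are capitalized and a spell-checker may need that distinction.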


schema.sql

Purpose: Complete database schema dump for reference.

Note: Flyway migrations in backend/src/main/resources/db/migration/ are the source of truth for schema evolution. schema.sql is a snapshot for quick reference only.


large-data.sql

Purpose: Pre-seeded dataset with a large number of documents for performance testing.

Usage:

# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql

How to Use

Most scripts should be run from the repository root:

# Database reset
./scripts/reset-db.sh

# Model download
./scripts/download-kraken-models.sh

# Data generation
python scripts/generate_data.py

Ensure scripts are executable:

chmod +x scripts/*.sh

Adding New Scripts

  1. Place the script in scripts/
  2. Add a header comment describing purpose and usage
  3. Make it executable (chmod +x)
  4. Document it in this CLAUDE.md
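
The checklist above can be partly automated. The following sketch checks one script for an executable bit (shell scripts only, matching the chmod pattern used in this file) and for a header comment. The function name `check_script` and the exact header rule are assumptions, not an existing project tool.

```python
import stat
from pathlib import Path


def check_script(path):
    """Return a list of problems for one script; an empty list means OK."""
    path = Path(path)
    problems = []
    # Shell scripts should carry the owner-executable permission bit.
    if path.suffix == ".sh" and not path.stat().st_mode & stat.S_IXUSR:
        problems.append("not executable (run chmod +x)")
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    # Skip a shebang line, then expect a comment or docstring as the header.
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    if not lines or not lines[0].lstrip().startswith(("#", '"""', "'''")):
        problems.append("missing header comment")
    return problems
```

Running it over every file in scripts/ before committing would catch the two most common omissions; documenting the script in this file remains a manual step.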