familienarchiv/scripts/README.md
Marcel e2c86626f7 docs(legibility): migrate CLAUDE.md rules into human docs — DOC-7
Processes all 7 CLAUDE.md files according to the 3-bucket classification.
Migration targets (CONTRIBUTING.md, docs/ARCHITECTURE.md, docs/DEPLOYMENT.md,
domain READMEs) are introduced by DOC-2/4/5/6 — this PR must merge last.

### scripts/CLAUDE.md → scripts/README.md
New `scripts/README.md` with full script documentation (preserving the
⚠️ destructive-operation warning on reset-db.sh). `scripts/CLAUDE.md`
reduced to a pointer + "document new scripts in README.md" reminder.

### .devcontainer/CLAUDE.md → .devcontainer/README.md
New `.devcontainer/README.md` with all configuration, usage, and limitations.
`.devcontainer/CLAUDE.md` reduced to a single pointer line.

### docs/CLAUDE.md → docs/README.md
New `docs/README.md` covering the folder structure, ADR guide, infrastructure
docs, and specs folder. `docs/CLAUDE.md` reduced to pointer + ADR reminder.

### ocr-service/CLAUDE.md
Reduced to pointer to `ocr-service/README.md` (content migrated in DOC-6).
Kept LLM reminders: single-node constraint, ALLOWED_PDF_HOSTS SSRF risk.

### backend/CLAUDE.md
- Layering Rules → pointer to docs/ARCHITECTURE.md
- Error Handling → pointer to CONTRIBUTING.md + reminder
- Security/Permissions → pointer to docs/ARCHITECTURE.md + reminder
- Package Structure → tagged TODO post-REFACTOR-1
- Fixed errors.ts path to frontend/src/lib/shared/errors.ts
- Added ANNOTATE_ALL + BLOG_WRITE to permission list
- Key Entities, Entity Code Style, Services → kept (Bucket-2)

### root CLAUDE.md
- Stack, Infrastructure, Dev Container → pointers
- Layering Rules, Error Handling, Security, OpenAPI, API Client,
  Date Handling, UI Components, Frontend Error Handling → pointers + reminders
- Package Structure → tagged TODO post-REFACTOR-1
- Domain Model, Entity Code Style, Form Actions, Styling → kept (Bucket-2)

### frontend/CLAUDE.md
- API Client Pattern, Date Handling → pointers + reminders
- Key UI Components → pointer to domain READMEs
- Styling, Form Actions, How to Run, Vite Proxy, i18n → kept (Bucket-2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 23:33:41 +02:00

# scripts/
Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts, plus two SQL files, used outside the normal application runtime.
## Scripts
### `reset-db.sh`
**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.
**Usage:**
```bash
./scripts/reset-db.sh
# Type 'yes' to confirm
```
**What it truncates:**
- `transcription_block_versions`
- `transcription_blocks`
- `comment_mentions`
- `document_comments`
- `document_annotations`
- `document_versions`
- `notifications`
- `documents`
- `person_name_aliases`
- `persons`
- `tag`
> ⚠️ **Destructive operation — only for development!** This wipes ALL data. Not reversible without a backup.
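
The `yes` confirmation shown in the usage above is a common guard for destructive scripts. A minimal sketch of such a guard follows; it is not taken from `reset-db.sh`, and the function name `confirm_destructive` is hypothetical.

```bash
# Hypothetical helper, NOT the contents of reset-db.sh.
# Reads one line from stdin and succeeds only on an exact, lowercase 'yes'.
confirm_destructive() {
  local answer
  # Prompt on stderr so stdout stays clean for callers.
  printf "This will wipe ALL data. Type 'yes' to confirm: " >&2
  # A closed or empty stdin counts as a refusal.
  read -r answer || return 1
  [ "$answer" = "yes" ]
}
```

A script would call it before any truncation, for example `confirm_destructive || exit 1`, so that anything other than `yes` aborts without touching the database.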
---
### `rebuild-frontend.sh`
**Purpose**: Force a clean rebuild of the frontend Docker container.
**Usage:**
```bash
./scripts/rebuild-frontend.sh
```
---
### `download-kraken-models.sh`
**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
**Usage:**
```bash
./scripts/download-kraken-models.sh
```
Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100–500 MB each.
---
### `download-paperless.sh`
**Purpose**: Download exported documents from a Paperless-ngx instance.
**Usage:**
```bash
./scripts/download-paperless.sh
```
Requires environment variables or config for the Paperless API endpoint and token.
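
Because the script depends on configuration being present, it can help to fail early when a variable is missing. The sketch below shows one way to check this. The helper `require_env` and the variable names `PAPERLESS_URL` and `PAPERLESS_TOKEN` are assumptions for illustration; the names the script actually reads may differ.

```bash
# Hypothetical helper: succeeds only if every named variable is set and
# non-empty. The variable names passed to it are illustrative assumptions.
require_env() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup via eval keeps this portable beyond bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Called as `require_env PAPERLESS_URL PAPERLESS_TOKEN || exit 1`, it reports every missing variable in one pass rather than stopping at the first.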
---
### `flatten-paperless.sh`
**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.
**Usage:**
```bash
./scripts/flatten-paperless.sh
```
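
To illustrate what flattening means here, the sketch below copies every file from a nested tree into one directory. It is not the contents of `flatten-paperless.sh`; the function name `flatten_dir` and the choice to encode the relative path in the file name are assumptions.

```bash
# Hypothetical sketch, NOT the contents of flatten-paperless.sh.
# Copies every regular file under $1 into the single directory $2.
# Each target name encodes the relative path ('/' becomes '_') so files
# with the same basename in different subdirectories do not collide.
# Assumes file names contain no newlines and $2 is not inside $1.
flatten_dir() {
  local src="${1%/}" dest="$2" file rel
  mkdir -p "$dest"
  find "$src" -type f | while IFS= read -r file; do
    rel="${file#"$src"/}"
    cp "$file" "$dest/$(printf '%s' "$rel" | tr '/' '_')"
  done
}
```

With this naming scheme, `export/a/b/doc.pdf` ends up as `a_b_doc.pdf` in the target directory.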
---
### `generate_data.py`
**Purpose**: Generate synthetic test data for development.
**Usage:**
```bash
python scripts/generate_data.py
```
Generates fake documents, persons, and tags suitable for load testing or UI development.
---
### `prepare_historical_dict.py`
**Purpose**: Build a historical German word dictionary for the OCR spell-checker.
**Usage:**
```bash
python scripts/prepare_historical_dict.py
```
Processes raw word lists into the format expected by `ocr-service/spell_check.py`.
---
### `schema.sql`
**Purpose**: Complete database schema dump for reference.
**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.
---
### `large-data.sql`
**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.
**Usage:**
```bash
# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
```
## How to Use
Most scripts should be run from the **repository root**:
```bash
# Database reset
./scripts/reset-db.sh
# Model download
./scripts/download-kraken-models.sh
# Data generation
python scripts/generate_data.py
```
Ensure scripts are executable:
```bash
chmod +x scripts/*.sh
```
## Adding New Scripts
1. Place the script in `scripts/`
2. Add a header comment describing purpose and usage
3. Make it executable (`chmod +x`)
4. Document it in this `README.md`
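
The header comment from step 2 might look like the skeleton below. The file name `example-task.sh` and everything in it are hypothetical; adapt the purpose and usage lines to the real script.

```bash
#!/usr/bin/env bash
# example-task.sh (hypothetical name)
# Purpose: one-line description of what the script does.
# Usage:   ./scripts/example-task.sh [args]

# Stop on the first failing command and on use of unset variables.
set -eu

main() {
  echo "example-task: nothing to do yet"
}

main "$@"
```

Keeping the purpose and usage lines at the top makes it easy to copy them into this `README.md` for step 4.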