diff --git a/docs/superpowers/plans/2026-06-07-spacy-nlp-service.md b/docs/superpowers/plans/2026-06-07-spacy-nlp-service.md
new file mode 100644
index 00000000..fde4ac7e
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-07-spacy-nlp-service.md
@@ -0,0 +1,1257 @@
+# spaCy NLP Service Prototype — Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Build `nlp-service/` — a FastAPI service that parses free-text search queries into structured extractions (person names, role, dates, keywords) using spaCy, as a drop-in replacement for the current Ollama service.
+
+**Architecture:** Five-step pipeline (NER → role detection → date parsing → keyword extraction → assembly) in `extractor.py`. `main.py` exposes `/parse` and `/health` via FastAPI. The spaCy models are baked into the Docker image at build time, so no model volume is needed.
+
+**Tech Stack:** Python 3.11, FastAPI 0.115, spaCy 3.8 (`de_core_news_sm` / `en_core_web_sm` / `es_core_news_sm`), dateparser 1.2, pytest
+
+---
+
+## File Map
+
+| File | Responsibility |
+|---|---|
+| `nlp-service/models.py` | Pydantic request/response types — the extraction contract |
+| `nlp-service/extractor.py` | NLP pipeline: model loading + 5 extraction steps |
+| `nlp-service/main.py` | FastAPI app — `/parse`, `/health`, lifespan model loading |
+| `nlp-service/requirements.txt` | Python dependencies |
+| `nlp-service/Dockerfile` | Image — python:3.11-slim, models baked in, non-root user |
+| `nlp-service/CLAUDE.md` | Service-level docs |
+| `nlp-service/test_extractor.py` | Unit + integration tests for the pipeline |
+| `nlp-service/test_main.py` | HTTP contract tests for the FastAPI endpoints |
+
+---
+
+## Task 1: Scaffold — requirements.txt, CLAUDE.md, models.py
+
+**Files:**
+- Create: `nlp-service/requirements.txt`
+- Create: `nlp-service/CLAUDE.md`
+- Create: `nlp-service/models.py`
+- Create: `nlp-service/test_extractor.py` (skeleton only)
+
+- [ ] **Step 1: Create `nlp-service/requirements.txt`**
+
+```
+fastapi[standard]==0.115.6
+uvicorn[standard]==0.34.0
+spacy>=3.8,<4.0
+dateparser>=1.2,<2.0
+pytest>=8.0,<9.0
+httpx>=0.28,<1.0
+```
+
+- [ ] **Step 2: Create `nlp-service/CLAUDE.md`**
+
+```markdown
+# NLP Service
+
+Lightweight FastAPI service that parses free-text search queries into structured extractions,
+replacing Ollama for the Familienarchiv NL search feature.
+
+## Stack
+
+- Python 3.11, FastAPI 0.115, spaCy 3.8, dateparser 1.2
+
+## Endpoints
+
+- `POST /parse` — parse a free-text query, return extraction matching `OllamaExtraction` contract
+- `GET /health` — returns `{"status": "ok"}` when all models are loaded
+
+## Running locally
+
+\`\`\`bash
+pip install -r requirements.txt
+python -m spacy download de_core_news_sm
+python -m spacy download en_core_web_sm
+python -m spacy download es_core_news_sm
+uvicorn main:app --reload --port 8001
+
+curl -X POST http://localhost:8001/parse \
+  -H "Content-Type: application/json" \
+  -d '{"query": "Briefe von Opa Hermann an Marie vor 1920", "lang": "de"}'
+\`\`\`
+
+## Testing
+
+\`\`\`bash
+pytest -v
+\`\`\`
+
+## Design spec
+
+See `docs/superpowers/specs/2026-06-07-spacy-nlp-service-design.md`.
+
+## Notes
+
+This is a **prototype** for extraction quality evaluation. No docker-compose integration or
+Java-side changes in this iteration. The extraction contract matches `OllamaExtraction` in
+`backend/src/main/java/org/raddatz/familienarchiv/search/`.
+```
+
+- [ ] **Step 3: Write the failing test for Pydantic models**
+
+Create `nlp-service/test_extractor.py`:
+
+```python
+import pytest
+from pydantic import ValidationError
+
+
+# ── Models ──────────────────────────────────────────────────────────────────
+
+def test_parse_request_valid():
+    from models import ParseRequest
+    req = ParseRequest(query="Briefe von Opa", lang="de")
+    assert req.query == "Briefe von Opa"
+    assert req.lang == "de"
+
+
+def test_parse_request_rejects_unknown_lang():
+    from models import ParseRequest
+    with pytest.raises(ValidationError):
+        ParseRequest(query="Letters from grandpa", lang="fr")
+
+
+def test_parse_response_serializes_nulls():
+    from models import ParseResponse
+    resp = ParseResponse(
+        personNames=["Opa"],
+        personRole="sender",
+        dateFrom=None,
+        dateTo="1920-12-31",
+        keywords=["brief"],
+        rawQuery="Briefe von Opa",
+    )
+    data = resp.model_dump()
+    assert data["dateFrom"] is None
+    assert data["dateTo"] == "1920-12-31"
+    assert data["personRole"] == "sender"
+```
+
+- [ ] **Step 4: Run to confirm failure**
+
+```bash
+cd nlp-service
+pip install -r requirements.txt
+pytest test_extractor.py::test_parse_request_valid -v
+```
+
+Expected: `ModuleNotFoundError: No module named 'models'`
+
+- [ ] **Step 5: Create `nlp-service/models.py`**
+
+```python
+from __future__ import annotations
+from typing import Literal
+from pydantic import BaseModel
+
+
+class ParseRequest(BaseModel):
+    query: str
+    lang: Literal["de", "en", "es"]
+
+
+class ParseResponse(BaseModel):
+    personNames: list[str]
+    personRole: Literal["sender", "receiver", "any"]
+    dateFrom: str | None
+    dateTo: str | None
+    keywords: list[str]
+    rawQuery: str
+```
+
+- [ ] **Step 6: Run tests to confirm they pass**
+
+```bash
+pytest test_extractor.py::test_parse_request_valid \
+       test_extractor.py::test_parse_request_rejects_unknown_lang \
+       test_extractor.py::test_parse_response_serializes_nulls -v
+```
+
+Expected: `3 passed`
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add nlp-service/
+git commit -m "feat(nlp-service): scaffold — models, requirements, CLAUDE.md"
+```
+
+---
+
+## Task 2: spaCy model loading
+
+**Files:**
+- Create: `nlp-service/extractor.py`
+- Modify: `nlp-service/test_extractor.py`
+
+Before running these tests, the three spaCy models must be installed:
+
+```bash
+python -m spacy download de_core_news_sm
+python -m spacy download en_core_web_sm
+python -m spacy download es_core_news_sm
+```
+
+- [ ] **Step 1: Write the failing tests**
+
+Append to `nlp-service/test_extractor.py`:
+
+```python
+# ── Model loading ────────────────────────────────────────────────────────────
+
+import pytest
+
+
+@pytest.fixture(scope="session")
+def nlp_de():
+    from extractor import get_nlp
+    return get_nlp("de")
+
+
+@pytest.fixture(scope="session")
+def nlp_en():
+    from extractor import get_nlp
+    return get_nlp("en")
+
+
+@pytest.fixture(scope="session")
+def nlp_es():
+    from extractor import get_nlp
+    return get_nlp("es")
+
+
+def test_get_nlp_de_loads(nlp_de):
+    doc = nlp_de("Test")
+    assert doc is not None
+
+
+def test_get_nlp_en_loads(nlp_en):
+    doc = nlp_en("Test")
+    assert doc is not None
+
+
+def test_get_nlp_es_loads(nlp_es):
+    doc = nlp_es("Prueba")
+    assert doc is not None
+
+
+def test_get_nlp_unknown_lang_raises():
+    from extractor import get_nlp
+    with pytest.raises(ValueError, match="Unsupported language"):
+        get_nlp("fr")
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```bash
+pytest test_extractor.py::test_get_nlp_de_loads -v
+```
+
+Expected: `ModuleNotFoundError: No module named 'extractor'`
+
+- [ ] **Step 3: Create `nlp-service/extractor.py` with model loading**
+
+```python
+from __future__ import annotations
+
+import re
+from datetime import date
+
+import dateparser
+import spacy
+from spacy.language import Language
+
+from models import ParseResponse
+
+# ── Language model registry ──────────────────────────────────────────────────
+
+_MODEL_NAMES: dict[str, str] = {
+    "de": "de_core_news_sm",
+    "en": "en_core_web_sm",
+    "es": "es_core_news_sm",
+}
+
+_nlp_cache: dict[str, Language] = {}
+
+
+def get_nlp(lang: str) -> Language:
+    if lang not in _MODEL_NAMES:
+        raise ValueError(f"Unsupported language: {lang!r}. Valid: {list(_MODEL_NAMES)}")
+    if lang not in _nlp_cache:
+        _nlp_cache[lang] = spacy.load(_MODEL_NAMES[lang])
+    return _nlp_cache[lang]
+
+
+def load_all_models() -> None:
+    for lang in _MODEL_NAMES:
+        get_nlp(lang)
+```
+
+- [ ] **Step 4: Run tests to confirm they pass**
+
+```bash
+pytest test_extractor.py::test_get_nlp_de_loads \
+       test_extractor.py::test_get_nlp_en_loads \
+       test_extractor.py::test_get_nlp_es_loads \
+       test_extractor.py::test_get_nlp_unknown_lang_raises -v
+```
+
+Expected: `4 passed`
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add nlp-service/extractor.py nlp-service/test_extractor.py
+git commit -m "feat(nlp-service): spaCy model loading with get_nlp/load_all_models"
+```
+
+---
+
+## Task 3: Person name extraction (NER)
+
+**Files:**
+- Modify: `nlp-service/extractor.py`
+- Modify: `nlp-service/test_extractor.py`
+
+- [ ] **Step 1: Write the failing tests**
+
+Append to `nlp-service/test_extractor.py`:
+
+```python
+# ── Person name extraction ───────────────────────────────────────────────────
+
+def _make_doc_with_ents(nlp, text: str, char_ents: list[tuple[int, int, str]]):
+    """Create a Doc with manually injected entity spans (no NER model needed)."""
+    doc = nlp.make_doc(text)
+    spans = [doc.char_span(s, e, label=lbl) for s, e, lbl in char_ents]
+    doc.ents = [sp for sp in spans if sp is not None]
+    return doc
+
+
+def test_extract_person_names_two_persons(nlp_de):
+    from extractor import extract_person_names
+    # "Briefe von Opa Hermann an Marie"
+    #  0123456789012345678901234567890
+    #            1111111111222222222233
+    # "Opa Hermann" = 11..22, "Marie" = 26..31
+    doc = _make_doc_with_ents(nlp_de, "Briefe von Opa Hermann an Marie", [
+        (11, 22, "PER"),
+        (26, 31, "PER"),
+    ])
+    assert extract_person_names(doc) == ["Opa Hermann", "Marie"]
+
+
+def test_extract_person_names_preserves_order(nlp_de):
+    from extractor import extract_person_names
+    # Reversed: "Marie von Opa" — Marie comes first in text
+    # "Marie" = 0..5, "Opa" = 10..13
+    doc = _make_doc_with_ents(nlp_de, "Marie von Opa", [
+        (0, 5, "PER"),
+        (10, 13, "PER"),
+    ])
+    assert extract_person_names(doc) == ["Marie", "Opa"]
+
+
+def test_extract_person_names_empty(nlp_de):
+    from extractor import extract_person_names
+    doc = _make_doc_with_ents(nlp_de, "Briefe aus dem Krieg", [])
+    assert extract_person_names(doc) == []
+
+
+def test_extract_person_names_ignores_non_per(nlp_de):
+    from extractor import extract_person_names
+    # DATE entity should not appear in personNames
+    doc = _make_doc_with_ents(nlp_de, "Briefe 1920", [(7, 11, "DATE")])
+    assert extract_person_names(doc) == []
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```bash
+pytest test_extractor.py::test_extract_person_names_two_persons -v
+```
+
+Expected: `ImportError: cannot import name 'extract_person_names' from 'extractor'`
+
+- [ ] **Step 3: Add `extract_person_names` to `extractor.py`**
+
+Add after the model loading section:
+
+```python
+# ── Step 1: Person name extraction ──────────────────────────────────────────
+
+def extract_person_names(doc) -> list[str]:
+    """Return person entity texts in left-to-right span order."""
+    # The de/es news models label persons "PER"; en_core_web_sm uses "PERSON".
+    return [ent.text for ent in doc.ents if ent.label_ in ("PER", "PERSON")]
+```
+
+- [ ] **Step 4: Run tests to confirm they pass**
+
+```bash
+pytest test_extractor.py::test_extract_person_names_two_persons \
+       test_extractor.py::test_extract_person_names_preserves_order \
+       test_extractor.py::test_extract_person_names_empty \
+       test_extractor.py::test_extract_person_names_ignores_non_per -v
+```
+
+Expected: `4 passed`
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add nlp-service/extractor.py nlp-service/test_extractor.py
+git commit -m "feat(nlp-service): NER person name extraction"
+```
+
+---
+
+## Task 4: Role detection
+
+**Files:**
+- Modify: `nlp-service/extractor.py`
+- Modify: `nlp-service/test_extractor.py`
+
+Role is only meaningful when exactly one PER entity is found; with zero or several, the function returns `any`. For a single PER span it checks, in order:
+1. Dependency-tree children of the PER span's root with `dep_` in `("case", "prep", "mo")`
+2. Fallback: the token immediately before the span
+
+- [ ] **Step 1: Write the failing tests**
+
+Append to `nlp-service/test_extractor.py`:
+
+```python
+# ── Role detection ───────────────────────────────────────────────────────────
+
+def test_role_sender_von(nlp_de):
+    from extractor import detect_person_role
+    # "Briefe von Marie" — "von" immediately before "Marie"
+    # B=0..6, ' '=6, v=7..10, ' '=10, M=11..16
+    doc = _make_doc_with_ents(nlp_de, "Briefe von Marie", [(11, 16, "PER")])
+    per_spans = list(doc.ents)
+    assert detect_person_role(doc, per_spans, "de") == "sender"
+
+
+def test_role_receiver_an(nlp_de):
+    from extractor import detect_person_role
+    # "Briefe an Marie" — "an" immediately before "Marie"
+    # B=0..6, ' '=6, a=7..9, ' '=9, M=10..15
+    doc = _make_doc_with_ents(nlp_de, "Briefe an Marie", [(10, 15, "PER")])
+    per_spans = list(doc.ents)
+    assert detect_person_role(doc, per_spans, "de") == "receiver"
+
+
+def test_role_two_persons_returns_any(nlp_de):
+    from extractor import detect_person_role
+    # "von Opa an Marie" — two PER spans → always "any"
+    # v=0..3, ' '=3, O=4..7, ' '=7, a=8..10, ' '=10, M=11..16
+    doc = _make_doc_with_ents(nlp_de, "von Opa an Marie", [
+        (4, 7, "PER"),
+        (11, 16, "PER"),
+    ])
+    per_spans = list(doc.ents)
+    assert detect_person_role(doc, per_spans, "de") == "any"
+
+
+def test_role_no_prep_returns_any(nlp_de):
+    from extractor import detect_person_role
+    # "Briefe Marie" — no preposition
+    # B=0..6, ' '=6, M=7..12
+    doc = _make_doc_with_ents(nlp_de, "Briefe Marie", [(7, 12, "PER")])
+    per_spans = list(doc.ents)
+    assert detect_person_role(doc, per_spans, "de") == "any"
+
+
+def test_role_empty_returns_any(nlp_de):
+    from extractor import detect_person_role
+    doc = _make_doc_with_ents(nlp_de, "Briefe 1920", [])
+    assert detect_person_role(doc, [], "de") == "any"
+
+
+def test_role_sender_from_english(nlp_en):
+    from extractor import detect_person_role
+    # "letters from Marie" — "from" before "Marie"
+    # l=0..7, ' '=7, f=8..12, ' '=12, M=13..18
+    doc = _make_doc_with_ents(nlp_en, "letters from Marie", [(13, 18, "PER")])
+    per_spans = list(doc.ents)
+    assert detect_person_role(doc, per_spans, "en") == "sender"
+
+
+def test_role_receiver_to_english(nlp_en):
+    from extractor import detect_person_role
+    # "letters to Marie"
+    # l=0..7, ' '=7, t=8..10, ' '=10, M=11..16
+    doc = _make_doc_with_ents(nlp_en, "letters to Marie", [(11, 16, "PER")])
+    per_spans = list(doc.ents)
+    assert detect_person_role(doc, per_spans, "en") == "receiver"
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```bash
+pytest test_extractor.py::test_role_sender_von -v
+```
+
+Expected: `ImportError: cannot import name 'detect_person_role' from 'extractor'`
+
+- [ ] **Step 3: Add role detection constants and function to `extractor.py`**
+
+Add after `extract_person_names`:
+
+```python
+# ── Step 2: Role detection ───────────────────────────────────────────────────
+
+_SENDER_PREPS: dict[str, frozenset[str]] = {
+    "de": frozenset({"von", "vom"}),
+    "en": frozenset({"from", "by"}),
+    "es": frozenset({"de", "por"}),
+}
+
+_RECEIVER_PREPS: dict[str, frozenset[str]] = {
+    "de": frozenset({"an", "nach", "für"}),
+    "en": frozenset({"to", "for"}),
+    "es": frozenset({"para", "a"}),
+}
+
+
+def detect_person_role(doc, per_spans: list, lang: str) -> str:
+    """Return 'sender', 'receiver', or 'any'.
+
+    Only meaningful for single-PER queries — two-person queries always return
+    'any' because Java derives direction from list position.
+    """
+    if len(per_spans) != 1:
+        return "any"
+
+    span = per_spans[0]
+    root = span.root
+    sender = _SENDER_PREPS[lang]
+    receiver = _RECEIVER_PREPS[lang]
+
+    # Primary: dependency-tree children of the PER root
+    for child in root.children:
+        if child.dep_ in ("case", "prep", "mo"):
+            if child.lower_ in sender:
+                return "sender"
+            if child.lower_ in receiver:
+                return "receiver"
+
+    # Fallback: token immediately before the span start
+    if span.start > 0:
+        prev = doc[span.start - 1]
+        if prev.lower_ in sender:
+            return "sender"
+        if prev.lower_ in receiver:
+            return "receiver"
+
+    return "any"
+```
+
+- [ ] **Step 4: Run tests to confirm they pass**
+
+```bash
+pytest test_extractor.py::test_role_sender_von \
+       test_extractor.py::test_role_receiver_an \
+       test_extractor.py::test_role_two_persons_returns_any \
+       test_extractor.py::test_role_no_prep_returns_any \
+       test_extractor.py::test_role_empty_returns_any \
+       test_extractor.py::test_role_sender_from_english \
+       test_extractor.py::test_role_receiver_to_english -v
+```
+
+Expected: `7 passed`
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add nlp-service/extractor.py nlp-service/test_extractor.py
+git commit -m "feat(nlp-service): role detection (sender/receiver/any)"
+```
+
+---
+
+## Task 5: Date parsing
+
+**Files:**
+- Modify: `nlp-service/extractor.py`
+- Modify: `nlp-service/test_extractor.py`
+
+Direction is detected from the token immediately before the first DATE span. If "zwischen/between/entre" appears anywhere in the query, the first two DATE spans form the range (sorted so the earlier one becomes `dateFrom`). A bare year with no direction token produces a closed year range (`dateFrom` = Jan 1, `dateTo` = Dec 31).
+
+Note: "nach" appears in both `_RECEIVER_PREPS["de"]` and the date-after set. This is safe — role detection only examines tokens before PER spans; date parsing only examines tokens before DATE spans. They operate on different span types.
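
One caveat for the closed year-range rule: if year-only precision is inferred from a parsed date landing on January 1, a query such as "Januar 1920" (which a parser preferring the first day of the month resolves to 1920-01-01) also widens to the whole year. A minimal sketch of keeping the precision explicit follows. `period_bounds` and its accepted `YYYY` / `YYYY-MM` input shapes are hypothetical and not part of this plan's `extractor.py`.

```python
from __future__ import annotations

import calendar
import re
from datetime import date

# Hypothetical helper: precision comes from the input's shape, not from the
# parsed value. Only "YYYY" and "YYYY-MM" are accepted (an assumed format).
_YEAR_OR_MONTH_RE = re.compile(r"^(\d{4})(?:-(\d{2}))?$")


def period_bounds(text: str) -> tuple[date, date] | None:
    """Return (first_day, last_day) of the year or month named by text."""
    m = _YEAR_OR_MONTH_RE.match(text.strip())
    if not m:
        return None
    year = int(m.group(1))
    if year < 1:
        return None  # datetime.date does not support year 0
    if m.group(2) is None:
        # Year-only input: the whole calendar year
        return date(year, 1, 1), date(year, 12, 31)
    month = int(m.group(2))
    if not 1 <= month <= 12:
        return None
    # monthrange returns (weekday_of_first_day, number_of_days_in_month)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)
```

Under this scheme a "before" token would keep only the upper bound and an "after" token only the lower bound, mirroring the direction rules described above.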
+
+- [ ] **Step 1: Write the failing tests**
+
+Append to `nlp-service/test_extractor.py`:
+
+```python
+# ── Date parsing ─────────────────────────────────────────────────────────────
+
+def test_date_vor_1920(nlp_de):
+    from extractor import extract_dates
+    # "Briefe vor 1920" — "1920" at chars 11..15
+    doc = _make_doc_with_ents(nlp_de, "Briefe vor 1920", [(11, 15, "DATE")])
+    date_from, date_to = extract_dates(doc, "de")
+    assert date_from is None
+    assert date_to == "1920-12-31"
+
+
+def test_date_nach_1900(nlp_de):
+    from extractor import extract_dates
+    # "Briefe nach 1900" — "1900" at chars 12..16
+    doc = _make_doc_with_ents(nlp_de, "Briefe nach 1900", [(12, 16, "DATE")])
+    date_from, date_to = extract_dates(doc, "de")
+    assert date_from == "1900-01-01"
+    assert date_to is None
+
+
+def test_date_zwischen_1900_und_1920(nlp_de):
+    from extractor import extract_dates
+    # "zwischen 1900 und 1920"
+    # z=0..8, ' '=8, 1900=9..13, ' '=13, u=14..17, ' '=17, 1920=18..22
+    doc = _make_doc_with_ents(nlp_de, "zwischen 1900 und 1920", [
+        (9, 13, "DATE"),
+        (18, 22, "DATE"),
+    ])
+    date_from, date_to = extract_dates(doc, "de")
+    assert date_from == "1900-01-01"
+    assert date_to == "1920-12-31"
+
+
+def test_date_bare_year_makes_range(nlp_de):
+    from extractor import extract_dates
+    # "Briefe 1920" — no direction token → year-range
+    # B=0..6, ' '=6, 1920=7..11
+    doc = _make_doc_with_ents(nlp_de, "Briefe 1920", [(7, 11, "DATE")])
+    date_from, date_to = extract_dates(doc, "de")
+    assert date_from == "1920-01-01"
+    assert date_to == "1920-12-31"
+
+
+def test_date_no_date_entity(nlp_de):
+    from extractor import extract_dates
+    doc = _make_doc_with_ents(nlp_de, "Briefe von Opa", [])
+    date_from, date_to = extract_dates(doc, "de")
+    assert date_from is None
+    assert date_to is None
+
+
+def test_date_before_english(nlp_en):
+    from extractor import extract_dates
+    # "letters before 1920" — "1920" at chars 15..19
+    doc = _make_doc_with_ents(nlp_en, "letters before 1920", [(15, 19, "DATE")])
+    date_from, date_to = extract_dates(doc, "en")
+    assert date_from is None
+    assert date_to == "1920-12-31"
+
+
+def test_date_after_english(nlp_en):
+    from extractor import extract_dates
+    # "letters after 1900" — "1900" at chars 14..18
+    doc = _make_doc_with_ents(nlp_en, "letters after 1900", [(14, 18, "DATE")])
+    date_from, date_to = extract_dates(doc, "en")
+    assert date_from == "1900-01-01"
+    assert date_to is None
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```bash
+pytest test_extractor.py::test_date_vor_1920 -v
+```
+
+Expected: `ImportError: cannot import name 'extract_dates' from 'extractor'`
+
+- [ ] **Step 3: Add date parsing to `extractor.py`**
+
+Add after `detect_person_role`:
+
+```python
+# ── Step 3: Date parsing ─────────────────────────────────────────────────────
+
+_YEAR_RE = re.compile(r"^\d{4}$")
+
+_DATE_BEFORE: dict[str, frozenset[str]] = {
+    "de": frozenset({"vor"}),
+    "en": frozenset({"before"}),
+    "es": frozenset({"antes"}),
+}
+
+_DATE_AFTER: dict[str, frozenset[str]] = {
+    "de": frozenset({"nach"}),
+    "en": frozenset({"after"}),
+    "es": frozenset({"después", "despues"}),
+}
+
+_DATE_BETWEEN: dict[str, frozenset[str]] = {
+    "de": frozenset({"zwischen"}),
+    "en": frozenset({"between"}),
+    "es": frozenset({"entre"}),
+}
+
+
+def _parse_date_text(text: str, lang: str) -> date | None:
+    text = text.strip()
+    if _YEAR_RE.match(text):
+        year = int(text)
+        if 1000 < year < 3000:
+            return date(year, 1, 1)
+    parsed = dateparser.parse(
+        text,
+        languages=[lang],
+        settings={"PREFER_DAY_OF_MONTH": "first", "RETURN_AS_TIMEZONE_AWARE": False},
+    )
+    return parsed.date() if parsed else None
+
+
+def _year_end(d: date) -> date:
+    """If d is Jan 1, return Dec 31 of the same year (year-only boundary)."""
+    if d.month == 1 and d.day == 1:
+        return date(d.year, 12, 31)
+    return d
+
+
+def extract_dates(doc, lang: str) -> tuple[str | None, str | None]:
+    """Return (date_from, date_to) as ISO strings or None."""
+    date_spans = [ent for ent in doc.ents if ent.label_ == "DATE"]
+    if not date_spans:
+        return None, None
+
+    between_tokens = _DATE_BETWEEN[lang]
+    before_tokens = _DATE_BEFORE[lang]
+    after_tokens = _DATE_AFTER[lang]
+
+    # "zwischen X und Y" / "between X and Y" — two DATE spans form a range
+    has_between = any(tok.lower_ in between_tokens for tok in doc)
+    if has_between and len(date_spans) >= 2:
+        parsed = []
+        for span in date_spans[:2]:
+            d = _parse_date_text(span.text, lang)
+            if d:
+                parsed.append(d)
+        if len(parsed) == 2:
+            parsed.sort()
+            return parsed[0].isoformat(), _year_end(parsed[1]).isoformat()
+
+    # Single DATE span — use direction token
+    span = date_spans[0]
+    d = _parse_date_text(span.text, lang)
+    if not d:
+        return None, None
+
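+    # Step back over a linking "de" so Spanish "antes de 1920" resolves to "antes".
+    idx = span.start - 1
+    if lang == "es" and idx > 0 and doc[idx].lower_ == "de":
+        idx -= 1
+    prev_lower = doc[idx].lower_ if idx >= 0 else ""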
+
+    if prev_lower in before_tokens:
+        return None, _year_end(d).isoformat()
+    if prev_lower in after_tokens:
+        return d.isoformat(), None
+    # Bare year/date — closed year-range
+    return d.isoformat(), _year_end(d).isoformat()
+```
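
`extract_dates` depends on DATE entities being present in `doc.ents`. As far as the published label schemes indicate, the stock `de_core_news_sm` and `es_core_news_sm` pipelines tag only LOC, MISC, ORG, and PER, so German and Spanish queries may never yield a DATE span. One possible fallback, sketched here under that assumption, is to locate standalone four-digit years with a regular expression. `find_year_spans` and `_YEAR_TOKEN_RE` are hypothetical names, not part of the plan.

```python
import re

# Hypothetical fallback: match a four-digit year starting with 1 or 2 that is
# not embedded in a longer run of digits (so "123456" yields no match).
_YEAR_TOKEN_RE = re.compile(r"(?<!\d)[12]\d{3}(?!\d)")


def find_year_spans(text: str) -> list[tuple[int, int, int]]:
    """Return (start_char, end_char, year) for each standalone year in text."""
    return [
        (m.start(), m.end(), int(m.group()))
        for m in _YEAR_TOKEN_RE.finditer(text)
    ]


if __name__ == "__main__":
    print(find_year_spans("Briefe vor 1920"))         # [(11, 15, 1920)]
    print(find_year_spans("zwischen 1900 und 1920"))  # [(9, 13, 1900), (18, 22, 1920)]
```

The character offsets agree with those used by the synthetic-entity tests, so each hit could be turned into a span with `doc.char_span(start, end, label="DATE")` when the model supplies none.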
+
+- [ ] **Step 4: Run tests to confirm they pass**
+
+```bash
+pytest test_extractor.py::test_date_vor_1920 \
+       test_extractor.py::test_date_nach_1900 \
+       test_extractor.py::test_date_zwischen_1900_und_1920 \
+       test_extractor.py::test_date_bare_year_makes_range \
+       test_extractor.py::test_date_no_date_entity \
+       test_extractor.py::test_date_before_english \
+       test_extractor.py::test_date_after_english -v
+```
+
+Expected: `7 passed`
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add nlp-service/extractor.py nlp-service/test_extractor.py
+git commit -m "feat(nlp-service): date range extraction with direction detection"
+```
+
+---
+
+## Task 6: Keyword extraction
+
+**Files:**
+- Modify: `nlp-service/extractor.py`
+- Modify: `nlp-service/test_extractor.py`
+
+Keywords are the lowercased lemmas of POS-filtered content words (NOUN or PROPN, non-stop, lemma length ≥ 3, not inside any NER span). They are passed to Java's `resolveTags()`, which fuzzy-matches them against the tag table, so Python performs no tag lookup.
+
+- [ ] **Step 1: Write the failing tests**
+
+Append to `nlp-service/test_extractor.py`:
+
+```python
+# ── Keyword extraction ───────────────────────────────────────────────────────
+
+def test_keywords_extracts_nouns(nlp_de):
+    from extractor import extract_keywords
+    # Use real NLP for POS tags; disable NER to control entities manually
+    doc = nlp_de("Briefe aus dem Krieg", disable=["ner"])
+    keywords = extract_keywords(doc, [])
+    # "Brief" (NOUN, lemma "Brief") and "Krieg" (NOUN) should appear
+    assert "brief" in keywords
+    assert "krieg" in keywords
+
+
+def test_keywords_excludes_stopwords(nlp_de):
+    from extractor import extract_keywords
+    doc = nlp_de("Briefe aus dem Krieg", disable=["ner"])
+    keywords = extract_keywords(doc, [])
+    # "dem" is a stopword article (DET) — must not appear
+    assert "dem" not in keywords
+
+
+def test_keywords_excludes_per_ner_spans(nlp_de):
+    from extractor import extract_keywords
+    # Run full NLP so POS tagger fires, then inject PER span over "Hermann"
+    doc = nlp_de("Briefe von Hermann")
+    per_span = doc.char_span(11, 18, label="PER")  # "Hermann" = 11..18
+    # Fail loudly if the offsets do not align with token boundaries
+    assert per_span is not None
+    doc.ents = [per_span]
+    keywords = extract_keywords(doc, list(doc.ents))
+    assert "hermann" not in keywords
+
+
+def test_keywords_excludes_short_lemmas(nlp_de):
+    from extractor import extract_keywords
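+    # "an" (two letters, ADP) and "ihn" (PRON) must both be filtered out.
+    # Neither is a noun, so the POS filter applies here as well as the length rule.
+    doc = nlp_de("Briefe an ihn", disable=["ner"])
+    keywords = extract_keywords(doc, [])
+    assert "an" not in keywords
+    assert "ihn" not in keywords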
+
+
+def test_keywords_deduplicates(nlp_de):
+    from extractor import extract_keywords
+    doc = nlp_de("Brief Brief Krieg", disable=["ner"])
+    keywords = extract_keywords(doc, [])
+    assert keywords.count("brief") == 1
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```bash
+pytest test_extractor.py::test_keywords_extracts_nouns -v
+```
+
+Expected: `ImportError: cannot import name 'extract_keywords' from 'extractor'`
+
+- [ ] **Step 3: Add keyword extraction to `extractor.py`**
+
+Add after `extract_dates`:
+
+```python
+# ── Step 4: Keyword extraction ───────────────────────────────────────────────
+
+def extract_keywords(doc, excluded_spans: list) -> list[str]:
+    """Return lowercased lemmas of content words not inside any NER span."""
+    excluded_indices: set[int] = set()
+    for span in excluded_spans:
+        excluded_indices.update(range(span.start, span.end))
+
+    seen: set[str] = set()
+    keywords: list[str] = []
+    for token in doc:
+        if token.i in excluded_indices:
+            continue
+        if token.pos_ not in ("NOUN", "PROPN"):
+            continue
+        if token.is_stop:
+            continue
+        lemma = token.lemma_.lower()
+        if len(lemma) < 3:
+            continue
+        if lemma not in seen:
+            seen.add(lemma)
+            keywords.append(lemma)
+
+    return keywords
+```
+
+- [ ] **Step 4: Run tests to confirm they pass**
+
+```bash
+pytest test_extractor.py::test_keywords_extracts_nouns \
+       test_extractor.py::test_keywords_excludes_stopwords \
+       test_extractor.py::test_keywords_excludes_per_ner_spans \
+       test_extractor.py::test_keywords_excludes_short_lemmas \
+       test_extractor.py::test_keywords_deduplicates -v
+```
+
+Expected: `5 passed`
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add nlp-service/extractor.py nlp-service/test_extractor.py
+git commit -m "feat(nlp-service): keyword extraction (POS-filtered, deduped lemmas)"
+```
+
+---
+
+## Task 7: Full `extract()` function
+
+**Files:**
+- Modify: `nlp-service/extractor.py`
+- Modify: `nlp-service/test_extractor.py`
+
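+This assembles all steps. Tests here use **real NLP** (no synthetic docs) to validate actual extraction quality. Their date assertions hold only if the loaded pipeline emits DATE entities; the stock `de_core_news_sm` and `es_core_news_sm` models label only LOC, MISC, ORG, and PER, so confirm with `nlp.get_pipe("ner").labels` before relying on the German and Spanish cases.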
+
+- [ ] **Step 1: Write the failing tests**
+
+Append to `nlp-service/test_extractor.py`:
+
+```python
+# ── Full extract() pipeline ──────────────────────────────────────────────────
+
+def test_extract_dates_de():
+    from extractor import extract
+    result = extract("Briefe vor 1920", "de")
+    assert result.dateFrom is None
+    assert result.dateTo == "1920-12-31"
+    assert result.rawQuery == "Briefe vor 1920"
+    assert result.personNames == []
+    assert result.personRole == "any"
+
+
+def test_extract_keywords_from_topic_de():
+    from extractor import extract
+    result = extract("Briefe aus dem Krieg", "de")
+    assert "krieg" in result.keywords
+    assert result.dateFrom is None
+    assert result.dateTo is None
+
+
+def test_extract_dates_en():
+    from extractor import extract
+    result = extract("letters before 1920", "en")
+    assert result.dateTo == "1920-12-31"
+    assert result.dateFrom is None
+
+
+def test_extract_dates_es():
+    from extractor import extract
+    result = extract("cartas antes de 1920", "es")
+    assert result.dateTo == "1920-12-31"
+    assert result.dateFrom is None
+
+
+def test_extract_rawquery_echoed():
+    from extractor import extract
+    q = "Texte über Weihnachten"
+    result = extract(q, "de")
+    assert result.rawQuery == q
+
+
+def test_extract_response_fields_are_complete():
+    from extractor import extract
+    result = extract("Briefe 1900", "de")
+    assert isinstance(result.personNames, list)
+    assert result.personRole in ("sender", "receiver", "any")
+    assert isinstance(result.keywords, list)
+    assert result.rawQuery == "Briefe 1900"
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```bash
+pytest test_extractor.py::test_extract_dates_de -v
+```
+
+Expected: `ImportError: cannot import name 'extract' from 'extractor'`
+
+- [ ] **Step 3: Add `extract()` to `extractor.py`**
+
+Add at the bottom of `extractor.py`:
+
+```python
+# ── Step 5: Assembly ─────────────────────────────────────────────────────────
+
+def extract(query: str, lang: str) -> ParseResponse:
+    """Run the full NLP pipeline and return a ParseResponse."""
+    nlp = get_nlp(lang)
+    doc = nlp(query)
+
+    # The de/es news models label persons "PER"; en_core_web_sm uses "PERSON".
+    per_spans = [ent for ent in doc.ents if ent.label_ in ("PER", "PERSON")]
+
+    person_names = extract_person_names(doc)
+    person_role = detect_person_role(doc, per_spans, lang)
+    date_from, date_to = extract_dates(doc, lang)
+    keywords = extract_keywords(doc, list(doc.ents))
+
+    return ParseResponse(
+        personNames=person_names,
+        personRole=person_role,
+        dateFrom=date_from,
+        dateTo=date_to,
+        keywords=keywords,
+        rawQuery=query,
+    )
+```
+
+- [ ] **Step 4: Run tests to confirm they pass**
+
+```bash
+pytest test_extractor.py::test_extract_dates_de \
+       test_extractor.py::test_extract_keywords_from_topic_de \
+       test_extractor.py::test_extract_dates_en \
+       test_extractor.py::test_extract_dates_es \
+       test_extractor.py::test_extract_rawquery_echoed \
+       test_extractor.py::test_extract_response_fields_are_complete -v
+```
+
+Expected: `6 passed`
+
+- [ ] **Step 5: Run the full test suite to confirm no regressions**
+
+```bash
+pytest test_extractor.py -v
+```
+
+Expected: all tests pass
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add nlp-service/extractor.py nlp-service/test_extractor.py
+git commit -m "feat(nlp-service): full extract() pipeline — assembles all steps"
+```
+
+---
+
+## Task 8: FastAPI app
+
+**Files:**
+- Create: `nlp-service/main.py`
+- Create: `nlp-service/test_main.py`
+
+- [ ] **Step 1: Write the failing tests**
+
+Create `nlp-service/test_main.py`:
+
+```python
+import pytest
+from fastapi.testclient import TestClient
+
+
+@pytest.fixture(scope="session")
+def client():
+    from main import app
+    with TestClient(app) as c:
+        yield c
+
+
+def test_health(client):
+    response = client.get("/health")
+    assert response.status_code == 200
+    assert response.json() == {"status": "ok"}
+
+
+def test_parse_returns_200_with_all_fields(client):
+    response = client.post("/parse", json={"query": "Briefe vor 1920", "lang": "de"})
+    assert response.status_code == 200
+    data = response.json()
+    assert "personNames" in data
+    assert "personRole" in data
+    assert data["personRole"] in ("sender", "receiver", "any")
+    assert "dateFrom" in data
+    assert "dateTo" in data
+    assert "keywords" in data
+    assert "rawQuery" in data
+    assert data["rawQuery"] == "Briefe vor 1920"
+    assert data["dateTo"] == "1920-12-31"
+
+
+def test_parse_unknown_lang_returns_422(client):
+    response = client.post("/parse", json={"query": "test", "lang": "fr"})
+    assert response.status_code == 422
+
+
+def test_parse_missing_query_returns_422(client):
+    response = client.post("/parse", json={"lang": "de"})
+    assert response.status_code == 422
+
+
+def test_parse_all_languages(client):
+    cases = [
+        ("de", "Briefe vor 1920"),
+        ("en", "letters before 1920"),
+        ("es", "cartas antes de 1920"),
+    ]
+    for lang, query in cases:
+        response = client.post("/parse", json={"query": query, "lang": lang})
+        assert response.status_code == 200, f"Failed for lang={lang}"
+        assert response.json()["dateTo"] == "1920-12-31", f"Wrong dateTo for lang={lang}"
+```
+
+- [ ] **Step 2: Run to confirm failure**
+
+```bash
+pytest test_main.py::test_health -v
+```
+
+Expected: `ModuleNotFoundError: No module named 'main'`
+
+- [ ] **Step 3: Create `nlp-service/main.py`**
+
+```python
+import logging
+from contextlib import asynccontextmanager
+
+from fastapi import FastAPI, HTTPException
+
+from extractor import extract, load_all_models
+from models import ParseRequest, ParseResponse
+
+logger = logging.getLogger(__name__)
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    logger.info("Loading spaCy models...")
+    load_all_models()
+    logger.info("All models ready.")
+    yield
+
+
+app = FastAPI(lifespan=lifespan)
+
+
+@app.get("/health")
+def health() -> dict:
+    return {"status": "ok"}
+
+
+@app.post("/parse", response_model=ParseResponse)
+def parse(request: ParseRequest) -> ParseResponse:
+    try:
+        return extract(request.query, request.lang)
+    except Exception as exc:
+        logger.exception("Extraction pipeline failed")
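+        # The full traceback is logged above; keep internal details out of the client response.
+        raise HTTPException(status_code=500, detail="Extraction pipeline failed") from exc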
+```
+
+- [ ] **Step 4: Run tests to confirm they pass**
+
+```bash
+pytest test_main.py -v
+```
+
+Expected: `5 passed`
+
+- [ ] **Step 5: Run the full test suite**
+
+```bash
+pytest -v
+```
+
+Expected: all tests pass
+
+- [ ] **Step 6: Smoke-test the running service**
+
+```bash
+uvicorn main:app --reload --port 8001 &
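+# Loading three spaCy models can take longer than 2 s; poll /health for up to 30 s.
+for i in $(seq 1 30); do curl -sf http://localhost:8001/health > /dev/null && break; sleep 1; done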
+
+curl -s http://localhost:8001/health
+# Expected: {"status":"ok"}
+
+curl -s -X POST http://localhost:8001/parse \
+  -H "Content-Type: application/json" \
+  -d '{"query": "Briefe von Opa Hermann an Marie vor 1920", "lang": "de"}' | python3 -m json.tool
+
+# Expected (spaCy may or may not catch "Opa Hermann"/"Marie" as PER):
+# {
+#   "personNames": [...],
+#   "personRole": "any",
+#   "dateFrom": null,
+#   "dateTo": "1920-12-31",
+#   "keywords": ["brief"],
+#   "rawQuery": "Briefe von Opa Hermann an Marie vor 1920"
+# }
+
+kill %1
+```
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add nlp-service/main.py nlp-service/test_main.py
+git commit -m "feat(nlp-service): FastAPI app with /parse and /health endpoints"
+```
+
+---
+
+## Task 9: Dockerfile
+
+**Files:**
+- Create: `nlp-service/Dockerfile`
+
+- [ ] **Step 1: Create `nlp-service/Dockerfile`**
+
+```dockerfile
+FROM python:3.11-slim
+
+WORKDIR /app
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Bake models into the image — no volume needed, ~350 MB total
+RUN python -m spacy download de_core_news_sm \
+ && python -m spacy download en_core_web_sm \
+ && python -m spacy download es_core_news_sm
+
+COPY . .
+
+RUN useradd --no-create-home --shell /usr/sbin/nologin --uid 1001 nlp \
+    && chown -R nlp:nlp /app
+
+USER nlp
+
+EXPOSE 8001
+
+HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
+    CMD curl -f http://localhost:8001/health || exit 1
+
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]
+```
+
+- [ ] **Step 2: Build the image**
+
+```bash
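+# Run from the repository root. Passing the build context by path avoids a `cd`,
+# so the `git add nlp-service/Dockerfile` in Step 4 still resolves.
+docker build -t nlp-service:prototype nlp-service/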
+```
+
+Expected: build completes, image ~350 MB
+
+- [ ] **Step 3: Run and smoke-test the container**
+
+```bash
+docker run --rm -d -p 8001:8001 --name nlp-test nlp-service:prototype
+sleep 5
+
+curl -s http://localhost:8001/health
+# Expected: {"status":"ok"}
+
+curl -s -X POST http://localhost:8001/parse \
+  -H "Content-Type: application/json" \
+  -d '{"query": "Briefe aus dem Krieg", "lang": "de"}' | python3 -m json.tool
+
+docker stop nlp-test
+```
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add nlp-service/Dockerfile
+git commit -m "feat(nlp-service): Dockerfile — python:3.11-slim, models baked in"
+```