security(import): harden DocumentBuilderFactory against XXE in MassImportService #528

Open
opened 2026-05-11 20:13:22 +02:00 by marcel · 0 comments
Owner

Type: Security hardening
Priority: P0-critical — prod-reachable XML parsing of attacker-influenceable input
Source: review of #526 by Nora "NullX" Steiner (security_expert) — comment #8648, finding 4
Parent PR: #526 (mass-import bind mount → first prod-reachable use)

Summary

Explicitly disable DOCTYPE declarations, external entity resolution, XInclude, and entity-reference expansion on every DocumentBuilderFactory in the backend, starting with MassImportService. Cover the regression with a test and a Semgrep rule.

Context

MassImportService parses the ODS spreadsheet's inner XML via DocumentBuilderFactory (MassImportService.java:29). With #526 merged, the mass-import endpoint becomes reachable on production for the first time — the risk profile of any XXE bug in this parser goes from "dev-only" to "prod-reachable". An attacker who controls or substitutes the ODS source (e.g. supply-chain compromise of the zzfamilienarchiv*.ods file before it lands on the host) could exfiltrate any file the backend container can read.

POI 5.5.0 is recent enough that defaults are probably safe — but "probably" is not enough for prod-reachable XML parsing of attacker-influenceable input.

Required

  1. Locate every DocumentBuilderFactory.newInstance() call in the codebase (currently only MassImportService).
  2. Explicitly disable doctype declarations on the factory:
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
    dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    dbf.setXIncludeAware(false);
    dbf.setExpandEntityReferences(false);
    
  3. Extract the hardened factory into a small helper so future XML parsers in this codebase inherit the same defaults.
  4. Permanent regression test: feed a tiny ODS-shaped XML containing a <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]> block and an &xxe; reference; assert the parse rejects or ignores it.

Detection

Add the Semgrep rule from Nora's review comment to catch future regressions across all stacks:

rules:
  - id: dbf-xxe-default
    pattern: $X = DocumentBuilderFactory.newInstance();
    pattern-not-inside: |
      ...
      $X.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      ...
    message: "DocumentBuilderFactory without XXE protection"
    severity: WARNING

Acceptance criteria

  • All DocumentBuilderFactory call sites harden their factory with the features above
  • Failing-then-passing regression test demonstrating XXE is blocked
  • Semgrep rule committed and running in CI
  • PR notes whether POI 5.5.0 default already covered this (in which case the explicit hardening is defense-in-depth)

Linked NFRs

  • Security: XML parsers in the backend MUST disallow DOCTYPE declarations and external entity resolution.
  • Maintainability: XML-parser construction MUST go through a shared hardened helper so safe defaults are inherited.
  • Observability: Semgrep rule MUST run in CI and fail the build on regressions.

Definition of Ready

  • Problem statement clear
  • Required steps enumerated
  • Detection mechanism specified
  • Acceptance criteria testable

🤖 Generated with Claude Code during /implement on #526

**Type:** Security hardening **Priority:** P0-critical — prod-reachable XML parsing of attacker-influenceable input **Source:** review of #526 by Nora "NullX" Steiner (security_expert) — comment [#8648](https://git.raddatz.cloud/marcel/familienarchiv/pulls/526#issuecomment-8648), finding 4 **Parent PR:** #526 (mass-import bind mount → first prod-reachable use) ## Summary Explicitly disable DOCTYPE declarations, external entity resolution, XInclude, and entity-reference expansion on every `DocumentBuilderFactory` in the backend, starting with `MassImportService`. Cover the regression with a test and a Semgrep rule. ## Context `MassImportService` parses the ODS spreadsheet's inner XML via `DocumentBuilderFactory` (`MassImportService.java:29`). With #526 merged, the mass-import endpoint becomes reachable on production for the first time — the risk profile of any XXE bug in this parser goes from "dev-only" to "prod-reachable". An attacker who controls or substitutes the ODS source (e.g. supply-chain compromise of the `zzfamilienarchiv*.ods` file before it lands on the host) could exfiltrate any file the backend container can read. POI 5.5.0 is recent enough that defaults are *probably* safe — but "probably" is not enough for prod-reachable XML parsing of attacker-influenceable input. ## Required 1. Locate every `DocumentBuilderFactory.newInstance()` call in the codebase (currently only `MassImportService`). 2. Explicitly disable doctype declarations on the factory: ```java DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); dbf.setFeature("http://xml.org/sax/features/external-general-entities", false); dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false); dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false); dbf.setXIncludeAware(false); dbf.setExpandEntityReferences(false); ``` 3. Extract the hardened factory into a small helper so future XML parsers in this codebase inherit the same defaults. 4. Permanent regression test: feed a tiny ODS-shaped XML containing a `<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>` block and an `&xxe;` reference; assert the parse rejects or ignores it. ## Detection Add the Semgrep rule from Nora's review comment to catch future regressions across all stacks: ```yaml rules: - id: dbf-xxe-default pattern: $X = DocumentBuilderFactory.newInstance(); pattern-not-inside: | ... $X.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); ... message: "DocumentBuilderFactory without XXE protection" severity: WARNING ``` ## Acceptance criteria - [ ] All `DocumentBuilderFactory` call sites harden their factory with the features above - [ ] Failing-then-passing regression test demonstrating XXE is blocked - [ ] Semgrep rule committed and running in CI - [ ] PR notes whether POI 5.5.0 default already covered this (in which case the explicit hardening is defense-in-depth) ## Linked NFRs - **Security:** XML parsers in the backend MUST disallow DOCTYPE declarations and external entity resolution. - **Maintainability:** XML-parser construction MUST go through a shared hardened helper so safe defaults are inherited. - **Observability:** Semgrep rule MUST run in CI and fail the build on regressions. ## Definition of Ready - [x] Problem statement clear - [x] Required steps enumerated - [x] Detection mechanism specified - [x] Acceptance criteria testable 🤖 Generated with [Claude Code](https://claude.com/claude-code) during /implement on #526
marcel added the P0-criticalsecurity labels 2026-05-11 20:36:38 +02:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#528