refactor(ocr): document > 50 frequency threshold rationale

Strict greater-than avoids non-determinism: if multiple candidates share
the minimum frequency value, pyspellchecker's ranking is undefined.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Marcel
2026-04-17 17:21:37 +02:00
parent fea24aee25
commit ec85f228c1

@@ -107,7 +107,7 @@ def correct_text(text: str) -> str:
             continue
         correction = _spell.correction(word)
-        if correction and _spell.word_frequency[correction] > 50:
+        if correction and _spell.word_frequency[correction] > 50:  # strict > avoids non-determinism when candidates tie at the frequency floor
             if word[0].isupper() and not correction[0].isupper():
                 correction = correction.capitalize()
             checked.append(leading + correction + CORRECTION_MARKER + trailing)
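
The gate this commit documents can be sketched in isolation. The snippet below is a minimal illustration, not the project's code: `WORD_FREQUENCY` is a hypothetical plain-dict stand-in for `_spell.word_frequency`, the words and counts are invented, and `accept_correction` and `FREQUENCY_FLOOR` are names introduced only for this example. It shows that a candidate sitting exactly at 50 is rejected, so candidates tied at that value never reach the output.

```python
from typing import Optional

# Hypothetical stand-in for _spell.word_frequency (word -> count).
# The entries are invented purely for illustration.
WORD_FREQUENCY = {"letter": 1200, "latter": 50, "lettre": 50}

# Same numeric threshold as the `> 50` check in correct_text.
FREQUENCY_FLOOR = 50


def accept_correction(correction: Optional[str]) -> bool:
    """Return True only if the correction's count is strictly above the floor."""
    if not correction:
        return False
    # Strict comparison: a count equal to the floor is rejected, so it does not
    # matter which of several candidates tied at that value was proposed.
    return WORD_FREQUENCY.get(correction, 0) > FREQUENCY_FLOOR


print(accept_correction("letter"))  # well above the floor
print(accept_correction("latter"))  # exactly at the floor
```

With `>=` instead of `>`, both "latter" and "lettre" would pass the gate, and the text produced would depend on which of the two the spell checker happened to return.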