refactor(ocr): document > 50 frequency threshold rationale
Strict greater-than avoids non-determinism: if multiple candidates share the minimum frequency value, pyspellchecker's ranking is undefined.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -107,7 +107,7 @@ def correct_text(text: str) -> str:
             continue
 
         correction = _spell.correction(word)
-        if correction and _spell.word_frequency[correction] > 50:
+        if correction and _spell.word_frequency[correction] > 50:  # strict > avoids non-determinism when candidates tie at the frequency floor
             if word[0].isupper() and not correction[0].isupper():
                 correction = correction.capitalize()
             checked.append(leading + correction + CORRECTION_MARKER + trailing)
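A minimal sketch of the gate this commit documents, for reviewers who want to see the boundary case in isolation. It does not call pyspellchecker: `WORD_FREQUENCY` is a hypothetical stand-in for `_spell.word_frequency` with invented counts, and `accept_correction` and `FREQUENCY_THRESHOLD` are illustrative names that do not appear in the source.

```python
# Hypothetical stand-in for _spell.word_frequency; the counts are invented.
WORD_FREQUENCY = {"the": 5000, "form": 120, "farm": 50, "fjord": 3}

# Same cutoff as the literal 50 in the diff; the constant name is an assumption.
FREQUENCY_THRESHOLD = 50


def accept_correction(correction, frequencies=WORD_FREQUENCY):
    """Return True only when the candidate's count is strictly above the threshold."""
    if not correction:
        # Mirrors the `if correction and ...` guard: None or "" is never accepted.
        return False
    # Strict > means a candidate sitting exactly at the threshold is rejected.
    return frequencies.get(correction, 0) > FREQUENCY_THRESHOLD


if __name__ == "__main__":
    for candidate in WORD_FREQUENCY:
        print(candidate, accept_correction(candidate))
```

With these invented counts, "farm" sits exactly at the threshold and is rejected, whereas a `>=` comparison would accept it. That boundary is the only place the two operators differ.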