#78: NO_USABLE_TEXT (photo PDF) is finalised immediately to FAILED_FINAL

Previously, NO_USABLE_TEXT (no OCR text in the PDF) was handled like every other deterministic content error under the 1-retry rule, so its first occurrence ended in FAILED_RETRYABLE. An image scan without OCR text does not change between runs, so a retry is pointless; the status must be FAILED_FINAL immediately.

Changed: ProcessingOutcomeTransition recognises NO_USABLE_TEXT as a special case and returns FAILED_FINAL without checking the retry count. PAGE_LIMIT_EXCEEDED and CONTENT_NOT_EXTRACTABLE keep the 1-retry rule.

Tests: existing tests that expected FAILED_RETRYABLE for NO_USABLE_TEXT were either switched to the correct behaviour or rewritten to use PAGE_LIMIT_EXCEEDED. New lifecycle tests for NO_USABLE_TEXT (immediately FAILED_FINAL → SKIPPED_FINAL_FAILURE) were added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
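The transition rule described in the commit message can be illustrated with a minimal, hypothetical Java sketch. It is not the project's ProcessingOutcomeTransition: the class name `ProcessingOutcomeTransitionSketch`, the method `nextStatus`, the `previousFailures` counter and the `MAX_RETRIES` constant are assumptions made for illustration. Only the enum constant names come from this commit.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical sketch of the retry decision; not the project's actual
 * ProcessingOutcomeTransition. Enum constants mirror the commit message.
 */
public class ProcessingOutcomeTransitionSketch {

    enum PreCheckFailureReason { NO_USABLE_TEXT, PAGE_LIMIT_EXCEEDED, CONTENT_NOT_EXTRACTABLE }

    enum ProcessingStatus { FAILED_RETRYABLE, FAILED_FINAL }

    // Reasons that cannot change between runs unless the source file is edited,
    // so they bypass the retry check entirely.
    private static final Set<PreCheckFailureReason> NEVER_RETRY =
            EnumSet.of(PreCheckFailureReason.NO_USABLE_TEXT);

    // Assumed retry budget for the 1-retry rule.
    private static final int MAX_RETRIES = 1;

    /**
     * @param reason           content error detected in the current run
     * @param previousFailures number of earlier runs that already failed for this
     *                         document (hypothetical counter)
     */
    static ProcessingStatus nextStatus(PreCheckFailureReason reason, int previousFailures) {
        if (NEVER_RETRY.contains(reason)) {
            // Image-only scan without OCR text: a retry cannot succeed.
            return ProcessingStatus.FAILED_FINAL;
        }
        // 1-retry rule: the first failure is retryable, a failed retry is final.
        return previousFailures < MAX_RETRIES
                ? ProcessingStatus.FAILED_RETRYABLE
                : ProcessingStatus.FAILED_FINAL;
    }

    public static void main(String[] args) {
        for (PreCheckFailureReason r : PreCheckFailureReason.values()) {
            System.out.println(r + ": first=" + nextStatus(r, 0) + ", retry=" + nextStatus(r, 1));
        }
    }
}
```

Keeping the special case in a set rather than a single `if` on one constant makes it easy to move another reason to the no-retry group later without touching the 1-retry branch.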
2026-05-04 15:08:01 +02:00
parent 349ee69a7f
commit 18f9c33bbb
4 changed files with 100 additions and 18 deletions
@@ -21,9 +21,10 @@ public enum PreCheckFailureReason {
     * The extracted PDF text, after normalization, contains no letters or digits.
     * <p>
     * This is a deterministic content error: reprocessing the same file in a later run
-     * will have the same outcome unless the source file is changed.
+     * will have the same outcome unless the source file is changed (e.g. by adding OCR).
     * <p>
-     * Retry logic: exactly 1 retry in a later batch run.
+     * Retry logic: no retry — the document is immediately finalised to
+     * {@link ProcessingStatus#FAILED_FINAL}.
     */
    NO_USABLE_TEXT("No usable text in extracted PDF content"),