M4 AP-001: Refine core objects, status model, and port contracts

Work packages for M4
2026-04-02 19:24:00 +02:00 · 2026-04-02 19:04:55 +02:00
23 changed files with 1409 additions and 17 deletions
@@ -17,7 +17,8 @@
      "Bash(grep \"\\\\.java$\")",
      "Bash(mvn -q clean compile -DskipTests)",
      "Bash(mvn -q test)",
-      "Bash(mvn -q clean test)"
+      "Bash(mvn -q clean test)",
+      "Bash(./mvnw.cmd:*)"
    ]
  }
 }
@@ -0,0 +1,516 @@
 # M4 - Work Packages
 ## Scope
 This document describes only the work packages for the defined milestone **M4 – Fingerprint, SQLite persistence, and idempotency**.
 Milestones **M1**, **M2**, and **M3** are assumed to be fully implemented.
 The work packages are deliberately cut so that:
 - **AI 1** can derive a clear single prompt from each work package,
 - **AI 2** can fully implement exactly that one work package in **a single pass**,
 - after **every** work package the code base is again in an **error-free, buildable state**.
 The order of the work packages is binding.
 ## Additional cutting rules for AI processing
 - Per work package, change only the **minimally necessary cross-sections** through domain, application, adapter, and bootstrap.
 - Make no assumptions that are not covered by this document or the binding specifications.
 - No anticipation of **M5+**.
 - No restructuring of existing M1–M3 structures without a direct M4 relation.
 - Cut new types, ports, and adapters so that they are **clearly nameable, testable, and reviewable** from within a single work package.
 ## Explicitly not part of M4
 - AI integration
 - Prompt loading or prompt processing
 - Validation of AI responses
 - File name generation
 - Target copy into the target folder
 - Windows character sanitization for target file names
 - Physical duplicate handling in the target folder
 - M5+ persistence fields such as model name, prompt identifier, raw AI response, AI reasoning, date source, final title, or final target file name
 - Complete cross-run retry logic of later milestones for AI and target copy errors
 - Final-state logging polish
 ## Binding M4 rules for **all** work packages
 ### 1. Identification
 - In M4, a document is identified **exclusively by the SHA-256 fingerprint of the file content**.
 - File name and path do **not** serve as identifiers.
 - The same content under a different file name or a different path is **the same document**.
 - Changed content is **a new business case**.
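The identification rule above can be illustrated with a minimal sketch. It assumes the file content has already been read into a byte array; the class name `FingerprintSketch` and the method `sha256Hex` are hypothetical and not taken from the project. A real adapter would more likely stream the file and report IO errors through the port contract.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical illustration only; not the project's fingerprint adapter.
final class FingerprintSketch {

    /** Returns the lowercase hex SHA-256 digest of the given content bytes. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Only the content bytes enter the hash; file name and path are never used.
            return HexFormat.of().formatHex(digest.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Because only the bytes are hashed, two files with identical content but different names or paths yield the same fingerprint, while any change to the content yields a different one.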
 ### 2. Persistence model
 M4 mandates persistence in **two levels**:
 1. **Document master record** per fingerprint
 2. **Attempt history** with one record per historized document-related processing attempt
 ### 3. Minimum mandatory data in the document master record for M4
 In M4, the document master record must be able to store at least:
 - internal ID
 - fingerprint
 - last known source path
 - last known source file name
 - current overall status
 - number of content errors so far
 - number of transient errors so far
 - last failure timestamp
 - last success timestamp
 - creation timestamp
 - modification timestamp
 Target path, target file name, and AI-related fields are **not** part of the M4 master record fields.
 ### 4. Minimum mandatory data of the attempt history for M4
 For every attempt to be historized in M4, at least the following must be storable:
 - attempt ID
 - fingerprint reference
 - run ID
 - attempt number
 - start timestamp
 - end timestamp
 - result status
 - error class
 - error message or reason
 - retryable flag
 ### 5. Status model for M4
 For M4, the following status values must be usable with a clear business meaning:
 - `SUCCESS`
 - `FAILED_RETRYABLE`
 - `FAILED_FINAL`
 - `SKIPPED_ALREADY_PROCESSED`
 - `SKIPPED_FINAL_FAILURE`
 A technical intermediate status `PROCESSING` is additionally permitted but not mandatory for M4.
 ### 6. Binding M4 minimum rules for status and counters
 For M4, **exactly** these minimum rules apply:
 - Documents that were already processed successfully are skipped in later runs.
 - Documents that have already failed finally are skipped in later runs.
 - A **deterministic content error from M3**
  - on its **first** historized occurrence leads to `FAILED_RETRYABLE`, raises the **content error counter** to 1, and sets `retryable = true`,
  - on its **second** historized occurrence in a later run leads to `FAILED_FINAL`, raises the **content error counter** to 2, and sets `retryable = false`.
 - In M4, the deterministic content errors are exclusively the cases already known from M3:
  - no usable text
  - page limit exceeded
 - Document-related **technical** errors after successful fingerprint determination remain `FAILED_RETRYABLE` in M4, increase the **transient error counter**, and set `retryable = true`.
 - Skip events change **no** error counter.
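The minimum rules above can be read as a small pure function. The following is a sketch under stated assumptions, not the project's implementation: the types `Outcome`, `Status`, `Counters`, `Decision`, and `M4Rules` are hypothetical, and the "first vs. second occurrence" distinction is derived here from the content error counter being 0 beforehand, which is one possible reading of the rule.

```java
// Hypothetical stand-ins for illustration; the real project types may differ.
enum Status { SUCCESS, FAILED_RETRYABLE, FAILED_FINAL }

enum Outcome { SUCCESS, CONTENT_ERROR, TECHNICAL_ERROR }

record Counters(int contentErrors, int transientErrors) {}

record Decision(Status status, Counters counters, boolean retryable) {}

final class M4Rules {

    /** Maps one M3 outcome plus the counters before the attempt to the M4 result. */
    static Decision apply(Outcome outcome, Counters before) {
        return switch (outcome) {
            // Success never touches the error counters.
            case SUCCESS -> new Decision(Status.SUCCESS, before, false);
            case CONTENT_ERROR -> {
                // Assumption: "first occurrence" means no content error was counted yet.
                boolean first = before.contentErrors() == 0;
                Counters after = new Counters(before.contentErrors() + 1, before.transientErrors());
                yield new Decision(first ? Status.FAILED_RETRYABLE : Status.FAILED_FINAL, after, first);
            }
            // Technical errors after fingerprinting stay retryable in M4.
            case TECHNICAL_ERROR -> new Decision(
                    Status.FAILED_RETRYABLE,
                    new Counters(before.contentErrors(), before.transientErrors() + 1),
                    true);
        };
    }
}
```

Skip events are deliberately absent from this function: they are decided before the M3 flow runs and leave both counters unchanged.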
 ### 7. Historization in M4
 - Every **identified** document-related processing attempt is historized separately.
 - The attempt number starts at **1** per fingerprint and increases monotonically by **1** per historized attempt.
 - Skip cases are historized as well:
  - `SKIPPED_ALREADY_PROCESSED`
  - `SKIPPED_FINAL_FAILURE`
 - An attempt historized in M4 requires a **successfully determined fingerprint**.
 - Technical errors **before** successful fingerprint determination are **not** SQLite-historized attempts in M4; they are only handled in a controlled manner as document-related run events.
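The numbering rule above can be sketched with an in-memory stand-in for the attempt history. This is purely illustrative: `AttemptHistorySketch` and its nested `Attempt` record are hypothetical, statuses are plain strings for brevity, and the real history lives in SQLite behind a port.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of per-fingerprint attempt numbering.
final class AttemptHistorySketch {

    record Attempt(String fingerprint, int attemptNumber, String status) {}

    private final Map<String, List<Attempt>> byFingerprint = new HashMap<>();

    /** Appends one attempt; the number is 1 for the first entry and grows by 1 afterwards. */
    Attempt record(String fingerprint, String status) {
        List<Attempt> attempts = byFingerprint.computeIfAbsent(fingerprint, k -> new ArrayList<>());
        Attempt attempt = new Attempt(fingerprint, attempts.size() + 1, status);
        attempts.add(attempt);
        return attempt;
    }
}
```

Skip attempts go through the same method, so they consume an attempt number like any other historized attempt. A failure before the fingerprint is known has no key to record under, which mirrors the rule that such failures are not historized.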
 ### 8. Order per document in M4
 In M4, a single candidate is processed in this binding order:
 1. Compute the fingerprint
 2. Load the document master record
 3. On `SUCCESS`, make the skip decision and historize the skip attempt
 4. On `FAILED_FINAL`, make the skip decision and historize the skip attempt
 5. Otherwise, run the existing M3 flow
 6. Map the M3 result to M4 status, counters, and retryable flag
 7. Historize the attempt
 8. Update the document master record
 ### 9. Consistency per identified document
 - For every identified document-related attempt, **attempt history and master record** must be updated **consistently**.
 - Partial updates between history and master record must be avoided.
 - If persisting a document-related attempt fails technically, **no inconsistent partial state** may remain.
 ### 10. Schema initialization
 - In M4, the SQLite schema is initialized **at program start**, before the batch run begins processing documents.
 - A merely implicit or exclusively lazy initialization during ongoing document throughput is **not** a goal of M4.
 ---
 ## AP-001 Refine M4 core objects, status semantics, and port contracts
 ### Prerequisite
 None. This work package is the M4 starting point.
 ### Goal
 The M4-relevant types, status meanings, and port contracts are introduced unambiguously so that later work packages can be implemented without room for interpretation.
 ### Must be implemented
 - Create new M4-relevant core objects or application-level types, in particular for:
  - document fingerprint
  - document master record
  - processing attempt
  - failure counter values
  - document-related persistence decision or lookup result
  - technical error classification for document-related M4 processing
 - Complete or sharpen the status model so that the binding M4 status values can be represented unambiguously.
 - Define unambiguous semantics for the following cases in the type model or in JavaDoc:
  - unknown document
  - known document that is not yet terminal
  - already successful document
  - already finally failed document
  - historizable document-related attempt
  - non-historizable pre-fingerprint error
 - Define outbound ports for:
  - creating a fingerprint for exactly one processing candidate
  - reading and writing the document master record
  - writing and reading the attempt history
  - technical initialization of the SQLite schema
 - Cut the port contracts so that **neither `Path`/`File` nor JDBC/SQLite types** leak into domain or application.
 - Model port return values so that later work packages can distinguish the following without additional assumptions:
  - document unknown
  - document known and still actively processable
  - document terminally successful
  - document terminally finally failed
  - technical persistence failure
 - Add JavaDoc and `package-info` for:
  - status meanings
  - counter semantics
  - historization boundaries
  - architecture boundaries
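The five distinguishable lookup outcomes listed above lend themselves to a sealed hierarchy. The sketch below uses simplified, hypothetical stand-in types (`LookupResult`, `Unknown`, `KnownProcessable`, `TerminalSuccess`, `TerminalFinalFailure`, `LookupFailure`) with string payloads; they are not the project's actual port types. It uses `instanceof` patterns so that it compiles on Java 17; on Java 21 or later, a pattern-matching `switch` would additionally let the compiler check exhaustiveness.

```java
// Hypothetical, simplified stand-ins for a sealed lookup result hierarchy.
sealed interface LookupResult
        permits Unknown, KnownProcessable, TerminalSuccess, TerminalFinalFailure, LookupFailure {}

record Unknown() implements LookupResult {}

record KnownProcessable(String overallStatus) implements LookupResult {}

record TerminalSuccess() implements LookupResult {}

record TerminalFinalFailure() implements LookupResult {}

record LookupFailure(String message) implements LookupResult {}

final class LookupDecision {

    /** Translates a lookup result into the next step for the current candidate. */
    static String nextStep(LookupResult result) {
        if (result instanceof Unknown) {
            return "PROCESS_NEW";
        }
        if (result instanceof KnownProcessable known) {
            return "PROCESS_AGAIN(" + known.overallStatus() + ")";
        }
        if (result instanceof TerminalSuccess) {
            return "SKIPPED_ALREADY_PROCESSED";
        }
        if (result instanceof TerminalFinalFailure) {
            return "SKIPPED_FINAL_FAILURE";
        }
        if (result instanceof LookupFailure failure) {
            return "LOOKUP_FAILED: " + failure.message();
        }
        // Cannot happen while the permits list above stays in sync with these checks.
        throw new IllegalStateException("unexpected lookup result: " + result);
    }
}
```

Modelling the technical lookup failure as a result variant rather than an exception keeps all five cases visible in one place at the call site.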
 ### Explicitly out of scope
 - SHA-256 implementation
 - SQLite implementation
 - concrete SQL tables
 - batch integration
 - repository code
 ### Done when
 - the M4-relevant types and port contracts exist,
 - the M4 status semantics are documented unambiguously,
 - historization and pre-fingerprint errors are clearly delimited from each other,
 - domain and application remain free of infrastructure types,
 - the build remains error-free.
 ---
 ## AP-002 Implement SHA-256 fingerprint adapter for processing candidates
 ### Prerequisite
 AP-001 is complete.
 ### Goal
 A stable, deterministic SHA-256 fingerprint can be generated for every processing candidate; technical problems are mapped into the port contract in a controlled manner.
 ### Must be implemented
 - Implement the fingerprint port technically in adapter-out.
 - Implement SHA-256-based fingerprint generation for exactly one processing candidate.
 - Ensure that the fingerprint is derived exclusively from the **file content**.
 - Model controlled technical error behavior for at least the following cases:
  - file not readable
  - file no longer present between candidate discovery and fingerprint generation
  - other technical IO problems
 - Ensure that file system and hashing details remain exclusively in adapter-out.
 - Add JavaDoc for determinism, error behavior, and the M4 boundary that pre-fingerprint errors do **not** count as SQLite-historized attempts.
 ### Explicitly out of scope
 - SQLite persistence
 - batch orchestration
 - attempt history
 - skip logic
 - counter updates
 ### Done when
 - the same file content reliably yields the same SHA-256 fingerprint,
 - fingerprint and errors are delivered through the port in a controlled manner,
 - no hashing or file system details leak into domain or application,
 - the build remains error-free.
 ---
 ## AP-003 Introduce SQLite schema, startup initialization, and persistence foundation in adapter-out
 ### Prerequisite
 AP-001 and AP-002 are complete.
 ### Goal
 The SQLite-based persistence foundation for M4 is introduced in a technically clean way and initialized in a controlled manner at program start.
 ### Must be implemented
 - Introduce SQLite file access technically in adapter-out.
 - Create a technical initialization component for the SQLite schema and connect it via the port provided for it.
 - Create the M4 schema explicitly in **two levels**:
  - document master record
  - attempt history
 - Define tables, primary keys, foreign keys, unique rules, and sensible indexes for the M4 state.
 - Create the document master record so that the M4 mandatory fields defined in this document can be stored.
 - Create the attempt history so that the M4 mandatory fields defined in this document can be stored.
 - Ensure that:
  - the attempt number is unique per fingerprint,
  - skip attempts can be stored,
  - no M5+ columns are created.
 - Prepare the schema initialization so that it can be called explicitly **at program start**.
 - Add JavaDoc for schema purpose, two-level model, and initialization time.
 ### Explicitly out of scope
 - repository business logic
 - use-case integration
 - status transitions in the batch run
 - AI-related persistence fields
 - target path or file name persistence
 ### Done when
 - the SQLite file and the M4 schema can technically be created,
 - both persistence levels cover the M4 mandatory scope,
 - the startup initialization is technically prepared,
 - the schema contains no M5+ fields,
 - the state remains buildable without errors.
 ---
 ## AP-004 Implement repository for the document master record with the complete M4 minimum scope
 ### Prerequisite
 AP-003 is complete.
 ### Goal
 The document master record can be reliably read, created, and updated per fingerprint without moving business decision logic into adapter-out.
 ### Must be implemented
 - Implement the repository adapter for the document master record.
 - Provide the following technical capabilities:
  - lookup of a master record by fingerprint
  - creation of a master record for previously unknown documents
  - update of:
    - last known source path
    - last known source file name
    - overall status
    - content error counter
    - transient error counter
    - last failure timestamp
    - last success timestamp
    - modification timestamp
 - Ensure that the repository operations make **no** business decisions about retry rules or skip logic.
 - Keep the mapping between application types and the SQLite structure explicit and traceable.
 - Model upsert/creation behavior reproducibly for the M4 single process.
 - Add JavaDoc for responsibility and mapping.
 ### Explicitly out of scope
 - attempt history
 - batch skip logic
 - attempt number assignment
 - concrete status decisions in the use case
 - AI- or target-copy-related persistence
 ### Done when
 - the document master record can be reliably read and written per fingerprint,
 - all M4 mandatory fields of the master record can technically be updated,
 - business decisions have not slipped into the repository,
 - the build remains error-free.
 ---
 ## AP-005 Implement repository for the attempt history with monotonic attempt number
 ### Prerequisite
 AP-003 is complete.
 ### Goal
 Every historizable document-related M4 attempt can be persisted separately and traceably.
 ### Must be implemented
 - Implement the repository adapter for the attempt history.
 - Implement writing exactly one attempt entry per historized document-related M4 attempt.
 - Provide read capabilities as far as they are needed for the M4 use case and tests.
 - Derive or advance attempt numbers per fingerprint reproducibly.
 - Ensure that the attempt number:
  - starts at **1**,
  - increases monotonically per fingerprint,
  - is also incremented for skip attempts.
 - Persist M4-relevant historization data:
  - fingerprint reference
  - run ID
  - attempt number
  - start timestamp
  - end timestamp
  - result status
  - error class
  - error message or reason
  - retryable flag
 - Ensure that only **identified** documents are historized.
 - Add JavaDoc for historization purpose, attempt number logic, and M4 boundaries.
 ### Explicitly out of scope
 - document master record
 - business counter logic
 - batch orchestration
 - raw AI response, model name, or prompt identifier
 - target name, target path, or target copy
 ### Done when
 - a separate attempt entry can be stored per historized document-related processing operation,
 - the attempt numbers per fingerprint are reproducible and monotonic,
 - skip attempts can be historized,
 - pre-fingerprint errors are not mistakenly historized,
 - the state remains buildable without errors.
 ---
 ## AP-006 Implement M4 decision logic and batch integration for idempotency, counters, and consistent persistence
 ### Prerequisite
 AP-001 through AP-005 are complete.
 ### Goal
 The existing M3 processing run is extended into a real M4 run that recognizes documents by fingerprint, updates status and counters correctly, historizes skip cases, and leaves no inconsistent persistence state behind.
 ### Must be implemented
 - Extend the existing batch use case so that this binding order applies per processing candidate:
  1. Generate the fingerprint
  2. Load the document master record
  3. Decide terminal cases
  4. Run the existing M3 flow where applicable
  5. Map the result to M4 status, counters, and retryable flag
  6. Historize the attempt
  7. Update the document master record
 - Implement the following M4 rules explicitly:
  - existing overall status `SUCCESS` → the document is not processed again but historized with `SKIPPED_ALREADY_PROCESSED`
  - existing overall status `FAILED_FINAL` → the document is not processed again but historized with `SKIPPED_FINAL_FAILURE`
  - an unknown or not yet terminal document is processed normally
 - Map M3 results to M4 exactly as follows:
  - M3 completed successfully → `SUCCESS`, no error counter increased, `retryable = false`
  - M3 content error "no usable text" or "page limit exceeded" on its first historized occurrence → `FAILED_RETRYABLE`, content error counter +1, `retryable = true`
  - the same, already identified document with a repeated deterministic content error in a later run → `FAILED_FINAL`, content error counter +1, `retryable = false`
  - document-related technical error after successful fingerprint determination → `FAILED_RETRYABLE`, transient error counter +1, `retryable = true`
 - Handle skip cases so that:
  - a separate attempt entry is written,
  - no error counter is changed,
  - the overall status of the master record remains terminal.
 - Explicitly do **not** historize pre-fingerprint errors as SQLite attempts.
 - For identified documents, ensure that **history and master record** are updated **consistently** and no inconsistent partial states arise.
 - If a document-related persistence operation fails technically:
  - no partially updated state may remain,
  - the batch run remains able to continue with other documents in a controlled manner,
  - no M5+ behavior is anticipated.
 - Add JavaDoc for idempotency, counter updates, skip semantics, and persistence consistency.
 ### Explicitly out of scope
 - AI call
 - file name generation
 - target copy
 - M5+ retry rules for AI or target copy errors
 - M5+ persistence fields
 - later reporting or evaluation logic
 ### Done when
 - the batch run recognizes identical content by fingerprint,
 - `SUCCESS` and `FAILED_FINAL` documents are skipped in later runs and the skip is historized,
 - the minimum rule "first deterministic content error retryable, second final" is implemented explicitly,
 - technical document-related errors after fingerprinting are treated as retryable,
 - history and master record are updated consistently per identified document,
 - still no M5+ functionality is included.
 ---
 ## AP-007 Adapt bootstrap and CLI for SQLite configuration, startup initialization, and M4 wiring
 ### Prerequisite
 AP-001 through AP-006 are complete.
 ### Goal
 The program entry point is cleanly adapted to the M4 run; persistence is initialized at startup and the new M4 components are fully wired.
 ### Must be implemented
 - Extend the bootstrap wiring to the new M4 ports, adapters, and persistence components.
 - Add or wire M4-relevant configuration, in particular for:
  - `sqlite.file`
 - Extend startup validation so that at least the following is checked:
  - the SQLite file path exists or can technically be created
  - the persistence configuration is usable
 - Run the technical schema initialization **at program start**, before the actual document run begins.
 - Align the CLI/batch start path with the real M4 flow.
 - Ensure that hard startup, wiring, or initialization errors still lead to **exit code 1**.
 - Ensure that document-related errors in the later run are **not** mis-modelled as startup errors.
 - Preserve the M1–M3 base behavior and combine it cleanly with the M4 components.
 - Add JavaDoc and `package-info` for the updated wiring, configuration, and module boundaries.
 ### Explicitly out of scope
 - new exit code semantics of later milestones
 - AI wiring
 - target folder or file name wiring
 - logging polish
 ### Done when
 - the program can be started completely in the M4 state,
 - the SQLite schema is initialized in a controlled manner at startup,
 - the new adapters are wired correctly,
 - hard persistence startup errors lead to exit code 1 in a controlled manner,
 - the build remains error-free.
 ---
 ## AP-008 Complete tests for fingerprint, SQLite repositories, M4 status updates, history, and skip logic
 ### Prerequisite
 AP-001 through AP-007 are complete.
 ### Goal
 The complete M4 target state is secured by automated tests and demonstrated as a consistent handover state.
 ### Must be implemented
 - Implement unit tests for SHA-256 fingerprint generation.
 - Implement repository tests against SQLite, in particular for:
  - schema initialization
  - creating and reading a document master record
  - updating all M4 mandatory fields of the master record
  - creating and reading attempt history
  - stable attempt numbers per fingerprint
 - Add tests for M4 status updates and counters, in particular:
  - an unknown document with a successful M4 outcome is persisted as `SUCCESS`
  - the first deterministic content error leads to `FAILED_RETRYABLE`
  - the second deterministic content error in a later run leads to `FAILED_FINAL`
  - a technical document-related error after successful fingerprint determination increases the transient error counter and remains `FAILED_RETRYABLE`
  - skip cases do not change any error counter
 - Add tests for idempotency and skip logic, in particular:
  - an already successful document is skipped and the skip is historized
  - a finally failed document is skipped and the skip is historized
  - the same content under a different file name is recognized by the same fingerprint
 - Add tests that prove:
  - exactly **one** history entry is created per identified document-related processing operation
  - skip events are historized
  - pre-fingerprint errors do not appear in the SQLite history
 - Add tests for bootstrap and startup behavior, in particular:
  - schema initialization at startup
  - a hard persistence startup error leads to exit code 1
 - Finally, check the M4 state for consistency, architectural fidelity, and non-anticipation of M5+.
 ### Explicitly out of scope
 - tests for AI, prompt loading, or AI JSON
 - tests for target copy or file name generation
 - tests for M5+ persistence fields
 - tests for the complete retry logic of later milestones
 ### Done when
 - the test suite for the M4 scope is green,
 - the most important M4 edge cases are covered by automated tests,
 - the defined M4 target state is fully reached,
 - an error-free state ready for handover exists.
 ---
 ## Final assessment
 The work packages cover the complete M4 target scope from the binding specifications:
 - fingerprint via SHA-256
 - SQLite persistence in two levels
 - document master record with M4 minimum scope
 - attempt history per identified document-related attempt
 - idempotency via fingerprint
 - skip rules for already successful and finally failed documents
 - explicit minimum rule for deterministic content errors in M4
 - tests for fingerprint, persistence, status updates, history, and skip logic
 At the same time, the boundaries to M1–M3 and to M5+ are preserved. In particular, **no** AI functionality, **no** file name generation, and **no** target copy are anticipated.
@@ -0,0 +1,30 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import java.util.Objects;
 /**
 * Lookup result indicating that a master record exists and the document is not yet terminal.
 * <p>
 * The document is known (fingerprint exists in the persistence store) but its overall
 * status is neither {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SUCCESS}
 * nor {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#FAILED_FINAL}.
 * The use case may continue with normal M4 processing using the provided record.
 * <p>
 * The existing {@link DocumentRecord} is supplied so the use case can inspect the
 * current status, failure counters, and other fields required to apply M4 retry rules
 * without an additional lookup.
 *
 * @param record the current master record for this document; never null
 * @since M4-AP-001
 */
 public record DocumentKnownProcessable(DocumentRecord record) implements DocumentRecordLookupResult {
    /**
     * Compact constructor validating the non-null contract.
     *
     * @throws NullPointerException if {@code record} is null
     */
    public DocumentKnownProcessable {
        Objects.requireNonNull(record, "record must not be null");
    }
 }
@@ -0,0 +1,48 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 /**
 * Unchecked exception thrown by persistence write operations when a technical
 * infrastructure failure prevents the operation from completing.
 * <p>
 * This exception is thrown by {@link DocumentRecordRepository} and
 * {@link ProcessingAttemptRepository} write methods, and by
 * {@link PersistenceSchemaInitializationPort#initializeSchema()}, when the underlying
 * persistence layer (SQLite) cannot be reached or returns an unrecoverable error.
 * <p>
 * <strong>Batch run impact:</strong>
 * <ul>
 *   <li>If thrown during <em>schema initialisation</em> at startup, the run must abort
 *       with exit code&nbsp;1.</li>
 *   <li>If thrown during <em>per-document write operations</em>, the current candidate
 *       is treated as a transient failure; the batch run continues with the remaining
 *       candidates.</li>
 * </ul>
 * <p>
 * The exception is <em>not</em> used for read operations; read failures are modelled
 * as {@link PersistenceLookupTechnicalFailure} in the sealed
 * {@link DocumentRecordLookupResult} hierarchy to allow exhaustive pattern matching
 * at the call site.
 *
 * @since M4-AP-001
 */
 public class DocumentPersistenceException extends RuntimeException {
    /**
     * Constructs a new {@code DocumentPersistenceException} with the given message.
     *
     * @param message human-readable description of the persistence failure
     */
    public DocumentPersistenceException(String message) {
        super(message);
    }
    /**
     * Constructs a new {@code DocumentPersistenceException} with message and cause.
     *
     * @param message human-readable description of the persistence failure
     * @param cause   the underlying throwable that caused this failure
     */
    public DocumentPersistenceException(String message, Throwable cause) {
        super(message, cause);
    }
 }
@@ -0,0 +1,83 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;
 import de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus;
 import de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentLocator;
 import java.time.Instant;
 import java.util.Objects;
 /**
 * Application-facing representation of the document master record (Dokument-Stammsatz).
 * <p>
 * One {@code DocumentRecord} exists per unique {@link DocumentFingerprint}. It carries
 * the current overall status, failure counters, and the most recently known source
 * location of the document.
 * <p>
 * <strong>Architecture boundary:</strong> This type contains no SQLite or JDBC types.
 * Mapping between {@code DocumentRecord} and the persistence layer is performed
 * exclusively by the repository adapter in {@code adapter-out}.
 * <p>
 * <strong>M4 field semantics:</strong>
 * <ul>
 *   <li>{@link #fingerprint()} — primary identity; never changes for a given record.</li>
 *   <li>{@link #lastKnownSourceLocator()} — opaque locator used by adapters; the
 *       application passes it through without interpreting the value.</li>
 *   <li>{@link #lastKnownSourceFileName()} — human-readable file name for logging and
 *       diagnostics; not used for identity.</li>
 *   <li>{@link #overallStatus()} — the current terminal or active status of the document
 *       across all runs. See {@link ProcessingStatus} for semantics.</li>
 *   <li>{@link #failureCounters()} — independent counters for content and transient errors;
 *       never increased by skip events.</li>
 *   <li>{@link #lastFailureInstant()} — timestamp of the most recent failure; {@code null}
 *       if no failure has been recorded yet.</li>
 *   <li>{@link #lastSuccessInstant()} — timestamp of the successful processing; {@code null}
 *       if the document has never been processed successfully.</li>
 *   <li>{@link #createdAt()} — timestamp when this master record was first created.</li>
 *   <li>{@link #updatedAt()} — timestamp of the most recent update to this master record.</li>
 * </ul>
 * <p>
 * <strong>Not included in M4:</strong> target path, target file name, AI-related fields.
 * These are added in later milestones.
 *
 * @param fingerprint             content-based identity; never null
 * @param lastKnownSourceLocator  opaque locator to the physical source file; never null
 * @param lastKnownSourceFileName file name at the time of the last known access; never null or blank
 * @param overallStatus           current processing status; never null
 * @param failureCounters         counters for content and transient errors; never null
 * @param lastFailureInstant      timestamp of the most recent failure, or {@code null}
 * @param lastSuccessInstant      timestamp of the successful processing, or {@code null}
 * @param createdAt               timestamp when this record was first created; never null
 * @param updatedAt               timestamp of the most recent update; never null
 * @since M4-AP-001
 */
 public record DocumentRecord(
        DocumentFingerprint fingerprint,
        SourceDocumentLocator lastKnownSourceLocator,
        String lastKnownSourceFileName,
        ProcessingStatus overallStatus,
        FailureCounters failureCounters,
        Instant lastFailureInstant,
        Instant lastSuccessInstant,
        Instant createdAt,
        Instant updatedAt) {
    /**
     * Compact constructor validating mandatory non-null fields.
     *
     * @throws NullPointerException     if any mandatory field is null
     * @throws IllegalArgumentException if {@code lastKnownSourceFileName} is blank
     */
    public DocumentRecord {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
        Objects.requireNonNull(lastKnownSourceLocator, "lastKnownSourceLocator must not be null");
        Objects.requireNonNull(lastKnownSourceFileName, "lastKnownSourceFileName must not be null");
        if (lastKnownSourceFileName.isBlank()) {
            throw new IllegalArgumentException("lastKnownSourceFileName must not be blank");
        }
        Objects.requireNonNull(overallStatus, "overallStatus must not be null");
        Objects.requireNonNull(failureCounters, "failureCounters must not be null");
        Objects.requireNonNull(createdAt, "createdAt must not be null");
        Objects.requireNonNull(updatedAt, "updatedAt must not be null");
    }
 }
@@ -0,0 +1,32 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 /**
 * Sealed result type for a document master record lookup via {@link DocumentRecordRepository}.
 * <p>
 * The use case uses this result to make the per-document processing decision in M4
 * without additional assumptions:
 * <ul>
 *   <li>{@link DocumentUnknown} — the fingerprint is not yet in the persistence store;
 *       the document must be processed for the first time.</li>
 *   <li>{@link DocumentKnownProcessable} — a master record exists but the document is
 *       not in a terminal state; normal processing may continue.</li>
 *   <li>{@link DocumentTerminalSuccess} — the document was already processed
 *       successfully; skip with {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SKIPPED_ALREADY_PROCESSED}.</li>
 *   <li>{@link DocumentTerminalFinalFailure} — the document has finally failed; skip
 *       with {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SKIPPED_FINAL_FAILURE}.</li>
 *   <li>{@link PersistenceLookupTechnicalFailure} — the lookup itself failed due to a
 *       technical infrastructure problem; the document cannot be processed in this run.</li>
 * </ul>
 * <p>
 * <strong>Architecture boundary:</strong> No JDBC, SQLite, or filesystem types appear
 * in this sealed hierarchy or in any of its implementations.
 *
 * @since M4-AP-001
 */
 public sealed interface DocumentRecordLookupResult
        permits DocumentUnknown,
                DocumentKnownProcessable,
                DocumentTerminalSuccess,
                DocumentTerminalFinalFailure,
                PersistenceLookupTechnicalFailure {
 }
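The sealed hierarchy above lets the use case branch on the lookup outcome without null checks. A minimal sketch of that dispatch follows, under stated assumptions: `Decision` and `LookupDecisionSketch` are hypothetical names that are not part of this change, `DocumentRecord` is reduced to a single field, and `DocumentKnownProcessable` is assumed to carry the record like the terminal variants do.

```java
// Simplified stand-ins for the M4 lookup types; only the shape matters here.
record DocumentRecord(String fingerprintHex) {}

sealed interface DocumentRecordLookupResult {}
record DocumentUnknown() implements DocumentRecordLookupResult {}
// Assumption: carries the record, like the terminal variants.
record DocumentKnownProcessable(DocumentRecord record) implements DocumentRecordLookupResult {}
record DocumentTerminalSuccess(DocumentRecord record) implements DocumentRecordLookupResult {}
record DocumentTerminalFinalFailure(DocumentRecord record) implements DocumentRecordLookupResult {}
record PersistenceLookupTechnicalFailure(String errorMessage, Throwable cause)
        implements DocumentRecordLookupResult {}

// Hypothetical decision enum, for illustration only.
enum Decision {
    CREATE_RECORD_AND_PROCESS,
    PROCESS_EXISTING,
    SKIP_ALREADY_PROCESSED,
    SKIP_FINAL_FAILURE,
    ABORT_CANDIDATE_WITHOUT_HISTORY
}

final class LookupDecisionSketch {
    // Maps each lookup variant to the per-document decision described in the Javadoc.
    static Decision decide(DocumentRecordLookupResult result) {
        if (result instanceof DocumentUnknown) {
            return Decision.CREATE_RECORD_AND_PROCESS;
        } else if (result instanceof DocumentKnownProcessable) {
            return Decision.PROCESS_EXISTING;
        } else if (result instanceof DocumentTerminalSuccess) {
            return Decision.SKIP_ALREADY_PROCESSED;
        } else if (result instanceof DocumentTerminalFinalFailure) {
            return Decision.SKIP_FINAL_FAILURE;
        } else if (result instanceof PersistenceLookupTechnicalFailure) {
            // Persistence is unavailable, so no attempt record is written.
            return Decision.ABORT_CANDIDATE_WITHOUT_HISTORY;
        }
        throw new IllegalStateException("Unhandled lookup result: " + result);
    }
}
```

On Java 21 or later the `instanceof` chain could presumably be replaced by a pattern-matching `switch`, which would let the compiler check exhaustiveness.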
@@ -0,0 +1,72 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;
 /**
 * Outbound port for reading and writing the document master record (Dokument-Stammsatz).
 * <p>
 * One master record exists per unique {@link DocumentFingerprint}. The repository is
 * responsible for the persistence of {@link DocumentRecord} values; it holds no
 * business logic about retry rules, skip decisions, or status transitions.
 * <p>
 * <strong>Lookup semantics:</strong>
 * {@link #findByFingerprint(DocumentFingerprint)} returns a sealed
 * {@link DocumentRecordLookupResult} that allows the use case to distinguish exhaustively
 * between an unknown document, a known processable document, a terminal success, a
 * terminal final failure, and a technical persistence failure — without additional
 * assumptions or null checks.
 * <p>
 * <strong>Write semantics:</strong>
 * <ul>
 *   <li>{@link #create(DocumentRecord)} inserts a new record for a previously unknown
 *       document.</li>
 *   <li>{@link #update(DocumentRecord)} replaces the mutable fields of an existing
 *       record identified by its fingerprint.</li>
 * </ul>
 * Both write methods throw {@link DocumentPersistenceException} on technical failure.
 * <p>
 * <strong>Architecture boundary:</strong> No JDBC, SQLite, or filesystem types appear
 * in this interface or in any type it references. Mapping to and from the persistence
 * schema is the exclusive responsibility of the adapter implementation.
 *
 * @since M4-AP-001
 */
 public interface DocumentRecordRepository {
    /**
     * Looks up the master record for the given fingerprint.
     * <p>
     * Returns a {@link DocumentRecordLookupResult} that encodes all possible outcomes
     * including technical failures; this method never throws.
     *
     * @param fingerprint the content-based document identity to look up; must not be null
     * @return {@link DocumentUnknown} if no record exists,
     *         {@link DocumentKnownProcessable} if the document is known but not terminal,
     *         {@link DocumentTerminalSuccess} if the document succeeded,
     *         {@link DocumentTerminalFinalFailure} if the document finally failed, or
     *         {@link PersistenceLookupTechnicalFailure} if the lookup itself failed
     */
    DocumentRecordLookupResult findByFingerprint(DocumentFingerprint fingerprint);
    /**
     * Persists a new master record for a previously unknown document.
     * <p>
     * The fingerprint within {@code record} must not yet exist in the persistence store.
     *
     * @param record the new master record to persist; must not be null
     * @throws DocumentPersistenceException if the insert fails due to a technical error
     */
    void create(DocumentRecord record);
    /**
     * Updates the mutable fields of an existing master record.
     * <p>
     * The record is identified by its {@link DocumentFingerprint}; the fingerprint
     * itself is never changed. Mutable fields include the overall status, failure
     * counters, last known source location, and all timestamp fields.
     *
     * @param record the updated master record; must not be null, and a record with its
     *               fingerprint must already exist in the persistence store
     * @throws DocumentPersistenceException if the update fails due to a technical error
     */
    void update(DocumentRecord record);
 }
@@ -0,0 +1,30 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import java.util.Objects;
 /**
 * Lookup result indicating that the document has finally and irrecoverably failed.
 * <p>
 * The master record's overall status is
 * {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#FAILED_FINAL}.
 * The use case must skip further processing and historise a
 * {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SKIPPED_FINAL_FAILURE}
 * attempt. No failure counters are changed.
 * <p>
 * The existing {@link DocumentRecord} is supplied so the use case can read the
 * current record for the skip attempt historisation without an additional lookup.
 *
 * @param record the current (finally failed) master record for this document; never null
 * @since M4-AP-001
 */
 public record DocumentTerminalFinalFailure(DocumentRecord record) implements DocumentRecordLookupResult {
    /**
     * Compact constructor validating the non-null contract.
     *
     * @throws NullPointerException if {@code record} is null
     */
    public DocumentTerminalFinalFailure {
        Objects.requireNonNull(record, "record must not be null");
    }
 }
@@ -0,0 +1,30 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import java.util.Objects;
 /**
 * Lookup result indicating that the document was already successfully processed.
 * <p>
 * The master record's overall status is
 * {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SUCCESS}.
 * The use case must skip further processing and historise a
 * {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SKIPPED_ALREADY_PROCESSED}
 * attempt. No failure counters are changed.
 * <p>
 * The existing {@link DocumentRecord} is supplied so the use case can read the
 * current record for the skip attempt historisation without an additional lookup.
 *
 * @param record the current (successful) master record for this document; never null
 * @since M4-AP-001
 */
 public record DocumentTerminalSuccess(DocumentRecord record) implements DocumentRecordLookupResult {
    /**
     * Compact constructor validating the non-null contract.
     *
     * @throws NullPointerException if {@code record} is null
     */
    public DocumentTerminalSuccess {
        Objects.requireNonNull(record, "record must not be null");
    }
 }
@@ -0,0 +1,14 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 /**
 * Lookup result indicating that the fingerprint is not yet present in the persistence store.
 * <p>
 * The document has never been processed before. The use case must create a new
 * {@link DocumentRecord} and proceed with normal M4 processing.
 * <p>
 * This variant carries no data because there is no existing record to return.
 *
 * @since M4-AP-001
 */
 public record DocumentUnknown() implements DocumentRecordLookupResult {
 }
@@ -0,0 +1,75 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 /**
 * Immutable snapshot of the two independent failure counters maintained per document.
 * <p>
 * M4 tracks two distinct counters separately because they drive different retry rules:
 * <ul>
 *   <li><strong>Content error counter</strong> ({@link #contentErrorCount()}):
 *       counts how many times a deterministic content error occurred for this document
 *       (no usable text, page limit exceeded). At count&nbsp;1 the document is
 *       {@code FAILED_RETRYABLE}; at count&nbsp;2 it becomes {@code FAILED_FINAL}.
 *       Skip events do <em>not</em> increase this counter.</li>
 *   <li><strong>Transient error counter</strong> ({@link #transientErrorCount()}):
 *       counts how many times a technical infrastructure error occurred after the
 *       fingerprint was successfully computed. The document remains
 *       {@code FAILED_RETRYABLE} until the configured maximum is reached in later
 *       milestones. Skip events do <em>not</em> increase this counter.</li>
 * </ul>
 * <p>
 * A freshly discovered document starts with both counters at zero.
 * Counters are only written by the repository layer on the instructions of the
 * application use case; they never change as a side-effect of a read operation.
 *
 * @param contentErrorCount  number of deterministic content errors recorded so far;
 *                           must be &gt;= 0
 * @param transientErrorCount number of transient technical errors recorded so far;
 *                            must be &gt;= 0
 * @since M4-AP-001
 */
 public record FailureCounters(int contentErrorCount, int transientErrorCount) {
    /**
     * Compact constructor validating that neither counter is negative.
     *
     * @throws IllegalArgumentException if either counter is negative
     */
    public FailureCounters {
        if (contentErrorCount < 0) {
            throw new IllegalArgumentException(
                    "contentErrorCount must be >= 0, but was: " + contentErrorCount);
        }
        if (transientErrorCount < 0) {
            throw new IllegalArgumentException(
                    "transientErrorCount must be >= 0, but was: " + transientErrorCount);
        }
    }
    /**
     * Returns a {@code FailureCounters} instance with both counters at zero.
     * Use this when initialising a master record for a newly discovered document.
     *
     * @return zero-value counters
     */
    public static FailureCounters zero() {
        return new FailureCounters(0, 0);
    }
    /**
     * Returns a copy with the content error counter incremented by one.
     *
     * @return new instance with {@code contentErrorCount + 1}
     */
    public FailureCounters withIncrementedContentErrorCount() {
        return new FailureCounters(contentErrorCount + 1, transientErrorCount);
    }
    /**
     * Returns a copy with the transient error counter incremented by one.
     *
     * @return new instance with {@code transientErrorCount + 1}
     */
    public FailureCounters withIncrementedTransientErrorCount() {
        return new FailureCounters(contentErrorCount, transientErrorCount + 1);
    }
 }
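The content-error rule from the Javadoc above (count 1 is retryable, count 2 is final) can be sketched as follows. This is an illustration only: `ContentErrorRuleSketch` and `statusAfterContentError` are hypothetical names, the record is a trimmed copy of `FailureCounters`, and the enum is reduced to the two statuses involved.

```java
// Reduced stand-in for the domain enum; only the two failure statuses are needed here.
enum ProcessingStatus { FAILED_RETRYABLE, FAILED_FINAL }

// Trimmed copy of FailureCounters, enough to demonstrate the increment.
record FailureCounters(int contentErrorCount, int transientErrorCount) {
    FailureCounters {
        if (contentErrorCount < 0 || transientErrorCount < 0) {
            throw new IllegalArgumentException("counters must be >= 0");
        }
    }

    static FailureCounters zero() {
        return new FailureCounters(0, 0);
    }

    FailureCounters withIncrementedContentErrorCount() {
        return new FailureCounters(contentErrorCount + 1, transientErrorCount);
    }
}

final class ContentErrorRuleSketch {
    // Hypothetical helper: derives the overall status from the already incremented counters.
    static ProcessingStatus statusAfterContentError(FailureCounters updated) {
        return updated.contentErrorCount() >= 2
                ? ProcessingStatus.FAILED_FINAL
                : ProcessingStatus.FAILED_RETRYABLE;
    }
}
```

Because the record is immutable, each increment returns a new instance and the previous snapshot stays unchanged, which keeps read operations free of side effects.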
@@ -0,0 +1,40 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentCandidate;
 /**
 * Outbound port for computing the content-based fingerprint of exactly one
 * processing candidate.
 * <p>
 * Implementations must derive the fingerprint <em>exclusively</em> from the binary
 * content of the file referenced by the candidate. File name, path, and metadata must
 * not influence the result.
 * <p>
 * <strong>Architecture boundary:</strong> All hashing logic and file I/O are confined
 * to the {@code adapter-out} implementation. This interface exposes no
 * {@code java.nio.file.Path}, {@code java.io.File}, or cryptographic types to Domain
 * or Application.
 * <p>
 * <strong>Failure semantics:</strong> Technical failures (unreadable file, I/O error)
 * are returned as {@link FingerprintTechnicalError} rather than thrown as exceptions.
 * A {@link FingerprintTechnicalError} result means no
 * {@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint} is available
 * and the candidate cannot be identified; consequently no SQLite attempt record is
 * created for this candidate in M4.
 *
 * @since M4-AP-001
 */
 public interface FingerprintPort {
    /**
     * Computes the fingerprint for the given candidate.
     * <p>
     * This method never throws. All outcomes, including technical failures, are
     * encoded in the returned {@link FingerprintResult}.
     *
     * @param candidate the candidate whose file content is to be hashed; must not be null
     * @return {@link FingerprintSuccess} on success, or {@link FingerprintTechnicalError}
     *         on any infrastructure failure
     */
    FingerprintResult computeFingerprint(SourceDocumentCandidate candidate);
 }
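One way an `adapter-out` implementation could honour the "never throws" contract is sketched below. The sketch makes several assumptions: it takes a `java.nio.file.Path` directly instead of a `SourceDocumentCandidate`, the result types are simplified stand-ins, `Sha256FingerprintSketch` is a hypothetical name, and the whole file is read into memory (a streaming digest would likely be preferable for large PDFs).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Simplified stand-ins for the sealed result hierarchy.
sealed interface FingerprintResult {}
record FingerprintSuccess(String sha256Hex) implements FingerprintResult {}
record FingerprintTechnicalError(String errorMessage, Throwable cause) implements FingerprintResult {}

final class Sha256FingerprintSketch {

    // Hashes only the file content; name, path and metadata do not influence the result.
    static FingerprintResult computeFingerprint(Path file) {
        try {
            return new FingerprintSuccess(sha256Hex(Files.readAllBytes(file)));
        } catch (IOException | RuntimeException e) {
            // Technical failures are returned as a result value, not thrown.
            return new FingerprintTechnicalError("Cannot fingerprint " + file + ": " + e, e);
        }
    }

    // Lowercase hex encoding of the SHA-256 digest (64 characters).
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

`HexFormat` requires Java 17 or later, which matches the sealed types already used in this change.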
@@ -0,0 +1,20 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 /**
 * Sealed result type for a fingerprint computation attempt via {@link FingerprintPort}.
 * <p>
 * Exhaustive variants:
 * <ul>
 *   <li>{@link FingerprintSuccess} — fingerprint computed successfully.</li>
 *   <li>{@link FingerprintTechnicalError} — fingerprint computation failed due to a
 *       technical infrastructure problem (e.g. I/O error, file no longer accessible).</li>
 * </ul>
 * <p>
 * <strong>Historisation impact:</strong> If the result is {@link FingerprintTechnicalError},
 * the document cannot be identified and <em>no</em> SQLite attempt record is created.
 * The failure is treated as a non-identifiable run event.
 *
 * @since M4-AP-001
 */
 public sealed interface FingerprintResult permits FingerprintSuccess, FingerprintTechnicalError {
 }
@@ -0,0 +1,27 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;
 import java.util.Objects;
 /**
 * Successful outcome of a fingerprint computation.
 * <p>
 * Carries the computed {@link DocumentFingerprint} that uniquely identifies the
 * document by its content. The fingerprint serves as the primary key
 * for all subsequent persistence operations in M4.
 *
 * @param fingerprint the successfully computed fingerprint; never null
 * @since M4-AP-001
 */
 public record FingerprintSuccess(DocumentFingerprint fingerprint) implements FingerprintResult {
    /**
     * Compact constructor validating the non-null contract.
     *
     * @throws NullPointerException if {@code fingerprint} is null
     */
    public FingerprintSuccess {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
    }
 }
@@ -0,0 +1,34 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import java.util.Objects;
 /**
 * Technical failure during fingerprint computation.
 * <p>
 * Returned by {@link FingerprintPort} when the adapter cannot read the file content
 * to compute the SHA-256 hash. Typical causes include the file no longer being
 * accessible between candidate discovery and hashing, I/O errors, or permission issues.
 * <p>
 * <strong>Historisation impact:</strong> Because no {@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint}
 * could be produced, this failure is <em>not</em> historised in SQLite. No
 * {@link ProcessingAttempt} is created.
 *
 * @param errorMessage human-readable description of the failure; never null or blank
 * @param cause        the underlying throwable, or {@code null} if not available
 * @since M4-AP-001
 */
 public record FingerprintTechnicalError(String errorMessage, Throwable cause) implements FingerprintResult {
    /**
     * Compact constructor validating the error message.
     *
     * @throws NullPointerException     if {@code errorMessage} is null
     * @throws IllegalArgumentException if {@code errorMessage} is blank
     */
    public FingerprintTechnicalError {
        Objects.requireNonNull(errorMessage, "errorMessage must not be null");
        if (errorMessage.isBlank()) {
            throw new IllegalArgumentException("errorMessage must not be blank");
        }
    }
 }
@@ -0,0 +1,36 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import java.util.Objects;
 /**
 * Lookup result indicating that the master record lookup itself failed due to a
 * technical infrastructure problem.
 * <p>
 * The persistence layer (SQLite) could not be reached or returned an unexpected error.
 * The document state is unknown; the use case must treat this candidate as a
 * transient technical failure for this run and must not attempt to write any attempt
 * record (since the underlying persistence is unavailable).
 * <p>
 * This variant is distinct from a business-level "document not found" outcome
 * ({@link DocumentUnknown}): here, the lookup operation itself failed.
 *
 * @param errorMessage human-readable description of the persistence failure; never null or blank
 * @param cause        the underlying throwable, or {@code null} if not available
 * @since M4-AP-001
 */
 public record PersistenceLookupTechnicalFailure(String errorMessage, Throwable cause)
        implements DocumentRecordLookupResult {
    /**
     * Compact constructor validating the error message.
     *
     * @throws NullPointerException     if {@code errorMessage} is null
     * @throws IllegalArgumentException if {@code errorMessage} is blank
     */
    public PersistenceLookupTechnicalFailure {
        Objects.requireNonNull(errorMessage, "errorMessage must not be null");
        if (errorMessage.isBlank()) {
            throw new IllegalArgumentException("errorMessage must not be blank");
        }
    }
 }
@@ -0,0 +1,40 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 /**
 * Outbound port for initialising the SQLite persistence schema at program startup.
 * <p>
 * This port is invoked exactly once per program run, <em>before</em> the batch
 * document processing loop begins. The initialisation must ensure that all tables,
 * indices, and constraints required for M4 persistence are present in the SQLite file.
 * <p>
 * <strong>Timing:</strong> The adapter implementation must perform the schema
 * initialisation eagerly and synchronously. Lazy or deferred initialisation during
 * the document processing loop is not the intent of this port.
 * <p>
 * <strong>Failure handling:</strong> If the schema cannot be initialised, the
 * implementation must throw {@link DocumentPersistenceException}. The bootstrap
 * layer must catch this exception and abort the run with exit code&nbsp;1.
 * <p>
 * <strong>Idempotency:</strong> Calling {@link #initializeSchema()} on a database
 * that already has the correct schema must succeed without error (e.g. via
 * {@code CREATE TABLE IF NOT EXISTS} semantics).
 * <p>
 * <strong>Architecture boundary:</strong> No JDBC, SQLite, or filesystem types appear
 * in this interface. All schema DDL and connection management are confined to the
 * {@code adapter-out} implementation.
 *
 * @since M4-AP-001
 */
 public interface PersistenceSchemaInitializationPort {
    /**
     * Creates or verifies the M4 persistence schema.
     * <p>
     * Must be called once at program start, before any document processing begins.
     * The method must be idempotent: calling it on an already-initialised database
     * must not fail or alter existing data.
     *
     * @throws DocumentPersistenceException if the schema cannot be created or verified
     */
    void initializeSchema();
 }
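The startup contract described above (initialise eagerly, abort with exit code 1 on failure) might be wired in the bootstrap layer roughly like this. The sketch assumes that `DocumentPersistenceException` is unchecked, since the port methods declare no `throws` clause; `BootstrapSketch`, its `run` method, and the exit code 0 for the success path are hypothetical.

```java
// Assumption: unchecked, because the port methods declare no throws clause.
class DocumentPersistenceException extends RuntimeException {
    DocumentPersistenceException(String message) {
        super(message);
    }
}

interface PersistenceSchemaInitializationPort {
    void initializeSchema();
}

final class BootstrapSketch {
    // Returns the process exit code: 1 if the schema could not be initialised, else 0.
    static int run(PersistenceSchemaInitializationPort schemaInit, Runnable batchLoop) {
        try {
            // Eager, synchronous initialisation before any document is processed.
            schemaInit.initializeSchema();
        } catch (DocumentPersistenceException e) {
            System.err.println("Schema initialisation failed: " + e.getMessage());
            return 1;
        }
        batchLoop.run();
        return 0;
    }
}
```

Returning the exit code instead of calling `System.exit` inside the method keeps the sketch testable; the real entry point would pass the value on.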
@@ -0,0 +1,88 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;
 import de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus;
 import de.gecheckt.pdf.umbenenner.domain.model.RunId;
 import java.time.Instant;
 import java.util.Objects;
 /**
 * Application-facing representation of exactly one historised processing attempt
 * (Versuchshistorie-Eintrag) for an identified document.
 * <p>
 * <strong>Historisation boundary (M4):</strong> Only attempts for documents whose
 * {@link DocumentFingerprint} was successfully computed are historised. Failures that
 * occur <em>before</em> the fingerprint is available (e.g. the source file is
 * unreadable before hashing) are <em>not</em> represented by a {@code ProcessingAttempt}
 * and are <em>not</em> written to SQLite.
 * <p>
 * <strong>Attempt number semantics:</strong> The attempt number starts at&nbsp;1 for the
 * first historised attempt per fingerprint and increases monotonically by 1 for every
 * subsequent attempt, including skip attempts
 * ({@link ProcessingStatus#SKIPPED_ALREADY_PROCESSED},
 * {@link ProcessingStatus#SKIPPED_FINAL_FAILURE}).
 * <p>
 * <strong>Field semantics:</strong>
 * <ul>
 *   <li>{@link #fingerprint()} — foreign key to the document master record.</li>
 *   <li>{@link #runId()} — identifies the batch run during which this attempt occurred.</li>
 *   <li>{@link #attemptNumber()} — monotonically increasing per fingerprint; assigned
 *       before the attempt is recorded.</li>
 *   <li>{@link #startedAt()} — wall-clock timestamp when processing of this candidate
 *       began in this run.</li>
 *   <li>{@link #endedAt()} — wall-clock timestamp when processing completed (success,
 *       failure, or skip).</li>
 *   <li>{@link #status()} — outcome status of this specific attempt.</li>
 *   <li>{@link #failureClass()} — short classification of the failure (e.g. enum constant
 *       name or exception class name); {@code null} for successful or skip attempts.</li>
 *   <li>{@link #failureMessage()} — human-readable failure description; {@code null} for
 *       successful or skip attempts.</li>
 *   <li>{@link #retryable()} — {@code true} if the failure is considered retryable in a
 *       later run; {@code false} for final failures, successes, and skip attempts.</li>
 * </ul>
 * <p>
 * <strong>Not included in M4:</strong> model name, prompt identifier, AI raw response,
 * AI reasoning, resolved date, date source, final title, final target file name.
 * These fields are added in later milestones (M5+).
 *
 * @param fingerprint    content-based document identity; never null
 * @param runId          identifier of the batch run; never null
 * @param attemptNumber  monotonic sequence number per fingerprint; must be &gt;= 1
 * @param startedAt      start of this processing attempt; never null
 * @param endedAt        end of this processing attempt; never null
 * @param status         outcome status of this attempt; never null
 * @param failureClass   failure classification, or {@code null} for non-failure statuses
 * @param failureMessage failure description, or {@code null} for non-failure statuses
 * @param retryable      {@code true} if this attempt failed and may be retried in a later
 *                       run; {@code false} for final failures, successes, and skip attempts
 * @since M4-AP-001
 */
 public record ProcessingAttempt(
        DocumentFingerprint fingerprint,
        RunId runId,
        int attemptNumber,
        Instant startedAt,
        Instant endedAt,
        ProcessingStatus status,
        String failureClass,
        String failureMessage,
        boolean retryable) {
    /**
     * Compact constructor validating mandatory non-null fields and numeric constraints.
     *
     * @throws NullPointerException     if any mandatory field is null
     * @throws IllegalArgumentException if {@code attemptNumber} is less than 1
     */
    public ProcessingAttempt {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
        Objects.requireNonNull(runId, "runId must not be null");
        if (attemptNumber < 1) {
            throw new IllegalArgumentException(
                    "attemptNumber must be >= 1, but was: " + attemptNumber);
        }
        Objects.requireNonNull(startedAt, "startedAt must not be null");
        Objects.requireNonNull(endedAt, "endedAt must not be null");
        Objects.requireNonNull(status, "status must not be null");
    }
 }
@@ -0,0 +1,70 @@
 package de.gecheckt.pdf.umbenenner.application.port.out;
 import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;
 import java.util.List;
 /**
 * Outbound port for writing and reading the processing attempt history
 * (Versuchshistorie).
 * <p>
 * Every historisable processing attempt for an <em>identified</em> document results
 * in exactly one {@link ProcessingAttempt} record written via {@link #save(ProcessingAttempt)}.
 * <p>
 * <strong>Historisation boundary:</strong> Only attempts with a successfully computed
 * {@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint} are historised.
 * Failures that occur before the fingerprint is available are <em>not</em> recorded
 * through this port.
 * <p>
 * <strong>Attempt number semantics:</strong>
 * Attempt numbers start at&nbsp;1 per fingerprint and increase monotonically by&nbsp;1
 * for every saved attempt, including skip attempts. The use case calls
 * {@link #loadNextAttemptNumber(DocumentFingerprint)} to obtain the correct sequence
 * number before constructing a {@link ProcessingAttempt}.
 * <p>
 * <strong>Architecture boundary:</strong> No JDBC, SQLite, or filesystem types appear
 * in this interface. Mapping to and from the persistence schema is the exclusive
 * responsibility of the adapter implementation.
 *
 * @since M4-AP-001
 */
 public interface ProcessingAttemptRepository {
    /**
     * Returns the attempt number to assign to the <em>next</em> attempt for the given
     * fingerprint.
     * <p>
     * If no prior attempts exist for the fingerprint, returns&nbsp;1.
     * Otherwise returns the current maximum attempt number plus&nbsp;1.
     *
     * @param fingerprint the document identity; must not be null
     * @return the next monotonic attempt number; always &gt;= 1
     * @throws DocumentPersistenceException if the query fails due to a technical error
     */
    int loadNextAttemptNumber(DocumentFingerprint fingerprint);
    /**
     * Persists exactly one processing attempt record.
     * <p>
     * The {@link ProcessingAttempt#attemptNumber()} must have been obtained from
     * {@link #loadNextAttemptNumber(DocumentFingerprint)} in the same run to guarantee
     * monotonic ordering.
     *
     * @param attempt the attempt to persist; must not be null
     * @throws DocumentPersistenceException if the insert fails due to a technical error
     */
    void save(ProcessingAttempt attempt);
    /**
     * Returns all historised attempts for the given fingerprint, ordered by
     * {@link ProcessingAttempt#attemptNumber()} ascending.
     * <p>
     * Returns an empty list if no attempts have been recorded yet.
     * Intended for use in tests and diagnostics; not required on the primary batch path.
     *
     * @param fingerprint the document identity; must not be null
     * @return immutable list of attempts, ordered by attempt number; never null
     * @throws DocumentPersistenceException if the query fails due to a technical error
     */
    List<ProcessingAttempt> findAllByFingerprint(DocumentFingerprint fingerprint);
 }
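The attempt-number semantics (start at 1, then current maximum plus 1, skip attempts included) can be illustrated with an in-memory stand-in. This is a sketch for tests or reasoning only, not the SQLite adapter: `InMemoryAttemptRepositorySketch` is a hypothetical name, the fingerprint and status are plain strings, and `ProcessingAttempt` is cut down to three fields.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Reduced stand-in: only the fields needed to show the numbering.
record ProcessingAttempt(String fingerprintHex, int attemptNumber, String status) {}

final class InMemoryAttemptRepositorySketch {
    private final Map<String, List<ProcessingAttempt>> attemptsByFingerprint = new HashMap<>();

    // Returns 1 if no attempt exists yet, otherwise the current maximum plus 1.
    int loadNextAttemptNumber(String fingerprintHex) {
        return attemptsByFingerprint.getOrDefault(fingerprintHex, List.of()).stream()
                .mapToInt(ProcessingAttempt::attemptNumber)
                .max()
                .orElse(0) + 1;
    }

    // Stores exactly one attempt; skip attempts are stored like any other.
    void save(ProcessingAttempt attempt) {
        attemptsByFingerprint
                .computeIfAbsent(attempt.fingerprintHex(), key -> new ArrayList<>())
                .add(attempt);
    }

    // Returns an unmodifiable list ordered by attempt number ascending.
    List<ProcessingAttempt> findAllByFingerprint(String fingerprintHex) {
        return attemptsByFingerprint.getOrDefault(fingerprintHex, List.of()).stream()
                .sorted(Comparator.comparingInt(ProcessingAttempt::attemptNumber))
                .toList();
    }
}
```

A real adapter would additionally have to translate technical errors into `DocumentPersistenceException`, which this in-memory version cannot produce.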
@@ -22,12 +22,40 @@
 *       — Extract text content and page count from a single PDF</li>
 * </ul>
 * <p>
 * M4-AP-001 ports:
 * <ul>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.FingerprintPort}
 *       — Compute the content-based SHA-256 fingerprint of a processing candidate</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.DocumentRecordRepository}
 *       — Read and write the document master record (Dokument-Stammsatz)</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.ProcessingAttemptRepository}
 *       — Write and read the per-document attempt history (Versuchshistorie)</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.PersistenceSchemaInitializationPort}
 *       — Initialise the SQLite schema at program startup</li>
 * </ul>
 * <p>
 * M4-AP-001 value types and result types:
 * <ul>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.FailureCounters}
 *       — Immutable snapshot of content-error and transient-error counters per document</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.DocumentRecord}
 *       — Application-facing representation of the document master record</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.ProcessingAttempt}
 *       — Application-facing representation of one historised processing attempt</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.FingerprintResult}
 *       — Sealed result of a fingerprint computation (success or technical error)</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.DocumentRecordLookupResult}
 *       — Sealed result of a master record lookup (unknown / processable / terminal / failure)</li>
 * </ul>
 * <p>
 * Exception types:
 * <ul>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.RunLockUnavailableException}
 *       — Thrown when run lock cannot be acquired (another instance running) (M2)</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.SourceDocumentAccessException}
 *       — Thrown when source folder cannot be read or accessed (M3)</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.DocumentPersistenceException}
 *       — Thrown when a repository operation or schema initialisation fails for technical reasons (M4)</li>
 * </ul>
 * <p>
 * Architecture Rule: Outbound ports are implementation-agnostic and contain no business logic.
@@ -0,0 +1,51 @@
 package de.gecheckt.pdf.umbenenner.domain.model;
 import java.util.Objects;
 /**
 * Unique, stable identity of a document derived exclusively from its binary content.
 * <p>
 * A {@code DocumentFingerprint} is computed once per file read and used as the primary
 * key for all subsequent persistence lookups and history entries. It is independent of
 * the file name, path, or any metadata — only the raw file content determines the value.
 * <p>
 * <strong>Identification semantics (M4):</strong>
 * <ul>
 *   <li>Two files with identical content have the same fingerprint and are treated as
 *       the same document, regardless of their location or name.</li>
 *   <li>A file whose content has changed produces a different fingerprint and is treated
 *       as a new, independent document (new processing record).</li>
 * </ul>
 * <p>
 * <strong>Architecture boundary:</strong> The hashing algorithm (SHA-256) and all
 * file I/O required to compute the fingerprint are strictly confined to the
 * {@code adapter-out} layer. Domain and Application only hold and compare the resulting
 * hex string; they never access the filesystem or perform cryptographic operations.
 * <p>
 * <strong>Pre-fingerprint failures:</strong> If computing the fingerprint fails
 * (e.g. due to an I/O error), no {@code DocumentFingerprint} is created and the failure
 * is not historised in SQLite. The attempt is treated as a non-identifiable run event,
 * not as a documentable processing attempt.
 *
 * @param sha256Hex lowercase hex encoding of the SHA-256 digest (exactly 64 characters,
 *                  characters {@code [0-9a-f]})
 * @since M4-AP-001
 */
 public record DocumentFingerprint(String sha256Hex) {
    /**
     * Compact constructor that validates the hex string format.
     *
     * @param sha256Hex lowercase hex encoding of the SHA-256 digest
     * @throws NullPointerException     if {@code sha256Hex} is null
     * @throws IllegalArgumentException if {@code sha256Hex} is not exactly 64 lowercase hex characters
     */
    public DocumentFingerprint {
        Objects.requireNonNull(sha256Hex, "sha256Hex must not be null");
        if (sha256Hex.length() != 64 || !sha256Hex.matches("[0-9a-f]{64}")) {
            throw new IllegalArgumentException(
                    "sha256Hex must be a 64-character lowercase hex string, but was: '"
                            + sha256Hex + "'");
        }
    }
 }
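Because the fingerprint is a record, two instances with the same hex string are equal and share a hash code, which is what makes "same content, same document" work as a lookup key. A minimal sketch of that property follows, assuming a hypothetical helper `FingerprintIdentitySketch.countDistinctDocuments` and precomputed hash strings; the validation mirrors the record above, with the length check folded into the regular expression.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Copy of the value type with the same validation rule (64 lowercase hex characters).
record DocumentFingerprint(String sha256Hex) {
    DocumentFingerprint {
        Objects.requireNonNull(sha256Hex, "sha256Hex must not be null");
        if (!sha256Hex.matches("[0-9a-f]{64}")) {
            throw new IllegalArgumentException(
                    "sha256Hex must be a 64-character lowercase hex string");
        }
    }
}

final class FingerprintIdentitySketch {
    // Hypothetical helper: counts distinct documents among files, keyed by content hash.
    static int countDistinctDocuments(Map<String, String> fileNameToHash) {
        Map<DocumentFingerprint, String> firstSeenName = new HashMap<>();
        fileNameToHash.forEach((name, hash) ->
                firstSeenName.putIfAbsent(new DocumentFingerprint(hash), name));
        return firstSeenName.size();
    }
}
```

Note that uppercase hex is rejected by the constructor, so an adapter would need to emit lowercase output before constructing the value.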
@@ -1,20 +1,44 @@
 package de.gecheckt.pdf.umbenenner.domain.model;
 /**
- * Enumeration of all valid processing status values for a document within a batch run.
+ * Enumeration of all valid processing status values for a document.
 * <p>
- * Each status reflects the outcome or current state of a document processing attempt.
+ * Each status reflects the outcome or current state of a document in the
- * Status transitions follow the rules defined in the architecture specification and persist
+ * master record ({@code DocumentRecord}) or in a single attempt record
- * across multiple batch runs via the repository layer.
+ * ({@code ProcessingAttempt}).
 * <p>
- * Status Categories:
+ * <strong>Overall-status semantics (master record, M4):</strong>
 * <ul>
- *   <li><strong>Final Success:</strong> {@link #SUCCESS}</li>
+ *   <li>{@link #SUCCESS} — document was fully processed; skip in all future runs.</li>
- *   <li><strong>Retryable Failure:</strong> {@link #FAILED_RETRYABLE}</li>
+ *   <li>{@link #FAILED_RETRYABLE} — last attempt failed but is retryable; process again
- *   <li><strong>Final Failure:</strong> {@link #FAILED_FINAL}</li>
+ *       in the next run according to the applicable retry rule.</li>
- *   <li><strong>Skip (Already Processed):</strong> {@link #SKIPPED_ALREADY_PROCESSED}</li>
+ *   <li>{@link #FAILED_FINAL} — all allowed retries exhausted; skip in all future runs.</li>
- *   <li><strong>Skip (Final Failure):</strong> {@link #SKIPPED_FINAL_FAILURE}</li>
+ *   <li>{@link #PROCESSING} — document is currently being processed (transient, within a
- *   <li><strong>Processing (Transient):</strong> {@link #PROCESSING}</li>
+ *       run); if found persisted after a crash, treat as {@link #FAILED_RETRYABLE}.</li>
 * </ul>
+ * <p>
+ * <strong>Attempt-status semantics (attempt history, M4):</strong>
+ * <ul>
+ *   <li>{@link #SUCCESS} — this attempt completed successfully.</li>
+ *   <li>{@link #FAILED_RETRYABLE} — this attempt failed; a future attempt is allowed.</li>
+ *   <li>{@link #FAILED_FINAL} — this attempt failed and no further attempts will be made.</li>
+ *   <li>{@link #SKIPPED_ALREADY_PROCESSED} — this attempt was a skip because the
+ *       document's overall status was already {@link #SUCCESS}.</li>
+ *   <li>{@link #SKIPPED_FINAL_FAILURE} — this attempt was a skip because the document's
+ *       overall status was already {@link #FAILED_FINAL}.</li>
+ * </ul>
+ * <p>
+ * <strong>M4 counter rules:</strong>
+ * <ul>
+ *   <li>Only {@link #FAILED_RETRYABLE} and {@link #FAILED_FINAL} outcomes may increase
+ *       a failure counter (content-error or transient-error counter).</li>
+ *   <li>Skip outcomes ({@link #SKIPPED_ALREADY_PROCESSED}, {@link #SKIPPED_FINAL_FAILURE})
+ *       never change any failure counter.</li>
+ *   <li>A deterministic content error at first occurrence → {@link #FAILED_RETRYABLE},
+ *       content-error counter +1.  At second occurrence → {@link #FAILED_FINAL},
+ *       content-error counter +1 again (cumulative total: 2).</li>
+ *   <li>A transient technical error after a successful fingerprint → {@link #FAILED_RETRYABLE},
+ *       transient-error counter +1.</li>
+ * </ul>
 *
 * @since M2-AP-001
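The M4 counter rules listed in the Javadoc above can be read as three small pure functions. The following is a sketch under stated assumptions, not the project's implementation: `FailureCounterSketch`, `Counters`, `Outcome` and the method names are hypothetical, the local `Status` enum only mirrors the constants of `ProcessingStatus` that matter here, and the configured maximum retry count for transient errors is deliberately left out.

```java
/** Hypothetical illustration of the M4 counter rules; all names are assumptions. */
final class FailureCounterSketch {

    /** Local stand-in for the relevant ProcessingStatus constants. */
    enum Status {
        SUCCESS, FAILED_RETRYABLE, FAILED_FINAL,
        SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE
    }

    /** The two failure counters kept per document. */
    record Counters(int contentErrors, int transientErrors) {
    }

    /** Attempt status together with the counters after the attempt. */
    record Outcome(Status status, Counters counters) {
    }

    private FailureCounterSketch() {
    }

    /** Content error: +1 per occurrence; the second occurrence is final. */
    static Outcome onContentError(Counters before) {
        int total = before.contentErrors() + 1;
        Status status = total >= 2 ? Status.FAILED_FINAL : Status.FAILED_RETRYABLE;
        return new Outcome(status, new Counters(total, before.transientErrors()));
    }

    /** Transient error after a successful fingerprint: retryable, transient counter +1. */
    static Outcome onTransientError(Counters before) {
        Counters after = new Counters(before.contentErrors(), before.transientErrors() + 1);
        return new Outcome(Status.FAILED_RETRYABLE, after);
    }

    /** Skip outcomes never change any failure counter. */
    static Outcome onSkip(Status skipStatus, Counters before) {
        return new Outcome(skipStatus, before);
    }
}
```

Starting from `new Counters(0, 0)`, two consecutive calls to `onContentError` yield `FAILED_RETRYABLE` with a content-error count of 1 and then `FAILED_FINAL` with a count of 2, which is the cumulative reading of the rule.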
@@ -40,13 +64,15 @@ public enum ProcessingStatus {
    FAILED_RETRYABLE,
    /**
-     * Processing failed with a deterministic content error (non-recoverable problem).
+     * Processing has failed finally and irrecoverably — no further retries will be attempted.
     * <p>
-     * Examples: PDF has no extractable text, page limit exceeded, document is ambiguous.
+     * This status is reached after all allowed retries for a document are exhausted.
     * For deterministic content errors (no usable text, page limit exceeded) this means
     * the second occurrence of the error. For other error types, it means the configured
     * maximum retry count has been reached.
     * <p>
-     * A document with this status receives exactly one retry in a later batch run.
+     * A document with this overall status is skipped in all future batch runs and
-     * After that retry, if it still fails, status becomes {@link #FAILED_FINAL}.
+     * a {@link #SKIPPED_FINAL_FAILURE} attempt is historised.
     * No further retries are attempted.
     */
    FAILED_FINAL,
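The skip behaviour described for the overall status (skip on `SUCCESS` and `FAILED_FINAL`, process again on `FAILED_RETRYABLE`, treat a persisted `PROCESSING` like `FAILED_RETRYABLE`) could be expressed as a single mapping. This is a hedged sketch only: `SkipDecisionSketch` and `skipStatusFor` are hypothetical names, the local `Status` enum merely mirrors `ProcessingStatus`, and rejecting skip values as overall status is an assumption of this sketch rather than a rule stated in the source.

```java
import java.util.Optional;

/** Hypothetical illustration of the skip decision; not part of the commit. */
final class SkipDecisionSketch {

    /** Local stand-in for ProcessingStatus. */
    enum Status {
        SUCCESS, FAILED_RETRYABLE, FAILED_FINAL, PROCESSING,
        SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE
    }

    private SkipDecisionSketch() {
    }

    /**
     * Maps a document's overall status to the skip status that would be
     * historised for this run, or to empty if the document is processed again.
     */
    static Optional<Status> skipStatusFor(Status overallStatus) {
        return switch (overallStatus) {
            case SUCCESS -> Optional.of(Status.SKIPPED_ALREADY_PROCESSED);
            case FAILED_FINAL -> Optional.of(Status.SKIPPED_FINAL_FAILURE);
            // A PROCESSING value found after a crash is treated like FAILED_RETRYABLE.
            case FAILED_RETRYABLE, PROCESSING -> Optional.empty();
            // Assumption: skip values are attempt statuses only, never an overall status.
            case SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE ->
                    throw new IllegalArgumentException(
                            "not a valid overall status: " + overallStatus);
        };
    }
}
```

Keeping the decision in one exhaustive `switch` means the compiler flags any status constant added later that has no defined skip behaviour.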
@@ -6,6 +6,7 @@
 *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus} — enumeration of all valid document processing states</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.RunId} — unique identifier for a batch run</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.BatchRunContext} — technical context for a batch run</li>
+ *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint} — content-based document identity (SHA-256 hex); primary key for M4 persistence (M4)</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentCandidate} — discovered PDF from source folder</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentLocator} — opaque locator passed from scan adapter to extraction adapter</li>
 *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.PdfPageCount} — typed page count validation</li>
Author	SHA1	Message	Date
marcus	5441d15b41	M4 AP-001 Kernobjekte, Statusmodell und Port-Verträge präzisieren	2026-04-02 19:24:00 +02:00
marcus	69b68b25ac	Arbeitspakete für M4	2026-04-02 19:04:55 +02:00