Compare commits

2 commits: edcbf45cd6...5441d15b41

| Author | SHA1 | Date |
|---|---|---|
| | 5441d15b41 | |
| | 69b68b25ac | |
```diff
@@ -17,7 +17,8 @@
       "Bash(grep \"\\\\.java$\")",
       "Bash(mvn -q clean compile -DskipTests)",
       "Bash(mvn -q test)",
-      "Bash(mvn -q clean test)"
+      "Bash(mvn -q clean test)",
+      "Bash(./mvnw.cmd:*)"
     ]
   }
 }
```

docs/workpackages/M4 - Arbeitspakete.md (Normal file, 516 lines added)

@@ -0,0 +1,516 @@

# M4 - Work packages

## Scope

This document covers only the work packages for the defined milestone **M4 – Fingerprint, SQLite persistence and idempotency**.

Milestones **M1**, **M2** and **M3** are assumed to be fully implemented.

The work packages are deliberately cut so that:

- **AI 1** can derive one clear, self-contained prompt per work package,
- **AI 2** can implement exactly that one work package completely in **a single pass**,
- an **error-free, buildable state** exists again after **every** work package.

The order of the work packages is binding.

## Additional slicing rules for AI-driven implementation

- Per work package, change only the **minimal necessary cross-sections** through domain, application, adapter and bootstrap.
- Make no assumptions that are not covered by this document or by the binding specifications.
- Do not anticipate **M5+**.
- Do not restructure existing M1–M3 structures unless there is a direct M4 reason.
- Cut new types, ports and adapters so that each one can be **clearly named, tested and reviewed** from within a single work package.

## Explicitly not part of M4

- AI integration
- Prompt loading or prompt processing
- Validation of AI responses
- File name generation
- Copying to the target folder
- Windows character sanitising for target file names
- Physical duplicate handling in the target folder
- M5+ persistence fields such as model name, prompt identifier, raw AI response, AI reasoning, date source, final title or final target file name
- Full cross-run retry logic of later milestones for AI and target-copy errors
- Final logging polish

## Binding M4 rules for **all** work packages

### 1. Identification

- In M4, a document is identified **exclusively by the SHA-256 fingerprint of its file content**.
- File name and path are **not** identifiers.
- The same content under a different file name or path is **the same document**.
- Changed content is **a new business case**.
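
A minimal sketch of how rule 1 could look as a value type, assuming the fingerprint is held as the lowercase hex encoding of the 32-byte SHA-256 digest. The name `DocumentFingerprint` appears in the Java code of this change, but this body is hypothetical. Because only the digest takes part in `equals`, identical bytes found as `a.pdf` or under any other name or path yield the same identity.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;
import java.util.Objects;
import java.util.regex.Pattern;

/**
 * Hypothetical value type: a document is identified only by the SHA-256 digest of its
 * content, stored as 64 lowercase hex characters. File name and path are not part of it.
 */
record DocumentFingerprint(String sha256Hex) {

    private static final Pattern HEX_64 = Pattern.compile("[0-9a-f]{64}");

    DocumentFingerprint {
        Objects.requireNonNull(sha256Hex, "sha256Hex must not be null");
        sha256Hex = sha256Hex.toLowerCase(Locale.ROOT); // normalise case before validating
        if (!HEX_64.matcher(sha256Hex).matches()) {
            throw new IllegalArgumentException("expected 64 hex characters");
        }
    }

    /** Derives the fingerprint from the content bytes alone. */
    static DocumentFingerprint ofContent(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            return new DocumentFingerprint(HexFormat.of().formatHex(digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```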

### 2. Persistence model

M4 implements persistence in **two levels** (binding):

1. **Document master record** per fingerprint
2. **Attempt history** with one row per historised, document-related processing attempt

### 3. Minimum required data in the document master record for M4

In M4, the document master record must be able to store at least:

- internal ID
- fingerprint
- last known source path
- last known source file name
- current overall status
- number of content errors so far
- number of transient errors so far
- time of last failure
- time of last success
- creation time
- modification time

Target path, target file name and AI-related fields are **not** part of the M4 master record.
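
For illustration, the minimum fields could map onto a Java record like the one below. Component names, `Instant` timestamps and the nullable `Long` ID are assumptions; the real `DocumentRecord` of this change is only partially visible. Rejecting skip statuses is also an assumption, derived from the rule that a skip leaves the terminal overall status untouched.

```java
import java.time.Instant;
import java.util.Objects;

/** Hypothetical M4 status values (section 5). */
enum ProcessingStatus {
    SUCCESS, FAILED_RETRYABLE, FAILED_FINAL, SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE
}

/**
 * Hypothetical M4 master record holding exactly the minimum fields listed above.
 * There is deliberately no target path, target file name or AI-related component.
 */
record DocumentRecord(
        Long id,                       // internal ID; null until the row is inserted
        String fingerprint,            // SHA-256 hex of the content, unique per record
        String lastKnownSourcePath,
        String lastKnownSourceFileName,
        ProcessingStatus overallStatus,
        int contentErrorCount,
        int transientErrorCount,
        Instant lastFailureAt,         // null if the document has never failed
        Instant lastSuccessAt,         // null if the document has never succeeded
        Instant createdAt,
        Instant updatedAt) {

    DocumentRecord {
        Objects.requireNonNull(fingerprint, "fingerprint");
        Objects.requireNonNull(overallStatus, "overallStatus");
        Objects.requireNonNull(createdAt, "createdAt");
        Objects.requireNonNull(updatedAt, "updatedAt");
        if (contentErrorCount < 0 || transientErrorCount < 0) {
            throw new IllegalArgumentException("counters must not be negative");
        }
        // Assumption: skip outcomes live only in the attempt history, because a skip
        // leaves the master record's terminal status unchanged.
        if (overallStatus == ProcessingStatus.SKIPPED_ALREADY_PROCESSED
                || overallStatus == ProcessingStatus.SKIPPED_FINAL_FAILURE) {
            throw new IllegalArgumentException("skip statuses are not master-record statuses");
        }
    }
}
```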

### 4. Minimum required data in the attempt history for M4

For every attempt that M4 historises, at least the following must be storable:

- attempt ID
- fingerprint reference
- run ID
- attempt number
- start time
- end time
- result status
- error class
- error message or reason
- retryable flag
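
The attempt row can be sketched in the same way. Field names are hypothetical. The only invariants enforced are the ones stated in this document: every attempt carries a fingerprint, attempt numbers start at 1, and the retryable flag for `FAILED_RETRYABLE` and `FAILED_FINAL` follows section 6.

```java
import java.time.Instant;
import java.util.Objects;

/** Hypothetical M4 status values (section 5). */
enum ProcessingStatus {
    SUCCESS, FAILED_RETRYABLE, FAILED_FINAL, SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE
}

/** Hypothetical M4 attempt row: one instance per historised, identified attempt. */
record ProcessingAttempt(
        Long id,                 // attempt ID; null until the row is inserted
        String fingerprint,      // reference to the master record
        String runId,
        int attemptNumber,       // 1, 2, 3, ... per fingerprint
        Instant startedAt,
        Instant endedAt,
        ProcessingStatus status,
        String errorClass,       // null when there is no error
        String errorMessage,     // error message or skip reason
        boolean retryable) {

    ProcessingAttempt {
        Objects.requireNonNull(fingerprint, "fingerprint: pre-fingerprint errors are not historised");
        Objects.requireNonNull(runId, "runId");
        Objects.requireNonNull(startedAt, "startedAt");
        Objects.requireNonNull(endedAt, "endedAt");
        Objects.requireNonNull(status, "status");
        if (attemptNumber < 1) {
            throw new IllegalArgumentException("attempt numbers start at 1");
        }
        if (endedAt.isBefore(startedAt)) {
            throw new IllegalArgumentException("endedAt must not precede startedAt");
        }
        // Section 6 fixes the flag for these two statuses.
        if ((status == ProcessingStatus.FAILED_RETRYABLE && !retryable)
                || (status == ProcessingStatus.FAILED_FINAL && retryable)) {
            throw new IllegalArgumentException("retryable flag contradicts " + status);
        }
    }
}
```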

### 5. Status model for M4

M4 must support the following status values with an unambiguous business meaning:

- `SUCCESS`
- `FAILED_RETRYABLE`
- `FAILED_FINAL`
- `SKIPPED_ALREADY_PROCESSED`
- `SKIPPED_FINAL_FAILURE`

A technical intermediate status `PROCESSING` is also permitted but not mandatory in M4.
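
A sketch of this status model as a Java enum, assuming helper methods are wanted at all. `isTerminal()` encodes the two master-record statuses that trigger a skip, and `skipStatus()` maps each of them to the status recorded for the skip attempt. The method names are hypothetical.

```java
/** Hypothetical M4 status enum with helpers for the rules in sections 6 to 8. */
enum ProcessingStatus {
    SUCCESS,
    FAILED_RETRYABLE,
    FAILED_FINAL,
    SKIPPED_ALREADY_PROCESSED,
    SKIPPED_FINAL_FAILURE,
    /** Optional technical intermediate status; permitted but not required in M4. */
    PROCESSING;

    /** True for master-record statuses that make later runs skip the document. */
    boolean isTerminal() {
        return this == SUCCESS || this == FAILED_FINAL;
    }

    /** True for statuses that only ever describe a historised skip attempt. */
    boolean isSkip() {
        return this == SKIPPED_ALREADY_PROCESSED || this == SKIPPED_FINAL_FAILURE;
    }

    /** Maps a terminal master-record status to the status of the skip attempt. */
    ProcessingStatus skipStatus() {
        return switch (this) {
            case SUCCESS -> SKIPPED_ALREADY_PROCESSED;
            case FAILED_FINAL -> SKIPPED_FINAL_FAILURE;
            default -> throw new IllegalStateException(this + " is not terminal");
        };
    }
}
```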

### 6. Binding M4 minimum rules for status and counters

In M4, **exactly** these minimum rules apply:

- Documents that were already processed successfully are skipped in later runs.
- Documents that have already failed finally are skipped in later runs.
- A **deterministic content error from M3**
  - on its **first** historised occurrence leads to `FAILED_RETRYABLE`, raises the **content error counter** to 1 and sets `retryable = true`,
  - on its **second** historised occurrence in a later run leads to `FAILED_FINAL`, raises the **content error counter** to 2 and sets `retryable = false`.
- In M4, the deterministic content errors are exclusively the cases already known from M3:
  - no usable text
  - page limit exceeded
- Document-related **technical** errors that occur after the fingerprint has been determined successfully remain `FAILED_RETRYABLE` in M4, increment the **transient error counter** and set `retryable = true`.
- Skip events change **no** error counter.
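
For attempts that were not skipped, these rules can be expressed as a pure function, which also makes them straightforward to unit-test. This is a sketch under assumptions: `M3Outcome` and the record names are hypothetical, and both content error kinds share one counter because the master record has a single content error count.

```java
/** Hypothetical M3 outcomes that M4 has to map. */
enum M3Outcome { COMPLETED, NO_USABLE_TEXT, PAGE_LIMIT_EXCEEDED, TECHNICAL_ERROR }

/** The three statuses a processed (non-skipped) attempt can end in. */
enum ProcessingStatus { SUCCESS, FAILED_RETRYABLE, FAILED_FINAL }

/** Counter values as stored in the master record. */
record FailureCounters(int contentErrors, int transientErrors) {
    static final FailureCounters NONE = new FailureCounters(0, 0);
}

/** What the use case writes to the history and the master record for one attempt. */
record M4Decision(ProcessingStatus status, FailureCounters counters, boolean retryable) { }

/** Hypothetical pure implementation of the minimum rules; skips are decided earlier. */
final class M4StatusRules {

    private M4StatusRules() { }

    /**
     * @param before  counters from the master record; FailureCounters.NONE for unknown documents
     * @param outcome result of the M3 flow for this attempt
     */
    static M4Decision apply(FailureCounters before, M3Outcome outcome) {
        return switch (outcome) {
            case COMPLETED -> new M4Decision(ProcessingStatus.SUCCESS, before, false);
            case NO_USABLE_TEXT, PAGE_LIMIT_EXCEEDED -> {
                FailureCounters after =
                        new FailureCounters(before.contentErrors() + 1, before.transientErrors());
                // No earlier content error: first occurrence, so a later run may retry.
                yield before.contentErrors() == 0
                        ? new M4Decision(ProcessingStatus.FAILED_RETRYABLE, after, true)
                        : new M4Decision(ProcessingStatus.FAILED_FINAL, after, false);
            }
            case TECHNICAL_ERROR -> new M4Decision(ProcessingStatus.FAILED_RETRYABLE,
                    new FailureCounters(before.contentErrors(), before.transientErrors() + 1), true);
        };
    }
}
```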

### 7. Historisation in M4

- Every **identified** document-related processing attempt is historised separately.
- Per fingerprint, the attempt number starts at **1** and increases monotonically by **1** for every historised attempt.
- Skip cases are historised as well:
  - `SKIPPED_ALREADY_PROCESSED`
  - `SKIPPED_FINAL_FAILURE`
- An attempt historised in M4 requires a **successfully determined fingerprint**.
- In M4, technical errors that occur **before** the fingerprint has been determined are **not** attempts historised in SQLite; they are only handled, in a controlled way, as document-related run events.
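
The numbering rule is simple enough to show with an in-memory stand-in for the history table. `AttemptLog` is hypothetical and not thread-safe. A SQLite-backed version would derive the next number from the stored rows instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory attempt history illustrating per-fingerprint numbering. */
final class AttemptLog {

    /** One historised attempt; the status is kept as text for brevity. */
    record Entry(String fingerprint, int attemptNumber, String status) { }

    private final Map<String, Integer> lastNumberByFingerprint = new HashMap<>();
    private final List<Entry> entries = new ArrayList<>();

    /** Appends an attempt (skips included) and returns the number it was given. */
    int append(String fingerprint, String status) {
        if (fingerprint == null) {
            // Pre-fingerprint errors are run events, not historised attempts.
            throw new IllegalArgumentException("only identified documents are historised");
        }
        // The first attempt for a fingerprint gets 1; every later one gets previous + 1.
        int next = lastNumberByFingerprint.merge(fingerprint, 1, Integer::sum);
        entries.add(new Entry(fingerprint, next, status));
        return next;
    }

    List<Entry> entries() {
        return List.copyOf(entries);
    }
}
```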

### 8. Order per document in M4

In M4, a single candidate is processed in this binding order:

1. Compute the fingerprint
2. Load the document master record
3. If the status is `SUCCESS`, decide to skip and historise the skip attempt
4. If the status is `FAILED_FINAL`, decide to skip and historise the skip attempt
5. Otherwise, run the existing M3 flow
6. Map the M3 result to the M4 status, counters and retryable flag
7. Historise the attempt
8. Update the document master record
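
The binding order can be read as straight-line code. In this hypothetical sketch, every port is a `java.util.function` type, statuses are plain enum values, and steps 5 and 6 are folded into one function, so the counter handling from section 6 is left out.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical per-candidate flow following steps 1 to 8 above. */
final class PerDocumentFlow {

    enum Status { SUCCESS, FAILED_RETRYABLE, FAILED_FINAL, SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE }

    private final Function<String, String> fingerprintOf;              // step 1
    private final Function<String, Optional<Status>> loadMasterStatus; // step 2
    private final Function<String, Status> runM3AndMap;                // steps 5 and 6
    private final BiConsumer<String, Status> historise;                // step 7 and skip attempts
    private final BiConsumer<String, Status> updateMaster;             // step 8

    PerDocumentFlow(Function<String, String> fingerprintOf,
                    Function<String, Optional<Status>> loadMasterStatus,
                    Function<String, Status> runM3AndMap,
                    BiConsumer<String, Status> historise,
                    BiConsumer<String, Status> updateMaster) {
        this.fingerprintOf = Objects.requireNonNull(fingerprintOf);
        this.loadMasterStatus = Objects.requireNonNull(loadMasterStatus);
        this.runM3AndMap = Objects.requireNonNull(runM3AndMap);
        this.historise = Objects.requireNonNull(historise);
        this.updateMaster = Objects.requireNonNull(updateMaster);
    }

    Status process(String candidate) {
        String fingerprint = fingerprintOf.apply(candidate);               // 1
        Status current = loadMasterStatus.apply(fingerprint).orElse(null); // 2 (null: unknown)
        if (current == Status.SUCCESS) {                                   // 3
            historise.accept(fingerprint, Status.SKIPPED_ALREADY_PROCESSED);
            return Status.SKIPPED_ALREADY_PROCESSED;
        }
        if (current == Status.FAILED_FINAL) {                              // 4
            historise.accept(fingerprint, Status.SKIPPED_FINAL_FAILURE);
            return Status.SKIPPED_FINAL_FAILURE;
        }
        Status result = runM3AndMap.apply(candidate);                      // 5 + 6
        historise.accept(fingerprint, result);                             // 7
        updateMaster.accept(fingerprint, result);                          // 8
        return result;
    }
}
```

Historisation (step 7) and the master-record update (step 8) are two separate calls here. Section 9 requires that they take effect atomically.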

### 9. Consistency per identified document

- For every identified document-related attempt, **attempt history and master record must be updated consistently**.
- Partial updates that touch only the history or only the master record must be avoided.
- If persisting a document-related attempt fails for technical reasons, **no inconsistent partial state** may remain.
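
A common way to meet these requirements with JDBC is to write the attempt row and the master-record update in one transaction and to roll back on any failure. The helper below is a hypothetical sketch that uses only `java.sql`. Whether the real adapters share one `Connection` in this way is an assumption, and `PersistenceFailure` stands in for the unchecked `DocumentPersistenceException` of this change.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper: both writes become visible together, or neither does. */
final class DocumentUnitOfWork {

    /** One JDBC write step; may throw SQLException. */
    @FunctionalInterface
    interface SqlStep {
        void run(Connection connection) throws SQLException;
    }

    /** Unchecked wrapper for technical persistence failures. */
    static final class PersistenceFailure extends RuntimeException {
        PersistenceFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private DocumentUnitOfWork() { }

    static void writeAtomically(Connection connection, SqlStep insertAttempt, SqlStep updateMaster) {
        try {
            boolean previousAutoCommit = connection.getAutoCommit();
            connection.setAutoCommit(false); // start an explicit transaction
            try {
                insertAttempt.run(connection);
                updateMaster.run(connection);
                connection.commit();
            } catch (SQLException | RuntimeException e) {
                try {
                    connection.rollback(); // leave no half-written state behind
                } catch (SQLException rollbackFailure) {
                    e.addSuppressed(rollbackFailure);
                }
                throw e;
            } finally {
                connection.setAutoCommit(previousAutoCommit);
            }
        } catch (SQLException e) {
            throw new PersistenceFailure("document persistence failed", e);
        }
    }
}
```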

### 10. Schema initialisation

- In M4, the SQLite schema is initialised **at program start**, before the batch run begins processing documents.
- Purely implicit initialisation, or initialisation done only lazily while documents are being processed, is **not** a goal of M4.

---

## AP-001 Define M4 core objects, status semantics and port contracts

### Prerequisite

None. This work package is the M4 starting point.

### Goal

The M4-relevant types, status meanings and port contracts are introduced unambiguously, so that later work packages can be implemented without room for interpretation.

### Must be implemented

- Create new M4-relevant core objects or application-level types, in particular for:
  - document fingerprint
  - document master record
  - processing attempt
  - error counter values
  - document-related persistence decision or lookup result
  - technical error classification for document-related M4 processing
- Complete or sharpen the status model so that the binding M4 status values can be represented unambiguously.
- Define unambiguous semantics, in the type model or in JavaDoc, for the following cases:
  - unknown document
  - known document that is not yet terminal
  - document that has already succeeded
  - document that has already failed finally
  - historisable document-related attempt
  - non-historisable pre-fingerprint error
- Define outbound ports for:
  - creating a fingerprint for exactly one processing candidate
  - reading and writing the document master record
  - writing and reading the attempt history
  - technical initialisation of the SQLite schema
- Cut the port contracts so that **neither `Path`/`File` nor JDBC/SQLite types** leak into domain or application.
- Model the port return values so that later work packages can distinguish the following without further assumptions:
  - document unknown
  - document known and to be processed further
  - document terminally successful
  - document terminally failed (final)
  - technical persistence error
- Add JavaDoc and `package-info` covering:
  - status meanings
  - counter semantics
  - historisation boundaries
  - architecture boundaries
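
The five distinguishable port outcomes lend themselves to a sealed interface. `DocumentRecordLookupResult`, `DocumentKnownProcessable` and `PersistenceLookupTechnicalFailure` appear in the code of this change. The other three subtype names, the reduced `DocumentRecord` and the `NextStep` enum are assumptions made for this sketch. On Java 21, a `switch` with type patterns would let the compiler check exhaustiveness; the `instanceof` chain below also compiles on Java 17.

```java
import java.util.Objects;

/** Stand-in for the application-level master record (fields reduced for the sketch). */
record DocumentRecord(String fingerprint, String overallStatus) { }

/** Hypothetical sealed lookup result: one subtype per case listed above. */
sealed interface DocumentRecordLookupResult
        permits DocumentUnknown, DocumentKnownProcessable, DocumentTerminalSuccess,
                DocumentTerminalFinalFailure, PersistenceLookupTechnicalFailure { }

record DocumentUnknown() implements DocumentRecordLookupResult { }

record DocumentKnownProcessable(DocumentRecord record) implements DocumentRecordLookupResult {
    DocumentKnownProcessable {
        Objects.requireNonNull(record, "record must not be null");
    }
}

record DocumentTerminalSuccess(DocumentRecord record) implements DocumentRecordLookupResult { }

record DocumentTerminalFinalFailure(DocumentRecord record) implements DocumentRecordLookupResult { }

record PersistenceLookupTechnicalFailure(String message) implements DocumentRecordLookupResult { }

/** Illustrative use-case side: every permitted subtype maps to exactly one next step. */
final class LookupDispatch {

    enum NextStep { PROCESS, SKIP_ALREADY_PROCESSED, SKIP_FINAL_FAILURE, TECHNICAL_FAILURE }

    private LookupDispatch() { }

    static NextStep nextStep(DocumentRecordLookupResult result) {
        Objects.requireNonNull(result, "result");
        if (result instanceof DocumentUnknown || result instanceof DocumentKnownProcessable) {
            return NextStep.PROCESS;
        }
        if (result instanceof DocumentTerminalSuccess) {
            return NextStep.SKIP_ALREADY_PROCESSED;
        }
        if (result instanceof DocumentTerminalFinalFailure) {
            return NextStep.SKIP_FINAL_FAILURE;
        }
        if (result instanceof PersistenceLookupTechnicalFailure) {
            return NextStep.TECHNICAL_FAILURE;
        }
        throw new IllegalStateException("unhandled lookup result: " + result);
    }
}
```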

### Explicitly out of scope

- SHA-256 implementation
- SQLite implementation
- concrete SQL tables
- batch integration
- repository code

### Done when

- the M4-relevant types and port contracts exist,
- the M4 status semantics are documented unambiguously,
- historisation and pre-fingerprint errors are clearly separated,
- domain and application remain free of infrastructure types,
- the build still succeeds without errors.

---

## AP-002 Implement a SHA-256 fingerprint adapter for processing candidates

### Prerequisite

AP-001 is complete.

### Goal

A stable, deterministic SHA-256 fingerprint can be generated for every processing candidate. Technical problems are translated into the port contract in a controlled way.

### Must be implemented

- Implement the fingerprint port technically in Adapter-Out.
- Implement SHA-256-based fingerprint generation for exactly one processing candidate.
- Ensure that the fingerprint is derived exclusively from the **file content**.
- Model controlled technical error behaviour for at least the following cases:
  - file not readable
  - file no longer present between candidate discovery and fingerprint generation
  - other technical IO problems
- Ensure that file system and hashing details stay exclusively in Adapter-Out.
- Add JavaDoc on determinism, error behaviour and the M4 boundary stating that pre-fingerprint errors do **not** count as attempts historised in SQLite.
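
A possible shape for the adapter, sketched under assumptions: the class and result type names are hypothetical, and for brevity the method takes a `Path` directly, whereas the real port would accept the processing candidate and resolve the file inside Adapter-Out. The file is streamed in 8 KiB chunks, so a large PDF is never loaded into memory at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical Adapter-Out sketch: hashes content only and maps IO problems to results. */
final class Sha256FingerprintAdapter {

    sealed interface FingerprintResult permits Success, Failure { }

    record Success(String sha256Hex) implements FingerprintResult { }

    record Failure(Reason reason, String detail) implements FingerprintResult { }

    enum Reason { NOT_FOUND, NOT_READABLE, IO_ERROR }

    FingerprintResult fingerprint(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                sha256.update(buffer, 0, read); // only the bytes are hashed, never name or path
            }
            return new Success(HexFormat.of().formatHex(sha256.digest()));
        } catch (NoSuchFileException e) {
            // The file vanished between candidate discovery and hashing.
            return new Failure(Reason.NOT_FOUND, e.getMessage());
        } catch (AccessDeniedException e) {
            return new Failure(Reason.NOT_READABLE, e.getMessage());
        } catch (IOException e) {
            return new Failure(Reason.IO_ERROR, e.getMessage());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```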

### Explicitly out of scope

- SQLite persistence
- batch orchestration
- attempt history
- skip logic
- counter updates

### Done when

- the same file content reliably produces the same SHA-256 fingerprint,
- fingerprints and errors are delivered through the port in a controlled way,
- no hashing or file system details leak into domain or application,
- the build still succeeds without errors.

---

## AP-003 Introduce the SQLite schema, startup initialisation and persistence foundation in Adapter-Out

### Prerequisite

AP-001 and AP-002 are complete.

### Goal

The SQLite-based persistence foundation for M4 is introduced cleanly and initialised in a controlled way at program start.

### Must be implemented

- Introduce SQLite file access in Adapter-Out.
- Create a technical initialisation component for the SQLite schema and connect it through the port intended for this purpose.
- Create the M4 schema explicitly in **two levels**:
  - document master record
  - attempt history
- Define tables, primary keys, foreign keys, unique constraints and sensible indexes for the M4 scope.
- Create the document master record table so that the M4 mandatory fields defined in this document can be stored.
- Create the attempt history table so that the M4 mandatory fields defined in this document can be stored.
- Ensure that:
  - the attempt number is unique per fingerprint,
  - skip attempts can be stored,
  - no M5+ columns are created.
- Prepare schema initialisation so that it can be called explicitly **at program start**.
- Add JavaDoc on the schema's purpose, the two-level model and the initialisation point.
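
A hypothetical SQLite schema for the two levels; table and column names are chosen for this sketch only, and timestamps are stored as ISO-8601 text. The foreign key is declared `DEFERRABLE INITIALLY DEFERRED`, so that within one transaction the attempt row can be written before a new master record exists, which matches the order in section 8 (history first, master record second). SQLite enforces foreign keys only when `PRAGMA foreign_keys = ON` has been set on the connection.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

/** Hypothetical M4 schema: master record and attempt history, no M5+ columns. */
final class M4Schema {

    static final List<String> DDL = List.of(
            """
            CREATE TABLE IF NOT EXISTS document_record (
                id                    INTEGER PRIMARY KEY AUTOINCREMENT,
                fingerprint           TEXT    NOT NULL UNIQUE,
                last_source_path      TEXT    NOT NULL,
                last_source_file_name TEXT    NOT NULL,
                overall_status        TEXT    NOT NULL,
                content_error_count   INTEGER NOT NULL DEFAULT 0,
                transient_error_count INTEGER NOT NULL DEFAULT 0,
                last_failure_at       TEXT,
                last_success_at       TEXT,
                created_at            TEXT    NOT NULL,
                updated_at            TEXT    NOT NULL
            )""",
            """
            CREATE TABLE IF NOT EXISTS processing_attempt (
                id             INTEGER PRIMARY KEY AUTOINCREMENT,
                fingerprint    TEXT    NOT NULL
                    REFERENCES document_record (fingerprint) DEFERRABLE INITIALLY DEFERRED,
                run_id         TEXT    NOT NULL,
                attempt_number INTEGER NOT NULL CHECK (attempt_number >= 1),
                started_at     TEXT    NOT NULL,
                ended_at       TEXT    NOT NULL,
                status         TEXT    NOT NULL,
                error_class    TEXT,
                error_message  TEXT,
                retryable      INTEGER NOT NULL CHECK (retryable IN (0, 1)),
                UNIQUE (fingerprint, attempt_number)
            )""",
            "CREATE INDEX IF NOT EXISTS idx_processing_attempt_run ON processing_attempt (run_id)");

    private M4Schema() { }

    /** Meant to be called once at program start, before the first document is processed. */
    static void initialize(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            // Foreign keys are off by default in SQLite and must be enabled per connection.
            statement.execute("PRAGMA foreign_keys = ON");
            for (String ddl : DDL) {
                statement.execute(ddl);
            }
        }
    }
}
```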

### Explicitly out of scope

- repository business logic
- use case integration
- status transitions in the batch run
- AI-related persistence fields
- persistence of target path or file name

### Done when

- the SQLite file and the M4 schema can be created,
- both persistence levels cover the mandatory M4 scope,
- startup initialisation is technically prepared,
- the schema contains no M5+ fields,
- the state remains buildable without errors.

---

## AP-004 Implement the document master record repository with the full M4 minimum scope

### Prerequisite

AP-003 is complete.

### Goal

The document master record can be reliably read, created and updated per fingerprint, without moving business decision logic into Adapter-Out.

### Must be implemented

- Implement the repository adapter for the document master record.
- Provide the following technical capabilities:
  - look up a master record by fingerprint
  - create a master record for previously unknown documents
  - update:
    - last known source path
    - last known source file name
    - overall status
    - content error counter
    - transient error counter
    - time of last failure
    - time of last success
    - modification time
- Ensure that repository operations make **no** business decisions about retry rules or skip logic.
- Keep the mapping between application types and the SQLite structure explicit and traceable.
- Model the upsert/creation behaviour reproducibly for the single-process M4 run.
- Add JavaDoc on responsibility and mapping.
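
A sketch of the explicit mapping and the create-or-update step, assuming a `document_record` table with one column per M4 field (column names are hypothetical) and SQLite 3.24 or later, which supports `INSERT ... ON CONFLICT ... DO UPDATE`. The repository writes exactly the values it is given; it does not compute statuses or counters.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.time.Instant;

/** Hypothetical master-record repository fragment: SQL plus explicit column mapping. */
final class DocumentRecordSql {

    /** created_at is not in the update list, so it keeps the value from the first insert. */
    static final String UPSERT = """
            INSERT INTO document_record (
                fingerprint, last_source_path, last_source_file_name, overall_status,
                content_error_count, transient_error_count,
                last_failure_at, last_success_at, created_at, updated_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            ON CONFLICT (fingerprint) DO UPDATE SET
                last_source_path      = excluded.last_source_path,
                last_source_file_name = excluded.last_source_file_name,
                overall_status        = excluded.overall_status,
                content_error_count   = excluded.content_error_count,
                transient_error_count = excluded.transient_error_count,
                last_failure_at       = excluded.last_failure_at,
                last_success_at       = excluded.last_success_at,
                updated_at            = excluded.updated_at""";

    private DocumentRecordSql() { }

    /** Instants are stored as ISO-8601 UTC text; null stays SQL NULL. */
    static String toColumn(Instant instant) {
        return instant == null ? null : instant.toString();
    }

    static Instant fromColumn(String text) {
        return text == null ? null : Instant.parse(text);
    }

    /** Binds a nullable timestamp parameter of the upsert statement. */
    static void bindInstant(PreparedStatement statement, int index, Instant instant) throws SQLException {
        if (instant == null) {
            statement.setNull(index, Types.VARCHAR);
        } else {
            statement.setString(index, toColumn(instant));
        }
    }
}
```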

### Explicitly out of scope

- attempt history
- batch skip logic
- attempt number assignment
- concrete status decisions in the use case
- AI- or target-copy-related persistence

### Done when

- the document master record can be reliably read and written per fingerprint,
- all mandatory M4 fields of the master record can be updated,
- no business decisions have slipped into the repository,
- the build still succeeds without errors.

---

## AP-005 Implement the attempt history repository with monotonic attempt numbers

### Prerequisite

AP-003 is complete.

### Goal

Every historisable document-related M4 attempt can be persisted separately and traceably.

### Must be implemented

- Implement the repository adapter for the attempt history.
- Write exactly one attempt entry per historised document-related M4 attempt.
- Provide read capabilities as far as the M4 use case and the tests need them.
- Derive or advance attempt numbers per fingerprint in a reproducible way.
- Ensure that the attempt number:
  - starts at **1**,
  - increases monotonically per fingerprint,
  - also counts skip attempts.
- Persist the M4-relevant history data:
  - fingerprint reference
  - run ID
  - attempt number
  - start time
  - end time
  - result status
  - error class
  - error message or reason
  - retryable flag
- Ensure that only **identified** documents are historised.
- Add JavaDoc on the purpose of historisation, the attempt numbering logic and the M4 boundaries.
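
Deriving the next attempt number from the stored rows is one reproducible option. This hypothetical fragment assumes a `processing_attempt` table with a unique constraint on `(fingerprint, attempt_number)`, and assumes that the query runs in the same transaction as the subsequent insert. If two writers ever raced, the constraint would reject the duplicate number instead of silently storing it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical attempt-history fragment: next attempt number per fingerprint. */
final class AttemptNumberSql {

    /** COALESCE turns "no rows yet" (MAX is NULL) into 0, so the first attempt gets 1. */
    static final String NEXT_ATTEMPT_NUMBER = """
            SELECT COALESCE(MAX(attempt_number), 0) + 1
            FROM processing_attempt
            WHERE fingerprint = ?""";

    private AttemptNumberSql() { }

    static int nextAttemptNumber(Connection connection, String fingerprint) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(NEXT_ATTEMPT_NUMBER)) {
            statement.setString(1, fingerprint);
            try (ResultSet rows = statement.executeQuery()) {
                rows.next(); // an aggregate without GROUP BY always yields exactly one row
                return rows.getInt(1);
            }
        }
    }
}
```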

### Explicitly out of scope

- document master record
- business counter logic
- batch orchestration
- raw AI response, model name or prompt identifier
- target name, target path or target copy

### Done when

- a separate attempt entry can be stored for each historised document-related processing run,
- attempt numbers per fingerprint are reproducible and monotonic,
- skip attempts can be historised,
- pre-fingerprint errors are not historised by mistake,
- the state remains buildable without errors.

---

## AP-006 Implement M4 decision logic and batch integration for idempotency, counters and consistent persistence

### Prerequisite

AP-001 to AP-005 are complete.

### Goal

The existing M3 processing run is extended into a real M4 run. This run recognises documents by fingerprint, updates status and counters correctly, historises skip cases and never leaves an inconsistent persistence state behind.

### Must be implemented

- Extend the existing batch use case so that this order is binding for each processing candidate:
  1. generate the fingerprint
  2. load the document master record
  3. decide terminal cases
  4. run the existing M3 flow where applicable
  5. map the result to the M4 status, counters and retryable flag
  6. historise the attempt
  7. update the document master record
- Implement the following M4 rules explicitly:
  - existing overall status `SUCCESS` → the document is not processed again but historised with `SKIPPED_ALREADY_PROCESSED`
  - existing overall status `FAILED_FINAL` → the document is not processed again but historised with `SKIPPED_FINAL_FAILURE`
  - unknown documents and documents that are not yet terminal are processed normally
- Map M3 results to M4 exactly as follows:
  - M3 completed successfully → `SUCCESS`, no error counter incremented, `retryable = false`
  - M3 content error "no usable text" or "page limit exceeded" on its first historised occurrence → `FAILED_RETRYABLE`, content error counter +1, `retryable = true`
  - the same, already identified document hits a deterministic content error again in a later run → `FAILED_FINAL`, content error counter +1, `retryable = false`
  - document-related technical error after the fingerprint was determined successfully → `FAILED_RETRYABLE`, transient error counter +1, `retryable = true`
- Handle skip cases so that:
  - a separate attempt entry is written,
  - no error counter changes,
  - the master record's overall status remains terminal.
- Explicitly do **not** historise pre-fingerprint errors as SQLite attempts.
- For identified documents, ensure that **history and master record are updated consistently** and that no inconsistent partial states arise.
- If a document-related persistence operation fails for technical reasons:
  - no partially updated state may remain,
  - the batch run stays operational for the other documents in a controlled way,
  - no M5+ behaviour is anticipated.
- Add JavaDoc on idempotency, counter updates, skip semantics and persistence consistency.
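
The requirement that the run stays operational after a per-document persistence failure could look like the loop below. `BatchLoop` and its result strings are hypothetical, and `PersistenceFailure` stands in for `DocumentPersistenceException`. The sketch assumes that the failed write was rolled back as a whole, so the document keeps its previously persisted state and is picked up again in a later run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical batch loop: one failing candidate does not stop the run. */
final class BatchLoop {

    /** Unchecked write failure for a single document. */
    static final class PersistenceFailure extends RuntimeException {
        PersistenceFailure(String message) {
            super(message);
        }
    }

    record CandidateOutcome(String candidate, String result) { }

    private BatchLoop() { }

    static List<CandidateOutcome> run(List<String> candidates, Function<String, String> processOne) {
        List<CandidateOutcome> outcomes = new ArrayList<>();
        for (String candidate : candidates) {
            try {
                outcomes.add(new CandidateOutcome(candidate, processOne.apply(candidate)));
            } catch (PersistenceFailure e) {
                // Only this candidate is affected; report it and continue with the next one.
                outcomes.add(new CandidateOutcome(candidate, "PERSISTENCE_FAILURE: " + e.getMessage()));
            }
        }
        return outcomes;
    }
}
```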

### Explicitly out of scope

- AI call
- file name generation
- target copy
- M5+ retry rules for AI or target-copy errors
- M5+ persistence fields
- later reporting or analysis logic

### Done when

- the batch run recognises identical content by fingerprint,
- `SUCCESS` and `FAILED_FINAL` documents are skipped in later runs and the skip is historised,
- the minimum rule "first deterministic content error retryable, second final" is implemented explicitly,
- document-related technical errors after fingerprinting are treated as retryable,
- history and master record are updated consistently per identified document,
- there is still no M5+ functionality.

---

## AP-007 Adapt bootstrap and CLI for SQLite configuration, startup initialisation and M4 wiring

### Prerequisite

AP-001 to AP-006 are complete.

### Goal

The program entry point is cleanly adapted to the M4 run: persistence is initialised at startup and the new M4 components are fully wired.

### Must be implemented

- Extend the bootstrap wiring to the new M4 ports, adapters and persistence components.
- Add or wire M4-relevant configuration, in particular for:
  - `sqlite.file`
- Extend startup validation to check at least that:
  - the SQLite file path exists or can technically be created
  - the persistence configuration is usable
- Run technical schema initialisation **at program start**, before the actual document run begins.
- Align the CLI/batch start path with the real M4 flow.
- Ensure that hard startup, wiring or initialisation errors still lead to **exit code 1**.
- Ensure that document-related errors later in the run are **not** mis-modelled as startup errors.
- Preserve the basic M1–M3 behaviour and combine it cleanly with the M4 components.
- Add JavaDoc and `package-info` for the updated wiring, configuration and module boundaries.
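
A minimal sketch of the startup path, assuming that exit codes are returned to the caller rather than passed to `System.exit` directly. The file check encodes one possible reading of "exists or can technically be created": either an existing writable regular file, or a writable parent directory. All names are hypothetical, and `PersistenceFailure` stands in for `DocumentPersistenceException`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.IntSupplier;

/** Hypothetical M4 bootstrap: validate sqlite.file, initialise the schema, then run the batch. */
final class M4Bootstrap {

    /** Unchecked persistence failure raised by schema initialisation. */
    static final class PersistenceFailure extends RuntimeException {
        PersistenceFailure(String message) {
            super(message);
        }
    }

    /** Plays the role of PersistenceSchemaInitializationPort#initializeSchema(). */
    @FunctionalInterface
    interface SchemaInitializer {
        void initializeSchema();
    }

    private M4Bootstrap() { }

    /** An existing file must be writable; otherwise its parent directory must be writable. */
    static boolean isSqliteFileUsable(Path sqliteFile) {
        if (Files.exists(sqliteFile)) {
            return Files.isRegularFile(sqliteFile) && Files.isWritable(sqliteFile);
        }
        Path parent = sqliteFile.toAbsolutePath().getParent();
        return parent != null && Files.isDirectory(parent) && Files.isWritable(parent);
    }

    /** Returns the process exit code; hard start failures map to 1 before any document is read. */
    static int start(Path sqliteFile, SchemaInitializer schema, IntSupplier batchRun) {
        if (!isSqliteFileUsable(sqliteFile)) {
            System.err.println("Start failure: sqlite.file is not usable: " + sqliteFile);
            return 1;
        }
        try {
            schema.initializeSchema();
        } catch (PersistenceFailure e) {
            System.err.println("Start failure: schema initialisation failed: " + e.getMessage());
            return 1;
        }
        // Document-related failures are expected to be handled inside the batch run itself.
        return batchRun.getAsInt();
    }
}
```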

### Explicitly out of scope

- new exit code semantics from later milestones
- AI wiring
- target folder or file name wiring
- logging polish

### Done when

- the program starts completely in the M4 state,
- the SQLite schema is initialised in a controlled way at startup,
- the new adapters are wired correctly,
- hard persistence startup errors lead to exit code 1 in a controlled way,
- the build remains error-free.

---

## AP-008 Complete tests for fingerprint, SQLite repositories, M4 status updates, history and skip logic

### Prerequisite

AP-001 to AP-007 are complete.

### Goal

The complete M4 target state is covered by automated tests and demonstrated to be a consistent hand-over state.

### Must be implemented

- Implement unit tests for SHA-256 fingerprint generation.
- Implement repository tests against SQLite, in particular for:
  - schema initialisation
  - creating and reading a document master record
  - updating all mandatory M4 fields of the master record
  - creating and reading attempt history
  - stable attempt numbers per fingerprint
- Add tests for M4 status updates and counters, in particular:
  - an unknown document whose M4 processing completes successfully is persisted as `SUCCESS`
  - the first deterministic content error leads to `FAILED_RETRYABLE`
  - a second deterministic content error in a later run leads to `FAILED_FINAL`
  - a document-related technical error after successful fingerprint determination increments the transient error counter and stays `FAILED_RETRYABLE`
  - skip cases change no error counter
- Add tests for idempotency and skip logic, in particular:
  - an already successful document is skipped and the skip is historised
  - a finally failed document is skipped and the skip is historised
  - identical content under a different file name is recognised through the same fingerprint
- Add tests showing that:
  - exactly **one** history entry is created per identified document-related processing run
  - skip events are historised
  - pre-fingerprint errors do not appear in the SQLite history
- Add tests for bootstrap and startup behaviour, in particular:
  - schema initialisation at startup
  - a hard persistence startup error leads to exit code 1
- Finally, review the M4 state for consistency, adherence to the architecture and the absence of any M5+ anticipation.

### Explicitly out of scope

- tests for AI, prompt loading or AI JSON
- tests for target copy or file name generation
- tests for M5+ persistence fields
- tests for the full retry logic of later milestones

### Done when

- the test suite for the M4 scope is green,
- the most important M4 edge cases are covered by automated tests,
- the defined M4 target state is fully reached,
- an error-free state ready for hand-over exists.

---

## Final assessment

The work packages cover the complete M4 target scope from the binding specifications:

- fingerprint via SHA-256
- SQLite persistence in two levels
- document master record with the M4 minimum scope
- attempt history per identified document-related attempt
- idempotency via fingerprint
- skip rules for already successful and finally failed documents
- an explicit minimum rule for deterministic content errors in M4
- tests for fingerprint, persistence, status updates, history and skip logic

At the same time, the boundaries to M1–M3 and to M5+ are preserved. In particular, **no** AI functionality, **no** file name generation and **no** target copy are anticipated.

@@ -0,0 +1,30 @@

```java
package de.gecheckt.pdf.umbenenner.application.port.out;

import java.util.Objects;

/**
 * Lookup result indicating that a master record exists and the document is not yet terminal.
 * <p>
 * The document is known (fingerprint exists in the persistence store) but its overall
 * status is neither {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SUCCESS}
 * nor {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#FAILED_FINAL}.
 * The use case may continue with normal M4 processing using the provided record.
 * <p>
 * The existing {@link DocumentRecord} is supplied so the use case can inspect the
 * current status, failure counters, and other fields required to apply M4 retry rules
 * without an additional lookup.
 *
 * @param record the current master record for this document; never null
 * @since M4-AP-001
 */
public record DocumentKnownProcessable(DocumentRecord record) implements DocumentRecordLookupResult {

    /**
     * Compact constructor validating the non-null contract.
     *
     * @throws NullPointerException if {@code record} is null
     */
    public DocumentKnownProcessable {
        Objects.requireNonNull(record, "record must not be null");
    }
}
```
@@ -0,0 +1,48 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

/**
 * Unchecked exception thrown by persistence write operations when a technical
 * infrastructure failure prevents the operation from completing.
 * <p>
 * This exception is thrown by {@link DocumentRecordRepository} and
 * {@link ProcessingAttemptRepository} write methods, and by
 * {@link PersistenceSchemaInitializationPort#initializeSchema()}, when the underlying
 * persistence layer (SQLite) cannot be reached or returns an unrecoverable error.
 * <p>
 * <strong>Batch run impact:</strong>
 * <ul>
 * <li>If thrown during <em>schema initialisation</em> at startup, the run must abort
 *     with exit code 1.</li>
 * <li>If thrown during <em>per-document write operations</em>, the current candidate
 *     is treated as a transient failure; the batch run continues with the remaining
 *     candidates.</li>
 * </ul>
 * <p>
 * The exception is <em>not</em> used for read operations; read failures are modelled
 * as {@link PersistenceLookupTechnicalFailure} in the sealed
 * {@link DocumentRecordLookupResult} hierarchy to allow exhaustive pattern matching
 * at the call site.
 *
 * @since M4-AP-001
 */
public class DocumentPersistenceException extends RuntimeException {

    /**
     * Constructs a new {@code DocumentPersistenceException} with the given message.
     *
     * @param message human-readable description of the persistence failure
     */
    public DocumentPersistenceException(String message) {
        super(message);
    }

    /**
     * Constructs a new {@code DocumentPersistenceException} with message and cause.
     *
     * @param message human-readable description of the persistence failure
     * @param cause the underlying throwable that caused this failure
     */
    public DocumentPersistenceException(String message, Throwable cause) {
        super(message, cause);
    }
}
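The per-document rule in the Javadoc of `DocumentPersistenceException` (a write failure affects only the current candidate, and the batch run carries on) can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not the project's use-case code: `PersistenceFailure` is a hypothetical stand-in for `DocumentPersistenceException`, and `processAll`, `processOne`, and `RunSummary` are invented names.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; all names are illustrative and not taken from the project.
final class PerDocumentFailureSketch {

    // Stand-in for DocumentPersistenceException (unchecked, like the original).
    static final class PersistenceFailure extends RuntimeException {
        PersistenceFailure(String message) {
            super(message);
        }
    }

    // Simple tally of how the candidates of one batch run ended.
    record RunSummary(int processed, int transientFailures) {}

    // Processes every candidate; a persistence write failure for one candidate is
    // counted as a transient failure and the loop continues with the next candidate.
    static RunSummary processAll(List<String> candidates, Consumer<String> processOne) {
        int processed = 0;
        int transientFailures = 0;
        for (String candidate : candidates) {
            try {
                processOne.accept(candidate);
                processed++;
            } catch (PersistenceFailure e) {
                // Only this candidate is affected; the batch run is not aborted.
                transientFailures++;
            }
        }
        return new RunSummary(processed, transientFailures);
    }
}
```

Catching only the persistence exception keeps unrelated programming errors visible, while the startup case (schema initialisation) is handled separately by the bootstrap layer.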
@@ -0,0 +1,83 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;
import de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus;
import de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentLocator;

import java.time.Instant;
import java.util.Objects;

/**
 * Application-facing representation of the document master record (Dokument-Stammsatz).
 * <p>
 * One {@code DocumentRecord} exists per unique {@link DocumentFingerprint}. It carries
 * the current overall status, failure counters, and the most recently known source
 * location of the document.
 * <p>
 * <strong>Architecture boundary:</strong> This type contains no SQLite or JDBC types.
 * Mapping between {@code DocumentRecord} and the persistence layer is performed
 * exclusively by the repository adapter in {@code adapter-out}.
 * <p>
 * <strong>M4 field semantics:</strong>
 * <ul>
 * <li>{@link #fingerprint()} — primary identity; never changes for a given record.</li>
 * <li>{@link #lastKnownSourceLocator()} — opaque locator used by adapters; the
 *     application passes it through without interpreting the value.</li>
 * <li>{@link #lastKnownSourceFileName()} — human-readable file name for logging and
 *     diagnostics; not used for identity.</li>
 * <li>{@link #overallStatus()} — the current terminal or active status of the document
 *     across all runs. See {@link ProcessingStatus} for semantics.</li>
 * <li>{@link #failureCounters()} — independent counters for content and transient errors;
 *     never increased by skip events.</li>
 * <li>{@link #lastFailureInstant()} — timestamp of the most recent failure; {@code null}
 *     if no failure has been recorded yet.</li>
 * <li>{@link #lastSuccessInstant()} — timestamp at which the document was successfully
 *     processed; {@code null} if the document has never been processed successfully.</li>
 * <li>{@link #createdAt()} — timestamp when this master record was first created.</li>
 * <li>{@link #updatedAt()} — timestamp of the most recent update to this master record.</li>
 * </ul>
 * <p>
 * <strong>Not included in M4:</strong> target path, target file name, AI-related fields.
 * These are added in later milestones.
 *
 * @param fingerprint content-based identity; never null
 * @param lastKnownSourceLocator opaque locator to the physical source file; never null
 * @param lastKnownSourceFileName file name at the time of the last known access; never null or blank
 * @param overallStatus current processing status; never null
 * @param failureCounters counters for content and transient errors; never null
 * @param lastFailureInstant timestamp of the most recent failure, or {@code null}
 * @param lastSuccessInstant timestamp at which the document was successfully processed, or {@code null}
 * @param createdAt timestamp when this record was first created; never null
 * @param updatedAt timestamp of the most recent update; never null
 * @since M4-AP-001
 */
public record DocumentRecord(
        DocumentFingerprint fingerprint,
        SourceDocumentLocator lastKnownSourceLocator,
        String lastKnownSourceFileName,
        ProcessingStatus overallStatus,
        FailureCounters failureCounters,
        Instant lastFailureInstant,
        Instant lastSuccessInstant,
        Instant createdAt,
        Instant updatedAt) {

    /**
     * Compact constructor validating mandatory non-null fields.
     *
     * @throws NullPointerException if any mandatory field is null
     * @throws IllegalArgumentException if {@code lastKnownSourceFileName} is blank
     */
    public DocumentRecord {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
        Objects.requireNonNull(lastKnownSourceLocator, "lastKnownSourceLocator must not be null");
        Objects.requireNonNull(lastKnownSourceFileName, "lastKnownSourceFileName must not be null");
        if (lastKnownSourceFileName.isBlank()) {
            throw new IllegalArgumentException("lastKnownSourceFileName must not be blank");
        }
        Objects.requireNonNull(overallStatus, "overallStatus must not be null");
        Objects.requireNonNull(failureCounters, "failureCounters must not be null");
        Objects.requireNonNull(createdAt, "createdAt must not be null");
        Objects.requireNonNull(updatedAt, "updatedAt must not be null");
    }
}
@@ -0,0 +1,32 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

/**
 * Sealed result type for a document master record lookup via {@link DocumentRecordRepository}.
 * <p>
 * The use case uses this result to make the per-document processing decision in M4
 * without additional assumptions:
 * <ul>
 * <li>{@link DocumentUnknown} — the fingerprint is not yet in the persistence store;
 *     the document must be processed for the first time.</li>
 * <li>{@link DocumentKnownProcessable} — a master record exists but the document is
 *     not in a terminal state; normal processing may continue.</li>
 * <li>{@link DocumentTerminalSuccess} — the document was already processed
 *     successfully; skip with {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SKIPPED_ALREADY_PROCESSED}.</li>
 * <li>{@link DocumentTerminalFinalFailure} — the document has finally failed; skip
 *     with {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SKIPPED_FINAL_FAILURE}.</li>
 * <li>{@link PersistenceLookupTechnicalFailure} — the lookup itself failed due to a
 *     technical infrastructure problem; the document cannot be processed in this run.</li>
 * </ul>
 * <p>
 * <strong>Architecture boundary:</strong> No JDBC, SQLite, or filesystem types appear
 * in this sealed hierarchy or in any of its implementations.
 *
 * @since M4-AP-001
 */
public sealed interface DocumentRecordLookupResult
        permits DocumentUnknown,
                DocumentKnownProcessable,
                DocumentTerminalSuccess,
                DocumentTerminalFinalFailure,
                PersistenceLookupTechnicalFailure {
}
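To illustrate how a call site can branch over the five lookup variants, here is a self-contained sketch with simplified stand-in types. It is an assumption-laden illustration, not the project's use case: the nested records mirror the variant names, a plain fingerprint string replaces the `DocumentRecord` payload, and the `Decision` labels are invented. It uses `instanceof` so it compiles on Java 17; on Java 21 a `switch` with type patterns over the sealed interface would additionally give compile-time exhaustiveness.

```java
// Simplified, hypothetical stand-ins for the DocumentRecordLookupResult hierarchy.
final class LookupDispatchSketch {

    sealed interface LookupResult
            permits Unknown, KnownProcessable, TerminalSuccess, TerminalFinalFailure, TechnicalFailure {}

    record Unknown() implements LookupResult {}
    record KnownProcessable(String fingerprint) implements LookupResult {}
    record TerminalSuccess(String fingerprint) implements LookupResult {}
    record TerminalFinalFailure(String fingerprint) implements LookupResult {}
    record TechnicalFailure(String errorMessage) implements LookupResult {}

    // Invented labels for what the use case does next with the candidate.
    enum Decision {
        PROCESS_FIRST_TIME,
        CONTINUE_PROCESSING,
        SKIP_ALREADY_PROCESSED,
        SKIP_FINAL_FAILURE,
        TRANSIENT_NO_ATTEMPT_RECORD
    }

    static Decision decide(LookupResult result) {
        if (result instanceof Unknown) {
            return Decision.PROCESS_FIRST_TIME;
        } else if (result instanceof KnownProcessable) {
            return Decision.CONTINUE_PROCESSING;
        } else if (result instanceof TerminalSuccess) {
            return Decision.SKIP_ALREADY_PROCESSED;      // record a SKIPPED_ALREADY_PROCESSED attempt
        } else if (result instanceof TerminalFinalFailure) {
            return Decision.SKIP_FINAL_FAILURE;          // record a SKIPPED_FINAL_FAILURE attempt
        } else if (result instanceof TechnicalFailure) {
            return Decision.TRANSIENT_NO_ATTEMPT_RECORD; // persistence unavailable: write nothing
        }
        throw new IllegalStateException("Unexpected lookup result: " + result);
    }
}
```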
@@ -0,0 +1,72 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;

/**
 * Outbound port for reading and writing the document master record (Dokument-Stammsatz).
 * <p>
 * One master record exists per unique {@link DocumentFingerprint}. The repository is
 * responsible for the persistence of {@link DocumentRecord} values; it holds no
 * business logic about retry rules, skip decisions, or status transitions.
 * <p>
 * <strong>Lookup semantics:</strong>
 * {@link #findByFingerprint(DocumentFingerprint)} returns a sealed
 * {@link DocumentRecordLookupResult} that allows the use case to distinguish exhaustively
 * between an unknown document, a known processable document, a terminal success, a
 * terminal final failure, and a technical persistence failure — without additional
 * assumptions or null checks.
 * <p>
 * <strong>Write semantics:</strong>
 * <ul>
 * <li>{@link #create(DocumentRecord)} inserts a new record for a previously unknown
 *     document.</li>
 * <li>{@link #update(DocumentRecord)} replaces the mutable fields of an existing
 *     record identified by its fingerprint.</li>
 * </ul>
 * Both write methods throw {@link DocumentPersistenceException} on technical failure.
 * <p>
 * <strong>Architecture boundary:</strong> No JDBC, SQLite, or filesystem types appear
 * in this interface or in any type it references. Mapping to and from the persistence
 * schema is the exclusive responsibility of the adapter implementation.
 *
 * @since M4-AP-001
 */
public interface DocumentRecordRepository {

    /**
     * Looks up the master record for the given fingerprint.
     * <p>
     * Returns a {@link DocumentRecordLookupResult} that encodes all possible outcomes
     * including technical failures; this method never throws.
     *
     * @param fingerprint the content-based document identity to look up; must not be null
     * @return {@link DocumentUnknown} if no record exists,
     *         {@link DocumentKnownProcessable} if the document is known but not terminal,
     *         {@link DocumentTerminalSuccess} if the document succeeded,
     *         {@link DocumentTerminalFinalFailure} if the document finally failed, or
     *         {@link PersistenceLookupTechnicalFailure} if the lookup itself failed
     */
    DocumentRecordLookupResult findByFingerprint(DocumentFingerprint fingerprint);

    /**
     * Persists a new master record for a previously unknown document.
     * <p>
     * The fingerprint within {@code record} must not yet exist in the persistence store.
     *
     * @param record the new master record to persist; must not be null
     * @throws DocumentPersistenceException if the insert fails due to a technical error
     */
    void create(DocumentRecord record);

    /**
     * Updates the mutable fields of an existing master record.
     * <p>
     * The record is identified by its {@link DocumentFingerprint}; the fingerprint
     * itself is never changed. Mutable fields include the overall status, failure
     * counters, last known source location, and all timestamp fields.
     *
     * @param record the updated master record; must not be null, and its fingerprint must already exist
     * @throws DocumentPersistenceException if the update fails due to a technical error
     */
    void update(DocumentRecord record);
}
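The split of responsibilities described for `DocumentRecordRepository` (map the stored status to a lookup variant, but apply no retry or skip rules) can be illustrated with an in-memory test double. This is a hypothetical sketch with simplified types: `Status` contains only the three statuses named in this change, a fingerprint string replaces `DocumentFingerprint`, and throwing `IllegalStateException` on a violated precondition is an assumption of the sketch, since the port itself only specifies `DocumentPersistenceException` for technical failures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory repository; types are simplified stand-ins, not the project's.
final class InMemoryRepositorySketch {

    enum Status { SUCCESS, FAILED_RETRYABLE, FAILED_FINAL }

    record StoredRecord(String fingerprint, Status overallStatus) {}

    sealed interface Lookup permits Unknown, KnownProcessable, TerminalSuccess, TerminalFinalFailure {}
    record Unknown() implements Lookup {}
    record KnownProcessable(StoredRecord record) implements Lookup {}
    record TerminalSuccess(StoredRecord record) implements Lookup {}
    record TerminalFinalFailure(StoredRecord record) implements Lookup {}

    private final Map<String, StoredRecord> byFingerprint = new HashMap<>();

    Lookup findByFingerprint(String fingerprint) {
        StoredRecord found = byFingerprint.get(fingerprint);
        if (found == null) {
            return new Unknown();
        }
        // Pure mapping of the stored status; no retry rules are applied here.
        return switch (found.overallStatus()) {
            case SUCCESS -> new TerminalSuccess(found);
            case FAILED_FINAL -> new TerminalFinalFailure(found);
            case FAILED_RETRYABLE -> new KnownProcessable(found);
        };
    }

    void create(StoredRecord record) {
        if (byFingerprint.putIfAbsent(record.fingerprint(), record) != null) {
            throw new IllegalStateException("Fingerprint already exists: " + record.fingerprint());
        }
    }

    void update(StoredRecord record) {
        if (byFingerprint.replace(record.fingerprint(), record) == null) {
            throw new IllegalStateException("Unknown fingerprint: " + record.fingerprint());
        }
    }
}
```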
@@ -0,0 +1,30 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import java.util.Objects;

/**
 * Lookup result indicating that the document has finally and irrecoverably failed.
 * <p>
 * The master record's overall status is
 * {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#FAILED_FINAL}.
 * The use case must skip further processing and historise a
 * {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SKIPPED_FINAL_FAILURE}
 * attempt. No failure counters are changed.
 * <p>
 * The existing {@link DocumentRecord} is supplied so the use case can read the
 * current record for the skip attempt historisation without an additional lookup.
 *
 * @param record the current (finally failed) master record for this document; never null
 * @since M4-AP-001
 */
public record DocumentTerminalFinalFailure(DocumentRecord record) implements DocumentRecordLookupResult {

    /**
     * Compact constructor validating the non-null contract.
     *
     * @throws NullPointerException if {@code record} is null
     */
    public DocumentTerminalFinalFailure {
        Objects.requireNonNull(record, "record must not be null");
    }
}
@@ -0,0 +1,30 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import java.util.Objects;

/**
 * Lookup result indicating that the document was already successfully processed.
 * <p>
 * The master record's overall status is
 * {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SUCCESS}.
 * The use case must skip further processing and historise a
 * {@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus#SKIPPED_ALREADY_PROCESSED}
 * attempt. No failure counters are changed.
 * <p>
 * The existing {@link DocumentRecord} is supplied so the use case can read the
 * current record for the skip attempt historisation without an additional lookup.
 *
 * @param record the current (successful) master record for this document; never null
 * @since M4-AP-001
 */
public record DocumentTerminalSuccess(DocumentRecord record) implements DocumentRecordLookupResult {

    /**
     * Compact constructor validating the non-null contract.
     *
     * @throws NullPointerException if {@code record} is null
     */
    public DocumentTerminalSuccess {
        Objects.requireNonNull(record, "record must not be null");
    }
}
@@ -0,0 +1,14 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

/**
 * Lookup result indicating that the fingerprint is not yet present in the persistence store.
 * <p>
 * The document has never been processed before. The use case must create a new
 * {@link DocumentRecord} and proceed with normal M4 processing.
 * <p>
 * This variant carries no data because there is no existing record to return.
 *
 * @since M4-AP-001
 */
public record DocumentUnknown() implements DocumentRecordLookupResult {
}
@@ -0,0 +1,75 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

/**
 * Immutable snapshot of the two independent failure counters maintained per document.
 * <p>
 * M4 tracks two distinct counters separately because they drive different retry rules:
 * <ul>
 * <li><strong>Content error counter</strong> ({@link #contentErrorCount()}):
 *     counts how many times a deterministic content error occurred for this document
 *     (no usable text, page limit exceeded). At count 1 the document is
 *     {@code FAILED_RETRYABLE}; at count 2 it becomes {@code FAILED_FINAL}.
 *     Skip events do <em>not</em> increase this counter.</li>
 * <li><strong>Transient error counter</strong> ({@link #transientErrorCount()}):
 *     counts how many times a technical infrastructure error occurred after the
 *     fingerprint had been computed successfully. The document remains
 *     {@code FAILED_RETRYABLE}; bounding this by a configured maximum is deferred to
 *     later milestones. Skip events do <em>not</em> increase this counter.</li>
 * </ul>
 * <p>
 * A freshly discovered document starts with both counters at zero.
 * Counters are written only by the repository layer, at the instruction of the
 * application use case; they never change as a side effect of a read operation.
 *
 * @param contentErrorCount number of deterministic content errors recorded so far;
 *                          must be {@code >= 0}
 * @param transientErrorCount number of transient technical errors recorded so far;
 *                            must be {@code >= 0}
 * @since M4-AP-001
 */
public record FailureCounters(int contentErrorCount, int transientErrorCount) {

    /**
     * Compact constructor validating that neither counter is negative.
     *
     * @throws IllegalArgumentException if either counter is negative
     */
    public FailureCounters {
        if (contentErrorCount < 0) {
            throw new IllegalArgumentException(
                    "contentErrorCount must be >= 0, but was: " + contentErrorCount);
        }
        if (transientErrorCount < 0) {
            throw new IllegalArgumentException(
                    "transientErrorCount must be >= 0, but was: " + transientErrorCount);
        }
    }

    /**
     * Returns a {@code FailureCounters} instance with both counters at zero.
     * Use this when initialising a master record for a newly discovered document.
     *
     * @return zero-value counters
     */
    public static FailureCounters zero() {
        return new FailureCounters(0, 0);
    }

    /**
     * Returns a copy with the content error counter incremented by one.
     *
     * @return new instance with {@code contentErrorCount + 1}
     */
    public FailureCounters withIncrementedContentErrorCount() {
        return new FailureCounters(contentErrorCount + 1, transientErrorCount);
    }

    /**
     * Returns a copy with the transient error counter incremented by one.
     *
     * @return new instance with {@code transientErrorCount + 1}
     */
    public FailureCounters withIncrementedTransientErrorCount() {
        return new FailureCounters(contentErrorCount, transientErrorCount + 1);
    }
}
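The M4 retry rules in the `FailureCounters` Javadoc can be expressed as two small transition functions. This is a hedged sketch, not the project's use-case logic: `Counters` mirrors `FailureCounters` without its validation, `Status` contains only the two failure statuses mentioned there, and `onContentError`/`onTransientError` are hypothetical names. The threshold of 2 content errors comes from the Javadoc; the transient path deliberately has no upper bound, since a configured maximum is deferred to later milestones.

```java
// Hypothetical sketch of the M4 retry rules; types mirror FailureCounters in simplified form.
final class RetryRuleSketch {

    enum Status { FAILED_RETRYABLE, FAILED_FINAL }

    // Number of deterministic content errors after which a document is final (per the Javadoc).
    private static final int CONTENT_ERRORS_UNTIL_FINAL = 2;

    record Counters(int contentErrorCount, int transientErrorCount) {
        Counters withIncrementedContentErrorCount() {
            return new Counters(contentErrorCount + 1, transientErrorCount);
        }

        Counters withIncrementedTransientErrorCount() {
            return new Counters(contentErrorCount, transientErrorCount + 1);
        }
    }

    record Outcome(Counters counters, Status status) {}

    // Deterministic content error: the first occurrence is retryable, the second is final.
    static Outcome onContentError(Counters before) {
        Counters after = before.withIncrementedContentErrorCount();
        Status status = after.contentErrorCount() >= CONTENT_ERRORS_UNTIL_FINAL
                ? Status.FAILED_FINAL
                : Status.FAILED_RETRYABLE;
        return new Outcome(after, status);
    }

    // Transient technical error: in M4 the document always stays retryable.
    static Outcome onTransientError(Counters before) {
        return new Outcome(before.withIncrementedTransientErrorCount(), Status.FAILED_RETRYABLE);
    }
}
```

Skip events never reach either function, which matches the rule that skips do not change the counters.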
@@ -0,0 +1,40 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentCandidate;

/**
 * Outbound port for computing the content-based fingerprint of exactly one
 * processing candidate.
 * <p>
 * Implementations must derive the fingerprint <em>exclusively</em> from the binary
 * content of the file referenced by the candidate. File name, path, and metadata must
 * not influence the result.
 * <p>
 * <strong>Architecture boundary:</strong> All hashing logic and file I/O are confined
 * to the {@code adapter-out} implementation. This interface exposes no
 * {@code java.nio.file.Path}, {@code java.io.File}, or cryptographic types to Domain
 * or Application.
 * <p>
 * <strong>Failure semantics:</strong> Technical failures (unreadable file, I/O error)
 * are returned as {@link FingerprintTechnicalError} rather than thrown as exceptions.
 * A {@link FingerprintTechnicalError} result means no
 * {@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint} is available
 * and the candidate cannot be identified; consequently no SQLite attempt record is
 * created for this candidate in M4.
 *
 * @since M4-AP-001
 */
public interface FingerprintPort {

    /**
     * Computes the fingerprint for the given candidate.
     * <p>
     * This method never throws. All outcomes, including technical failures, are
     * encoded in the returned {@link FingerprintResult}.
     *
     * @param candidate the candidate whose file content is to be hashed; must not be null
     * @return {@link FingerprintSuccess} on success, or {@link FingerprintTechnicalError}
     *         on any infrastructure failure
     */
    FingerprintResult computeFingerprint(SourceDocumentCandidate candidate);
}
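An adapter for `FingerprintPort` could look roughly like the following sketch. It assumes SHA-256 (the algorithm named in the `FingerprintTechnicalError` Javadoc of this change) and a hex-encoded digest; the real adapter's input type, encoding, and error texts are not shown in this diff, so `Sha256FingerprintSketch`, `compute(Path)`, and the nested result records are hypothetical stand-ins. Only the file's bytes enter the digest, and every failure is returned as a value instead of being thrown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical adapter sketch; the result types stand in for FingerprintResult and its variants.
final class Sha256FingerprintSketch {

    sealed interface Result permits Success, TechnicalError {}

    record Success(String sha256Hex) implements Result {}

    record TechnicalError(String errorMessage, Throwable cause) implements Result {}

    // Hashes only the file content (name, path and metadata are ignored) and never throws.
    static Result compute(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read); // stream large PDFs instead of loading them at once
            }
            return new Success(HexFormat.of().formatHex(digest.digest()));
        } catch (IOException | NoSuchAlgorithmException | SecurityException e) {
            return new TechnicalError("Cannot compute fingerprint for " + file + ": " + e, e);
        }
    }
}
```

Because the hash depends only on content, a renamed or moved copy of the same PDF yields the same fingerprint, which is what makes the fingerprint usable as the master record key.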
@@ -0,0 +1,20 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

/**
 * Sealed result type for a fingerprint computation attempt via {@link FingerprintPort}.
 * <p>
 * Exhaustive variants:
 * <ul>
 * <li>{@link FingerprintSuccess} — fingerprint computed successfully.</li>
 * <li>{@link FingerprintTechnicalError} — fingerprint computation failed due to a
 *     technical infrastructure problem (e.g. I/O error, file no longer accessible).</li>
 * </ul>
 * <p>
 * <strong>Historisation impact:</strong> If the result is {@link FingerprintTechnicalError},
 * the document cannot be identified and <em>no</em> SQLite attempt record is created.
 * The failure is treated as a non-identifiable run event.
 *
 * @since M4-AP-001
 */
public sealed interface FingerprintResult permits FingerprintSuccess, FingerprintTechnicalError {
}
@@ -0,0 +1,27 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;

import java.util.Objects;

/**
 * Successful outcome of a fingerprint computation.
 * <p>
 * Carries the computed {@link DocumentFingerprint} that uniquely identifies the
 * document by its content. The fingerprint can now be used as the primary key
 * for all subsequent persistence operations in M4.
 *
 * @param fingerprint the successfully computed fingerprint; never null
 * @since M4-AP-001
 */
public record FingerprintSuccess(DocumentFingerprint fingerprint) implements FingerprintResult {

    /**
     * Compact constructor validating the non-null contract.
     *
     * @throws NullPointerException if {@code fingerprint} is null
     */
    public FingerprintSuccess {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
    }
}
@@ -0,0 +1,34 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import java.util.Objects;

/**
 * Technical failure during fingerprint computation.
 * <p>
 * Returned by {@link FingerprintPort} when the adapter cannot read the file content
 * to compute the SHA-256 hash. Typical causes include the file becoming inaccessible
 * between candidate discovery and hashing, I/O errors, or missing read permissions.
 * <p>
 * <strong>Historisation impact:</strong> Because no {@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint}
 * could be produced, this failure is <em>not</em> historised in SQLite. No
 * {@link ProcessingAttempt} is created.
 *
 * @param errorMessage human-readable description of the failure; never null or blank
 * @param cause the underlying throwable, or {@code null} if not available
 * @since M4-AP-001
 */
public record FingerprintTechnicalError(String errorMessage, Throwable cause) implements FingerprintResult {

    /**
     * Compact constructor validating the error message.
     *
     * @throws NullPointerException if {@code errorMessage} is null
     * @throws IllegalArgumentException if {@code errorMessage} is blank
     */
    public FingerprintTechnicalError {
        Objects.requireNonNull(errorMessage, "errorMessage must not be null");
        if (errorMessage.isBlank()) {
            throw new IllegalArgumentException("errorMessage must not be blank");
        }
    }
}
@@ -0,0 +1,36 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import java.util.Objects;

/**
 * Lookup result indicating that the master record lookup itself failed due to a
 * technical infrastructure problem.
 * <p>
 * The persistence layer (SQLite) could not be reached or returned an unexpected error.
 * The document state is unknown; the use case must treat this candidate as a
 * transient technical failure for this run and must not attempt to write any attempt
 * record (since the underlying persistence is unavailable).
 * <p>
 * This variant is distinct from a business-level "document not found" outcome
 * ({@link DocumentUnknown}): here, the lookup operation itself failed.
 *
 * @param errorMessage human-readable description of the persistence failure; never null or blank
 * @param cause the underlying throwable, or {@code null} if not available
 * @since M4-AP-001
 */
public record PersistenceLookupTechnicalFailure(String errorMessage, Throwable cause)
        implements DocumentRecordLookupResult {

    /**
     * Compact constructor validating the error message.
     *
     * @throws NullPointerException if {@code errorMessage} is null
     * @throws IllegalArgumentException if {@code errorMessage} is blank
     */
    public PersistenceLookupTechnicalFailure {
        Objects.requireNonNull(errorMessage, "errorMessage must not be null");
        if (errorMessage.isBlank()) {
            throw new IllegalArgumentException("errorMessage must not be blank");
        }
    }
}
@@ -0,0 +1,40 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

/**
 * Outbound port for initialising the SQLite persistence schema at program startup.
 * <p>
 * This port is invoked exactly once per program run, <em>before</em> the batch
 * document processing loop begins. The initialisation must ensure that all tables,
 * indices, and constraints required for M4 persistence are present in the SQLite file.
 * <p>
 * <strong>Timing:</strong> The adapter implementation must perform the schema
 * initialisation eagerly and synchronously. This port is not intended for lazy or
 * deferred initialisation during the document processing loop.
 * <p>
 * <strong>Failure handling:</strong> If the schema cannot be initialised, the
 * implementation must throw {@link DocumentPersistenceException}. The bootstrap
 * layer must catch this exception and abort the run with exit code 1.
 * <p>
 * <strong>Idempotency:</strong> Calling {@link #initializeSchema()} on a database
 * that already has the correct schema must succeed without error (e.g. via
 * {@code CREATE TABLE IF NOT EXISTS} semantics).
 * <p>
 * <strong>Architecture boundary:</strong> No JDBC, SQLite, or filesystem types appear
 * in this interface. All schema DDL and connection management are confined to the
 * {@code adapter-out} implementation.
 *
 * @since M4-AP-001
 */
public interface PersistenceSchemaInitializationPort {

    /**
     * Creates or verifies the M4 persistence schema.
     * <p>
     * Must be called once at program start, before any document processing begins.
     * The method must be idempotent: calling it on an already-initialised database
     * must not fail or alter existing data.
     *
     * @throws DocumentPersistenceException if the schema cannot be created or verified
     */
    void initializeSchema();
}
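The startup contract of `PersistenceSchemaInitializationPort` (initialise once before the batch loop, and abort with exit code 1 if that fails) can be sketched on the bootstrap side as follows. This is a minimal sketch under assumptions: `PersistenceFailure` stands in for `DocumentPersistenceException`, `run`, `initializeSchema`, and `batchLoop` are hypothetical names, and returning 0 after a completed loop is an assumption, since this change only specifies the failure exit code.

```java
// Hypothetical bootstrap sketch; names are illustrative only.
final class BootstrapExitCodeSketch {

    // Stand-in for DocumentPersistenceException.
    static final class PersistenceFailure extends RuntimeException {
        PersistenceFailure(String message) {
            super(message);
        }
    }

    // Runs schema initialisation eagerly and exactly once; on failure the batch loop
    // is never started and the run ends with exit code 1.
    static int run(Runnable initializeSchema, Runnable batchLoop) {
        try {
            initializeSchema.run();
        } catch (PersistenceFailure e) {
            System.err.println("Schema initialisation failed, aborting run: " + e.getMessage());
            return 1;
        }
        batchLoop.run();
        return 0; // assumed success code; not specified in this change
    }
}
```

In a real `main` method the returned value would typically be passed to `System.exit`.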
@@ -0,0 +1,88 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;
import de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus;
import de.gecheckt.pdf.umbenenner.domain.model.RunId;

import java.time.Instant;
import java.util.Objects;

/**
 * Application-facing representation of exactly one historised processing attempt
 * (Versuchshistorie-Eintrag) for an identified document.
 * <p>
 * <strong>Historisation boundary (M4):</strong> Only attempts for documents whose
 * {@link DocumentFingerprint} was successfully computed are historised. Failures that
 * occur <em>before</em> the fingerprint is available (e.g. the source file cannot be
 * read for hashing) are <em>not</em> represented by a {@code ProcessingAttempt}
 * and are <em>not</em> written to SQLite.
 * <p>
 * <strong>Attempt number semantics:</strong> The attempt number starts at 1 for the
 * first historised attempt per fingerprint and increases monotonically by 1 for every
 * subsequent attempt, including skip attempts
 * ({@link ProcessingStatus#SKIPPED_ALREADY_PROCESSED},
 * {@link ProcessingStatus#SKIPPED_FINAL_FAILURE}).
 * <p>
 * <strong>Field semantics:</strong>
 * <ul>
 * <li>{@link #fingerprint()} — foreign key to the document master record.</li>
 * <li>{@link #runId()} — identifies the batch run during which this attempt occurred.</li>
 * <li>{@link #attemptNumber()} — monotonically increasing per fingerprint; assigned
 *     before the attempt is recorded.</li>
 * <li>{@link #startedAt()} — wall-clock timestamp when processing of this candidate
 *     began in this run.</li>
 * <li>{@link #endedAt()} — wall-clock timestamp when processing completed (success,
 *     failure, or skip).</li>
 * <li>{@link #status()} — outcome status of this specific attempt.</li>
 * <li>{@link #failureClass()} — short classification of the failure (e.g. enum constant
 *     name or exception class name); {@code null} for successful or skip attempts.</li>
 * <li>{@link #failureMessage()} — human-readable failure description; {@code null} for
 *     successful or skip attempts.</li>
 * <li>{@link #retryable()} — {@code true} if the failure is considered retryable in a
 *     later run; {@code false} for final failures, successes, and skip attempts.</li>
 * </ul>
 * <p>
 * <strong>Not included in M4:</strong> model name, prompt identifier, AI raw response,
 * AI reasoning, resolved date, date source, final title, final target file name.
 * These fields are added in later milestones (M5+).
 *
 * @param fingerprint content-based document identity; never null
 * @param runId identifier of the batch run; never null
 * @param attemptNumber monotonic sequence number per fingerprint; must be {@code >= 1}
 * @param startedAt start of this processing attempt; never null
 * @param endedAt end of this processing attempt; never null
 * @param status outcome status of this attempt; never null
 * @param failureClass failure classification, or {@code null} for non-failure statuses
 * @param failureMessage failure description, or {@code null} for non-failure statuses
 * @param retryable whether this failure should be retried in a later run
 * @since M4-AP-001
 */
public record ProcessingAttempt(
        DocumentFingerprint fingerprint,
        RunId runId,
        int attemptNumber,
        Instant startedAt,
        Instant endedAt,
        ProcessingStatus status,
        String failureClass,
        String failureMessage,
        boolean retryable) {

    /**
     * Compact constructor validating mandatory non-null fields and numeric constraints.
     *
     * @throws NullPointerException if any mandatory field is null
     * @throws IllegalArgumentException if {@code attemptNumber} is less than 1
     */
    public ProcessingAttempt {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
        Objects.requireNonNull(runId, "runId must not be null");
||||||
|
if (attemptNumber < 1) {
|
||||||
|
throw new IllegalArgumentException(
|
||||||
|
"attemptNumber must be >= 1, but was: " + attemptNumber);
|
||||||
|
}
|
||||||
|
Objects.requireNonNull(startedAt, "startedAt must not be null");
|
||||||
|
Objects.requireNonNull(endedAt, "endedAt must not be null");
|
||||||
|
Objects.requireNonNull(status, "status must not be null");
|
||||||
|
}
|
||||||
|
}
|
||||||
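The field semantics above tie `failureClass`, `failureMessage` and `retryable` to the attempt status, while the record itself only checks non-null fields and `attemptNumber >= 1`. A minimal sketch, assuming hypothetical static factories that keep those combinations consistent: `Attempt` and `AttemptStatus` are simplified stand-ins (plain `String` instead of `DocumentFingerprint` and `RunId`), and the factory names are not part of this commit.

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical local copy of the attempt statuses used by ProcessingAttempt.
enum AttemptStatus {
    SUCCESS, FAILED_RETRYABLE, FAILED_FINAL, SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE
}

// Simplified stand-in for ProcessingAttempt: fingerprint and runId are plain Strings here.
record Attempt(String fingerprint, String runId, int attemptNumber,
               Instant startedAt, Instant endedAt, AttemptStatus status,
               String failureClass, String failureMessage, boolean retryable) {

    Attempt {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
        Objects.requireNonNull(runId, "runId must not be null");
        if (attemptNumber < 1) {
            throw new IllegalArgumentException("attemptNumber must be >= 1, but was: " + attemptNumber);
        }
        Objects.requireNonNull(startedAt, "startedAt must not be null");
        Objects.requireNonNull(endedAt, "endedAt must not be null");
        Objects.requireNonNull(status, "status must not be null");
    }

    // Successful attempt: no failure details, never retryable.
    static Attempt success(String fingerprint, String runId, int attemptNumber,
                           Instant startedAt, Instant endedAt) {
        return new Attempt(fingerprint, runId, attemptNumber, startedAt, endedAt,
                AttemptStatus.SUCCESS, null, null, false);
    }

    // Skip attempt: only the two skip statuses are accepted; no failure details, never retryable.
    static Attempt skip(String fingerprint, String runId, int attemptNumber,
                        Instant startedAt, Instant endedAt, AttemptStatus skipStatus) {
        if (skipStatus != AttemptStatus.SKIPPED_ALREADY_PROCESSED
                && skipStatus != AttemptStatus.SKIPPED_FINAL_FAILURE) {
            throw new IllegalArgumentException("not a skip status: " + skipStatus);
        }
        return new Attempt(fingerprint, runId, attemptNumber, startedAt, endedAt,
                skipStatus, null, null, false);
    }

    // Failed attempt: retryable exactly when the failure is not final.
    // Requiring a failureClass for failures is an assumption of this sketch.
    static Attempt failure(String fingerprint, String runId, int attemptNumber,
                           Instant startedAt, Instant endedAt, boolean finalFailure,
                           String failureClass, String failureMessage) {
        Objects.requireNonNull(failureClass, "failureClass must not be null for failures");
        AttemptStatus status = finalFailure ? AttemptStatus.FAILED_FINAL : AttemptStatus.FAILED_RETRYABLE;
        return new Attempt(fingerprint, runId, attemptNumber, startedAt, endedAt,
                status, failureClass, failureMessage, !finalFailure);
    }
}
```

Centralising the combinations in factories means a use case cannot accidentally persist a skip attempt with `retryable = true` or a success with a failure message.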
@@ -0,0 +1,70 @@
package de.gecheckt.pdf.umbenenner.application.port.out;

import de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint;

import java.util.List;

/**
 * Outbound port for writing and reading the processing attempt history
 * (Versuchshistorie).
 * <p>
 * Every historisable processing attempt for an <em>identified</em> document results
 * in exactly one {@link ProcessingAttempt} record written via {@link #save(ProcessingAttempt)}.
 * <p>
 * <strong>Historisation boundary:</strong> Only attempts with a successfully computed
 * {@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint} are historised.
 * Failures that occur before the fingerprint is available are <em>not</em> recorded
 * through this port.
 * <p>
 * <strong>Attempt number semantics:</strong>
 * Attempt numbers start at 1 per fingerprint and increase monotonically by 1
 * for every saved attempt, including skip attempts. The use case calls
 * {@link #loadNextAttemptNumber(DocumentFingerprint)} to obtain the correct sequence
 * number before constructing a {@link ProcessingAttempt}.
 * <p>
 * <strong>Architecture boundary:</strong> No JDBC, SQLite, or filesystem types appear
 * in this interface. Mapping to and from the persistence schema is the exclusive
 * responsibility of the adapter implementation.
 *
 * @since M4-AP-001
 */
public interface ProcessingAttemptRepository {

    /**
     * Returns the attempt number to assign to the <em>next</em> attempt for the given
     * fingerprint.
     * <p>
     * If no prior attempts exist for the fingerprint, returns 1.
     * Otherwise returns the current maximum attempt number plus 1.
     *
     * @param fingerprint the document identity; must not be null
     * @return the next monotonic attempt number; always >= 1
     * @throws DocumentPersistenceException if the query fails due to a technical error
     */
    int loadNextAttemptNumber(DocumentFingerprint fingerprint);

    /**
     * Persists exactly one processing attempt record.
     * <p>
     * The {@link ProcessingAttempt#attemptNumber()} must have been obtained from
     * {@link #loadNextAttemptNumber(DocumentFingerprint)} in the same run to guarantee
     * monotonic ordering.
     *
     * @param attempt the attempt to persist; must not be null
     * @throws DocumentPersistenceException if the insert fails due to a technical error
     */
    void save(ProcessingAttempt attempt);

    /**
     * Returns all historised attempts for the given fingerprint, ordered by
     * {@link ProcessingAttempt#attemptNumber()} ascending.
     * <p>
     * Returns an empty list if no attempts have been recorded yet.
     * Intended for use in tests and diagnostics; not required on the primary batch path.
     *
     * @param fingerprint the document identity; must not be null
     * @return immutable list of attempts, ordered by attempt number; never null
     * @throws DocumentPersistenceException if the query fails due to a technical error
     */
    List<ProcessingAttempt> findAllByFingerprint(DocumentFingerprint fingerprint);
}
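The numbering contract of this port can be illustrated with a hypothetical in-memory stand-in, for example as a test double for the use case: `loadNextAttemptNumber` returns 1 for an unknown fingerprint and otherwise the current maximum plus 1, and skip attempts consume a number like any other attempt. The types are simplified (`AttemptRow` with `String` fingerprint and status), and the check in `save` that rejects out-of-sequence numbers is an extra assumption, not part of the port contract.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical simplified attempt row: only the fields needed to show the numbering.
record AttemptRow(String fingerprint, int attemptNumber, String status) { }

// Hypothetical in-memory stand-in for ProcessingAttemptRepository (not a real adapter).
final class InMemoryAttemptRepository {
    private final Map<String, List<AttemptRow>> byFingerprint = new HashMap<>();

    // 1 if no attempts exist yet, otherwise the current maximum attempt number + 1.
    int loadNextAttemptNumber(String fingerprint) {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
        return byFingerprint.getOrDefault(fingerprint, List.of()).stream()
                .mapToInt(AttemptRow::attemptNumber)
                .max()
                .orElse(0) + 1;
    }

    // Stores exactly one row; this sketch also rejects numbers that break the sequence.
    void save(AttemptRow attempt) {
        Objects.requireNonNull(attempt, "attempt must not be null");
        int expected = loadNextAttemptNumber(attempt.fingerprint());
        if (attempt.attemptNumber() != expected) {
            throw new IllegalStateException(
                    "expected attempt number " + expected + " but got " + attempt.attemptNumber());
        }
        byFingerprint.computeIfAbsent(attempt.fingerprint(), k -> new ArrayList<>()).add(attempt);
    }

    // Ascending by attempt number; immutable; empty for unknown fingerprints.
    List<AttemptRow> findAllByFingerprint(String fingerprint) {
        Objects.requireNonNull(fingerprint, "fingerprint must not be null");
        List<AttemptRow> rows = new ArrayList<>(byFingerprint.getOrDefault(fingerprint, List.of()));
        rows.sort(Comparator.comparingInt(AttemptRow::attemptNumber));
        return List.copyOf(rows);
    }
}
```

A SQLite adapter could plausibly derive the same value from a `MAX(attempt_number)` query per fingerprint; the actual schema is not part of this diff, so that mapping is an assumption.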
@@ -22,12 +22,40 @@
  * — Extract text content and page count from a single PDF</li>
  * </ul>
  * <p>
+ * M4-AP-001 ports:
+ * <ul>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.FingerprintPort}
+ * — Compute the content-based SHA-256 fingerprint of a processing candidate</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.DocumentRecordRepository}
+ * — Read and write the document master record (Dokument-Stammsatz)</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.ProcessingAttemptRepository}
+ * — Write and read the per-document attempt history (Versuchshistorie)</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.PersistenceSchemaInitializationPort}
+ * — Initialise the SQLite schema at program startup</li>
+ * </ul>
+ * <p>
+ * M4-AP-001 value types and result types:
+ * <ul>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.FailureCounters}
+ * — Immutable snapshot of content-error and transient-error counters per document</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.DocumentRecord}
+ * — Application-facing representation of the document master record</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.ProcessingAttempt}
+ * — Application-facing representation of one historised processing attempt</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.FingerprintResult}
+ * — Sealed result of a fingerprint computation (success or technical error)</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.DocumentRecordLookupResult}
+ * — Sealed result of a master record lookup (unknown / processable / terminal / failure)</li>
+ * </ul>
+ * <p>
  * Exception types:
  * <ul>
  * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.RunLockUnavailableException}
  * — Thrown when run lock cannot be acquired (another instance running) (M2)</li>
  * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.SourceDocumentAccessException}
  * — Thrown when source folder cannot be read or accessed (M3)</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.application.port.out.DocumentPersistenceException}
+ * — Thrown when a persistence write operation or schema init fails (M4)</li>
  * </ul>
  * <p>
  * Architecture Rule: Outbound ports are implementation-agnostic and contain no business logic.
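`FingerprintResult` is described only as a sealed result of either success or a technical error. A sketch of how a use case might branch on such a type while honouring the historisation boundary, assuming the hypothetical variants `Success(String sha256Hex)` and `TechnicalError(String message)` (the real variant names and fields are not shown in this diff): only the success branch leads on to the master record and attempt history; a technical error stays a run-log event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of FingerprintResult; variant names and fields are assumptions.
// Without a permits clause, the nested records are the only permitted subtypes.
sealed interface FingerprintOutcome {
    record Success(String sha256Hex) implements FingerprintOutcome { }
    record TechnicalError(String message) implements FingerprintOutcome { }
}

// Hypothetical use-case fragment that separates identified documents from run events.
final class CandidateRouter {
    private final List<String> identifiedFingerprints = new ArrayList<>();
    private final List<String> runEvents = new ArrayList<>();

    void route(String candidateName, FingerprintOutcome outcome) {
        if (outcome instanceof FingerprintOutcome.Success success) {
            // Identified document: master record lookup and attempt history follow from here.
            identifiedFingerprints.add(success.sha256Hex());
        } else if (outcome instanceof FingerprintOutcome.TechnicalError error) {
            // Pre-fingerprint failure: a non-identifiable run event; nothing goes to SQLite.
            runEvents.add(candidateName + ": " + error.message());
        } else {
            throw new IllegalStateException("unexpected outcome: " + outcome);
        }
    }

    List<String> identifiedFingerprints() {
        return List.copyOf(identifiedFingerprints);
    }

    List<String> runEvents() {
        return List.copyOf(runEvents);
    }
}
```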
@@ -0,0 +1,51 @@
package de.gecheckt.pdf.umbenenner.domain.model;

import java.util.Objects;

/**
 * Unique, stable identity of a document derived exclusively from its binary content.
 * <p>
 * A {@code DocumentFingerprint} is computed once per file read and used as the primary
 * key for all subsequent persistence lookups and history entries. It is independent of
 * the file name, path, or any metadata — only the raw file content determines the value.
 * <p>
 * <strong>Identification semantics (M4):</strong>
 * <ul>
 * <li>Two files with identical content have the same fingerprint and are treated as
 * the same document, regardless of their location or name.</li>
 * <li>A file whose content has changed produces a different fingerprint and is treated
 * as a new, independent document (new processing record).</li>
 * </ul>
 * <p>
 * <strong>Architecture boundary:</strong> The hashing algorithm (SHA-256) and all
 * file I/O required to compute the fingerprint are strictly confined to the
 * {@code adapter-out} layer. Domain and Application only hold and compare the resulting
 * hex string; they never access the filesystem or perform cryptographic operations.
 * <p>
 * <strong>Pre-fingerprint failures:</strong> If computing the fingerprint fails
 * (e.g. due to an I/O error), no {@code DocumentFingerprint} is created and the failure
 * is not historised in SQLite. The attempt is treated as a non-identifiable run event,
 * not as a documentable processing attempt.
 *
 * @param sha256Hex lowercase hex encoding of the SHA-256 digest (exactly 64 characters,
 * characters {@code [0-9a-f]})
 * @since M4-AP-001
 */
public record DocumentFingerprint(String sha256Hex) {

    /**
     * Compact constructor that validates the hex string format.
     *
     * @param sha256Hex lowercase hex encoding of the SHA-256 digest
     * @throws NullPointerException if {@code sha256Hex} is null
     * @throws IllegalArgumentException if {@code sha256Hex} is not exactly 64 lowercase hex characters
     */
    public DocumentFingerprint {
        Objects.requireNonNull(sha256Hex, "sha256Hex must not be null");
        if (sha256Hex.length() != 64 || !sha256Hex.matches("[0-9a-f]{64}")) {
            throw new IllegalArgumentException(
                    "sha256Hex must be a 64-character lowercase hex string, but was: '"
                            + sha256Hex + "'");
        }
    }
}
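The Javadoc confines hashing and file I/O to `adapter-out`. A minimal sketch of such an adapter-side helper, assuming the JDK's built-in `MessageDigest` and a hypothetical class name `Sha256Fingerprints` (the real `FingerprintPort` adapter is not shown in this diff): it streams the content through SHA-256 in fixed-size chunks and encodes the 32-byte digest as 64 lowercase hex characters, exactly the format the compact constructor accepts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical adapter-out helper; class and method names are assumptions.
final class Sha256Fingerprints {
    private static final int BUFFER_SIZE = 8192;

    private Sha256Fingerprints() { }

    // Streams the file so large PDFs are never loaded into memory as a whole.
    static String fingerprintOf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return fingerprintOf(in);
        }
    }

    static String fingerprintOf(InputStream in) throws IOException {
        MessageDigest digest = newSha256();
        byte[] buffer = new byte[BUFFER_SIZE];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return toLowerHex(digest.digest());
    }

    // Convenience overload for content already held in memory (e.g. in tests).
    static String fingerprintOf(byte[] content) {
        return toLowerHex(newSha256().digest(content));
    }

    // Two lowercase hex digits per byte: 32 bytes become 64 characters [0-9a-f].
    static String toLowerHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    private static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must support SHA-256, so this indicates a broken runtime.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Because only the bytes are hashed, renaming or moving a file leaves its fingerprint unchanged, while any content change yields a different value, matching the identification semantics above.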
@@ -1,20 +1,44 @@
 package de.gecheckt.pdf.umbenenner.domain.model;
 
 /**
- * Enumeration of all valid processing status values for a document within a batch run.
+ * Enumeration of all valid processing status values for a document.
  * <p>
- * Each status reflects the outcome or current state of a document processing attempt.
- * Status transitions follow the rules defined in the architecture specification and persist
- * across multiple batch runs via the repository layer.
+ * Each status reflects the outcome or current state of a document in the
+ * master record ({@code DocumentRecord}) or in a single attempt record
+ * ({@code ProcessingAttempt}).
  * <p>
- * Status Categories:
+ * <strong>Overall-status semantics (master record, M4):</strong>
  * <ul>
- * <li><strong>Final Success:</strong> {@link #SUCCESS}</li>
- * <li><strong>Retryable Failure:</strong> {@link #FAILED_RETRYABLE}</li>
- * <li><strong>Final Failure:</strong> {@link #FAILED_FINAL}</li>
- * <li><strong>Skip (Already Processed):</strong> {@link #SKIPPED_ALREADY_PROCESSED}</li>
- * <li><strong>Skip (Final Failure):</strong> {@link #SKIPPED_FINAL_FAILURE}</li>
- * <li><strong>Processing (Transient):</strong> {@link #PROCESSING}</li>
+ * <li>{@link #SUCCESS} — document was fully processed; skip in all future runs.</li>
+ * <li>{@link #FAILED_RETRYABLE} — last attempt failed but is retryable; process again
+ * in the next run according to the applicable retry rule.</li>
+ * <li>{@link #FAILED_FINAL} — all allowed retries exhausted; skip in all future runs.</li>
+ * <li>{@link #PROCESSING} — document is currently being processed (transient, within a
+ * run); if found persisted after a crash, treat as {@link #FAILED_RETRYABLE}.</li>
+ * </ul>
+ * <p>
+ * <strong>Attempt-status semantics (attempt history, M4):</strong>
+ * <ul>
+ * <li>{@link #SUCCESS} — this attempt completed successfully.</li>
+ * <li>{@link #FAILED_RETRYABLE} — this attempt failed; a future attempt is allowed.</li>
+ * <li>{@link #FAILED_FINAL} — this attempt failed and no further attempts will be made.</li>
+ * <li>{@link #SKIPPED_ALREADY_PROCESSED} — this attempt was a skip because the
+ * document's overall status was already {@link #SUCCESS}.</li>
+ * <li>{@link #SKIPPED_FINAL_FAILURE} — this attempt was a skip because the document's
+ * overall status was already {@link #FAILED_FINAL}.</li>
+ * </ul>
+ * <p>
+ * <strong>M4 counter rules:</strong>
+ * <ul>
+ * <li>Only {@link #FAILED_RETRYABLE} and {@link #FAILED_FINAL} outcomes may increase
+ * a failure counter (content-error or transient-error counter).</li>
+ * <li>Skip outcomes ({@link #SKIPPED_ALREADY_PROCESSED}, {@link #SKIPPED_FINAL_FAILURE})
+ * never change any failure counter.</li>
+ * <li>A deterministic content error at first occurrence → {@link #FAILED_RETRYABLE},
+ * content-error counter +1. At second occurrence → {@link #FAILED_FINAL},
+ * content-error counter +2 (cumulative).</li>
+ * <li>A transient technical error after a successful fingerprint → {@link #FAILED_RETRYABLE},
+ * transient-error counter +1.</li>
  * </ul>
  *
  * @since M2-AP-001
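The M4 counter rules can be read as a small pure function over the two counters. A sketch under stated assumptions: `Counters` is a simplified, hypothetical mirror of `FailureCounters` (whose real API is not shown here), "+2 (cumulative)" is read as the content-error counter reaching 2, and transient errors are always retryable as the M4 rule states; the configurable maximum retry count mentioned for `FAILED_FINAL` is deliberately left out.

```java
// Hypothetical stand-in: only the two outcomes that may change a failure counter.
enum FailureOutcome { FAILED_RETRYABLE, FAILED_FINAL }

// Simplified, hypothetical mirror of FailureCounters.
record Counters(int contentErrors, int transientErrors) {
    Counters {
        if (contentErrors < 0 || transientErrors < 0) {
            throw new IllegalArgumentException("counters must not be negative");
        }
    }
}

record Evaluation(FailureOutcome status, Counters counters) { }

final class M4CounterRules {
    // The second occurrence of a deterministic content error is final.
    private static final int CONTENT_ERROR_LIMIT = 2;

    private M4CounterRules() { }

    // Deterministic content error: increments the content-error counter by one.
    static Evaluation onContentError(Counters before) {
        Counters after = new Counters(before.contentErrors() + 1, before.transientErrors());
        FailureOutcome status = after.contentErrors() >= CONTENT_ERROR_LIMIT
                ? FailureOutcome.FAILED_FINAL
                : FailureOutcome.FAILED_RETRYABLE;
        return new Evaluation(status, after);
    }

    // Transient technical error after a successful fingerprint: always retryable here.
    static Evaluation onTransientError(Counters before) {
        return new Evaluation(FailureOutcome.FAILED_RETRYABLE,
                new Counters(before.contentErrors(), before.transientErrors() + 1));
    }

    // Skip attempts never change any failure counter.
    static Counters onSkip(Counters before) {
        return before;
    }
}
```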
@@ -40,13 +64,15 @@ public enum ProcessingStatus {
     FAILED_RETRYABLE,
 
     /**
-     * Processing failed with a deterministic content error (non-recoverable problem).
+     * Processing has failed finally and irrecoverably — no further retries will be attempted.
      * <p>
-     * Examples: PDF has no extractable text, page limit exceeded, document is ambiguous.
+     * This status is reached after all allowed retries for a document are exhausted.
+     * For deterministic content errors (no usable text, page limit exceeded) this means
+     * the second occurrence of the error. For other error types, it means the configured
+     * maximum retry count has been reached.
      * <p>
-     * A document with this status receives exactly one retry in a later batch run.
-     * After that retry, if it still fails, status becomes {@link #FAILED_FINAL}.
-     * No further retries are attempted.
+     * A document with this overall status is skipped in all future batch runs and
+     * a {@link #SKIPPED_FINAL_FAILURE} attempt is historised.
      */
     FAILED_FINAL,
 
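The overall-status semantics decide, per run, whether a known document is processed again or only receives a historised skip attempt. A sketch assuming a hypothetical `Decision` type and a local copy of the enum constants (`DocStatus`); rejecting the two skip statuses as overall statuses is an assumption drawn from the two semantics lists, not something this diff states explicitly.

```java
import java.util.Objects;

// Local copy of the ProcessingStatus constants so the sketch is self-contained.
enum DocStatus {
    SUCCESS, FAILED_RETRYABLE, FAILED_FINAL, SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE, PROCESSING
}

// Hypothetical decision type: either process the document or historise a skip attempt.
record Decision(boolean shouldProcess, DocStatus skipStatus) {
    static Decision proceed() {
        return new Decision(true, null);
    }

    static Decision skipWith(DocStatus skipStatus) {
        return new Decision(false, skipStatus);
    }
}

final class OverallStatusPolicy {
    private OverallStatusPolicy() { }

    static Decision decide(DocStatus overallStatus) {
        Objects.requireNonNull(overallStatus, "overallStatus must not be null");
        // A persisted PROCESSING status indicates an interrupted run; treat it as FAILED_RETRYABLE.
        DocStatus effective = overallStatus == DocStatus.PROCESSING
                ? DocStatus.FAILED_RETRYABLE
                : overallStatus;
        return switch (effective) {
            case SUCCESS -> Decision.skipWith(DocStatus.SKIPPED_ALREADY_PROCESSED);
            case FAILED_FINAL -> Decision.skipWith(DocStatus.SKIPPED_FINAL_FAILURE);
            case FAILED_RETRYABLE -> Decision.proceed();
            // Assumption: skip statuses only occur on attempts, never as an overall status.
            case SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE, PROCESSING ->
                    throw new IllegalStateException("not an overall status: " + overallStatus);
        };
    }
}
```

Either way a skip still produces one historised attempt with the next attempt number, but it never touches the failure counters.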
@@ -6,6 +6,7 @@
  * <li>{@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus} — enumeration of all valid document processing states</li>
  * <li>{@link de.gecheckt.pdf.umbenenner.domain.model.RunId} — unique identifier for a batch run</li>
  * <li>{@link de.gecheckt.pdf.umbenenner.domain.model.BatchRunContext} — technical context for a batch run</li>
+ * <li>{@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint} — content-based document identity (SHA-256 hex); primary key for M4 persistence (M4)</li>
  * <li>{@link de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentCandidate} — discovered PDF from source folder</li>
  * <li>{@link de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentLocator} — opaque locator passed from scan adapter to extraction adapter</li>
  * <li>{@link de.gecheckt.pdf.umbenenner.domain.model.PdfPageCount} — typed page count validation</li>