M4 AP-001: Refine core objects, status model, and port contracts
@@ -0,0 +1,51 @@
+package de.gecheckt.pdf.umbenenner.domain.model;
+
+import java.util.Objects;
+
+/**
+ * Unique, stable identity of a document derived exclusively from its binary content.
+ * <p>
+ * A {@code DocumentFingerprint} is computed once per file read and used as the primary
+ * key for all subsequent persistence lookups and history entries. It is independent of
+ * the file name, path, or any metadata — only the raw file content determines the value.
+ * <p>
+ * <strong>Identification semantics (M4):</strong>
+ * <ul>
+ *   <li>Two files with identical content have the same fingerprint and are treated as
+ *       the same document, regardless of their location or name.</li>
+ *   <li>A file whose content has changed produces a different fingerprint and is treated
+ *       as a new, independent document (new processing record).</li>
+ * </ul>
+ * <p>
+ * <strong>Architecture boundary:</strong> The hashing algorithm (SHA-256) and all
+ * file I/O required to compute the fingerprint are strictly confined to the
+ * {@code adapter-out} layer. Domain and Application only hold and compare the resulting
+ * hex string; they never access the filesystem or perform cryptographic operations.
+ * <p>
+ * <strong>Pre-fingerprint failures:</strong> If computing the fingerprint fails
+ * (e.g. due to an I/O error), no {@code DocumentFingerprint} is created and the failure
+ * is not historised in SQLite. The attempt is treated as a non-identifiable run event,
+ * not as a documentable processing attempt.
+ *
+ * @param sha256Hex lowercase hex encoding of the SHA-256 digest (exactly 64 characters,
+ *                  characters {@code [0-9a-f]})
+ * @since M4-AP-001
+ */
+public record DocumentFingerprint(String sha256Hex) {
+
+    /**
+     * Compact constructor that validates the hex string format.
+     *
+     * @param sha256Hex lowercase hex encoding of the SHA-256 digest
+     * @throws NullPointerException     if {@code sha256Hex} is null
+     * @throws IllegalArgumentException if {@code sha256Hex} is not exactly 64 lowercase hex characters
+     */
+    public DocumentFingerprint {
+        Objects.requireNonNull(sha256Hex, "sha256Hex must not be null");
+        if (sha256Hex.length() != 64 || !sha256Hex.matches("[0-9a-f]{64}")) {
+            throw new IllegalArgumentException(
+                    "sha256Hex must be a 64-character lowercase hex string, but was: '"
+                            + sha256Hex + "'");
+        }
+    }
+}
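The Javadoc above confines hashing and file I/O to the `adapter-out` layer but does not show that side. The following is a minimal sketch of how such an adapter might produce the value, assuming Java 17 (`HexFormat`) and reading the whole file into memory. The class `Sha256FingerprintSketch`, its methods, and the nested stand-in record are hypothetical names for illustration and are not part of this commit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Objects;
import java.util.Optional;

// Hypothetical adapter-out helper; all names here are illustrative assumptions.
final class Sha256FingerprintSketch {

    // Minimal stand-in for the domain record so the sketch compiles on its own.
    record DocumentFingerprint(String sha256Hex) {
        DocumentFingerprint {
            Objects.requireNonNull(sha256Hex, "sha256Hex must not be null");
            if (!sha256Hex.matches("[0-9a-f]{64}")) {
                throw new IllegalArgumentException("expected 64 lowercase hex characters");
            }
        }
    }

    // Hashes raw bytes and returns the lowercase hex encoding of the digest.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns empty on an I/O error: without a fingerprint there is no key
    // under which the failure could be stored (pre-fingerprint failure).
    static Optional<DocumentFingerprint> compute(Path file) {
        try {
            return Optional.of(new DocumentFingerprint(sha256Hex(Files.readAllBytes(file))));
        } catch (IOException e) {
            return Optional.empty();
        }
    }
}
```

Returning `Optional.empty()` is only one way to model the pre-fingerprint failure; a dedicated result type would carry the error cause as well.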
@@ -1,20 +1,44 @@
 package de.gecheckt.pdf.umbenenner.domain.model;
 
 /**
- * Enumeration of all valid processing status values for a document within a batch run.
+ * Enumeration of all valid processing status values for a document.
  * <p>
- * Each status reflects the outcome or current state of a document processing attempt.
- * Status transitions follow the rules defined in the architecture specification and persist
- * across multiple batch runs via the repository layer.
+ * Each status reflects the outcome or current state of a document in the
+ * master record ({@code DocumentRecord}) or in a single attempt record
+ * ({@code ProcessingAttempt}).
  * <p>
- * Status Categories:
+ * <strong>Overall-status semantics (master record, M4):</strong>
  * <ul>
- *   <li><strong>Final Success:</strong> {@link #SUCCESS}</li>
- *   <li><strong>Retryable Failure:</strong> {@link #FAILED_RETRYABLE}</li>
- *   <li><strong>Final Failure:</strong> {@link #FAILED_FINAL}</li>
- *   <li><strong>Skip (Already Processed):</strong> {@link #SKIPPED_ALREADY_PROCESSED}</li>
- *   <li><strong>Skip (Final Failure):</strong> {@link #SKIPPED_FINAL_FAILURE}</li>
- *   <li><strong>Processing (Transient):</strong> {@link #PROCESSING}</li>
+ *   <li>{@link #SUCCESS} — document was fully processed; skip in all future runs.</li>
+ *   <li>{@link #FAILED_RETRYABLE} — last attempt failed but is retryable; process again
+ *       in the next run according to the applicable retry rule.</li>
+ *   <li>{@link #FAILED_FINAL} — all allowed retries exhausted; skip in all future runs.</li>
+ *   <li>{@link #PROCESSING} — document is currently being processed (transient, within a
+ *       run); if found persisted after a crash, treat as {@link #FAILED_RETRYABLE}.</li>
  * </ul>
+ * <p>
+ * <strong>Attempt-status semantics (attempt history, M4):</strong>
+ * <ul>
+ *   <li>{@link #SUCCESS} — this attempt completed successfully.</li>
+ *   <li>{@link #FAILED_RETRYABLE} — this attempt failed; a future attempt is allowed.</li>
+ *   <li>{@link #FAILED_FINAL} — this attempt failed and no further attempts will be made.</li>
+ *   <li>{@link #SKIPPED_ALREADY_PROCESSED} — this attempt was a skip because the
+ *       document's overall status was already {@link #SUCCESS}.</li>
+ *   <li>{@link #SKIPPED_FINAL_FAILURE} — this attempt was a skip because the document's
+ *       overall status was already {@link #FAILED_FINAL}.</li>
+ * </ul>
+ * <p>
+ * <strong>M4 counter rules:</strong>
+ * <ul>
+ *   <li>Only {@link #FAILED_RETRYABLE} and {@link #FAILED_FINAL} outcomes may increase
+ *       a failure counter (content-error or transient-error counter).</li>
+ *   <li>Skip outcomes ({@link #SKIPPED_ALREADY_PROCESSED}, {@link #SKIPPED_FINAL_FAILURE})
+ *       never change any failure counter.</li>
+ *   <li>A deterministic content error at first occurrence → {@link #FAILED_RETRYABLE},
+ *       content-error counter +1. At second occurrence → {@link #FAILED_FINAL},
+ *       content-error counter +2 (cumulative).</li>
+ *   <li>A transient technical error after a successful fingerprint → {@link #FAILED_RETRYABLE},
+ *       transient-error counter +1.</li>
+ * </ul>
  *
  * @since M2-AP-001
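The counter rules in the Javadoc above can be read as a small state update. The sketch below is one possible reading, not the project's implementation: `CounterRulesSketch`, `ErrorKind`, `Outcome`, and the `maxTransientErrors` parameter are hypothetical names, and the threshold comparison for transient errors is an assumption, since the diff only mentions a "configured maximum retry count". The "+2 (cumulative)" wording is interpreted here as the counter reaching a total of 2.

```java
// Hypothetical sketch of the M4 counter rules; every name below is illustrative.
final class CounterRulesSketch {

    // Local copy of the status values so the sketch compiles on its own.
    enum ProcessingStatus {
        SUCCESS, FAILED_RETRYABLE, FAILED_FINAL,
        SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE, PROCESSING
    }

    enum ErrorKind { CONTENT, TRANSIENT }

    // Status plus cumulative failure counters after one attempt.
    record Outcome(ProcessingStatus status, int contentErrors, int transientErrors) {}

    // Applies a failed attempt: exactly one of the two counters grows by one.
    static Outcome onFailure(ErrorKind kind, Outcome before, int maxTransientErrors) {
        if (kind == ErrorKind.CONTENT) {
            int count = before.contentErrors() + 1;
            // First occurrence stays retryable; the second occurrence is final.
            ProcessingStatus status = count >= 2
                    ? ProcessingStatus.FAILED_FINAL
                    : ProcessingStatus.FAILED_RETRYABLE;
            return new Outcome(status, count, before.transientErrors());
        }
        int count = before.transientErrors() + 1;
        // Assumption: the limit counts as reached once the counter equals it.
        ProcessingStatus status = count >= maxTransientErrors
                ? ProcessingStatus.FAILED_FINAL
                : ProcessingStatus.FAILED_RETRYABLE;
        return new Outcome(status, before.contentErrors(), count);
    }

    // Applies a skip: the attempt status changes, both counters stay untouched.
    static Outcome onSkip(ProcessingStatus skipStatus, Outcome before) {
        return new Outcome(skipStatus, before.contentErrors(), before.transientErrors());
    }
}
```

Keeping the counters in an immutable record makes it easy to check that skip outcomes leave them unchanged.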
@@ -40,13 +64,15 @@ public enum ProcessingStatus {
     FAILED_RETRYABLE,
 
     /**
-     * Processing failed with a deterministic content error (non-recoverable problem).
+     * Processing has failed finally and irrecoverably — no further retries will be attempted.
      * <p>
-     * Examples: PDF has no extractable text, page limit exceeded, document is ambiguous.
+     * This status is reached after all allowed retries for a document are exhausted.
+     * For deterministic content errors (no usable text, page limit exceeded) this means
+     * the second occurrence of the error. For other error types, it means the configured
+     * maximum retry count has been reached.
      * <p>
-     * A document with this status receives exactly one retry in a later batch run.
-     * After that retry, if it still fails, status becomes {@link #FAILED_FINAL}.
-     * No further retries are attempted.
+     * A document with this overall status is skipped in all future batch runs and
+     * a {@link #SKIPPED_FINAL_FAILURE} attempt is historised.
      */
     FAILED_FINAL,
 
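The skip behaviour described for `FAILED_FINAL`, together with the overall-status semantics of `ProcessingStatus`, suggests a simple lookup at the start of each run. The following sketch assumes a hypothetical helper `skipStatusFor` that maps a document's persisted overall status to the skip attempt to historise, or to an empty result when the document should be processed. Treating the two skip values as invalid overall statuses is an assumption drawn from the fact that the Javadoc lists them only under attempt-status semantics.

```java
import java.util.Optional;

// Hypothetical sketch of the per-document skip decision; names are illustrative.
final class SkipDecisionSketch {

    // Local copy of the status values so the sketch compiles on its own.
    enum ProcessingStatus {
        SUCCESS, FAILED_RETRYABLE, FAILED_FINAL,
        SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE, PROCESSING
    }

    // Returns the skip attempt status to historise, or empty if the document
    // should be processed (again) in this run.
    static Optional<ProcessingStatus> skipStatusFor(ProcessingStatus overall) {
        return switch (overall) {
            case SUCCESS -> Optional.of(ProcessingStatus.SKIPPED_ALREADY_PROCESSED);
            case FAILED_FINAL -> Optional.of(ProcessingStatus.SKIPPED_FINAL_FAILURE);
            // A persisted PROCESSING status means a crashed run; treat it as retryable.
            case FAILED_RETRYABLE, PROCESSING -> Optional.empty();
            // Assumption: skip values occur only on attempt records, never as overall status.
            case SKIPPED_ALREADY_PROCESSED, SKIPPED_FINAL_FAILURE ->
                    throw new IllegalArgumentException("not a valid overall status: " + overall);
        };
    }
}
```

A document that has no master record yet is outside this sketch; the caller would process it without consulting the lookup.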
@@ -6,6 +6,7 @@
  *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.ProcessingStatus} — enumeration of all valid document processing states</li>
  *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.RunId} — unique identifier for a batch run</li>
  *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.BatchRunContext} — technical context for a batch run</li>
+ *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.DocumentFingerprint} — content-based document identity (SHA-256 hex); primary key for M4 persistence (M4)</li>
  *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentCandidate} — discovered PDF from source folder</li>
  *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.SourceDocumentLocator} — opaque locator passed from scan adapter to extraction adapter</li>
  *   <li>{@link de.gecheckt.pdf.umbenenner.domain.model.PdfPageCount} — typed page count validation</li>