feat(backend): perceptual hashing for images and video

Adds a 64-bit dHash perceptual hash (internal/imagehash, built on the existing
disintegration/imaging — no new dependency) and starts populating the long-unused
data.files.phash column:

- Upload sets phash inline for images (cheap, from the in-memory bytes).
- Replace recomputes it from new content for images and clears it for anything
  else, so a stale hash never survives a content swap.
- FileRepo.SetPHash sets/clears the hash (used by Replace and, later, the dedup
  backfill).
- DiskStorage.VideoFrameMiddle extracts a frame from the middle of a clip
  (ffprobe duration -> ffmpeg -ss duration/2). A fixed early offset would make
  clips that share an intro collide on the same hash; seeking to the midpoint
  avoids that. It is a concrete method, not part of the storage port: only the
  dedup CLI needs it, which keeps ffmpeg off the upload path. Video phashes are
  therefore computed by that CLI, not at upload time.
- Adds the DUPLICATE_HASH_THRESHOLD config (default 10 of 64 bits) for the later
  pair rescan.
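
For reviewers unfamiliar with dHash, here is a minimal sketch of the idea, not
the code in internal/imagehash. Assumptions: it uses only the standard library
and nearest-neighbour sampling instead of disintegration/imaging; the names
dHash64, hammingDistance, isDuplicate, toColumn and gradient are hypothetical;
the threshold is treated as an inclusive Hamming distance; and the uint64 hash
is reinterpreted bit-for-bit as the *int64 that SetPHash takes.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math/bits"
)

// dHash64 computes a 64-bit difference hash: sample the image on a 9x8 grid,
// convert each sample to grayscale, and emit one bit per horizontally
// adjacent pair (1 when the left sample is brighter than the right one).
func dHash64(img image.Image) uint64 {
	const w, h = 9, 8
	b := img.Bounds()
	var gray [h][w]uint8
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			// Nearest-neighbour sample at the centre of each grid cell
			// (a real implementation would resize with proper filtering).
			sx := b.Min.X + (2*x+1)*b.Dx()/(2*w)
			sy := b.Min.Y + (2*y+1)*b.Dy()/(2*h)
			gray[y][x] = color.GrayModel.Convert(img.At(sx, sy)).(color.Gray).Y
		}
	}
	var hash uint64
	for y := 0; y < h; y++ {
		for x := 0; x < w-1; x++ {
			hash <<= 1
			if gray[y][x] > gray[y][x+1] {
				hash |= 1
			}
		}
	}
	return hash
}

// hammingDistance counts the bits in which two hashes differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// isDuplicate reports whether two hashes differ in at most threshold bits.
// Assumption: the threshold is inclusive.
func isDuplicate(a, b uint64, threshold int) bool {
	return hammingDistance(a, b) <= threshold
}

// toColumn reinterprets the hash as a signed 64-bit value for storage.
// Assumption: the column holds the same bit pattern as the unsigned hash.
func toColumn(hash uint64) *int64 {
	v := int64(hash)
	return &v
}

// gradient builds a 90x80 grayscale test image whose brightness falls
// (descending) or rises (ascending) from left to right.
func gradient(descending bool) image.Image {
	img := image.NewGray(image.Rect(0, 0, 90, 80))
	for y := 0; y < 80; y++ {
		for x := 0; x < 90; x++ {
			v := uint8(2 * x)
			if descending {
				v = 255 - v
			}
			img.SetGray(x, y, color.Gray{Y: v})
		}
	}
	return img
}

func main() {
	a, b := dHash64(gradient(true)), dHash64(gradient(false))
	fmt.Printf("%016x %016x distance=%d duplicate=%v\n",
		a, b, hammingDistance(a, b), isDuplicate(a, b, 10))
}
```

Two opposite gradients differ in every bit, so they land far above a threshold
of 10; near-identical images should differ in only a few bits.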

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 12:20:52 +03:00
parent 58cea88f52
commit 88849cc16b
7 changed files with 278 additions and 7 deletions
@@ -50,6 +50,8 @@ type FileRepo interface {
 	Update(ctx context.Context, id uuid.UUID, f *domain.File) (*domain.File, error)
 	// SetNeedsReview sets the review status on the given (non-trashed) files.
 	SetNeedsReview(ctx context.Context, ids []uuid.UUID, value bool) error
+	// SetPHash sets (or clears, when nil) the perceptual hash of a file.
+	SetPHash(ctx context.Context, id uuid.UUID, phash *int64) error
 	// SoftDelete moves a file to trash (sets is_deleted = true).
 	SoftDelete(ctx context.Context, id uuid.UUID) error
 	// Restore moves a file out of trash (sets is_deleted = false).