Adds the duplicate-detection backend on top of perceptual hashing:
- Two tables (edited into the original migrations): data.duplicate_pairs holds
precomputed near-duplicate candidates (rebuilt wholesale by the rescan), and
data.duplicate_dismissals is a global "not a duplicate" overlay that survives
rescans. New audit actions file_merge / duplicate_dismiss.
- DuplicateService:
- Rescan builds every pair within DUPLICATE_HASH_THRESHOLD via a BK-tree over
the perceptual hashes and replaces the pairs table. This is the only thing
that populates pairs, so GET never compares all-vs-all (scales to 110k+).
- Clusters reads the precomputed pairs (ACL-filtered, non-trashed, non-
dismissed), groups them into connected components via union-find, and
paginates whole clusters.
- Resolve merges a pair field-by-field: each scalar from keep or discard,
metadata keep/discard/shallow-merge, tags/pools keep or union; then trashes
the discarded file. Enforces edit ACL on both.
- Dismiss records a canonical pair (view ACL on both).
- Endpoints under /files: GET /files/duplicates, POST /files/duplicates/dismiss,
POST /files/duplicates/resolve (registered before /:id to avoid collision).
Plain delete reuses /files/bulk/delete.
- Repo support: ListMissingPHash, ListAllPHashes, CopyPoolMemberships, plus the
DuplicatePairRepo (ReplaceAll via COPY, ListVisible) and DismissalRepo.
Unit tests cover the BK-tree pairing, union-find clustering, metadata merge and
field validation; an integration test covers rescan -> list -> merge -> dismiss
(including that a dismissal survives a re-rescan).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a 64-bit dHash perceptual hash (internal/imagehash, built on the existing
disintegration/imaging — no new dependency) and starts populating the long-unused
data.files.phash column:
- Upload sets phash inline for images (cheap, from the in-memory bytes).
- Replace recomputes it from new content for images and clears it for anything
else, so a stale hash never survives a content swap.
- FileRepo.SetPHash sets/clears the hash (used by Replace and, later, the dedup
backfill).
- DiskStorage.VideoFrameMiddle extracts a frame from the middle of a clip
(ffprobe duration -> ffmpeg -ss duration/2), avoiding the shared-intro collision
a fixed early offset causes. It is a concrete method, not part of the storage
port: only the dedup CLI needs it, keeping ffmpeg off the upload path. Video
phashes are therefore computed by that CLI, not at upload time.
- DUPLICATE_HASH_THRESHOLD config (default 10/64) for the later pair rescan.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the old "untagged" sentinel tag with a proper per-file workflow
status: needs_review starts true on upload/import and is cleared by an
explicit action (no auto-clear on tagging). Surfaced as a filter token
(r=1 needs review, r=0 done) so it combines with tag/MIME conditions, and
toggled via POST /files/bulk/review (single id or many, edit-ACL enforced,
audit-logged as file_review).
needs_review lives on data.files (column added to the original 003 migration,
partial index in 006, action type seeded in 007).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CreateRule accepted apply_to_existing but ignored it, so enabling the
checkbox while creating a rule never retroactively tagged files already
carrying the when-tag — only activating an existing rule did. Extract the
retroactive expansion into TagRuleRepo.ApplyToExisting (reused by SetActive)
and call it from CreateRule when the rule is active, inside one transaction
so a file is never left half-tagged. Mirrors SetRuleActive semantics,
including following only active downstream rules.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add POST /pools/{id}/views, mirroring the file-view endpoint: it
enforces view ACL and appends a row to activity.pool_views (viewed_at
defaults to statement_timestamp(), so each view is its own history row).
The table existed but nothing wrote to it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Listing files with a tag filter now logs each referenced tag to
activity.tag_uses, flagging it included (positive) or excluded (negated
under an odd number of NOTs); the untagged pseudo-token is skipped. The
filter AST is reused to determine polarity, so grouped negations like
!(A|B) mark both tags excluded.
Recording happens only when a filter is first applied — not on cursor
pagination or an anchored return — so one browse counts once. The write
is best-effort and never fails the listing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The activity.file_views table existed but nothing ever wrote to it. Add a
POST /files/{id}/views endpoint: FileRepo.RecordView inserts a history row,
FileService.RecordView enforces view ACL first. The file viewer fires it
(fire-and-forget) when a file is opened, including while paging prev/next.
Documented in openapi.yaml; covered by TestRecordFileView (204 on view,
repeatable, 404 for unknown file).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Listings returned every row regardless of ownership: GET /files, /tags,
/pools and /categories exposed other users' private items (while the
single-item GET correctly returned 403), and the pool file operations
(GET /pools/:id, /pools/:id/files, add/remove/reorder) skipped ACL
entirely, so any authenticated user could read and rewrite anyone's
private pool.
- List queries now filter to rows the caller may see (public, owned, or
granted can_view) via a shared SQL condition; admins bypass. The viewer
identity is taken from the request context by the service and passed to
the repository in the list params.
- Tag/Category/Pool single-item Get now enforce CanView (File already did).
- Pool Get/ListFiles require pool view; AddFiles/RemoveFiles/Reorder
require pool edit.
Adds regression tests for private-by-default listing (hidden / public /
granted / admin) and for pool operations rejecting a non-owner.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The auth middleware trusted any unexpired, well-signed access token, so
logout, session termination and admin blocks had no effect until the
15-minute token expired. The middleware now validates that the token's
session is still active on every request (SessionRepo.GetByID), and
blocking a user deactivates all of their sessions, immediately revoking
their outstanding access tokens.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Extend PATCH /tags/{id}/rules/{then_id} to accept apply_to_existing bool.
When a rule is activated with apply_to_existing=true, a single recursive
CTE retroactively inserts the full transitive expansion of then_tag into
data.file_tag for all files already carrying when_tag:
WITH RECURSIVE expansion(tag_id) AS (
SELECT then_tag_id
UNION
SELECT r.then_tag_id FROM data.tag_rules r
JOIN expansion e ON r.when_tag_id = e.tag_id
WHERE r.is_active = true
)
INSERT INTO data.file_tag ... ON CONFLICT DO NOTHING
Changes:
- port/repository.go: add applyToExisting param to TagRuleRepo.SetActive
- db/postgres/tag_repo.go: implement recursive CTE retroactive apply
- service/tag_service.go: thread applyToExisting through SetRuleActive
- handler/tag_handler.go: parse apply_to_existing from PATCH body
- openapi.yaml: document apply_to_existing on PATCH endpoint
- integration test: add TestTagRuleActivateApplyToExisting covering
no-op when false, direct+transitive apply when true
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Define all repository interfaces in port/repository.go:
FileRepo, TagRepo, TagRuleRepo, CategoryRepo, PoolRepo, UserRepo,
SessionRepo, ACLRepo, AuditRepo, MimeRepo, and Transactor.
Add OffsetParams and PoolFileListParams as shared parameter structs.
Define FileStorage interface in port/storage.go with Save, Read,
Delete, Thumbnail, and Preview methods.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>