Adds the duplicate-detection backend on top of perceptual hashing:
- Two tables (edited into the original migrations): data.duplicate_pairs holds
precomputed near-duplicate candidates (rebuilt wholesale by the rescan), and
data.duplicate_dismissals is a global "not a duplicate" overlay that survives
rescans. New audit actions file_merge / duplicate_dismiss.
- DuplicateService:
- Rescan builds every pair within DUPLICATE_HASH_THRESHOLD via a BK-tree over
the perceptual hashes and replaces the pairs table. This is the only thing
that populates pairs, so GET never compares all-vs-all (scales to 110k+).
- Clusters reads the precomputed pairs (ACL-filtered, non-trashed, non-
dismissed), groups them into connected components via union-find, and
paginates whole clusters.
- Resolve merges a pair field-by-field: each scalar from keep or discard,
metadata keep/discard/shallow-merge, tags/pools keep or union; then trashes
the discarded file. Enforces edit ACL on both.
- Dismiss records a canonical pair (view ACL on both).
- Endpoints under /files: GET /files/duplicates, POST /files/duplicates/dismiss,
POST /files/duplicates/resolve (registered before /:id to avoid collision).
Plain delete reuses /files/bulk/delete.
- Repo support: ListMissingPHash, ListAllPHashes, CopyPoolMemberships, plus the
DuplicatePairRepo (ReplaceAll via COPY, ListVisible) and DismissalRepo.
Unit tests cover the BK-tree pairing, union-find clustering, metadata merge and
field validation; an integration test covers rescan -> list -> merge -> dismiss
(including that a dismissal survives a re-rescan).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a 64-bit dHash perceptual hash (internal/imagehash, built on the existing
disintegration/imaging — no new dependency) and starts populating the long-unused
data.files.phash column:
- Upload sets phash inline for images (cheap, from the in-memory bytes).
- Replace recomputes it from new content for images and clears it for anything
else, so a stale hash never survives a content swap.
- FileRepo.SetPHash sets/clears the hash (used by Replace and, later, the dedup
backfill).
- DiskStorage.VideoFrameMiddle extracts a frame from the middle of a clip
(ffprobe duration -> ffmpeg -ss duration/2), avoiding the shared-intro collision
a fixed early offset causes. It is a concrete method, not part of the storage
port: only the dedup CLI needs it, keeping ffmpeg off the upload path. Video
phashes are therefore computed by that CLI, not at upload time.
- DUPLICATE_HASH_THRESHOLD config (default 10/64) for the later pair rescan.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the old "untagged" sentinel tag with a proper per-file workflow
status: needs_review starts true on upload/import and is cleared by an
explicit action (no auto-clear on tagging). Surfaced as a filter token
(r=1 needs review, r=0 done) so it combines with tag/MIME conditions, and
toggled via POST /files/bulk/review (single id or many, edit-ACL enforced,
audit-logged as file_review).
needs_review lives on data.files (column added to the original 003 migration,
partial index in 006, action type seeded in 007).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Listing files with a tag filter now logs each referenced tag to
activity.tag_uses, flagging it included (positive) or excluded (negated
under an odd number of NOTs); the untagged pseudo-token is skipped. The
filter AST is reused to determine polarity, so grouped negations like
!(A|B) mark both tags excluded.
Recording happens only when a filter is first applied — not on cursor
pagination or an anchored return — so one browse counts once. The write
is best-effort and never fails the listing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The activity.file_views table existed but nothing ever wrote to it. Add a
POST /files/{id}/views endpoint: FileRepo.RecordView inserts a history row,
FileService.RecordView enforces view ACL first. The file viewer fires it
(fire-and-forget) when a file is opened, including while paging prev/next.
Documented in openapi.yaml; covered by TestRecordFileView (204 on view,
repeatable, 404 for unknown file).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Run gofmt -w across the backend, normalising the manually-aligned := blocks
to the gofmt standard. No code behaviour changes.
Add Prettier (+ prettier-plugin-svelte) to the frontend with the SvelteKit
default config (tabs, single quotes) so formatting is reproducible, then run
it over the whole tree. Add format / format:check npm scripts and a
.prettierignore (build output, generated schema.ts, static assets).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Listings returned every row regardless of ownership: GET /files, /tags,
/pools and /categories exposed other users' private items (while the
single-item GET correctly returned 403), and the pool file operations
(GET /pools/:id, /pools/:id/files, add/remove/reorder) skipped ACL
entirely, so any authenticated user could read and rewrite anyone's
private pool.
- List queries now filter to rows the caller may see (public, owned, or
granted can_view) via a shared SQL condition; admins bypass. The viewer
identity is taken from the request context by the service and passed to
the repository in the list params.
- Tag/Category/Pool single-item Get now enforce CanView (File already did).
- Pool Get/ListFiles require pool view; AddFiles/RemoveFiles/Reorder
require pool edit.
Adds regression tests for private-by-default listing (hidden / public /
granted / admin) and for pool operations rejecting a non-owner.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Make data.files.exif column nullable (was NOT NULL but service passes nil
for files without EXIF data, causing a constraint violation on upload)
- FileRepo.Create: include id in INSERT so disk storage path and DB record
share the same UUID (previously DB generated its own UUID, causing a mismatch)
- Integration test: use correct filter DSL format {t=<uuid>} instead of tag:<uuid>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
filter_parser.go — recursive-descent parser for the {token,...} DSL.
Tokens: t=UUID (tag), m=INT (MIME exact), m~PATTERN (MIME LIKE),
operators & | ! ( ) with standard NOT>AND>OR precedence.
All values go through pgx parameters ($N) — SQL injection impossible.
file_repo.go — full FileRepo:
- Create/GetByID/Update via CTE RETURNING with JOIN for one round-trip
- SoftDelete/Restore/DeletePermanent with RowsAffected guards
- SetTags: full replace (DELETE + INSERT per tag)
- ListTags: delegates to loadTagsBatch (single query for N files)
- List: keyset cursor pagination (bidirectional), anchor mode,
filter DSL, search ILIKE, trash flag, 4 sort columns.
Cursor is base64url(JSON) encoding sort position; backward
pagination fetches in reversed ORDER BY then reverses the slice.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>