9216a8687f
Adds the duplicate-detection backend on top of perceptual hashing:
- Two tables (edited into the original migrations): data.duplicate_pairs holds
  precomputed near-duplicate candidates (rebuilt wholesale by the rescan), and
  data.duplicate_dismissals is a global "not a duplicate" overlay that survives
  rescans. New audit actions file_merge / duplicate_dismiss.
- DuplicateService:
  - Rescan builds every pair within DUPLICATE_HASH_THRESHOLD via a BK-tree over
    the perceptual hashes and replaces the pairs table. This is the only thing
    that populates pairs, so GET never compares all-vs-all (scales to 110k+).
  - Clusters reads the precomputed pairs (ACL-filtered, non-trashed,
    non-dismissed), groups them into connected components via union-find, and
    paginates whole clusters.
  - Resolve merges a pair field-by-field: each scalar from keep or discard,
    metadata keep/discard/shallow-merge, tags/pools keep or union; then trashes
    the discarded file. Enforces edit ACL on both.
  - Dismiss records a canonical pair (view ACL on both).
- Endpoints under /files: GET /files/duplicates, POST /files/duplicates/dismiss,
  POST /files/duplicates/resolve (registered before /:id to avoid collision).
  Plain delete reuses /files/bulk/delete.
- Repo support: ListMissingPHash, ListAllPHashes, CopyPoolMemberships, plus the
  DuplicatePairRepo (ReplaceAll via COPY, ListVisible) and DismissalRepo.
Unit tests cover the BK-tree pairing, union-find clustering, metadata merge and
field validation; an integration test covers rescan -> list -> merge -> dismiss
(including that a dismissal survives a re-rescan).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
51 lines
1.6 KiB
SQL
-- +goose Up

INSERT INTO core.mime_types (name, extension) VALUES
    ('image/jpeg', 'jpg'),
    ('image/png', 'png'),
    ('image/gif', 'gif'),
    ('image/webp', 'webp'),
    ('video/mp4', 'mp4'),
    ('video/quicktime', 'mov'),
    ('video/x-msvideo', 'avi'),
    ('video/webm', 'webm'),
    ('video/3gpp', '3gp'),
    ('video/x-m4v', 'm4v');

INSERT INTO core.object_types (name) VALUES
    ('file'), ('tag'), ('category'), ('pool');

INSERT INTO activity.action_types (name) VALUES
    -- Auth
    ('user_login'), ('user_logout'),
    -- Files
    ('file_create'), ('file_edit'), ('file_delete'), ('file_restore'),
    ('file_permanent_delete'), ('file_replace'), ('file_review'),
    ('file_merge'), ('duplicate_dismiss'),
    -- Tags
    ('tag_create'), ('tag_edit'), ('tag_delete'),
    -- Categories
    ('category_create'), ('category_edit'), ('category_delete'),
    -- Pools
    ('pool_create'), ('pool_edit'), ('pool_delete'),
    -- Relations
    ('file_tag_add'), ('file_tag_remove'),
    ('file_pool_add'), ('file_pool_remove'),
    -- ACL
    ('acl_change'),
    -- Admin
    ('user_create'), ('user_delete'), ('user_block'), ('user_unblock'),
    ('user_role_change'),
    -- Sessions
    ('session_terminate');

-- The initial administrator is created at application startup from the
-- ADMIN_USERNAME / ADMIN_PASSWORD environment variables (see
-- UserService.EnsureAdmin), so no default credentials are seeded here.

-- +goose Down

DELETE FROM activity.action_types;
DELETE FROM core.object_types;
DELETE FROM core.mime_types;