Adds the duplicate-detection backend on top of perceptual hashing:
- Two tables (edited into the original migrations): data.duplicate_pairs holds
precomputed near-duplicate candidates (rebuilt wholesale by the rescan), and
data.duplicate_dismissals is a global "not a duplicate" overlay that survives
rescans. New audit actions file_merge / duplicate_dismiss.
- DuplicateService:
- Rescan builds every pair within DUPLICATE_HASH_THRESHOLD via a BK-tree over
the perceptual hashes and replaces the pairs table. This is the only thing
that populates pairs, so GET never compares all-vs-all (scales to 110k+).
- Clusters reads the precomputed pairs (ACL-filtered, non-trashed, non-
dismissed), groups them into connected components via union-find, and
paginates whole clusters.
- Resolve merges a pair field-by-field: each scalar from keep or discard,
metadata keep/discard/shallow-merge, tags/pools keep or union; then trashes
the discarded file. Enforces edit ACL on both.
- Dismiss records a canonical pair (view ACL on both).
- Endpoints under /files: GET /files/duplicates, POST /files/duplicates/dismiss,
POST /files/duplicates/resolve (registered before /:id to avoid collision).
Plain delete reuses /files/bulk/delete.
- Repo support: ListMissingPHash, ListAllPHashes, CopyPoolMemberships, plus the
DuplicatePairRepo (ReplaceAll via COPY, ListVisible) and DismissalRepo.
Unit tests cover the BK-tree pairing, union-find clustering, metadata merge and
field validation; an integration test covers rescan -> list -> merge -> dismiss
(including that a dismissal survives a re-rescan).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the old "untagged" sentinel tag with a proper per-file workflow
status: needs_review starts true on upload/import and is cleared by an
explicit action (no auto-clear on tagging). Surfaced as a filter token
(r=1 needs review, r=0 done) so it combines with tag/MIME conditions, and
toggled via POST /files/bulk/review (single id or many, edit-ACL enforced,
audit-logged as file_review).
needs_review lives on data.files (column added to the original 003 migration,
partial index in 006, action type seeded in 007).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
007_seed_data.sql shipped a fixed admin account whose bcrypt hash decodes
to the password "admin", giving every deployment the same known
credentials. The seed row is removed; UserService.EnsureAdmin now creates
the administrator on startup from ADMIN_USERNAME / ADMIN_PASSWORD. It is
idempotent and never overwrites an existing password, so an operator who
rotates the admin password keeps it across restarts.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
007_seed_data.sql: insert 10 MIME types (4 image, 6 video) with their
canonical extensions into core.mime_types.
disk.go: register golang.org/x/image/webp decoder so imaging.Open
handles WebP still images. Videos (mp4, mov, avi, webm, 3gp, m4v)
continue to go through the ffmpeg frame-extraction path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cmd/server/main.go: replace stub router with full wiring —
UserRepo, SessionRepo, AuthService, AuthMiddleware, AuthHandler,
NewRouter; use postgres.NewPool instead of pgxpool.New directly.
migrations/001_init_schemas.sql: wrap uuid_v7 and uuid_extract_timestamp
function bodies with goose StatementBegin/End so semicolons inside
dollar-quoted strings are not treated as statement separators.
migrations/007_seed_data.sql: add seed admin user (admin/admin,
bcrypt cost 10, is_admin=true, can_create=true) for manual testing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>