Commit Graph

3 Commits

Author SHA1 Message Date
H1K0 9216a8687f feat(backend): duplicate pairs, dismissals, and merge resolution
Adds the duplicate-detection backend on top of perceptual hashing:

- Two tables (edited into the original migrations): data.duplicate_pairs holds
  precomputed near-duplicate candidates (rebuilt wholesale by the rescan), and
  data.duplicate_dismissals is a global "not a duplicate" overlay that survives
  rescans. New audit actions file_merge / duplicate_dismiss.
- DuplicateService:
  - Rescan builds every pair within DUPLICATE_HASH_THRESHOLD via a BK-tree over
    the perceptual hashes and replaces the pairs table. This is the only thing
    that populates pairs, so GET never compares all-vs-all (scales to 110k+).
  - Clusters reads the precomputed pairs (ACL-filtered, non-trashed, non-
    dismissed), groups them into connected components via union-find, and
    paginates whole clusters.
  - Resolve merges a pair field-by-field: each scalar from keep or discard,
    metadata keep/discard/shallow-merge, tags/pools keep or union; then trashes
    the discarded file. Enforces edit ACL on both.
  - Dismiss records a canonical pair (view ACL on both).
- Endpoints under /files: GET /files/duplicates, POST /files/duplicates/dismiss,
  POST /files/duplicates/resolve (registered before /:id to avoid collision).
  Plain delete reuses /files/bulk/delete.
- Repo support: ListMissingPHash, ListAllPHashes, CopyPoolMemberships, plus the
  DuplicatePairRepo (ReplaceAll via COPY, ListVisible) and DismissalRepo.

Unit tests cover the BK-tree pairing, union-find clustering, metadata merge and
field validation; an integration test covers rescan -> list -> merge -> dismiss
(including that a dismissal survives a re-rescan).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 12:42:37 +03:00
H1K0 48e901cac1 feat(backend): per-file review status with DSL filter and bulk endpoint
Replaces the old "untagged" sentinel tag with a proper per-file workflow
status: needs_review starts true on upload/import and is cleared by an
explicit action (no auto-clear on tagging). Surfaced as a filter token
(r=1 needs review, r=0 done) so it combines with tag/MIME conditions, and
toggled via POST /files/bulk/review (single id or many, edit-ACL enforced,
audit-logged as file_review).

needs_review lives on data.files (column added to the original 003 migration,
partial index in 006, action type seeded in 007).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:16:47 +03:00
H1K0 ecad017274 refactor(backend): split monolithic migration into 7 goose files
001_init_schemas  — extensions, schemas, uuid_v7 functions
002_core_tables   — core.users, mime_types, object_types
003_data_tables   — data.categories, tags, tag_rules, files, file_tag, pools, file_pool
004_acl_tables    — acl.permissions
005_activity_tables — activity.action_types, sessions, file_views, pool_views, tag_uses, audit_log
006_indexes       — all indexes across all schemas
007_seed_data     — object_types and action_types reference rows

Each file has -- +goose Up / Down annotations; downs drop in reverse
dependency order.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:40:36 +03:00