chore(scripts): add legacy data migration
deploy / deploy (push) Successful in 5s

One-time migration from the old Python/Flask Tanabata DB into the new
core/data/acl/activity schema.

- transform.sql: reads a `legacy` schema and writes the new one in a single,
  idempotent transaction. Remaps user/mime ids (uuid -> smallint by name),
  inverts is_private -> is_public, lifts EXIF out of files.metadata into the
  exif column, preserves pool hierarchy/created under metadata, synthesises
  file_pool ordering, derives acl object types, sanitises colors/notes.
- migrate.sh: links the new DB to the old one via postgres_fdw, imports the
  old public schema as `legacy`, runs the transform, tears the link down.
- README.md: mapping table, decisions/lossy points, and the separate
  physical-blob copy step.
- docs/reference/schema.sql: the old DB schema the migration is built from
  (referenced by the README).

Verified end-to-end on PostgreSQL 16 (synthetic legacy data, all
transformations and idempotency checked).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 12:42:06 +03:00
parent 437b66e73a
commit 76942721ad
4 changed files with 2701 additions and 0 deletions
File diff suppressed because it is too large
+103
@@ -0,0 +1,103 @@
# Legacy data migration
Moves data from the **old** Tanabata database (the Python/Flask version, schema
in [`docs/reference/schema.sql`](../../docs/reference/schema.sql)) into the
**new** `core` / `data` / `acl` / `activity` schema.
- [`transform.sql`](transform.sql) — the actual data transformation. Reads a
`legacy` schema (the old tables) and writes the new schema, in one
transaction. Idempotent.
- [`migrate.sh`](migrate.sh) — links the new DB to the old one via
`postgres_fdw`, imports the old `public` schema as `legacy`, runs
`transform.sql`, then removes the link. The old DB is only **read**.
Tested end-to-end against PostgreSQL 16 (schema applied, synthetic legacy data,
all transformations + idempotency verified).
## Prerequisites
1. The **new** schema exists and is seeded — start the app once (it runs the
goose migrations incl. `007_seed_data`), or run goose manually.
2. `NEW_DSN` connects as a role allowed to `CREATE EXTENSION postgres_fdw`
(a superuser — the compose Postgres' `POSTGRES_USER` is one).
3. The new Postgres server can reach the old DB host over the network.
4. `psql` on PATH.
## Run
```bash
cd scripts/migrate-legacy
NEW_DSN='postgres://tanabata:PASS@localhost:42777/tanabata' \
OLD_HOST=192.168.1.10 OLD_PORT=5432 OLD_DB=tfm \
OLD_USER=hiko OLD_PASSWORD=SECRET \
./migrate.sh
```
It prints the source (legacy) row counts, then the resulting new-schema counts.
Re-running is safe — `ON CONFLICT DO NOTHING` everywhere means a second run only
fills in what is missing.
### Without postgres_fdw
`transform.sql` only needs the old tables to be visible as a `legacy` schema. If
you'd rather not use fdw, load the old dump into a schema named `legacy` in the
new database by whatever means, then run just the transform:
```bash
psql "$NEW_DSN" -v ON_ERROR_STOP=1 -f transform.sql
```
## What gets migrated, and how
| Old (`public`) | New | Notes |
|-----------------------|-------------------------|-------|
| `users` | `core.users` | id **uuid → smallint** (remapped by unique `name`); `can_edit` → `can_create`; `is_blocked` = false |
| `mime` | `core.mime_types` | id **uuid → smallint** (remapped by `name`); types not already seeded are added |
| `categories` | `data.categories` | id kept; `is_private` → **`is_public`** (inverted) |
| `tags` | `data.tags` | id + `category_id` kept; inverted privacy |
| `autotags` | `data.tag_rules` | `parent_id` → `when_tag_id`, `child_id` → `then_tag_id` |
| `files` | `data.files` | id kept; `datetime` → `content_datetime`; `orig_name` → `original_name`; **EXIF** lifted from `metadata->'exif'` into the `exif` column, the rest stays as user `metadata` |
| `file_tag` | `data.file_tag` | orphan rows skipped |
| `pools` | `data.pools` | id kept; `parent_id` + `created` preserved under `metadata` (see below) |
| `file_pool` | `data.file_pool` | `position` synthesised (gapped 1000s, ordered by file id) |
| `acl` | `acl.permissions` | object type **derived** by locating the object; `read`/`write` → `can_view`/`can_edit` |
| `file_views` | `activity.file_views` | `datetime` → `viewed_at` |
Throughout: empty `notes` (`''`) → `NULL`; colours that aren't exactly six hex digits are set to
`NULL` (the old `CHECK` was `NOT VALID`, so bad values could exist).
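
To see in advance which colours will be dropped, the same rule can be mirrored in the shell. This is a minimal sketch. It assumes colours are stored as bare hex with no leading `#`, which is what the transform's `^[A-Fa-f0-9]{6}$` check accepts. `sanitize_color` and `OLD_DSN` (a connection string for the old database) are hypothetical names, not part of the scripts.

```bash
# Hypothetical helper mirroring transform.sql's colour rule:
# print the value only if it is exactly six hex digits; otherwise print
# nothing (the SQL equivalent yields NULL).
sanitize_color() {
  local c="$1"
  if [[ "$c" =~ ^[A-Fa-f0-9]{6}$ ]]; then
    printf '%s\n' "$c"
  fi
}

# Example: list old category/tag colours that the transform would set to NULL.
# OLD_DSN is a hypothetical connection string for the OLD database.
# psql "$OLD_DSN" -At -c "SELECT color FROM categories WHERE color IS NOT NULL
#                         UNION ALL SELECT color FROM tags WHERE color IS NOT NULL" |
#   while IFS= read -r c; do
#     [[ -n "$(sanitize_color "$c")" ]] || printf 'dropped: %s\n' "$c"
#   done
```
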
### Decisions / lossy points
- **Passwords** are copied verbatim. If the old hashes are bcrypt (as the new
app expects) logins keep working; otherwise affected users need a reset.
- **`created` timestamps** on categories/tags/files are dropped — their UUIDv7
ids already encode creation time. Pools use random v4 ids, so their `created`
(and the dropped **pool hierarchy** `parent_id`) are preserved under
`data.pools.metadata` as `legacy_created` / `legacy_parent_id`.
- **`file_pool` ordering**: the old schema stored none, so position is generated
from file-id order (≈ chronological) with gaps of 1000.
- **Not migrated**: `sessions` / `user_agents` — the new app uses JWTs, so users
simply log in again. There were no audit-log / pool-view / tag-use tables in
the old schema, so those start empty. The new `phash` and `is_deleted` columns
are set to `NULL` and `false` respectively.
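
The password point above can be checked before migrating. This is a minimal sketch. It assumes the common modular-crypt bcrypt layout: a `$2a$`, `$2b$` or `$2y$` prefix, a two-digit cost, `$`, then 53 characters from `./A-Za-z0-9`, for 60 characters in total. Confirm which variants the new app actually accepts against its code. `is_bcrypt` and `OLD_DSN` are hypothetical names.

```bash
# Hypothetical helper: succeeds if the argument looks like a bcrypt hash
# ($2a$/$2b$/$2y$, two-digit cost, '$', 53 chars of bcrypt's base64 alphabet).
is_bcrypt() {
  local re='^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
  [[ "$1" =~ $re ]]
}

# Example: list old users whose hash is not bcrypt and who will need a reset.
# OLD_DSN is a hypothetical connection string for the OLD database; psql -A
# separates columns with '|'.
# psql "$OLD_DSN" -At -c "SELECT name, password FROM users" |
#   while IFS='|' read -r name hash; do
#     is_bcrypt "$hash" || printf 'needs reset: %s\n' "$name"
#   done
```
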
## Physical files (separate, manual)
The script migrates the **database only**. File blobs must be copied too. The
new layout stores originals at `FILES_PATH/{uuid}` with **no extension**;
thumbnails/previews are regenerated on demand, so don't copy those. Because ids
are preserved, the old `{uuid}.{ext}` files map 1:1 — just strip the extension:
```bash
OLD_FILES=/srv/old-tanabata/files # old originals ({uuid}.{ext})
NEW_FILES=/var/lib/tanabata/files # new FILES_PATH
for src in "$OLD_FILES"/*; do
  [ -f "$src" ] || continue              # skip subdirectories (and the unexpanded glob if the dir is empty)
  id="$(basename "$src")"; id="${id%.*}" # uuids contain no dots
  cp -n "$src" "$NEW_FILES/$id"          # -n: never overwrite an existing blob
done
# Make them readable by the container user (uid/gid 42776); run as root:
chown -R 42776:42776 "$NEW_FILES"
```
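
After copying, it helps to confirm that every migrated `data.files` row has a blob. This is a minimal sketch. It assumes the ids are fed one per line on stdin, for example from `psql -At`. `missing_blobs` is a hypothetical helper, not part of the scripts.

```bash
# Hypothetical helper: read file ids (one per line) on stdin and print every
# id that has no regular file at DIR/<id>. Returns 1 if any are missing.
missing_blobs() {
  local dir="$1" id missing=0
  while IFS= read -r id; do
    [[ -z "$id" ]] && continue   # ignore blank lines
    if [[ ! -f "$dir/$id" ]]; then
      printf '%s\n' "$id"
      missing=1
    fi
  done
  return "$missing"
}

# Example (NEW_DSN as in "Run", NEW_FILES as in the copy loop):
# psql "$NEW_DSN" -At -c "SELECT id FROM data.files" | missing_blobs "$NEW_FILES"
```

The helper prints the missing ids rather than stopping at the first one, so a single pass shows everything the copy loop skipped.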
+92
@@ -0,0 +1,92 @@
#!/usr/bin/env bash
# =============================================================================
# Tanabata legacy -> new schema migration (orchestrator)
#
# Connects the NEW database to the OLD one via postgres_fdw, imports the old
# `public` schema as `legacy`, runs transform.sql (the actual data move, in one
# transaction), then tears the foreign link down again. The OLD database is
# only read.
#
# Prerequisites:
# - The NEW schema already exists and is seeded (start the app once, or run
# goose, so all migrations incl. 007_seed_data have applied).
# - NEW_DSN connects as a role allowed to CREATE EXTENSION postgres_fdw
# (a superuser; the compose Postgres' POSTGRES_USER is one).
# - The NEW Postgres server can reach OLD_HOST:OLD_PORT over the network.
# - `psql` is on PATH.
#
# Usage:
# NEW_DSN='postgres://tanabata:pass@localhost:42777/tanabata' \
# OLD_HOST=192.168.1.10 OLD_DB=tfm OLD_USER=hiko OLD_PASSWORD=secret \
# ./migrate.sh
# =============================================================================
set -euo pipefail
# --- Config from the environment --------------------------------------------
NEW_DSN="${NEW_DSN:?set NEW_DSN to the new database connection string}"
OLD_HOST="${OLD_HOST:?set OLD_HOST}"
OLD_PORT="${OLD_PORT:-5432}"
OLD_DB="${OLD_DB:?set OLD_DB (old database name)}"
OLD_USER="${OLD_USER:?set OLD_USER}"
OLD_PASSWORD="${OLD_PASSWORD:?set OLD_PASSWORD}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TRANSFORM_SQL="$SCRIPT_DIR/transform.sql"
psql_new() { psql "$NEW_DSN" -v ON_ERROR_STOP=1 "$@"; }
# --- Always remove the foreign link on exit, success or failure -------------
teardown() {
  psql "$NEW_DSN" -q >/dev/null 2>&1 <<'SQL' || true
DROP SCHEMA IF EXISTS legacy CASCADE;
DROP SERVER IF EXISTS legacy_src CASCADE;
SQL
}
trap teardown EXIT
echo ">> Linking NEW database to OLD ($OLD_USER@$OLD_HOST:$OLD_PORT/$OLD_DB) via postgres_fdw ..."
psql_new \
-v old_host="$OLD_HOST" \
-v old_port="$OLD_PORT" \
-v old_db="$OLD_DB" \
-v old_user="$OLD_USER" \
-v old_pw="$OLD_PASSWORD" <<'SQL'
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
-- Start clean in case a previous run was interrupted.
DROP SCHEMA IF EXISTS legacy CASCADE;
DROP SERVER IF EXISTS legacy_src CASCADE;
CREATE SERVER legacy_src FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host :'old_host', port :'old_port', dbname :'old_db');
-- :'old_user' / :'old_pw' are quoted+escaped by psql, so passwords with
-- special characters are safe.
CREATE USER MAPPING FOR CURRENT_USER SERVER legacy_src
OPTIONS (user :'old_user', password :'old_pw');
CREATE SCHEMA legacy;
IMPORT FOREIGN SCHEMA public LIMIT TO (
users, mime, categories, tags, autotags, files, file_tag, pools, file_pool, acl, file_views
) FROM SERVER legacy_src INTO legacy;
SQL
echo ">> Source (legacy) row counts:"
psql_new -P pager=off -c "
SELECT 'users' AS table, count(*) FROM legacy.users
UNION ALL SELECT 'mime', count(*) FROM legacy.mime
UNION ALL SELECT 'categories', count(*) FROM legacy.categories
UNION ALL SELECT 'tags', count(*) FROM legacy.tags
UNION ALL SELECT 'autotags', count(*) FROM legacy.autotags
UNION ALL SELECT 'files', count(*) FROM legacy.files
UNION ALL SELECT 'file_tag', count(*) FROM legacy.file_tag
UNION ALL SELECT 'pools', count(*) FROM legacy.pools
UNION ALL SELECT 'file_pool', count(*) FROM legacy.file_pool
UNION ALL SELECT 'acl', count(*) FROM legacy.acl
UNION ALL SELECT 'file_views', count(*) FROM legacy.file_views
ORDER BY 1;"
echo ">> Running transform (single transaction) ..."
psql_new -P pager=off -f "$TRANSFORM_SQL"
echo ">> Done. The foreign link will be removed now."
+220
@@ -0,0 +1,220 @@
-- =============================================================================
-- Tanabata legacy -> new schema data migration (transform step)
--
-- Reads the OLD database (exposed as the `legacy` schema — see migrate.sh, which
-- imports it via postgres_fdw) and inserts the transformed rows into the new
-- core / data / acl / activity schemas.
--
-- Assumes the new schema already exists (goose migrations applied) and is seeded
-- (core.mime_types, core.object_types from 007_seed_data.sql).
--
-- Idempotent: ON CONFLICT DO NOTHING everywhere + preserved UUID PKs, so a
-- re-run inserts only what is missing. Runs as one transaction — all or nothing.
--
-- Run with: psql "<new-dsn>" -v ON_ERROR_STOP=1 -f transform.sql
-- (migrate.sh does this for you after setting up the `legacy` schema.)
-- =============================================================================
\set ON_ERROR_STOP on
-- Fail early and clearly if the legacy data hasn't been made available.
DO $$
BEGIN
  IF to_regclass('legacy.users') IS NULL THEN
    RAISE EXCEPTION
      'legacy.* tables not found. Populate the "legacy" schema first '
      '(run migrate.sh, or load the old dump into a schema named legacy).';
  END IF;
END $$;
BEGIN;
-- ---------------------------------------------------------------------------
-- 1. Users. Old PK is uuid; the new table uses a smallint identity. Insert by
-- the unique `name`, then build a uuid -> smallint map used by every FK below.
-- Old `can_edit` becomes the new `can_create`; nobody is blocked on import.
-- ---------------------------------------------------------------------------
INSERT INTO core.users (name, password, is_admin, can_create, is_blocked)
SELECT name, password, is_admin, can_edit, false
FROM legacy.users
ON CONFLICT (name) DO NOTHING;
CREATE TEMP TABLE user_id_map ON COMMIT DROP AS
SELECT lu.id AS old_id, nu.id AS new_id
FROM legacy.users lu
JOIN core.users nu ON nu.name = lu.name;
-- ---------------------------------------------------------------------------
-- 2. MIME types. Same uuid -> smallint remap, keyed by the MIME name. The new
-- DB is pre-seeded with the common types; add any legacy ones not seeded.
-- ---------------------------------------------------------------------------
INSERT INTO core.mime_types (name, extension)
SELECT name, extension
FROM legacy.mime
ON CONFLICT (name) DO NOTHING;
CREATE TEMP TABLE mime_id_map ON COMMIT DROP AS
SELECT lm.id AS old_id, nm.id AS new_id
FROM legacy.mime lm
JOIN core.mime_types nm ON nm.name = lm.name;
-- ---------------------------------------------------------------------------
-- 3. Categories. UUID PK preserved. is_private -> is_public (inverted),
-- '' notes -> NULL, non-hex colors -> NULL (to satisfy the hex CHECK that the
-- old NOT VALID constraint may not have enforced on existing rows).
-- ---------------------------------------------------------------------------
INSERT INTO data.categories (id, name, notes, color, metadata, creator_id, is_public)
SELECT c.id,
c.name,
NULLIF(c.notes, ''),
CASE WHEN c.color ~* '^[A-Fa-f0-9]{6}$' THEN c.color END,
NULL,
um.new_id,
NOT c.is_private
FROM legacy.categories c
JOIN user_id_map um ON um.old_id = c.creator_id
ON CONFLICT (id) DO NOTHING;
-- ---------------------------------------------------------------------------
-- 4. Tags. UUID PK + category_id preserved.
-- ---------------------------------------------------------------------------
INSERT INTO data.tags (id, name, notes, color, category_id, metadata, creator_id, is_public)
SELECT t.id,
t.name,
NULLIF(t.notes, ''),
CASE WHEN t.color ~* '^[A-Fa-f0-9]{6}$' THEN t.color END,
t.category_id,
NULL,
um.new_id,
NOT t.is_private
FROM legacy.tags t
JOIN user_id_map um ON um.old_id = t.creator_id
ON CONFLICT (id) DO NOTHING;
-- ---------------------------------------------------------------------------
-- 5. Tag rules (old `autotags`): parent -> when_tag, child -> then_tag.
-- Skip rules whose tags didn't migrate.
-- ---------------------------------------------------------------------------
INSERT INTO data.tag_rules (when_tag_id, then_tag_id, is_active)
SELECT a.parent_id, a.child_id, a.is_active
FROM legacy.autotags a
WHERE EXISTS (SELECT 1 FROM data.tags t WHERE t.id = a.parent_id)
AND EXISTS (SELECT 1 FROM data.tags t WHERE t.id = a.child_id)
ON CONFLICT (when_tag_id, then_tag_id) DO NOTHING;
-- ---------------------------------------------------------------------------
-- 6. Files. UUID PK preserved. old `datetime` -> content_datetime,
-- `orig_name` -> original_name. EXIF is lifted out of the old metadata blob
-- into its own column; whatever else was in metadata stays as user metadata
-- (NULL if nothing remains). No phash / soft-delete existed before.
-- ---------------------------------------------------------------------------
INSERT INTO data.files (id, original_name, mime_id, content_datetime, notes,
metadata, exif, phash, creator_id, is_public, is_deleted)
SELECT f.id,
f.orig_name,
mm.new_id,
f.datetime,
NULLIF(f.notes, ''),
NULLIF(f.metadata - 'exif', '{}'::jsonb),
f.metadata -> 'exif',
NULL,
um.new_id,
NOT f.is_private,
false
FROM legacy.files f
JOIN user_id_map um ON um.old_id = f.creator_id
JOIN mime_id_map mm ON mm.old_id = f.mime_id
ON CONFLICT (id) DO NOTHING;
-- ---------------------------------------------------------------------------
-- 7. File <-> tag. Skip orphan junction rows.
-- ---------------------------------------------------------------------------
INSERT INTO data.file_tag (file_id, tag_id)
SELECT ft.file_id, ft.tag_id
FROM legacy.file_tag ft
WHERE EXISTS (SELECT 1 FROM data.files f WHERE f.id = ft.file_id)
AND EXISTS (SELECT 1 FROM data.tags t WHERE t.id = ft.tag_id)
ON CONFLICT DO NOTHING;
-- ---------------------------------------------------------------------------
-- 8. Pools. UUID PK preserved. The new schema has neither pool hierarchy nor a
-- `created` column, so the legacy parent_id and created timestamp are kept
-- under metadata (pool ids are random v4, so created isn't otherwise
-- recoverable). is_private -> is_public.
-- ---------------------------------------------------------------------------
INSERT INTO data.pools (id, name, notes, metadata, creator_id, is_public)
SELECT p.id,
p.name,
NULLIF(p.notes, ''),
jsonb_strip_nulls(jsonb_build_object(
'legacy_parent_id', p.parent_id,
'legacy_created', p.created)),
um.new_id,
NOT p.is_private
FROM legacy.pools p
JOIN user_id_map um ON um.old_id = p.creator_id
ON CONFLICT (id) DO NOTHING;
-- ---------------------------------------------------------------------------
-- 9. File <-> pool. The old table has no ordering column; synthesise a stable
-- gapped position per pool, ordered by file id (UUID v7 ≈ chronological), so
-- the app's gap-based reordering keeps working.
-- ---------------------------------------------------------------------------
INSERT INTO data.file_pool (file_id, pool_id, position)
SELECT fp.file_id,
fp.pool_id,
(row_number() OVER (PARTITION BY fp.pool_id ORDER BY fp.file_id))::int * 1000
FROM legacy.file_pool fp
WHERE EXISTS (SELECT 1 FROM data.files f WHERE f.id = fp.file_id)
AND EXISTS (SELECT 1 FROM data.pools p WHERE p.id = fp.pool_id)
ON CONFLICT DO NOTHING;
-- ---------------------------------------------------------------------------
-- 10. ACL. The old table stored no object type; derive it by locating the
-- object among files/tags/categories/pools. read/write -> can_view/can_edit.
-- Rows whose object no longer exists are skipped.
-- ---------------------------------------------------------------------------
INSERT INTO acl.permissions (user_id, object_type_id, object_id, can_view, can_edit)
SELECT um.new_id, ot.id, a.object_id, a.read, a.write
FROM legacy.acl a
JOIN user_id_map um ON um.old_id = a.user_id
JOIN LATERAL (
SELECT CASE
WHEN EXISTS (SELECT 1 FROM data.files f WHERE f.id = a.object_id) THEN 'file'
WHEN EXISTS (SELECT 1 FROM data.tags t WHERE t.id = a.object_id) THEN 'tag'
WHEN EXISTS (SELECT 1 FROM data.categories c WHERE c.id = a.object_id) THEN 'category'
WHEN EXISTS (SELECT 1 FROM data.pools p WHERE p.id = a.object_id) THEN 'pool'
END AS type_name
) k ON true
JOIN core.object_types ot ON ot.name = k.type_name
ON CONFLICT (user_id, object_type_id, object_id) DO NOTHING;
-- ---------------------------------------------------------------------------
-- 11. File view history. old `datetime` -> viewed_at.
-- ---------------------------------------------------------------------------
INSERT INTO activity.file_views (file_id, user_id, viewed_at)
SELECT fv.file_id, um.new_id, fv.datetime
FROM legacy.file_views fv
JOIN user_id_map um ON um.old_id = fv.user_id
WHERE EXISTS (SELECT 1 FROM data.files f WHERE f.id = fv.file_id)
ON CONFLICT DO NOTHING;
COMMIT;
-- ---------------------------------------------------------------------------
-- Summary of what now lives in the new schema.
-- ---------------------------------------------------------------------------
\echo ''
\echo 'Migration committed. New row counts:'
SELECT 'core.users' AS table, count(*) FROM core.users
UNION ALL SELECT 'core.mime_types', count(*) FROM core.mime_types
UNION ALL SELECT 'data.categories', count(*) FROM data.categories
UNION ALL SELECT 'data.tags', count(*) FROM data.tags
UNION ALL SELECT 'data.tag_rules', count(*) FROM data.tag_rules
UNION ALL SELECT 'data.files', count(*) FROM data.files
UNION ALL SELECT 'data.file_tag', count(*) FROM data.file_tag
UNION ALL SELECT 'data.pools', count(*) FROM data.pools
UNION ALL SELECT 'data.file_pool', count(*) FROM data.file_pool
UNION ALL SELECT 'acl.permissions', count(*) FROM acl.permissions
UNION ALL SELECT 'activity.file_views', count(*) FROM activity.file_views
ORDER BY 1;