tech/tanabata

Fork 0

Files

T

History

H1K0 76942721ad

deploy / deploy (push) Successful in 5s

Details

chore(scripts): add legacy data migration

One-time migration from the old Python/Flask Tanabata DB into the new
core/data/acl/activity schema.

- transform.sql: reads a `legacy` schema and writes the new one in a single,
  idempotent transaction. Remaps user/mime ids (uuid -> smallint by name),
  inverts is_private -> is_public, lifts EXIF out of files.metadata into the
  exif column, preserves pool hierarchy/created under metadata, synthesises
  file_pool ordering, derives acl object types, sanitises colors/notes.
- migrate.sh: links the new DB to the old one via postgres_fdw, imports the
  old public schema as `legacy`, runs the transform, tears the link down.
- README.md: mapping table, decisions/lossy points, and the separate
  physical-blob copy step.
- docs/reference/schema.sql: the old DB schema the migration is built from
  (referenced by the README).

Verified end-to-end on PostgreSQL 16 (synthetic legacy data, all
transformations and idempotency checked).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-11 12:43:43 +03:00

migrate.sh

chore(scripts): add legacy data migration

2026-06-11 12:43:43 +03:00

README.md

chore(scripts): add legacy data migration

2026-06-11 12:43:43 +03:00

transform.sql

chore(scripts): add legacy data migration

2026-06-11 12:43:43 +03:00

README.md

Legacy data migration

Moves data from the old Tanabata database (the Python/Flask version, schema in docs/reference/schema.sql) into the new core / data / acl / activity schema.

transform.sql — the actual data transformation. Reads a legacy schema (the old tables) and writes the new schema, in one transaction. Idempotent.
migrate.sh — links the new DB to the old one via postgres_fdw, imports the old public schema as legacy, runs transform.sql, then removes the link. The old DB is only read.

Tested end-to-end against PostgreSQL 16 (schema applied, synthetic legacy data, all transformations + idempotency verified).

Prerequisites

The new schema exists and is seeded — start the app once (it runs the goose migrations incl. 007_seed_data), or run goose manually.
NEW_DSN connects as a role allowed to CREATE EXTENSION postgres_fdw (a superuser — the compose Postgres' POSTGRES_USER is one).
The new Postgres server can reach the old DB host over the network.
psql on PATH.

Run

cd scripts/migrate-legacy

NEW_DSN='postgres://tanabata:PASS@localhost:42777/tanabata' \
OLD_HOST=192.168.1.10 OLD_PORT=5432 OLD_DB=tfm \
OLD_USER=hiko OLD_PASSWORD=SECRET \
./migrate.sh

It prints the source (legacy) row counts, then the resulting new-schema counts. Re-running is safe — ON CONFLICT DO NOTHING everywhere means a second run only fills in what is missing.

Without postgres_fdw

transform.sql only needs the old tables to be visible as a legacy schema. If you'd rather not use fdw, load the old dump into a schema named legacy in the new database by whatever means, then run just the transform:

psql "$NEW_DSN" -v ON_ERROR_STOP=1 -f transform.sql

What gets migrated, and how

Old (`public`)	New	Notes
`users`	`core.users`	id uuid → smallint (remapped by unique `name`); `can_edit` → `can_create`; `is_blocked` = false
`mime`	`core.mime_types`	id uuid → smallint (remapped by `name`); types not already seeded are added
`categories`	`data.categories`	id kept; `is_private` → `is_public` (inverted)
`tags`	`data.tags`	id + `category_id` kept; inverted privacy
`autotags`	`data.tag_rules`	`parent_id` → `when_tag_id`, `child_id` → `then_tag_id`
`files`	`data.files`	id kept; `datetime` → `content_datetime`; `orig_name` → `original_name`; EXIF lifted from `metadata->'exif'` into the `exif` column, the rest stays as user `metadata`
`file_tag`	`data.file_tag`	orphan rows skipped
`pools`	`data.pools`	id kept; `parent_id` + `created` preserved under `metadata` (see below)
`file_pool`	`data.file_pool`	`position` synthesised (gapped 1000s, ordered by file id)
`acl`	`acl.permissions`	object type derived by locating the object; `read`/`write` → `can_view`/`can_edit`
`file_views`	`activity.file_views`	`datetime` → `viewed_at`

Throughout: empty notes ('') → NULL; colours that aren't 6-hex are set to NULL (the old CHECK was NOT VALID, so bad values could exist).

Decisions / lossy points

Passwords are copied verbatim. If the old hashes are bcrypt (as the new app expects) logins keep working; otherwise affected users need a reset.
created timestamps on categories/tags/files are dropped — their UUIDv7 ids already encode creation time. Pools use random v4 ids, so their created (and the dropped pool hierarchy parent_id) are preserved under data.pools.metadata as legacy_created / legacy_parent_id.
file_pool ordering: the old schema stored none, so position is generated from file-id order (≈ chronological) with gaps of 1000.
Not migrated: sessions / user_agents — the new app uses JWTs, so users simply log in again. There were no audit-log / pool-view / tag-use tables in the old schema, so those start empty. phash and is_deleted are new (NULL / false).

Physical files (separate, manual)

The script migrates the database only. File blobs must be copied too. The new layout stores originals at FILES_PATH/{uuid} with no extension; thumbnails/previews are regenerated on demand, so don't copy those. Because ids are preserved, the old {uuid}.{ext} files map 1:1 — just strip the extension:

OLD_FILES=/srv/old-tanabata/files     # old originals ({uuid}.{ext})
NEW_FILES=/var/lib/tanabata/files     # new FILES_PATH

for src in "$OLD_FILES"/*; do
  id="$(basename "$src")"; id="${id%.*}"   # uuids contain no dots
  cp -n "$src" "$NEW_FILES/$id"
done

# Make them readable by the container user (uid/gid 42776):
chown -R 42776:42776 "$NEW_FILES"