Skip to content

reco-algo branch — hub search overhaul + reco instrumentation

Branch cut from main on 2026-04-23 to fix Jefray's "not all published games can be found through search" report. Scope expanded to cover three related search-quality issues, then further expanded to bootstrap the recommendation system with behavioral tracking + a first-pass ranking improvement (freshness).

Status: RESUMED on 2026-04-25. Coordination resolved — vk/adea-reload-plugin is being discarded and its better-on-specific-axes pieces have been integrated into this branch directly (Phase 6 below). Reco work continues from here.

Commits on origin/reco-algo (bottom → top)

SHAPhaseSummary
8716ab00f1Unhide 44 published language-variant worlds in hub search; search extendedDescription + multilang variant description.
8af9ca575Dev proxy fix (Hono → Vite HMR invalid upgrade header spam). Not user-facing.
439d3a30a1.5Search goes global — favorited/library worlds stop being hidden; tab filters (Following, Recommended) stop scoping the query.
d5ea96b182Postgres FTS (tsvector + ts_rank) + pg_trgm fuzzy typo match + GIN indexes.
eb63832b22.5OpenCC simplified/traditional Chinese bridge (壶 ↔ 壺).
16435054dThis doc.
ce0c43d7bRenumber our migrations 0022/0023 → 0024/0025 to make room for main's 0022_worlds_target_audience.sql.
17dde8cc2Merge main (audience feature) into reco-algo.
2d9e290253PostHog instrumentation: hub_impression / hub_click / hub_preview_open / hub_preview_close / hub_serve. Client + server wrappers with env tagging + test bailout.
e3fed08453+Additional conversion events: hub_play_start (click "Start Playing"), hub_library_add (click "Add to Library").
2b5128b135Freshness boost: +15 points at publish, linear decay to 0 over 14 days. Added to qualityBase in scoreRecommendationBase.
(this commit)6Integrated vk/adea-reload-plugin improvements: soft/hard penalty tiering (favorites/reviews → soft, library → hard), creator + tag-cluster diversity (CREATOR_PENALTY_WEIGHT=12, TAG_CLUSTER_PENALTY_WEIGHT=7), DB-backed editorial boosts (editorial_boosts + featured_worlds tables, admin CRUD, 60s cache), public /api/worlds/featured, diverse cold-start tag buckets, hub-card StatusBadges (favorited/reviewed). Migration 0026 seeded with the prior hardcoded boosts so behavior is preserved on day-one.

Each commit's message on-branch has full context and rationale.

What actually changed

For users (search overhaul — Phases 1–2.5)

  • Searching "Sakura Season" on the Recommended tab returns the English variant instead of silently missing it.
  • Favoriting a world no longer hides it from your own search results.
  • Typing on the Following tab searches all creators, not just ones you follow.
  • "dectective" / "frirren" (typos) return the intended worlds.
  • Simplified Chinese queries find traditional Chinese content and vice versa.

For users (reco overhaul — Phase 5)

  • New worlds get a freshness boost that lets them compete with mature high-download worlds for their first 14 days.

For the team (Phase 3)

  • Every hub interaction now emits typed PostHog events tagged with environment (production/development/preview). Dashboards can slice prod vs dev without cross-pollution.
  • Server-emitted hub_serve event contains the ordered list of world_ids returned, plus feed parameters (surface, sort, tier, has_query, tag_count). Client events echo feed_request_id so impressions/clicks join back to the server response.
  • Identity hook: identifyAnalyticsUser(userId) fires on profile load, resetAnalyticsUser() on logout. Authenticated events land on the right PostHog person.

For prod data

  • Audit (scripts/audit-search-coverage.ts) found 44 published variant worlds hidden by the old dedup rule. All 44 become findable once Phase 1 deploys. Zero drift between is_published and status='published'.

Test coverage (50 server tests, all passing)

  • packages/server/src/lib/recommendations.search.test.ts — 12 tests that serialize the generated SQL via Drizzle's PgDialect and assert WHERE-clause structure. No DB required.
  • packages/server/src/lib/recommendations.scope.test.ts — 6 unit tests for rankRecommendedWorlds behavior under search (favorited visibility, own-work exclusion, whitespace handling).
  • packages/server/src/lib/recommendations.test.ts — freshness regression test (full 15pt at age 0, ~7.5pt at 7d, 0pt at 30d).

Phase 6 — vk/adea-reload-plugin integration (landed 2026-04-25)

The competing branch was discarded; its differentiated pieces were ported into reco-algo directly. Behavior at the boundaries:

  • Soft/hard penalty tieringRecommendationProfile gains softPenaltyWorldIds: Map<string, number>, favoritedWorldIds: Set, reviewedWorldIds: Set. Favorites and reviews stop adding to excludedWorldIds; only library entries with no sibling variant remain hard-excluded. scoreRecommendationBase subtracts the soft-penalty value from baseScore. Brackets: fav={fresh:12, mid:8, old:4}, review={fresh:10, mid:7, old:4}, buckets at 7d / 30d / older. Multi-signal worlds keep the larger penalty (Math.max).
  • Diversity weights enabledCREATOR_PENALTY_WEIGHT=12 (was 0), TAG_CLUSTER_PENALTY_WEIGHT=7 (new). Tag-cluster uses the candidate's first tag as the dominant cluster signal (creators put intent first).
  • Editorial DB tables — new editorial_boosts (with XOR check on worldId vs languageGroupId, plus optional starts_at / ends_at) and featured_worlds (slot-unique). Migration 0026 seeds the prior hardcoded values so day-one behavior is preserved. Admin CRUD at POST/DELETE /api/admin/editorial/boosts, GET/PUT/DELETE /api/admin/editorial/featured/:slot. Reco profile picks up boosts via a closure (profile.editorialBoostFor) that closes over a 60s-TTL cache snapshot.
  • Public featured endpointGET /api/worlds/featured reads the featured_worlds table, joins worlds, applies language-variant resolution, returns slot-ordered. 60s Cache-Control header.
  • Cold-start tag fan-out — exported COLD_START_BUCKETS (5 buckets × 8 candidates each: 角色卡 / 世界卡 / 模拟器 / 小说 / 游戏). DEFAULT_ROUTE_LIMITS.cold.popularRecent reduced 48 → 16 to make room. Tag-cluster rerank then enforces diversity across the head.
  • Hub-card badges — new StatusBadges component renders Favorited / Reviewed pills overlaid top-left on cards. Wired in hub-card.tsx, featured-section.tsx. rec-tab.tsx and following-tab.tsx subscribe to useFavoritesStore so pills flip instantly on toggle.

Phase 1.5 cleanup

The !filters.query && excludedWorldIds.has(…) gate stayed but its purpose narrowed: with favorites/reviews now soft, it only hides single-version library items from non-search browsing. Comment in recommendations.ts updated to reflect the narrower role.

What did NOT come over from vk

  • Most of vk's reco-test diff was carried (soft penalty brackets, diversity penalties, cold-start buckets). The editorial.test.ts file landed verbatim. Tests in recommendations.scope.test.ts remain valid since they exercise excludedWorldIds directly, which still hard-excludes library items.

Ship-to-production procedure

Do not merge to main before running migrations on the prod DB. Two new columns + indexes need to exist before the deployed code queries them.

Prerequisites

  • Postgres 17 psql locally:

    bash
    brew install postgresql@17

    Use it via the absolute path to avoid conflicts with any older brew-linked version: /usr/local/opt/postgresql@17/bin/psql.

  • Prod DATABASE_URL from Railway:

    Railway → Yumina project → Postgres service → Variables → DATABASE_URL

    The public URL (used from outside Railway) looks like postgresql://postgres:***@switchyard.proxy.rlwy.net:PORT/railway. Do NOT use postgres.railway.internal — that hostname is Railway-internal only.

Steps

Note on main's audience feature: this branch has been merged with main's feat(audience) changes, which add a target_audience column to worlds. Main's migration 0022_worlds_target_audience.sql is a backfill only — the column itself is added via pnpm db:push (drizzle generates the ALTER). If main has already shipped to prod, the column + backfill are already applied and you can skip step 0 below. If not (e.g., this PR is the first time audience hits prod), run step 0 first.

0. (If audience feature is NEW on prod) Add the target_audience column and backfill:

bash
cd /path/to/yumina
PROD='<railway-public-DATABASE_URL>'

# Drizzle's db:push generates and applies the column ALTER.
# Alternatively, run the SQL directly:
/usr/local/opt/postgresql@17/bin/psql "$PROD" -c \
  "ALTER TABLE worlds ADD COLUMN IF NOT EXISTS target_audience text NOT NULL DEFAULT 'all';"

# Main's backfill (idempotent — re-running is safe):
/usr/local/opt/postgresql@17/bin/psql "$PROD" \
  -f packages/server/drizzle/0022_worlds_target_audience.sql

Verify:

bash
/usr/local/opt/postgresql@17/bin/psql "$PROD" -c \
  "SELECT target_audience, COUNT(*) FROM worlds GROUP BY target_audience;"

Should show mostly all, plus male equal to the count of NSFW worlds pre-migration.

1. Apply migration 0024 (Phase 2: FTS + pg_trgm + GIN indexes)

bash
cd /path/to/yumina
PROD='<railway-public-DATABASE_URL>'
/usr/local/opt/postgresql@17/bin/psql "$PROD" \
  -f packages/server/drizzle/0024_hub_search_index.sql

Expected output:

CREATE EXTENSION
ALTER TABLE
CREATE FUNCTION
DROP TRIGGER        -- NOTICE on first run (trigger doesn't exist yet)
CREATE TRIGGER
UPDATE 256          -- backfill: N = current published+unpublished world count
CREATE INDEX
CREATE INDEX
CREATE INDEX

Verify:

bash
/usr/local/opt/postgresql@17/bin/psql "$PROD" -c \
  "SELECT COUNT(*) AS total, COUNT(search_doc) AS filled FROM worlds;"

total and filled should match.

2. Apply migration 0025 (Phase 2.5: trad/simp normalized column)

bash
/usr/local/opt/postgresql@17/bin/psql "$PROD" \
  -f packages/server/drizzle/0025_search_doc_normalized.sql

Expected: ALTER TABLE + CREATE INDEX. The column comes out NULL for every row — the backfill step below populates it.

2.5. Apply migration 0026 (Phase 6: editorial curation tables + seed)

bash
/usr/local/opt/postgresql@17/bin/psql "$PROD" \
  -f packages/server/drizzle/0026_editorial_curation.sql

Expected: CREATE TABLE × 2, ALTER TABLE (XOR check), CREATE INDEX × 3, plus seed INSERTs (each gated on world existence + non-existence of its row, so re-running is safe). Verify:

bash
/usr/local/opt/postgresql@17/bin/psql "$PROD" -c \
  "SELECT COUNT(*) FROM editorial_boosts; SELECT COUNT(*) FROM featured_worlds;"

Both counts should be > 0 (3 group + up-to-2 world boosts; 5 featured slots, fewer if some seed UUIDs aren't in prod).

3. Backfill search_doc_normalized for existing worlds

bash
cd packages/server
DATABASE_URL="$PROD" pnpm exec tsx ../../scripts/backfill-search-normalized.ts

Runs OpenCC (traditional → simplified) across every existing world and populates the new column. Progress logged every 50 rows. Finishes in ~10 seconds for a few hundred worlds. Safe to re-run — it only touches rows where search_doc_normalized IS NULL.

Verify:

bash
/usr/local/opt/postgresql@17/bin/psql "$PROD" -c \
  "SELECT COUNT(*) AS missing FROM worlds WHERE search_doc_normalized IS NULL;"

Should return 0.

4. Confirm PostHog env vars on Railway

The Phase 3 events only fire when env vars are set. Confirm:

POSTHOG_API_KEY              — server-side event emitter
POSTHOG_HOST                 — should be https://t.yumina.io (or us.i.posthog.com)
VITE_PUBLIC_POSTHOG_PROJECT_TOKEN  — client-side; must start with phc_
VITE_PUBLIC_POSTHOG_HOST     — same as server host

Missing vars = events silently no-op. Not destructive, just no data.

5. Merge reco-algomain

Hold off on this step until the ownership discussion with Jefray + Jason resolves. Coordinating with vk/adea-reload-plugin:

  • If they land first, rebase reco-algo onto updated main, drop the Phase 1.5 !filters.query && excludedWorldIds gate in favor of their soft/hard penalty system, keep everything else.
  • If reco-algo lands first, they need to rebase and reconcile.
  • If both land independently and diverge, someone handles the merge — probably whoever's PR is second.

When ready:

bash
gh pr create --base main --head reco-algo \
  --title "fix(hub): search overhaul + PostHog instrumentation + freshness boost" \
  --body "See RECO_ALGO_BRANCH.md for full context."
# ...review, then:
gh pr merge --squash   # or merge commit — project's usual style

Railway auto-deploys on push to main. Once the new server binary is live (~30 seconds of build + swap), the new searchDoc / searchDocNormalized columns are read by the query path, Phase 3 events start flowing, and freshness boost is applied in scoring.

6. Smoke test on yumina.io

  • Hub search for a world previously hidden by Phase 1's dedup bug (e.g., "Hell's Paradise" English variant).
  • Search a typo like "frirren" → Frieren should appear.
  • Search (simplified) → traditional 壺中の毒 should appear.
  • Favorite a world, then search it from the Recommended tab → still appears.
  • Open a world preview → click "Start Playing" → check PostHog Activity: hub_play_start event appears with environment=production.

Rollback

If any phase misbehaves, the migrations are each fully reversible. Drop in reverse order:

sql
-- Undo Phase 6 (editorial curation)
DROP TABLE IF EXISTS featured_worlds;
DROP TABLE IF EXISTS editorial_boosts;

-- Undo Phase 2.5
DROP INDEX IF EXISTS idx_worlds_search_doc_normalized_trgm;
ALTER TABLE worlds DROP COLUMN IF EXISTS search_doc_normalized;

-- Undo Phase 2
DROP INDEX IF EXISTS idx_worlds_search_doc;
DROP INDEX IF EXISTS idx_worlds_name_trgm;
DROP INDEX IF EXISTS idx_worlds_description_trgm;
DROP TRIGGER IF EXISTS worlds_search_doc_trg ON worlds;
DROP FUNCTION IF EXISTS worlds_search_doc_update();
ALTER TABLE worlds DROP COLUMN IF EXISTS search_doc;
-- (leave pg_trgm extension installed; nothing breaks by keeping it)

Then revert the git commits on main. DROP COLUMN is non-destructive — source columns (name, description, etc.) are untouched.

To roll back Phase 3 (PostHog) or Phase 5 (freshness): pure code changes, no DB. Revert the commits on main, Railway redeploys. Events stop flowing (Phase 3) or scoring reverts (Phase 5).


Caveats worth knowing

CJK full-text search is whitespace-tokenized

The simple FTS config does NOT tokenize Chinese/Japanese at the character level — 壺中の毒 is stored as one atomic token. A user searching for a single CJK character () will not match via tsvector @@ tsquery. We rely on ILIKE-substring-match against searchDocNormalized as the fallback, accelerated by a pg_trgm GIN index. This works but is slower than the tsvector path.

Proper CJK tokenization would require zhparser or pg_bigm, both of which are unavailable on Neon/Railway. Not fixable without moving off Neon.

Traditional ↔ simplified is one-way-normalized

The mapping table is traditional → simplified (via OpenCC from: 'tw', to: 'cn'). User queries and stored text both normalize to simplified. Edge cases:

  • Japanese shinjitai normalizes as if it were traditional Chinese — works for most kanji, may over-simplify in rare cases (e.g., 國 in a Japanese name becomes 国).
  • Ambiguous traditional→simplified mappings (e.g., 發/髮 → 发) use OpenCC's default rules. Proper nouns with unusual characters may normalize inconsistently. Not a bug; a limitation of any trad ↔ simp mapping system.
  • Queries mixing Latin and CJK work — OpenCC leaves Latin characters untouched.

pg_trgm threshold is tuned for long titles

similarity(name, query) > 0.15 was set after finding that "Frieren · Journey to the North" against typo "frirren" scores 0.161 — below the default pg_trgm threshold of 0.3 and even our initial 0.2. Lowering further risks false positives. If short titles start over-matching junk, revisit this threshold.

Deploy order matters

Migration 0024 MUST apply before the Phase 2 code deploys; migration 0025 + backfill MUST apply before Phase 2.5 code deploys. Deploying code first will error with column "search_doc" does not exist. Railway auto-deploys on push, so the manual migration has to happen before the merge lands.

If you accidentally deploy first: the app will 500 on hub search, but existing non-search browsing is unaffected. Apply migrations, then it recovers without a restart (next request succeeds).

Phase 3 events are noise until prod has traffic

With dev's ~4 user events, PostHog funnels and breakdowns look like nothing. They only become meaningful with real traffic. Don't judge the dashboard quality until the branch is live in prod for at least a week.

PostHog project token comes from VITE_PUBLIC_POSTHOG_PROJECT_TOKEN (client) and POSTHOG_API_KEY (server). Dev + prod share one PostHog project, disambiguated by the environment property each event carries. The actual values live in Railway env vars + local .env — never commit them. PostHog "project tokens" are technically client-public (they ride the browser bundle), but keep them out of source so any future audit doesn't have to argue about it.

Freshness boost is a hand-tuned constant

FRESHNESS_BONUS = 15, FRESHNESS_WINDOW_DAYS = 14 are guesses. No data justifies them yet. Watch CTR on new-vs-mature worlds in PostHog after shipping; expect to tune. The comment above the constants in recommendations.ts explains the intent so future edits are deliberate.

Dev proxy fix is dev-only

8af9ca575 edits packages/server/src/index.ts in the dev-mode branch only. Production path is unchanged. Worth keeping in the PR but not a ship-gate concern.

Dev-vs-prod database divergence

CLAUDE.md claims "both dev and prod are on Neon." Turns out dev is on Neon (ep-shy-resonance-…neon.tech) but prod is on Railway-managed Postgres (switchyard.proxy.rlwy.net). The distinction matters for:

  • No Neon console for prod
  • No Neon branching available for prod testing
  • Migrations apply identically via psql -f to either

CLAUDE.md should probably be updated to reflect this — left as a follow-up.

.env management

Never commit the prod DATABASE_URL into .env in the reco worktree — it would be read by pnpm dev if present. The audit script and migration commands accept an inline DATABASE_URL=... prefix, which is safer.

The .env.backup file in the worktree is from a temporary scratch- DB test and is gitignored. Safe to delete if still present.


Scratch-DB test recipe (for future migration changes)

We used a local Docker Postgres loaded with a prod dump to verify Phase 2 + 2.5 migrations against real prod data shape before touching real prod. The recipe:

bash
# 1. Dump prod (requires pg_dump v17 since prod is v17.x)
PROD='<prod-url>'
/usr/local/opt/postgresql@17/bin/pg_dump "$PROD" \
  --no-owner --no-acl --format=custom -f /tmp/prod-dump.dump

# 2. Start a scratch Postgres with pgvector (prod uses it)
docker run --rm -d --name pg-scratch \
  -e POSTGRES_PASSWORD=scratch \
  -p 5433:5432 \
  pgvector/pgvector:pg17
sleep 5

# 3. Restore into scratch (prod has an orphan FK — need --disable-triggers)
SCRATCH='postgresql://postgres:scratch@localhost:5433/postgres'
/usr/local/opt/postgresql@17/bin/pg_restore \
  --no-owner --no-acl --disable-triggers --single-transaction \
  -d "$SCRATCH" /tmp/prod-dump.dump

# 4. Create stub `threads` table (prod doesn't have it yet; server
#    startup self-heal references it via community_events FK)
/usr/local/opt/postgresql@17/bin/psql "$SCRATCH" \
  -c "CREATE TABLE IF NOT EXISTS threads (id TEXT PRIMARY KEY);"

# 5. Apply your migration, run queries, or point dev at the scratch:
#    edit .env to DATABASE_URL=$SCRATCH, restart pnpm dev, then
#    browse as an incognito guest (your dev user doesn't exist in
#    the prod snapshot).

# 6. Clean up when done
docker stop pg-scratch
rm /tmp/prod-dump.dump

Known gotcha: prod dumps may lack columns that dev has (e.g., user.referral_code, stripe_customer_id). Better-Auth login and other features that query those columns will fail. Workarounds:

  • Run pnpm db:push --force against scratch after restore (adds missing columns from current schema)
  • Or accept the broken state if you only need read-path testing

Not fully implemented — recommendation system

What's live (Phase 3, 3+, 5)

Phase 3 + 3+ + 5 are all on reco-algo and ready to deploy. Summary above. These are the lightweight changes that can ship independently of any algorithm rewrite.

What's still open (deferred)

After Phase 6, the remaining reco-system gaps:

  1. Weights are still magic numbers in sourceCHAR_PENALTY=20, CREATOR_PENALTY=12, TAG_CLUSTER_PENALTY=7, freshness 15/14d, soft-penalty brackets {12,8,4}/{10,7,4} are all literals. No config table, no A/B framework. Phase 4 (still deferred): move into PostHog feature-flag payloads so weights become variants measurable against Phase 3 metrics.
  2. Follow signals underusedfixed in Phase 6 follow-up. New followed_recent candidate route (warm:12 / mature:12, sorted newest-first) gives every followed creator's brand-new release guaranteed entry to the candidate pool, decoupled from popularity. Score side (creator weight 8.0) unchanged.
  3. No "not interested" / dismiss UI — users can't explicitly demote a world. Implicit negative signals (5 impressions, 0 clicks) become measurable once Phase 3 is in prod.
  4. Lineage dedup may be too aggressive — favoriting one variant boosts all variants' score via interactedLineageKeys. Might be crossing the line from helpful to annoying.
  5. Soft-penalty buckets are guesses — no data justifies the {12,8,4} / {10,7,4} brackets or 7d / 30d boundaries yet. Watch PostHog hub_click rate on favorited cards after shipping; expect to tune.
  6. Cold-start tag list is hardcodedCOLD_START_BUCKETS picks five "popular categories" but those tags are baked into source. Could move to editorial_boosts-style table later if curators want to rotate them seasonally.

Proposed phases when we resume

  • Phase 4 — config table + A/B harness via PostHog feature flags. Few days. Blocked on Phase 3 events being in prod for a week so there's baseline data to compare experiment variants against.
  • Phase 6 (long-term) — world embeddings via pgvector (already installed on prod), two-tower neural recs, multi-armed bandit exploration. Weeks-to-months each; only worth doing after Phases 3-5 are live and the event volume justifies it.

Quick references

  • Audit script: scripts/audit-search-coverage.ts — read-only, reports hidden variants + is_published/status drift.
  • Backfill script: scripts/backfill-search-normalized.ts — safe to re-run, only touches NULL rows.
  • Main files touched:
    • packages/server/src/lib/recommendations.ts (Phase 1–2.5 + 5)
    • packages/server/src/lib/normalize-search.ts (new, Phase 2.5)
    • packages/server/src/lib/analytics.ts (new, Phase 3 server)
    • packages/app/src/lib/analytics.ts (new, Phase 3 client)
    • packages/server/src/routes/worlds.ts (Phase 1–2.5 + 3)
    • packages/server/src/db/schema.ts (Phase 2 + 2.5 columns)
    • packages/server/drizzle/0024_hub_search_index.sql (new)
    • packages/server/drizzle/0025_search_doc_normalized.sql (new)
    • packages/app/src/features/hub/rec-tab.tsx (Phase 1.5 + 3)
    • packages/app/src/features/hub/following-tab.tsx (Phase 1.5 + 3)
    • packages/app/src/features/hub/hub-card.tsx (Phase 3)
    • packages/app/src/features/hub/world-preview-modal.tsx (Phase 3 + 3+)
    • packages/app/src/components/layout/app-shell.tsx (Phase 3 identify)
  • vk/adea-reload-plugin analysis: chat thread + git log origin/main..FETCH_HEAD after git fetch origin vk/adea-reload-plugin.