
Session findings — 2026-06-01: Cloudflare stream bypass + editor typing perf

Self-contained record of what shipped on 2026-06-01. Two workstreams: (1) the "连接中断" ("connection dropped") cut on long Studio runs, and (2) editor typing lag on large worlds. All PRs below are merged to main and deployed (or deploying) on Railway.


1. Long agent runs cut at ~100s ("连接中断") — FIXED & LIVE

Symptom

On long Studio agent runs (complex TSX edits, 2–5 min) the live stream died mid-run and showed "连接中断 / connection dropped". Reported by @stay0325.

Root cause

yumina.io is on Cloudflare Pro, proxied (orange-cloud). CF severs any proxied connection at ~100s total duration. Pro cannot raise this (proxy_read_timeout is Enterprise-only), and the agent's ~8–13s SSE heartbeat can't beat it (the cap is total-duration, not idle).

  • CF zone f75c62add8fcecdf8bb3f7fe7a592b98, account 585d63e1fb8bcbecf45eabb17441a40d.

Not a failure: the run keeps going server-side and completes (recovery poller tryRecoverAgentRun; @stay0325's runs were 22/22 complete). The real damage was behavioral: creators saw the scary banner and re-sent, and the new run superseded and killed the in-flight one (prod data shows "Superseded by new agent run"). So work was lost to the re-send, not to the cut.

Fix (3 parts across 2 PRs)

  1. PR #31 — client routes ONLY the long agent/start stream at a configurable streamBase = import.meta.env.VITE_STREAM_URL || apiBase (packages/app/src/stores/studio.ts). Point it at a DNS-only / un-proxied subdomain → stream goes direct to origin, skipping CF's 100s cap. Everything else still goes through CF. No auth/CORS changes needed: the session cookie is already Domain=.yumina.io (auth.ts) and CORS already allowlists yumina.io with credentials (middleware/cors.ts), so cross-subdomain auth works as-is. Short requests (status/stop/approve) stay on CF.
  2. PR #31 — reworded the recovering banner (editor.json key, en/zh/ja/es) from "连接中断/connection dropped" → "still working… long edits keep running in the background", so creators stop panic-re-sending. Helps even without the bypass.
  3. PR #33, the actual activation fix (and the gap in #31): VITE_ vars are inlined by Vite at build time, and a Railway Dockerfile build only sees build ARGs the Dockerfile declares. #31 added the code but never declared the ARG, so the var compiled to empty no matter what was set in Railway. Added ARG VITE_STREAM_URL + ENV VITE_STREAM_URL=$VITE_STREAM_URL to the build stage in Dockerfile, next to the other VITE_ args.
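The routing in part 1 can be sketched roughly as follows. This is a minimal illustration, not the code in studio.ts: resolveStreamBase and agentUrl are hypothetical helper names, the /agent/<action> path layout is assumed, and the env value is passed in as a plain argument so the snippet runs outside Vite.

```typescript
// Hypothetical helper. The source only states:
//   streamBase = import.meta.env.VITE_STREAM_URL || apiBase
function resolveStreamBase(viteStreamUrl: string | undefined, apiBase: string): string {
  // `||` rather than `??`: an empty string (what an undeclared build ARG
  // compiles to) must also fall back to apiBase.
  return viteStreamUrl || apiBase;
}

type AgentAction = "start" | "status" | "stop" | "approve";

// Only the long-lived agent/start stream uses streamBase; short requests
// stay on apiBase and therefore stay behind Cloudflare.
// The "/agent/<action>" path shape is an assumption for illustration.
function agentUrl(action: AgentAction, streamBase: string, apiBase: string): string {
  const base = action === "start" ? streamBase : apiBase;
  return `${base.replace(/\/+$/, "")}/agent/${action}`;
}
```

The empty-string fallback is why the missing ARG in #31 was silent: with the var compiled to empty, streamBase quietly resolved to apiBase and the stream kept going through CF.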

Infra wired (done by Jefray in the Cloudflare + Railway dashboards)

  • Railway: custom domain stream.yumina.io → the service.
  • Cloudflare: CNAME stream → <railway target>, gray-cloud (Proxy OFF / DNS only). This is the key toggle.
  • Railway env: VITE_STREAM_URL=https://stream.yumina.io.

Verified LIVE

  • stream.yumina.io/health → Server: railway-edge, no CF-RAY header (bypasses CF), Let's Encrypt cert (Railway's own).
  • Deployed bundle's studio-shell-*.js chunk now contains https://stream.yumina.io, so VITE_STREAM_URL is baked in.
  • Long runs no longer cut at 100s. Confirm in DevTools → Network: agent/start goes to stream.yumina.io.
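The header check above can be expressed as a small predicate. This is a sketch under assumptions: bypassesCloudflare is a hypothetical name, and it takes a plain header map (for example copied from DevTools) instead of making a network request.

```typescript
type HeaderMap = Record<string, string>;

// Returns true when a response looks like it did NOT pass through Cloudflare:
// no CF-RAY header and a Server header other than "cloudflare".
function bypassesCloudflare(headers: HeaderMap): boolean {
  // Header names are case-insensitive, so normalize before looking them up.
  const lower: HeaderMap = {};
  Object.keys(headers).forEach((k) => {
    lower[k.toLowerCase()] = headers[k];
  });
  const hasCfRay = "cf-ray" in lower;
  const server = String(lower["server"] || "").toLowerCase();
  return !hasCfRay && server !== "cloudflare";
}
```

For the verified case in these notes, a map with Server: railway-edge and no CF-RAY entry returns true.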

Tradeoff (accepted)

The bypass subdomain exposes the origin directly (no CF edge in front of that one endpoint). It's auth-gated and the app's rate-limit middleware still applies; CF still fronts everything else.

Notes / gotchas for future

  • The CF API token provided can read zone settings but cannot edit DNS (10000 Authentication error) — DNS changes must be done in the dashboard or with a DNS-scoped token.
  • VITE_STREAM_URL is build-time → setting/changing it requires a rebuild, not just a restart.

2. Editor typing lag on large worlds — FIXED

Symptom

Typing in editor fields lags multiple seconds when a world has many entries / is large, worst on phones with a Chinese IME (which fires several composition events per character). Reported repeatedly (translated: "typing pinyin can take several seconds, even more than ten seconds").

Root cause

The whole world is one big object in the Zustand useEditorStore (worldDraft). Every field bound value straight to the store and called its setter on every keystroke → each setter rebuilt the whole-world object and re-rendered the entries panel, which also re-scanned all entries (tag lists + token counts) and re-rendered every row. So a single keystroke did work proportional to total entry count — fine at 30 entries, multi-second at 500.

Fix — PR #32 (earlier) + PR #34 (the big one)

  • PR #32 fixed only the entry content textarea: type into local state, debounce the store commit ~300ms, IME-aware (EntryContentTextarea in entries.tsx). Proven solid in prod.

  • PR #34 (the "safe set" #1–3 from a 6-agent audit; 30 findings) generalized that pattern to everything else:

    (a) Shared debounced field (packages/app/src/features/editor/components/debounced-field.tsx): useDebouncedFieldCommit hook + DebouncedInput + DebouncedTextarea. Behavior: type into local useState; commit to the store once per ~300ms pause / on blur / on IME compositionend; resync from the store when the bound value or syncKey changes externally (selection switch, agent edit, undo), but never while focused/composing (won't yank text out from under typing). On unmount it only clears the pending timer (no flush), which matches the original content box; blur covers the common case. scheduleCommit/flush capture the CURRENT onCommit at call time, so a pending timer commits to the item it was editing even if the component is reused (regression-safe). Optional transform (live sanitization, e.g. variable id) and delay.

    Rolled out to:

    • entry name (entries.tsx)
    • variable id (transform = spaces→_, lowercase), name, behaviorRules (variables.tsx)
    • behavior name, description (behaviors-section.tsx)
    • first-message greeting (first-message.tsx)
    • rootComponent name + TSX editor (delay=500) (components.tsx)
    • KeywordsInput (components/keywords-input.tsx) — text echo stays instant, the onChange store write is now debounced; IME-aware; flush on Enter/blur.
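The behavior described in (a) can be modeled without React as a small state machine. This is a sketch, not the contents of debounced-field.tsx: the DebouncedField class and its method names are hypothetical, and committing immediately on compositionEnd (rather than scheduling another pause) is an assumption.

```typescript
type Commit = (value: string) => void;

class DebouncedField {
  private local: string;
  private readonly delayMs: number;
  private readonly transform: (v: string) => string;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private pending: Commit | null = null; // onCommit captured when the edit was made
  private focused = false;
  private composing = false;

  constructor(initial: string, delayMs = 300, transform: (v: string) => string = (v) => v) {
    this.local = initial;
    this.delayMs = delayMs;
    this.transform = transform;
  }

  get value(): string { return this.local; }

  focus(): void { this.focused = true; }

  // Keystroke: update on-screen text now, commit to the store after a pause.
  input(raw: string, onCommit: Commit): void {
    this.local = this.transform(raw);
    this.pending = onCommit;
    this.clearTimer();
    // While an IME composition is open, hold the commit until compositionEnd.
    if (!this.composing) this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  compositionStart(): void { this.composing = true; this.clearTimer(); }
  compositionEnd(): void { this.composing = false; this.flush(); }
  blur(): void { this.focused = false; this.composing = false; this.flush(); }

  // External change (selection switch, agent edit, undo): ignored mid-typing.
  sync(external: string): void {
    if (this.focused || this.composing) return;
    this.clearTimer();
    this.pending = null;
    this.local = external;
  }

  // Unmount: drop the pending timer without flushing.
  dispose(): void { this.clearTimer(); this.pending = null; }

  private flush(): void {
    this.clearTimer();
    const commit = this.pending;
    this.pending = null;
    if (commit) commit(this.local);
  }

  private clearTimer(): void {
    if (this.timer !== null) { clearTimeout(this.timer); this.timer = null; }
  }
}
```

Because the commit callback is stored at input time, a flush always writes to whichever item was being edited when the keystroke happened, which is the property the notes call regression-safe.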

    (b) Memoized entry rows: SortableEntryCard wrapped in React.memo (entries.tsx). Row handlers changed to id-keyed stable useCallbacks (onSelect/onDelete/onRename); selectedIdRef keeps the delete handler stable instead of changing with selection. Editing one entry now re-renders only the changed row (+ the two whose isActive flips), not all 200–1000+. updateEntry preserves object identity of unchanged entries, so memo's shallow compare holds.
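The identity-preservation point in (b) is what makes the memoization effective, and can be illustrated without React. In this sketch the WorldEntry shape and the changedRows helper are assumptions for illustration; only the names updateEntry and WorldEntry come from the notes.

```typescript
// Assumed minimal shape; the real WorldEntry has more fields.
interface WorldEntry {
  id: string;
  name: string;
  content: string;
}

// Replace only the edited entry; every other element keeps its reference.
function updateEntry(
  entries: readonly WorldEntry[],
  id: string,
  patch: Partial<WorldEntry>
): WorldEntry[] {
  return entries.map((e) => (e.id === id ? { ...e, ...patch } : e));
}

// Hypothetical helper: the rows a reference-equality prop check would treat
// as changed, i.e. the rows a memoized component would re-render.
function changedRows(prev: readonly WorldEntry[], next: readonly WorldEntry[]): string[] {
  return next.filter((e, i) => e !== prev[i]).map((e) => e.id);
}
```

If updateEntry instead cloned every entry, each row would receive a new object reference on every edit and the shallow compare would fail for all rows.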

    (c) Per-entry token cache: module-level WeakMap<WorldEntry, number> (entryTokenCache / getEntryTokens in entries.tsx). tokenSummary now recomputes only the edited entry's tokens and reuses cached counts for the rest (one tokenization per keystroke instead of re-tokenizing all content). WeakMap → stale entries GC'd automatically.
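A minimal sketch of the cache in (c), under stated assumptions: countTokens is a stand-in whitespace tokenizer, the tokenizeCalls counter exists only to make the saving visible, and the WorldEntry shape is reduced to two fields. The names entryTokenCache, getEntryTokens and tokenSummary follow the notes, but the bodies here are illustrative.

```typescript
// Assumed minimal shape; the real WorldEntry has more fields.
interface WorldEntry {
  id: string;
  content: string;
}

let tokenizeCalls = 0; // instrumentation for this sketch only

// Stand-in tokenizer (whitespace split); the real one is not shown in the notes.
function countTokens(text: string): number {
  tokenizeCalls++;
  return text.split(/\s+/).filter(Boolean).length;
}

// Keyed by object identity: an edited entry is a new object, so it misses the
// cache, while untouched entries hit it. Old objects are collectable by GC.
const entryTokenCache = new WeakMap<WorldEntry, number>();

function getEntryTokens(entry: WorldEntry): number {
  let tokens = entryTokenCache.get(entry);
  if (tokens === undefined) {
    tokens = countTokens(entry.content);
    entryTokenCache.set(entry, tokens);
  }
  return tokens;
}

function tokenSummary(entries: readonly WorldEntry[]): number {
  return entries.reduce((sum, e) => sum + getEntryTokens(e), 0);
}
```

The summary still visits every entry, but each visit is a map lookup; only the edited entry is tokenized again. This relies on edits producing a new entry object, which is the same identity behavior the memoized rows depend on.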

Net effect

A keystroke costs ~O(1) instead of O(total entries) — instant vs multi-second on a 500-entry world, in every field.

Deliberately NOT done (and why)

  • List virtualization (audit #4) — biggest structural win, but tangles with dnd-kit drag-and-drop + folder grouping + variable row heights. Deserves its own carefully-tested PR. This is the top follow-up.
  • Structured WHEN/DO/condition editors in behaviors — left as-is (the "then" is a structured DoEditor, not a raw text field).
  • Undo stack — investigated and confirmed it stores cheap references to immutable snapshots, NOT deep clones per keystroke (one investigator's "multi-MB structuredClone per keystroke" claim was wrong). It is not a memory bottleneck; left untouched. Don't re-flag it.

QA / smoke-test checklist for PR #34 (run on the deployed build)

The debounce decouples "what's on screen" (local) from "what's saved" (store); every edge case lives in that ~300ms gap. Low risk (same pattern as #32) but the new surface (many fields + memoized rows) is worth a 2-minute pass:

  1. Mid-type entry switch / text bleed — type a few chars into Entry A's name, immediately (<300ms) click Entry B. Expect: B shows B's name (no A text bleeding in); go back to A → your typed text is saved (flush-on-blur). The syncKey={entry.id} forces resync on switch.
  2. Fast type + immediate save — type, then save instantly via keyboard shortcut before clicking away. Edge: the last word may not have committed (save reads the store). Clicking the Save button blurs first → safe. Worth a check.
  3. Drag reorder (the #1 new-interaction risk: React.memo + dnd-kit) — drag an entry to reorder within a section; drag into/out of a folder; drag a folder. Expect smooth, lands correctly.
  4. Chinese IME — type pinyin into entry name / keywords / content: characters compose normally, no mid-composition flicker/reset, final text saves.
  5. Keywords field — type keywords, press Enter (adds separator + commits immediately), use commas/、/,; switch entries mid-type → no bleed.
  6. Variable ID field — edit a variable's id: spaces→underscores live, uppercase→lowercase, preview updates, selection (by index) stays.
  7. Agent edit / undo while NOT focused — run the Studio agent to change an entry, or hit undo → the field updates to the new value (resync path).

Most likely failure candidates if any: drag-reorder glitch, or text bleeding between entries on fast switch.


PR status (all merged to main)

| Fix | PR | Squash | Status |
|---|---|---|---|
| Stream bypass code + banner reword | #31 | | ✅ deployed |
| Editor content-box typing lag | #32 | 01083c0a | ✅ deployed |
| Dockerfile: pass VITE_STREAM_URL (activates bypass) | #33 | 0d7c1cde | ✅ deployed & verified live |
| Full field debounce + memo rows + token cache | #34 | 6f9473f4 | ✅ deploying |

Closed without merge: PR #28 (per-world SSE push — over-scope), PR #30 (earlier lag attempt — carried a duplicate of #29's editor.ts and risked reverting it; superseded by the clean cherry-pick in #32).

Earlier today (context): #27 (save clobber: version-guard + folder-heal + 3-way merge) and #29 (refreshWorldSchema merges instead of bailing so agent edits show immediately, not "next day") — both merged earlier and live.


Key files touched today

  • Dockerfile: ARG/ENV VITE_STREAM_URL
  • packages/app/src/stores/studio.ts: streamBase for agent/start
  • packages/app/src/locales/{en,zh,ja,es}/editor.json: recovering banner reword
  • packages/app/src/features/editor/components/debounced-field.tsx: new shared component
  • packages/app/src/features/editor/components/keywords-input.tsx: debounced commit
  • packages/app/src/features/editor/sections/entries.tsx: name debounce, memo rows, token cache
  • packages/app/src/features/editor/sections/variables.tsx: id/name/behaviorRules debounce
  • packages/app/src/features/editor/sections/behaviors-section.tsx: name/description debounce
  • packages/app/src/features/editor/sections/first-message.tsx: greeting debounce
  • packages/app/src/features/editor/sections/components.tsx: name + TSX editor debounce

Open follow-ups

  • Virtualize the entries list (audit #4) — the remaining big-world structural win; needs care with dnd-kit + folders. Separate tested PR.
  • Optional: narrow other sections' store subscriptions (variables/behaviors) so editing entries doesn't re-render them; and editor-shell.tsx sectionCounts. Lower priority (medium risk, smaller win) — defer unless profiling shows it.