Yumina Credit System — Architecture Plan

1. Tier Table (Final)

Tier	Price	Raw Compute	Monthly Credits	Memory Cap	Models	RPM	Concurrent
Free	$0	$1	1,000	32K	≤ DeepSeek V3.2	6	1
Go	$5	$2	2,000	64K	≤ Gemini 3.1 Pro Preview	6	2
Plus	$20	$8	8,000	Unlimited (default 48K)	All	6	2
Pro	$50	$20	20,000	Unlimited (default 64K)	All	6	3
Ultra	$100	$40	40,000	Unlimited (default 64K)	All	6	3

Credit unit: $1 raw compute = 1,000 credits. 1 credit = $0.001 compute.
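
The conversion can be written down as a pair of helpers and checked against the tier table. This is only a sketch; the names CREDITS_PER_DOLLAR, dollarsToCredits, and creditsToDollars are illustrative and are not used elsewhere in this plan.

```typescript
// Assumed constant, taken from the credit unit definition: $1 raw compute = 1,000 credits.
const CREDITS_PER_DOLLAR = 1000;

function dollarsToCredits(dollars: number): number {
  return dollars * CREDITS_PER_DOLLAR;
}

function creditsToDollars(credits: number): number {
  return credits / CREDITS_PER_DOLLAR;
}

// Raw compute and monthly credits per tier, transcribed from the tier table.
const TIERS: Array<{ plan: string; rawComputeUsd: number; monthlyCredits: number }> = [
  { plan: "free", rawComputeUsd: 1, monthlyCredits: 1000 },
  { plan: "go", rawComputeUsd: 2, monthlyCredits: 2000 },
  { plan: "plus", rawComputeUsd: 8, monthlyCredits: 8000 },
  { plan: "pro", rawComputeUsd: 20, monthlyCredits: 20000 },
  { plan: "ultra", rawComputeUsd: 40, monthlyCredits: 40000 },
];

// True when every row of the table agrees with the conversion rate.
function tierTableIsConsistent(): boolean {
  return TIERS.every((t) => dollarsToCredits(t.rawComputeUsd) === t.monthlyCredits);
}

console.log(tierTableIsConsistent());
```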

Model access by tier (ordered by weight, the approximate credit cost per message relative to Flash Lite):

#	Model	Weight	Free	Go	Plus+
1	google/gemini-2.5-flash-lite	1.0x	✓	✓	✓
2	x-ai/grok-4.1-fast	1.9x	✓	✓	✓
3	deepseek/deepseek-v3.2	2.3x	✓	✓	✓
4	google/gemini-3.1-flash-lite-preview	2.7x		✓	✓
5	google/gemini-2.5-flash	3.5x		✓	✓
6	google/gemini-3-flash-preview	5.5x		✓	✓
7	anthropic/claude-haiku-4.5	10.5x		✓	✓
8	x-ai/grok-4.20	19x		✓	✓
9	google/gemini-3.1-pro-preview	22x		✓	✓
10	anthropic/claude-sonnet-4.6	31x			✓
11	anthropic/claude-opus-4.6	52x			✓
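
The table can be expressed as a minimum plan per model plus an ordered plan list. The sketch below derives the model list for a tier from that data; MODEL_MIN_PLAN and modelsForPlan are hypothetical names used only for illustration.

```typescript
type PlanId = "free" | "go" | "plus" | "pro" | "ultra";

// Lowest tier first; a higher index means a higher tier.
const PLAN_ORDER: PlanId[] = ["free", "go", "plus", "pro", "ultra"];

// Minimum plan per model, transcribed from the table above.
const MODEL_MIN_PLAN: Array<[string, PlanId]> = [
  ["google/gemini-2.5-flash-lite", "free"],
  ["x-ai/grok-4.1-fast", "free"],
  ["deepseek/deepseek-v3.2", "free"],
  ["google/gemini-3.1-flash-lite-preview", "go"],
  ["google/gemini-2.5-flash", "go"],
  ["google/gemini-3-flash-preview", "go"],
  ["anthropic/claude-haiku-4.5", "go"],
  ["x-ai/grok-4.20", "go"],
  ["google/gemini-3.1-pro-preview", "go"],
  ["anthropic/claude-sonnet-4.6", "plus"],
  ["anthropic/claude-opus-4.6", "plus"],
];

// Returns every model whose minimum plan is at or below the given plan.
function modelsForPlan(plan: PlanId): string[] {
  const rank = PLAN_ORDER.indexOf(plan);
  return MODEL_MIN_PLAN
    .filter(([, minPlan]) => PLAN_ORDER.indexOf(minPlan) <= rank)
    .map(([modelId]) => modelId);
}

console.log(modelsForPlan("free"));
```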

2. Database Schema

New tables

sql

-- ─── Credit Wallets (one per user) ───────────────────────────────────
CREATE TABLE credit_wallets (
  id              TEXT PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id         TEXT NOT NULL UNIQUE REFERENCES "user"(id) ON DELETE CASCADE,
  balance         NUMERIC(12,1) NOT NULL DEFAULT 0,       -- 1 decimal precision
  plan            TEXT NOT NULL DEFAULT 'free',            -- free|go|plus|pro|ultra
  monthly_credits INTEGER NOT NULL DEFAULT 1000,
  memory_cap      INTEGER,                                 -- NULL = unlimited
  period_start    TIMESTAMP NOT NULL DEFAULT NOW(),
  period_end      TIMESTAMP NOT NULL DEFAULT (NOW() + INTERVAL '1 month'),
  created_at      TIMESTAMP NOT NULL DEFAULT NOW(),
  updated_at      TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX credit_wallets_user_id_idx ON credit_wallets(user_id);

-- ─── Credit Transactions (append-only ledger) ────────────────────────
CREATE TABLE credit_transactions (
  id              TEXT PRIMARY KEY DEFAULT gen_random_uuid(),
  wallet_id       TEXT NOT NULL REFERENCES credit_wallets(id) ON DELETE CASCADE,
  amount          NUMERIC(12,1) NOT NULL,                  -- positive=credit, negative=debit
  type            TEXT NOT NULL,                            -- plan_grant|usage|addon|admin|refund|expire
  reference_id    TEXT,                                     -- usage_log.id, stripe payment id, etc.
  balance_after   NUMERIC(12,1) NOT NULL,                  -- snapshot for audit
  description     TEXT,
  created_at      TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX credit_txn_wallet_idx ON credit_transactions(wallet_id);
CREATE INDEX credit_txn_ref_idx ON credit_transactions(reference_id);
CREATE UNIQUE INDEX credit_txn_usage_unique ON credit_transactions(reference_id) WHERE type = 'usage';
  -- ^ Prevents double-deduction: same usage_log.id can only be charged once

-- ─── Model Prices ────────────────────────────────────────────────────
CREATE TABLE model_prices (
  id                          TEXT PRIMARY KEY DEFAULT gen_random_uuid(),
  model_id                    TEXT NOT NULL,               -- e.g. "google/gemini-2.5-flash-lite"
  input_price_per_m           NUMERIC(8,4) NOT NULL,       -- $/M input tokens
  output_price_per_m          NUMERIC(8,4) NOT NULL,       -- $/M output tokens
  context_threshold           INTEGER,                     -- NULL = flat pricing
  input_price_above_threshold NUMERIC(8,4),                -- price when prompt > threshold
  output_price_above_threshold NUMERIC(8,4),
  min_plan                    TEXT NOT NULL DEFAULT 'free', -- minimum plan to use this model
  is_active                   BOOLEAN NOT NULL DEFAULT TRUE,
  updated_at                  TIMESTAMP NOT NULL DEFAULT NOW()
);
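CREATE UNIQUE INDEX model_prices_model_id_threshold ON model_prices(model_id, context_threshold);
  -- ^ NULLs count as distinct in a unique index by default, so this does not stop two
  --   flat-pricing rows (context_threshold IS NULL) for the same model_id.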

Drizzle schema (packages/server/src/db/schema.ts)

typescript

// ─── Credit System ──────────────────────────────────────────────────

export const creditWallets = pgTable("credit_wallets", {
  id: text("id").primaryKey().$defaultFn(() => crypto.randomUUID()),
  userId: text("user_id").notNull().unique()
    .references(() => user.id, { onDelete: "cascade" }),
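  // NOTE: the SQL above declares NUMERIC(12,1), while real maps to float4 (about
  // 6 significant decimal digits). The two definitions need to be reconciled.
  balance: real("balance").notNull().default(0),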
  plan: text("plan").notNull().default("free"),
  monthlyCredits: integer("monthly_credits").notNull().default(1000),
  memoryCap: integer("memory_cap"),  // null = unlimited
  periodStart: timestamp("period_start").notNull().defaultNow(),
  periodEnd: timestamp("period_end").notNull()
    .$defaultFn(() => new Date(Date.now() + 30 * 24 * 60 * 60 * 1000)),
  createdAt: timestamp("created_at").notNull().defaultNow(),
  updatedAt: timestamp("updated_at").notNull().defaultNow(),
}, (t) => [
  index("credit_wallets_user_id_idx").on(t.userId),
]);

export const creditTransactions = pgTable("credit_transactions", {
  id: text("id").primaryKey().$defaultFn(() => crypto.randomUUID()),
  walletId: text("wallet_id").notNull()
    .references(() => creditWallets.id, { onDelete: "cascade" }),
  amount: real("amount").notNull(),
  type: text("type").notNull(),  // plan_grant|usage|addon|admin|refund|expire
  referenceId: text("reference_id"),
  balanceAfter: real("balance_after").notNull(),
  description: text("description"),
  createdAt: timestamp("created_at").notNull().defaultNow(),
}, (t) => [
  index("credit_txn_wallet_idx").on(t.walletId),
  index("credit_txn_ref_idx").on(t.referenceId),
]);

export const modelPrices = pgTable("model_prices", {
  id: text("id").primaryKey().$defaultFn(() => crypto.randomUUID()),
  modelId: text("model_id").notNull(),
  inputPricePerM: real("input_price_per_m").notNull(),
  outputPricePerM: real("output_price_per_m").notNull(),
  contextThreshold: integer("context_threshold"),
  inputPriceAboveThreshold: real("input_price_above_threshold"),
  outputPriceAboveThreshold: real("output_price_above_threshold"),
  minPlan: text("min_plan").notNull().default("free"),
  isActive: boolean("is_active").notNull().default(true),
  updatedAt: timestamp("updated_at").notNull().defaultNow(),
});

Initial model price data

sql

INSERT INTO model_prices (model_id, input_price_per_m, output_price_per_m, context_threshold, input_price_above_threshold, output_price_above_threshold, min_plan) VALUES
  ('google/gemini-2.5-flash-lite',          0.10,  0.40,   NULL, NULL, NULL, 'free'),
  ('x-ai/grok-4.1-fast',                    0.20,  0.50,   128000, 0.40, 1.00, 'free'),
  ('deepseek/deepseek-v3.2',                0.26,  0.38,   NULL, NULL, NULL, 'free'),
  ('google/gemini-3.1-flash-lite-preview',   0.25,  1.50,   NULL, NULL, NULL, 'go'),
  ('google/gemini-2.5-flash',               0.30,  2.50,   NULL, NULL, NULL, 'go'),
  ('google/gemini-3-flash-preview',          0.50,  3.00,   NULL, NULL, NULL, 'go'),
  ('anthropic/claude-haiku-4.5',             1.00,  5.00,   NULL, NULL, NULL, 'go'),
  ('x-ai/grok-4.20',                        2.00,  6.00,   200000, 4.00, 12.00, 'go'),
  ('google/gemini-3.1-pro-preview',          2.00, 12.00,   NULL, NULL, NULL, 'go'),
  ('anthropic/claude-sonnet-4.6',            3.00, 15.00,   NULL, NULL, NULL, 'plus'),
  ('anthropic/claude-opus-4.6',              5.00, 25.00,   NULL, NULL, NULL, 'plus');

Modify existing user table

sql

-- Replace the tier column semantics. Keep the column but repurpose it.
-- Old: "regular" | "invited"
-- New: "free" | "go" | "plus" | "pro" | "ultra"
-- Migration: all existing "regular" → "free", "invited" → "go"
ALTER TABLE "user" ALTER COLUMN tier SET DEFAULT 'free';
UPDATE "user" SET tier = 'free' WHERE tier = 'regular';
UPDATE "user" SET tier = 'go' WHERE tier = 'invited';

3. Credit Calculation

Formula

credits = (prompt_tokens × input_price + completion_tokens × output_price) / 1000

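Prices are in $ per million tokens and are looked up from the model_prices table. Dividing by 1,000,000 gives dollars and multiplying by 1,000 gives credits, hence the net division by 1000. Models with tiered pricing switch to the higher rates when the prompt exceeds context_threshold: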

typescript

function getModelPrices(modelId: string, promptTokens: number): {
  inputPrice: number;   // $/M tokens
  outputPrice: number;  // $/M tokens
} {
  const row = modelPriceCache.get(modelId);
  if (!row) throw new Error(`Unknown model: ${modelId}`);

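  // Tiered pricing (the Grok models charge 2x above their threshold).
  // Fall back to the base rate if an above-threshold price is missing.
  if (row.contextThreshold !== null && promptTokens > row.contextThreshold) {
    return {
      inputPrice: row.inputPriceAboveThreshold ?? row.inputPricePerM,
      outputPrice: row.outputPriceAboveThreshold ?? row.outputPricePerM,
    };
  }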

  return {
    inputPrice: row.inputPricePerM,
    outputPrice: row.outputPricePerM,
  };
}

function calculateCredits(
  modelId: string,
  promptTokens: number,
  completionTokens: number,
): number {
  const { inputPrice, outputPrice } = getModelPrices(modelId, promptTokens);
  const cost = (promptTokens * inputPrice + completionTokens * outputPrice) / 1_000_000;
  // cost is in dollars. $1 = 1000 credits. So credits = cost * 1000.
  // Round up to nearest 0.1 (1 decimal place)
  return Math.ceil(cost * 1000 * 10) / 10;
}

Examples (at 48K context, 1500 output tokens)

Model	Calculation	Credits
Flash Lite	(48000×0.10 + 1500×0.40) / 1000	5.4
DeepSeek V3.2	(48000×0.26 + 1500×0.38) / 1000	13.1
Gemini 3 Flash	(48000×0.50 + 1500×3.00) / 1000	28.5
Claude Haiku	(48000×1.00 + 1500×5.00) / 1000	55.5
Claude Sonnet	(48000×3.00 + 1500×15.00) / 1000	166.5
Claude Opus	(48000×5.00 + 1500×25.00) / 1000	277.5
Grok 4.1 Fast @ 64K	(64000×0.20 + 1500×0.50) / 1000	13.6
Grok 4.1 Fast @ 200K	(200000×0.40 + 1500×1.00) / 1000	81.5
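
The rows above can be reproduced with a small standalone script. This is a sketch under stated assumptions: the PRICES map is a hand-copied subset of the seed data, and creditsFor is a hypothetical helper. Unlike calculateCredits above, it works in tenths of a credit and trims floating-point noise before rounding up, so a value such as 54.00000000000001 cannot be bumped to the next tenth.

```typescript
// Prices in $ per million tokens, copied from the seed data in section 2.
interface Price {
  input: number;
  output: number;
  threshold?: number;   // prompt size above which the higher rates apply
  inputAbove?: number;
  outputAbove?: number;
}

const PRICES: Record<string, Price> = {
  "google/gemini-2.5-flash-lite": { input: 0.10, output: 0.40 },
  "deepseek/deepseek-v3.2": { input: 0.26, output: 0.38 },
  "anthropic/claude-opus-4.6": { input: 5.00, output: 25.00 },
  "x-ai/grok-4.1-fast": {
    input: 0.20, output: 0.50, threshold: 128_000, inputAbove: 0.40, outputAbove: 1.00,
  },
};

function creditsFor(modelId: string, promptTokens: number, completionTokens: number): number {
  const p = PRICES[modelId];
  if (!p) throw new Error(`Unknown model: ${modelId}`);

  const above = p.threshold !== undefined && promptTokens > p.threshold;
  const inputPrice = above ? (p.inputAbove ?? p.input) : p.input;
  const outputPrice = above ? (p.outputAbove ?? p.output) : p.output;

  // tokens x ($/M) / 1000 = credits, so dividing by 100 gives tenths of a credit.
  const tenths = (promptTokens * inputPrice + completionTokens * outputPrice) / 100;
  // Trim float noise before rounding up to the next tenth.
  return Math.ceil(Number(tenths.toFixed(6))) / 10;
}

console.log(creditsFor("google/gemini-2.5-flash-lite", 48_000, 1_500)); // 5.4
console.log(creditsFor("deepseek/deepseek-v3.2", 48_000, 1_500));       // 13.1
console.log(creditsFor("x-ai/grok-4.1-fast", 64_000, 1_500));           // 13.6
console.log(creditsFor("x-ai/grok-4.1-fast", 200_000, 1_500));          // 81.5
```

DeepSeek and Grok at 64K come out at 13.05 and 13.55 before rounding, which is why the table shows 13.1 and 13.6.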

4. Server Architecture

New files to create

packages/server/src/
  lib/
    credit-service.ts     — CreditService class (all credit operations)
    model-price-cache.ts  — In-memory model price cache (loaded from DB on startup)
    plan-config.ts        — Plan definitions (static config)
  middleware/
    credit-guard.ts       — Middleware: plan validation + balance check + model access

plan-config.ts — Static plan definitions

typescript

export type PlanId = "free" | "go" | "plus" | "pro" | "ultra";

export interface PlanConfig {
  id: PlanId;
  monthlyCredits: number;
  memoryCap: number | null;  // null = unlimited
  rateLimit: number;         // messages per minute
  maxConcurrent: number;
  priceCents: number;
}

// Plan hierarchy for access checks: higher index = higher tier
const PLAN_HIERARCHY: PlanId[] = ["free", "go", "plus", "pro", "ultra"];

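export function planMeetsMinimum(userPlan: PlanId, requiredPlan: PlanId): boolean {
  const userRank = PLAN_HIERARCHY.indexOf(userPlan);
  const requiredRank = PLAN_HIERARCHY.indexOf(requiredPlan);
  // Fail closed: min_plan is free text in the DB, and indexOf returns -1 for
  // unknown values, which would otherwise compare as "below every plan".
  if (userRank < 0 || requiredRank < 0) return false;
  return userRank >= requiredRank;
}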

export const PLANS: Record<PlanId, PlanConfig> = {
  free:  { id: "free",  monthlyCredits: 1000,  memoryCap: 32_000,  rateLimit: 6, maxConcurrent: 1, priceCents: 0 },
  go:    { id: "go",    monthlyCredits: 2000,  memoryCap: 64_000,  rateLimit: 6, maxConcurrent: 2, priceCents: 500 },
  plus:  { id: "plus",  monthlyCredits: 8000,  memoryCap: null,    rateLimit: 6, maxConcurrent: 2, priceCents: 2000 },
  pro:   { id: "pro",   monthlyCredits: 20000, memoryCap: null,    rateLimit: 6, maxConcurrent: 3, priceCents: 5000 },
  ultra: { id: "ultra", monthlyCredits: 40000, memoryCap: null,    rateLimit: 6, maxConcurrent: 3, priceCents: 10000 },
};

model-price-cache.ts — Price cache loaded from DB

typescript

import { db } from "../db/index.js";
import { modelPrices } from "../db/schema.js";
import { eq } from "drizzle-orm";

interface ModelPriceEntry {
  modelId: string;
  inputPricePerM: number;
  outputPricePerM: number;
  contextThreshold: number | null;
  inputPriceAboveThreshold: number | null;
  outputPriceAboveThreshold: number | null;
  minPlan: string;
}

let cache: Map<string, ModelPriceEntry> = new Map();
let lastLoaded = 0;
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes

export async function getModelPrice(modelId: string): Promise<ModelPriceEntry | null> {
  if (Date.now() - lastLoaded > CACHE_TTL || cache.size === 0) {
    await refreshCache();
  }
  return cache.get(modelId) ?? null;
}

export async function getAllModelPrices(): Promise<ModelPriceEntry[]> {
  if (Date.now() - lastLoaded > CACHE_TTL || cache.size === 0) {
    await refreshCache();
  }
  return Array.from(cache.values());
}

async function refreshCache() {
  const rows = await db.select().from(modelPrices).where(eq(modelPrices.isActive, true));
  const newCache = new Map<string, ModelPriceEntry>();
  for (const row of rows) {
    newCache.set(row.modelId, {
      modelId: row.modelId,
      inputPricePerM: row.inputPricePerM,
      outputPricePerM: row.outputPricePerM,
      contextThreshold: row.contextThreshold,
      inputPriceAboveThreshold: row.inputPriceAboveThreshold,
      outputPriceAboveThreshold: row.outputPriceAboveThreshold,
      minPlan: row.minPlan,
    });
  }
  cache = newCache;
  lastLoaded = Date.now();
}
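
One thing the cache above does not address is concurrent refreshes: every request that arrives while the cache is stale starts its own DB query. A possible variation, sketched below, shares a single in-flight refresh between callers. createTtlCache and its loader parameter are hypothetical; the loader stands in for the model_prices query.

```typescript
// TTL cache with a shared ("single-flight") refresh.
function createTtlCache<V>(
  loader: () => Promise<Map<string, V>>,
  ttlMs: number,
  now: () => number = Date.now,
) {
  let cache = new Map<string, V>();
  let loadedAt: number | null = null;        // null = never loaded
  let inFlight: Promise<void> | null = null; // refresh currently running, if any

  async function ensureFresh(): Promise<void> {
    const fresh = loadedAt !== null && now() - loadedAt <= ttlMs;
    if (fresh) return;
    if (!inFlight) {
      inFlight = loader()
        .then((next) => {
          cache = next;
          loadedAt = now();
        })
        .finally(() => {
          inFlight = null; // allow a retry after success or failure
        });
    }
    await inFlight;
  }

  return {
    async get(key: string): Promise<V | null> {
      await ensureFresh();
      return cache.get(key) ?? null;
    },
  };
}
```

Tracking loadedAt as null rather than checking cache.size also avoids reloading on every call when the table is legitimately empty.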

credit-service.ts — Central credit operations

typescript

import { db } from "../db/index.js";
import { creditWallets, creditTransactions } from "../db/schema.js";
import { eq, sql } from "drizzle-orm";
import { getModelPrice } from "./model-price-cache.js";
import type { PlanId } from "./plan-config.js";
import { PLANS, planMeetsMinimum } from "./plan-config.js";

// ─── Types ──────────────────────────────────────────────────────────

export interface CreditWallet {
  id: string;
  userId: string;
  balance: number;
  plan: PlanId;
  memoryCap: number | null;
  periodEnd: Date;
}

export interface DeductionResult {
  creditsDeducted: number;
  newBalance: number;
  transactionId: string;
}

// ─── Ensure wallet exists ───────────────────────────────────────────
// Called on first generation or subscription change. Creates wallet with
// initial plan grant if it doesn't exist.

export async function ensureWallet(userId: string, plan: PlanId = "free"): Promise<CreditWallet> {
  // Try to get existing wallet
  const [existing] = await db.select().from(creditWallets).where(eq(creditWallets.userId, userId));
  if (existing) {
    return {
      id: existing.id,
      userId: existing.userId,
      balance: existing.balance,
      plan: existing.plan as PlanId,
      memoryCap: existing.memoryCap,
      periodEnd: existing.periodEnd,
    };
  }

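  // Create new wallet with initial grant.
  // NOTE: two concurrent first requests can both reach this point. The UNIQUE
  // constraint on user_id makes the second insert throw, so callers should be
  // prepared to retry the lookup on a unique violation.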
  const config = PLANS[plan];
  const periodEnd = new Date(Date.now() + 30 * 24 * 60 * 60 * 1000);

  const [wallet] = await db.insert(creditWallets).values({
    userId,
    balance: config.monthlyCredits,
    plan,
    monthlyCredits: config.monthlyCredits,
    memoryCap: config.memoryCap,
    periodEnd,
  }).returning();

  // Record the grant in the ledger
  await db.insert(creditTransactions).values({
    walletId: wallet!.id,
    amount: config.monthlyCredits,
    type: "plan_grant",
    balanceAfter: config.monthlyCredits,
    description: `Initial ${plan} plan grant`,
  });

  return {
    id: wallet!.id,
    userId,
    balance: config.monthlyCredits,
    plan,
    memoryCap: config.memoryCap,
    periodEnd,
  };
}

// ─── Check balance ──────────────────────────────────────────────────
// Fast check: is balance > 0? Used as pre-flight before generation.

export async function checkBalance(userId: string): Promise<{ ok: boolean; balance: number; wallet: CreditWallet }> {
  const wallet = await ensureWallet(userId);
  return { ok: wallet.balance > 0, balance: wallet.balance, wallet };
}

// ─── Validate model access ──────────────────────────────────────────
// Checks if the user's plan allows access to the requested model.

export async function validateModelAccess(
  plan: PlanId,
  modelId: string,
): Promise<{ allowed: boolean; reason?: string }> {
  const price = await getModelPrice(modelId);
  if (!price) {
    return { allowed: false, reason: `Model "${modelId}" is not available on the official API.` };
  }
  if (!planMeetsMinimum(plan, price.minPlan as PlanId)) {
    return { allowed: false, reason: `Model "${modelId}" requires the ${price.minPlan} plan or higher.` };
  }
  return { allowed: true };
}

// ─── Calculate cost ─────────────────────────────────────────────────
// No DB access of its own. Prices come from the cache, which may query the DB
// on a miss or after the TTL expires.

export async function calculateCost(
  modelId: string,
  promptTokens: number,
  completionTokens: number,
): Promise<number> {
  const price = await getModelPrice(modelId);
  if (!price) throw new Error(`No pricing for model: ${modelId}`);

  let inputPrice = price.inputPricePerM;
  let outputPrice = price.outputPricePerM;

  // Tiered pricing: if prompt exceeds threshold, use higher rates
  if (price.contextThreshold && promptTokens > price.contextThreshold) {
    inputPrice = price.inputPriceAboveThreshold ?? inputPrice;
    outputPrice = price.outputPriceAboveThreshold ?? outputPrice;
  }

  const costDollars = (promptTokens * inputPrice + completionTokens * outputPrice) / 1_000_000;
  // $1 = 1000 credits. Round up to 1 decimal.
  return Math.ceil(costDollars * 1000 * 10) / 10;
}

// ─── Deduct credits ─────────────────────────────────────────────────
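// Deduction with a balance guard. Returns the new balance.
// The wallet UPDATE is a single statement, so concurrent deductions on the
// same row are serialized by PostgreSQL's row lock.
// The guard is evaluated on the balance BEFORE the deduction: a charge is
// accepted while balance > -500, so the final balance can fall below -500 by
// up to the cost of one message.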

const MAX_OVERDRAFT = 500;

export async function deductCredits(
  userId: string,
  credits: number,
  referenceId: string,
  description: string,
): Promise<DeductionResult> {
  // The wallet UPDATE and the ledger INSERT run in one transaction. If the
  // ledger insert fails (for example, the unique index rejects a referenceId
  // that was already charged), the balance change is rolled back as well.
  const { updated, txn } = await db.transaction(async (tx) => {
    const [updated] = await tx
      .update(creditWallets)
      .set({
        balance: sql`${creditWallets.balance} - ${credits}`,
        updatedAt: new Date(),
      })
      .where(
        sql`${creditWallets.userId} = ${userId} AND ${creditWallets.balance} > ${-MAX_OVERDRAFT}`
      )
      .returning({ id: creditWallets.id, balance: creditWallets.balance });

    if (!updated) {
      throw new Error("INSUFFICIENT_CREDITS");
    }

    // Append-only ledger row for this deduction
    const [txn] = await tx.insert(creditTransactions).values({
      walletId: updated.id,
      amount: -credits,
      type: "usage",
      referenceId,
      balanceAfter: updated.balance,
      description,
    }).returning({ id: creditTransactions.id });

    return { updated, txn };
  });

  return {
    creditsDeducted: credits,
    newBalance: updated.balance,
    transactionId: txn!.id,
  };
}

// ─── Grant monthly credits ──────────────────────────────────────────
// Called by a cron job or on first request after period expires.

export async function refreshMonthlyCredits(userId: string): Promise<void> {
  const wallet = await ensureWallet(userId);
  const now = new Date();

  if (now < wallet.periodEnd) return; // Not yet expired

  const config = PLANS[wallet.plan];
  const newPeriodEnd = new Date(now.getTime() + 30 * 24 * 60 * 60 * 1000);

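  // Reset balance to the monthly grant (no rollover; expired credits are lost).
  // The period_end guard makes the reset idempotent: if a concurrent request
  // has already renewed this wallet, the UPDATE matches no row.
  const [renewed] = await db.update(creditWallets).set({
    balance: config.monthlyCredits,
    periodStart: now,
    periodEnd: newPeriodEnd,
    updatedAt: now,
  }).where(
    sql`${creditWallets.userId} = ${userId} AND ${creditWallets.periodEnd} <= ${now.toISOString()}`
  ).returning({ id: creditWallets.id });

  if (!renewed) return; // another request renewed first; do not log a second grant

  await db.insert(creditTransactions).values({
    walletId: wallet.id,
    amount: config.monthlyCredits,
    type: "plan_grant",
    balanceAfter: config.monthlyCredits,
    description: `Monthly ${wallet.plan} plan renewal`,
  });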
}
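
The SQL default for period_end uses INTERVAL '1 month', while the TypeScript code adds a fixed 30 days, so the two can drift apart. If calendar-month periods are the intent, a helper along these lines could replace the 30-day arithmetic. addOneMonthUtc is a hypothetical name; it clamps to the last day of the target month, as PostgreSQL does when adding one month to a timestamp.

```typescript
// Adds one calendar month in UTC, clamping the day to the target month's length
// (e.g. Jan 31 becomes Feb 28, or Feb 29 in a leap year).
function addOneMonthUtc(d: Date): Date {
  const year = d.getUTCFullYear();
  const month = d.getUTCMonth();
  const day = d.getUTCDate();

  // Day 0 of month + 2 is the last day of month + 1.
  const lastDayOfTarget = new Date(Date.UTC(year, month + 2, 0)).getUTCDate();

  return new Date(Date.UTC(
    year,
    month + 1,
    Math.min(day, lastDayOfTarget),
    d.getUTCHours(),
    d.getUTCMinutes(),
    d.getUTCSeconds(),
    d.getUTCMilliseconds(),
  ));
}

console.log(addOneMonthUtc(new Date("2025-01-31T12:00:00Z")).toISOString());
// 2025-02-28T12:00:00.000Z
```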

credit-guard.ts — Middleware for credit checks

typescript

import type { Context, Next } from "hono";
import type { AppEnv } from "../lib/types.js";
import { checkBalance, validateModelAccess, refreshMonthlyCredits } from "../lib/credit-service.js";
import { PLANS, type PlanId } from "../lib/plan-config.js";

/**
 * Credit guard middleware. Run BEFORE rate limiting.
 *
 * Checks:
 * 1. User's plan period: refresh if expired
 * 2. Credit balance: is balance > 0?
 *
 * Model access is NOT checked here, because the model ID is only known once
 * the handler has parsed the request body. Handlers call
 * validateModelForPlan() for that.
 *
 * Sets on context:
 * - c.set("wallet", wallet)       for downstream use
 * - c.set("planConfig", config)   for rate limit lookups
 */
export async function creditGuard(c: Context<AppEnv>, next: Next) {
  const user = c.get("user");

  // BYOK users skip credit checks entirely
  // (resolved later in the handler — we can't check here yet)
  // The handler itself will skip deduction if isByok = true.
  // But we still enforce model access for official key users.

  const plan = (user.tier ?? "free") as PlanId;
  const config = PLANS[plan] ?? PLANS.free;
  c.set("planConfig" as never, config);

  // Refresh monthly credits if period expired (lazy renewal)
  await refreshMonthlyCredits(user.id);

  // Check balance
  const { ok, balance, wallet } = await checkBalance(user.id);
  c.set("wallet" as never, wallet);

  if (!ok) {
    return c.json({
      error: "You've used all your credits for this period. Purchase additional credits or upgrade your plan.",
      code: "NO_CREDITS",
      balance: 0,
    }, 402);
  }

  await next();
}

/**
 * Validate that the requested model is allowed for the user's plan.
 * Call this in the handler after extracting the model from the request body.
 */
export async function validateModelForPlan(
  plan: PlanId,
  modelId: string,
): Promise<{ allowed: boolean; error?: string }> {
  const result = await validateModelAccess(plan, modelId);
  if (!result.allowed) {
    return { allowed: false, error: result.reason };
  }
  return { allowed: true };
}

5. Integration Points (Exact Code Changes)

5.1 messages.ts — Send endpoint (lines 432-878)

BEFORE (current, line 432-445):

typescript

const useProtections = !resolved.isByok;
if (useProtections) {
  if (checkSuspended(currentUser)) → 403
  checkRateLimit(currentUser.id) → 429
  acquireConcurrency(currentUser.id) → 429
}

AFTER (new flow):

typescript

const useProtections = !resolved.isByok;
if (useProtections) {
  // 1. Suspension check (unchanged)
  if (checkSuspended(currentUser)) → 403

  // 2. Model access check (NEW)
  const plan = (currentUser.tier ?? "free") as PlanId;
  const modelAccess = await validateModelForPlan(plan, model);
  if (!modelAccess.allowed) → 403 { error: modelAccess.error, code: "MODEL_NOT_ALLOWED" }

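  // 3. Credit balance check (NEW). Renew the period first so an expired
  //    wallet is topped up before its balance is read.
  await refreshMonthlyCredits(currentUser.id);
  const { ok, balance } = await checkBalance(currentUser.id);
  if (!ok) → 402 { error: "Out of credits", code: "NO_CREDITS" }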

  // 4. Rate limit (CHANGED: use plan-based RPM)
  const planConfig = PLANS[(currentUser.tier ?? "free") as PlanId];
  const rateLimitError = await checkRateLimit(currentUser.id, planConfig.rateLimit);
  if (rateLimitError) → 429

  // 5. Concurrency (CHANGED: use plan-based max)
  if (!(await acquireConcurrency(currentUser.id, planConfig.maxConcurrent))) → 429
}

AFTER stream done (line 830-843, add credit deduction):

typescript

if (useProtections) {
  // Log usage (existing, keep as-is)
  const usageLogId = crypto.randomUUID();
  db.insert(usageLogs).values({
    id: usageLogId,
    userId: currentUser.id,
    sessionId,
    model,
    promptTokens: chunk.usage?.promptTokens ?? 0,
    completionTokens: chunk.usage?.completionTokens ?? 0,
    totalTokens: chunk.usage?.totalTokens ?? 0,
    endpoint: "send",
    apiKeyTier: resolved.apiKeyTier,
    generationTimeMs,
  }).catch(err => console.error("[UsageLog]", err.message));

  // Deduct credits (NEW)
  const promptTokens = chunk.usage?.promptTokens ?? 0;
  const completionTokens = chunk.usage?.completionTokens ?? 0;
  if (promptTokens > 0 || completionTokens > 0) {
    try {
      const cost = await calculateCost(model, promptTokens, completionTokens);
      const result = await deductCredits(
        currentUser.id,
        cost,
        usageLogId,
        `${model} — ${promptTokens + completionTokens} tokens`,
      );
      // Include in done SSE event so client can update balance
      creditsCost = cost;
      newBalance = result.newBalance;
    } catch (err) {
      console.error("[Credit] Deduction failed:", err);
      // Don't fail the request — the message is already generated.
      // Log for manual reconciliation.
    }
  }
}

Done SSE event (line 845-858, add credit info):

typescript

await stream.writeSSE({
  event: "done",
  data: JSON.stringify({
    messageId: assistantMsg.id,
    userMessageId: userMsg?.id ?? null,
    content: cleanText,
    stateChanges: allChanges,
    state: finalState,
    tokenCount: chunk.usage?.totalTokens ?? null,
    generationTimeMs,
    choices,
    audioEffects: allAudioEffects.length > 0 ? allAudioEffects : undefined,
    // NEW: credit info for client-side balance display
    credits: useProtections ? { cost: creditsCost, balance: newBalance } : undefined,
  }),
});

5.2 Same pattern for regenerate (line 1050-1385) and continue (line 1440-1822)

Identical changes: add model access check, balance check, plan-based rate limits, credit deduction after done chunk, credit info in done SSE event.

5.3 studio.ts — Playtest endpoint

Add same credit deduction after generation completes. Use endpoint "studio-playtest" in usage log.

5.4 agent.ts — Studio agent endpoint

Add credit deduction per iteration of the agent loop. Each LLM call within the agent deducts independently. Critical: check balance BEFORE each iteration — abort the agent loop if credits depleted mid-run.
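
A minimal sketch of that loop shape follows. The AgentDeps interface and runAgentLoop are hypothetical; in the real handler, getBalance and deduct would presumably wrap checkBalance and deductCredits, and runStep would be one LLM call.

```typescript
interface AgentDeps {
  getBalance: () => Promise<number>;
  runStep: (iteration: number) => Promise<{ cost: number; done: boolean }>;
  deduct: (credits: number) => Promise<void>;
}

interface AgentOutcome {
  iterations: number; // LLM calls actually made
  stoppedBy: "done" | "no_credits" | "max_iterations";
}

async function runAgentLoop(deps: AgentDeps, maxIterations: number): Promise<AgentOutcome> {
  for (let i = 0; i < maxIterations; i++) {
    // Pre-flight check before every LLM call, not only the first one.
    if ((await deps.getBalance()) <= 0) {
      return { iterations: i, stoppedBy: "no_credits" };
    }
    const step = await deps.runStep(i);
    // Each call is charged independently, right after it completes.
    await deps.deduct(step.cost);
    if (step.done) {
      return { iterations: i + 1, stoppedBy: "done" };
    }
  }
  return { iterations: maxIterations, stoppedBy: "max_iterations" };
}
```

With a balance of 100 and steps costing 40 credits each, the loop makes three calls (the third overdraws to -20) and aborts before the fourth.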

5.5 room-messages.ts — Multiplayer messages

Add credit deduction for the message sender (the user who triggered the AI response). Use endpoint "room-send" in usage log.

5.6 rate-limit.ts — Plan-based limits

typescript

// CHANGED: Accept rate limit as parameter instead of hardcoded
const DEFAULT_RPM = 6;
const DEFAULT_CONCURRENT = 2;

export async function checkRateLimit(
  userId: string,
  maxPerMinute: number = DEFAULT_RPM,
): Promise<{ error: string; code: string; retryAfter: number } | null> {
  // ... same sliding window logic, but use maxPerMinute instead of MAX_PER_MINUTE
}

export async function acquireConcurrency(
  userId: string,
  maxConcurrent: number = DEFAULT_CONCURRENT,
): Promise<boolean> {
  // ... same logic, but use maxConcurrent parameter
}
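
The sliding-window logic itself is elided above. For illustration only, an in-memory version that takes the limit as a parameter might look like the following. SlidingWindowLimiter is a hypothetical class; the actual implementation may well keep its state elsewhere (for example in Redis) and return a different shape.

```typescript
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // userId -> timestamps (ms) of accepted requests

  constructor(private windowMs: number = 60_000) {}

  // Returns null when the request is allowed, or the number of milliseconds
  // until the oldest request in the window expires.
  check(userId: string, maxPerWindow: number, nowMs: number): number | null {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);

    if (recent.length >= maxPerWindow) {
      this.hits.set(userId, recent);
      const oldest = recent[0] ?? nowMs;
      return oldest + this.windowMs - nowMs;
    }

    recent.push(nowMs);
    this.hits.set(userId, recent);
    return null;
  }
}

const limiter = new SlidingWindowLimiter();
for (let i = 0; i < 6; i++) limiter.check("user-1", 6, i * 1000); // six accepted requests
console.log(limiter.check("user-1", 6, 10_000)); // 50000 (ms until the first hit expires)
```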

5.7 resolve-provider.ts — Update OFFICIAL_ALLOWED_MODELS

Replace the hardcoded set with a dynamic check against the model_prices table:

typescript

// BEFORE: hardcoded set
const OFFICIAL_ALLOWED_MODELS = new Set([...]);

// AFTER: check model_prices table (via cache)
import { getModelPrice } from "./model-price-cache.js";

async function isOfficialModel(modelId: string): Promise<boolean> {
  const price = await getModelPrice(modelId);
  return price !== null;  // If it's in the price table and active, it's official
}

5.8 subscription.ts — Updated status endpoint

typescript

// AFTER: include credit wallet info
subscriptionRoutes.get("/subscription/status", async (c) => {
  const currentUser = c.get("user");
  const wallet = await ensureWallet(currentUser.id);
  const plan = PLANS[wallet.plan as PlanId];

  // ... existing usage query ...

  return c.json({
    data: {
      plan: wallet.plan,
      balance: Math.floor(wallet.balance),  // integer for display
      monthlyCredits: plan.monthlyCredits,
      memoryCap: plan.memoryCap,
      periodEnd: wallet.periodEnd.toISOString(),
      // ... existing fields (mode, hasByokKeys, usage, etc.)
    },
  });
});

5.9 Memory cap enforcement — Prompt builder

In the message route's prompt building section, apply the memory cap from the wallet:

typescript

// In messages.ts, during prompt building (before the LLM call):
const wallet = await ensureWallet(currentUser.id);
const memoryCap = wallet.memoryCap; // null = unlimited

// When building the message history, truncate to memoryCap.
// This MUST happen server-side, not client-side.
// NOTE: model_context_limit is a placeholder for the selected model's context window.
const maxContextTokens = memoryCap ?? model_context_limit;
// Apply to the history loading / context window calculation
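
The truncation step could be sketched as below. It assumes each stored message carries a precomputed tokenCount, which this plan does not specify, and truncateToCap is a hypothetical helper. It also clamps the plan cap to the model's own context limit, which goes slightly beyond the snippet above.

```typescript
interface HistoryMessage {
  id: string;
  tokenCount: number; // assumed to be precomputed when the message is stored
}

// Keeps the newest messages that fit in the budget and drops the oldest ones.
// `history` is ordered oldest to newest; capTokens = null means "unlimited".
function truncateToCap<T extends HistoryMessage>(
  history: T[],
  capTokens: number | null,
  modelContextLimit: number,
): T[] {
  const budget = Math.min(capTokens ?? modelContextLimit, modelContextLimit);
  const kept: T[] = [];
  let used = 0;

  for (let i = history.length - 1; i >= 0; i--) {
    const msg = history[i];
    if (used + msg.tokenCount > budget) break; // stop at the first message that does not fit
    kept.push(msg);
    used += msg.tokenCount;
  }
  return kept.reverse(); // restore oldest-to-newest order
}

const history = [
  { id: "m1", tokenCount: 30_000 },
  { id: "m2", tokenCount: 30_000 },
  { id: "m3", tokenCount: 20_000 },
];
console.log(truncateToCap(history, 32_000, 128_000).map((m) => m.id)); // ["m3"]
```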

6. Security Model

Attack vectors and defenses

6.1 Model access bypass

Attack: User crafts API request with a model they shouldn't have access to (e.g., Free user requests Claude Opus).

Defense: The server validates the model against model_prices.min_plan in the message handler, after auth and provider resolution (which determines whether the request is BYOK) but before any LLM call. The model ID comes from the request body, but access is validated server-side against the DB. There is no client-side gate to bypass.

Request: POST /api/sessions/:id/messages { model: "anthropic/claude-opus-4.6" }
Server:  user.tier = "free"
         model_prices.min_plan = "plus"
         planMeetsMinimum("free", "plus") → false → 403

6.2 Memory cap bypass

Attack: User sends a request hoping to use more context than their plan allows.

Defense: Memory cap is enforced in the prompt builder on the server. The client doesn't control how many tokens are sent to the LLM — the server builds the prompt. The server reads wallet.memoryCap and truncates the message history to fit within the cap. Even if the client sends a maxContext override in the request body, the server ignores it and uses the plan's cap.

User plan: Free (32K cap)
Conversation history: 80K tokens
Server truncates to: 32K tokens (drops oldest messages)
LLM receives: 32K tokens
Credits charged: based on 32K (what was actually sent)

6.3 Credit balance race condition (double-spend)

Attack: User opens 5 browser tabs, sends messages simultaneously, hoping to use credits before the balance updates.

Defense: Three layers:

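1. Concurrency limit (1-3 in-flight requests, depending on plan): most tabs are rejected immediately with 429.
2. Atomic SQL deduction: UPDATE credit_wallets SET balance = balance - $cost WHERE user_id = $id AND balance > -500 RETURNING balance. This is a single statement. PostgreSQL takes a row lock for the UPDATE, so two concurrent UPDATEs on the same wallet run one after the other, and under the default READ COMMITTED isolation level the second re-checks the WHERE clause against the already-updated balance.
3. Max overdraft cap: a deduction is only accepted while balance > -500, so even if concurrent messages slip through, the overdraft is bounded at 500 credits plus the cost of one message.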
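
How the guard behaves under concurrency can be simulated in memory. The sketch below is not the real SQL path: tryDeduct is a hypothetical stand-in for the guarded UPDATE, and it assumes three Opus-sized messages (277.5 credits each, from section 3) have all passed the pre-flight check on a wallet holding 10 credits.

```typescript
const MAX_OVERDRAFT = 500;

// Stand-in for:
//   UPDATE credit_wallets SET balance = balance - cost
//   WHERE user_id = ... AND balance > -MAX_OVERDRAFT
// Returns false when the WHERE clause would match no row.
function tryDeduct(wallet: { balance: number }, cost: number): boolean {
  if (!(wallet.balance > -MAX_OVERDRAFT)) return false;
  wallet.balance -= cost;
  return true;
}

const wallet = { balance: 10 };
const costs = [277.5, 277.5, 277.5];

// The row lock serializes the three deductions, so they apply one at a time.
const results = costs.map((cost) => tryDeduct(wallet, cost));

console.log(results);        // [true, true, false]
console.log(wallet.balance); // -545
```

The second deduction is accepted at a balance of -267.5 and takes the wallet to -545, which shows that the guard limits the balance before a charge, not after it.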

6.4 Token count manipulation

Attack: Somehow fake lower token counts to get cheaper deductions.

Defense: Token counts come from the LLM provider response (chunk.usage.promptTokens, chunk.usage.completionTokens), not from the client. The client never sends token counts. The server reads them from the streaming response's done chunk. There is no client-controllable parameter that affects the token count used for billing.

6.5 BYOK mode exploitation

Attack: User pretends to be in BYOK mode to skip credit deduction.

Defense: BYOK status is determined entirely server-side by resolveProviderForModel():

- It checks user.preferences.preferredProvider (stored in DB, not a request param)
- It checks if the user actually has stored API keys in the apiKeys table
- If they DO have keys and prefer "private", isByok = true
- If they DON'T have keys, they fall back to official and isByok = false

There is no request parameter to set isByok. It's computed from DB state.

6.6 Plan spoofing

Attack: User claims to be on a higher plan than they are.

Defense: The plan is stored in the database (user.tier, which the request handlers read, and credit_wallets.plan) and is only ever read server-side. The user cannot set their plan via any API request. Both fields must be updated together, and plan changes only happen through:

- Subscription payment (Stripe webhook → update DB)
- Admin adjustment

6.7 Replay attacks

Attack: Replay a successful /api/sessions/:id/messages request to get free generations.

Defense: The usage log ID is generated server-side for every request, so a replayed request does not reuse an old ID; it is processed as a new generation and charged like any other. The unique index on credit_transactions(reference_id) WHERE type = 'usage' covers a different case: if the same usage log is submitted for deduction twice (for example, by a retry), the second ledger insert fails with a unique constraint violation. For that failure to also protect the balance, the wallet UPDATE and the ledger INSERT must run in the same transaction. Each message is also saved to the DB with a unique ID and the conversation state advances, so a replay only creates a duplicate message at the same conversation point, which is harmless and still costs credits.
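
The idempotency property can be illustrated with an in-memory ledger. InMemoryLedger is hypothetical; its Set plays the role of the partial unique index, and checking it before touching the balance mimics a transaction that rolls back on a constraint violation.

```typescript
class InMemoryLedger {
  balance: number;
  private usageRefs = new Set<string>(); // stands in for the unique index on reference_id

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Returns false (and leaves the balance untouched) when the reference
  // was already charged, as a rolled-back transaction would.
  chargeUsage(referenceId: string, credits: number): boolean {
    if (this.usageRefs.has(referenceId)) return false;
    this.usageRefs.add(referenceId);
    this.balance -= credits;
    return true;
  }
}

const ledger = new InMemoryLedger(100);
console.log(ledger.chargeUsage("usage-1", 10)); // true
console.log(ledger.chargeUsage("usage-1", 10)); // false: same usage log, not charged again
console.log(ledger.chargeUsage("usage-2", 10)); // true: a replayed request gets a new ID
console.log(ledger.balance);                    // 80
```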

6.8 Direct API access without the client

Attack: User reverse-engineers the API and calls it directly, bypassing client-side UX warnings about credit cost.

Defense: All protections are server-side. The client-side UX (balance display, cost warnings) is purely informational. Even if a user calls the API directly with curl, every check (auth, plan, model access, balance, rate limit, concurrency, deduction) happens on the server. The API is the enforcement layer, not the client.

6.9 Timing attack on period refresh

Attack: A user notices that monthly credits are granted lazily (on the first request after the period expires). They wait until the period expires, then send a burst of requests, hoping to spend both the old and the new credits.

Defense: refreshMonthlyCredits() resets balance to monthlyCredits — it doesn't add to existing balance. So the old balance is replaced, not stacked. And the refresh runs synchronously before the balance check, so there's no window where both old and new credits are available.
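
The reset-not-stack behaviour can be sketched like this. The `Wallet` shape and the `refreshIfExpired` name are assumptions for illustration, as is the choice to start the new period at the moment of refresh; the real `refreshMonthlyCredits()` may differ on both counts.

```typescript
// Hypothetical wallet shape, loosely following the credit_wallets columns.
interface Wallet {
  balance: number;
  monthlyCredits: number;
  periodEnd: Date;
}

// Returns the wallet that the subsequent balance check should use.
function refreshIfExpired(wallet: Wallet, now: Date): Wallet {
  if (now.getTime() < wallet.periodEnd.getTime()) {
    return wallet; // Period still active: nothing changes.
  }
  // Assumption: the new period runs one month from the refresh time.
  const nextEnd = new Date(now.getTime());
  nextEnd.setUTCMonth(nextEnd.getUTCMonth() + 1);
  return {
    ...wallet,
    balance: wallet.monthlyCredits, // Reset, not wallet.balance + monthlyCredits.
    periodEnd: nextEnd,
  };
}
```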

6.10 Free tier farming via API

Attack: User writes a script to burn through their 1,000 free credits as fast as possible, extracting maximum value.

Defense: Rate limiting (6/min) caps throughput regardless of automation. With 1,000 credits on the cheapest model (3.8 cr/msg), that's ~263 messages. At 6/min, it takes ~44 minutes to exhaust. The user gets exactly what the Free tier promises — nothing more. This isn't an attack; it's just usage.
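
The arithmetic above, spelled out as a small helper. The `timeToExhaust` function is a hypothetical name; the inputs are the figures quoted in the text.

```typescript
interface ExhaustionEstimate {
  messages: number; // whole messages the credits pay for
  minutes: number;  // time needed at the rate limit
}

function timeToExhaust(credits: number, costPerMessage: number, requestsPerMinute: number): ExhaustionEstimate {
  const messages = Math.floor(credits / costPerMessage);
  return { messages, minutes: messages / requestsPerMinute };
}

// Free tier: 1,000 credits, 3.8 credits per message, 6 requests per minute.
const freeTierEstimate = timeToExhaust(1000, 3.8, 6);
// freeTierEstimate.messages is 263; freeTierEstimate.minutes is about 43.8
```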

7. Request Flow (Complete)

POST /api/sessions/:sessionId/messages
│
├─ 1. Auth middleware
│     → Validates session token (Redis cache / DB)
│     → Sets c.user (includes tier)
│     → 401 if unauthorized
│
├─ 2. Extract model from request body
│     → Default: plan's default model
│
├─ 3. Resolve provider
│     → resolveProviderForModel(userId, model, { forceOfficial })
│     → Returns: { provider, isByok, apiKeyTier }
│     → 400 if no key available
│
├─ 4. Determine protection mode
│     → useProtections = !resolved.isByok
│     → If BYOK: skip steps 5-9, jump to 10
│
├─ 5. Suspension check
│     → user.isSuspended → 403 SUSPENDED
│
├─ 6. Model access check  ← NEW
│     → validateModelForPlan(user.tier, model)
│     → 403 MODEL_NOT_ALLOWED if plan too low
│
├─ 7. Credit balance check  ← NEW
│     → checkBalance(userId)
│     → Includes lazy period refresh
│     → 402 NO_CREDITS if balance ≤ 0
│
├─ 8. Rate limit check  ← CHANGED (plan-based)
│     → checkRateLimit(userId, planConfig.rateLimit)
│     → 429 RATE_LIMITED if >6/min
│
├─ 9. Concurrency check  ← CHANGED (plan-based)
│     → acquireConcurrency(userId, planConfig.maxConcurrent)
│     → 429 CONCURRENT_LIMIT if at max
│
├─ 10. Build prompt
│      → Apply memory cap server-side  ← NEW
│      → Truncate history to wallet.memoryCap
│
├─ 11. Stream generation (SSE)
│      → provider.generateStream(params)
│      → Emit text/reasoning/segment events
│
├─ 12. On "done" chunk:
│      ├─ Parse response, apply effects, persist message (existing)
│      ├─ Log usage to usage_logs (existing)
│      ├─ Calculate credit cost from actual tokens  ← NEW
│      │    → calculateCost(model, promptTokens, completionTokens)
│      ├─ Atomic deduct from wallet  ← NEW
│      │    → deductCredits(userId, cost, usageLogId, description)
│      └─ Include { credits: { cost, balance } } in done SSE  ← NEW
│
├─ 13. On error/abort:
│      ├─ If tokens were generated: deduct for actual tokens  ← NEW
│      └─ If no tokens: no deduction
│
└─ 14. Finally:
       → releaseConcurrency(userId) if useProtections
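
Steps 4 through 9 amount to an ordered chain of checks in which the first failure decides the response. A rough sketch, assuming a hypothetical `GuardInput` snapshot in place of the real server-side lookups (the status codes and error names are the ones listed in the flow):

```typescript
// Hypothetical snapshot; in the real flow each field comes from a server-side lookup.
interface GuardInput {
  isByok: boolean;
  isSuspended: boolean;
  modelAllowed: boolean;
  balance: number;
  requestsInLastMinute: number; // requests already counted, excluding this one
  rateLimit: number;
  activeStreams: number;
  maxConcurrent: number;
}

type GuardResult =
  | { ok: true }
  | { ok: false; status: number; code: string };

function runGuards(input: GuardInput): GuardResult {
  if (input.isByok) return { ok: true }; // Step 4: BYOK skips steps 5-9.
  if (input.isSuspended) return { ok: false, status: 403, code: "SUSPENDED" };
  if (!input.modelAllowed) return { ok: false, status: 403, code: "MODEL_NOT_ALLOWED" };
  if (input.balance <= 0) return { ok: false, status: 402, code: "NO_CREDITS" };
  if (input.requestsInLastMinute >= input.rateLimit) {
    return { ok: false, status: 429, code: "RATE_LIMITED" };
  }
  if (input.activeStreams >= input.maxConcurrent) {
    return { ok: false, status: 429, code: "CONCURRENT_LIMIT" };
  }
  return { ok: true };
}
```

Keeping the checks in one ordered function makes the precedence explicit: a request for a disallowed model on an empty wallet gets 403 MODEL_NOT_ALLOWED, not 402.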

8. Client-Side Changes

8.1 Balance display (header)

Always-visible credit counter in the app header:

⚡ 1,247

Read from GET /api/subscription/status on app load. Updated locally from the credits.balance field in every done SSE event.

8.2 Per-message cost (after AI responds)

Small indicator on each AI message:

[AI response text...]
                                      ⚡ 28

Read from credits.cost in the done SSE event.
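
On the client, both the header counter (8.1) and the per-message indicator (8.2) can be driven from the same done event. A minimal sketch, assuming the event data is a JSON string and using hypothetical `CreditUiState` and `applyDoneEvent` names; only the `credits.cost` and `credits.balance` fields come from this plan.

```typescript
// Assumed payload: the plan only specifies credits.cost and credits.balance.
interface DoneEvent {
  credits?: { cost: number; balance: number };
}

interface CreditUiState {
  headerBalance: number;
  lastMessageCost: number | null;
}

function applyDoneEvent(state: CreditUiState, rawData: string): CreditUiState {
  const event = JSON.parse(rawData) as DoneEvent;
  // Assumption: events without a credits field leave the display unchanged.
  if (!event.credits) return state;
  return {
    headerBalance: event.credits.balance,
    lastMessageCost: event.credits.cost,
  };
}
```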

8.3 Model selector cost indicator

Show approximate cost per message with color-coded tier:

🟢 Gemini 2.5 Flash Lite          ~4 /msg
🟢 Grok 4.1 Fast                  ~7 /msg
🟢 DeepSeek V3.2                  ~9 /msg
🟢 Gemini 3.1 Flash Lite          ~14 /msg
🟡 Gemini 2.5 Flash               ~18 /msg
🟡 Gemini 3 Flash Preview         ~29 /msg
🟠 Claude Haiku 4.5               ~56 /msg
🟠 Grok 4.20                      ~105 /msg
🟠 Gemini 3.1 Pro Preview         ~114 /msg
🔴 Claude Sonnet 4.6              ~167 /msg
🔴 Claude Opus 4.6                ~278 /msg

Approximate values based on 48K context and 1,500 output tokens. Shown in the model picker UI. Color = green/yellow/orange/red dot in the UI.

8.4 Low balance warnings

≥ 20% remaining:  normal
< 20%:            yellow: "⚡ 187 — running low"
< 5%:             orange: "⚡ 43 — top up to keep playing"  [Buy Credits]
≤ 0:              "Out of credits" → show upgrade/addon/BYOK options
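
These thresholds could be mapped to a UI state as sketched below. Two assumptions are baked in and should be checked against the final design: the percentage is taken relative to the plan's monthly credits, and a negative balance (possible after the grace message) is treated the same as zero. The `balanceLevel` name is hypothetical.

```typescript
type BalanceLevel = "normal" | "low" | "critical" | "empty";

function balanceLevel(balance: number, monthlyCredits: number): BalanceLevel {
  if (balance <= 0) return "empty"; // covers the overdrawn grace-message case
  const remaining = balance / monthlyCredits;
  if (remaining < 0.05) return "critical"; // orange, with [Buy Credits]
  if (remaining < 0.2) return "low";       // yellow, "running low"
  return "normal";
}
```

With the Free plan's 1,000 monthly credits, the sample values shown above land where expected: 187 is "low" and 43 is "critical".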

8.5 Context slider (Plus+ only)

Settings panel for Plus/Pro/Ultra users:

Memory: 48K (recommended)
[======●==========================] model max

Higher memory = smarter AI, but uses more credits per message.

Free/Go users see their cap as fixed:

Memory: 32K (Free plan limit)
Upgrade to Plus for unlimited memory →

9. Migration Plan

Phase 1: Database + backend (no client changes yet)

Add credit_wallets, credit_transactions, model_prices tables
Seed model_prices with the 11 models
Create credit-service.ts, model-price-cache.ts, plan-config.ts
Migrate existing users:
- tier = "regular" → create wallet with plan "free", balance 1000
- tier = "invited" → create wallet with plan "go", balance 2000
Update user.tier values: "regular" → "free", "invited" → "go"
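
The tier migration is a fixed lookup, which could be expressed as below. The `InitialWallet` shape and `walletForLegacyTier` name are illustrative; the plan names and balances are the ones listed above. Throwing on an unknown tier is an assumption, chosen so that unexpected legacy values surface during the migration instead of silently defaulting.

```typescript
type LegacyTier = "regular" | "invited";

interface InitialWallet {
  plan: "free" | "go";
  balance: number;
  monthlyCredits: number;
}

const LEGACY_TIER_WALLETS: Record<LegacyTier, InitialWallet> = {
  regular: { plan: "free", balance: 1000, monthlyCredits: 1000 },
  invited: { plan: "go", balance: 2000, monthlyCredits: 2000 },
};

function walletForLegacyTier(tier: string): InitialWallet {
  if (tier !== "regular" && tier !== "invited") {
    // Assumption: unknown tiers stop the migration for manual review.
    throw new Error(`unexpected legacy tier: ${tier}`);
  }
  return LEGACY_TIER_WALLETS[tier];
}
```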

Phase 2: Enforce credits in message flow

Add model access validation to messages.ts (send, regenerate, continue)
Add credit balance check before generation
Add credit deduction after stream completion
Lower the rate limit from 10/min to 6/min and make concurrency limits plan-based
Add credit deduction to studio.ts (playtest), agent.ts, room-messages.ts
Enforce memory cap in prompt builder server-side
Include credits field in done SSE event

Phase 3: Client UX

Add balance display to app header
Show per-message credit cost on AI messages
Add cost indicators to model selector
Add low-balance warnings
Update subscription status page
Add context slider for Plus+ users

Phase 4: Billing integration (future)

Stripe checkout for plan upgrades
Stripe webhooks for plan changes
Add-on credit packs
Auto-renewal on period expiry (instead of lazy refresh)
Monthly credit expiry for plan grants (add-on credits don't expire)

10. Testing Checklist

Critical paths to test

[ ] Free user can only access Gemini 2.5 Flash Lite, Grok 4.1 Fast, and DeepSeek V3.2
[ ] Free user requesting Sonnet gets 403 MODEL_NOT_ALLOWED
[ ] Go user can access all models up to Gemini 3.1 Pro Preview
[ ] Plus user can access all models including Opus
[ ] Credits deducted correctly for Flash Lite message (~3-7 credits)
[ ] Credits deducted correctly for Opus message (~100-300 credits)
[ ] Grok 4.1 Fast tiered pricing: <128K context gets base price, >128K gets 2x
[ ] Balance goes to 0 → next request blocked with 402
[ ] Balance at 5 credits, message costs 200 → goes to -195 (allowed, one grace message)
[ ] Balance at -195 → next request blocked
[ ] Two concurrent requests on Go or higher (max ≥ 2): both get correct deduction (no race condition)
[ ] Two concurrent requests on Free (max 1): second gets 429 CONCURRENT_LIMIT
[ ] BYOK user: zero credit deduction, no rate limiting
[ ] Cancel mid-stream: credits deducted for actual tokens generated
[ ] Monthly period expires: balance reset to plan's monthly credits
[ ] Monthly period expires: old balance NOT added to new (reset, not accumulate)
[ ] Regenerate: full credit deduction (not free)
[ ] Continue: full credit deduction (not free)
[ ] Studio playtest: credits deducted
[ ] Studio agent: credits deducted per iteration
[ ] Room messages: sender pays
[ ] Memory cap enforced for Free at 32K (server-side truncation)
[ ] Memory cap enforced for Go at 64K
[ ] Plus+ users can use up to model's native context limit
[ ] Done SSE event includes credits.cost and credits.balance
[ ] Same usage_log ID cannot be charged twice (unique constraint)
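
The grace-message cases in the checklist (balance 5, cost 200, then blocked at -195) follow from checking the balance before generation and deducting the actual cost afterwards without a floor at zero. A sketch of that behaviour, with hypothetical `canStartRequest` and `deduct` helpers standing in for the real credit service:

```typescript
interface Wallet {
  balance: number;
}

// Pre-generation check: any positive balance lets the request start.
function canStartRequest(wallet: Wallet): boolean {
  return wallet.balance > 0;
}

// Post-generation deduction: no floor at zero, so one message may overdraw.
function deduct(wallet: Wallet, cost: number): Wallet {
  // Keep one decimal place, matching the NUMERIC(12,1) balance column.
  return { balance: Math.round((wallet.balance - cost) * 10) / 10 };
}

const startWallet: Wallet = { balance: 5 };
const afterGrace = deduct(startWallet, 200); // balance is now -195
```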
