Yumina Credit System — Architecture Plan
1. Tier Table (Final)
| Tier | Price | Raw Compute | Monthly Credits | Memory Cap | Models | RPM | Concurrent |
|---|---|---|---|---|---|---|---|
| Free | $0 | $1 | 1,000 | 32K | ≤ DeepSeek V3.2 | 6 | 1 |
| Go | $5 | $2 | 2,000 | 64K | ≤ Gemini 3.1 Pro Preview | 6 | 2 |
| Plus | $20 | $8 | 8,000 | Unlimited (default 48K) | All | 6 | 2 |
| Pro | $50 | $20 | 20,000 | Unlimited (default 64K) | All | 6 | 3 |
| Ultra | $100 | $40 | 40,000 | Unlimited (default 64K) | All | 6 | 3 |
Credit unit: $1 raw compute = 1,000 credits. 1 credit = $0.001 compute.
Model access by tier (ordered by weight):
| # | Model | Weight | Free | Go | Plus+ |
|---|---|---|---|---|---|
| 1 | google/gemini-2.5-flash-lite | 1.0x | ✓ | ✓ | ✓ |
| 2 | x-ai/grok-4.1-fast | 1.9x | ✓ | ✓ | ✓ |
| 3 | deepseek/deepseek-v3.2 | 2.3x | ✓ | ✓ | ✓ |
| 4 | google/gemini-3.1-flash-lite-preview | 2.7x | | ✓ | ✓ |
| 5 | google/gemini-2.5-flash | 3.5x | | ✓ | ✓ |
| 6 | google/gemini-3-flash-preview | 5.5x | | ✓ | ✓ |
| 7 | anthropic/claude-haiku-4.5 | 10.5x | | ✓ | ✓ |
| 8 | x-ai/grok-4.20 | 19x | | ✓ | ✓ |
| 9 | google/gemini-3.1-pro-preview | 22x | | ✓ | ✓ |
| 10 | anthropic/claude-sonnet-4.6 | 31x | | | ✓ |
| 11 | anthropic/claude-opus-4.6 | 52x | | | ✓ |
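The access columns reduce to one minimum plan per model, which is how `model_prices.min_plan` in section 2 stores them. The sketch below is illustrative only: `MODELS` and `modelsForPlan` are hypothetical names, and the list is a hand-copied version of the table above rather than a database read.

```typescript
type PlanId = "free" | "go" | "plus" | "pro" | "ultra";

// Lowest tier first. A plan can use every model whose minimum plan ranks at or below it.
const PLAN_ORDER: PlanId[] = ["free", "go", "plus", "pro", "ultra"];

// Hypothetical in-memory copy of the model access table.
const MODELS: Array<{ id: string; minPlan: PlanId }> = [
  { id: "google/gemini-2.5-flash-lite", minPlan: "free" },
  { id: "x-ai/grok-4.1-fast", minPlan: "free" },
  { id: "deepseek/deepseek-v3.2", minPlan: "free" },
  { id: "google/gemini-3.1-flash-lite-preview", minPlan: "go" },
  { id: "google/gemini-2.5-flash", minPlan: "go" },
  { id: "google/gemini-3-flash-preview", minPlan: "go" },
  { id: "anthropic/claude-haiku-4.5", minPlan: "go" },
  { id: "x-ai/grok-4.20", minPlan: "go" },
  { id: "google/gemini-3.1-pro-preview", minPlan: "go" },
  { id: "anthropic/claude-sonnet-4.6", minPlan: "plus" },
  { id: "anthropic/claude-opus-4.6", minPlan: "plus" },
];

// Returns the model IDs a plan may use.
function modelsForPlan(plan: PlanId): string[] {
  const rank = PLAN_ORDER.indexOf(plan);
  return MODELS.filter((m) => PLAN_ORDER.indexOf(m.minPlan) <= rank).map((m) => m.id);
}
```

With this data, Free resolves to the first three models, Go to the first nine, and Plus and above to all eleven.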
2. Database Schema
New tables
-- ─── Credit Wallets (one per user) ───────────────────────────────────
CREATE TABLE credit_wallets (
id TEXT PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL UNIQUE REFERENCES "user"(id) ON DELETE CASCADE,
balance NUMERIC(12,1) NOT NULL DEFAULT 0, -- 1 decimal precision
plan TEXT NOT NULL DEFAULT 'free', -- free|go|plus|pro|ultra
monthly_credits INTEGER NOT NULL DEFAULT 1000,
memory_cap INTEGER, -- NULL = unlimited
period_start TIMESTAMP NOT NULL DEFAULT NOW(),
period_end TIMESTAMP NOT NULL DEFAULT (NOW() + INTERVAL '1 month'),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX credit_wallets_user_id_idx ON credit_wallets(user_id);
-- ─── Credit Transactions (append-only ledger) ────────────────────────
CREATE TABLE credit_transactions (
id TEXT PRIMARY KEY DEFAULT gen_random_uuid(),
wallet_id TEXT NOT NULL REFERENCES credit_wallets(id) ON DELETE CASCADE,
amount NUMERIC(12,1) NOT NULL, -- positive=credit, negative=debit
type TEXT NOT NULL, -- plan_grant|usage|addon|admin|refund|expire
reference_id TEXT, -- usage_log.id, stripe payment id, etc.
balance_after NUMERIC(12,1) NOT NULL, -- snapshot for audit
description TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX credit_txn_wallet_idx ON credit_transactions(wallet_id);
CREATE INDEX credit_txn_ref_idx ON credit_transactions(reference_id);
CREATE UNIQUE INDEX credit_txn_usage_unique ON credit_transactions(reference_id) WHERE type = 'usage';
-- ^ Prevents double-deduction: same usage_log.id can only be charged once
-- ─── Model Prices ────────────────────────────────────────────────────
CREATE TABLE model_prices (
id TEXT PRIMARY KEY DEFAULT gen_random_uuid(),
model_id TEXT NOT NULL, -- e.g. "google/gemini-2.5-flash-lite"
input_price_per_m NUMERIC(8,4) NOT NULL, -- $/M input tokens
output_price_per_m NUMERIC(8,4) NOT NULL, -- $/M output tokens
context_threshold INTEGER, -- NULL = flat pricing
input_price_above_threshold NUMERIC(8,4), -- price when prompt > threshold
output_price_above_threshold NUMERIC(8,4),
min_plan TEXT NOT NULL DEFAULT 'free', -- minimum plan to use this model
is_active BOOLEAN NOT NULL DEFAULT TRUE,
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
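CREATE UNIQUE INDEX model_prices_model_id_threshold ON model_prices(model_id, context_threshold);
-- Note: NULLs count as distinct in a unique index, so this does not block a
-- second flat-priced (context_threshold IS NULL) row for the same model_id.
Drizzle schema (packages/server/src/db/schema.ts)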
// ─── Credit System ──────────────────────────────────────────────────
export const creditWallets = pgTable("credit_wallets", {
id: text("id").primaryKey().$defaultFn(() => crypto.randomUUID()),
userId: text("user_id").notNull().unique()
.references(() => user.id, { onDelete: "cascade" }),
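balance: real("balance").notNull().default(0), // NOTE: the SQL above declares NUMERIC(12,1); real is a 4-byte float, so reconcile the two before migrating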
plan: text("plan").notNull().default("free"),
monthlyCredits: integer("monthly_credits").notNull().default(1000),
memoryCap: integer("memory_cap"), // null = unlimited
periodStart: timestamp("period_start").notNull().defaultNow(),
periodEnd: timestamp("period_end").notNull()
.$defaultFn(() => new Date(Date.now() + 30 * 24 * 60 * 60 * 1000)),
createdAt: timestamp("created_at").notNull().defaultNow(),
updatedAt: timestamp("updated_at").notNull().defaultNow(),
}, (t) => [
index("credit_wallets_user_id_idx").on(t.userId),
]);
export const creditTransactions = pgTable("credit_transactions", {
id: text("id").primaryKey().$defaultFn(() => crypto.randomUUID()),
walletId: text("wallet_id").notNull()
.references(() => creditWallets.id, { onDelete: "cascade" }),
amount: real("amount").notNull(),
type: text("type").notNull(), // plan_grant|usage|addon|admin|refund|expire
referenceId: text("reference_id"),
balanceAfter: real("balance_after").notNull(),
description: text("description"),
createdAt: timestamp("created_at").notNull().defaultNow(),
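}, (t) => [
index("credit_txn_wallet_idx").on(t.walletId),
index("credit_txn_ref_idx").on(t.referenceId),
// Mirrors credit_txn_usage_unique from the SQL above.
// Needs uniqueIndex (drizzle-orm/pg-core) and sql (drizzle-orm) in the schema imports.
uniqueIndex("credit_txn_usage_unique").on(t.referenceId).where(sql`type = 'usage'`),
]);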
export const modelPrices = pgTable("model_prices", {
id: text("id").primaryKey().$defaultFn(() => crypto.randomUUID()),
modelId: text("model_id").notNull(),
inputPricePerM: real("input_price_per_m").notNull(),
outputPricePerM: real("output_price_per_m").notNull(),
contextThreshold: integer("context_threshold"),
inputPriceAboveThreshold: real("input_price_above_threshold"),
outputPriceAboveThreshold: real("output_price_above_threshold"),
minPlan: text("min_plan").notNull().default("free"),
isActive: boolean("is_active").notNull().default(true),
updatedAt: timestamp("updated_at").notNull().defaultNow(),
});
Initial model price data
INSERT INTO model_prices (model_id, input_price_per_m, output_price_per_m, context_threshold, input_price_above_threshold, output_price_above_threshold, min_plan) VALUES
('google/gemini-2.5-flash-lite', 0.10, 0.40, NULL, NULL, NULL, 'free'),
('x-ai/grok-4.1-fast', 0.20, 0.50, 128000, 0.40, 1.00, 'free'),
('deepseek/deepseek-v3.2', 0.26, 0.38, NULL, NULL, NULL, 'free'),
('google/gemini-3.1-flash-lite-preview', 0.25, 1.50, NULL, NULL, NULL, 'go'),
('google/gemini-2.5-flash', 0.30, 2.50, NULL, NULL, NULL, 'go'),
('google/gemini-3-flash-preview', 0.50, 3.00, NULL, NULL, NULL, 'go'),
('anthropic/claude-haiku-4.5', 1.00, 5.00, NULL, NULL, NULL, 'go'),
('x-ai/grok-4.20', 2.00, 6.00, 200000, 4.00, 12.00, 'go'),
('google/gemini-3.1-pro-preview', 2.00, 12.00, NULL, NULL, NULL, 'go'),
('anthropic/claude-sonnet-4.6', 3.00, 15.00, NULL, NULL, NULL, 'plus'),
('anthropic/claude-opus-4.6', 5.00, 25.00, NULL, NULL, NULL, 'plus');
Modify existing user table
-- Replace the tier column semantics. Keep the column but repurpose it.
-- Old: "regular" | "invited"
-- New: "free" | "go" | "plus" | "pro" | "ultra"
-- Migration: all existing "regular" → "free", "invited" → "go"
ALTER TABLE "user" ALTER COLUMN tier SET DEFAULT 'free';
UPDATE "user" SET tier = 'free' WHERE tier = 'regular';
UPDATE "user" SET tier = 'go' WHERE tier = 'invited';
3. Credit Calculation
Formula
credits = (prompt_tokens × input_price + completion_tokens × output_price) / 1000
Here input_price and output_price are in $ per million tokens, looked up from the model_prices table with a tiered-pricing check:
function getModelPrices(modelId: string, promptTokens: number): {
inputPrice: number; // $/M tokens
outputPrice: number; // $/M tokens
} {
const row = modelPriceCache.get(modelId);
if (!row) throw new Error(`Unknown model: ${modelId}`);
// Check tiered pricing (Grok models charge 2x above threshold)
if (row.contextThreshold && promptTokens > row.contextThreshold) {
return {
inputPrice: row.inputPriceAboveThreshold!,
outputPrice: row.outputPriceAboveThreshold!,
};
}
return {
inputPrice: row.inputPricePerM,
outputPrice: row.outputPricePerM,
};
}
function calculateCredits(
modelId: string,
promptTokens: number,
completionTokens: number,
): number {
const { inputPrice, outputPrice } = getModelPrices(modelId, promptTokens);
const cost = (promptTokens * inputPrice + completionTokens * outputPrice) / 1_000_000;
// cost is in dollars. $1 = 1000 credits. So credits = cost * 1000.
// Round up to nearest 0.1 (1 decimal place)
return Math.ceil(cost * 1000 * 10) / 10;
}
Examples (at 48K context, 1500 output tokens)
| Model | Calculation | Credits |
|---|---|---|
| Flash Lite | (48000×0.10 + 1500×0.40) / 1000 | 5.4 |
| DeepSeek V3.2 | (48000×0.26 + 1500×0.38) / 1000 | 13.1 |
| Gemini 3 Flash | (48000×0.50 + 1500×3.00) / 1000 | 28.5 |
| Claude Haiku | (48000×1.00 + 1500×5.00) / 1000 | 55.5 |
| Claude Sonnet | (48000×3.00 + 1500×15.00) / 1000 | 166.5 |
| Claude Opus | (48000×5.00 + 1500×25.00) / 1000 | 277.5 |
| Grok 4.1 Fast @ 64K | (64000×0.20 + 1500×0.50) / 1000 | 13.6 |
| Grok 4.1 Fast @ 200K | (200000×0.40 + 1500×1.00) / 1000 | 81.5 |
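One caveat with the rounding step `Math.ceil(cost * 1000 * 10) / 10`: it runs on binary floating-point values, so a product that should land exactly on a tenth of a credit can come out a hair above it and be rounded up by one extra tenth. A possible alternative, sketched here under the assumption that prices never carry more than four decimal places (as `NUMERIC(8,4)` implies), is to do the arithmetic in integers. `creditsForUsage`, `Price`, and `PRICE_SCALE` are illustrative names, not part of the plan.

```typescript
interface Price {
  inputPerM: number;  // $ per million input tokens
  outputPerM: number; // $ per million output tokens
}

// Scaling a 4-decimal price by 10,000 yields an exact integer.
const PRICE_SCALE = 10_000;

// One tenth of a credit is $0.0001. In scaled units (dollars * 1e10) that is 1,000,000.
const UNITS_PER_TENTH_CREDIT = 1_000_000;

function creditsForUsage(price: Price, promptTokens: number, completionTokens: number): number {
  const inputUnits = Math.round(price.inputPerM * PRICE_SCALE);
  const outputUnits = Math.round(price.outputPerM * PRICE_SCALE);

  // Integer total; dollars = total / 1e10.
  const total = promptTokens * inputUnits + completionTokens * outputUnits;

  // Integer ceiling division: round up to the next tenth of a credit.
  const remainder = total % UNITS_PER_TENTH_CREDIT;
  const tenths = (total - remainder) / UNITS_PER_TENTH_CREDIT + (remainder > 0 ? 1 : 0);
  return tenths / 10;
}
```

Fed the prices and token counts from the table above, this version reproduces the listed credit values; the tiered-price lookup is left out and the caller passes whichever price pair applies.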
4. Server Architecture
New files to create
packages/server/src/
lib/
credit-service.ts — CreditService class (all credit operations)
model-price-cache.ts — In-memory model price cache (loaded from DB on startup)
plan-config.ts — Plan definitions (static config)
middleware/
credit-guard.ts — Middleware: plan validation + balance check + model access
plan-config.ts: Static plan definitions
export type PlanId = "free" | "go" | "plus" | "pro" | "ultra";
export interface PlanConfig {
id: PlanId;
monthlyCredits: number;
memoryCap: number | null; // null = unlimited
rateLimit: number; // messages per minute
maxConcurrent: number;
priceCents: number;
}
// Plan hierarchy for access checks: higher index = higher tier
const PLAN_HIERARCHY: PlanId[] = ["free", "go", "plus", "pro", "ultra"];
export function planMeetsMinimum(userPlan: PlanId, requiredPlan: PlanId): boolean {
return PLAN_HIERARCHY.indexOf(userPlan) >= PLAN_HIERARCHY.indexOf(requiredPlan);
}
export const PLANS: Record<PlanId, PlanConfig> = {
free: { id: "free", monthlyCredits: 1000, memoryCap: 32_000, rateLimit: 6, maxConcurrent: 1, priceCents: 0 },
go: { id: "go", monthlyCredits: 2000, memoryCap: 64_000, rateLimit: 6, maxConcurrent: 2, priceCents: 500 },
plus: { id: "plus", monthlyCredits: 8000, memoryCap: null, rateLimit: 6, maxConcurrent: 2, priceCents: 2000 },
pro: { id: "pro", monthlyCredits: 20000, memoryCap: null, rateLimit: 6, maxConcurrent: 3, priceCents: 5000 },
ultra: { id: "ultra", monthlyCredits: 40000, memoryCap: null, rateLimit: 6, maxConcurrent: 3, priceCents: 10000 },
};
model-price-cache.ts: Price cache loaded from DB
import { db } from "../db/index.js";
import { modelPrices } from "../db/schema.js";
import { eq } from "drizzle-orm";
interface ModelPriceEntry {
modelId: string;
inputPricePerM: number;
outputPricePerM: number;
contextThreshold: number | null;
inputPriceAboveThreshold: number | null;
outputPriceAboveThreshold: number | null;
minPlan: string;
}
let cache: Map<string, ModelPriceEntry> = new Map();
let lastLoaded = 0;
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
export async function getModelPrice(modelId: string): Promise<ModelPriceEntry | null> {
if (Date.now() - lastLoaded > CACHE_TTL || cache.size === 0) {
await refreshCache();
}
return cache.get(modelId) ?? null;
}
export async function getAllModelPrices(): Promise<ModelPriceEntry[]> {
if (Date.now() - lastLoaded > CACHE_TTL || cache.size === 0) {
await refreshCache();
}
return Array.from(cache.values());
}
async function refreshCache() {
const rows = await db.select().from(modelPrices).where(eq(modelPrices.isActive, true));
const newCache = new Map<string, ModelPriceEntry>();
for (const row of rows) {
newCache.set(row.modelId, {
modelId: row.modelId,
inputPricePerM: row.inputPricePerM,
outputPricePerM: row.outputPricePerM,
contextThreshold: row.contextThreshold,
inputPriceAboveThreshold: row.inputPriceAboveThreshold,
outputPriceAboveThreshold: row.outputPriceAboveThreshold,
minPlan: row.minPlan,
});
}
cache = newCache;
lastLoaded = Date.now();
}
credit-service.ts: Central credit operations
import { db } from "../db/index.js";
import { creditWallets, creditTransactions } from "../db/schema.js";
import { eq, sql } from "drizzle-orm";
import { getModelPrice } from "./model-price-cache.js";
import type { PlanId } from "./plan-config.js";
import { PLANS, planMeetsMinimum } from "./plan-config.js";
// ─── Types ──────────────────────────────────────────────────────────
export interface CreditWallet {
id: string;
userId: string;
balance: number;
plan: PlanId;
memoryCap: number | null;
periodEnd: Date;
}
export interface DeductionResult {
creditsDeducted: number;
newBalance: number;
transactionId: string;
}
// ─── Ensure wallet exists ───────────────────────────────────────────
// Called on first generation or subscription change. Creates wallet with
// initial plan grant if it doesn't exist.
export async function ensureWallet(userId: string, plan: PlanId = "free"): Promise<CreditWallet> {
// Try to get existing wallet
const [existing] = await db.select().from(creditWallets).where(eq(creditWallets.userId, userId));
if (existing) {
return {
id: existing.id,
userId: existing.userId,
balance: existing.balance,
plan: existing.plan as PlanId,
memoryCap: existing.memoryCap,
periodEnd: existing.periodEnd,
};
}
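// Create new wallet with initial grant. The wallet row and its ledger row are
// written in one transaction; onConflictDoNothing covers two first requests
// racing each other (user_id is UNIQUE).
const config = PLANS[plan];
const periodEnd = new Date(Date.now() + 30 * 24 * 60 * 60 * 1000);
const wallet = await db.transaction(async (tx) => {
const [created] = await tx.insert(creditWallets).values({
userId,
balance: config.monthlyCredits,
plan,
monthlyCredits: config.monthlyCredits,
memoryCap: config.memoryCap,
periodEnd,
}).onConflictDoNothing({ target: creditWallets.userId }).returning();
if (!created) return null; // another request created the wallet first
// Record the grant in the ledger
await tx.insert(creditTransactions).values({
walletId: created.id,
amount: config.monthlyCredits,
type: "plan_grant",
balanceAfter: config.monthlyCredits,
description: `Initial ${plan} plan grant`,
});
return created;
});
// Lost the race: re-read the row the other request inserted.
if (!wallet) return ensureWallet(userId, plan);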
return {
id: wallet!.id,
userId,
balance: config.monthlyCredits,
plan,
memoryCap: config.memoryCap,
periodEnd,
};
}
// ─── Check balance ──────────────────────────────────────────────────
// Fast check: is balance > 0? Used as pre-flight before generation.
export async function checkBalance(userId: string): Promise<{ ok: boolean; balance: number; wallet: CreditWallet }> {
const wallet = await ensureWallet(userId);
return { ok: wallet.balance > 0, balance: wallet.balance, wallet };
}
// ─── Validate model access ──────────────────────────────────────────
// Checks if the user's plan allows access to the requested model.
export async function validateModelAccess(
plan: PlanId,
modelId: string,
): Promise<{ allowed: boolean; reason?: string }> {
const price = await getModelPrice(modelId);
if (!price) {
return { allowed: false, reason: `Model "${modelId}" is not available on the official API.` };
}
if (!planMeetsMinimum(plan, price.minPlan as PlanId)) {
return { allowed: false, reason: `Model "${modelId}" requires the ${price.minPlan} plan or higher.` };
}
return { allowed: true };
}
// ─── Calculate cost ─────────────────────────────────────────────────
// No wallet access. Reads prices through the cache, which reloads from the DB when stale (hence async).
export async function calculateCost(
modelId: string,
promptTokens: number,
completionTokens: number,
): Promise<number> {
const price = await getModelPrice(modelId);
if (!price) throw new Error(`No pricing for model: ${modelId}`);
let inputPrice = price.inputPricePerM;
let outputPrice = price.outputPricePerM;
// Tiered pricing: if prompt exceeds threshold, use higher rates
if (price.contextThreshold && promptTokens > price.contextThreshold) {
inputPrice = price.inputPriceAboveThreshold ?? inputPrice;
outputPrice = price.outputPriceAboveThreshold ?? outputPrice;
}
const costDollars = (promptTokens * inputPrice + completionTokens * outputPrice) / 1_000_000;
// $1 = 1000 credits. Round up to 1 decimal.
return Math.ceil(costDollars * 1000 * 10) / 10;
}
// ─── Deduct credits ─────────────────────────────────────────────────
// Atomic deduction with a balance guard. Returns the new balance.
// The wallet UPDATE and the ledger INSERT run in one transaction, so a
// duplicate referenceId (unique-index violation) rolls the deduction back.
// The guard only requires balance > -MAX_OVERDRAFT before the deduction, so
// the balance can end as low as -(MAX_OVERDRAFT + cost of one message).
const MAX_OVERDRAFT = 500;
export async function deductCredits(
userId: string,
credits: number,
referenceId: string,
description: string,
): Promise<DeductionResult> {
return db.transaction(async (tx) => {
// Single guarded UPDATE ... WHERE balance > -MAX_OVERDRAFT RETURNING.
// Concurrent deductions on the same wallet row are serialized by the row
// lock, and each one re-checks the guard against the updated balance.
const [updated] = await tx
.update(creditWallets)
.set({
balance: sql`${creditWallets.balance} - ${credits}`,
updatedAt: new Date(),
})
.where(
sql`${creditWallets.userId} = ${userId} AND ${creditWallets.balance} > ${-MAX_OVERDRAFT}`
)
.returning({ id: creditWallets.id, balance: creditWallets.balance });
if (!updated) {
throw new Error("INSUFFICIENT_CREDITS");
}
// Append the ledger row in the same transaction as the balance change.
const [txn] = await tx.insert(creditTransactions).values({
walletId: updated.id,
amount: -credits,
type: "usage",
referenceId,
balanceAfter: updated.balance,
description,
}).returning({ id: creditTransactions.id });
return {
creditsDeducted: credits,
newBalance: updated.balance,
transactionId: txn!.id,
};
});
}
// ─── Grant monthly credits ──────────────────────────────────────────
// Called by a cron job or on first request after period expires.
export async function refreshMonthlyCredits(userId: string): Promise<void> {
const wallet = await ensureWallet(userId);
const now = new Date();
if (now < wallet.periodEnd) return; // Not yet expired
const config = PLANS[wallet.plan];
const newPeriodEnd = new Date(now.getTime() + 30 * 24 * 60 * 60 * 1000);
await db.transaction(async (tx) => {
// Reset balance to monthly grant (don't accumulate; expired credits are lost).
// The period_end guard makes the renewal idempotent: if a concurrent request
// already renewed, no row matches and nothing is written.
const [renewed] = await tx.update(creditWallets).set({
balance: config.monthlyCredits,
periodStart: now,
periodEnd: newPeriodEnd,
updatedAt: now,
}).where(
sql`${creditWallets.userId} = ${userId} AND ${creditWallets.periodEnd} <= NOW()`
).returning({ id: creditWallets.id });
if (!renewed) return;
await tx.insert(creditTransactions).values({
walletId: wallet.id,
amount: config.monthlyCredits,
type: "plan_grant",
balanceAfter: config.monthlyCredits,
description: `Monthly ${wallet.plan} plan renewal`,
});
});
}
credit-guard.ts: Middleware for credit checks
import type { Context, Next } from "hono";
import type { AppEnv } from "../lib/types.js";
import { checkBalance, validateModelAccess, refreshMonthlyCredits } from "../lib/credit-service.js";
import { PLANS, type PlanId } from "../lib/plan-config.js";
/**
* Credit guard middleware. Run BEFORE rate limiting.
*
* Checks:
* 1. User's plan period: refresh if expired
* 2. Credit balance: is balance > 0?
*
* Model access is NOT checked here because the model ID is in the request
* body; the handler calls validateModelForPlan() after parsing it.
*
* Sets on context:
* - c.set("wallet", wallet): for downstream use
* - c.set("planConfig", config): for rate limit lookups
*/
export async function creditGuard(c: Context<AppEnv>, next: Next) {
const user = c.get("user");
// BYOK status is resolved later in the handler, so this guard cannot exempt
// BYOK users. As written it applies the balance check to every user, so mount
// it only on routes that always bill the official key; routes that support
// BYOK should run the inline checks from section 5.1 instead.
const plan = (user.tier ?? "free") as PlanId;
const config = PLANS[plan] ?? PLANS.free;
c.set("planConfig" as never, config);
// Refresh monthly credits if period expired (lazy renewal)
await refreshMonthlyCredits(user.id);
// Check balance
const { ok, balance, wallet } = await checkBalance(user.id);
c.set("wallet" as never, wallet);
if (!ok) {
return c.json({
error: "You've used all your credits for this period. Purchase additional credits or upgrade your plan.",
code: "NO_CREDITS",
balance: 0,
}, 402);
}
await next();
}
/**
* Validate that the requested model is allowed for the user's plan.
* Call this in the handler after extracting the model from the request body.
*/
export async function validateModelForPlan(
plan: PlanId,
modelId: string,
): Promise<{ allowed: boolean; error?: string }> {
const result = await validateModelAccess(plan, modelId);
if (!result.allowed) {
return { allowed: false, error: result.reason };
}
return { allowed: true };
}
5. Integration Points (Exact Code Changes)
5.1 messages.ts — Send endpoint (lines 432-878)
BEFORE (current, line 432-445):
const useProtections = !resolved.isByok;
if (useProtections) {
if (checkSuspended(currentUser)) → 403
checkRateLimit(currentUser.id) → 429
acquireConcurrency(currentUser.id) → 429
}
AFTER (new flow):
const useProtections = !resolved.isByok;
if (useProtections) {
// 1. Suspension check (unchanged)
if (checkSuspended(currentUser)) → 403
// 2. Model access check (NEW)
const plan = (currentUser.tier ?? "free") as PlanId;
const modelAccess = await validateModelForPlan(plan, model);
if (!modelAccess.allowed) → 403 { error: modelAccess.error, code: "MODEL_NOT_ALLOWED" }
// 3. Credit balance check (NEW). Renew an expired period first (lazy renewal).
await refreshMonthlyCredits(currentUser.id);
const { ok, balance } = await checkBalance(currentUser.id);
if (!ok) → 402 { error: "Out of credits", code: "NO_CREDITS" }
// 4. Rate limit (CHANGED: use plan-based RPM)
const planConfig = PLANS[(currentUser.tier ?? "free") as PlanId];
const rateLimitError = await checkRateLimit(currentUser.id, planConfig.rateLimit);
if (rateLimitError) → 429
// 5. Concurrency (CHANGED: use plan-based max)
if (!(await acquireConcurrency(currentUser.id, planConfig.maxConcurrent))) → 429
}
AFTER stream done (line 830-843, add credit deduction):
if (useProtections) {
// Log usage (existing, keep as-is)
const usageLogId = crypto.randomUUID();
db.insert(usageLogs).values({
id: usageLogId,
userId: currentUser.id,
sessionId,
model,
promptTokens: chunk.usage?.promptTokens ?? 0,
completionTokens: chunk.usage?.completionTokens ?? 0,
totalTokens: chunk.usage?.totalTokens ?? 0,
endpoint: "send",
apiKeyTier: resolved.apiKeyTier,
generationTimeMs,
}).catch(err => console.error("[UsageLog]", err.message));
// Deduct credits (NEW)
const promptTokens = chunk.usage?.promptTokens ?? 0;
const completionTokens = chunk.usage?.completionTokens ?? 0;
if (promptTokens > 0 || completionTokens > 0) {
try {
const cost = await calculateCost(model, promptTokens, completionTokens);
const result = await deductCredits(
currentUser.id,
cost,
usageLogId,
`${model} — ${promptTokens + completionTokens} tokens`,
);
// Include in done SSE event so client can update balance
creditsCost = cost;
newBalance = result.newBalance;
} catch (err) {
console.error("[Credit] Deduction failed:", err);
// Don't fail the request — the message is already generated.
// Log for manual reconciliation.
}
}
}
Done SSE event (line 845-858, add credit info):
await stream.writeSSE({
event: "done",
data: JSON.stringify({
messageId: assistantMsg.id,
userMessageId: userMsg?.id ?? null,
content: cleanText,
stateChanges: allChanges,
state: finalState,
tokenCount: chunk.usage?.totalTokens ?? null,
generationTimeMs,
choices,
audioEffects: allAudioEffects.length > 0 ? allAudioEffects : undefined,
// NEW: credit info for client-side balance display
credits: useProtections ? { cost: creditsCost, balance: newBalance } : undefined,
}),
});
5.2 Same pattern for regenerate (line 1050-1385) and continue (line 1440-1822)
Identical changes: add model access check, balance check, plan-based rate limits, credit deduction after done chunk, credit info in done SSE event.
5.3 studio.ts — Playtest endpoint
Add same credit deduction after generation completes. Use endpoint "studio-playtest" in usage log.
5.4 agent.ts — Studio agent endpoint
Add credit deduction per iteration of the agent loop. Each LLM call within the agent deducts independently. Critical: check balance BEFORE each iteration — abort the agent loop if credits depleted mid-run.
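A minimal sketch of that loop shape, assuming the agent's dependencies are injected. `AgentDeps`, `runAgentLoop`, and the three callbacks are hypothetical names used only for illustration; in the real handler they would wrap `checkBalance`, the LLM call, and `deductCredits`.

```typescript
interface AgentDeps {
  getBalance(): Promise<number>;                                   // current wallet balance
  runIteration(step: number): Promise<{ cost: number; done: boolean }>; // one LLM call
  deduct(cost: number): Promise<void>;                             // bill that call
}

type AgentOutcome = "completed" | "out_of_credits" | "max_iterations";

async function runAgentLoop(
  deps: AgentDeps,
  maxIterations: number,
): Promise<{ outcome: AgentOutcome; iterations: number }> {
  for (let step = 0; step < maxIterations; step++) {
    // Pre-flight check before every LLM call, not only the first one.
    if ((await deps.getBalance()) <= 0) {
      return { outcome: "out_of_credits", iterations: step };
    }
    const { cost, done } = await deps.runIteration(step);
    // Each call is billed independently, as soon as it finishes.
    await deps.deduct(cost);
    if (done) {
      return { outcome: "completed", iterations: step + 1 };
    }
  }
  return { outcome: "max_iterations", iterations: maxIterations };
}
```

Billing inside the loop means an aborted run has already paid for the iterations that did execute, so no separate settlement step is needed on abort.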
5.5 room-messages.ts — Multiplayer messages
Add credit deduction for the message sender (the user who triggered the AI response). Use endpoint "room-send" in usage log.
5.6 rate-limit.ts — Plan-based limits
// CHANGED: Accept rate limit as parameter instead of hardcoded
const DEFAULT_RPM = 6;
const DEFAULT_CONCURRENT = 2;
export async function checkRateLimit(
userId: string,
maxPerMinute: number = DEFAULT_RPM,
): Promise<{ error: string; code: string; retryAfter: number } | null> {
// ... same sliding window logic, but use maxPerMinute instead of MAX_PER_MINUTE
}
export async function acquireConcurrency(
userId: string,
maxConcurrent: number = DEFAULT_CONCURRENT,
): Promise<boolean> {
// ... same logic, but use maxConcurrent parameter
}
5.7 resolve-provider.ts: Update OFFICIAL_ALLOWED_MODELS
Replace the hardcoded set with a dynamic check against the model_prices table:
// BEFORE: hardcoded set
const OFFICIAL_ALLOWED_MODELS = new Set([...]);
// AFTER: check model_prices table (via cache)
import { getModelPrice } from "./model-price-cache.js";
async function isOfficialModel(modelId: string): Promise<boolean> {
const price = await getModelPrice(modelId);
return price !== null; // If it's in the price table and active, it's official
}
5.8 subscription.ts: Updated status endpoint
// AFTER: include credit wallet info
subscriptionRoutes.get("/subscription/status", async (c) => {
const currentUser = c.get("user");
const wallet = await ensureWallet(currentUser.id);
const plan = PLANS[wallet.plan as PlanId];
// ... existing usage query ...
return c.json({
data: {
plan: wallet.plan,
balance: Math.floor(wallet.balance), // integer for display
monthlyCredits: plan.monthlyCredits,
memoryCap: plan.memoryCap,
periodEnd: wallet.periodEnd.toISOString(),
// ... existing fields (mode, hasByokKeys, usage, etc.)
},
});
});
5.9 Memory cap enforcement: Prompt builder
In the message route's prompt building section, apply the memory cap from the wallet:
// In messages.ts, during prompt building (before the LLM call):
const wallet = await ensureWallet(currentUser.id);
const memoryCap = wallet.memoryCap; // null = unlimited
// When building the message history, truncate to memoryCap:
// This MUST happen server-side, not client-side.
const maxContextTokens = memoryCap ?? model_context_limit;
// Apply to the history loading / context window calculation
6. Security Model
Attack vectors and defenses
6.1 Model access bypass
Attack: User crafts API request with a model they shouldn't have access to (e.g., Free user requests Claude Opus).
Defense: The server validates the model against model_prices.min_plan in the message handler, after auth and provider resolution (which determines whether the request is BYOK) but before any LLM call. The model ID comes from the request body, but access is validated server-side against the DB. There is no client-side gate to bypass.
Request: POST /api/sessions/:id/messages { model: "anthropic/claude-opus-4.6" }
Server: user.tier = "free"
model_prices.min_plan = "plus"
planMeetsMinimum("free", "plus") → false → 403
6.2 Memory cap bypass
Attack: User sends a request hoping to use more context than their plan allows.
Defense: Memory cap is enforced in the prompt builder on the server. The client doesn't control how many tokens are sent to the LLM — the server builds the prompt. The server reads wallet.memoryCap and truncates the message history to fit within the cap. Even if the client sends a maxContext override in the request body, the server ignores it and uses the plan's cap.
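A minimal sketch of the truncation step, assuming each stored message already carries a token count. `HistoryMessage` and `truncateHistory` are hypothetical names; the real prompt builder may also reserve room for the system prompt and other fixed sections, which this sketch ignores.

```typescript
interface HistoryMessage {
  id: string;
  tokens: number; // token count of this message, assumed precomputed
}

// Keeps the newest messages whose combined token count fits the budget and
// drops older ones. A null cap (unlimited plan) falls back to the model's
// own context limit.
function truncateHistory(
  history: HistoryMessage[],
  memoryCap: number | null,
  modelContextLimit: number,
): HistoryMessage[] {
  const budget = memoryCap === null ? modelContextLimit : Math.min(memoryCap, modelContextLimit);
  const kept: HistoryMessage[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const msg of history.slice().reverse()) {
    if (used + msg.tokens > budget) break;
    kept.unshift(msg); // restore chronological order
    used += msg.tokens;
  }
  return kept;
}
```

The cap argument comes from the wallet, never from the request body, so a client-supplied override has nothing to hook into.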
User plan: Free (32K cap)
Conversation history: 80K tokens
Server truncates to: 32K tokens (drops oldest messages)
LLM receives: 32K tokens
Credits charged: based on 32K (what was actually sent)
6.3 Credit balance race condition (double-spend)
Attack: User opens 5 browser tabs, sends messages simultaneously, hoping to use credits before the balance updates.
Defense: Three layers:
- Concurrency limit (1-3 max per plan): most tabs get rejected immediately with 429.
- Atomic SQL deduction: UPDATE credit_wallets SET balance = balance - $cost WHERE user_id = $id AND balance > -500 RETURNING balance. This is a single statement. PostgreSQL takes a row lock, so two concurrent UPDATEs on the same wallet run one after the other, and the second re-checks the WHERE clause against the already-updated balance.
- Max overdraft guard (-500 credits): even if concurrent messages slip through, the overdraft is bounded. Because the guard is evaluated before the subtraction, the floor is -500 minus the cost of one message.
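The guard and its bound can be seen in a small in-memory model. This is only a sketch: `InMemoryWallet` is a hypothetical stand-in for the wallet row, and the sequential calls stand in for concurrent UPDATEs that PostgreSQL would apply one at a time to the same row.

```typescript
const MAX_OVERDRAFT = 500;

class InMemoryWallet {
  balance: number;

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Mirrors: UPDATE ... SET balance = balance - cost WHERE balance > -MAX_OVERDRAFT
  // Returns the new balance, or null when the guard rejects the deduction.
  tryDeduct(cost: number): number | null {
    if (!(this.balance > -MAX_OVERDRAFT)) return null;
    this.balance -= cost;
    return this.balance;
  }
}

// Five messages of 200 credits each land on a wallet that has 10 credits left.
const demoWallet = new InMemoryWallet(10);
const outcomes = [200, 200, 200, 200, 200].map((cost) => demoWallet.tryDeduct(cost));
```

The first three deductions go through (the balance before each is still above -500) and the last two are rejected, leaving the balance at -590. The guard bounds the damage, but the floor sits one message below -500 rather than exactly at it.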
6.4 Token count manipulation
Attack: Somehow fake lower token counts to get cheaper deductions.
Defense: Token counts come from the LLM provider response (chunk.usage.promptTokens, chunk.usage.completionTokens), not from the client. The client never sends token counts. The server reads them from the streaming response's done chunk. There is no client-controllable parameter that affects the token count used for billing.
6.5 BYOK mode exploitation
Attack: User pretends to be in BYOK mode to skip credit deduction.
Defense: BYOK status is determined entirely server-side by resolveProviderForModel():
- It checks user.preferences.preferredProvider (stored in DB, not a request param)
- It checks if the user actually has stored API keys in the apiKeys table
- If they DO have keys and prefer "private", isByok = true
- If they DON'T have keys, they fall back to official and isByok = false
There is no request parameter to set isByok. It's computed from DB state.
6.6 Plan spoofing
Attack: User claims to be on a higher plan than they are.
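Defense: The plan is stored in the database (user.tier, which the handlers read for access and rate limits, and credit_wallets.plan, which drives the monthly grant) and is only ever read server-side. Both columns must be updated together on a plan change so the two stay in agreement. The user cannot set their plan via any API request. Plan changes only happen through: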
- Subscription payment (Stripe webhook → update DB)
- Admin adjustment
6.7 Replay attacks
Attack: Replay a successful /api/sessions/:id/messages request to get free generations.
Defense: Each message creates a new usage log entry with a unique ID, and credit deduction uses this ID as referenceId. The credit_transactions table has a unique index on (reference_id) WHERE type = 'usage', so a second deduction with the same reference ID fails with a unique constraint violation. This only protects the balance if the wallet update and the ledger insert run in the same transaction, so that the violation rolls the deduction back. A replayed HTTP request is a new generation: it gets its own usage log ID, creates a duplicate message at the same conversation point, and is charged like any other message.
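The charge-once behaviour can be sketched with an in-memory ledger keyed by reference ID. `InMemoryLedger` and `chargeOnce` are hypothetical names; in the real system the duplicate check would be the unique index, and the "both or neither" property would come from running the wallet update and the ledger insert in one database transaction.

```typescript
interface LedgerEntry {
  referenceId: string;
  amount: number;       // negative for a debit
  balanceAfter: number;
}

class InMemoryLedger {
  balance: number;
  private entries = new Map<string, LedgerEntry>();

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Applies the balance change and the ledger row together, or neither.
  // Returns false when this reference ID was already charged.
  chargeOnce(referenceId: string, credits: number): boolean {
    if (this.entries.has(referenceId)) {
      return false; // stands in for the unique-index violation plus rollback
    }
    this.balance -= credits;
    this.entries.set(referenceId, {
      referenceId,
      amount: -credits,
      balanceAfter: this.balance,
    });
    return true;
  }
}
```

Retrying a deduction for the same usage log leaves the balance untouched, while a genuinely new usage log ID is charged normally.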
6.8 Direct API access without the client
Attack: User reverse-engineers the API and calls it directly, bypassing client-side UX warnings about credit cost.
Defense: All protections are server-side. The client-side UX (balance display, cost warnings) is purely informational. Even if a user calls the API directly with curl, every check (auth, plan, model access, balance, rate limit, concurrency, deduction) happens on the server. The API is the enforcement layer, not the client.
6.9 Timing attack on period refresh
Attack: User notices that monthly credits are granted lazily (on first request after period expires). They wait until period expires, then send a burst of requests hoping to use old credits + new credits.
Defense: refreshMonthlyCredits() resets balance to monthlyCredits — it doesn't add to existing balance. So the old balance is replaced, not stacked. And the refresh runs synchronously before the balance check, so there's no window where both old and new credits are available.
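The reset-not-stack rule can be sketched as a single guarded step. `WalletPeriod` and `refreshIfExpired` are hypothetical names; the in-place check here plays the role that a `WHERE period_end <= now` condition on the UPDATE would play in the database, so a second request in the same burst becomes a no-op.

```typescript
interface WalletPeriod {
  balance: number;
  monthlyCredits: number;
  periodEnd: Date;
}

const PERIOD_MS = 30 * 24 * 60 * 60 * 1000; // 30 days, as in the plan

// Returns true when a renewal happened, false when the period is still running.
function refreshIfExpired(wallet: WalletPeriod, now: Date): boolean {
  if (now.getTime() < wallet.periodEnd.getTime()) return false;
  wallet.balance = wallet.monthlyCredits; // reset, never add: leftover credits expire
  wallet.periodEnd = new Date(now.getTime() + PERIOD_MS);
  return true;
}
```

A wallet that enters renewal with 120 leftover credits on a 1,000-credit plan comes out with exactly 1,000, and calling the function again at the same instant changes nothing.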
6.10 Free tier farming via API
Attack: User writes a script to burn through their 1,000 free credits as fast as possible, extracting maximum value.
Defense: Rate limiting (6/min) caps throughput regardless of automation. With 1,000 credits on the cheapest model (3.8 cr/msg for Flash Lite at the 32K Free cap with 1,500 output tokens), that's ~263 messages. At 6/min, it takes ~44 minutes to exhaust. The user gets exactly what the Free tier promises, nothing more. This isn't an attack; it's just usage.
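The arithmetic behind those figures, written out as a sketch. The 32K prompt size and 1,500 output tokens are assumptions chosen to match the Free memory cap and the 3.8 cr/msg figure; the prices are the `google/gemini-2.5-flash-lite` row from section 2.

```typescript
const FREE_MONTHLY_CREDITS = 1000;
const REQUESTS_PER_MINUTE = 6;

// Assumed message shape: a full 32K-token prompt and 1,500 output tokens.
const PROMPT_TOKENS = 32_000;
const COMPLETION_TOKENS = 1_500;

// $ per million tokens for google/gemini-2.5-flash-lite.
const INPUT_PRICE_PER_M = 0.10;
const OUTPUT_PRICE_PER_M = 0.40;

// credits = (prompt_tokens × input_price + completion_tokens × output_price) / 1000
const creditsPerMessage =
  (PROMPT_TOKENS * INPUT_PRICE_PER_M + COMPLETION_TOKENS * OUTPUT_PRICE_PER_M) / 1000;

// Whole messages the monthly grant covers, and the minutes needed at the rate limit.
const messagesPerGrant = Math.floor(FREE_MONTHLY_CREDITS / creditsPerMessage);
const minutesToExhaust = Math.ceil(messagesPerGrant / REQUESTS_PER_MINUTE);
```

Shorter prompts cost less per message, so a real script would get more than 263 messages out of the grant, but the 6/min ceiling still applies to every one of them.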
7. Request Flow (Complete)
POST /api/sessions/:sessionId/messages
│
├─ 1. Auth middleware
│ → Validates session token (Redis cache / DB)
│ → Sets c.user (includes tier)
│ → 401 if unauthorized
│
├─ 2. Extract model from request body
│ → Default: plan's default model
│
├─ 3. Resolve provider
│ → resolveProviderForModel(userId, model, { forceOfficial })
│ → Returns: { provider, isByok, apiKeyTier }
│ → 400 if no key available
│
├─ 4. Determine protection mode
│ → useProtections = !resolved.isByok
│ → If BYOK: skip steps 5-9, jump to 10
│
├─ 5. Suspension check
│ → user.isSuspended → 403 SUSPENDED
│
├─ 6. Model access check ← NEW
│ → validateModelForPlan(user.tier, model)
│ → 403 MODEL_NOT_ALLOWED if plan too low
│
├─ 7. Credit balance check ← NEW
│ → checkBalance(userId)
│ → Includes lazy period refresh
│ → 402 NO_CREDITS if balance ≤ 0
│
├─ 8. Rate limit check ← CHANGED (plan-based)
│ → checkRateLimit(userId, planConfig.rateLimit)
│ → 429 RATE_LIMITED if >6/min
│
├─ 9. Concurrency check ← CHANGED (plan-based)
│ → acquireConcurrency(userId, planConfig.maxConcurrent)
│ → 429 CONCURRENT_LIMIT if at max
│
├─ 10. Build prompt
│ → Apply memory cap server-side ← NEW
│ → Truncate history to wallet.memoryCap
│
├─ 11. Stream generation (SSE)
│ → provider.generateStream(params)
│ → Emit text/reasoning/segment events
│
├─ 12. On "done" chunk:
│ ├─ Parse response, apply effects, persist message (existing)
│ ├─ Log usage to usage_logs (existing)
│ ├─ Calculate credit cost from actual tokens ← NEW
│ │ → calculateCost(model, promptTokens, completionTokens)
│ ├─ Atomic deduct from wallet ← NEW
│ │ → deductCredits(userId, cost, usageLogId, description)
│ └─ Include { credits: { cost, balance } } in done SSE ← NEW
│
├─ 13. On error/abort:
│ ├─ If tokens were generated: deduct for actual tokens ← NEW
│ └─ If no tokens: no deduction
│
└─ 14. Finally:
  → releaseConcurrency(userId) if useProtections
8. Client-Side Changes
8.1 Balance display (header)
Always-visible credit counter in the app header:
⚡ 1,247
Read from GET /api/subscription/status on app load. Updated locally from the credits.balance field in every done SSE event.
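A minimal sketch of the local update, assuming the done event carries the { credits: { cost, balance } } object described in the request flow. The DoneEvent and CreditState types, the function names, and the choice to treat credits as optional and to display whole credits are assumptions for illustration.

```typescript
// Payload of the final SSE event. Treating `credits` as optional is an assumption.
interface DoneEvent {
  type: "done";
  credits?: { cost: number; balance: number };
}

// Hypothetical client-side state backing the header counter.
interface CreditState {
  balance: number;
  lastCost: number | null;
}

// Take the server-reported balance as the source of truth; the client
// never computes a balance on its own.
function applyDoneEvent(state: CreditState, event: DoneEvent): CreditState {
  if (!event.credits) return state;
  return { balance: event.credits.balance, lastCost: event.credits.cost };
}

// Render the header counter, e.g. 1247 -> "⚡ 1,247" (whole credits assumed).
function formatBalance(balance: number): string {
  return `⚡ ${Math.floor(balance).toLocaleString("en-US")}`;
}
```

Because the counter is overwritten from the server value on every event, a stale or tampered client display cannot affect enforcement.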
8.2 Per-message cost (after AI responds)
Small indicator on each AI message:
[AI response text...]
⚡ 28
Read from credits.cost in the done SSE event.
8.3 Model selector cost indicator
Show approximate cost per message with color-coded tier:
🟢 Gemini 2.5 Flash Lite ~4 /msg
🟢 Grok 4.1 Fast ~7 /msg
🟢 DeepSeek V3.2 ~9 /msg
🟢 Gemini 3.1 Flash Lite ~14 /msg
🟡 Gemini 2.5 Flash ~18 /msg
🟡 Gemini 3 Flash Preview ~29 /msg
🟠 Claude Haiku 4.5 ~56 /msg
🟠 Grok 4.20 ~105 /msg
🟠 Gemini 3.1 Pro Preview ~114 /msg
🔴 Claude Sonnet 4.6 ~167 /msg
🔴 Claude Opus 4.6 ~278 /msg
Approximate values based on 48K context, 1500 output. Shown in model picker UI. Color = green/yellow/orange/red dot in the UI.
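One way the dot color could be derived from the approximate per-message cost is sketched below. The plan does not state the cut-offs, so the thresholds here (15, 50, and 150 credits) are assumptions chosen only because they reproduce the grouping in the list above.

```typescript
type CostColor = "green" | "yellow" | "orange" | "red";

// Assumed cut-offs, picked so the eleven listed models land in the
// colors shown in the model selector. Not taken from the plan.
function costColor(creditsPerMessage: number): CostColor {
  if (creditsPerMessage < 15) return "green";
  if (creditsPerMessage < 50) return "yellow";
  if (creditsPerMessage < 150) return "orange";
  return "red";
}
```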
8.4 Low balance warnings
≥ 20% remaining: normal
< 20%: yellow: "⚡ 187 — running low"
< 5%: orange: "⚡ 43 — top up to keep playing" [Buy Credits]
≤ 0: "Out of credits" → show upgrade/addon/BYOK options
8.5 Context slider (Plus+ only)
Settings panel for Plus/Pro/Ultra users:
Memory: 48K (recommended)
[======●==========================] model max
Higher memory = smarter AI, but uses more credits per message.
Free/Go users see their cap as fixed:
Memory: 32K (Free plan limit)
Upgrade to Plus for unlimited memory →
9. Migration Plan
Phase 1: Database + backend (no client changes yet)
- Add credit_wallets, credit_transactions, model_prices tables
- Seed model_prices with the 11 models
- Create credit-service.ts, model-price-cache.ts, plan-config.ts
- Migrate existing users:
  - tier = "regular" → create wallet with plan "free", balance 1000
  - tier = "invited" → create wallet with plan "go", balance 2000
- Update user.tier values: "regular" → "free", "invited" → "go"
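The tier-to-wallet mapping used in the user migration can be sketched as a lookup. The names WalletSeed, LEGACY_TIER_TO_WALLET, and walletSeedForLegacyTier are hypothetical, and failing loudly on an unknown tier is an assumption about how the migration should behave.

```typescript
type Plan = "free" | "go";

interface WalletSeed {
  plan: Plan;
  balance: number;
}

// Legacy user.tier value -> initial wallet, per the Phase 1 list.
const LEGACY_TIER_TO_WALLET = new Map<string, WalletSeed>([
  ["regular", { plan: "free", balance: 1000 }],
  ["invited", { plan: "go", balance: 2000 }],
]);

// Assumed behavior: stop the migration on an unexpected tier value
// instead of silently defaulting it to "free".
function walletSeedForLegacyTier(tier: string): WalletSeed {
  const seed = LEGACY_TIER_TO_WALLET.get(tier);
  if (!seed) throw new Error(`Unknown legacy tier: ${tier}`);
  return seed;
}
```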
Phase 2: Enforce credits in message flow
- Add model access validation to messages.ts (send, regenerate, continue)
- Add credit balance check before generation
- Add credit deduction after stream completion
- Update rate limits from 10/min to 6/min, plan-based concurrency
- Add credit deduction to studio.ts (playtest), agent.ts, room-messages.ts
- Enforce memory cap in prompt builder server-side
- Include credits field in done SSE event
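The server-side memory cap item could look roughly like the sketch below. It assumes each history message already has a token count and that truncation keeps the most recent messages; the HistoryMessage type and truncateToMemoryCap name are illustrative, not the prompt builder's actual API.

```typescript
interface HistoryMessage {
  role: "user" | "assistant";
  content: string;
  tokens: number; // assumed to be precomputed by a tokenizer
}

// Keep the newest messages whose combined token count fits under the cap.
// A null cap means unlimited, matching the nullable memory_cap column.
function truncateToMemoryCap(
  history: HistoryMessage[],
  memoryCap: number | null,
): HistoryMessage[] {
  if (memoryCap === null) return history;
  const kept: HistoryMessage[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const message of [...history].reverse()) {
    if (used + message.tokens > memoryCap) break;
    used += message.tokens;
    kept.push(message);
  }
  return kept.reverse(); // restore chronological order
}
```

Because the cap is read from the wallet on the server, a client that sends a longer history or a larger requested context still gets a prompt truncated to the plan limit.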
Phase 3: Client UX
- Add balance display to app header
- Show per-message credit cost on AI messages
- Add cost indicators to model selector
- Add low-balance warnings
- Update subscription status page
- Add context slider for Plus+ users
Phase 4: Billing integration (future)
- Stripe checkout for plan upgrades
- Stripe webhooks for plan changes
- Add-on credit packs
- Auto-renewal on period expiry (instead of lazy refresh)
- Monthly credit expiry for plan grants (add-on credits don't expire)
10. Testing Checklist
Critical paths to test
- [ ] Free user can only access Gemini 2.5 Flash Lite, Grok 4.1 Fast, DeepSeek V3.2
- [ ] Free user requesting Sonnet gets 403 MODEL_NOT_ALLOWED
- [ ] Go user can access all models up to Gemini 3.1 Pro Preview
- [ ] Plus user can access all models including Opus
- [ ] Credits deducted correctly for Gemini 2.5 Flash Lite message (~3-7 credits)
- [ ] Credits deducted correctly for Opus message (~100-300 credits)
- [ ] Grok 4.1 Fast tiered pricing: <128K context gets base price, >128K gets 2x
- [ ] Balance goes to 0 → next request blocked with 402
- [ ] Balance at 5 credits, message costs 200 → goes to -195 (allowed, one grace message)
- [ ] Balance at -195 → next request blocked
- [ ] Two concurrent requests: both get correct deduction (no race condition)
- [ ] Two concurrent requests on Free (max 1): second gets 429 CONCURRENT_LIMIT
- [ ] Three concurrent requests on Go (max 2): third gets 429 CONCURRENT_LIMIT
- [ ] BYOK user: zero credit deduction, no rate limiting
- [ ] Cancel mid-stream: credits deducted for actual tokens generated
- [ ] Monthly period expires: balance reset to plan's monthly credits
- [ ] Monthly period expires: old balance NOT added to new (reset, not accumulate)
- [ ] Regenerate: full credit deduction (not free)
- [ ] Continue: full credit deduction (not free)
- [ ] Studio playtest: credits deducted
- [ ] Studio agent: credits deducted per iteration
- [ ] Room messages: sender pays
- [ ] Memory cap enforced for Free at 32K (server-side truncation)
- [ ] Memory cap enforced for Go at 64K
- [ ] Plus+ users can use up to model's native context limit
- [ ] Done SSE event includes credits.cost and credits.balance
- [ ] Same usage_log ID cannot be charged twice (unique constraint)
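The balance-related items in this checklist (block at balance ≤ 0, one grace message that may push the balance negative, and no double charge for the same usage log) can be exercised against a small in-memory model before wiring up the database. The GraceWallet class below is a hypothetical test double, not the real credit service; the production path relies on an atomic update and a unique constraint rather than a Set.

```typescript
// In-memory stand-in for a wallet, for reasoning about the rules only.
class GraceWallet {
  balance: number;
  private readonly chargedLogIds = new Set<string>();

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Pre-generation gate: a request may start only while balance > 0.
  canStartRequest(): boolean {
    return this.balance > 0;
  }

  // Post-generation deduction of the actual cost. The balance may go
  // negative (the grace message). Returns false if this usage log was
  // already charged, emulating the unique constraint.
  deduct(cost: number, usageLogId: string): boolean {
    if (this.chargedLogIds.has(usageLogId)) return false;
    this.chargedLogIds.add(usageLogId);
    // Keep one decimal place, matching NUMERIC(12,1).
    this.balance = Math.round((this.balance - cost) * 10) / 10;
    return true;
  }
}
```

Starting from 5 credits, a 200-credit message is allowed and leaves the balance at -195. After that, canStartRequest() returns false, and repeating the deduction with the same usage log ID changes nothing.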
