AI Settings
These settings control which AI model powers your experience and how it behaves. They apply globally across all worlds. You'll find them in Settings > AI Configuration.
Choosing a model
Yumina offers a curated lineup of models across four cost tiers:
| Tier | Examples | Notes |
|---|---|---|
| Budget | Yumina Free, Gemini 2.5 Flash Lite, Grok 4.1 Fast, DeepSeek V3.2 | Good for casual play. Grok 4.1 Fast is the platform default. |
| Standard | Gemini 2.5 Flash, Gemini 3 Flash, DeepSeek V4 Pro | Better writing quality and instruction following. |
| Premium | Claude Haiku 4.5, Grok 4.20, Gemini 3.1 Pro | Noticeably better characterization and narrative coherence. Requires Go plan or above. |
| Ultra | Claude Sonnet 4.6, Claude Opus 4.7 | Best writing quality available. Requires Plus plan or above. |
Higher tiers cost more credits per response but produce better writing. If you're unsure, start with the default (Grok 4.1 Fast) and experiment from there.
Pinned models
You can pin up to 8 models for quick access in the model picker. Four are pinned by default. Go to Settings > AI Configuration > Your Models to manage pins. Click any pinned model to set it as your default.
Recently used
Models you've used recently appear below your pinned list (if they aren't already pinned). You can pin them from there.
Context size
What it is: How much conversation history the AI can "see" when generating a response, measured in tokens. More context means the AI remembers more of what happened earlier in your session.
Default: 64,000 tokens (Free plan) / 96,000 tokens (Gold) / up to 2M with BYOK.
Recommendation: For most play, 42k-62k is the sweet spot -- enough context for the AI to maintain narrative consistency without unnecessary cost. Going above 96k rarely improves the experience unless you're in a very long session with complex state. The setting is in Settings > AI Configuration > Context Size.
Free plan users have a cap on context size. Upgrading your plan or using BYOK removes the cap.
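For a rough sense of scale, a common rule of thumb is about four characters per token for typical English prose. This is an approximation only, not Yumina's actual tokenizer:

```python
def rough_token_estimate(text):
    # Heuristic: ~4 characters per token for typical English prose.
    # Real tokenizers vary widely; use only for ballpark budgeting.
    return max(1, len(text) // 4)

# A 400-character paragraph is roughly 100 tokens, so a 64,000-token
# window holds on the order of a few hundred such paragraphs.
```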
Creativity (temperature)
The temperature slider controls how random/creative the AI's responses are:
- Lower (toward 0.5): More predictable, focused, consistent. Good for strategy games or worlds where precision matters.
- Higher (toward 1.5): More creative, varied, surprising. Good for creative writing and exploration.
- Default: 1.0 -- balanced for most use cases.
The slider in Settings runs from 0.5 to 1.5. Don't go past 1.3 unless you want the AI to get noticeably more unpredictable.
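Conceptually, temperature rescales the model's next-token scores before one is sampled. A minimal illustration (not Yumina's implementation; the scores are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide scores by temperature, then softmax. Lower temperature
    # sharpens the distribution (predictable); higher flattens it (varied).
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]          # toy next-token scores
focused = softmax_with_temperature(logits, 0.5)
balanced = softmax_with_temperature(logits, 1.0)
creative = softmax_with_temperature(logits, 1.5)
# The top token's share shrinks as temperature rises:
# focused[0] > balanced[0] > creative[0]
```

This is why low temperature feels consistent (the likeliest continuation almost always wins) and high temperature feels surprising (weaker candidates get picked more often).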
Response length (max tokens)
Controls the maximum length of a single AI response. Default is 12,000 tokens. Increase for longer, more detailed responses; decrease for snappier, more concise ones. Range: 256 to 32,768.
Reasoning effort
For models that support reasoning (Claude, GPT-5), this controls how much "thinking" the AI does before responding:
| Level | Effect |
|---|---|
| Minimal | Least thinking, fastest responses, lowest cost |
| Low | Light reasoning (default) |
| Medium | More careful responses |
| High | Most thorough, slowest, highest cost |
For most roleplay and interactive fiction, Low is fine. Bump it up if the AI is making logical errors or forgetting constraints.
Streaming
When on (default), AI responses appear token by token as they're generated. When off, the full response appears at once after generation completes. Keep this on unless your connection is unstable.
Advanced sampling parameters
Under the Advanced Parameters toggle in AI Configuration:
| Parameter | Default | What it does |
|---|---|---|
| Top P | 1.0 | Nucleus sampling -- keeps the smallest set of likely tokens whose cumulative probability reaches P. Lower = more focused. |
| Frequency Penalty | 0.0 | Reduces word repetition. Try 0.3-0.5 if the AI keeps repeating itself. |
| Presence Penalty | 0.0 | Encourages new topics. Try 0.2-0.3 if the AI keeps circling the same ideas. |
| Top K | 0 (off) | Considers only the K most likely tokens. Usually not needed alongside Top P. |
| Min P | 0 (off) | Drops tokens whose probability falls below Min P times the top token's probability. An adaptive alternative to Top K. |
Rule of thumb: Adjust temperature first. Only touch these if temperature alone doesn't solve your problem, and change one at a time.
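To make Top P and Min P concrete, here is a small sketch of both filters over a toy probability distribution (illustrative only; providers apply these internally during sampling):

```python
def top_p_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p, then renormalize. probs: list of (token, prob).
    ranked = sorted(probs, key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return [(token, p / total) for token, p in kept]

def min_p_filter(probs, min_p):
    # Drop tokens whose probability is below min_p * (top probability).
    threshold = min_p * max(p for _, p in probs)
    kept = [(token, p) for token, p in probs if p >= threshold]
    total = sum(p for _, p in kept)
    return [(token, p / total) for token, p in kept]

dist = [("the", 0.50), ("a", 0.30), ("cat", 0.19), ("xylophone", 0.01)]
nucleus = top_p_filter(dist, 0.9)   # keeps "the", "a", "cat"
adaptive = min_p_filter(dist, 0.1)  # drops "xylophone" (0.01 < 0.05)
```

Note how Min P adapts to the distribution: its cutoff scales with the top token's probability, so it prunes harder when the model is confident and more gently when it isn't.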
Bring Your Own Key (BYOK)
You can use your own API key instead of Yumina credits. Go to Settings > AI Configuration and switch to Private Key mode.
Supported providers:
| Provider | Where to get a key |
|---|---|
| OpenRouter | openrouter.ai/keys -- one key unlocks hundreds of models |
| Anthropic | console.anthropic.com |
| OpenAI | platform.openai.com |
| Google | aistudio.google.dev |
| Ollama | ollama.com -- run models locally |
Setup:
- Switch the provider toggle from Yumina API to Private Key
- Select your provider and enter your key
- Click verify to test the key
Your key is encrypted at rest (AES-256-GCM). The raw key is never returned from the server after storage -- only metadata (provider, label, masked suffix).
With BYOK, you have no context size cap and access to whatever models your provider offers. Costs go directly to your API provider instead of Yumina credits.
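The masked-suffix behavior can be pictured with a toy helper (hypothetical; the real server-side code is not public):

```python
def mask_key(raw_key, visible=4):
    # Keep only the last few characters; everything else is masked.
    # After storage, the UI shows something like this, never the raw key.
    return "*" * max(0, len(raw_key) - visible) + raw_key[-visible:]

mask_key("sk-test-12345678")  # -> "************5678"
```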
Custom prompts
An advanced feature for tuning AI behavior across all worlds. Found in Settings > AI Configuration at the bottom.
You can inject your own prompts at three positions:
- System -- into the system prompt (strongest effect)
- In-Chat -- into the middle of the chat history
- Final -- at the very end, right before the AI responds
Use this if the AI consistently misbehaves in a specific way (always forgetting a rule, always responding in the wrong language, etc.). Most players won't need this.
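As a mental model, the three positions can be sketched as points in the message list sent to the model (a hypothetical structure; Yumina's internal prompt format may differ):

```python
def build_messages(system_prompt, history, custom):
    # Sketch of where the three custom-prompt positions could land.
    # System: appended to the system prompt (strongest effect).
    messages = [{"role": "system",
                 "content": system_prompt + "\n" + custom.get("system", "")}]
    mid = len(history) // 2
    for i, msg in enumerate(history):
        messages.append(msg)
        # In-Chat: injected into the middle of the chat history.
        if custom.get("in_chat") and i == mid - 1:
            messages.append({"role": "system", "content": custom["in_chat"]})
    # Final: appended last, right before the AI responds.
    if custom.get("final"):
        messages.append({"role": "system", "content": custom["final"]})
    return messages
```

Prompts closer to the end of the list tend to weigh more on the very next response, which is why the Final position is useful for hard constraints the AI keeps dropping.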
Prompt presets
Every world's creator sets up default prompt presets. You can choose:
- Use Creator's -- use what the creator intended (recommended)
- Use My Own -- override with your own configuration
Unless you understand the prompt architecture, leave this on Creator's. Changing presets can break worlds in subtle ways.
