AI Settings
These settings control which AI model powers your experience and how it behaves. They apply globally across all worlds. You'll find them in Settings > AI Configuration.
Choosing a model
Yumina offers a curated lineup of models across four cost tiers:
| Tier | Examples | Notes |
|---|---|---|
| Budget | Yumina Free, Gemini 2.5 Flash Lite, Grok 4.1 Fast, DeepSeek V3.2 | Good for casual play. Grok 4.1 Fast is the platform default. |
| Standard | Gemini 2.5 Flash, Gemini 3 Flash, DeepSeek V4 Pro | Better writing quality and instruction following. |
| Premium | Claude Haiku 4.5, Grok 4.20, Gemini 3.1 Pro | Noticeably better characterization and narrative coherence. Requires Go plan or above. |
| Ultra | Claude Sonnet 4.6, Claude Opus 4.7 | Best writing quality available. Requires Plus plan or above. |
Higher tiers cost more credits per response but produce better writing. If you're unsure, start with the default (Grok 4.1 Fast) and experiment from there.
Pinned models
You can pin up to 8 models for quick access in the model picker. Four are pinned by default. Go to Settings > AI Configuration > Your Models to manage pins. Click any pinned model to set it as your default.
Recently used
Models you've used recently appear below your pinned list (if they aren't already pinned). You can pin them from there.
Context size
What it is: How much conversation history the AI can "see" when generating a response, measured in tokens. More context means the AI remembers more of what happened earlier in your session.
Default: 64,000 tokens (Free plan) / 96,000 tokens (Gold) / up to 2M with BYOK.
Recommendation: For most play, 42k-62k is the sweet spot -- enough context for the AI to maintain narrative consistency without unnecessary cost. Going above 96k rarely improves the experience unless you're in a very long session with complex state. The setting is in Settings > AI Configuration > Context Size.
Free plan users have a cap on context size. Upgrading your plan or using BYOK removes the cap.
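For a rough sense of scale, a common rule of thumb is about four characters per token for typical English prose. This is an approximation only, not Yumina's actual tokenizer:

```python
def rough_token_estimate(text):
    # Heuristic: ~4 characters per token for typical English prose.
    # Real tokenizers vary widely; use only for ballpark budgeting.
    return max(1, len(text) // 4)

# A 400-character paragraph is roughly 100 tokens, so a 64,000-token
# window holds on the order of a few hundred such paragraphs.
```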
Creativity (temperature)
The temperature slider controls how random/creative the AI's responses are:
- Lower (toward 0.5): More predictable, focused, consistent. Good for strategy games or worlds where precision matters.
- Higher (toward 1.5): More creative, varied, surprising. Good for creative writing and exploration.
- Default: 1.0 -- balanced for most use cases.
The slider in Settings runs from 0.5 to 1.5. Don't go past 1.3 unless you want the AI to get noticeably more unpredictable.
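Conceptually, temperature rescales the model's next-token scores before one is sampled. A minimal illustration (not Yumina's implementation; the scores are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide scores by temperature, then softmax. Lower temperature
    # sharpens the distribution (predictable); higher flattens it (varied).
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]          # toy next-token scores
focused = softmax_with_temperature(logits, 0.5)
balanced = softmax_with_temperature(logits, 1.0)
creative = softmax_with_temperature(logits, 1.5)
# The top token's share shrinks as temperature rises:
# focused[0] > balanced[0] > creative[0]
```

This is why low temperature feels consistent (the likeliest continuation almost always wins) and high temperature feels surprising (weaker candidates get picked more often).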
Response length (max tokens)
Controls the maximum length of a single AI response. Default is 12,000 tokens. Increase for longer, more detailed responses; decrease for snappier, more concise ones. Range: 256 to 32,768.
Reasoning effort
For models that support reasoning (Claude, GPT-5), this controls how much "thinking" the AI does before responding:
| Level | Effect |
|---|---|
| Minimal | Least thinking, fastest responses, lowest cost |
| Low | Light reasoning (default) |
| Medium | More careful responses |
| High | Most thorough, slowest, highest cost |
For most roleplay and interactive fiction, Low is fine. Bump it up if the AI is making logical errors or forgetting constraints.
Streaming
When on (default), AI responses appear token by token as they're generated. When off, the full response appears at once after generation completes. Keep this on unless your connection is unstable.
Advanced sampling parameters
Under the Advanced Parameters toggle in AI Configuration:
| Parameter | Default | What it does |
|---|---|---|
| Top P | 1.0 | Nucleus sampling -- keeps the smallest set of likely tokens whose cumulative probability reaches P. Lower = more focused. |
| Frequency Penalty | 0.0 | Reduces word repetition. Try 0.3-0.5 if the AI keeps repeating itself. |
| Presence Penalty | 0.0 | Encourages new topics. Try 0.2-0.3 if the AI keeps circling the same ideas. |
| Top K | 0 (off) | Considers only the K most likely tokens. Usually not needed alongside Top P. |
| Min P | 0 (off) | Drops tokens whose probability falls below Min P times the top token's probability. An adaptive alternative to Top K. |
Rule of thumb: Adjust temperature first. Only touch these if temperature alone doesn't solve your problem, and change one at a time.
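To make Top P and Min P concrete, here is a small sketch of both filters over a toy probability distribution (illustrative only; providers apply these internally during sampling):

```python
def top_p_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p, then renormalize. probs: list of (token, prob).
    ranked = sorted(probs, key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return [(token, p / total) for token, p in kept]

def min_p_filter(probs, min_p):
    # Drop tokens whose probability is below min_p * (top probability).
    threshold = min_p * max(p for _, p in probs)
    kept = [(token, p) for token, p in probs if p >= threshold]
    total = sum(p for _, p in kept)
    return [(token, p / total) for token, p in kept]

dist = [("the", 0.50), ("a", 0.30), ("cat", 0.19), ("xylophone", 0.01)]
nucleus = top_p_filter(dist, 0.9)   # keeps "the", "a", "cat"
adaptive = min_p_filter(dist, 0.1)  # drops "xylophone" (0.01 < 0.05)
```

Note how Min P adapts to the distribution: its cutoff scales with the top token's probability, so it prunes harder when the model is confident and more gently when it isn't.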
Bring Your Own Key (BYOK)
You can use your own API key instead of Yumina credits. Go to Settings > AI Configuration and switch to Private Key mode.
Supported providers:
| Provider | Where to get a key |
|---|---|
| OpenRouter | openrouter.ai/keys -- one key unlocks hundreds of models |
| Anthropic | console.anthropic.com |
| OpenAI | platform.openai.com |
| Google | aistudio.google.dev |
| Ollama | ollama.com -- run models locally |
Setup:
- Switch the provider toggle from Yumina API to Private Key
- Select your provider and enter your key
- Click verify to test the key
Your key is encrypted at rest (AES-256-GCM). The raw key is never returned from the server after storage -- only metadata (provider, label, masked suffix).
With BYOK, you have no context size cap and access to whatever models your provider offers. Costs go directly to your API provider instead of Yumina credits.
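The masked-suffix behavior can be pictured with a toy helper (hypothetical; the real server-side code is not public):

```python
def mask_key(raw_key, visible=4):
    # Keep only the last few characters; everything else is masked.
    # After storage, the UI shows something like this, never the raw key.
    return "*" * max(0, len(raw_key) - visible) + raw_key[-visible:]

mask_key("sk-test-12345678")  # -> "************5678"
```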
Custom prompts
An advanced feature for tuning AI behavior across all worlds. Found in Settings > AI Configuration at the bottom.
You can inject your own prompts at three positions:
- System -- into the system prompt (strongest effect)
- In-Chat -- into the middle of the chat history
- Final -- at the very end, right before the AI responds
Use this if the AI consistently misbehaves in a specific way (always forgetting a rule, always responding in the wrong language, etc.). Most players won't need this.
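As a mental model, the three positions can be sketched as points in the message list sent to the model (a hypothetical structure; Yumina's internal prompt format may differ):

```python
def build_messages(system_prompt, history, custom):
    # Sketch of where the three custom-prompt positions could land.
    # System: appended to the system prompt (strongest effect).
    messages = [{"role": "system",
                 "content": system_prompt + "\n" + custom.get("system", "")}]
    mid = len(history) // 2
    for i, msg in enumerate(history):
        messages.append(msg)
        # In-Chat: injected into the middle of the chat history.
        if custom.get("in_chat") and i == mid - 1:
            messages.append({"role": "system", "content": custom["in_chat"]})
    # Final: appended last, right before the AI responds.
    if custom.get("final"):
        messages.append({"role": "system", "content": custom["final"]})
    return messages
```

Prompts closer to the end of the list tend to weigh more on the very next response, which is why the Final position is useful for hard constraints the AI keeps dropping.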
Prompt presets
Every world's creator sets up default prompt presets. You can choose:
- Use Creator's -- use what the creator intended (recommended)
- Use My Own -- override with your own configuration
Unless you understand the prompt architecture, leave this on Creator's. Changing presets can break worlds in subtle ways.
