Technical Deep Dive

How We Built an AI Dungeon Master with Claude: Architecture Deep-Dive

How Scrollbook uses Claude's 200K context window, prompt caching (90% cost savings), and tool use to run live D&D sessions inside Discord. Full technical breakdown.

Cipher Team
January 8, 2025
9 min read

Building an AI Dungeon Master isn't just about hooking up an LLM to a Discord bot. It requires careful engineering around context management, prompt caching, real-time responsiveness, and cost optimization. Here's how we did it.

Why Claude Sonnet 4.5?

When we started Cipher, we evaluated multiple AI providers:

  • OpenAI GPT-4: Excellent, but expensive for long-context campaigns
  • Google Gemini: Good context window, but less consistent personalities
  • AWS Bedrock: Convenient for deployment, but limited features
  • Anthropic Claude: 200K context, prompt caching, and tool use

We chose Claude Sonnet 4.5 for three critical features:

1. 200K Token Context Window

D&D campaigns are long. A single session can generate 10K-20K tokens. Over 10 sessions, that's 100K-200K tokens of context. Claude's massive context window means we can include:

  • Entire campaign history
  • All NPC interactions and personalities
  • Character backstories and progression
  • World state and lore
  • House rules and homebrew content

All without truncating or summarizing. The AI genuinely remembers everything.

2. Prompt Caching (90% Cost Savings)

This feature is a game-changer. Here's how it works:

Traditional AI calls:

text
Every request = Full context + new prompt
Cost = $3 per million input tokens

With prompt caching:

text
First request = Full context + new prompt (cached)
Subsequent requests = Cache reference + new prompt
Cost = $0.30 per million cached tokens (10x cheaper!)

For a 50K token campaign context, caching saves us roughly $0.135 per request ($0.15 uncached vs. $0.015 cached). Multiply that across every interaction in thousands of sessions, and it's the difference between sustainable pricing and bankruptcy.
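In the Anthropic Messages API, caching is opt-in: you mark the end of the stable prefix with a `cache_control` block, and any later request whose prefix is byte-identical reads from the cache. A minimal sketch of building such a request payload (the model name and context strings here are placeholders, not our production values):

```python
def build_cached_request(campaign_context: str, player_message: str) -> dict:
    """Build a Messages API payload with the large, stable campaign context
    marked for prompt caching via a cache_control breakpoint."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": campaign_context,
                # Everything up to and including this block is cached;
                # later requests with the same prefix hit the cache.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": player_message}],
    }

payload = build_cached_request("CAMPAIGN: ...", "I search the room.")
```

Only the uncached user turn changes between requests, which is what makes the cached prefix reusable.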

3. Tool Use (Function Calling)

Claude can invoke functions to:

  • Roll dice and calculate modifiers
  • Look up spells and monster stat blocks
  • Update character sheet values
  • Track initiative and combat state
  • Query campaign database

This hybrid approach (AI + deterministic tools) gives us the best of both worlds: creative storytelling with mechanical accuracy.

Architecture Overview

Here's our high-level architecture:

text
┌─────────────┐
│   Discord   │
│   Players   │
└──────┬──────┘
       │
       ↓
┌──────────────────────┐
│  Discord Bot         │
│  (Python/discord.py) │
└──────┬───────────────┘
       │
       ↓
┌────────────────────────────────┐
│  Cipher Context Service        │
│  - Assembles campaign context  │
│  - Manages prompt caching      │
│  - Handles tool calls          │
└──────┬─────────────────────────┘
       │
       ↓
┌──────────────────┐      ┌──────────────┐
│  Claude API      │◄────►│  PostgreSQL  │
│  (Sonnet 4.5)    │      │  + pgvector  │
└──────────────────┘      └──────────────┘
       │
       ↓
┌────────────────────────┐
│  Response Handler      │
│  - Formats output      │
│  - Updates game state  │
│  - Logs interactions   │
└────────────────────────┘

Context Building: The Heart of Cipher

The hardest problem isn't calling the API; it's deciding what context to send. Here's our approach:

Layer 1: System Prompt (Cached)

python
system_prompt = f"""
You are an expert Dungeon Master for D&D 5e.

CAMPAIGN: {campaign.name}
ERA: {campaign.era}
TONE: {campaign.tone}

HOUSE RULES:
{campaign.house_rules}

Your responsibilities:
- Narrate scenes with vivid descriptions
- Voice NPCs with distinct personalities
- Apply D&D 5e rules accurately
- Track combat state and initiative
- Adapt to player choices
- Use tools for mechanical tasks
"""

Caching: This changes rarely (only on campaign settings updates), so it stays cached for days.

Layer 2: World State (Cached)

python
world_context = f"""
LOCATIONS:
{serialize_locations(campaign.locations)}

NPCS:
{serialize_npcs(campaign.npcs)}

FACTIONS:
{serialize_factions(campaign.factions)}

ACTIVE QUESTS:
{serialize_quests(campaign.quests)}
"""

Caching: Updated between sessions, cached during sessions.

Layer 3: Character Sheets (Partially Cached)

python
party_context = f"""
PARTY COMPOSITION:
{serialize_characters(session.characters)}
"""

Caching: Character basics cached, current HP/resources updated each turn.

Layer 4: Session History (Cached)

python
history_context = get_recent_messages(
    session_id=session.id,
    limit=50,  # Last 50 messages
    include_summaries=True  # Summaries of older sessions
)

Caching: Recent messages cached, only latest player input is new.

Layer 5: Current Turn (Not Cached)

python
current_input = f"""
CURRENT SITUATION:
{combat_state if in_combat else world_state}

PLAYER ACTION:
{player_message}
"""

Not cached: This is the new content that changes every request.
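Stacking the layers, the stable content goes first and each cached layer ends in a breakpoint, with Layer 5 arriving as the only uncached user content. A simplified sketch of the assembly (the API currently allows up to four cache breakpoints; this is illustrative, not our exact assembly code):

```python
def assemble_context(layers: list[str], current_input: str) -> dict:
    """Order layers 1-4 (system prompt, world state, party, history) from
    most to least stable, each with its own cache breakpoint, and put the
    per-turn input in the user message so it is the only uncached content."""
    system_blocks = [
        {"type": "text", "text": layer, "cache_control": {"type": "ephemeral"}}
        for layer in layers
    ]
    return {
        "system": system_blocks,
        "messages": [{"role": "user", "content": current_input}],
    }
```

Because the cache matches on prefixes, a change to Layer 3 (character sheets) invalidates only Layers 3-4; the system prompt and world state stay cached.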

Total Context Size

Typical breakdown:

  • System prompt: 2K tokens (cached)
  • World state: 10K tokens (cached)
  • Character sheets: 5K tokens (cached)
  • Session history: 30K tokens (cached)
  • Current turn: 1K tokens (new)

Result: 48K cached tokens + 1K new tokens = 49K total

Cost per request:

  • Without caching: $0.147
  • With caching: $0.0174
  • Savings: 88%
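The arithmetic behind those numbers, assuming Sonnet's $3 per million input tokens and $0.30 per million cached-read tokens (cache-write premiums omitted for simplicity):

```python
INPUT_PRICE = 3.00 / 1_000_000    # $/token, regular input
CACHED_PRICE = 0.30 / 1_000_000   # $/token, cache read

cached_tokens, new_tokens = 48_000, 1_000

# Without caching, every token is billed at the full input rate
without_caching = (cached_tokens + new_tokens) * INPUT_PRICE

# With caching, only the 1K new tokens pay full price
with_caching = cached_tokens * CACHED_PRICE + new_tokens * INPUT_PRICE

savings = 1 - with_caching / without_caching
print(f"${without_caching:.3f}  ${with_caching:.4f}  {savings:.0%}")
# → $0.147  $0.0174  88%
```

Note the blended savings (88%) is slightly below the per-token discount (90%) because the 1K new tokens are always billed at full price.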

Real-Time Responsiveness

Discord users expect fast responses. Our optimization strategy:

1. Streaming Responses

We use Claude's streaming API to start sending responses before completion:

python
buffer = ""
async with claude_client.messages.stream(...) as stream:
    async for text in stream.text_stream:
        buffer += text
        await discord_message.edit(content=buffer)  # edit in place, not one message per chunk

Players see Cipher "thinking" in real-time, just like reading a DM's narration.

2. Parallel Processing

For multi-part responses (narration + dice rolls + state updates), we parallelize:

python
async def process_turn(player_action):
    # Run these concurrently
    narration, dice_results, state_updates = await asyncio.gather(
        claude_generate_narration(player_action),
        roll_dice_tools(player_action),
        update_game_state(player_action)
    )

    return combine_responses(narration, dice_results, state_updates)

3. Prefetch Context

When a session starts, we build and cache the context BEFORE players begin:

python
@bot.event
async def on_session_start(session_id):
    # Warm up the cache
    await build_session_context(session_id)
    # This takes 2-3 seconds, but happens BEFORE gameplay

Result: First player action responds in ~2 seconds instead of ~5 seconds.

Tool Use: Hybrid AI + Deterministic Functions

Claude can call tools, but we carefully designed which operations stay deterministic:

AI Handles:

  • Creative narration
  • NPC dialogue and reactions
  • Plot adaptation
  • Rule interpretation (ambiguous cases)

Tools Handle:

  • Dice rolling (RNG must be provably fair)
  • Stat calculations (must be mathematically correct)
  • Database queries (direct DB access faster than AI)
  • Combat math (accuracy critical)

Example tool definition:

python
tools = [
    {
        "name": "roll_dice",
        "description": "Roll dice using standard notation",
        "input_schema": {
            "type": "object",
            "properties": {
                "notation": {"type": "string", "description": "e.g., '2d20+5'"},
                "advantage": {"type": "boolean", "default": False},
                "disadvantage": {"type": "boolean", "default": False}
            },
            "required": ["notation"]
        }
    },
    # ... more tools
]

When Claude says:

"You swing your sword at the goblin. Let me roll your attack..."

Claude invokes:

python
tool_use = {
    "tool": "roll_dice",
    "input": {
        "notation": "1d20+5",
        "advantage": False
    }
}

We execute the tool, return results, and Claude continues:

"...you rolled a 17! That hits the goblin's AC."
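The executor behind that tool can be a small notation parser plus an RNG. A sketch of one way to implement it (the advantage handling, rolling the full set twice and keeping the better total, is one reasonable interpretation, not necessarily our exact rule):

```python
import random
import re

def roll_dice(notation: str, advantage: bool = False,
              disadvantage: bool = False, rng=random) -> dict:
    """Deterministic executor for the `roll_dice` tool schema.
    Parses 'NdM', 'NdM+K', or 'NdM-K' notation."""
    m = re.fullmatch(r"(\d+)d(\d+)([+-]\d+)?", notation.replace(" ", ""))
    if not m:
        raise ValueError(f"Bad dice notation: {notation!r}")
    count, sides = int(m.group(1)), int(m.group(2))
    modifier = int(m.group(3) or 0)

    def one_roll():
        return [rng.randint(1, sides) for _ in range(count)]

    rolls = one_roll()
    if advantage or disadvantage:
        second = one_roll()
        pick = max if advantage else min
        rolls = pick(rolls, second, key=sum)  # keep the better/worse set

    return {"rolls": rolls, "modifier": modifier, "total": sum(rolls) + modifier}
```

Keeping this in plain Python (rather than asking Claude to "imagine" a roll) is what makes the results provably fair and auditable.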

Cost Optimization Strategies

Running AI for every session requires careful cost management:

1. Prompt Caching (88% savings)

Already covered above.

2. Context Pruning

We summarize old sessions instead of including full transcripts:

python
if session_age > 5:  # measured in sessions
    # Replace full transcript with AI-generated summary
    context = get_session_summary(session_id)
else:
    # Include full conversation
    context = get_full_history(session_id)

3. Batching Operations

Instead of calling Claude for every dice roll narration, we batch:

python
# Bad: 5 API calls
for attack in attacks:
    narrate(attack)

# Good: 1 API call
narrate_all(attacks)
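Concretely, batching can be as simple as folding a whole combat round into one prompt. An illustrative sketch (the field names are assumptions, not our actual data model):

```python
def build_batched_narration_prompt(attacks: list[dict]) -> str:
    """Fold a full combat round into a single prompt so Claude narrates
    every attack in one API call instead of one call per attack."""
    lines = ["Narrate this combat round as one continuous scene:"]
    for i, a in enumerate(attacks, 1):
        outcome = "hits" if a["hit"] else "misses"
        lines.append(f"{i}. {a['attacker']} {outcome} {a['target']} "
                     f"for {a.get('damage', 0)} damage.")
    return "\n".join(lines)
```

Besides cutting the call count five-fold, one batched call also produces more coherent narration, since the model sees the whole round at once.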

4. Smart Context Invalidation

We only regenerate context when necessary:

python
@cache(ttl=300)  # async TTL-cache decorator (e.g., aiocache); functools.cache has no TTL
async def get_campaign_context(campaign_id):
    # Expensive DB queries
    return build_context(campaign_id)

During combat (many rapid turns), context stays cached.

5. Usage Tracking

We track AI usage down to the token level and expose it to users:

python
async def log_ai_usage(session_id, tokens_used, cost):
    usage = AIUsage(
        session_id=session_id,
        tokens_input=tokens_used['input'],
        tokens_output=tokens_used['output'],
        tokens_cached=tokens_used['cached'],
        cost_usd=cost,
        timestamp=datetime.now(timezone.utc)  # utcnow() is deprecated
    )
    await db.save(usage)

Users can see exactly how many "AI hours" they've used and when.

Challenges We Solved

Challenge #1: Context Ordering

Order matters! We learned (through trial and error) that:

python
# Bad: AI focuses too much on recent history
[system_prompt, history, world_state, current_input]

# Good: AI balances all context
[system_prompt, world_state, history, current_input]

Putting world state before history helps Claude remember NPC names and locations.

Challenge #2: Combat Latency

Combat requires many rapid-fire calls. Solution: pre-generate common responses:

python
# Pre-cache common combat narrations
combat_templates = {
    "hit_narration": [...],
    "miss_narration": [...],
    "critical_hit": [...]
}

For repetitive actions (basic attack), we template instead of regenerating.

Challenge #3: Hallucinated Rules

Claude occasionally invents D&D rules. Solution: grounding with tool use:

python
async def verify_rule(rule_claim):
    # Check against SRD database
    srd_rule = await db.query_srd(rule_claim)

    if srd_rule is None:
        # No matching rule in the SRD - Claude likely hallucinated it
        await claude_send_correction(rule_claim)

Challenge #4: Character Voice Consistency

NPCs should sound the same across sessions. Solution: detailed NPC profiles:

python
npc_context = f"""
NPC: Mayor Elara Brightwood
Voice: Warm, maternal, speaks in complete sentences
Quirks: Ends statements with "wouldn't you say?"
Mood: Currently worried about bandit attacks
History with party: Grateful for rescue of village
"""

What's Next?

We're constantly improving the AI integration:

Q1 2025

  • Voice channel integration (speech-to-text + text-to-speech)
  • Multi-lingual support (non-English campaigns)
  • Image generation (character portraits, maps)

Q2 2025

  • Emotion analysis (detect player frustration/excitement)
  • Dynamic difficulty (AI adjusts encounter CR based on engagement)
  • Cross-campaign learning (AI learns DM preferences over time)

Try It Yourself

Want to see Claude in action? Start a free campaign and run a test session. The first 3 AI hours are free.

Start Free Campaign →

Interested in the technical details? Join our Discord where we discuss architecture, AI techniques, and infrastructure.


Technical FAQs

Q: What's your average Claude API latency? A: ~2 seconds for streaming first token, ~5-8 seconds for full response (depending on context size).

Q: How do you handle Claude API outages? A: We have AWS Bedrock as a fallback (though it lacks prompt caching and tool use).

Q: What's your average cost per 2-hour session? A: ~$0.20-0.40 depending on combat frequency (we charge $2.50-3.00, so ~85-90% margins).

Q: Do you fine-tune Claude? A: No, we use prompt engineering + retrieval augmented generation (RAG) for customization.

Q: How do you prevent prompt injection? A: We sanitize user input, use strict tool schemas, and have content moderation filters.

Q: Is the code open source? A: Selected components are open source on GitHub. Core IP (context building, caching strategy) is proprietary.

Tags: technical, ai, claude, engineering, architecture, ai dungeon master

About the Author

Cipher Team

Part of the Cipher team building AI-powered tools for D&D campaign management.

Ready to Try Cipher?

Start with 3 free AI hours - no credit card required

Start Free Campaign