Performance Architecture Guide

Status: Production Standard

Core Principle

Design for the end state from day one. Late-game performance collapse is preventable.

The Problem

Grand strategy games face compounding performance challenges:

Early game: Few active provinces, minimal history, simple relationships. Fast.

Late game: Many provinces, extensive history, complex webs. Slow - even when PAUSED.

Root causes:

Data accumulation without cleanup
O(n²) algorithms that scale poorly
Memory fragmentation
Cache misses from scattered data
UI touching entire game state every frame

Core Principles

Principle 1: Design for Scale

Wrong: "We'll optimize when it becomes a problem" Right: "Architecture assumes worst-case from day one"

Profile at target scale regularly. Don't wait for problems.

Principle 2: Hot/Cold Data Separation

Data "temperature" = access frequency, not importance.

Hot: Every frame/tick → Compact structs, contiguous arrays Warm: Occasional → Can stay in main struct if space permits Cold: Rare → Separate storage, loaded on-demand

Benefit: Hot data fits in cache. Cold data doesn't pollute it.

Principle 3: Fixed-Size Data Structures

Dynamic growth is the enemy.

Bad: Unbounded lists that grow forever Good: Ring buffers with automatic compression

Result: Bounded memory regardless of game length.

Principle 4: Pre-Allocation (Zero Allocations During Gameplay)

Industry lesson: Malloc lock destroys parallelism.

The problem:

Memory allocator uses global lock
All threads wait for allocation
Parallel code becomes sequential

The solution:

Pre-allocate at initialization
Clear and reuse during gameplay
Zero allocations = zero lock contention = full parallelism

System-Specific Patterns

Map Rendering

Problem: Update every province mesh every frame. Solution: GPU textures + single draw call.

Province Selection

Problem: Physics raycast against thousands of colliders. Solution: Texture lookup - single read, near-instant.

UI/Tooltips

Problem: Expensive calculations every frame. Solution: Frame-coherent caching - compute once, reuse within frame.

History System

Problem: Unbounded event accumulation. Solution: Tiered compression (recent=full, medium=compressed, old=summary).

Game State Updates

Problem: Process all provinces every tick. Solution: Dirty flags - update only what changed.

Memory Layout

Structure by Access Pattern

Array of Structures (AoS): When operations need multiple fields together.

Most simulation operations
Cache line fits entire struct
Default choice for grand strategy

Structure of Arrays (SoA): When iterating single field across all elements.

Rare in practice
Profile before splitting
Don't optimize prematurely

The Real Enemy: Pointers

Pointers scatter data across memory → cache misses.

Bad: References in hot structures Good: Value types only, contiguous layout

Anti-Patterns

Anti-Pattern	Problem	Solution
"It works for now"	O(n) scales poorly	Design for scale
Invisible O(n²)	Hidden quadratic complexity	Pre-compute adjacencies
Death by thousand cuts	Many small allocations	Pre-allocate pools
Allocator.Temp in hot path	Malloc lock	Persistent allocators, reuse
Update everything	Processing unchanged data	Dirty flags
Premature SoA	Splitting data used together	Profile first
Float in simulation	Non-deterministic	Fixed-point math
Interface-typed collections	Boxing allocations	Generic wrapper pattern

Key Trade-offs

Decision	Benefit	Cost
Pre-allocation	Zero runtime allocation	Higher initial memory
Hot/cold split	Cache efficiency	Access complexity
Fixed-size buffers	Bounded memory	May need reallocation if undersized
Dirty flags	Process only changes	Flag management overhead
GPU for visuals	Massive parallelism	GPU programming complexity

Decision Framework

Before adding a collection:

Is this accessed every frame? → Pre-allocate
Is this temporary? → Clear and reuse
Is this in hot path? → Must be persistent
Can this grow unbounded? → Use ring buffer

Before optimizing memory layout:

Is this core simulation state? → Keep compact
Do operations need multiple fields? → Keep together (AoS)
Have you profiled? → Don't optimize without data
Is complexity worth it? → Reconsider if marginal gains

Validation

Regular profiling at target scale:

Frame time budget allocation
Memory usage bounds
Selection response time
Zero allocations during gameplay (profiler confirmed)

Any allocation in hot path = critical bug.

Summary

Compact simulation state - cache-friendly
GPU for visuals - single draw call, compute shaders
Fixed-point math - deterministic
Pre-allocation - zero gameplay allocations
Dirty flags - update only changes
Ring buffers - bounded growth
Profile before optimizing - data-driven decisions

Pattern 4 (Hot/Cold Separation): Data temperature
Pattern 10 (Frame-Coherent Caching): UI optimization
Pattern 11 (Dirty Flags): Update minimization
Pattern 12 (Pre-Allocation): Zero allocations

Architecture prevents late-game collapse. Design for scale from day one.

Table of Contents