Performance Architecture Guide
Status: Production Standard
Core Principle
Design for the end state from day one. Late-game performance collapse is preventable.
The Problem
Grand strategy games face compounding performance challenges:
- Early game: Few active provinces, minimal history, simple relationships. Fast.
- Late game: Many provinces, extensive history, complex relationship webs. Slow, even when PAUSED, because systems such as the UI keep touching the accumulated state every frame.
Root causes:
- Data accumulation without cleanup
- O(n²) algorithms that scale poorly
- Memory fragmentation
- Cache misses from scattered data
- UI touching entire game state every frame
Core Principles
Principle 1: Design for Scale
Wrong: "We'll optimize when it becomes a problem."
Right: "Architecture assumes the worst case from day one."
Profile at target scale regularly. Don't wait for problems.
Principle 2: Hot/Cold Data Separation
Data "temperature" = access frequency, not importance.
- Hot: Every frame/tick → Compact structs, contiguous arrays
- Warm: Occasional → Can stay in the main struct if space permits
- Cold: Rare → Separate storage, loaded on demand
Benefit: Hot data fits in cache. Cold data doesn't pollute it.
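A minimal sketch of the split in Python (chosen for brevity; the class and field names are hypothetical). Hot per-tick fields live in `array` objects, which store raw values contiguously, while cold details sit in a separate dict and are created only on first access.

```python
from array import array


class ProvinceStore:
    """Hypothetical province storage with hot and cold data kept apart."""

    def __init__(self, count: int):
        # Hot: read every tick, so kept in compact, contiguous typed arrays.
        self.owner = array("H", [0] * count)        # owning country index
        self.development = array("i", [0] * count)  # development points
        # Cold: read rarely (tooltips, history screens), stored separately.
        self._cold = {}

    def total_development(self) -> int:
        # The per-tick path touches only the hot arrays.
        return sum(self.development)

    def cold(self, province_id: int) -> dict:
        # Cold records are created on first access, i.e. loaded on demand.
        record = self._cold.get(province_id)
        if record is None:
            record = {"name": f"Province {province_id}", "notes": []}
            self._cold[province_id] = record
        return record


store = ProvinceStore(1000)
store.development[3] = 7
print(store.total_development())  # 7, without creating any cold record
```

The per-tick method never reads `_cold`, which mirrors the layout a native engine would use to keep cold records from crowding hot arrays out of cache.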
Principle 3: Fixed-Size Data Structures
Dynamic growth is the enemy.
Bad: Unbounded lists that grow forever
Good: Ring buffers with automatic compression (evicted entries are summarized, not kept in full)
Result: Bounded memory regardless of game length.
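One way to get bounded memory with automatic compression, sketched in Python under assumed names: a ring buffer allocated at full capacity up front, where the oldest entry is folded into per-type counts when it is overwritten.

```python
from collections import Counter


class CompressingRingBuffer:
    """Fixed-capacity event log: full detail for recent events, counts for evicted ones."""

    def __init__(self, capacity: int):
        self._slots = [None] * capacity  # allocated once, never grows
        self._capacity = capacity
        self._head = 0                   # index of the next write
        self._size = 0
        self.summary = Counter()         # compressed record of evicted events

    def push(self, event_type: str, detail: object) -> None:
        if self._size == self._capacity:
            # Buffer full: the slot at head holds the oldest event; compress it.
            oldest_type, _ = self._slots[self._head]
            self.summary[oldest_type] += 1
        else:
            self._size += 1
        self._slots[self._head] = (event_type, detail)
        self._head = (self._head + 1) % self._capacity

    def recent(self) -> list:
        # Events in order from oldest to newest.
        start = (self._head - self._size) % self._capacity
        return [self._slots[(start + i) % self._capacity] for i in range(self._size)]


log = CompressingRingBuffer(capacity=3)
for year in range(5):
    log.push("war", year)
print(log.recent())  # [('war', 2), ('war', 3), ('war', 4)]
print(log.summary)   # Counter({'war': 2})
```

Memory stays bounded by the capacity plus one counter per event type, no matter how long the campaign runs.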
Principle 4: Pre-Allocation (Zero Allocations During Gameplay)
Industry lesson: Contention on the malloc lock destroys parallelism.
The problem:
- The memory allocator may guard its shared state with a global lock
- Every thread that allocates must wait on that lock
- Parallel code effectively becomes sequential
The solution:
- Pre-allocate at initialization
- Clear and reuse during gameplay
- Zero allocations = zero allocator lock contention = parallel code stays parallel
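The clear-and-reuse pattern, sketched in Python with hypothetical names. Python allocates objects internally on its own, so this only shows the shape of the pattern: storage is created once at initialization, `clear()` resets a count instead of freeing, and running out of capacity is treated as a sizing bug rather than a reason to grow.

```python
class ScratchBuffer:
    """Fixed-capacity scratch storage, allocated once and reused every tick."""

    def __init__(self, capacity: int):
        self._items = [0] * capacity  # backing list is created only here
        self._count = 0

    def clear(self) -> None:
        # Reset the logical length; the backing storage is kept for reuse.
        self._count = 0

    def append(self, value: int) -> None:
        if self._count == len(self._items):
            # Growing here would allocate mid-game, so treat it as a sizing bug.
            raise OverflowError("scratch buffer undersized; raise its capacity at init")
        self._items[self._count] = value
        self._count += 1

    def __len__(self) -> int:
        return self._count

    def get(self, index: int) -> int:
        if not 0 <= index < self._count:
            raise IndexError(index)
        return self._items[index]


MAX_PROVINCES = 4096  # hypothetical worst-case bound, chosen at init
REBELS = ScratchBuffer(MAX_PROVINCES)


def collect_rebellious(unrest: list, threshold: int, out: ScratchBuffer) -> int:
    """Fill `out` with IDs of provinces above the unrest threshold."""
    out.clear()  # reuse last tick's storage instead of building a new list
    for province_id, value in enumerate(unrest):
        if value > threshold:
            out.append(province_id)
    return len(out)
```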
System-Specific Patterns
Map Rendering
Problem: Update every province mesh every frame.
Solution: GPU textures + single draw call.
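One common way to realize this, emulated on the CPU in Python purely to show the data flow (names and layout are assumptions): a province-ID texture stores one ID per pixel and never changes, and a small per-province color table is the only thing gameplay updates. In a real renderer, the lookup `color_table[id_map[pixel]]` runs in the fragment shader for the whole map in one draw call.

```python
from array import array

WIDTH = 4  # map width in pixels (tiny, for illustration)

# Province-ID "texture": written once at load time, one ID per pixel.
id_map = array("H", [0, 0, 1, 1,
                     0, 2, 2, 1])

# Per-province color table (packed 0xRRGGBB), one entry per province.
color_table = array("L", [0x808080] * 3)


def set_owner_color(province_id: int, rgb: int) -> None:
    # A gameplay change touches one table entry, not any mesh or map pixels.
    color_table[province_id] = rgb


def shade(x: int, y: int) -> int:
    # What the fragment shader does per pixel: ID lookup, then color lookup.
    return color_table[id_map[y * WIDTH + x]]


set_owner_color(2, 0xFF0000)
print(hex(shade(1, 1)))  # 0xff0000: pixel (1, 1) belongs to province 2
```

Recoloring a conquered province is one write into `color_table`; the map geometry and ID texture are never rebuilt.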
Province Selection
Problem: Physics raycast against thousands of colliders.
Solution: Texture lookup - single read, near-instant.
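A sketch of the lookup in Python, assuming the game keeps a CPU-side copy of the province-ID texture (so picking never waits on a GPU readback) and that the mouse position has already been converted to map pixel coordinates. The class name and the `0xFFFF` no-province sentinel are hypothetical.

```python
from array import array


class ProvinceIdMap:
    """CPU-side copy of the province-ID texture, used for picking."""

    NO_PROVINCE = 0xFFFF  # assumed sentinel for sea or off-map pixels

    def __init__(self, width: int, height: int, ids):
        if len(ids) != width * height:
            raise ValueError("id data does not match map dimensions")
        self.width = width
        self.height = height
        self.ids = array("H", ids)

    def province_at(self, x: int, y: int) -> int:
        # One bounds check and one array read, independent of province count.
        if not (0 <= x < self.width and 0 <= y < self.height):
            return self.NO_PROVINCE
        return self.ids[y * self.width + x]


picker = ProvinceIdMap(3, 2, [7, 7, 9,
                              8, 9, 9])
print(picker.province_at(2, 1))  # 9
```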
UI/Tooltips
Problem: Expensive calculations every frame.
Solution: Frame-coherent caching - compute once per frame, reuse for every request within that frame.
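A minimal frame-coherent cache in Python, assuming the game exposes an increasing frame counter; `FrameCache` and the tooltip function are hypothetical. Every cached value is discarded as soon as a new frame number arrives, so a value can never outlive the frame it was computed in.

```python
from typing import Callable, Dict, Hashable


class FrameCache:
    """Per-frame memo: each key is computed at most once per frame."""

    def __init__(self):
        self._frame = -1
        self._values: Dict[Hashable, object] = {}

    def get(self, frame: int, key: Hashable, compute: Callable[[], object]) -> object:
        if frame != self._frame:
            # A new frame may see new game state, so last frame's results are dropped.
            self._values.clear()
            self._frame = frame
        if key not in self._values:
            self._values[key] = compute()
        return self._values[key]


calls = 0


def expensive_tax_breakdown() -> int:
    global calls
    calls += 1
    return 42


cache = FrameCache()
for _ in range(3):  # e.g. three widgets asking for the same value in frame 10
    cache.get(10, ("tax", 5), expensive_tax_breakdown)
cache.get(11, ("tax", 5), expensive_tax_breakdown)
print(calls)  # 2: once in frame 10, once in frame 11
```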
History System
Problem: Unbounded event accumulation.
Solution: Tiered compression (recent = full detail, medium = compressed, old = summary).
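A three-tier sketch in Python, assuming events arrive in year order and carry a type string; the tier sizes and per-year granularity are illustrative, not values from this guide. Recent events keep full detail, older ones shrink to per-year counts by type, and the oldest collapse into lifetime totals, so memory is bounded by the tier sizes and the number of event types.

```python
from collections import Counter, deque


class TieredHistory:
    """Three-tier event history with bounded memory (hypothetical sketch).

    Assumes events are recorded in non-decreasing year order.
    """

    def __init__(self, recent_capacity: int = 64, medium_years: int = 50):
        self.recent = deque()  # tier 1: (year, type, detail), full detail
        self.medium = deque()  # tier 2: (year, Counter of types), oldest first
        self.old = Counter()   # tier 3: lifetime totals per type
        self.recent_capacity = recent_capacity
        self.medium_years = medium_years

    def record(self, year: int, event_type: str, detail: object) -> None:
        self.recent.append((year, event_type, detail))
        if len(self.recent) > self.recent_capacity:
            old_year, old_type, _ = self.recent.popleft()
            self._compress(old_year, old_type)

    def _compress(self, year: int, event_type: str) -> None:
        # The detail payload is dropped; only a per-year count survives.
        if self.medium and self.medium[-1][0] == year:
            self.medium[-1][1][event_type] += 1
        else:
            self.medium.append((year, Counter({event_type: 1})))
            if len(self.medium) > self.medium_years:
                _, counts = self.medium.popleft()
                self.old.update(counts)  # oldest year collapses into the summary
```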
Game State Updates
Problem: Process all provinces every tick.
Solution: Dirty flags - update only what changed.
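A dirty-flag sketch in Python with hypothetical names: one byte per province prevents double-queuing, and a pre-allocated queue records which provinces changed, so a tick visits only those instead of scanning every province.

```python
from array import array


class DirtyTracker:
    """Dirty-flag tracker: per-tick cost scales with changes, not map size.

    Assumes the update callback does not call mark() itself.
    """

    def __init__(self, count: int):
        self._flags = bytearray(count)         # 1 = province queued for this tick
        self._queue = array("i", [0] * count)  # pre-allocated list of dirty IDs
        self._queued = 0

    def mark(self, province_id: int) -> None:
        if not self._flags[province_id]:       # ignore repeat marks in one tick
            self._flags[province_id] = 1
            self._queue[self._queued] = province_id
            self._queued += 1

    def process(self, update) -> int:
        """Run `update` on each dirty province once, then reset. Returns the count."""
        processed = self._queued
        for i in range(processed):
            province_id = self._queue[i]
            update(province_id)
            self._flags[province_id] = 0
        self._queued = 0
        return processed


tracker = DirtyTracker(10_000)
tracker.mark(42)
tracker.mark(42)  # already dirty: ignored
tracker.mark(7)
updated = []
print(tracker.process(updated.append), updated)  # 2 [42, 7]
```

Because each ID is queued at most once per tick, the queue can never need more than one slot per province, so it is sized once at construction.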
Memory Layout
Structure by Access Pattern
Array of Structures (AoS): When operations need multiple fields together.
- Most simulation operations
- Works best when the whole struct fits in one cache line (typically 64 bytes)
- Default choice for grand strategy
Structure of Arrays (SoA): When iterating single field across all elements.
- Rare in practice
- Profile before splitting
- Don't optimize prematurely
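To make the two layouts concrete in Python, which has no value-type structs, this sketch packs a hypothetical 12-byte province record into one contiguous `bytearray` with the standard `struct` module (AoS), and for comparison keeps one `array` per field (SoA). The field list is invented for illustration.

```python
import struct
from array import array

# Hypothetical hot province record: owner, development, manpower, unrest.
# "<" means little-endian with no padding: 2 + 2 + 4 + 4 = 12 bytes.
PROVINCE = struct.Struct("<HHii")


class ProvincesAoS:
    def __init__(self, count: int):
        self.buf = bytearray(PROVINCE.size * count)  # records packed back to back

    def write(self, i: int, owner: int, dev: int, manpower: int, unrest: int) -> None:
        PROVINCE.pack_into(self.buf, i * PROVINCE.size, owner, dev, manpower, unrest)

    def read(self, i: int) -> tuple:
        # One contiguous 12-byte read yields every field the operation needs.
        return PROVINCE.unpack_from(self.buf, i * PROVINCE.size)


class ProvincesSoA:
    def __init__(self, count: int):
        self.owner = array("H", [0] * count)
        self.development = array("H", [0] * count)
        self.manpower = array("i", [0] * count)
        self.unrest = array("i", [0] * count)

    def total_development(self) -> int:
        # A single-field sweep reads only the development array.
        return sum(self.development)
```

Reading one AoS record returns all four fields from adjacent bytes, which matches most simulation operations; `total_development()` is the single-field sweep where SoA pays off, and it is worth switching only once a profile shows such sweeps dominate.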
The Real Enemy: Pointers
Pointers scatter data across memory → cache misses.
Bad: References in hot structures
Good: Value types only, contiguous layout
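Python happens to illustrate the distinction directly: a `list` of objects is an array of pointers to objects scattered across the heap, while an `array` stores raw values back to back. This sketch (hypothetical names) stores the province-to-country relationship as a 16-bit index into contiguous arrays instead of a reference to a country object, the same idea a native engine would express with value-type structs holding indices.

```python
from array import array


class World:
    def __init__(self, province_count: int, country_count: int):
        # Relationship stored as a 16-bit country index, not a Country reference.
        self.province_owner = array("H", [0] * province_count)
        # Country data lives in its own contiguous array, addressed by that index.
        self.country_treasury = array("q", [0] * country_count)

    def collect_taxes(self, tax_per_province: int) -> None:
        # Sequential sweep over packed owner indices; the stored data holds
        # no pointers to chase.
        for owner in self.province_owner:
            self.country_treasury[owner] += tax_per_province
```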
Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| "It works for now" | Per-frame O(n) work that is cheap early grows with late-game n | Design for scale |
| Invisible O(n²) | Hidden quadratic complexity | Pre-compute adjacencies |
| Death by a thousand cuts | Many small allocations | Pre-allocate pools |
| Allocator.Temp in hot path | Malloc lock | Persistent allocators, reuse |
| Update everything | Processing unchanged data | Dirty flags |
| Premature SoA | Splitting data used together | Profile first |
| Float in simulation | Non-deterministic across CPUs and compilers | Fixed-point math |
| Interface-typed collections | Boxing allocations | Generic wrapper pattern |
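For the fixed-point row, a minimal Python sketch assuming a scale of 1/1000; the scale and helper names are illustrative. Values are plain integers, so every machine computes bit-identical results.

```python
SCALE = 1000  # 1 unit = 1/1000; three decimal digits (assumed precision)


def from_ratio(numerator: int, denominator: int) -> int:
    """Fixed-point value of an exact ratio; from_ratio(5, 2) represents 2.5 (stored as 2500)."""
    return (numerator * SCALE) // denominator


def fx_mul(a: int, b: int) -> int:
    # a * b carries SCALE twice, so divide once to get back to one SCALE.
    return (a * b) // SCALE


def fx_div(a: int, b: int) -> int:
    # Scale the numerator first so the quotient keeps one SCALE.
    return (a * SCALE) // b


def fx_str(a: int) -> str:
    sign = "-" if a < 0 else ""
    whole, frac = divmod(abs(a), SCALE)
    return f"{sign}{whole}.{frac:03d}"


base_tax = from_ratio(5, 2)    # 2.5
modifier = from_ratio(11, 10)  # 1.1
print(fx_str(fx_mul(base_tax, modifier)))  # 2.750
```

In a fixed-width language, `a * b` can overflow before the division, so the intermediate product needs a wider type; and C-family integer division truncates toward zero where Python's `//` floors, so a port has to pick one rounding rule and use it everywhere.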
Key Trade-offs
| Decision | Benefit | Cost |
|---|---|---|
| Pre-allocation | Zero runtime allocation | Higher initial memory |
| Hot/cold split | Cache efficiency | Access complexity |
| Fixed-size buffers | Bounded memory | May need reallocation if undersized |
| Dirty flags | Process only changes | Flag management overhead |
| GPU for visuals | Massive parallelism | GPU programming complexity |
Decision Framework
Before adding a collection:
- Is this accessed every frame? → Pre-allocate
- Is this temporary? → Clear and reuse
- Is this in the hot path? → Must be persistent
- Can this grow unbounded? → Use ring buffer
Before optimizing memory layout:
- Is this core simulation state? → Keep compact
- Do operations need multiple fields? → Keep together (AoS)
- Have you profiled? → If not, don't optimize without data
- Is the complexity worth it? → Reconsider if the gains are marginal
Validation
Regular profiling at target scale:
- Frame time budget allocation
- Memory usage bounds
- Selection response time
- Zero allocations during gameplay (profiler confirmed)
Any allocation in the hot path = critical bug.
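Python cannot demonstrate literally zero allocations, since the interpreter creates temporary objects, but the standard `tracemalloc` module can check the related property that a tick leaves no memory behind. This sketch (hypothetical tick functions) averages net growth over many ticks; a native engine would instead count allocator calls inside the tick with its own profiler.

```python
import tracemalloc


def net_growth_per_tick(tick, ticks: int = 1000) -> float:
    """Average bytes a tick leaves allocated behind, measured with tracemalloc."""
    tick()  # warm-up call so one-time setup is not counted against the tick
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(ticks):
            tick()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return (after - before) / ticks


STATE = [0] * 10_000  # hypothetical pre-allocated simulation state
HISTORY = []          # deliberately unbounded, to show what the check catches


def steady_tick() -> None:
    for i in range(0, len(STATE), 100):
        STATE[i] = (STATE[i] + 1) % 1000  # overwrite in place; nothing is kept


def leaky_tick() -> None:
    HISTORY.append([0] * 8)  # grows every tick: the bug this check should flag
```

With these stand-ins, `steady_tick` should report close to zero bytes per tick, while `leaky_tick` reports roughly the size of one small list per tick.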
Summary
- Compact simulation state - cache-friendly
- GPU for visuals - single draw call, compute shaders
- Fixed-point math - deterministic
- Pre-allocation - zero gameplay allocations
- Dirty flags - update only changes
- Ring buffers - bounded growth
- Profile before optimizing - data-driven decisions
Related Patterns
- Pattern 4 (Hot/Cold Separation): Data temperature
- Pattern 10 (Frame-Coherent Caching): UI optimization
- Pattern 11 (Dirty Flags): Update minimization
- Pattern 12 (Pre-Allocation): Zero allocations
Architecture prevents late-game collapse. Design for scale from day one.