Sparse Data Structures

Status: Production Standard

Core Problem

Dense arrays scale with POSSIBLE items, not ACTUAL items.

When mods multiply item types (30 buildings → 500 buildings), dense arrays waste memory and iteration time checking empty slots.

Industry example: HOI4's equipment system became 16x slower with popular mods because every province checked all 500 equipment slots, even when most were empty.

Core Principle

Store only what exists. Iterate only over actual items.

Use sparse structures when:

Base game has N items, mods can add 5-10x more
Most entities don't have most items
Memory should scale with usage, not possibility

Use dense structures when:

Data is ALWAYS present (owner, terrain)
Fixed at engine level, mods don't multiply
Every entity has every field

Sparse vs Dense Decision

Data Type	Most Entities Have	Mod Multiplier	Pattern
Owner/Terrain	ALL	N/A	Dense (always present)
Buildings	0-5 of 500 types	5-10x	Sparse
Modifiers	0-3 of 1000 types	10-20x	Sparse
Trade Goods	1-2 of 100 types	2-3x	Sparse
Development	ALL	N/A	Dense (always present)

Rule: If mods multiply count AND most entities don't have most items → Sparse

Architecture: Three Layers

Layer 1: Definitions (Static)

Type definitions loaded at startup, never modified during gameplay.

Loaded from data files
Each definition gets unique numeric ID
Mods add definitions, never remove

Layer 2: Instance Storage (Sparse)

Which entities have which items. Pre-allocated at initialization.

MultiHashMap pattern: entity ID → item IDs
Memory scales with actual items, not possible items
Zero allocations during gameplay

Layer 3: Query API

Fast lookups and iteration over sparse data.

Existence check: "Does province X have building Y?"
Get all: "What buildings does province X have?"
Filter: "Which provinces have building Y?"

Storage Pattern: MultiHashMap

Why MultiHashMap:

One key (entity ID) → multiple values (item IDs)
Burst-compatible native container
Fast iteration over actual items only
Memory scales with usage

Access patterns:

O(m) existence check where m = items per entity (typically <10)
O(m) iteration over entity's items
O(n) filter queries (cacheable per frame)

Memory Comparison

Dense approach (naive):

10k provinces × 500 building types × 1 byte = 5 MB
Iteration: 5 million checks, 99% wasted

Sparse approach:

10k provinces × 5 avg buildings × 4 bytes = 200 KB
Iteration: Only actual buildings

Savings: 96% memory reduction, iteration proportional to content

Pre-Allocation Strategy

At initialization:

Estimate capacity: entities × expected items × safety margin
Pre-allocate MultiHashMap with fixed capacity
Warn if approaching 80% capacity
Zero allocations during gameplay

Gameplay adds/removes entries without allocation (reuses pre-allocated capacity).

Mod Compatibility

Dynamic ID Assignment

Base game loads definitions first (IDs 1-30)
Mods load after (IDs 31-80, 81-150, etc.)
Sequential assignment during loading

Save Compatibility

Save files store string ID + data
Load looks up definition by string ID
Graceful degradation if mod missing

Key Trade-offs

Aspect	Dense	Sparse
Memory	Scales with capacity	Scales with usage
Lookup	O(1) array index	O(m) hash lookup
Iteration	Checks all slots	Only actual items
Mod scaling	Gets worse	Unchanged
Complexity	Simple	Hash table overhead

When NOT to Use Sparse

Data every entity has (owner, terrain, development)
Fixed types that mods don't multiply
Performance-critical lookups needing O(1) access
Simple presence/absence (use bitfield instead)

Anti-Patterns

Dense arrays for sparse data: Wastes memory, iterates empty slots, scales poorly with mods.

Dictionary per entity: Allocations everywhere, not Burst compatible.

Linear search through flat list: O(n) for existence check.

Success Criteria

Mods don't degrade performance - 30→500 types: no slowdown
Memory scales with content - Not with possible combinations
Zero gameplay allocations - All storage pre-allocated
Simple query API - Has/Get/Filter cover all cases

Pattern 8 (Sparse Collections): Core pattern
Pattern 12 (Pre-Allocation): Zero-allocation requirement
Pattern 10 (Frame-Coherent Caching): Cache expensive filter queries

Store what exists. Iterate what's present. Scale with content, not possibility.

Table of Contents