💻 AI Training Process

How We Trained 20 Unique Fighting AIs

Creating autonomous AI fighters that deliver exciting, unpredictable battles required a sophisticated multi-stage training pipeline combining supervised learning, reinforcement learning, and evolutionary algorithms.


🎯 Training Objectives

Our goal was to create AI fighters that:

  1. Fight intelligently - Make strategic decisions based on game state

  2. Show personality - Each fighter has a unique combat style

  3. Adapt dynamically - Learn opponent patterns during combat

  4. Create entertainment - Produce exciting, varied battles


📚 Training Pipeline Overview

┌──────────────────────────────────────────────────────────────┐
│                    STAGE 1: Data Collection                  │
│  → 50,000+ simulated fights                                  │
│  → Expert gameplay recordings                                │
│  → Combat scenario library                                   │
└──────────────────────────┬───────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                STAGE 2: Supervised Pre-training              │
│  → Train base combat network                                 │
│  → Learn fundamental mechanics                               │
│  → 10,000 epochs on labeled data                             │
└──────────────────────────┬───────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│              STAGE 3: Reinforcement Learning                 │
│  → Self-play against trained agents                          │
│  → Reward shaping for combat effectiveness                   │
│  → 100,000+ training iterations                              │
└──────────────────────────┬───────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│            STAGE 4: Personality Specialization               │
│  → Evolutionary algorithms for diversity                     │
│  → Fine-tune each fighter's behavior                         │
│  → 20 unique combat strategies                               │
└──────────────────────────┬───────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                STAGE 5: Tournament Testing                   │
│  → Round-robin evaluation                                    │
│  → Balance adjustments                                       │
│  → Performance optimization                                  │
└──────────────────────────────────────────────────────────────┘

🔬 Stage 1: Data Collection

Simulation Framework

We built a high-speed simulator capable of running 1000+ fights per hour with accelerated game logic (10x speed mode).

Data Collected (see the record sketch after this list):

  • State vectors: Position, velocity, health, orientation (every frame)

  • Action sequences: Button inputs (A/D/W/F) with timestamps

  • Outcomes: Win/loss, damage dealt, survival time

  • Strategic patterns: Distance management, attack timing, dodge success rate
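
For illustration, here is a minimal sketch of how one per-frame record of this kind might be laid out. The field names and types are assumptions for readability, not the production schema.

```python
from dataclasses import dataclass

@dataclass
class FrameRecord:
    """One per-frame sample logged by the simulator (illustrative schema)."""
    fight_id: int
    frame: int
    position: tuple[float, float]    # fighter x/y position
    velocity: tuple[float, float]    # fighter x/y velocity
    health: float                    # remaining HP, normalized to [0, 1]
    orientation: float               # facing direction in radians
    action: str                      # button input this frame: A/D/W/F or idle
    timestamp_ms: int

@dataclass
class FightOutcome:
    """Per-fight labels attached once the fight ends (illustrative)."""
    fight_id: int
    winner: int                      # 0 or 1
    damage_dealt: float
    survival_time_s: float
```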

Expert Demonstrations

  • 500+ manually played fights by skilled players

  • Labeled "optimal actions" for training scenarios

  • Edge case handling (wall collisions, simultaneous hits)

Scenario Library

Created 1000+ unique combat scenarios:

  • Close-range brawls

  • Long-range positioning

  • Low-health survival situations

  • Aggressive rushdown vs defensive play

  • Counter-attack opportunities

Total Dataset Size: 2.3TB of fight data


🧠 Stage 2: Supervised Pre-Training

Base Neural Network Architecture

Strategic Network (Transformer):
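
The exact architecture isn't reproduced here. As a rough PyTorch-style sketch, a small transformer over a short history of game-state vectors could look like the following; the 784-dimensional state size is borrowed from the tactical network notes below, and the model width, depth, head count, and number of strategy outputs are assumptions.

```python
import torch.nn as nn

class StrategicNetwork(nn.Module):
    """Transformer encoder over recent game states -> high-level strategy logits (illustrative sizes)."""
    def __init__(self, state_dim=784, d_model=128, n_heads=4, n_layers=2, n_strategies=8):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_strategies)

    def forward(self, state_history):            # (batch, seq_len, state_dim)
        x = self.embed(state_history)
        x = self.encoder(x)
        return self.head(x[:, -1])                # strategy decision from the latest timestep
```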

Tactical Network (Feedforward):
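
The layer sizes quoted in the optimization notes below (784-256-128-64-5) suggest a compact MLP that maps the current state vector to one of five actions. A sketch, assuming the five outputs correspond to A/D/W/F plus idle:

```python
import torch.nn as nn

class TacticalNetwork(nn.Module):
    """784-256-128-64-5 feedforward network: state vector in, action logits out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 5),                 # action logits: A/D/W/F + idle (assumed)
        )

    def forward(self, state):                 # (batch, 784)
        return self.net(state)
```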

Training Process

Phase 1: Imitation Learning

  • Learn from expert demonstrations

  • Supervised learning on labeled fight data

  • Goal: Achieve 80%+ action prediction accuracy

Phase 2: Behavior Cloning

  • Clone successful fighting patterns

  • Train on high-win-rate combat sequences

  • Regularization to prevent overfitting
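
A minimal sketch of one such supervised update, using a stand-in for the tactical MLP above and cross-entropy against the expert's recorded action (the learning rate, weight decay, and batch shapes are assumptions):

```python
import torch
import torch.nn as nn

# Stand-in for the 784-256-128-64-5 tactical network
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 5),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)  # weight decay as a simple regularizer

def behavior_cloning_step(states, expert_actions):
    """One supervised update: push the network toward the expert's recorded action."""
    logits = model(states)                                     # (batch, 5) action logits
    loss = nn.functional.cross_entropy(logits, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch standing in for labeled fight data
states = torch.randn(32, 784)
expert_actions = torch.randint(0, 5, (32,))
behavior_cloning_step(states, expert_actions)
```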

Results:

  • Base network achieved 85% win rate vs random agent

  • Demonstrated understanding of fundamental mechanics

  • Ready for reinforcement learning phase


🎮 Stage 3: Reinforcement Learning

Self-Play Training Loop

We used Proximal Policy Optimization (PPO) for stable training:
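
The training code itself isn't shown here. As a condensed sketch, the heart of PPO is the clipped surrogate objective, wrapped in a self-play loop that pits the current policy against frozen earlier snapshots of itself. The `collect_rollout` callback and the assumption that the policy returns a torch distribution are placeholders, not real project APIs:

```python
import random
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO's clipped surrogate objective (policy term only)."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

def self_play_iteration(policy, snapshot_pool, collect_rollout, optimizer):
    """One iteration: fight a frozen past snapshot of the policy, then apply a PPO update."""
    opponent = random.choice(snapshot_pool)            # earlier checkpoint of ourselves
    states, actions, old_log_probs, advantages = collect_rollout(policy, opponent)  # placeholder rollout + advantage step
    new_log_probs = policy(states).log_prob(actions)   # assumes policy(states) returns a distribution
    loss = ppo_clip_loss(new_log_probs, old_log_probs, advantages)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```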

Reward Function Design

The reward function is critical for learning effective combat:
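
The tuned reward terms aren't reproduced here; a plausible shaping sketch combines damage traded, a small anti-passivity term, and a large terminal win/loss bonus. All weights below are illustrative assumptions:

```python
def combat_reward(damage_dealt, damage_taken, distance_closed, fight_over, won):
    """Per-step shaped reward (illustrative weights, not the tuned values)."""
    reward = 0.0
    reward += 1.0 * damage_dealt          # reward landing hits
    reward -= 0.8 * damage_taken          # penalize getting hit, slightly less, to keep fighters aggressive
    reward += 0.01 * distance_closed      # small shaping term to discourage passive play
    if fight_over:
        reward += 10.0 if won else -10.0  # dominant terminal signal for the actual outcome
    return reward
```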

Training Infrastructure

Hardware:

  • 4x NVIDIA A100 GPUs (80GB VRAM each)

  • 128 CPU cores for parallel simulation

  • 512GB RAM for experience buffer

Training Stats:

  • Total training time: 14 days

  • Total fights simulated: 2.8 million

  • Experience buffer size: 1M transitions

  • Policy updates: 450,000 iterations

Learning Curves:

Opponent Modeling

During training, each AI learned to (see the tracker sketch after this list):

  1. Track opponent patterns: Attack frequency, movement tendencies

  2. Predict next actions: Anticipate attacks based on distance/stance

  3. Exploit weaknesses: Adapt strategy mid-fight

  4. Counter-adapt: Respond when opponent changes strategy
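
As an illustration of the first two points, here is a minimal hand-rolled tracker of opponent tendencies. In practice this kind of modeling is learned inside the network; the class below is only a sketch, and its statistics and threshold are assumptions.

```python
class OpponentTracker:
    """Exponential moving averages of simple opponent statistics (illustrative)."""
    def __init__(self, decay=0.95):
        self.decay = decay
        self.attack_rate = 0.0     # fraction of recent frames with an attack input
        self.avg_distance = 0.0    # opponent's preferred fighting distance

    def update(self, opponent_attacked: bool, distance: float):
        d = self.decay
        self.attack_rate = d * self.attack_rate + (1 - d) * float(opponent_attacked)
        self.avg_distance = d * self.avg_distance + (1 - d) * distance

    def likely_to_attack(self, threshold=0.3) -> bool:
        """Crude prediction: a high recent attack rate means an attack is likely soon."""
        return self.attack_rate > threshold
```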


🧬 Stage 4: Personality Specialization

To create 20 unique fighters, we used evolutionary algorithms to diversify behavior.

Genetic Algorithm for Diversity

Genome Encoding:
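
The genome format isn't reproduced here. As a sketch, each fighter's "DNA" can be thought of as a small set of behavioral traits; the trait names below follow the roster descriptions later on this page, while the exact fields and ranges are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FighterGenome:
    """Behavioral traits evolved per fighter (illustrative encoding)."""
    aggression: float          # 0..1, constant pressure vs. patience
    defensiveness: float       # 0..1, willingness to block, retreat, and counter
    combo_focus: float         # 0..1, preference for chained attacks
    risk_tolerance: float      # 0..1, appetite for high-risk/high-reward plays
    preferred_distance: float  # ideal spacing in arena units (assumed trait)
    reaction_delay_ms: int     # artificial reaction time for personality (assumed trait)
```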

Evolution Process

  1. Initial Population: 100 random DNA variations

  2. Fitness Evaluation: Each variant fights 50 tournaments

  3. Selection: Top 20% by entertainment value (not just win rate!)

  4. Crossover: Combine traits from successful fighters

  5. Mutation: 10% random trait variation

  6. Repeat: 20 generations

Fitness Function (maximizes entertainment):
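
The real scoring code isn't shown. As a sketch, an entertainment-weighted fitness and the selection/crossover/mutation loop from the steps above might look like this; the stat names, weights, and the `evaluate` callback (which would run the 50-tournament evaluation) are assumptions.

```python
import random

def entertainment_fitness(stats):
    """Score a variant by how entertaining its fights are, not just how often it wins (illustrative weights)."""
    return (0.3 * stats["win_rate"]
            + 0.3 * stats["action_variety"]
            + 0.2 * stats["comeback_rate"]
            + 0.2 * stats["close_fight_rate"])

def evolve(population, evaluate, generations=20, elite_frac=0.2, mutation_rate=0.1):
    """Evolution loop: evaluate, keep the top 20%, crossover, mutate 10% of traits, repeat.
    `population` is a list of trait dicts with values in [0, 1]."""
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda g: entertainment_fitness(evaluate(g)),
                        reverse=True)
        elites = scored[: int(len(scored) * elite_frac)]
        children = []
        while len(elites) + len(children) < len(population):
            a, b = random.sample(elites, 2)
            child = {k: random.choice((a[k], b[k])) for k in a}       # uniform crossover
            for k in child:
                if random.random() < mutation_rate:                    # 10% per-trait mutation
                    child[k] = min(1.0, max(0.0, child[k] + random.gauss(0, 0.1)))
            children.append(child)
        population = elites + children
    return population
```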

Final Fighter Roster

After evolution, we selected 20 fighters with distinct personalities:

Aggressive Types:

  • Morpheus (Aggression: 0.85) - Relentless pressure, high combo focus

  • Saint (Aggression: 0.78) - Calculated aggression, punish mistakes

Defensive Types:

  • GhostHash (Defensive: 0.82) - Elusive movement, counter-attack

  • TheMiner (Defensive: 0.75) - Patient, wall positioning

Balanced Types:

  • NeoNode (Balanced) - Adaptive, reads opponent

  • CipherKid (Balanced) - Technical, frame-perfect execution

Specialist Types:

  • BitSamurai (Combo: 0.91) - Chain attacks, high damage

  • DarkWallet (Risk: 0.88) - High-risk/high-reward plays


🔧 Stage 5: Fine-Tuning & Optimization

Balance Adjustments

We ran extensive tournament testing to ensure:

  1. No dominant strategy: All 20 fighters have overall win rates between 45% and 55%

  2. Rock-paper-scissors dynamics: Counter-matchups exist

  3. Skill expression: Better AI wins more consistently

Performance Optimization

Model Compression:

  • Pruned tactical network: 784-256-128-64-5 → 512-128-32-5

  • Quantization: FP32 → FP16 (50% size reduction)

  • Knowledge distillation: Compress strategic model

  • Result: 3x faster inference, 95% accuracy retained

Inference Optimization:

  • TensorRT compilation for GPU inference

  • ONNX export for cross-platform compatibility

  • Batch processing for multi-fight simulations

  • Result: 8ms decision latency (down from 45ms)
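
As a sketch of the FP16 and ONNX steps above, using standard PyTorch export APIs: the model below is only a stand-in shaped like the tactical network, and compiling the resulting .onnx file with TensorRT is a separate step not shown here.

```python
import torch
import torch.nn as nn

# Stand-in for the tactical MLP (784-256-128-64-5); not the shipped weights
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 5),
).eval()

# Export to ONNX for cross-platform inference; the .onnx file can then be
# compiled into a TensorRT engine for low-latency GPU serving.
dummy_state = torch.randn(1, 784)
torch.onnx.export(model, dummy_state, "tactical_net.onnx",
                  input_names=["state"], output_names=["action_logits"])

# FP32 -> FP16 roughly halves model size; half-precision inference is typically
# run on GPU (or handled by TensorRT while building the engine).
if torch.cuda.is_available():
    model_fp16 = model.half().cuda()
```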


📊 Training Results & Validation

Performance Metrics

Against Random AI:

  • Win rate: 98.7%

  • Average fight duration: 12.3s

  • Damage efficiency: 11.2 hits to kill

Against Each Other (Round-Robin):

  • Most balanced fighter: Symbol (49.8% win rate)

  • Most aggressive: Morpheus (52.1% win rate)

  • Most defensive: GhostHash (47.9% win rate)

  • Most entertaining: BitSamurai (highest action variety)

Validation Tests

Robustness:

  • ✅ Handles edge cases (corner traps, simultaneous hits)

  • ✅ Recovers from disadvantage (low HP comebacks)

  • ✅ Adapts to opponent changes mid-fight

Entertainment Value:

  • ✅ Average fight duration: 67 seconds (target: 45-90s)

  • ✅ Action variety score: 8.7/10

  • ✅ Comeback rate: 23% of fights decided in the final 20% of the match


🚀 Future Improvements

Planned Enhancements

  1. Meta-Learning: AI learns from past tournament results

  2. Human Feedback: Incorporate viewer preferences

  3. Seasonal Updates: Retrain with new strategies

  4. Community Champions: User-submitted AI variants

Research Directions

  • Multi-Agent Learning: Train fighters against full roster simultaneously

  • Curriculum Learning: Progressive difficulty in training scenarios

  • Hierarchical RL: More sophisticated strategy layers

  • Transfer Learning: Apply to other game genres
