18. Mixture-of-Experts
Bryan Tegomoh, 2025-04-13
Mixture-of-Experts: The Basics
- Chapter 18 from GenAI Handbook
- Goal: Learn MoE for efficient LLMs
- Key idea: Activate only a subset of parameters for each token
- Builds on: Scaling laws, pretraining
How MoE Works
- Unlike dense models (e.g., Llama 3), MoE models are sparsely activated
- Experts: Specialized feed-forward sub-networks within each layer
- Router: Picks the top few experts for each token
- Result: Large total parameter count, but only a fraction of it runs per token (see the sketch below)
(Figure: Mixture of Experts)
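A minimal sketch of how the router and experts fit together, written in PyTorch. The class name, layer sizes, and the choice of 8 experts with top-2 routing are illustrative assumptions, not the exact design of Mixtral or any other specific model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative sparsely gated MoE feed-forward layer (hypothetical, not a specific model)."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Experts: independent feed-forward sub-networks inside the layer.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        # Router: scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                                # x: (batch, seq, d_model)
        tokens = x.reshape(-1, x.shape[-1])              # one row per token
        scores = self.router(tokens)                     # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize the kept scores
        out = torch.zeros_like(tokens)
        # Sparse activation: each expert runs only on the tokens routed to it.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape(x.shape)

# Usage: a "big" layer (8 expert FFNs), but each token only touches 2 of them.
layer = MoELayer()
y = layer(torch.randn(2, 10, 64))                        # output shape matches the input
```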
Why It Matters
- Efficiency: Much less compute per token than a dense model with the same total parameters (rough numbers below)
- Scale: Grow total “knowledge” (parameters) without a matching increase in per-token compute
- Examples: Mixtral (8x7B, 8x22B); GPT-4 is widely rumored to use MoE
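A rough back-of-envelope check of the efficiency point, using the publicly reported figures for Mixtral 8x7B (about 46.7B total parameters, about 12.9B active per token with 2 of 8 experts); treat the numbers as approximate.

```python
# Approximate public figures for Mixtral 8x7B; exact values depend on the config.
total_params  = 46.7e9   # all 8 experts plus the shared attention/embedding weights
active_params = 12.9e9   # the 2 routed experts plus the shared weights, per token
print(f"parameters touched per token: {active_params / total_params:.0%}")
# Roughly 28%: dense-model-sized knowledge, a fraction of the per-token compute.
```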