References
Primary and implementation sources used by these notes:
DeepSeek-AI. DeepSeek-V3 Technical Report. arXiv:2412.19437.
https://arxiv.org/abs/2412.19437
DeepSeek-AI. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. arXiv:2405.04434.
https://arxiv.org/abs/2405.04434
Dai et al. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models. arXiv:2401.06066.
https://arxiv.org/abs/2401.06066
DeepSeek-AI. DeepSeek-V3 official repository.
https://github.com/deepseek-ai/DeepSeek-V3
DeepSeek-AI. DeepSeek-V3 inference model implementation.
https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/model.py
DeepSeek-AI. DeepSeek-MoE official repository.
https://github.com/deepseek-ai/DeepSeek-MoE