December 26, 2024:
DeepSeek Unveils 671B Parameter AI Model - DeepSeek has open-sourced DeepSeek-V3, a new AI model with 671 billion parameters built on a mixture-of-experts (MoE) architecture. This design reduces hardware demands by activating only about 37 billion parameters per token, routing each token to a small subset of the model's experts rather than running the full network. DeepSeek reports that V3 outperforms other open-source models and is competitive with leading closed models, achieving top scores on coding, math, and text-processing benchmarks.
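To make the MoE idea concrete, here is a minimal sketch in PyTorch of top-k expert routing: a router scores all experts, but only the few selected per token are evaluated, so most parameters stay idle on any given input. The layer sizes, expert count, and choice of k=2 are illustrative assumptions, not DeepSeek-V3's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer with top-k routing (illustrative only)."""

    def __init__(self, dim: int, num_experts: int, k: int):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Every expert is scored, but only the top-k
        # experts per token actually run, which is the source of the savings.
        scores = self.gate(x)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out

moe = TopKMoE(dim=64, num_experts=8, k=2)  # 2 of 8 experts active per token
tokens = torch.randn(5, 64)
print(moe(tokens).shape)                   # torch.Size([5, 64])
```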
The model incorporates innovations such as multi-head latent attention (MLA) and multi-token prediction for improved performance. MoE training is prone to unbalanced expert routing, a known failure mode that degrades quality; DeepSeek addresses it with an auxiliary-loss-free load-balancing strategy to keep output quality consistent. The model's weights are available now on Hugging Face.
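Since the weights are distributed through Hugging Face, the standard transformers loading pattern applies. The sketch below shows the API shape, assuming the published repo id `deepseek-ai/DeepSeek-V3`; note that a 671B-parameter model far exceeds single-GPU memory, so this is not a practical single-machine recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as published by DeepSeek on Hugging Face; trust_remote_code is
# assumed to be required because the architecture ships custom modeling code.
model_id = "deepseek-ai/DeepSeek-V3"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # shard across available devices
)

inputs = tokenizer("Explain mixture-of-experts in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```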