December 26, 2024:
DeepSeek-V3 Outshines Llama and Qwen Models - Chinese AI startup DeepSeek has unveiled DeepSeek-V3, a 671B-parameter open-source model that is reported to outperform Llama-3.1 and Qwen on a range of benchmarks. It uses a mixture-of-experts architecture, so only a fraction of the parameters (about 37B) is activated for each token, which keeps compute per token well below that of a dense model of the same size. Two further innovations, auxiliary-loss-free load balancing and multi-token prediction, improve performance, and the model posts top scores on Chinese-language and math-centric benchmarks.
Trained for a reported $5.57 million, DeepSeek-V3 is competitive with closed-source models, a sign that the gap between open and closed-source AI is narrowing. The model is available on GitHub and through an API, giving enterprises a competitive AI tool.
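
For readers unfamiliar with mixture-of-experts, the idea of activating only some parameters per token can be illustrated with a minimal sketch. This is a generic top-k router written for illustration, not DeepSeek's actual implementation; the names `top_k_route` and `moe_forward`, the toy experts, and the scores are all hypothetical, and the model's own gating and load-balancing details are not reproduced here.

```python
import math

def top_k_route(scores, k):
    """Pick the k experts with the highest router scores and
    return their indices with softmax-normalized gate weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)          # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def moe_forward(x, experts, scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    chosen, gates = top_k_route(scores, k)
    output = sum(g * experts[i](x) for i, g in zip(chosen, gates))
    return output, chosen

# Toy setup (assumed for illustration): 8 experts, expert i scales its input by i + 1.
experts = [lambda x, w=i + 1: w * x for i in range(8)]
scores = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.2]   # made-up router scores for one token

output, chosen = moe_forward(3.0, experts, scores, k=2)
print(chosen, output)  # only 2 of the 8 experts were evaluated
```

In this toy case only two of eight experts run for the token, which is the same principle that lets a very large mixture-of-experts model use a small share of its parameters at each step.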