December 13, 2024:
Microsoft's Phi-4 Model Excels with Synthetic Training - Microsoft introduced Phi-4, a compact language model focused on math problem-solving and trained primarily on synthetic data. It features an updated tokenizer and attention mechanism and handles a context of up to 4,000 tokens. Phi-4's training included 50 synthetic datasets totaling about 400 billion tokens, derived from adapted web content and code snippets.
This synthetic-data-driven training approach enabled Phi-4 to outperform larger models such as GPT-4o and Llama 3.3 on benchmarks including GPQA and MATH, highlighting the role of synthetic data in improving reasoning ability. Phi-4 is available through Azure AI Foundry, with a release on Hugging Face planned.