Microsoft unveils Phi-3, the next generation of its small language models

Large language models (LLMs) have impressive capabilities across a variety of areas, but small language models (SLMs) are an attractive alternative for businesses, which can use them cost-effectively for specific tasks. Microsoft, which released the Phi-1 SLM in June 2023, introduced the Phi-3 family of open models on April 23. The smallest of them, Phi-3-mini, has 3.8 billion parameters and is already available; thanks to its small size, it can be deployed locally on a phone or computer.

Microsoft presents the Phi-3 models as "the most capable and cost-effective small language models available."

Phi-3-mini is a dense decoder-only Transformer model, tuned with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to align it with human preferences and safety guidelines. It is available on Azure AI Studio, Hugging Face and Ollama.

It was trained for seven days on 512 NVIDIA H100 Tensor Core GPUs. NVIDIA also announced that the model can be tried at ai.nvidia.com, where it is packaged as an NVIDIA NIM, "a microservice with a standard application programming interface that can be deployed anywhere".
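NIM microservices generally expose an OpenAI-compatible chat completions API, so calling a deployed Phi-3-mini amounts to posting a standard JSON payload. The sketch below illustrates the idea; the endpoint path, model identifier and parameter values are assumptions for illustration, not confirmed values from the article.

```python
# Hedged sketch: building and sending an OpenAI-style chat completions
# request to a NIM-like endpoint. Model name and base URL are
# illustrative assumptions, not values confirmed by the source.
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "microsoft/phi-3-mini") -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,
    }


def send_request(payload: dict, base_url: str, api_key: str) -> dict:
    """POST the payload to the endpoint (requires network access)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("Summarize Phi-3 in one sentence.")
```

Because the interface mirrors the common chat completions schema, the same payload could target a local Ollama server or a cloud deployment by changing only `base_url`.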

In their technical report, the researchers explain that "the innovation lies entirely in our training dataset, an enhanced version of the one used for Phi-2, consisting of heavily filtered web data and synthetic data."

The model, trained on 3.3 trillion tokens, was also aligned for robustness, safety and chat format. Its context window, which ranges from 4,000 to 128,000 tokens, allows it to ingest and analyze long text content (documents, web pages, code, etc.). According to Microsoft, Phi-3-mini exhibits strong reasoning and logical abilities, making it a good candidate for analytical tasks.
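The chat-format alignment means prompts are expected to follow the model's special-token template. A minimal sketch of that format is below, assuming the `<|user|>`, `<|assistant|>` and `<|end|>` tokens documented on the Phi-3-mini model card; in practice, the tokenizer's `apply_chat_template` method applies the authoritative template.

```python
# Hedged sketch of the Phi-3-mini chat prompt format, assuming the
# special tokens from the model card; prefer
# tokenizer.apply_chat_template for the authoritative version.

def format_phi3_prompt(messages: list[dict]) -> str:
    """Flatten role/content messages into a Phi-3-style prompt string."""
    parts = []
    for msg in messages:
        # Each turn is tagged with its role and closed with <|end|>.
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    # A trailing assistant tag cues the model to generate its reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_phi3_prompt([
    {"role": "user", "content": "Explain what a context window is."},
])
```

The resulting string can then be tokenized and fed to the model for completion; the 4K or 128K window bounds how much of such a prompt the model can attend to at once.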

Consistent performance despite small size

Microsoft shared on its blog the performance of Phi-3-mini, as well as that of Phi-3-small (7B) and Phi-3-medium (14B), both trained on 4.8 trillion tokens and soon to be available.

The performance of the Phi-3 models was compared with that of Phi-2, Mistral-7B, Gemma-7B, Llama-3-Instruct-8B, Mixtral-8x7B, GPT-3.5 Turbo and Claude 3 Sonnet. All reported figures were produced with the same evaluation pipeline, so they are directly comparable.

Phi-3-mini outperforms Gemma-7B and Mistral-7B on some benchmarks such as MMLU, while Phi-3-small and Phi-3-medium outperform much larger models, including GPT-3.5 Turbo. However, due to their small size, Phi-3 models are less competitive on tasks oriented toward factual knowledge, such as those assessed by TriviaQA.

Still, their capabilities in many other areas make them especially useful in scenarios where model size and available resources are critical factors, such as resource-constrained environments or applications that require fast response times.
