Microsoft has introduced a new multimodal AI model in its Phi-3 series, called Phi-3-Vision. This small language model (SLM) can process both text and images, although its responses are limited to text. With 4.2 billion parameters, it is compact enough to process data locally on a device. During the launch, Microsoft CEO Satya Nadella highlighted the model’s hybrid nature, explaining that it can operate on-device when the necessary hardware is available and fall back to the cloud when it is not. Phi-3-Vision is currently available in preview in Azure AI Playground and Azure AI Studio for developers.
Microsoft also announced Team Copilot, an expansion of its AI-powered assistant for Microsoft 365. Similar to Google’s new Gemini Teammate, which integrates with Workspace apps, Team Copilot works across Microsoft productivity apps such as Teams, Loop, and Planner. This AI agent can automate tasks like facilitating meetings, managing agendas, tracking time, and taking notes. It can also appear in chats unprompted to surface important information, track action items, and flag unresolved issues. Microsoft stated that Team Copilot will be available in preview later this year, with additional features to follow.
Additionally, the Microsoft Edge browser is set to receive an AI-powered video translation feature, capable of translating videos from YouTube, LinkedIn, Reuters, and Coursera in real time. According to a report, the feature currently supports translation from English to Hindi, German, Spanish, Italian, and Russian. It is expected to roll out soon, with support for more languages and video-streaming platforms planned.