Service: Azure AI Foundry – Multimodal Generative AI
Problem
You need to create a chat-based application that can understand and respond to prompts containing both text and images (vision-enabled interaction).
Solution with Azure
Use Azure AI Foundry to deploy and interact with multimodal generative AI models, such as:
- Microsoft Phi-4-multimodal-instruct
- OpenAI GPT-4o
- OpenAI GPT-4o-mini
These models can process text + image inputs and return appropriate, context-aware responses.
Required components
- Azure AI Foundry portal for model deployment and testing
- Multimodal model (e.g., GPT-4o)
- Chat playground to test image+text prompts
- API endpoint for submitting multi-part messages
- SDK for Python or .NET (Azure AI Model Inference or OpenAI API)
Architecture / Development
- Deploy a multimodal model from the Azure AI Foundry portal
- Test prompts using the built-in chat playground by uploading an image and adding a text prompt
- Develop a client application that:
  - Connects to the model endpoint
  - Sends prompts as multi-part messages
  - Receives and processes the response (a Python sketch follows the JSON example below)
Multi-part JSON prompt format:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this picture:"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://....."
          }
        }
      ]
    }
  ]
}
```
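As a sketch of the client-application step above, the same multi-part prompt can be sent with the Python Azure AI Model Inference SDK (azure-ai-inference). The endpoint URL, API key, and the deployment name gpt-4o below are placeholder assumptions; substitute the values from your own Azure AI Foundry project.

```python
# pip install azure-ai-inference
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    SystemMessage,
    UserMessage,
    TextContentItem,
    ImageContentItem,
    ImageUrl,
)
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key: copy the real values from your Azure AI Foundry project.
client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential("<your-api-key>"),
)

# Send the same multi-part (text + image) prompt shown in the JSON example above.
response = client.complete(
    model="gpt-4o",  # assumed deployment name; use your deployed multimodal model
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content=[
            TextContentItem(text="Describe this picture:"),
            ImageContentItem(image_url=ImageUrl(url="https://<image-url>")),
        ]),
    ],
)

# The model's reply is in the first choice's message content.
print(response.choices[0].message.content)
```

The OpenAI SDK (or the .NET equivalents) follows the same pattern: a system message plus a user message whose content is a list of text and image parts.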
If using local image data, embed the image as a base64-encoded data URL instead of a web URL:
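A minimal sketch, assuming a local JPEG called picture.jpg (a placeholder filename) and using only the standard-library base64 module; the resulting item drops into the content array exactly like the image_url item above:

```python
import base64

# "picture.jpg" is a placeholder; use the path of your local image.
with open("picture.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# A local image is passed as a data URL inside the same image_url item used above.
image_item = {
    "type": "image_url",
    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
}
```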
Best practices / Considerations
- Use the multimodal prompt format (text + image) for richer, context-aware interaction
- Choose the right model (GPT-4o, Phi-4, etc.) based on performance and availability
- For local images, use base64 encoding in data URL format
- Use SDKs to simplify interaction with the REST APIs in production apps
- Always validate model output in the chat playground before integrating
Simulated exam questions
Q1. Which Azure service allows you to create chat-based apps using text and image input?
A. Azure AI Foundry (with a multimodal model)
Q2. What is the correct JSON structure for sending a prompt with both text and image?
A. A multi-part message containing both type: "text" and type: "image_url" items
Q3. You need to develop a vision-enabled chatbot in .NET. Which resource do you use?
A. The Azure AI Model Inference or OpenAI .NET SDK
Q4. Which model types support vision-based chat in Azure AI Foundry?
A. Microsoft Phi-4-multimodal-instruct, OpenAI GPT-4o, and GPT-4o-mini
Q5. What's a required step to submit a local image file in a prompt?
A. Encode the image as base64 and include it in a data URL