Azure Multimodal Chat Application Guide
Problem
You need to build a chat application that accepts both text and audio inputs, and uses multimodal generative AI to understand and respond.
Solution with Azure
Use multimodal models available in Azure AI Foundry, such as:
* Microsoft Phi-4-multimodal-instruct
* OpenAI gpt-4o
* OpenAI gpt-4o-mini
These models support text + audio input and can generate intelligent responses.
Required Components
- Azure AI Foundry (portal access)
- A deployed multimodal model
- Chat Playground (for testing)
- Python or .NET SDK (for app development)
- Proper formatting of multi-part messages (JSON structure)
Architecture / Development
Deploy a Multimodal Model
- Go to the Azure AI Foundry portal
- Select a model such as `gpt-4o` or `Phi-4-multimodal-instruct`
- Deploy the model
- Test it in the Chat Playground with audio + text prompts:
  - Upload an audio file
  - Combine it with text to form a prompt
Structure of an Audio-Based Prompt (JSON)
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Transcribe this audio:"
        },
        {
          "type": "audio_url",
          "audio_url": {
            "url": "https://..."
          }
        }
      ]
    }
  ]
}
Alternatively, submit base64-encoded binary audio data inline:
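A minimal sketch of the audio part of the user message, assuming the endpoint accepts a `data:` URL carrying base64-encoded MP3 bytes in the `audio_url` field (the exact data URL convention can vary by model, so verify it in the Chat Playground first):

```json
{
  "type": "audio_url",
  "audio_url": {
    "url": "data:audio/mp3;base64,<base64-encoded audio bytes>"
  }
}
```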
Develop an Audio-Enabled Chat App
- Use the Python or .NET SDK for:
  - Azure AI Model Inference
  - OpenAI API
- Your client application should (see the Python sketch after this list):
  - Connect to the model endpoint
  - Submit multi-part prompts (text + audio)
  - Receive and process the model's response
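A minimal Python sketch, assuming the `azure-ai-inference` package and a key-authenticated endpoint for a deployed multimodal model; the endpoint, key, and audio URL are placeholders, and the dict-based messages mirror the JSON structure shown earlier:

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

# Connect to the deployed model endpoint (placeholder environment variables).
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

# Submit a multi-part prompt: text plus a hosted audio file.
response = client.complete(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio:"},
                {"type": "audio_url", "audio_url": {"url": "https://example.com/sample.mp3"}},
            ],
        },
    ],
)

# Process the model's response.
print(response.choices[0].message.content)
```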
Prompt Submission Options
- Text + audio URL (hosted audio file)
- Text + base64 binary audio (inline submission)
Best Practices / Considerations
- Ensure audio files are in a supported format (e.g., MP3)
- If using base64, avoid large files, since request size limits may apply (a small encoding helper is sketched below)
- Secure any URLs and ensure CORS/permissions are handled when using remotely hosted files
- Test prompts in the Chat Playground before writing application code
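A small helper sketch for the base64 option, assuming local MP3 input and an illustrative size cap (the real limit depends on the model and service); `encode_audio_data_url` is a hypothetical helper name:

```python
import base64
from pathlib import Path

# Illustrative cap on inline audio size (assumption; check the service's actual request limits).
MAX_INLINE_AUDIO_BYTES = 5 * 1024 * 1024  # 5 MB

def encode_audio_data_url(path: str) -> str:
    """Read a local MP3 file and return a base64 data: URL for inline submission."""
    audio_bytes = Path(path).read_bytes()
    if len(audio_bytes) > MAX_INLINE_AUDIO_BYTES:
        raise ValueError("Audio too large for inline base64; host the file and pass its URL instead.")
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:audio/mp3;base64,{encoded}"

# Usage: plug the result into the audio_url part of the prompt.
audio_part = {"type": "audio_url", "audio_url": {"url": encode_audio_data_url("sample.mp3")}}
```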
Simulated Exam Questions
- Q: What is the correct JSON structure for submitting a multimodal audio prompt?
  A: A `messages` array with a `content` array including both `text` and `audio_url` objects.
- Q: Which models in Azure AI Foundry support audio-based prompts?
  A: Microsoft `Phi-4-multimodal-instruct`, OpenAI `gpt-4o`, and OpenAI `gpt-4o-mini`.
- Q: How can you submit local audio data directly in a prompt?
  A: By encoding it in base64 and using a `data:` URL format in the `audio_url`.
- Q: Which tools can be used to test audio prompts before writing application code?
  A: The Azure AI Foundry Chat Playground.