Azure AI Foundry Exercise Summary
Exercise Overview: Audio-Enabled Chat App with Azure AI Foundry
Objective
Build a simple AI application using the Phi-4-multimodal-instruct model to summarize customer voice messages (audio) for a produce supplier company.
Steps Completed
1. Azure AI Foundry Project Setup
- Signed in to Azure AI Foundry
- Deployed the Phi-4-multimodal-instruct model to a new project
- Noted the endpoint and deployment name
2. Environment Preparation
- Used Azure Cloud Shell (PowerShell) in classic mode
- Cloned the repo: https://github.com/MicrosoftLearning/mslearn-ai-language
- Installed dependencies:
  - For Python: azure-identity, azure-ai-projects, azure-ai-inference (one-line install below)
  - For C#: the relevant NuGet packages
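For the Python path, the three packages can be installed in one line; this assumes pip is available in the Cloud Shell session:

```
pip install azure-identity azure-ai-projects azure-ai-inference
```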
3. Configuration
- Edited the configuration file: .env (Python) or appsettings.json (C#)
- Added the Azure AI project endpoint and model deployment name (sample .env below)
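For the Python version, the .env file then only needs the two values noted in step 1. The variable names here are placeholders and should match whatever names the starter code actually reads:

```
# Placeholder names; use the values noted in step 1
PROJECT_ENDPOINT=<your-project-endpoint>
MODEL_DEPLOYMENT=<your-deployment-name, e.g. Phi-4-multimodal-instruct>
```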
4. Code Enhancements
- Initialized the AIProjectClient and chat client
- Built a request to send audio and text prompts to the deployed model (sketched below)
- Used the example audio files avocados.mp3 and fresas.mp3
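Stitched together, the request-building code might look roughly like the following sketch. The environment-variable names and the ask_about_audio helper are my own labels, and the raw audio_url dictionary mirrors the content-item shape noted under Key Concepts; treat this as an outline of the lab code, not a copy of it.

```python
import os
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.inference.models import SystemMessage, UserMessage, TextContentItem

# Endpoint and deployment name noted in step 1 (read from the environment
# here; the lab's starter code loads them from .env)
project_endpoint = os.environ["PROJECT_ENDPOINT"]   # assumed variable name
model_deployment = os.environ["MODEL_DEPLOYMENT"]   # assumed variable name

# Connect to the Foundry project; DefaultAzureCredential picks up `az login`
project_client = AIProjectClient(
    endpoint=project_endpoint,
    credential=DefaultAzureCredential(),
)
chat_client = project_client.inference.get_chat_completions_client()

def ask_about_audio(prompt: str, audio_url: str) -> str:
    """Send one text prompt plus one hosted audio clip to the model."""
    response = chat_client.complete(
        model=model_deployment,
        messages=[
            SystemMessage("You summarize customer voice messages for a produce supplier."),
            UserMessage(content=[
                TextContentItem(text=prompt),
                # Raw audio_url content item pointing at a hosted MP3
                {"type": "audio_url", "audio_url": {"url": audio_url}},
            ]),
        ],
    )
    return response.choices[0].message.content
```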
5. Execution
- Logged in with az login
- Ran the app with the prompts (see the loop below):
  - "Can you summarize this customer's voice message?"
  - "Is it time-sensitive?"
- Observed model responses based on audio inputs
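With a helper like ask_about_audio above, the run itself reduces to a short loop; the hosted URL below is a placeholder, not the lab's actual file location:

```python
# Placeholder URL; substitute wherever avocados.mp3 is actually hosted
audio_url = "https://example.com/audio/avocados.mp3"

for prompt in [
    "Can you summarize this customer's voice message?",
    "Is it time-sensitive?",
]:
    print(f"\n> {prompt}")
    print(ask_about_audio(prompt, audio_url))
```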
Key Concepts Learned
- Multimodal prompts: Combining text and audio in the same request
- Audio input formats: Using audio_url (with hosted MP3s)
- Azure AI Inference SDK: Simplified interaction with deployed AI models
- Statelessness: No chat history is maintained unless you implement it yourself (see the sketch below)
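Each complete() call stands alone, so remembering earlier turns means resending them on every request. A minimal sketch, reusing chat_client and model_deployment from the earlier code and assuming the SDK's AssistantMessage type:

```python
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage

# Running transcript, resent in full on every turn
history = [SystemMessage("You summarize customer voice messages.")]

def chat_turn(user_text: str) -> str:
    history.append(UserMessage(content=user_text))
    response = chat_client.complete(model=model_deployment, messages=history)
    reply = response.choices[0].message.content
    history.append(AssistantMessage(content=reply))  # keep the model's reply
    return reply
```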
Next Steps & Ideas
- Implement chat history tracking
- Let users record and upload live audio via a UI
- Test with your own audio files
- Try different system messages (e.g., more formal or time-aware)
- Experiment with different multimodal models
- Build a web interface for the application
- Add analytics and logging capabilities