Azure AI Vision Image Analysis Study Guide
Problem
You need to analyze images to automatically:
- Generate descriptive captions in natural language
- Suggest tags representing objects, scenery, or actions
- Detect and locate objects or people in the image
You want to implement this solution using Azure services with appropriate architecture, components, and configurations.
Solution with Azure
Use the Azure AI Vision (Computer Vision) service to analyze images and extract information. Provision an Azure AI Vision resource and connect to it through the REST API or an SDK (Python, .NET, etc.).
Key capabilities include:
- CAPTION: Generate a natural language description of the image
- TAGS: Identify objects, scenery, setting, and actions
- OBJECTS: Locate objects with bounding boxes
- PEOPLE: Locate people with bounding boxes
- DENSE_CAPTIONS: Generate detailed captions for detected objects
- SMART_CROPS: Suggest crop regions for a specified aspect ratio
- READ: Extract readable text from images (OCR)
Components Required
- Azure AI Vision resource, provisioned in one of the following ways:
  - Azure AI Foundry project → AI Foundry hub → multi-service resource (includes AI Vision)
  - Azure AI services multi-service resource
  - Standalone Computer Vision resource (includes free tier for testing)
- Client app (Python, .NET, etc.) using:
  - REST API
  - SDK (e.g., azure.ai.vision.imageanalysis for Python)
- Authentication:
  - Key-based (authorization key)
  - Microsoft Entra ID token
  - (Production) Managed identity or Azure Key Vault for securing credentials
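One way to keep a key out of source code during development is to read it from the environment. The following is a minimal sketch under that assumption; the variable names VISION_ENDPOINT and VISION_KEY and the helper load_vision_settings are hypothetical choices for illustration, not names defined by Azure.

```python
import os
from typing import Mapping, NamedTuple


class VisionSettings(NamedTuple):
    endpoint: str
    key: str


def load_vision_settings(env: Mapping[str, str] = os.environ) -> VisionSettings:
    """Read the endpoint and key from environment variables.

    VISION_ENDPOINT and VISION_KEY are hypothetical names used in this sketch.
    """
    required = ("VISION_ENDPOINT", "VISION_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Normalize so the endpoint always ends with exactly one slash
    endpoint = env["VISION_ENDPOINT"].rstrip("/") + "/"
    return VisionSettings(endpoint=endpoint, key=env["VISION_KEY"])
```

The returned values can be passed to whichever client is used. For production, the managed identity or Key Vault options listed above remain the stronger choice.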
Architecture / Development
1. Provision the resource:
   - Create AI Vision / multi-service resource
   - Obtain endpoint (e.g., https://<resource_name>.cognitiveservices.azure.com/)
   - Obtain key or set up Entra ID access
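2. Connect client app:
   Example in Python (key-based):

       from azure.ai.vision.imageanalysis import ImageAnalysisClient
       from azure.ai.vision.imageanalysis.models import VisualFeatures
       from azure.core.credentials import AzureKeyCredential

       # Create a client from the resource endpoint and key (placeholders)
       client = ImageAnalysisClient(
           endpoint="<YOUR_RESOURCE_ENDPOINT>",
           credential=AzureKeyCredential("<YOUR_AUTHORIZATION_KEY>"),
       )

       # Read the image as raw bytes; "image.jpg" is a placeholder path
       with open("image.jpg", "rb") as f:
           image_data = f.read()

       # Request only the visual features that are needed
       result = client.analyze(
           image_data=image_data,
           visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS],
           gender_neutral_caption=True,  # use gender-neutral terms in the caption
       )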
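For the REST option, a rough sketch of how the request could be assembled is shown below. The URL path, the api-version value 2023-10-01, the lowercase feature names, and the Ocp-Apim-Subscription-Key header are assumptions based on the Image Analysis 4.0 REST API and should be checked against the current reference; build_analyze_request is a hypothetical helper, and nothing is sent over the network here.

```python
from urllib.parse import urlencode


def build_analyze_request(endpoint, key, features, api_version="2023-10-01"):
    """Return (url, headers) for a POST whose body is the raw image bytes.

    Path, api-version, and header name are assumptions for this sketch;
    verify them against the current REST reference before relying on them.
    """
    # Keep commas unescaped so the feature list stays readable in the URL
    query = urlencode(
        {"api-version": api_version, "features": ",".join(features)},
        safe=",",
    )
    url = endpoint.rstrip("/") + "/computervision/imageanalysis:analyze?" + query
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # key-based authentication
        "Content-Type": "application/octet-stream",  # body is binary image data
    }
    return url, headers


url, headers = build_analyze_request(
    "https://<resource_name>.cognitiveservices.azure.com/",
    "<YOUR_AUTHORIZATION_KEY>",
    ["caption", "tags"],
)
print(url)
```

Any HTTP client can then POST the image bytes to the returned URL with those headers.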
3. Image requirements:
   - Format: JPEG, PNG, GIF, BMP
   - Size: < 4 MB
   - Dimensions: > 50 x 50 pixels
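These requirements can be checked on the client before calling the service. The sketch below is illustrative only: it detects the format from leading magic bytes, treats 4 MB as 4 * 1024 * 1024 bytes (an assumption), and reads dimensions only from PNG and GIF headers, leaving JPEG and BMP dimension parsing out. All function names are hypothetical.

```python
import struct

MAX_BYTES = 4 * 1024 * 1024  # "< 4 MB", assuming 1 MB = 1024 * 1024 bytes
MIN_SIDE = 50                # both sides must be greater than 50 pixels


def detect_format(data: bytes):
    """Identify the image format from its leading magic bytes, or None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF"
    if data.startswith(b"BM"):
        return "BMP"
    return None


def read_dimensions(data: bytes, fmt):
    """Return (width, height) for PNG and GIF headers; None otherwise."""
    if fmt == "PNG" and len(data) >= 24:
        return struct.unpack(">II", data[16:24])  # IHDR width, height
    if fmt == "GIF" and len(data) >= 10:
        return struct.unpack("<HH", data[6:10])   # logical screen size
    return None  # JPEG and BMP dimensions are not parsed in this sketch


def check_image(data: bytes):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    fmt = detect_format(data)
    if fmt is None:
        problems.append("unsupported format")
    if len(data) >= MAX_BYTES:
        problems.append("file is 4 MB or larger")
    dims = read_dimensions(data, fmt)
    if dims and (dims[0] <= MIN_SIDE or dims[1] <= MIN_SIDE):
        problems.append("dimensions must be greater than 50 x 50")
    return problems
```

An empty result does not guarantee the service will accept the image; it only rules out the three conditions listed above where this sketch can detect them.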
4. Submit image:
   - Upload image bytes
   - Or provide an image URL using analyze_from_url
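The two submission paths can be wrapped in one small helper. This is a sketch, not part of the SDK: submit_image is a hypothetical function, and client is assumed to be an ImageAnalysisClient (or any object exposing the same analyze and analyze_from_url methods).

```python
def submit_image(client, source, visual_features):
    """Send an image by URL or from a local file, depending on `source`.

    `client` is assumed to provide analyze(image_data=..., visual_features=...)
    and analyze_from_url(image_url=..., visual_features=...).
    """
    if source.lower().startswith(("http://", "https://")):
        # Remote image: the service retrieves it from the URL
        return client.analyze_from_url(
            image_url=source, visual_features=visual_features
        )
    # Local image: read the file and upload its bytes
    with open(source, "rb") as f:
        return client.analyze(
            image_data=f.read(), visual_features=visual_features
        )
```

With a real client, visual_features would be a list such as [VisualFeatures.CAPTION, VisualFeatures.TAGS].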
5. Receive response (JSON with captions, tags, bounding boxes, confidence scores, etc.)
   Example JSON excerpt:

       {
         "denseCaptionsResult": {
           "values": [
             {
               "text": "a house in the woods",
               "confidence": 0.705,
               "boundingBox": { "x": 0, "y": 0, "w": 640, "h": 640 }
             }
           ]
         }
       }
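A response shaped like the excerpt above can be read with the standard json module. The sketch below assumes that exact shape; the dense_captions helper and its 0.5 confidence threshold are hypothetical choices for illustration.

```python
import json


def dense_captions(response_json: str, min_confidence: float = 0.5):
    """Return (text, confidence, bounding box) tuples at or above a threshold."""
    data = json.loads(response_json)
    # Tolerate responses where dense captions were not requested
    values = data.get("denseCaptionsResult", {}).get("values", [])
    return [
        (v["text"], v["confidence"], v["boundingBox"])
        for v in values
        if v["confidence"] >= min_confidence
    ]


sample = """
{
  "denseCaptionsResult": {
    "values": [
      {
        "text": "a house in the woods",
        "confidence": 0.705,
        "boundingBox": { "x": 0, "y": 0, "w": 640, "h": 640 }
      }
    ]
  }
}
"""

for text, confidence, box in dense_captions(sample):
    print(text, confidence, box["x"], box["y"], box["w"], box["h"])
```

Filtering on confidence is a client-side decision; the service returns the scores and leaves the cutoff to the application.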
Best Practice / Considerations
- Use Microsoft Entra ID authentication + managed identity for production security
- Secure keys in Azure Key Vault if key-based authentication is required
- For collaborative or multi-AI service solutions, prefer Azure AI Foundry projects
- Ensure image size, format, and dimensions meet requirements to avoid API errors
- Specify only necessary visual features to reduce processing time and cost
Simulated Exam Questions
1. You need to generate a natural language caption and identify tags for an image using Azure. Which visual features should you request?
   - CAPTION, TAGS (correct)
   - OBJECTS, PEOPLE
   - READ, SMART_CROPS
2. A client app sends images for analysis but fails due to large file size. What is the maximum allowed file size for image analysis in Azure AI Vision?
   - 4 MB (correct)
   - 10 MB
   - 50 MB
3. What is a recommended authentication method for a production client app accessing Azure AI Vision?
   - Microsoft Entra ID with managed identity (correct)
   - Hardcoded authorization key in app
   - Anonymous access
4. Which resource type should you choose if you want to experiment with Azure AI Vision at no cost?
   - Standalone Computer Vision resource (free tier) (correct)
   - AI Foundry project
   - Multi-service AI resource