Azure AI Language - Complete Guide
🧩 Problem Statement
You need to analyze unstructured text automatically to extract meaning, sentiment, entities, or language from large amounts of textual data such as:
- Emails
- Reviews
- Social media posts
- Documents
Key Requirements:
- Detect the language of each document
- Extract key phrases or entities
- Determine the sentiment expressed
- Link entities to a knowledge base like Wikipedia
💡 Solution with Azure
Azure AI Language provides a comprehensive set of APIs and SDKs offering:
- Language Detection
- Key Phrase Extraction
- Sentiment Analysis
- Named Entity Recognition & Entity Linking
🧩 Required Components
- Azure AI Language resource (created via Azure Portal or Azure AI Foundry)
- API endpoint and key
- REST client or SDK (e.g., Python, C#)
- JSON-formatted request payloads
Architecture & Development
🔹 Provisioning
- Create an Azure AI Language resource
- Use the endpoint and key to authenticate your app
- Send HTTP requests using REST or use SDKs that wrap those calls
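The provisioning steps above can be sketched with the standard library alone. This is a minimal sketch, not the official client: the helper name `build_analyze_request`, the placeholder endpoint, and the `2023-04-01` API version are assumptions for illustration, so check the values your own resource supports. The request is only built, not sent.

```python
import json
import urllib.request

# Assumed API version; confirm which versions your resource supports.
API_VERSION = "2023-04-01"


def build_analyze_request(endpoint: str, key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request for text analysis."""
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The resource key authenticates the call.
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_analyze_request(
        "https://example-resource.cognitiveservices.azure.com/",  # placeholder endpoint
        "<your-key>",  # placeholder key
        {"kind": "LanguageDetection", "analysisInput": {"documents": []}},
    )
    print(request.get_method(), request.full_url)
    # Sending would be: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload and headers easy to inspect before any key leaves your machine.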
🔹 Language Detection
Automatically identifies the language of submitted text documents.
Sample Request:
{
  "kind": "LanguageDetection",
  "parameters": { "modelVersion": "latest" },
  "analysisInput": {
    "documents": [
      { "id": "1", "text": "Hello world", "countryHint": "US" },
      { "id": "2", "text": "Bonjour tout le monde" }
    ]
  }
}
Sample Response:
{
  "documents": [
    { "id": "1", "detectedLanguage": { "name": "English", "confidenceScore": 1 } },
    { "id": "2", "detectedLanguage": { "name": "French", "confidenceScore": 1 } }
  ]
}
Important Notes:
- Multilingual inputs return the dominant language, with lower confidence scores if mixed
- If detection fails, the result shows "name": "(Unknown)" and "confidenceScore": 0.0
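As a rough illustration of the request and response shapes above, the following sketch builds a LanguageDetection payload and summarizes a response, mapping the "(Unknown)" case to `None`. The function names are hypothetical. The fallback to a top-level `documents` list matches the simplified sample in this guide; the live service may wrap it in a `results` object, which the sketch also accepts.

```python
def build_language_detection_payload(texts, country_hint=None):
    """Build a LanguageDetection request body from a list of strings."""
    documents = []
    for index, text in enumerate(texts, start=1):
        document = {"id": str(index), "text": text}
        if country_hint:
            document["countryHint"] = country_hint
        documents.append(document)
    return {
        "kind": "LanguageDetection",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": documents},
    }


def summarize_detected_languages(response):
    """Map document id -> (language name, confidence), or None when unknown."""
    # Accept both a wrapped response ("results") and the simplified sample shape.
    body = response.get("results", response)
    summary = {}
    for document in body.get("documents", []):
        detected = document["detectedLanguage"]
        name = detected["name"]
        score = float(detected["confidenceScore"])
        summary[document["id"]] = None if name == "(Unknown)" else (name, score)
    return summary


if __name__ == "__main__":
    sample = {
        "documents": [
            {"id": "1", "detectedLanguage": {"name": "English", "confidenceScore": 1}},
            {"id": "2", "detectedLanguage": {"name": "(Unknown)", "confidenceScore": 0.0}},
        ]
    }
    print(summarize_detected_languages(sample))
```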
🔹 Key Phrase Extraction
Identifies the main concepts or key points in a document.
Sample Request:
{
  "kind": "KeyPhraseExtraction",
  "parameters": { "modelVersion": "latest" },
  "analysisInput": {
    "documents": [
      {
        "id": "1",
        "language": "en",
        "text": "You must be the change you wish to see in the world."
      }
    ]
  }
}
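Sample Response (illustrative; the exact phrases returned depend on the model version):
{
  "documents": [
    { "id": "1", "keyPhrases": ["change", "world"] }
  ]
}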
🔹 Sentiment Analysis
Determines if the sentiment is positive, neutral, negative, or mixed.
Sample Request:
{
  "kind": "SentimentAnalysis",
  "parameters": { "modelVersion": "latest" },
  "analysisInput": {
    "documents": [
      { "id": "1", "language": "en", "text": "Good morning!" }
    ]
  }
}
Sample Response:
{
  "documents": [
    {
      "id": "1",
      "sentiment": "positive",
      "confidenceScores": { "positive": 0.89, "neutral": 0.1, "negative": 0.01 }
    }
  ]
}
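Sentiment Rules (document sentiment is derived from sentence-level sentiment):
- All sentences neutral → document = neutral
- At least one positive sentence, the rest neutral → document = positive
- At least one negative sentence, the rest neutral → document = negative
- At least one positive and at least one negative sentence → document = mixed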
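The aggregation rules above can be written as a small function. This is only a sketch of the rules as listed in this guide, not the service's implementation; the name `document_sentiment` is hypothetical, and the service itself returns the document-level label, so you would not normally compute it yourself.

```python
def document_sentiment(sentence_labels):
    """Combine sentence-level labels into one document-level label."""
    labels = set(sentence_labels)
    if not labels:
        raise ValueError("at least one sentence label is required")
    unexpected = labels - {"positive", "neutral", "negative"}
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")

    has_positive = "positive" in labels
    has_negative = "negative" in labels
    if has_positive and has_negative:
        return "mixed"      # both polarities appear somewhere in the document
    if has_positive:
        return "positive"   # positive sentences, any others neutral
    if has_negative:
        return "negative"   # negative sentences, any others neutral
    return "neutral"        # every sentence is neutral


if __name__ == "__main__":
    print(document_sentiment(["neutral", "positive"]))
    print(document_sentiment(["positive", "negative", "neutral"]))
```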
🔹 Entity Linking
Disambiguates named entities (e.g., "Venus") and links them to external sources like Wikipedia.
Sample Request:
{
  "kind": "EntityLinking",
  "parameters": { "modelVersion": "latest" },
  "analysisInput": {
    "documents": [
      { "id": "1", "language": "en", "text": "I saw Venus shining in the sky" }
    ]
  }
}
Sample Response:
{
  "documents": [
    {
      "id": "1",
      "entities": [
        {
          "name": "Venus",
          "url": "https://en.wikipedia.org/wiki/Venus"
        }
      ]
    }
  ]
}
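To show how the linked entities might be consumed, here is a minimal sketch that collects entity names and URLs per document. The function name `linked_entity_urls` is hypothetical, and the parsing assumes the simplified response shape shown above (optionally wrapped in a `results` object, as the live service may do).

```python
def linked_entity_urls(response):
    """Map document id -> {entity name: knowledge-base URL}."""
    # Accept both a wrapped response ("results") and the simplified sample shape.
    body = response.get("results", response)
    links = {}
    for document in body.get("documents", []):
        links[document["id"]] = {
            entity["name"]: entity["url"]
            for entity in document.get("entities", [])
        }
    return links


if __name__ == "__main__":
    sample = {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {"name": "Venus", "url": "https://en.wikipedia.org/wiki/Venus"}
                ],
            }
        ]
    }
    print(linked_entity_urls(sample))
```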
🧠 Best Practices & Considerations
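- Max 1,000 documents per request for Language Detection (other features accept fewer documents per request; check the current service limits)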
- Max 5,120 characters per document
- Use countryHint for better language detection accuracy
- Multilingual content is handled by picking the dominant language
- Unknown or corrupt inputs will result in (Unknown) language with confidenceScore: 0
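The size limits above suggest batching documents before sending them. The sketch below uses the limits listed in this guide as defaults; the helper name `batch_documents` is hypothetical, and Python's `len()` is only an approximation of how the service counts characters.

```python
MAX_DOCS_PER_REQUEST = 1000   # per-request limit as listed in this guide
MAX_CHARS_PER_DOC = 5120      # per-document limit as listed in this guide


def batch_documents(documents, batch_size=MAX_DOCS_PER_REQUEST, max_chars=MAX_CHARS_PER_DOC):
    """Validate document sizes, then split the list into request-sized batches."""
    for document in documents:
        if len(document["text"]) > max_chars:
            raise ValueError(
                f"document {document['id']} exceeds {max_chars} characters"
            )
    return [
        documents[start:start + batch_size]
        for start in range(0, len(documents), batch_size)
    ]


if __name__ == "__main__":
    docs = [{"id": str(i), "text": "Hello world"} for i in range(1, 2501)]
    print([len(batch) for batch in batch_documents(docs)])
```

Pass a smaller `batch_size` for features that accept fewer documents per request.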
🎯 Exam Simulation Questions
Q: Which field in the Language Detection response shows model confidence? → confidenceScore
Q: What happens if a document contains multiple languages? → The service returns the language with the highest representation, with reduced confidence.
Q: Which service should you use to link "Venus" to the correct Wikipedia article? → Entity Linking
Q: What is the maximum supported text size per document? → 5,120 characters
Q: What does a confidence score of 0.0 and language name (Unknown) mean? → The service couldn't determine the language (e.g., due to unreadable input)