Extract Custom Entities with Azure AI Language Service
🧩 Problem
You need to build, train, deploy, and test a custom entity extraction model using Azure AI Language Service for unstructured text (classified ads in this case).
💡 Solution with Azure
Use Azure AI Language Service - Custom Named Entity Recognition (Custom NER) via:
- Azure portal + Language Studio for project creation and labeling
- Azure SDK (C# / Python) to consume the model from a client application
⚙️ Components Required
- Azure AI Language resource (Custom Text Classification & Extraction feature enabled)
- Azure Storage account (blob storage)
- Azure Role assignment: Storage Blob Data Contributor
- Sample ads dataset: https://aka.ms/entity-extraction-ads
- Language Studio
- Visual Studio Code with:
  - Azure AI Text Analytics SDK: Azure.AI.TextAnalytics 5.3.0 (C#) / azure-ai-textanalytics 5.3.0 (Python)
  - Git clone of the repository: https://github.com/MicrosoftLearning/mslearn-ai-language
🏗️ Architecture / Development
1️⃣ Provision Azure AI Language Resource
- Create Language resource in Azure portal
- Enable Custom named entity recognition extraction
- Supported regions: East US, West Europe, UK South, etc.
- Pricing tier: F0 (free) or S (standard)
- Create new storage account (Standard LRS)
- Assign role: Storage Blob Data Contributor to your user
2️⃣ Upload Training Data
- Download sample ads: https://aka.ms/entity-extraction-ads
- Upload files to blob container classifieds (anonymous read access enabled)
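The upload can also be scripted instead of done through the portal. A minimal sketch with the azure-storage-blob package, assuming the connection string is available in an AZURE_STORAGE_CONNECTION_STRING environment variable and the downloaded ads sit in a local ads folder (both names are assumptions, not part of the lab):

```python
# Sketch: bulk-upload the sample ad .txt files to the "classifieds" container.
# AZURE_STORAGE_CONNECTION_STRING and the local "ads" folder are assumed names.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("classifieds")
if not container.exists():
    container.create_container()

for file_name in os.listdir("ads"):
    if not file_name.endswith(".txt"):
        continue
    with open(os.path.join("ads", file_name), "rb") as data:
        container.upload_blob(name=file_name, data=data, overwrite=True)
    print(f"Uploaded {file_name}")
```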
3️⃣ Create Project in Language Studio
- Project type: Custom Named Entity Recognition
- Project name: CustomEntityLab
- Language: English (US)
- Container: classifieds
- Files are not pre-labeled: select "No, I need to label my files as part of this project."
4️⃣ Label Data
Create three entities:
- ItemForSale
- Price
- Location
Label text in files:
| File | Entity | Example |
| --- | --- | --- |
| Ad 1.txt | ItemForSale | face cord of firewood |
| Ad 1.txt | Location | Denver, CO |
| Ad 1.txt | Price | $90 |
Label all ads and save labels.
5️⃣ Train Model
- Model name: ExtractAds
- Split type: Automatically split the testing set
- Start training
6️⃣ Evaluate Model
- Review model metrics (precision, recall, F1 score); the underlying formulas are sketched after this list
- Use Model performance and check testing documents
- Analyze failed extractions to guide improvements
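The precision, recall, and F1 values shown on the Model performance page follow the standard definitions, computed over the automatically held-out test documents. The sketch below restates them to make the numbers easier to interpret (the counts are illustrative, not results from this lab):

```python
# Standard metric definitions behind the Model performance page.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(tp: int, fp: int, fn: int) -> float:
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Illustrative counts only: 8 correct extractions, 2 spurious, 1 missed.
print(precision(8, 2), recall(8, 1), f1(8, 2, 1))
```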
7️⃣ Deploy Model
- Deployment name: AdEntities
- Deploy trained model ExtractAds
8️⃣ Develop Client Application
Clone the repo:
git clone https://github.com/MicrosoftLearning/mslearn-ai-language
Open the project folder Labfiles/05-custom-entity-recognition/custom-entities in VS Code.
Install the SDK:
C#:
dotnet add package Azure.AI.TextAnalytics --version 5.3.0
Python:
pip install azure-ai-textanalytics==5.3.0
Configure app settings:
- C#: appsettings.json
- Python: .env
Set: aiSvcKey, aiSvcEndpoint, projectName, deploymentName.
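For the Python app, the settings can be loaded from the .env file with python-dotenv. A minimal sketch, assuming the .env keys mirror the setting names listed above (the lab's actual .env file may use different key names):

```python
# Sketch: load the four settings from .env; key names here mirror the list above
# and may differ from the lab's actual .env file.
import os
from dotenv import load_dotenv

load_dotenv()
ai_endpoint = os.getenv("aiSvcEndpoint")
ai_key = os.getenv("aiSvcKey")
project_name = os.getenv("projectName")
deployment_name = os.getenv("deploymentName")
```

The variable names are chosen to match those used in the Python snippets in the next step.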
9️⃣ Add Code to Extract Entities
Import namespaces:
C#:
using Azure;
using Azure.AI.TextAnalytics;
Python:
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient
Create client:
C#:
AzureKeyCredential credentials = new(aiSvcKey);
Uri endpoint = new(aiSvcEndpoint);
TextAnalyticsClient aiClient = new(endpoint, credentials);
Python:
credential = AzureKeyCredential(ai_key)
ai_client = TextAnalyticsClient(endpoint=ai_endpoint, credential=credential)
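The extraction calls below pass batchedDocuments, which the snippets never construct. A minimal Python sketch that builds it by reading the ad text files from a local folder (the folder name ads is an assumption):

```python
# Sketch: build batchedDocuments as a list of document strings.
# The local "ads" folder name is an assumption.
import os

batchedDocuments = []
for file_name in os.listdir("ads"):
    with open(os.path.join("ads", file_name), encoding="utf-8") as f:
        batchedDocuments.append(f.read())
```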
Extract entities:
C#:
RecognizeCustomEntitiesOperation operation = await aiClient.RecognizeCustomEntitiesAsync(
WaitUntil.Completed, batchedDocuments, projectName, deploymentName);
Python:
operation = ai_client.begin_recognize_custom_entities(
batchedDocuments, project_name=project_name, deployment_name=deployment_name)
document_results = operation.result()
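To produce the output described under Test Application, iterate the returned results and print each entity with its confidence score. A minimal Python sketch using the azure-ai-textanalytics result model:

```python
# Print each recognized entity with its confidence score, per document.
for doc, result in zip(batchedDocuments, document_results):
    if result.is_error:
        print(f"Error {result.error.code}: {result.error.message}")
        continue
    print(f"Document: {doc[:40]}...")
    for entity in result.entities:
        print(f"  {entity.category}: '{entity.text}' (confidence {entity.confidence_score:.2f})")
```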
Test Application
Run the app:
C#:
dotnet run
Python:
python custom-entities.py
Output: lists the recognized entities and their confidence scores for each document.
🧠 Best Practice / Considerations
- Assign the Storage Blob Data Contributor role to your user account to avoid authorization errors
- Use high-quality, diverse real-world data for training
- Avoid ambiguous entity definitions
- Use confusion matrix and model metrics to iteratively improve
- Secure storage access in production (avoid anonymous access)
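To avoid anonymous container access in production, one option is a time-limited, read-only SAS token for the container. A minimal sketch with azure-storage-blob (the account name and key values are placeholders):

```python
# Sketch: generate a read/list-only SAS for the "classifieds" container instead of
# enabling anonymous access. account_name and account_key values are placeholders.
from datetime import datetime, timedelta
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

sas_token = generate_container_sas(
    account_name="<storage-account-name>",
    container_name="classifieds",
    account_key="<storage-account-key>",
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.utcnow() + timedelta(hours=2),
)
container_url = f"https://<storage-account-name>.blob.core.windows.net/classifieds?{sas_token}"
```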
❓ Exam-like Sample Questions
Question 1:
Which task kind should be specified when submitting extraction jobs for custom entity recognition?
A. CustomTextClassification
B. CustomEntityRecognition
C. RecognizeEntities
✅ Answer: B
Question 2:
Which dataset was used in this lab for custom entity extraction?
A. Classified ads
B. News articles
C. Financial statements
✅ Answer: A
Question 3:
Which entities were defined for extraction?
A. Category, Price, Description
B. ItemForSale, Price, Location
C. ItemName, Country, Contact
✅ Answer: B
Question 4:
What SDK version was installed?
A. 4.2.0
B. 5.3.0
C. 3.1.0
✅ Answer: B
Question 5:
Which split type was used for training in this lab?
A. Manual split
B. No split
C. Automatic split
✅ Answer: C