Azure AI Foundry Content Safety Guide
π Problem
You need to detect and moderate harmful, offensive, or inappropriate contentβwhether user-generated or AI-generatedβacross various formats (text, image, multimodal) and languages, ensuring compliance with regulations and protecting user safety and brand integrity.
β Solution with Azure
Use Azure AI Foundry Content Safety, an AI-powered content moderation service that analyzes text, images, and multimodal inputs to detect content that falls into four core categories: * Violence * Hate speech * Sexual content * Self-harm
It replaces the deprecated Azure Content Moderator and provides more advanced, multilingual and multimodal content safety tools.
π§© Required Components
- Azure AI Foundry Portal (access via Azure portal)
- Content Safety Studio or Azure AI Content Studio
- Moderation APIs for:
- Text content
- Image content
- Multimodal content (OCR + analysis)
- Custom Categories (optional)
- Prompt Shields API for LLM prompt protection
- Protected Material Detection
- Groundedness Detection API
- Safety System Messages
ποΈ Architecture / Development
π§ Text Moderation
- Input analyzed by NLP across four categories
- Severity levels (0β6) returned per category
- Use blocklists for specific terms
- API returns structured moderation data
πΌοΈ Image Moderation
- Uses Florence foundation model
- Returns severity: safe, low, high
- Developer sets threshold: low, medium, high
- Evaluated per category
π§Ύ Multimodal Moderation
- Uses OCR to extract and analyze text within images
- Same four categories evaluated
π‘οΈ Prompt Shields
- Blocks prompt-based jailbreaks on LLMs
- Applies to user input and embedded document content
π§Ύ Protected Material Detection
- Flags copyright violations in AI-generated content
π Groundedness Detection
- Compares LLM responses to source data
- Optionally returns reasoning for ungrounded flags
ποΈ Custom Categories
- Define categories using positive/negative examples
- Train a model for customized moderation
π Best Practices / Considerations
- Test on real data before deploying
- Continuously monitor accuracy post-deployment
- Use human moderators for edge cases and appeals
- Communicate clearly to users why content is flagged
- Understand AI limits: possibility of false positives/negatives
- Evaluate using:
- β True Positives
- β False Positives
- β True Negatives
- β False Negatives
β Simulated Exam Questions
-
π§ How does Azure AI Foundry Content Safety determine if text should be blocked or approved? β€ By assigning severity levels (0β6) across four categories: violence, hate, sexual content, and self-harm.
-
π§ What model powers image analysis in Foundry Content Safety? β€ Florence foundation model.
-
π§ How does the system evaluate multimodal content? β€ It uses OCR to extract text from images, then analyzes both the image and text across the four categories.
-
π§ Which service should you use to prevent prompt-based jailbreaks in LLM inputs? β€ Prompt Shields.
-
π§ What should you do before deploying Foundry Content Safety in a live environment? β€ Test on real data and plan continuous monitoring.
-
π§ Which functionality detects grounded vs. ungrounded responses from LLMs? β€ Groundedness Detection.
-
π§ Can you define custom moderation rules? How? β€ Yes, using the Custom Categories feature by training with example content.
-
π§ What are the severity levels used for image moderation? β€ Safe, Low, High β combined with threshold settings (Low, Medium, High) to determine action.