Implement Retrieval Augmented Generation (RAG) with Azure OpenAI Models
Azure OpenAI on your data lets developers implement RAG with supported chat models, grounding responses in specific sources of data.
Learning Objectives
By the end of this module, you'll be able to:
- Describe the capabilities of Azure OpenAI on your data
- Configure Azure OpenAI to use your own data
- Use Azure OpenAI API to generate responses based on your own data
What is RAG?
Retrieval Augmented Generation (RAG) allows supported AI models to retrieve and ground answers on specific external data sources, in addition to their pretrained knowledge. Azure OpenAI integrates with Azure AI Search to:
- Receive user prompt
- Analyze the content and intent
- Query the AI Search index
- Insert retrieved chunks into the prompt
- Call Azure OpenAI model with augmented prompt
- Return a grounded response with optional citations
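The flow above can be sketched end to end in plain Python. This is a toy illustration only: the keyword-overlap retriever and in-memory index stand in for Azure AI Search, and `retrieve` / `build_augmented_prompt` are hypothetical helpers, not part of any SDK.

```python
def retrieve(query: str, index: list[str], top_k: int = 2) -> list[str]:
    """Rank indexed chunks by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(index,
                    key=lambda chunk: len(terms & set(chunk.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_augmented_prompt(query: str, chunks: list[str]) -> str:
    """Insert the retrieved chunks into the prompt so the model can ground its answer."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

# Toy in-memory "index" standing in for an Azure AI Search index
index = [
    "The Grand Hotel is in Midtown Manhattan near Times Square.",
    "Our return policy allows refunds within 30 days.",
    "The Grand Hotel offers free breakfast for members.",
]
query = "Where is the Grand Hotel?"
prompt = build_augmented_prompt(query, retrieve(query, index))
print(prompt)
```

In the real service, this retrieve-then-augment step happens server-side; the augmented prompt is then sent to the deployed model.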
Fine-tuning vs RAG

| Feature | Fine-tuning | RAG (Azure OpenAI on your data) |
|---|---|---|
| Data training | Required (costly and time-intensive) | Not required |
| Prompt size | Useful when the needed context exceeds the prompt size | Works within the model's token limits |
| Flexibility | Customizes the model permanently | Keeps the model unchanged and flexible |
Add Your Own Data
You can connect your data through:
- Azure AI Studio (Chat playground)
- Azure Blob Storage
- Manual AI Search index connection
Supported File Types
.md, .txt, .html, .pdf, .docx, .pptx
Note: If documents contain images, make sure the text is extractable.
Best Practices
- Use Azure AI Studio for data chunking & index creation
- Use data preparation scripts for large documents
- Enable semantic search to improve results (may increase cost)
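To make the chunking best practice concrete, here is a minimal sketch of fixed-size chunking with overlap. This is only illustrative of the general idea; the actual chunking logic used by Azure AI Studio and the data preparation scripts differs.

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping word-window chunks for indexing.

    Overlap between consecutive chunks helps keep sentences that straddle
    a chunk boundary retrievable from at least one chunk.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# 250 synthetic words -> 3 chunks of up to 100 words with 20-word overlap
doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(doc)
print(len(chunks))
```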
Connect Data in Azure AI Studio
- Go to Chat playground
- Open the Add your data tab
- Click Add a data source
- Follow the wizard to map fields (especially content fields)
Chat with Your Data
You can now chat:
- In Azure AI Studio, or
- Through the API
Token Considerations
- Max system message: ~4000 tokens
- Response limited to 1500 tokens
- Include in token count:
- System message
- Prompt
- History
- Search results
- Response
Tip: Use prompt engineering techniques such as chain-of-thought prompting.
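A rough token-budget check over the components listed above can be sketched as follows. The 4-characters-per-token rule is only a crude approximation (use a real tokenizer in practice), and the 8192-token context window is an assumed example; actual limits depend on the deployed model.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_budget(system: str, prompt: str, history: list[str], search_results: str,
                max_response: int = 1500, limit: int = 8192) -> bool:
    """Check whether all components plus the reserved response fit the context window."""
    used = (estimate_tokens(system) + estimate_tokens(prompt)
            + sum(estimate_tokens(m) for m in history)
            + estimate_tokens(search_results)
            + max_response)  # reserve room for the response
    return used <= limit

ok = fits_budget("You are a helpful assistant.",
                 "Where should I stay in New York?",
                 [], "Retrieved hotel descriptions...")
print(ok)
```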
Using the API with Your Data
Request Example
```json
{
  "dataSources": [
    {
      "type": "AzureCognitiveSearch",
      "parameters": {
        "endpoint": "<your_search_endpoint>",
        "key": "<your_search_key>",
        "indexName": "<your_search_index>"
      }
    }
  ],
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant assisting users with travel recommendations."
    },
    {
      "role": "user",
      "content": "I want to go to New York. Where should I stay?"
    }
  ]
}
```
API Endpoint Format
```
<your_azure_openai_resource>/openai/deployments/<deployment_name>/chat/completions?api-version=<version>
```
Headers Required
- Content-Type: application/json
- api-key: <your_api_key>
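As a sketch, the URL and headers can be assembled like this. The resource and deployment names are made-up placeholders, not real endpoints.

```python
# Illustrative placeholder values only
resource = "https://my-openai.openai.azure.com"   # <your_azure_openai_resource>
deployment = "gpt-4o"                             # <deployment_name>
api_version = "2024-02-15-preview"

url = f"{resource}/openai/deployments/{deployment}/chat/completions?api-version={api_version}"
headers = {
    "Content-Type": "application/json",
    "api-key": "<your_api_key>",  # read real keys from env vars or Key Vault, never hard-code
}
print(url)
```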
Complete Implementation Example
Python SDK Implementation
```python
from openai import AzureOpenAI

# Initialize the Azure OpenAI client
client = AzureOpenAI(
    azure_endpoint="<your_azure_openai_endpoint>",
    api_key="<your_api_key>",
    api_version="2024-02-15-preview"
)

# Define your data source
data_sources = [
    {
        "type": "AzureCognitiveSearch",
        "parameters": {
            "endpoint": "<your_search_endpoint>",
            "key": "<your_search_key>",
            "indexName": "<your_search_index>"
        }
    }
]

# Make the request
response = client.chat.completions.create(
    model="<your_deployment_name>",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What information do you have about our products?"}
    ],
    extra_body={"dataSources": data_sources}
)

print(response.choices[0].message.content)
```
cURL Example
```bash
curl -X POST \
  "<your_azure_openai_endpoint>/openai/deployments/<deployment_name>/chat/completions?api-version=2024-02-15-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: <your_api_key>" \
  -d '{
    "dataSources": [
      {
        "type": "AzureCognitiveSearch",
        "parameters": {
          "endpoint": "<your_search_endpoint>",
          "key": "<your_search_key>",
          "indexName": "<your_search_index>"
        }
      }
    ],
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Tell me about our company policies."
      }
    ]
  }'
```
Best Practices Summary
- Data Preparation: Ensure your documents are well-structured and text-extractable
- Index Configuration: Use Azure AI Studio for optimal chunking and indexing
- Token Management: Monitor token usage to stay within limits
- Semantic Search: Enable for better retrieval quality (consider cost implications)
- Prompt Engineering: Use techniques like chain-of-thought for better responses
- Citation Handling: Leverage the citation features for transparency and trust
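For the citation-handling point, responses from Azure OpenAI on your data typically carry citations in the assistant message's context. The exact response shape varies by API version, so the sketch below parses a hand-written sample payload rather than a live response.

```python
import json

# Hand-written sample payload mimicking the general shape of an
# "on your data" response; field layout varies by API version.
sample = json.loads("""
{
  "choices": [{
    "message": {
      "content": "The Grand Hotel is in Midtown [doc1].",
      "context": {
        "citations": [
          {"title": "hotels.md",
           "content": "The Grand Hotel is in Midtown Manhattan."}
        ]
      }
    }
  }]
}
""")

message = sample["choices"][0]["message"]
citations = message.get("context", {}).get("citations", [])
for i, c in enumerate(citations, start=1):
    # Map each [docN] marker in the answer text to its source document
    print(f"[doc{i}] {c['title']}")
```

Surfacing these titles (and content snippets) next to the answer is what gives users the transparency mentioned above.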
Troubleshooting Common Issues
- Large Documents: Use data preparation scripts for better chunking
- Poor Results: Check field mappings and enable semantic search
- Token Limits: Reduce context or use summarization techniques
- API Errors: Verify endpoint URLs and API keys are correct
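One simple way to act on the "Token Limits: Reduce context" advice is to drop the oldest turns of chat history until the estimated total fits a budget. This is a sketch using the same rough 4-characters-per-token heuristic as before; `trim_history` is a hypothetical helper, not an SDK function.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_history(history: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept, used = [], 0
    for msg in reversed(history):          # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                          # oldest remaining turns are dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))            # restore chronological order

history = [
    {"role": "user", "content": "x" * 400},       # ~100 tokens, oldest
    {"role": "assistant", "content": "y" * 400},  # ~100 tokens
    {"role": "user", "content": "z" * 40},        # ~10 tokens, newest
]
print(len(trim_history(history, budget=120)))
```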