Understand How to Ground Your Language Model
Problem
You want to ensure that your LLM-based agent provides accurate, domain-specific, and factual responses.
How can you ground the language model on your own data to improve reliability?
Solution with Azure
Use Retrieval Augmented Generation (RAG) to ground your language model.
RAG retrieves relevant data based on the user's prompt, augments the prompt with that data, and then uses the LLM to generate a grounded response.
Required Components
- Azure AI Foundry project
- Grounding data (e.g., documents, files, datasets)
- Supported data sources:
- Azure Blob Storage
- Azure Data Lake Storage Gen2
- Microsoft OneLake
- Uploaded files/folders
- RAG pattern implementation
Architecture / Development
RAG Pattern Steps
- Retrieve: Search for relevant grounding data based on the user's prompt
- Augment: Add the retrieved data to the prompt
- Generate: Use the LLM to produce a grounded response
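To make these three steps concrete, here is a minimal sketch of the pattern in Python. It is illustrative only: search_client and chat_client stand in for whatever retrieval and chat clients your application uses, and a full Azure-specific example appears later in this guide.

def answer_with_rag(user_prompt, search_client, chat_client, model):
    # Retrieve: find grounding data that is relevant to the user's prompt
    results = search_client.search(search_text=user_prompt, top=3)
    context = "\n".join(doc["content"] for doc in results)

    # Augment: add the retrieved data to the prompt
    messages = [
        {"role": "system", "content": "Answer using only this context:\n" + context},
        {"role": "user", "content": user_prompt},
    ]

    # Generate: use the LLM to produce a grounded response
    response = chat_client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content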
Adding Grounding Data in Azure AI Foundry
- Connect to supported storage services (Blob, Data Lake, OneLake)
- Upload files or folders directly to the project's storage
- Use the connected data as the source for RAG-based flows
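To illustrate the upload option, the sketch below copies a local folder of documents into an Azure Blob Storage container that the project can then connect to. It assumes the azure-storage-blob package, an existing container, and placeholder values for the connection string, container name, and folder path.

import os
from azure.storage.blob import BlobServiceClient

# Placeholder configuration - replace with your own values
connection_string = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
container_name = "grounding-data"
local_folder = "./docs"

blob_service = BlobServiceClient.from_connection_string(connection_string)
container = blob_service.get_container_client(container_name)

# Upload every file in the local folder as a blob
for file_name in os.listdir(local_folder):
    file_path = os.path.join(local_folder, file_name)
    if os.path.isfile(file_path):
        with open(file_path, "rb") as data:
            container.upload_blob(name=file_name, data=data, overwrite=True)
        print(f"Uploaded {file_name}")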
Best Practices / Considerations
- Ensure grounding data is accurate, up-to-date, and relevant
- Use RAG when the LLM lacks domain-specific knowledge
- Monitor the quality of grounded responses using evaluation metrics
- Protect sensitive data when uploading or connecting sources
Sample Exam Questions
- What is the purpose of Retrieval Augmented Generation (RAG)?
A. To fine-tune a language model
B. To generate synthetic training data
C. To ground LLM responses using external data
D. To monitor LLM performance
Correct Answer: C
- Which of the following is NOT a supported data source for grounding in Azure AI Foundry?
A. Azure Blob Storage
B. Microsoft OneLake
C. Google Drive
D. Azure Data Lake Storage Gen2
Correct Answer: C
- What are the three main steps in the RAG pattern?
A. Train, Test, Deploy
B. Retrieve, Augment, Generate
C. Input, Process, Output
D. Connect, Monitor, Evaluate
Correct Answer: B
Make Your Data Searchable with Azure AI Search
Problem
You want your LLM-based agent to retrieve accurate and relevant information from your own data.
How can you make your data searchable and integrate it into your chat flow?
Solution with Azure
Use Azure AI Search integrated with Azure AI Foundry to index your data and enable efficient retrieval using keyword, semantic, vector, or hybrid search.
This allows your LLM application to retrieve relevant context and generate grounded responses.
Required Components
- Azure AI Foundry project
- Azure AI Search
- Data source (e.g., documents, files)
- Embedding model (e.g., Azure OpenAI)
- Search index (text-based or vector-based)
Architecture / Development
Creating a Search Index
- Upload or connect your data to Azure AI Foundry
- Use Azure AI Search to create an index
- Choose the indexing method:
- Keyword search: Exact term matching
- Semantic search: Meaning-based retrieval
- Vector search: Embedding-based similarity
- Hybrid search: Combines keyword + vector + optional semantic ranking
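Indexes are typically created through the Azure AI Foundry portal, but they can also be defined in code. The following hedged sketch uses the azure-search-documents package (version 11.4 or later) to create an index with a searchable text field and a vector field; the index name, field names, and the 1,536 dimensions (typical of a text-embedding-ada-002 deployment) are assumptions to adjust for your own setup.

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchableField, SearchField, SearchFieldDataType,
    VectorSearch, VectorSearchProfile, HnswAlgorithmConfiguration,
)

# search_url and search_key are placeholders for your service endpoint and admin key
index_client = SearchIndexClient(endpoint=search_url, credential=AzureKeyCredential(search_key))

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_profile_name="vector-profile",
    ),
]

vector_search = VectorSearch(
    algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
    profiles=[VectorSearchProfile(name="vector-profile", algorithm_configuration_name="hnsw-config")],
)

index = SearchIndex(name="grounding-index", fields=fields, vector_search=vector_search)
index_client.create_or_update_index(index)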
Using Vector Indexes
- Embeddings: Represent text as vectors of floating-point numbers
- Cosine similarity: Measures semantic similarity between query and documents
- Embedding model: Use Azure OpenAI to generate embeddings during indexing
- Benefits:
- Retrieve semantically similar content
- Support multilingual and multimodal data
- Improve relevance in generative AI applications
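To make embeddings and cosine similarity concrete, the sketch below generates embedding vectors with an Azure OpenAI embedding deployment and compares two of them. The endpoint, key, and deployment name are placeholders, and in a real application the similarity computation is handled by Azure AI Search rather than by your own code.

import math
from openai import AzureOpenAI

# open_ai_endpoint and open_ai_key are placeholders for your Azure OpenAI details
embedding_client = AzureOpenAI(
    api_version="2024-12-01-preview",
    azure_endpoint=open_ai_endpoint,
    api_key=open_ai_key
)

def embed(text):
    # Return the embedding vector (a list of floats) for the given text
    result = embedding_client.embeddings.create(
        model="<embedding_model_deployment_name>",
        input=text
    )
    return result.data[0].embedding

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vector = embed("How do I reset my password?")
doc_vector = embed("Steps to change or recover your account password.")
print(cosine_similarity(query_vector, doc_vector))  # values closer to 1.0 mean more similar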
Querying the Index
- Keyword search: Matches exact terms
- Semantic search: Matches meaning
- Vector search: Matches vector similarity
- Hybrid search: Combines all for best accuracy and flexibility
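As an illustration of querying, the sketch below issues a hybrid query directly against an index with the azure-search-documents package, combining keyword matching with vector similarity in one request. It assumes the grounding-index and content_vector names from the earlier sketches and reuses the embed helper shown above, so treat these as placeholders.

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# search_url and search_key are placeholders for your service endpoint and key
search_client = SearchClient(
    endpoint=search_url,
    index_name="grounding-index",
    credential=AzureKeyCredential(search_key)
)

question = "How do I reset my password?"
vector_query = VectorizedQuery(
    vector=embed(question),       # embedding of the query text
    k_nearest_neighbors=5,
    fields="content_vector"
)

# Hybrid search: keyword matching and vector similarity in a single request
results = search_client.search(
    search_text=question,
    vector_queries=[vector_query],
    select=["content"],
    top=5
)

for doc in results:
    print(doc["content"])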
Best Practices / Considerations
- Use vector search for semantic-rich applications
- Use hybrid search for optimal balance of precision and relevance
- Choose the right embedding model for your data type
- Regularly update your index as data changes
Sample Exam Questions
- What is the purpose of using vector embeddings in Azure AI Search?
A. To compress data
B. To store documents
C. To enable semantic similarity search
D. To encrypt search queries
Correct Answer: C
- Which search method combines keyword, vector, and semantic ranking?
A. Semantic search
B. Hybrid search
C. Full-text search
D. Indexed search
Correct Answer: B
- What is cosine similarity used for in vector search?
A. To sort documents alphabetically
B. To measure the angle between keyword matches
C. To calculate semantic similarity between vectors
D. To encrypt vector data
Correct Answer: C
Create a RAG-Based Client Application
When you've created an Azure AI Search index for your contextual data, you can use it with an OpenAI model. To ground prompts with data from your index, the Azure OpenAI SDK supports extending the request with connection details for the index. When working with an Azure AI Foundry project, the pattern is:
1. Use an Azure AI Foundry project client to retrieve connection details for the Azure AI Search index and an OpenAI ChatClient object.
2. Add the index connection information to the ChatClient configuration so that the index can be searched for grounding data based on the user prompt.
3. Submit the grounded prompt to the Azure OpenAI model to generate a contextualized response.
The following code example shows how to implement this pattern.
from openai import AzureOpenAI

# open_ai_endpoint, open_ai_key, search_url, and search_key are assumed to be
# defined earlier in the application (for example, loaded from environment variables)

# Get an Azure OpenAI chat client
chat_client = AzureOpenAI(
    api_version="2024-12-01-preview",
    azure_endpoint=open_ai_endpoint,
    api_key=open_ai_key
)

# Initialize prompt with system message
prompt = [
    {"role": "system", "content": "You are a helpful AI assistant."}
]

# Add a user input message to the prompt
input_text = input("Enter a question: ")
prompt.append({"role": "user", "content": input_text})

# Additional parameters to apply the RAG pattern using the AI Search index
rag_params = {
    "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
                "endpoint": search_url,
                "index_name": "index_name",
                "authentication": {
                    "type": "api_key",
                    "key": search_key,
                }
            }
        }
    ],
}

# Submit the prompt with the index information
response = chat_client.chat.completions.create(
    model="<model_deployment_name>",
    messages=prompt,
    extra_body=rag_params
)

# Print the contextualized response
completion = response.choices[0].message.content
print(completion)
To use a vector-based query, modify the Azure AI Search data source details to include an embedding model, which is then used to vectorize the query text.
rag_params = {
    "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
                "endpoint": search_url,
                "index_name": "index_name",
                "authentication": {
                    "type": "api_key",
                    "key": search_key,
                },
                # Params for a vector-based query
                "query_type": "vector",
                "embedding_dependency": {
                    "type": "deployment_name",
                    "deployment_name": "<embedding_model_deployment_name>",
                },
            }
        }
    ],
}
Implement RAG in a Prompt Flow
Problem
You want to build a generative AI application that uses your own data to generate accurate, grounded responses.
How can you implement the Retrieval Augmented Generation (RAG) pattern in a Prompt Flow using Azure AI Foundry?
Solution with Azure
Use Prompt Flow to define a flow that implements the RAG pattern by retrieving relevant data from an index and using it to augment prompts sent to a language model (LLM).
Required Components
- Azure AI Foundry project
- Uploaded data and created search index (via Azure AI Search)
- Prompt Flow with:
- LLM tool
- Index Lookup tool
- Python tool (optional for formatting)
- Prompt variants
- Sample: Multi-round Q&A on your data
Architecture / Development
Flow Structure
- Modify Query with History
  - Use an LLM node to combine the user's latest question with the chat history
  - Output: a contextualized query
- Look Up Relevant Information
  - Use the Index Lookup tool to query the search index
  - Retrieve the top N relevant documents
- Generate Prompt Context
  - Use a Python node to iterate over the retrieved documents and combine their content and sources into a single string (a sketch of such a node follows this list)
  - Output: a formatted context string
- Define Prompt Variants
  - Use variants to test different system messages or prompt structures
  - Goal: improve groundedness and factual accuracy
- Chat with Context
  - Use an LLM node to generate a response using the augmented prompt
  - Output: the final response to the user
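As a concrete illustration of the Generate Prompt Context step, a Python node in Prompt Flow is simply a function decorated as a tool. The sketch below assumes the Index Lookup tool returns a list of document dictionaries with content and metadata fields; adjust the field names to match your own index schema.

from promptflow import tool

@tool
def generate_prompt_context(search_results: list) -> str:
    # Combine the retrieved documents into a single context string,
    # keeping each document's source so the LLM can cite it
    entries = []
    for doc in search_results:
        content = doc.get("content", "")
        source = doc.get("metadata", {}).get("source", "unknown")
        entries.append(f"Content: {content}\nSource: {source}")
    return "\n\n".join(entries)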
Best Practices / Considerations
- Use the Multi-round Q&A sample as a starting point
- Ensure your index is optimized for vector or hybrid search
- Use prompt variants to experiment with different grounding strategies
- Format retrieved context clearly to improve LLM comprehension
- Deploy the flow to an endpoint for real-time use
Sample Exam Questions
- Which tool is essential for retrieving data from an index in a RAG-based flow?
A. Python tool
B. Prompt tool
C. Index Lookup tool
D. Variant tool
Correct Answer: C
- What is the purpose of the Python node in a RAG flow?
A. To generate embeddings
B. To format retrieved documents into a prompt context
C. To deploy the flow
D. To store chat history
Correct Answer: B
- What does a prompt variant allow you to do?
A. Change the LLM model
B. Create multiple versions of a prompt for comparison
C. Switch between different flows
D. Encrypt user input
Correct Answer: B