
Understand When to Fine-Tune a Language Model

Problem

You want to improve the quality and consistency of responses generated by a language model in your chat application.
How do you decide whether to use prompt engineering, RAG, or fine-tuning?

Solution with Azure

Use prompt engineering for quick improvements, RAG to ground responses in factual data, and fine-tuning when you need consistent behavior, tone, or formatting that prompt engineering alone cannot guarantee.

Required Components

  • Azure AI Foundry project
  • Prompt Flow
  • Base language model
  • Training dataset (for fine-tuning)
  • Optional: Azure AI Search (for RAG)

Architecture / Development

Optimization Strategies

  1. Prompt Engineering
     • Modify the prompt structure or system message
     • Use one-shot or few-shot examples
     • Fast and easy, but may lack consistency

  2. Retrieval Augmented Generation (RAG)
     • Retrieve relevant context from external data
     • Ground responses in factual, domain-specific information
     • Ideal for knowledge-based applications

  3. Fine-Tuning
     • Train a base model on your own dataset
     • Achieve consistent tone, style, and format
     • Useful when prompt engineering alone is insufficient
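As an illustration of the first strategy, a few-shot prompt can be expressed as a chat messages payload: the system message sets behavior, and worked examples steer the output format. This is a minimal local sketch; the ticket-classification task and all message content are hypothetical, not taken from Azure documentation.

```python
# Minimal sketch of few-shot prompt engineering. The system message sets
# behavior; the example turns show the model the exact output format.
# All content here is hypothetical illustration.
few_shot_messages = [
    {"role": "system", "content": "You classify support tickets as 'billing', "
                                  "'technical', or 'other'. Answer with the label only."},
    # Few-shot examples: demonstrate the expected answer format.
    {"role": "user", "content": "I was charged twice this month."},
    {"role": "assistant", "content": "billing"},
    {"role": "user", "content": "My console won't turn on."},
    {"role": "assistant", "content": "technical"},
    # The real query always comes last.
    {"role": "user", "content": "How do I update my payment method?"},
]

# This list is what you would pass as the `messages` argument of a
# chat completions call.
print([m["role"] for m in few_shot_messages])
```

The trade-off the list above names applies here: adding examples is fast and needs no training, but nothing guarantees the model always answers with a bare label.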

When to Use Fine-Tuning

  • You need consistent output style or tone
  • Prompt engineering fails to enforce structure
  • You want the model to behave in a specific way across all interactions
  • You have a labeled dataset for training

Best Practices / Considerations

  • Start with prompt engineering for quick wins
  • Use RAG when factual accuracy is critical
  • Fine-tune only when necessary due to higher complexity and cost
  • Combine strategies (e.g., RAG + fine-tuned model) for best results

Sample Exam Questions

  1. What is the primary goal of fine-tuning a language model?
    A. To reduce latency
    B. To improve factual accuracy
    C. To enforce consistent behavior and style
    D. To increase token limits

Correct Answer: C

  2. Which technique is best for grounding a model in external data?
    A. Prompt engineering
    B. Fine-tuning
    C. Retrieval Augmented Generation (RAG)
    D. Few-shot learning

Correct Answer: C

  3. What is a limitation of prompt engineering?
    A. It requires labeled training data
    B. It cannot modify system messages
    C. It may not consistently enforce output format
    D. It is not supported in Azure AI Foundry

Correct Answer: C

Prepare your data to fine-tune a chat completion model

Fine-tuning combines a suitable foundation model, used as a base, with a set of training data that includes example prompts and responses the model can learn from.

Diagram of a base model plus training data resulting in a fine-tuned model.

When you decide to fine-tune a language model, you need to identify a dataset you can use for fine-tuning.

As with any machine learning model, the quality of the dataset has a large effect on the quality of your model. Although you need less data than when training a language model from scratch, you still need enough data to make your desired model behavior consistent. How much data you need depends on your use case.

When you fine-tune a language model for chat completion, the data you use to fine-tune a model is a collection of sample conversations. More specifically, the data should contain three components:

  • The system message
  • The user message
  • The assistant's response

The three components come together in a JSON Lines (JSONL) file. For example, one line in such a dataset might look like:
    {"messages": [{"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."}, {"role": "user", "content": "Is Xbox better than PlayStation?"}, {"role": "assistant", "content": "I apologize, but I cannot provide personal opinions. My primary job is to assist you with any issues related to your Xbox device. Do you have any Xbox-related issues that need addressing?"}]}
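Lines like this can be generated programmatically from the three components. The sketch below shows one way to do it with the standard library; the helper name `make_training_line` is ours, not part of any SDK, and the example content is abbreviated from the sample above.

```python
import json

# Sketch: build one JSONL training line from the three components
# (system message, user message, assistant response). The helper name
# is our own invention, not part of any SDK.
def make_training_line(system, user, assistant):
    return json.dumps({"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]})

line = make_training_line(
    "You are an Xbox customer support agent. You are friendly and concise.",
    "Is Xbox better than PlayStation?",
    "I can't provide personal opinions, but I'm happy to help with any Xbox issue.",
)
# Each call yields one self-contained line to append to the .jsonl file.
print(line)
```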
    
The dataset should show the model's ideal behavior. You can create this dataset from the chat history of an existing chat application. A few things to keep in mind when you use real data:

  • Remove any personal or sensitive information.
  • Don't focus only on creating a large training dataset; also ensure your dataset includes a diverse set of examples.
  • You can include multiple turns of a conversation on a single line in the dataset.
  • If you want to fine-tune only on specific assistant messages, you can optionally use the weight key-value pair. When the weight is set to 0, the message is ignored; when you set it to 1, the message is included for training.

An example of a multi-turn chat file format with weights:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris", "weight": 0}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already.", "weight": 1}]}
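Before uploading a dataset it helps to verify each line parses and follows the expected shape. The sketch below is one possible pre-flight check, assuming the rules described above (valid JSON per line, a "messages" list with known roles, and weight values of 0 or 1); the function name and error wording are ours, not from any Azure tool.

```python
import json

# Sketch of a pre-flight check for a chat-completion fine-tuning dataset:
# every line must parse as JSON, contain a non-empty "messages" list with
# valid roles, and any "weight" value must be 0 or 1.
VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(lines):
    errors = []
    for i, raw in enumerate(lines, start=1):
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            errors.append(f"line {i}: not valid JSON")
            continue
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {i}: missing 'messages' list")
            continue
        for m in messages:
            if m.get("role") not in VALID_ROLES:
                errors.append(f"line {i}: bad role {m.get('role')!r}")
            if "weight" in m and m["weight"] not in (0, 1):
                errors.append(f"line {i}: weight must be 0 or 1")
    return errors

sample = [
    '{"messages": [{"role": "system", "content": "Marv is sarcastic."}, '
    '{"role": "user", "content": "Capital of France?"}, '
    '{"role": "assistant", "content": "Paris", "weight": 0}]}',
    '{"messages": [{"role": "narrator", "content": "oops"}]}',
]
print(validate_jsonl(sample))  # → ["line 2: bad role 'narrator'"]
```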

When preparing your dataset to fine-tune a language model, understand your desired model behaviors, create the dataset in JSONL format, and ensure the examples you include are high quality and diverse. A well-prepared dataset increases the chance that the fine-tuned model improves your chat application's performance.

Explore Fine-Tuning Language Models in Azure AI Studio

Problem

You want to customize a language model to better suit your specific task (e.g., chat completion, classification, translation).
How can you fine-tune a foundation model using Azure AI Studio?

Solution with Azure

Use Azure AI Studio to select a base model from the model catalog and fine-tune it on your own dataset.
You can configure the fine-tuning job through the portal and deploy the resulting model for use in your application.

Required Components

  • Azure AI Foundry project
  • Access to Azure AI Studio
  • Foundation model (e.g., GPT-4, Llama-2-7b)
  • Training dataset (and optional validation dataset)
  • Fine-tuning configuration parameters

Architecture / Development

1. Select the Base Model

  • Navigate to the Model Catalog in Azure AI Studio
  • Filter models by fine-tuning task (e.g., chat completion)
  • Consider:
    • Model capabilities (e.g., BERT for short texts)
    • Pretraining data (e.g., GPT-2 trained on internet data)
    • Biases and limitations
    • Language support
  • Review model cards (linked to Hugging Face) for detailed info

2. Configure the Fine-Tuning Job

Steps:

  1. Select the base model
  2. Upload or select training data
  3. (Optional) Upload validation data
  4. Configure advanced options:

Parameter                | Description
------------------------ | -----------------------------------------------------------------
batch_size               | Number of examples per training step. Larger sizes mean less frequent updates and lower variance.
learning_rate_multiplier | Multiplier applied to the base learning rate. Try values between 0.02 and 0.2.
n_epochs                 | Number of full passes through the training data.
seed                     | Controls reproducibility of training results.
  • Submit the job and monitor its status in the portal
  • Review input parameters and validation results after completion
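The advanced options above can be sanity-checked in code before you submit a job. The sketch below assembles them into a payload and enforces the recommended learning_rate_multiplier range from the table; the function name and default values are our own assumptions, not part of any SDK or the portal.

```python
# Sketch: assemble and sanity-check the advanced fine-tuning options.
# The range check mirrors the guidance to try learning_rate_multiplier
# values between 0.02 and 0.2. Function name is ours, not an SDK API.
def build_hyperparameters(batch_size, learning_rate_multiplier, n_epochs, seed=None):
    if not 0.02 <= learning_rate_multiplier <= 0.2:
        raise ValueError("learning_rate_multiplier: try values between 0.02 and 0.2")
    if batch_size < 1 or n_epochs < 1:
        raise ValueError("batch_size and n_epochs must be positive")
    params = {
        "batch_size": batch_size,                              # examples per training step
        "learning_rate_multiplier": learning_rate_multiplier,  # scales the base learning rate
        "n_epochs": n_epochs,                                  # full passes through the data
    }
    if seed is not None:
        params["seed"] = seed                                  # reproducible training runs
    return params

hp = build_hyperparameters(batch_size=8, learning_rate_multiplier=0.1, n_epochs=3, seed=42)
print(hp)
```

With the openai Python SDK, a payload like this is what you would pass as the hyperparameters argument of client.fine_tuning.jobs.create (assuming SDK-based submission; the portal exposes the same options in its advanced settings).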

3. Deploy and Test

  • Deploy the fine-tuned model to an endpoint
  • Test the model’s performance
  • Integrate with your chat application when satisfied

Best Practices / Considerations

  • Ensure the model is available in your region and fits within your quota
  • Use validation data to evaluate performance
  • Start with prompt engineering or RAG before fine-tuning
  • Fine-tune only when consistent behavior or task-specific adaptation is required

Sample Exam Questions

  1. Which parameter controls how many training examples are used per step?
    A. n_epochs
    B. learning_rate_multiplier
    C. batch_size
    D. seed

Correct Answer: C

  2. Where can you find detailed information about a foundation model in Azure AI Studio?
    A. Azure CLI
    B. Model card linked in the catalog
    C. Azure Monitor
    D. Azure Key Vault

Correct Answer: B

  3. What is the purpose of the learning_rate_multiplier?
    A. To increase the number of epochs
    B. To control the size of the training dataset
    C. To scale the base learning rate during fine-tuning
    D. To set the model’s output format

Correct Answer: C