
Understand When to Fine-Tune a Language Model

Problem

You want to improve the quality and consistency of responses generated by a language model in your chat application.
How do you decide whether to use prompt engineering, RAG, or fine-tuning?

Solution with Azure

Use prompt engineering for quick improvements, RAG to ground responses in factual data, and fine-tuning when you need consistent behavior, tone, or formatting that prompt engineering alone cannot guarantee.

Required Components

  • Azure AI Foundry project
  • Prompt Flow
  • Base language model
  • Training dataset (for fine-tuning)
  • Optional: Azure AI Search (for RAG)

Architecture / Development

Optimization Strategies

  1. Prompt Engineering
     • Modify the prompt structure or system message
     • Use one-shot or few-shot examples
     • Fast and easy, but may lack consistency

  2. Retrieval Augmented Generation (RAG)
     • Retrieve relevant context from external data
     • Ground responses in factual, domain-specific information
     • Ideal for knowledge-based applications

  3. Fine-Tuning
     • Train a base model on your own dataset
     • Achieve consistent tone, style, and format
     • Useful when prompt engineering alone is insufficient
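As an illustration of the first strategy, a few-shot prompt can be expressed as a chat messages payload: the system message sets behavior, and worked examples steer the output format. This is a minimal local sketch; the ticket-classification task and all message content are hypothetical, not taken from Azure documentation.

```python
# Minimal sketch of few-shot prompt engineering. The system message sets
# behavior; the example turns show the model the exact output format.
# All content here is hypothetical illustration.
few_shot_messages = [
    {"role": "system", "content": "You classify support tickets as 'billing', "
                                  "'technical', or 'other'. Answer with the label only."},
    # Few-shot examples: demonstrate the expected answer format.
    {"role": "user", "content": "I was charged twice this month."},
    {"role": "assistant", "content": "billing"},
    {"role": "user", "content": "My console won't turn on."},
    {"role": "assistant", "content": "technical"},
    # The real query always comes last.
    {"role": "user", "content": "How do I update my payment method?"},
]

# This list is what you would pass as the `messages` argument of a
# chat completions call.
print([m["role"] for m in few_shot_messages])
```

The trade-off the list above names applies here: adding examples is fast and needs no training, but nothing guarantees the model always answers with a bare label.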

When to Use Fine-Tuning

  • You need consistent output style or tone
  • Prompt engineering fails to enforce structure
  • You want the model to behave in a specific way across all interactions
  • You have a labeled dataset for training

Best Practices / Considerations

  • Start with prompt engineering for quick wins
  • Use RAG when factual accuracy is critical
  • Fine-tune only when necessary due to higher complexity and cost
  • Combine strategies (e.g., RAG + fine-tuned model) for best results

Sample Exam Questions

  1. What is the primary goal of fine-tuning a language model?
    A. To reduce latency
    B. To improve factual accuracy
    C. To enforce consistent behavior and style
    D. To increase token limits

Correct Answer: C

  2. Which technique is best for grounding a model in external data?
    A. Prompt engineering
    B. Fine-tuning
    C. Retrieval Augmented Generation (RAG)
    D. Few-shot learning

Correct Answer: C

  3. What is a limitation of prompt engineering?
    A. It requires labeled training data
    B. It cannot modify system messages
    C. It may not consistently enforce output format
    D. It is not supported in Azure AI Foundry

Correct Answer: C

Prepare your data to fine-tune a chat completion model

Fine-tuning combines a suitable foundation model, used as a base, with a set of training data that includes example prompts and responses the model can learn from.

Diagram of a base model plus training data resulting in a fine-tuned model.

When you decide to fine-tune a language model, you need to identify a dataset you can use for fine-tuning.

As with any machine learning model, the quality of the dataset has a large effect on the quality of your model. Although you need less data than when training a language model from scratch, you still need enough data to make your desired model behavior consistent. How much data you need depends on your use case.

When you fine-tune a language model for chat completion, the data you use to fine-tune a model is a collection of sample conversations. More specifically, the data should contain three components:

  • The system message
  • The user message
  • The assistant's response

The three components come together in a JSON Lines (JSONL) file. For example, one line in such a dataset might look like:
    {"messages": [{"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."}, {"role": "user", "content": "Is Xbox better than PlayStation?"}, {"role": "assistant", "content": "I apologize, but I cannot provide personal opinions. My primary job is to assist you with any issues related to your Xbox device. Do you have any Xbox-related issues that need addressing?"}]}
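Lines like this can be generated programmatically from the three components. The sketch below shows one way to do it with the standard library; the helper name `make_training_line` is ours, not part of any SDK, and the example content is abbreviated from the sample above.

```python
import json

# Sketch: build one JSONL training line from the three components
# (system message, user message, assistant response). The helper name
# is our own invention, not part of any SDK.
def make_training_line(system, user, assistant):
    return json.dumps({"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]})

line = make_training_line(
    "You are an Xbox customer support agent. You are friendly and concise.",
    "Is Xbox better than PlayStation?",
    "I can't provide personal opinions, but I'm happy to help with any Xbox issue.",
)
# Each call yields one self-contained line to append to the .jsonl file.
print(line)
```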
    
The dataset should show the model's ideal behavior. You can create this dataset from the chat history of an existing chat application. A few things to keep in mind when you use real data:

  • Remove any personal or sensitive information.
  • Don't focus only on creating a large training dataset; also ensure your dataset includes a diverse set of examples.
  • You can include multiple turns of a conversation on a single line in the dataset.
  • If you want to fine-tune only on specific assistant messages, you can optionally use the weight key-value pair. When the weight is set to 0, the message is ignored; when you set it to 1, the message is included for training.

An example of a multi-turn chat file format with weights:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris", "weight": 0}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already.", "weight": 1}]}
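Before uploading a dataset it helps to verify each line parses and follows the expected shape. The sketch below is one possible pre-flight check, assuming the rules described above (valid JSON per line, a "messages" list with known roles, and weight values of 0 or 1); the function name and error wording are ours, not from any Azure tool.

```python
import json

# Sketch of a pre-flight check for a chat-completion fine-tuning dataset:
# every line must parse as JSON, contain a non-empty "messages" list with
# valid roles, and any "weight" value must be 0 or 1.
VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(lines):
    errors = []
    for i, raw in enumerate(lines, start=1):
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            errors.append(f"line {i}: not valid JSON")
            continue
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {i}: missing 'messages' list")
            continue
        for m in messages:
            if m.get("role") not in VALID_ROLES:
                errors.append(f"line {i}: bad role {m.get('role')!r}")
            if "weight" in m and m["weight"] not in (0, 1):
                errors.append(f"line {i}: weight must be 0 or 1")
    return errors

sample = [
    '{"messages": [{"role": "system", "content": "Marv is sarcastic."}, '
    '{"role": "user", "content": "Capital of France?"}, '
    '{"role": "assistant", "content": "Paris", "weight": 0}]}',
    '{"messages": [{"role": "narrator", "content": "oops"}]}',
]
print(validate_jsonl(sample))  # → ["line 2: bad role 'narrator'"]
```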

When preparing your dataset to fine-tune a language model, understand your desired model behaviors, create the dataset in JSONL format, and ensure the examples you include are high quality and diverse. A well-prepared dataset increases the chance that the fine-tuned model improves your chat application's performance.

Explore Fine-Tuning Language Models in Azure AI Studio

Problem

You want to customize a language model to better suit your specific task (e.g., chat completion, classification, translation).
How can you fine-tune a foundation model using Azure AI Studio?

Solution with Azure

Use Azure AI Studio to select a base model from the model catalog and fine-tune it on your own dataset.
You can configure the fine-tuning job through the portal and deploy the resulting model for use in your application.

Required Components

  • Azure AI Foundry project
  • Access to Azure AI Studio
  • Foundation model (e.g., GPT-4, Llama-2-7b)
  • Training dataset (and optional validation dataset)
  • Fine-tuning configuration parameters

Architecture / Development

1. Select the Base Model

  • Navigate to the Model Catalog in Azure AI Studio
  • Filter models by fine-tuning task (e.g., chat completion)
  • Consider:
    • Model capabilities (e.g., BERT for short texts)
    • Pretraining data (e.g., GPT-2 trained on internet data)
    • Biases and limitations
    • Language support
  • Review model cards (linked to Hugging Face) for detailed info

2. Configure the Fine-Tuning Job

Steps:

  1. Select the base model
  2. Upload or select training data
  3. (Optional) Upload validation data
  4. Configure advanced options:

Parameter                | Description
------------------------ | -----------------------------------------------------------------
batch_size               | Number of examples per training step. Larger sizes mean less frequent updates and lower variance.
learning_rate_multiplier | Multiplier applied to the base learning rate. Try values between 0.02 and 0.2.
n_epochs                 | Number of full passes through the training data.
seed                     | Controls reproducibility of training results.
  • Submit the job and monitor its status in the portal
  • Review input parameters and validation results after completion
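The advanced options above can be sanity-checked in code before you submit a job. The sketch below assembles them into a payload and enforces the recommended learning_rate_multiplier range from the table; the function name and default values are our own assumptions, not part of any SDK or the portal.

```python
# Sketch: assemble and sanity-check the advanced fine-tuning options.
# The range check mirrors the guidance to try learning_rate_multiplier
# values between 0.02 and 0.2. Function name is ours, not an SDK API.
def build_hyperparameters(batch_size, learning_rate_multiplier, n_epochs, seed=None):
    if not 0.02 <= learning_rate_multiplier <= 0.2:
        raise ValueError("learning_rate_multiplier: try values between 0.02 and 0.2")
    if batch_size < 1 or n_epochs < 1:
        raise ValueError("batch_size and n_epochs must be positive")
    params = {
        "batch_size": batch_size,                              # examples per training step
        "learning_rate_multiplier": learning_rate_multiplier,  # scales the base learning rate
        "n_epochs": n_epochs,                                  # full passes through the data
    }
    if seed is not None:
        params["seed"] = seed                                  # reproducible training runs
    return params

hp = build_hyperparameters(batch_size=8, learning_rate_multiplier=0.1, n_epochs=3, seed=42)
print(hp)
```

With the openai Python SDK, a payload like this is what you would pass as the hyperparameters argument of client.fine_tuning.jobs.create (assuming SDK-based submission; the portal exposes the same options in its advanced settings).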

3. Deploy and Test

  • Deploy the fine-tuned model to an endpoint
  • Test the model’s performance
  • Integrate with your chat application when satisfied

Best Practices / Considerations

  • Ensure the model is available in your region and fits within your quota
  • Use validation data to evaluate performance
  • Start with prompt engineering or RAG before fine-tuning
  • Fine-tune only when consistent behavior or task-specific adaptation is required

Sample Exam Questions

  1. Which parameter controls how many training examples are used per step?
    A. n_epochs
    B. learning_rate_multiplier
    C. batch_size
    D. seed

Correct Answer: C

  2. Where can you find detailed information about a foundation model in Azure AI Studio?
    A. Azure CLI
    B. Model card linked in the catalog
    C. Azure Monitor
    D. Azure Key Vault

Correct Answer: B

  3. What is the purpose of the learning_rate_multiplier?
    A. To increase the number of epochs
    B. To control the size of the training dataset
    C. To scale the base learning rate during fine-tuning
    D. To set the model’s output format

Correct Answer: C