Azure AI Speech Service Implementation Guide
🧩 Problem
You need to implement an application that can recognize spoken commands and respond with synthesized speech output.
💡 Solution with Azure
Use the Azure AI Speech service:
- Speech-to-Text API for recognizing speech
- Text-to-Speech API for generating speech output
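For orientation, the whole pattern fits in a few lines. A minimal Python sketch, assuming the azure-cognitiveservices-speech package is installed and using placeholder key/region values (each step is detailed below):

import azure.cognitiveservices.speech as speech_sdk

speech_config = speech_sdk.SpeechConfig('<YOUR_KEY>', '<YOUR_REGION>')
speech_config.speech_synthesis_voice_name = 'en-US-AriaNeural'

# Listen once on the default microphone
audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
recognizer = speech_sdk.SpeechRecognizer(speech_config, audio_config)
result = recognizer.recognize_once_async().get()

if result.reason == speech_sdk.ResultReason.RecognizedSpeech:
    # Respond to the recognized command with synthesized speech
    synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
    synthesizer.speak_text_async('You said: ' + result.text).get()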
🛠️ Components required
- Azure AI Speech resource (created via Azure Portal)
- Visual Studio Code
- Language SDK:
  - Microsoft.CognitiveServices.Speech NuGet package (C#)
  - azure-cognitiveservices-speech pip package (Python)
- (Optional) Audio input/output libraries:
  - System.Media.SoundPlayer (C#)
  - playsound (Python)
- Microphone (or use audio file as alternative)
GitHub repository: https://github.com/MicrosoftLearning/mslearn-ai-language
🏗️ Architecture / Development
1. 🔧 Provision Azure AI Speech resource
Azure Portal → Azure AI Services → Create Speech Service
Required settings: Subscription, Resource group, Region, Name, Pricing tier
After deployment: get the Key and Region from the resource's Keys and Endpoint page
2. 🧑‍💻 Set up development environment
- Clone repo: mslearn-ai-language
- Open in VS Code → trust the project if prompted
- Use Labfiles/07-speech → choose CSharp or Python → speaking-clock folder
3. 📦 Install SDK
C#:
dotnet add package Microsoft.CognitiveServices.Speech
Python:
pip install azure-cognitiveservices-speech
4. ⚙️ Configure app
Edit the configuration file:
- C#: appsettings.json
- Python: .env
Add key and region
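For example (the setting names here are illustrative; use whatever names your code actually reads):

appsettings.json (C#):
{
  "SpeechKey": "<YOUR_KEY>",
  "SpeechRegion": "<YOUR_REGION>"
}

.env (Python):
SPEECH_KEY=<YOUR_KEY>
SPEECH_REGION=<YOUR_REGION>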
5. 🧠 Initialize SDK
C#:
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
SpeechConfig speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
speechConfig.SpeechSynthesisVoiceName = "en-US-AriaNeural";
Python:
import azure.cognitiveservices.speech as speech_sdk
speech_config = speech_sdk.SpeechConfig(ai_key, ai_region)
speech_config.speech_synthesis_voice_name = "en-US-AriaNeural"
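The ai_key and ai_region values come from the configuration file edited in step 4. A minimal sketch of reading them in Python, assuming the python-dotenv package and the illustrative .env variable names shown above:

import os
from dotenv import load_dotenv

load_dotenv()
ai_key = os.getenv('SPEECH_KEY')        # illustrative variable name
ai_region = os.getenv('SPEECH_REGION')  # illustrative variable name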
6. 🎤 Recognize speech input
Microphone
C#:
using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
Python:
audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
speech_recognizer = speech_sdk.SpeechRecognizer(speech_config, audio_config)
Or from an audio file.
Install a playback library:
C# (the System.Windows.Extensions package provides System.Media.SoundPlayer):
dotnet add package System.Windows.Extensions
Python:
pip install playsound
Add code:
C#:
string audioFile = "time.wav";
SoundPlayer wavPlayer = new SoundPlayer(audioFile);
wavPlayer.Play();
using AudioConfig audioConfig = AudioConfig.FromWavFileInput(audioFile);
Python:
from playsound import playsound
audioFile = os.path.join(os.getcwd(), 'time.wav')  # portable path join
playsound(audioFile)
audio_config = speech_sdk.AudioConfig(filename=audioFile)
7. 📝 Transcribe speech
C#:
SpeechRecognitionResult speech = await speechRecognizer.RecognizeOnceAsync();
if (speech.Reason == ResultReason.RecognizedSpeech) {
command = speech.Text;
Console.WriteLine(command);
}
Python:
speech = speech_recognizer.recognize_once_async().get()
if speech.reason == speech_sdk.ResultReason.RecognizedSpeech:
command = speech.text
print(command)
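Recognition can also fail, so handle the other result reasons; the branches below continue the Python snippet above (the printed diagnostics are illustrative, the NoMatch and Canceled reasons are part of the SDK):

elif speech.reason == speech_sdk.ResultReason.NoMatch:
    # Nothing intelligible was recognized
    print(speech.no_match_details)
elif speech.reason == speech_sdk.ResultReason.Canceled:
    # Canceled, e.g. because of an invalid key or region
    cancellation = speech.cancellation_details
    print(cancellation.reason, cancellation.error_details)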
8. 🗣️ Synthesize speech
C#:
using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(responseText);
Python:
speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
speak = speech_synthesizer.speak_text_async(response_text).get()
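It is worth checking that synthesis completed; for example, in Python:

if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
    # Synthesis failed, e.g. because of an invalid voice name
    print(speak.reason)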
9. 🔄 Use alternative voice
Set a different voice name on the config before creating the synthesizer (en-GB-LibbyNeural, used in the SSML example below, is one option):
C#:
speechConfig.SpeechSynthesisVoiceName = "en-GB-LibbyNeural";
Python:
speech_config.speech_synthesis_voice_name = 'en-GB-LibbyNeural'
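To see which voice names are available, the synthesizer can list them; a small sketch, assuming the Python SDK's get_voices_async method:

voices_result = speech_synthesizer.get_voices_async().get()
for voice in voices_result.voices:
    print(voice.short_name)  # e.g. en-GB-LibbyNeural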
10. 📜 Use SSML
C#:
string responseSsml = $@"
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
<voice name='en-GB-LibbyNeural'>
{responseText}
<break strength='weak'/>
Time to end this lab!
</voice>
</speak>";
SpeechSynthesisResult speak = await speechSynthesizer.SpeakSsmlAsync(responseSsml);
Python:
responseSsml = """\
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
    <voice name='en-GB-LibbyNeural'>
        {}
        <break strength='weak'/>
        Time to end this lab!
    </voice>
</speak>""".format(response_text)
speak = speech_synthesizer.speak_ssml_async(responseSsml).get()
🧠 Best practice / Considerations
- Ensure correct region and key in configuration
- Use neural voices for more natural synthesis
- Use SSML to control prosody, pauses, and emphasis
- Provide fallback for missing microphone by using audio file
- Handle recognition result errors (NoMatch, Canceled)
- Use await/.get() as appropriate for async methods
❓ Exam-style questions
Q1. How do you configure speech recognition using the default microphone in Python?
- 🔘 speech_sdk.SpeechRecognizer(audio_input="mic")
- 🔘 speech_sdk.AudioConfig(use_default_microphone=True) ✅
- 🔘 AudioConfig.FromWavFileInput("mic.wav")
- 🔘 SpeechSynthesizer(speech_config, audio=True)

Q2. What must you do to synthesize speech with a specific prebuilt voice in Azure AI Speech?
- 🔘 Change your subscription tier
- 🔘 Use SSML
- 🔘 Set SpeechSynthesisVoiceName in the config ✅
- 🔘 Install a separate neural voice package

Q3. Where do you find the key and region for your Azure AI Speech resource?
- 🔘 Azure AI Studio
- 🔘 Speech Studio → Deployment tab
- 🔘 Keys and Endpoint page in the Azure Portal ✅
- 🔘 Billing page in Azure