Speech Recognition API – iOS 10

In this blog post we are going to look at the Speech Recognition API introduced in iOS 10 and how to incorporate it into an app. The API analyses speech and converts it into text, which makes it useful in apps where the user would otherwise need to type a lot. Apps that let the user write notes, send emails or text messages, and search can use it to save the user a huge amount of time. Popular apps such as Dictionary.com use speech recognition to search for words in their database, while apps like Vlingo use it for search, messaging, voice dialling and directions. Social media platforms also use it as a major tool.

The API accepts two types of input:

  1. Live audio spoken by the user (for example, the user says “Cheese” in a camera app to take a picture).
  2. Prerecorded audio files (for example, a song is given as input and its lyrics are displayed).

This allows an app to work entirely from the user’s voice input, as in voice navigation (Maps) or voice purchasing (online shopping). The conversion is done by the SFSpeechRecognizer class from the Speech framework: the audio input is handed to an SFSpeechRecognizer, which returns the transcribed text. Apple has stated that more than 50 languages and dialects are supported, but for some of them an internet connection is needed because recognition may be performed on Apple’s servers.
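For instance, a recognizer is created per locale, and its availability can be checked before starting a task. A minimal sketch (the locale identifier here is just an example):

```swift
import Speech

// Create a recognizer for a specific locale and check availability.
if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")) {
    if recognizer.isAvailable {
        // Safe to start a recognition task
    } else {
        // A recognizer exists for this locale but is currently unavailable
        // (for example, no network connection for server-based recognition)
    }
} else {
    // Speech recognition is not supported for this locale at all
}
```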

Another good thing is the usage of NSLinguisticTagger. It automatically recognises English text and, although it supports many languages, it is widely used to classify words into verbs, nouns and adjectives, so that we can concentrate only on the specific words we need.
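As a rough sketch of how NSLinguisticTagger can pick out parts of speech (the sample sentence is arbitrary, and the exact API spelling has varied between Swift versions):

```swift
import Foundation

let text = "Apple introduced a powerful speech framework"
let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
tagger.string = text

let range = NSRange(location: 0, length: text.utf16.count)
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]

// Enumerate each word and print its lexical class (noun, verb, adjective, ...)
tagger.enumerateTags(in: range, scheme: .lexicalClass, options: options) { tag, tokenRange, _, _ in
    if let tag = tag {
        let word = (text as NSString).substring(with: tokenRange)
        print("\(word): \(tag.rawValue)")
    }
}
```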

Let us now delve into the process of Speech Recognition.

Process of Speech Recognition: 

  1. Explain why we need speech recognition by adding the NSSpeechRecognitionUsageDescription key to the app’s Info.plist file (for live audio, the NSMicrophoneUsageDescription key is also required).
  2. Request authorisation from the user using SFSpeechRecognizer.requestAuthorization.

//Sample code:

import Speech

SFSpeechRecognizer.requestAuthorization { authStatus in
    OperationQueue.main.addOperation {
        switch authStatus {
        case .authorized:
            // Go further with the process
            break
        case .denied:
            // The user declined speech recognition
            break
        case .restricted:
            // Speech recognition is restricted on this device
            break
        case .notDetermined:
            // The user has not yet been asked for authorisation
            break
        }
    }
}

3. Create a recognition request using SFSpeechURLRecognitionRequest for prerecorded audio and SFSpeechAudioBufferRecognitionRequest for live audio.

//Sample code:

// Use one of the two, depending on the audio source:
let recognitionRequest = SFSpeechAudioBufferRecognitionRequest() // Live audio
// or
let recognitionRequest = SFSpeechURLRecognitionRequest(url: path) // Prerecorded audio, where path is the file URL
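For live audio, the buffer request needs to be fed from the microphone. A minimal sketch using AVAudioEngine (the buffer size and partial-results setting are illustrative choices, and in some SDK versions inputNode is optional):

```swift
import AVFoundation
import Speech

let audioEngine = AVAudioEngine()
let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
recognitionRequest.shouldReportPartialResults = true // receive interim transcriptions

// Tap the microphone input and append each audio buffer to the request
let inputNode = audioEngine.inputNode
let format = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
    recognitionRequest.append(buffer)
}

audioEngine.prepare()
try audioEngine.start()
```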

4. Give the created request to the recogniser via recognitionTask(with:resultHandler:), which returns an SFSpeechRecognitionTask and delivers the result as text.

//Sample code:

recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    if let result = result {
        print(result.bestTranscription.formattedString) // Transcribed text
    }
    if let error = error {
        print(error.localizedDescription) // Recognition failed
    }
}

Rajtharan G,
iOS Junior Developer,
Mallow Technologies.
