
This post deals with how Apple lets its developers integrate Machine Learning into their apps, be it on iOS, watchOS, macOS, or tvOS. Apple has introduced a few frameworks for this, namely Vision, NLP, and CoreML. Apple intends developers to use these frameworks without having to worry much about how they work internally. In other words, developers need not be masters of Machine Learning to integrate it; instead, they should concentrate on the user experience they want to achieve in the app.
Basically, machine learning is about making machines – in our case, apps – able to perform activities in real time without being explicitly programmed for them, using their own “brains”. It consists of making the machine learn and then use the learned properties and behaviours in real time. What does this mean for apps? It brings in a variety of uses and new features – from predicting the next word on the keyboard to analysing a real-time video and the objects present in it, as shown in the WWDC session on Machine Learning. A few more examples –
- In Photos – people recognition, scene recognition
- In the keyboard – next-word prediction, smart responses
- On the Apple Watch – smart responses, handwriting recognition
- Image recognition in real time
- Creating new content
The process of making a machine learn involves two steps – training and inference.
Training involves running a learning algorithm, noting the possible behaviours, and gathering all of them into one entity. For Machine Learning in iOS, Apple has introduced the concept of a model – MLModel. A model is the collection of all the behaviours, properties, features, and whatever else you can think of, in a single entity.
Models play the pivotal role here – Machine Learning in iOS cannot happen without them. A model can be thought of as the result of training a machine (or a set of code), and it contains all the functions that can be used in the next step – inference.
Inference involves passing an object as an input to the model and getting back a result, which is then used in our apps in whatever way we want.
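To make this concrete, here is a minimal sketch of what inference looks like in code. The model name SentimentClassifier and its input/output names are hypothetical – Xcode generates the actual class and property names from whichever .mlmodel file you add to your target.

import CoreML

// Hypothetical model class – Xcode generates one like this from the .mlmodel you include.
let sentimentModel = SentimentClassifier()
if let output = try? sentimentModel.prediction(text: "Core ML makes this easy") {
    print(output.label) // the inferred result, used in the app however we want
}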
This process of training and inference is empirical in nature, i.e., it is based on observation and inference rather than a theoretical representation.
As already said, Apple insists that developers concentrate more on inference than on training, because constructing models can be a huge task and there are a lot of teams already contributing models. In most cases we can get a model of the type we want; if we can’t, we can build our own.
As described in the session videos, Machine Learning on Apple platforms follows a layered approach. The top layer is the app, where the user experiences the machine learning features.
The app accesses the Vision, NLP, and GameplayKit frameworks, which form the second layer – the domain-specific frameworks. Vision is used for processing images and video – face detection, object tracking, etc. NLP deals with text – processing it, identifying languages, and so on. Most of the basic machine learning features can be accessed using these domain-specific frameworks.
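As a taste of what this layer offers on the NLP side, here is a minimal sketch of identifying the language of a piece of text with NSLinguisticTagger (the sample string is just for illustration):

import Foundation

// Identify the dominant language of some text using the NLP APIs (iOS 11 and later).
let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
tagger.string = "Bonjour tout le monde"
if let language = tagger.dominantLanguage {
    print("Detected language: \(language)") // prints "fr" for French
}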
There are, however, certain features which do not fall into either of these categories – those are achieved using the third layer – CoreML. This is also a new framework added by Apple, and it is the base machine learning framework, covering both deep learning and standard learning models. It takes input in the form of numbers, text, images, etc. and can be used for tasks like captioning an image or describing a live video.
These three frameworks work by taking in the input, processing it using the available models, and producing the result. While Vision and NLP use models provided by Apple, CoreML can use any model.
All these frameworks are built on top of the Accelerate and Metal Performance Shaders (MPS) frameworks, which form the fourth and final layer. These can be used for math operations and also to create custom models. As Apple insists, we need not concentrate on them – the frameworks above will do most of the work.
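Just to illustrate the kind of low-level work this bottom layer handles, here is a tiny sketch of a dot product computed with vDSP from Accelerate – the sort of vectorised math the higher layers rely on behind the scenes:

import Accelerate

// A dot product computed with vDSP; the numbers are arbitrary sample data.
let a: [Float] = [1, 2, 3, 4]
let b: [Float] = [5, 6, 7, 8]
var dotProduct: Float = 0
vDSP_dotpr(a, 1, b, 1, &dotProduct, vDSP_Length(a.count))
print(dotProduct) // 70.0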
Speaking of models, there are a variety of types – text prediction, sentiment analysis, music tagging, handwriting recognition, style transfer, scene classification, translation, etc. MLModel supports tree ensembles, feed-forward neural networks, recurrent neural networks, convolutional neural networks, generalised linear models, and support vector machines. Where do we get the models? Apple has provided four models on its site – https://developer.apple.com/machine-learning/ – and more are to come. To create our own custom models, Apple has introduced CoreML Tools:
- It is a Python package.
- It takes models in other formats and converts them to MLModels.
- It has three layers – converters, Core ML bindings & converter library, and the Core ML specification.
- Converters – convert models from other formats into the form which CoreML accepts.
- Core ML bindings – used to get predictions and results from the Python package for the converted models.
- Converter library – a high-level API used for building converters.
- Core ML specification – used for writing new models of our own.
Now, enough of the theory. Let’s get started on how to achieve a basic machine learning feature in an app.
Our aim here is to provide an image and get a description of it using the model provided by Apple – Inceptionv3.
Achieving this is super simple and involves just two steps:
- Include the model you want and add it to the app’s target.
- Code
You may consider creating a new model as a step prior to including it in the app. When the model is dropped into the Xcode project and added to the app’s target, Xcode detects it as an MLModel and generates a class for it. And then, we can start coding!
We are going to use Inceptionv3.mlmodel, which is available on the website mentioned above. Once the model is included in the app and added to the target, it looks like this:
The app contains an imageView, a label describing the image, and a button to choose an image.
What we are aiming for is achieved by the following code:
let model = Inceptionv3()
if let prediction = try? model.prediction(image: pixelBuffer) { // make sure the image has been converted to a CVPixelBuffer
    descriptionLabel.text = prediction.classLabel
} else {
    descriptionLabel.text = "Oops. Error in processing!!"
}
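For reference, converting a UIImage into the CVPixelBuffer the model expects can look roughly like the sketch below. The 299×299 size and the 32ARGB pixel format are assumptions based on Inceptionv3’s input description, and the helper name pixelBuffer(from:) is our own – treat it as a starting point rather than production-ready code.

import UIKit
import CoreVideo

// Rough sketch: draw the UIImage into a freshly created CVPixelBuffer of the size the model expects.
func pixelBuffer(from image: UIImage, size: Int = 299) -> CVPixelBuffer? {
    let attrs: [CFString: Any] = [
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true
    ]
    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, size, size,
                                     kCVPixelFormatType_32ARGB, attrs as CFDictionary, &buffer)
    guard status == kCVReturnSuccess, let pixelBuffer = buffer else { return nil }

    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    guard let cgImage = image.cgImage,
          let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                                  width: size, height: size,
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else { return nil }

    // Scale the image to fill the buffer; a real app might preserve the aspect ratio instead.
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: size, height: size))
    return pixelBuffer
}

With a helper like this, the call above becomes model.prediction(image: pixelBuffer(from: chosenImage)!).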
If you feel it is a hassle to convert a UIImage to a CVPixelBuffer (as in the sketch above) and then do the above step, there is another approach.
First, we need to import the required frameworks:
import CoreML
import Vision
Then the following needs to be done:
- Get the model via VNCoreMLModel
- Create a request using the obtained model
- In the request’s completion handler, use the response to obtain the results as [VNClassificationObservation]
- The first element of this result array gives the top result – its identifier is the description, and its confidence is the probability of the description matching the image
if let model = try? VNCoreMLModel(for: Inceptionv3().model) { // Get the model
    let request = VNCoreMLRequest(model: model) { [weak self] response, error in // Create a request using the model
        if let results = response.results as? [VNClassificationObservation],
           let topResult = results.first { // Using the response, get the top result
            DispatchQueue.main.async { [weak self] in
                self?.descriptionLabel.text = "\(Int(topResult.confidence * 100))% it's \(topResult.identifier)" // Update the label on the main queue
            }
        }
    }

    // The following performs the request on a background queue
    let handler = VNImageRequestHandler(ciImage: CIImage(image: imageView.image!)!)
    DispatchQueue.global(qos: .userInteractive).async {
        do {
            try handler.perform([request])
        } catch {
            print(error)
        }
    }
}
And then, run!
On giving it a few of the default images present in the simulator and a few other common images, we get the following output. Have a look at it and enjoy! You can also download the sample project from here.
–
Sriram K,
Junior iOS Developer,
Mallow Technologies.