Powering an iOS App with ML: How to Get Started Using Create ML and Core ML

Published in Coder stories

March 14, 2019

14 min.

Vincent Pradeilles

iOS developer @ equensWorldline

All the resources mentioned in this article can be downloaded from this GitHub repository.

Machine learning is currently one of the hottest topics in the tech industry. Being able to automate tasks that previously relied on human operators opens the door to many innovative and exciting possibilities.

Because machine learning algorithms require a fair amount of computing power, it used to be that they could only be implemented on powerful back ends. But as handheld devices became embedded with increasingly powerful central processing units (CPUs), implementing machine learning on them started becoming a viable option. Smartphone manufacturers acted on this opportunity. They started regularly releasing tools that enable developers to integrate machine learning models within their apps. And, as it turns out, being able to run a model directly on a smartphone has some really interesting potential: A user’s private data doesn’t need to leave their device and network-related latency is not a concern. This opens up a host of new use cases, such as the real-time analysis of a video feed.

In 2017, Apple released Core ML, a framework that facilitates the integration of machine learning models into iOS apps. However, iOS developers found that, when they wanted to fully experiment with Core ML, there was an obstacle: To train a model, they had to use unfamiliar tools and languages such as Python. But in 2018, things changed: Apple released Create ML. The purpose of Create ML is very simple: It allows iOS developers to train simple models, using the same tools and language they use to build apps. The only requirement is that they install macOS Mojave and Xcode 10.

In this article we will show you the steps we followed when experimenting with machine learning on iOS using Core ML and Create ML, highlighting the challenges and limitations we faced, and how they were overcome or turned to our advantage.

Image classification

If you were to ask what the most basic use case for machine learning could be, there’s a good chance the answer you’ll get is image classification. The principle is simple: predicting the predefined category an image should be classified under. In this case, we trained a model to classify pictures using 2 categories–cats and dogs.

To begin, we needed data. In the directory “image_classification” were 1,400 images—700 of dogs and 700 of cats—with 80% of them separated into a “training” directory, those being the ones used to actually train the model, and the remainder in a “test” directory, whose purpose was to evaluate the quality of the model. Inside both those directories were two other directories, named “dog” and “cat,” which contained the actual images.

This naming convention plays a big part in what makes Create ML easy to use: We didn’t need to produce any kind of ad hoc metadata to indicate the category of each picture, as it is done entirely through the name of the directories.
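To make the convention concrete, here is a minimal sketch of how a label can be read straight from a file's parent directory. The helper `inferredLabel(forImageAt:)` and the file names are hypothetical, and this is not how Create ML is implemented internally; it only illustrates why no extra metadata is needed.

```swift
// Hypothetical helper: the label is the name of the directory that contains the image,
// e.g. "image_classification/training/dog/0001.jpg" -> "dog".
func inferredLabel(forImageAt path: String) -> String? {
    let components = path.split(separator: "/")
    // A bare file name has no parent directory, so no label can be inferred.
    guard components.count >= 2 else { return nil }
    return String(components[components.count - 2])
}

// Made-up file names that follow the directory layout described above.
let samplePaths = [
    "image_classification/training/dog/0001.jpg",
    "image_classification/training/cat/0001.jpg",
    "image_classification/training/dog/0002.jpg",
]

// Count how many images fall under each inferred label.
var countsByLabel: [String: Int] = [:]
for path in samplePaths {
    if let name = inferredLabel(forImageAt: path) {
        countsByLabel[name, default: 0] += 1
    }
}
print(countsByLabel)
```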

Apple claims that Create ML is able to train a model using as few as 10 images, but of course, the larger the dataset, the better the quality of the model. Note that, in cases where data is scarce, Create ML offers the possibility to extrapolate a dataset by generating flipped or rotated versions of the original images. While not as good as new data, this trick can have a positive impact on training quality.
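The idea behind this kind of augmentation can be sketched on a tiny grid of pixel values. The `PixelGrid` type and both functions below are assumptions made for illustration only; Create ML performs its own augmentation on real images, and its exact transformations are not shown in this article.

```swift
// A tiny grayscale "image" stored as rows of pixel intensities (illustrative only).
typealias PixelGrid = [[UInt8]]

// Mirrors the image left to right.
func flippedHorizontally(_ image: PixelGrid) -> PixelGrid {
    return image.map { row in Array(row.reversed()) }
}

// Rotates the image 90 degrees clockwise: each new row is an original column read bottom to top.
func rotatedClockwise(_ image: PixelGrid) -> PixelGrid {
    guard let width = image.first?.count else { return [] }
    return (0..<width).map { column in
        image.reversed().map { row in row[column] }
    }
}

let original: PixelGrid = [
    [1, 2, 3],
    [4, 5, 6],
]

// One source image becomes three training samples that all share the same label.
let augmented = [original, flippedHorizontally(original), rotatedClockwise(original)]
print(augmented)
```

The variants carry the same label as the original, which is why they add some variety to a small dataset without adding genuinely new information.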

So how did we start training our model? First, we opened Xcode and created a new playground. When asked, we made sure to select both “macOS” and “Single View.” Then we deleted all the content in the playground and pasted in the following code:

import CreateMLUI

let builder = MLImageClassifierBuilder()
builder.showInLiveView()

We then ran the playground and opened the Assistant Editor. When you do this, you should see the below interface.


As you can see on the right, you are asked to “Drop Images To Begin Training.” To do so, just drag and drop the “training” directory inside the dotted area. As soon as that’s done, Create ML starts working on the images.

To get more details on the training process, you can open the Console.

Extracting image features from full dataset.
Analyzing and extracting image features.
+------------------+--------------+------------------+
| Images Processed | Elapsed Time | Percent Complete |
+------------------+--------------+------------------+
| 1                | 1.40s        | 0%               |
| 2                | 1.46s        | 0%               |
| 3                | 1.51s        | 0.25%            |
| 4                | 1.57s        | 0.25%            |
| 5                | 1.64s        | 0.25%            |
| 10               | 1.96s        | 0.75%            |
| 50               | 4.48s        | 4.25%            |

So what’s happening here? To understand it, we need to give you some information on how Create ML works.

Create ML uses a technique called “transfer learning.” The idea behind it is pretty simple: Inside Core ML, there is a generic model that has been trained on an enormous number of images. This generic model is very good at classifying images, because it has learnt which features it must look at within an image in order to classify it. So when we provided Create ML with our cat and dog images, the first thing it did was extract those same features from the images.

It’s important to note that, while a powerful technique, transfer learning has its limitations. Namely, it only works when the data used to re-train a model shares the same kind of features as the original data. If, say, you take a model trained to identify daily-life objects and attempt to re-train it using images of celestial bodies, there’s a good chance you’ll end up with poor results.

The feature-extraction step is pretty heavy on resources, and on our computer it took a little longer than a minute to complete.

Create ML then moves on to the second part of the training process: Logistic regression. The name sounds complicated, but its principle is also quite easy to understand. Once Create ML has extracted the relevant features from the images, it must figure out how to use them to classify an image under the possible categories. To make an analogy, it needs to find where to draw the line that will separate, in our case, dog images from cat ones. This process is much quicker and shouldn’t last more than a few seconds.
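For readers who want to see what "drawing the line" looks like in code, here is a minimal sketch of a binary logistic regression trained with plain gradient descent. Everything in it is an assumption made for illustration: the two-number feature vectors, the learning rate, and the number of epochs are made up, and Create ML's actual solver works on far larger feature vectors and is not described in this article.

```swift
import Foundation

// Minimal binary logistic regression trained with batch gradient descent (illustrative only).
struct LogisticRegression {
    var weights: [Double]
    var bias = 0.0

    init(featureCount: Int) {
        weights = Array(repeating: 0.0, count: featureCount)
    }

    // Probability that a sample belongs to class 1 ("dog" in this sketch).
    func probability(_ features: [Double]) -> Double {
        let z = zip(weights, features).reduce(bias) { sum, pair in sum + pair.0 * pair.1 }
        return 1.0 / (1.0 + exp(-z))
    }

    mutating func train(samples: [[Double]], labels: [Double], epochs: Int, learningRate: Double) {
        for _ in 0..<epochs {
            var weightGradient = Array(repeating: 0.0, count: weights.count)
            var biasGradient = 0.0
            for (features, label) in zip(samples, labels) {
                // Difference between the predicted probability and the true label (0 or 1).
                let error = probability(features) - label
                for index in weights.indices {
                    weightGradient[index] += error * features[index]
                }
                biasGradient += error
            }
            // Move the parameters against the averaged gradient.
            let scale = learningRate / Double(samples.count)
            for index in weights.indices {
                weights[index] -= scale * weightGradient[index]
            }
            bias -= scale * biasGradient
        }
    }
}

// Made-up 2-dimensional "features": label 0 stands for cat, label 1 for dog.
let samples: [[Double]] = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
let labels: [Double] = [0, 0, 0, 1, 1, 1]

var model = LogisticRegression(featureCount: 2)
model.train(samples: samples, labels: labels, epochs: 1_000, learningRate: 0.5)
print(model.probability([0.15, 0.15]), model.probability([0.85, 0.85]))
```

A probability below 0.5 falls on the "cat" side of the line and one above 0.5 on the "dog" side.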

Once this is all over, Create ML presents you with a report card on how the training went.


You can see that there are two metrics here: “Training” and “Validation.” Training is pretty simple—it indicates how successful the model was using its own training set. Here, our model made a correct prediction 99% of the time when presented with images from its own training set.

Validation is a bit more complex. When we started training our model, a random 5% of the training images were taken aside. These images were then used to make sure that the training process wasn’t fixating on irrelevant features to perform its classification. Here we had a 98% accuracy on the validation set, which indicated that our model mostly did its job correctly with images that were not directly part of its training process.

Below the report card, you’ll see that you are asked to “Drop Images To Begin Testing.” It is now time to use the images inside the “test” directory. To begin the testing process, you once again only need to drag and drop the directory over the dotted area. Create ML then runs the trained model on those new images to test whether it is able to classify them correctly. A detailed report follows in the Console; see the example below.

Number of examples: 280
Number of classes: 2
Accuracy: 98.57%
******CONFUSION MATRIX******
----------------------------------
True\Pred cat  dog
cat       138  2
dog       2    138
******PRECISION RECALL******
----------------------------------
Class Precision(%)   Recall(%)
cat   98.57          98.57
dog   98.57          98.57

From this, we could see that our model was right more than 98% of the time, which was pretty nice! The confusion matrix shows us that two pictures of cats were incorrectly classified as being dogs, and vice versa.
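The figures in the report can be recomputed from the confusion matrix alone. The sketch below does so with hypothetical helper functions (they are not part of Create ML): accuracy is 276 correct predictions out of 280, and precision and recall for each class both come to 138 out of 140.

```swift
// Confusion matrix from the Console report, indexed as matrix[actual][predicted].
let classes = ["cat", "dog"]
let confusion = [
    [138, 2],   // actual cats: 138 predicted as cat, 2 predicted as dog
    [2, 138],   // actual dogs: 2 predicted as cat, 138 predicted as dog
]

// Share of all images that were classified correctly, as a percentage.
func accuracy(of matrix: [[Int]]) -> Double {
    let total = matrix.joined().reduce(0, +)
    let correct = matrix.indices.map { matrix[$0][$0] }.reduce(0, +)
    return Double(correct) / Double(total) * 100
}

// Of the images predicted as this class, the percentage that really belong to it.
func precision(of matrix: [[Int]], classIndex: Int) -> Double {
    let predictedAsClass = matrix.map { $0[classIndex] }.reduce(0, +)
    return Double(matrix[classIndex][classIndex]) / Double(predictedAsClass) * 100
}

// Of the images that really belong to this class, the percentage the model found.
func recall(of matrix: [[Int]], classIndex: Int) -> Double {
    let actuallyInClass = matrix[classIndex].reduce(0, +)
    return Double(matrix[classIndex][classIndex]) / Double(actuallyInClass) * 100
}

print("accuracy:", accuracy(of: confusion))
for (index, name) in classes.enumerated() {
    print(name, precision(of: confusion, classIndex: index), recall(of: confusion, classIndex: index))
}
```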

To see which images the model failed on, you need to click on the small button indicated in the below example.


Remember that, during the training process, a validation set is chosen at random within the training set, so it’s perfectly normal if you don’t get exactly the same results as we did. Nevertheless, you should see that the misclassified images are often of poor quality (very dark or badly cropped). But you might also find more interesting cases, as with the image above. It’s easy to understand why this one has been misclassified as a picture of a cat, as the dog’s fur looks rather feline.

To conclude the training process, the only thing left to do is to save the model. This can be done by clicking the little arrow at the top of the view and then following the instructions.


Using a model

Once you have a trained model, how about trying to embed it within an iOS app?

In our case, we fed our model a video stream captured directly from the camera of an iPhone, in order to predict in real time whether the device was pointed at a dog or at a cat.

The code needed to make this work consists mostly of boilerplate, so we won’t go into all of its details here. You only need to open the project inside the directory “CatDogClassifier” to see that everything was already implemented. When you read through the code, you’ll understand that every time the video feed produces a frame, our model predicts its category, displaying back the result along with the confidence of the prediction.

Before you start running the app, a word of advice: Remember that, when we trained our model, it learnt to classify images between the two categories of “cat” and “dog.” This means that our model lives in a universe where any image is either of a cat or a dog, and no other kind of image exists. So don’t be alarmed if you point your phone at a pen and see that the model classifies it as a dog. It doesn’t mean that the model is broken, but only that, according to its inner logic, your pen looks more like a dog than a cat.

If you have pets at home, you can of course point your phone at them and try to see if the model recognizes them correctly. But if you don’t, try the second-best solution: Display pictures from the testing dataset on the screen of your computer and have a look at how well the model classifies them.

Here you can see how it went when we tried it.

As long as the images offered good-enough quality, the model classified them correctly with a very high level of confidence. On the other hand, when the quality of the image dropped, as it did with the fourth image, the quality of the prediction followed the same path. This illustrates one of the most important principles of machine learning: “Garbage in, garbage out.”

Going further

Our model was able to quickly achieve some convincing results, so we wondered how we could apply the principle to a fully fledged app.

Image classification can actually be applied to a broad variety of topics, so there should be no shortage of use cases. You could go for something fun and create a game where players compete to take pictures of as many different kinds of objects as possible in a limited time.

You could also use it to make existing processes easier. For instance, a car-renting app could use Core ML when a user takes a picture of damage to their car to predict which part of the car is in the picture and prefill the corresponding form fields.

Another use some have suggested is the improvement of the accessibility of apps for visually impaired users. Through image classification, it becomes possible to predict the content of an image and use this prediction to offer a textual description.

The important thing to bear in mind in such cases is that predictions must only be used when we are confident they will improve the user’s experience. It’s always better to discard a potentially incorrect prediction that a model produced with low confidence, rather than forcing it onto the user and letting them deal with the unintended consequences.
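One simple way to apply this principle is to put a confidence threshold between the model and the user interface. The sketch below is only an illustration: the `Prediction` type, the `acceptedLabel(from:minimumConfidence:)` helper and the 0.9 threshold are all assumptions, and the right threshold depends on your model and use case.

```swift
// A label together with the confidence the model assigned to it (expected between 0 and 1).
struct Prediction {
    let label: String
    let confidence: Double
}

// Returns the most confident label, or nil when even the best guess is below the threshold.
func acceptedLabel(from predictions: [Prediction], minimumConfidence: Double) -> String? {
    guard let best = predictions.max(by: { $0.confidence < $1.confidence }),
          best.confidence >= minimumConfidence else {
        return nil
    }
    return best.label
}

// A hesitant prediction (for example, a pen in front of a cat-or-dog model) is discarded...
let hesitant = [Prediction(label: "dog", confidence: 0.55), Prediction(label: "cat", confidence: 0.45)]
// ...while a confident one is passed on to the user interface.
let confident = [Prediction(label: "dog", confidence: 0.97), Prediction(label: "cat", confidence: 0.03)]

print(acceptedLabel(from: hesitant, minimumConfidence: 0.9) ?? "no suggestion")
print(acceptedLabel(from: confident, minimumConfidence: 0.9) ?? "no suggestion")
```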

Equally crucial to bear in mind is how important the quality of your dataset is. It is the key input that the learning algorithms will be relying on, so the better-quality data it contains, the better the results will be. Once again: “Garbage in, garbage out.”

Sentiment analysis

As we’ve seen, machine learning is a very effective technique for performing image classification. But what happens when you try to move from images to text?

You might instinctively think that running a classification algorithm on text is a simpler problem than running it on images. As our second example shows, it is not. In this example of how Create ML can be leveraged, we attempted to perform a sentiment analysis, taking our data from a CSV file that contained more than 10,000 sentences extracted from online movie reviews.


As you can see, the sentences were labelled to indicate whether they express a positive or negative sentiment and we used them to train a model that, given a sentence relating to a movie, could predict whether it expressed a positive or negative sentiment.

Once again, we ran Create ML using a playground, but this time, we didn’t have a graphical interface to guide us, so we had to write some code!

Looking at the content of the working implementation of the training process inside the directory “text_classification”, we saw the first thing we needed to do was parse the data inside our CSV file (if you want to take a look at it, it’s stored as a resource of the playground):

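import CreateML
import Foundation

// Parse the semicolon-delimited CSV file bundled with the playground.
let data = try MLDataTable(
    contentsOf: Bundle.main.url(forResource: "movie-sentences", withExtension: "csv")!,
    options: MLDataTable.ParsingOptions(containsHeader: true, delimiter: ";")
)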

Then, as with image classification, we split our dataset into two parts: “training” and “test”.

let (trainingData, testingData) = data.randomSplit(by: 0.8, seed: 5)
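The role of the seed is easier to see in a stand-alone sketch. The code below is not how `MLDataTable` implements `randomSplit`; it is a hypothetical stand-in that shuffles an array with a small deterministic generator (SplitMix64) and cuts it at the requested fraction, so that the same seed always produces the same split.

```swift
// Small deterministic generator (SplitMix64): the same seed always yields the same sequence.
struct SeededGenerator: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Hypothetical stand-in for a seeded split: shuffle deterministically, then cut at the fraction.
func randomSplit<Row>(_ rows: [Row], by fraction: Double, seed: UInt64) -> (training: [Row], testing: [Row]) {
    var generator = SeededGenerator(seed: seed)
    let shuffled = rows.shuffled(using: &generator)
    let cut = Int((Double(rows.count) * fraction).rounded())
    return (Array(shuffled[..<cut]), Array(shuffled[cut...]))
}

// Ten placeholder rows: eight end up in the training set, two in the testing set.
let rows = Array(1...10)
let firstRun = randomSplit(rows, by: 0.8, seed: 5)
let secondRun = randomSplit(rows, by: 0.8, seed: 5)
print(firstRun.training, firstRun.testing)
```

Fixing the seed makes training runs comparable with one another, because every run sees exactly the same training and testing rows.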

Finally, we trained the model using our training data.

let sentimentClassifier = try MLTextClassifier(trainingData: trainingData, textColumn: "text", labelColumn: "class")

As before, we wanted to evaluate how the training process went, so we computed the same metrics as before.

// Training accuracy as a percentage
let trainingAccuracy = (1.0 - sentimentClassifier.trainingMetrics.classificationError) * 100
// Validation accuracy as a percentage
let validationAccuracy = (1.0 - sentimentClassifier.validationMetrics.classificationError) * 100

You should find the training accuracy to be about 99%, meaning that the dataset was correctly learnt. Things should be less impressive with the validation accuracy, which is likely to stall at 75%. This shows that the model probably has trouble extrapolating what it learnt to sentences outside its training set.

To make sure, we ran the model using our test data.

let evaluationMetrics = sentimentClassifier.evaluation(on: testingData)
// Evaluation accuracy as a percentage
let evaluationAccuracy = (1.0 - evaluationMetrics.classificationError) * 100

The evaluation accuracy should also sit at about 75%. This means that, when confronted with unknown sentences, our model is right about 3 times out of 4.

This might look like a good enough result; after all, the model is right far more often than it is wrong. But remember that we are classifying sentences between only two categories (positive and negative sentiments). Therefore, trying to guess the right classification at random would, on average, achieve an accuracy of 50%. Taking this information into account, our 75% accuracy doesn’t look particularly impressive. So how does this translate once it’s embedded inside an app?

Real-time text analysis

To save you time, you’ll find a demo project called “SentimentAnalysis” already implemented in the GitHub repository.

Let’s review how we integrated our text-classification model.

The principle of the app is very straightforward: It features a single text field, inside which the user is invited to express their opinion about a movie. As they type, the model tries to predict the sentiment of the user’s input and displays its prediction just below the text field.

Performing the prediction in code does not pose any particular challenge. We began by instantiating the prediction model.

import NaturalLanguage

let sentimentPredictor = try NLModel(mlModel: MovieReview().model)

Then we used it to perform the prediction, as below.

let sentimentPredictionOutput = sentimentPredictor.predictedLabel(for: predictionInput)

Here, you’ll notice one limitation when comparing this model with image classification: It is not able to provide us with the confidence with which it made its prediction.

All that is left to do is display the information back to the user.

let predictionIsPositive = sentimentPredictionOutput == "Pos"
let displayableResult = predictionIsPositive ? "positive 👍" : "negative 👎"
self.sentimentLabel.text = "It seems that your opinion is \(displayableResult)"

Now, let’s look at how the app behaves.

You’ll notice that, even though the model got it right 3 out of 4 times, it also failed on a sentence that, to a human, felt very easy to classify correctly.

As mentioned earlier, text classification is far from being simple. Here, sentiment analysis proved itself to be a rather tricky topic, on which our learning process performed much worse than it did when we were classifying images.

However, this does not mean that text classification cannot bring value to your app. Models not being able to make super-accurate predictions might not be a problem if we only use their predictions to offer discreet suggestions. For instance, a messaging app might rely on text classification to populate a content-aware suggestion of emojis. When the suggestion makes sense, users are happy to have the right emoji at their disposal, but when a suggestion fails to be relevant, the user probably doesn’t pay that much attention to it, and the failure mostly goes unnoticed.

Using Create ML for the right purpose

For iOS developers, a significant amount of effort used to be required to start experimenting with machine learning. As we’ve seen, Create ML has completely changed the game and we are now able to effortlessly train and embed models within apps, using tools and a language already familiar to us.

Still, it’s important to note that, while Create ML is perfectly suited for prototyping, it might fall a little short when it comes to training production models. This is due to one very simple reason: Create ML relies on the capabilities of consumer-grade computers. If you train models in the cloud instead, you have access to more computing power.

You also need to keep in mind that machine learning is a field that requires a lot of expertise to be successfully mastered. We may have been able to experiment with it, but it is important that we do not delude ourselves: It did not make us machine learning specialists. There are a lot of machine learning algorithms out there, and they often require some fine parameter tuning in order to produce optimal results. So if you’re serious about training models for your app, do yourself a favor and seek the opinion of an expert.

Fortunately, as iOS developers, we do not need to be machine learning specialists. Most of the time, our value will be in our ability to successfully implement an innovative use case by embedding an already-trained model. As we saw, such integrations can take many forms and will most likely come with technical challenges of their own when it comes to keeping a user experience smooth.

If you want to go further with Create ML, here is the link to its full documentation.

Meanwhile, to get a better picture of all the ways a model can be integrated in an app, the documentation of Core ML makes an interesting read.

Finally, to get a glimpse of what can be achieved with more complex models, this project shows how multiple objects can be identified and tracked in real time via a video stream.


Machine learning on mobile devices is still in its infancy. On iOS, less than two years have elapsed since the initial introduction of Core ML. If you look back at what iOS apps looked like two years after the original iPhone was released, you’ll see how rough and sloppy they feel when compared to modern apps. And this makes perfect sense: As new technologies become available, it takes time for the industry to get a good grasp of how they can be efficiently leveraged.

Some popular apps have already been able to embed machine learning within their core use case. For instance, Google Translate relies on a machine learning model to perform offline translations. But there is still plenty of room left to innovate! Most apps generate a fair amount of data when users interact with them, so your apps are almost certain to hold the potential to offer innovative features powered by machine learning.

This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!


Illustration by Blok
