# Introduction

In the previous blog post, we described one of the courses that we followed on Kaggle to learn how machine learning works and how to implement it.

Since our prototype will need support for images, we have arrived at the next chapter of the Kaggle tutorials: deep learning, a subfield of machine learning that is needed to recognize patterns in images.

This blog post describes what we did during this deep learning course and summarizes its various topics. The main focus is to provide an overview of the knowledge we gained, which we will apply to the prototype later on.

# Hands-on experience

The tutorial that we followed was completely focused on giving us hands-on experience. After we received a tiny bit of information about what convolutions are and how computer vision works, we were tasked with using existing models by writing code.

One of the first things we learned was how to use an already trained model to make predictions on input data. In this task, we used ResNet50 with pre-trained weights to predict dog breeds.
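A minimal sketch of this kind of exercise, assuming TensorFlow is installed (the exact tutorial code differed; here a random image stands in for a real dog photo):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, decode_predictions, preprocess_input)

# Load ResNet50 with weights pre-trained on ImageNet.
model = ResNet50(weights="imagenet")

# ResNet50 expects 224x224 RGB images; random pixels stand in for a
# real photo loaded from disk.
img = np.random.randint(0, 256, size=(1, 224, 224, 3)).astype("float32")
preds = model.predict(preprocess_input(img))

# decode_predictions maps the 1000 output probabilities to class labels,
# which include many dog breeds.
print(decode_predictions(preds, top=3)[0])
```

The model returns one probability per ImageNet class, and `decode_predictions` turns the highest ones into human-readable labels.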

# Data augmentation

One of the tricky things about machine learning is that it requires a lot of training data to produce accurate predictions. It is usually quite difficult to obtain enough images to train the model.

To solve this issue, there is a simple trick called data augmentation. The number of images in the training set can be greatly increased by flipping each image and shifting it up, down, left, and right. By doing this, the training set can expand rapidly, even with just a few images in the base training set.
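The idea can be sketched in a few lines of plain numpy (a real pipeline would typically use a framework utility such as Keras' `ImageDataGenerator` and would pad instead of wrapping pixels around the edge):

```python
import numpy as np

def augment(image, shift=2):
    """Generate simple variants of one image: a flip plus small shifts."""
    variants = [image, np.fliplr(image)]    # original + horizontal flip
    for axis in (0, 1):                     # vertical and horizontal shifts
        for step in (shift, -shift):
            # np.roll wraps pixels around; real augmentation would pad.
            variants.append(np.roll(image, step, axis=axis))
    return variants

base = np.arange(16).reshape(4, 4)  # a tiny stand-in "image"
augmented = augment(base)
print(len(augmented))  # 6 variants from a single image
```

Six training examples from one image: applied to a whole dataset, this multiplies the effective training set size.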

# Transfer learning

Soon after we experimented with the first deep learning model, the tutorial showed us a very interesting technique called transfer learning.

When training a model from scratch, you need a lot of data to determine the weights and biases, and it takes a lot of time to perfect the model. This is where transfer learning comes in: it reuses most of the weights and biases of an already trained model. Such a model can already detect many basic features in images, such as edges and horizontal lines, which the new model also needs.

Instead of having to train hundreds of thousands of weights and biases, most of them are copied into the new model, which then adds only one more layer that makes the predictions; only that layer's weights and biases still have to be trained.
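In Keras, this "copy and freeze" pattern can be sketched as follows (assuming TensorFlow is installed; the number of classes is hypothetical, and `weights=None` is used here only to avoid downloading the pre-trained weights, which in practice would be `weights="imagenet"`):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

num_classes = 5  # hypothetical number of classes in the new task

# Reuse ResNet50 without its final prediction layer (include_top=False).
base = ResNet50(include_top=False, weights=None,
                input_shape=(224, 224, 3))
base.trainable = False  # freeze the copied weights and biases

# Add one new layer that makes the predictions; only this layer trains.
model = Sequential([
    base,
    GlobalAveragePooling2D(),
    Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

With the base frozen, only the final Dense layer's kernel and bias are trainable, which is why training converges so much faster than from scratch.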

This is an excellent technique when training a model for a task similar to one an existing model already solves, and it greatly reduces the time it takes to get a model up and running. It might be very interesting for us to experiment with different existing models to decrease the time it takes for our model to learn.

# Building our own model

Of course, not every new application of deep learning has been done before. Many deep learning projects require an entirely new model and can't use any pre-trained weights and biases for their application. For models like this, you need to create them yourself and train them on your own data.

In the tutorial, we trained our first model that didn’t rely on transfer learning. This model recognizes different types of clothing.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D

img_rows, img_cols = 28, 28
num_classes = 10

def prep_data(raw, train_size, val_size):
    # The first column holds the label; one-hot encode it.
    y = raw[:, 0]
    out_y = keras.utils.to_categorical(y, num_classes)
    # The remaining columns are pixel values; reshape them into
    # 28x28 grayscale images and scale to the range [0, 1].
    x = raw[:, 1:]
    num_images = raw.shape[0]
    out_x = x.reshape(num_images, img_rows, img_cols, 1)
    out_x = out_x / 255
    return out_x, out_y

fashion_file = "../input/fashionmnist/fashion-mnist_train.csv"
fashion_data = np.loadtxt(fashion_file, skiprows=1, delimiter=',')
x, y = prep_data(fashion_data, train_size=50000, val_size=5000)

# Three convolutional layers, then a flatten and two dense layers.
fashion_model = Sequential()
fashion_model.add(Conv2D(12, kernel_size=(3, 3), activation='relu',
                         input_shape=(img_rows, img_cols, 1)))
fashion_model.add(Conv2D(12, kernel_size=(3, 3), activation='relu'))
fashion_model.add(Conv2D(12, kernel_size=(3, 3), activation='relu'))
fashion_model.add(Flatten())
fashion_model.add(Dense(100, activation='relu'))
fashion_model.add(Dense(num_classes, activation='softmax'))

fashion_model.compile(loss=keras.losses.categorical_crossentropy,
                      optimizer='adam',
                      metrics=['accuracy'])
fashion_model.fit(x, y, batch_size=100, epochs=4, validation_split=0.2)
```

The validation accuracy of this model ended up at 0.90, so the trained model got the prediction right roughly 90 percent of the time.

# Theory behind deep learning

After implementing our own model and improving it with a few optimization techniques, we moved on to the actual theory behind the algorithms that we had just experimented with.

During this course, we learned the basics of deep learning. In this section, we will summarize what we learned. We do this so that we have to actively think about the contents of what we just learned and can understand it better.

In deep learning, you have a “neural network”. This network consists of layers of “neurons”: an input layer, some number of hidden layers (in image networks, these are often convolutional layers) and an output layer. Each neuron in one layer is connected to each neuron in the next layer.

Each of the connections between neurons has a specific weight attached to it. A neuron combines the weighted values of its inputs and passes the result through an activation function, layer by layer, until it reaches the output layer. The final value of each neuron in the output layer represents the prediction that this specific neuron corresponds to the correct answer.
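A tiny numpy sketch of this forward pass, with made-up layer sizes and random weights (ReLU and softmax are common activation choices, though others exist):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# A tiny network: 3 inputs -> 4 hidden neurons -> 2 output neurons.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([0.5, -1.0, 2.0])       # input layer values
hidden = relu(W1 @ x + b1)           # weighted sum + activation function
output = softmax(W2 @ hidden + b2)   # output layer: one value per class

print(output)  # the class probabilities sum to 1
```

Each `W @ value + b` step is exactly the weighted combination described above; the activation function decides how much of that combined value the neuron passes on.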

When a model is being trained, what it really does is determine the best possible combination of weights for every connection between neurons. It does this by taking the output of the network and comparing it to the known correct answers. The difference between the expected result and the actual output is measured by a cost function, which would ideally be 0.
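As a concrete illustration, here is one common cost function, mean squared error, applied to a made-up one-hot label (classification networks like the one above typically use cross-entropy instead, but the idea is the same):

```python
import numpy as np

def cost(predicted, expected):
    # Mean squared error: the average squared difference between the
    # network's output and the expected values.
    return np.mean((predicted - expected) ** 2)

expected = np.array([0.0, 1.0, 0.0])  # the correct class, one-hot encoded

print(cost(np.array([0.1, 0.8, 0.1]), expected))  # close: small cost
print(cost(np.array([0.9, 0.0, 0.1]), expected))  # far off: large cost
```

The closer the network's output gets to the expected result, the closer the cost gets to 0.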

Obviously, it would take a whole bunch of processing power to change each weight randomly and wait for some luck to be on your side. Instead, math is used to influence the weights.

During training, the weights are changed in small increments in each iteration. Using calculus, the slope (gradient) of the cost function can be computed, which tells the algorithm whether to increase or decrease each weight. By repeatedly doing this, the weights eventually end up at a minimum of the cost. This technique is called gradient descent.
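A toy example with a single weight makes the idea concrete. Take the cost C(w) = (w - 3)², whose minimum is at w = 3; the gradient dC/dw = 2(w - 3) points uphill, so stepping against it moves the weight toward the minimum:

```python
# Gradient descent on a one-weight toy cost: C(w) = (w - 3)^2.
w = 0.0
learning_rate = 0.1

for _ in range(100):
    gradient = 2 * (w - 3)         # slope of the cost at the current w
    w -= learning_rate * gradient  # small step in the downhill direction

print(round(w, 4))  # converges to 3.0, the minimum of the cost
```

A real network does the same thing simultaneously for every weight, using backpropagation to compute all the gradients at once.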

Eventually, after a lot of processing, the ideal weights for the network are established and the model can be used for predictions.