Image recognition with TensorFlow: Model components

Adapted from the TensorFlow Image Classification Walkthrough.

This is the first part of the Image Recognition workshop series, where we will make a first pass at a model that differentiates flower types from images.

Machine learning and deep learning

Machine learning (ML) and deep learning (DL) are tools for making predictions. We train models on data; they become good at making predictions on those data, and then we feed them new information and see how they perform on it.

../../_images/machine-vs-deep-learning.jpg

We use ML and DL to model complex systems with complicated relationships. If your data is relatively simple, machine learning and deep learning are likely not the best choice; they will take longer to train and may perform worse than simpler models.

Setup

First, we import tensorflow and other libraries that we will be using for the analysis. These libraries contain various tools not found in base Python that we will need.

Library Info

matplotlib is used for creating figures. numpy is used for mathematical operations. PIL is used to open and display image files (jpg, png, etc.).

We will be using several modules within tensorflow. keras is a neural network framework built into tensorflow, and we will be relying on it for the models we make.

tensorflow makes use of tensors: multi-dimensional arrays of data. If you are familiar with numpy arrays, tensors are similar, but optimized for deep learning.

import matplotlib.pyplot as plt
import numpy as np
import PIL
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
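
As a quick illustration of the tensors mentioned above (a minimal sketch, separate from the workshop pipeline), a tensor can be built from a numpy array and converted back:

arr = np.array([[1.0, 2.0], [3.0, 4.0]])   # an ordinary numpy array
tensor = tf.constant(arr)                  # wrapped as a tensorflow tensor
print(tensor.shape)                        # (2, 2)
print(tensor.numpy())                      # and converted back to numpy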

Our first step is to download the dataset and decompress it. The data contains 5 subdirectories, one for each of our flower types: daisy, dandelion, roses, sunflowers, tulips.

File pathing and downloads

import pathlib

pathlib allows us to use and interact with file paths.

dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"

The data we are using is directly from tensorflow. It is located online in a compressed file format. We create a string variable dataset_url that points at the URL where the file is stored. This file can be uncompressed into a folder containing our image files.

data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)

Here, we are using a function we imported. Inside of tf, there is a submodule called keras, which itself contains a submodule called utils. We are using the function get_file() from utils. This function downloads the compressed data at the URL we feed into origin. flower_photos is the name we’ll give to the folder when we uncompress it. untar=True makes sure that the file is uncompressed as soon as it is downloaded.

We save the path where we saved the folder as data_dir.

data_dir = pathlib.Path(data_dir)

We finally convert the path to the folder into a pathlib.Path object, which gives us easy access to the files inside it.

import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
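
As a quick sanity check (illustrative; the exact download location varies by machine), we can print the path and list the class subfolders:

print(data_dir)  # e.g. ~/.keras/datasets/flower_photos
for sub in sorted(data_dir.iterdir()):  # iterate over the 5 flower subfolders
    if sub.is_dir():
        print(sub.name)  # daisy, dandelion, roses, sunflowers, tulips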

Let’s display our sample size to see how many images we’re working with.

Pathing

Because data_dir is a pathlib.Path object, we can use its .glob() method (short for “global”) to find files within that directory. Inside the folder we downloaded, there are 5 folders, one for each type of flower. .glob('*/*.jpg') means “from all subfolders (*/), grab all files that end in .jpg”. The * is a wildcard that matches any number of characters.

.glob() gives us all the file names as a generator. We convert it to a list to make it easier to handle, and then take its len() to see how many files there are across all of our folders.

image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)
3670

We use the PIL library we imported to open and view images.

Reading image paths

Here, instead of looking at all the files with */*.jpg, we pick only the files in the roses folder with roses/*.jpg.

PIL.Image.open() opens an image file, given a file path (in a notebook, the opened image is displayed). Here, we take the path to the first rose file and cast it as a string so that the PIL function can use it.

roses = list(data_dir.glob('roses/*.jpg'))
print(len(roses))
PIL.Image.open(str(roses[0]))
641
../../_images/rose0.png

Here, we take a look at another rose.

PIL.Image.open(str(roses[1]))
../../_images/rose1.png

We can do the same for the tulips images.

tulips = list(data_dir.glob('tulips/*.jpg'))
PIL.Image.open(str(tulips[0]))
../../_images/tulip0.png
PIL.Image.open(str(tulips[1]))
../../_images/tulip1.png

Let’s break down our data by type of flower to see if we have imbalanced data, i.e., more of some types of flowers than others.

If we have an overwhelming amount of one type, our model will primarily be trained on that type, which will make differentiating between types difficult.

daisy = list(data_dir.glob('daisy/*.jpg'))
sunflowers = list(data_dir.glob('sunflowers/*.jpg'))
dandelion = list(data_dir.glob('dandelion/*.jpg'))

print('roses:', len(roses))
print('tulips:', len(tulips))
print('daisy:', len(daisy))
print('sunflowers:', len(sunflowers))
print('dandelion:', len(dandelion))
roses: 641
tulips: 799
daisy: 633
sunflowers: 699
dandelion: 898
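
As a sketch, we can also check the class proportions directly (using the lists defined above); no single class dominates here:

counts = {'roses': len(roses), 'tulips': len(tulips), 'daisy': len(daisy),
          'sunflowers': len(sunflowers), 'dandelion': len(dandelion)}
total = sum(counts.values())
for name, n in counts.items():
    print(f'{name}: {n / total:.1%}')  # each class is roughly 17-24% of the data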

Load data as a keras dataset

While our data is accessible to Python, we need to take some extra steps to make it usable in tensorflow. For instance, we’ll need to make sure all images have the same height and width.

We then will split our data into two subsets: training and validation. The training subset is used to construct the model, and the validation subset is used to see how well our model performs.

../../_images/train-and-test-1-min-1.webp

Training-validation split

For our purposes, we are going to use the function image_dataset_from_directory() from tf.keras.utils. We first call this function with subset="training" to grab the training set.

We make the training-validation split 80-20 to make sure we use most of the data for training, while still leaving enough for validation. We specify the split with validation_split.

We are also going to specify a batch size, which helps with loading images into memory. For us, 32 images will be loaded at once.

We are going to reformat our images to be square: 180x180. This pipeline requires that all images be identical in size and shape. This does mean we will squish images that were not square already, and we lose some pixel density on larger images. We specify this in the argument image_size as a tuple.

batch_size = 32
img_height = 180
img_width = 180

train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
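
As a quick check (using the counts reported above), 2,936 training images in batches of 32 give the 92 steps per epoch that will appear later in the training output:

import math
print(math.ceil(2936 / 32))  # 92 batches per epoch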

We can run the same command again to get the validation set. Beyond specifying subset="validation", make sure to keep the parameters validation_split and seed the same as for the training set, to ensure the split is complementary to the training set. image_size and batch_size should also be kept the same for consistency.

val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 3670 files belonging to 5 classes.
Using 734 files for validation.

Let’s save the names of our flowers (our class names) and print them out using the .class_names attribute.

class_names = train_ds.class_names
print(class_names)
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

Now that we’ve loaded our data into tensorflow, let’s visualize it again after re-sizing images.

Visualizing images with matplotlib

While we can visualize our images in a similar way to before, here we use matplotlib to organize our images into a 3x3 grid.

First, we define a 10x10 figure.

plt.figure(figsize=(10, 10))

We can grab a single batch from our data with train_ds.take(1). Converting to a list lets us pull out individual images and labels.

images, labels = list(train_ds.take(1))[0]

We iterate over a range of 9 for our 9 total images.

ax = plt.subplot(3, 3, i + 1) makes sure we are plotting on the right set of axes for our image.

for i in range(9):
    ax = plt.subplot(3, 3, i + 1)

plt.imshow() displays image data. We convert our images into arrays of integers from 0 to 255, which plt.imshow() needs to display images. The significance of this range is discussed in greater detail below.

plt.imshow(images[i].numpy().astype("uint8"))
plt.title(class_names[labels[i]])
plt.axis("off")
plt.figure(figsize=(10, 10))
images, labels = list(train_ds.take(1))[0]

for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")
../../_images/matplotlib_gallery.png

Image batching

Each batch has 32 images, 180x180 pixels, with RGB data.

Each image has an accompanying label, as well.

for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break
(32, 180, 180, 3)
(32,)

We’re also going to configure the dataset for performance.

dataset.cache() keeps images in memory so that we don’t need to load them from disk each epoch. dataset.prefetch() prepares upcoming batches while the current batch is being processed, at the cost of some additional memory.

AUTOTUNE = tf.data.AUTOTUNE

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

Image data and normalization

You can think of image data as a series of numerical values that are interpreted to create something visual.

Here is a simplification, where we have a 2D array of zeros and ones. Zero is interpreted as black (no color), and one is interpreted as white (max color).

../../_images/bitmap.png
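
To make this concrete, here is a minimal standalone sketch (not part of the pipeline) that renders a tiny array of zeros and ones as a black-and-white image:

# A toy 5x5 "image": with a grayscale colormap, 0 renders as black, 1 as white.
bitmap = np.array([[0, 1, 0, 1, 0],
                   [1, 0, 1, 0, 1],
                   [0, 1, 0, 1, 0],
                   [1, 0, 1, 0, 1],
                   [0, 1, 0, 1, 0]])
plt.imshow(bitmap, cmap='gray', vmin=0, vmax=1)
plt.axis('off')
plt.show()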

In reality, images don’t just contain black and white pixels: they have pixels with values for red, green, and blue (RGB) at different intensities.

Each pixel has 3 values for RGB intensity, combining to look like a single color to our eyes.

../../_images/rgb_colors.png
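
Extending the sketch above to color (again purely illustrative), each pixel becomes a length-3 array of red, green, and blue intensities:

# A 1x3 color image: one pure-red, one pure-green, and one pure-blue pixel.
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
plt.imshow(rgb)
plt.axis('off')
plt.show()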

The intensity values go from 0 through 255, which we can see in our own data by looking at the maximum and minimum values of an image.

first_image = image_batch[0]
print(np.min(first_image), np.max(first_image))
0.0 255.0

Neural networks like input values to be small, so we transform them to be between 0.0 and 1.0.

../../_images/three_d_array.png

Here, we test this out by creating a normalization layer and then checking that the normalization works.

normalization_layer = layers.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))
0.0 1.0

Building a model

Now that we’ve gone through our preprocessing workflow, we are going to construct a basic Keras model, which contains several layers. Layers take information, process it in some way, and then pass the output on to other layers.

We are going to build a sequential model, which puts layers in a defined order, and feeds data through the layers in that order. Each layer will have a single tensor input and a single tensor output.

../../_images/multi_layer_model.jpg

We are starting our basic model with the following layers:

  • Rescaling layer: works like the example above. Our data contains 3 dimensions: x position, y position, and RGB channel.

  • Flatten layer: collapses the dimensional components into a single dimension; it only reformats our data.

  • Dense layer: a layer that is fully connected to the previous layer.

    • Our Dense layer has 32 neurons, or nodes. Every node receives information about all pixels.

    • We use the relu activation. An activation function determines how strongly each neuron “fires”, i.e., to what degree each node gets used to make predictions.

  • The model ends with another Dense layer with 5 nodes, one for each class. It will contain a raw score (logit) for each flower type; see the sketch after the model definition below.

num_classes = len(class_names)

model = Sequential([
    layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(num_classes)
])
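
A note on that final layer: it outputs raw scores (logits), not probabilities. A softmax turns logits into probabilities, as in this minimal sketch with made-up values; this is also why we will pass from_logits=True to the loss function below:

# Softmax converts raw scores into probabilities that sum to 1 across classes.
logits = tf.constant([[2.0, 0.5, -1.0, 0.1, 1.2]])
print(tf.nn.softmax(logits).numpy())  # the highest logit gets the highest probability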

We then compile our model with model.compile(), adding in a few more important options.

Loss is how the training process measures how well the model is doing. We want loss to be as close to zero as possible. There are many possible loss functions; here we use one called sparse categorical cross entropy.
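
As a rough illustration with made-up numbers, cross entropy for a single example is the negative log of the probability the model assigned to the true class:

# If the true class is index 2 and the model gives it probability 0.7,
# the cross-entropy loss for that example is -log(0.7), about 0.36.
probs = np.array([0.1, 0.1, 0.7, 0.05, 0.05])
true_class = 2
print(-np.log(probs[true_class]))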

Our optimizer tries to decide how to make changes to our model to decrease loss.

In the example below, the optimizer is trying to find the lowest point on the parabola. It tries to take larger steps when it’s far away from the minimum, and smaller steps when it’s near.

If it takes steps that are too large, however, the model may have a hard time finding the minimum loss due to overshooting.

../../_images/gradient-descent-learning-rate.png
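
To make the step-size idea concrete, here is a small standalone sketch (separate from our pipeline) of gradient descent on the parabola y = x², where the learning rate sets the step size:

# Gradient descent on y = x^2: the gradient is 2x, so each update moves x
# toward the minimum at x = 0. A larger learning rate takes bigger steps.
x = 5.0
learning_rate = 0.1
for step in range(5):
    x = x - learning_rate * (2 * x)  # step downhill along the gradient
    print(f'step {step}: x = {x:.3f}, loss = {x**2:.3f}')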

Reality is more complicated than this simple case. Here we show a more complex loss surface. It contains many places for the minimization process to get stuck (local minima). Therefore, making sure our step size is large enough to escape local minima is also important.

../../_images/gradient-descent-3d.png

We will also keep track of the accuracy of our model: the proportion of images that the model classifies correctly. The model does not use this information during training; it is purely for us, the users.
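
For instance, with toy numbers, accuracy is just the share of predicted labels that match the true labels:

# 3 of these 5 predictions match the true labels, so accuracy is 0.6.
predictions = np.array([0, 2, 1, 1, 4])
true_labels = np.array([0, 2, 3, 1, 0])
print(np.mean(predictions == true_labels))  # 0.6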

Loss and accuracy metrics

We use a type of loss called sparse categorical cross entropy, but there are many different kinds of loss we can use.

There are also different metrics we can track besides accuracy. If we add them to the metrics list, we can track multiple metrics at the same time.

We use an optimizer called “Adam”, which is commonly used in neural networks. Other optimizers can be used as well.
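
For example, a variant compile call (illustrative only, not the one we use) might swap in the SGD optimizer and track top-2 accuracy as a second metric:

# Illustrative alternative: SGD with an explicit learning rate, plus a second
# metric that counts a prediction as correct if the true class is in the
# model's top 2 scores.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy',
                       tf.keras.metrics.SparseTopKCategoricalAccuracy(k=2)])

The compile call we actually use follows.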

model.compile(optimizer='adam',
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=['accuracy'])

We can print the model summary, which shows our layers and how many parameters each layer has.

model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 rescaling_1 (Rescaling)     (None, 180, 180, 3)       0

 flatten (Flatten)           (None, 97200)             0

 dense (Dense)               (None, 32)                3110432

 dense_1 (Dense)             (None, 5)                 165

=================================================================
Total params: 3,110,597
Trainable params: 3,110,597
Non-trainable params: 0
_________________________________________________________________

We are going to run the model for 10 epochs. An epoch is one full pass of the entire dataset through the training pipeline, during which the model can adjust its weights. This means that we will pass our entire dataset through our model 10 times. Each epoch builds upon the model from prior epochs, refining it to minimize the loss.

Here, we use model.fit() to actually fit the model that we have defined. We will call the output of the model fitting history, as it will store a record of the fitting process over time.

epochs=10
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)
Epoch 1/10
92/92 [==============================] - 1s 7ms/step - loss: 3.8806 - accuracy: 0.2016 - val_loss: 1.6087 - val_accuracy: 0.2398
Epoch 2/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6073 - accuracy: 0.2459 - val_loss: 1.6065 - val_accuracy: 0.2398
Epoch 3/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6051 - accuracy: 0.2459 - val_loss: 1.6048 - val_accuracy: 0.2398
Epoch 4/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6034 - accuracy: 0.2459 - val_loss: 1.6036 - val_accuracy: 0.2398
Epoch 5/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6022 - accuracy: 0.2459 - val_loss: 1.6028 - val_accuracy: 0.2398
Epoch 6/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6014 - accuracy: 0.2459 - val_loss: 1.6023 - val_accuracy: 0.2398
Epoch 7/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6009 - accuracy: 0.2459 - val_loss: 1.6021 - val_accuracy: 0.2398
Epoch 8/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6005 - accuracy: 0.2459 - val_loss: 1.6019 - val_accuracy: 0.2398
Epoch 9/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6003 - accuracy: 0.2459 - val_loss: 1.6019 - val_accuracy: 0.2398
Epoch 10/10
92/92 [==============================] - 0s 4ms/step - loss: 1.6002 - accuracy: 0.2459 - val_loss: 1.6017 - val_accuracy: 0.2398

We can visualize the results of our model in matplotlib, looking both at the training and validation sets. For each we look at accuracy, as well as loss.

Here is an example of how we want our plot to look:

../../_images/good_training.png

Here, the validation accuracy slowly increases to around 75%. It is a little lower than the training accuracy, because a model is generally more accurate on data it has already seen than on new data.

When we look at training and validation loss, the absolute values are less important. However, we want to see loss decrease as we train the model: smaller loss is better. We will typically see validation loss sit above training loss, just as validation accuracy typically sits below training accuracy.

Model history

We saved the record of the fitting process and the resulting model in a variable called history. This variable has an attribute .history, which is a dictionary containing information about our fitting.

For instance, history.history['accuracy'] contains the training accuracy across our epochs, while history.history['val_accuracy'] contains the validation accuracy. Likewise, history.history['loss'] is the training loss, and history.history['val_loss'] is the validation loss.

We then plot each one in their own subplots.

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.ylim(0, 1)
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
../../_images/model1_training.png

Our model isn’t performing particularly well. Next time, we will go over ways to fix it to be more accurate.

Homework: TensorFlow Playground

Using TensorFlow Playground, create the best model possible for the spiral data set.

We will be judging models based on their test loss and the number of epochs it takes to get that loss.

You should experiment with using different features, different numbers of nodes and layers, and other settings to create the best model.