Image recognition with Tensorflow: Model components
===================================================

Adapted from the `Tensorflow Image Classification Walkthrough <https://www.tensorflow.org/tutorials/images/classification>`__.

This is the first part of the Image Recognition workshop series, where we will be making a first pass at a model to differentiate flower types by images.

Machine learning and deep learning
----------------------------------

Machine learning (ML) and deep learning (DL) are tools to make predictions. We train models on data; they become good at making predictions on those data, and then we feed them new information and see how they do on it.

.. image:: /_static/images/machine-learning/image-recognition/machine-vs-deep-learning.jpg
   :width: 50%
   :align: center

We use ML and DL to model complex systems with complicated relationships. If your data is relatively simple, machine learning and deep learning are likely not the best choice; they will take longer and may do worse than simpler models.

Setup
-----

First, we import ``tensorflow`` and other libraries that we will be using for the analysis. These libraries contain various tools not found in base Python that we will need.

.. admonition:: Library Info

   ``matplotlib`` is used for creating figures. ``numpy`` is used for mathematical operations. ``PIL`` is used to visualize image files (jpg, png, etc.).

   We will be using several modules within tensorflow. ``keras`` is a neural network framework built into tensorflow, and we will be relying on it for the models we make. ``tensorflow`` makes use of tensors: multi-dimensional arrays of data. If you are familiar with numpy arrays, tensors are similar, but optimized for deep learning.

.. tab:: Python

   .. code:: python

      import matplotlib.pyplot as plt
      import numpy as np
      import PIL
      import tensorflow as tf

      from tensorflow import keras
      from tensorflow.keras import layers
      from tensorflow.keras.models import Sequential

Our first step is to download the data set and decompress it. The data contains 5 subdirectories, one for each of our flower types: daisy, dandelion, roses, sunflowers, tulips.

.. admonition:: File pathing and downloads

   .. tab:: Python

      .. code:: python

         import pathlib

   `pathlib <https://docs.python.org/3/library/pathlib.html>`_ allows us to use and interact with file paths.

   .. tab:: Python

      .. code:: python

         dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"

   The data we are using is directly from tensorflow. It is located online in a compressed file format. We create a string variable ``dataset_url`` that points at the URL where the file is stored. This file can be uncompressed into a folder containing our image files.

   .. tab:: Python

      .. code:: python

         data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)

   Here, we are using a function we imported. Inside of ``tf``, there is a submodule called ``keras``, which itself contains a submodule called ``utils``. We are using a function called `get_file() <https://www.tensorflow.org/api_docs/python/tf/keras/utils/get_file>`__ from `utils <https://www.tensorflow.org/api_docs/python/tf/keras/utils>`__. This function will download the compressed data at our URL (what we feed into ``origin``). ``flower_photos`` is the name we'll give to the folder when we uncompress. ``untar=True`` makes sure that we uncompress the file right when we download. We save the path to the downloaded folder as ``data_dir``.

   .. tab:: Python

      .. code:: python

         data_dir = pathlib.Path(data_dir)

   Finally, we convert the path to the folder into a ``pathlib.Path`` object, which will allow us easy access to the files in those folders.

.. tab:: Python

   .. code:: python

      import pathlib

      dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
      data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
      data_dir = pathlib.Path(data_dir)
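To see some of what a ``pathlib.Path`` buys us, here is a small sketch (ours, not part of the original walkthrough): ``Path`` objects support the ``/`` operator for building sub-paths, plus convenience checks like ``.exists()``.

.. tab:: Python

   .. code:: python

      # Minimal sketch: build a sub-path with / and check that it exists.
      # Assumes data_dir is the pathlib.Path created above.
      roses_dir = data_dir / 'roses'
      print(roses_dir.exists())   # True if the download and extraction succeeded
      print(data_dir.name)        # 'flower_photos'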
Let's display our sample size to see how many images we're working with.

.. admonition:: Pathing

   Because ``data_dir`` is a pathlib.Path object, we use its ``.glob()`` method (short for "global") to find files within that directory. Inside of the folder we downloaded, there are 5 folders, one for each type of flower. ``.glob('*/*.jpg')`` means "from all subfolders (``*/``), grab all files that end in ``.jpg``". The ``*`` is a wildcard that matches any number of characters.

   ``.glob()`` gives us all the file names in a ``generator``. We convert it to a ``list`` to make it easier to handle, and then we take the ``len`` to see how many files there are in all of our folders.

.. tab:: Python

   .. code:: python

      image_count = len(list(data_dir.glob('*/*.jpg')))
      print(image_count)

.. tab:: Output

   .. code:: none

      3670

We use the PIL library we imported to open and view images.

.. admonition:: Reading image paths

   Here, instead of looking at all the files with ``*/*.jpg``, we pick only the files with roses in them with ``roses/*.jpg``. `PIL.Image.open() <https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.open>`__ opens an image file, given a file path. Here, we take the path to the first rose file, and cast the path as a string to be usable by the PIL function.

.. tab:: Python

   .. code:: python

      roses = list(data_dir.glob('roses/*.jpg'))
      print(len(roses))
      PIL.Image.open(str(roses[0]))

.. tab:: Output

   .. code:: none

      641

   .. image:: /_static/images/machine-learning/image-recognition/rose0.png
      :align: center

Here, we take a look at another rose.

.. tab:: Python

   .. code:: python

      PIL.Image.open(str(roses[1]))

.. tab:: Output
   :new-set:

   .. image:: /_static/images/machine-learning/image-recognition/rose1.png
      :align: center

We can do the same for the tulips images.

.. tab:: Python

   .. code:: python

      tulips = list(data_dir.glob('tulips/*.jpg'))
      PIL.Image.open(str(tulips[0]))

.. tab:: Output
   :new-set:

   .. image:: /_static/images/machine-learning/image-recognition/tulip0.png
      :align: center

.. tab:: Python
   :new-set:

   .. code:: python

      PIL.Image.open(str(tulips[1]))

.. tab:: Output
   :new-set:

   .. image:: /_static/images/machine-learning/image-recognition/tulip1.png
      :align: center

Let's break down our data by type of flower to see if we have imbalanced data: more of some types of flowers than others. If we have an overwhelming amount of one type, our model will primarily be trained on that type, which will make differentiating between types difficult.

.. tab:: Python

   .. code:: python

      daisy = list(data_dir.glob('daisy/*.jpg'))
      sunflowers = list(data_dir.glob('sunflowers/*.jpg'))
      dandelion = list(data_dir.glob('dandelion/*.jpg'))

      print('roses:', len(roses))
      print('tulips:', len(tulips))
      print('daisy:', len(daisy))
      print('sunflowers:', len(sunflowers))
      print('dandelion:', len(dandelion))

.. tab:: Output

   .. code:: none

      roses: 641
      tulips: 799
      daisy: 633
      sunflowers: 699
      dandelion: 898
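The five repetitive ``glob`` calls above could also be written as a single loop over the class subdirectories. A minimal sketch (ours, not from the original walkthrough), assuming the same ``data_dir`` as above:

.. tab:: Python

   .. code:: python

      # Sketch: count the jpg files in each flower subdirectory with one loop.
      # iterdir() also yields non-directory entries (e.g., a LICENSE file),
      # so we filter with is_dir().
      for flower_dir in sorted(data_dir.iterdir()):
          if flower_dir.is_dir():
              n_images = len(list(flower_dir.glob('*.jpg')))
              print(f'{flower_dir.name}: {n_images}')

Load data as a keras dataset
----------------------------

While our data is accessible to Python, we need to take some extra steps to make it usable in tensorflow. For instance, we'll need to make sure all images have the same height and width.

We then will split our data into two subsets: **training** and **validation**. The training subset is used to construct the model, and the validation subset is used to see how well our model performs.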
.. image:: /_static/images/machine-learning/image-recognition/train-and-test-1-min-1.webp
   :align: center

.. admonition:: Training-validation split

   For our purposes, we are going to use the function `image_dataset_from_directory() <https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory>`__ from `tf.keras.utils <https://www.tensorflow.org/api_docs/python/tf/keras/utils>`__. We first call this function with ``subset="training"`` to grab the training set.

   We make the training-validation split 80-20 to make sure we use most of the data for training, while still leaving enough for validation. We specify the split with ``validation_split``.

   We also are going to specify a batch size, which helps with loading images into memory. For us, 32 images will be loaded in at once.

   We are going to reformat our images to be square: 180x180. This pipeline requires that all images be identical in size and shape. This does mean we will squish images that were not square already, and we lose some pixel density on larger images. We specify this in the argument ``image_size`` as a tuple.

.. tab:: Python

   .. code:: python

      batch_size = 32
      img_height = 180
      img_width = 180

      train_ds = tf.keras.utils.image_dataset_from_directory(
          data_dir,
          validation_split=0.2,
          subset="training",
          seed=123,
          image_size=(img_height, img_width),
          batch_size=batch_size)

.. tab:: Output

   .. code:: none

      Found 3670 files belonging to 5 classes.
      Using 2936 files for training.

We can run the same command again to get the validation set. Beyond specifying ``subset="validation"``, make sure to keep the parameters ``validation_split`` and ``seed`` the same as for the training set to ensure the split is complementary to the training set. ``image_size`` and ``batch_size`` should also be kept the same for consistency.

.. tab:: Python

   .. code:: python

      val_ds = tf.keras.utils.image_dataset_from_directory(
          data_dir,
          validation_split=0.2,
          subset="validation",
          seed=123,
          image_size=(img_height, img_width),
          batch_size=batch_size)

.. tab:: Output

   .. code:: none

      Found 3670 files belonging to 5 classes.
      Using 734 files for validation.

Let's save the names of our flowers (our class names), and print them out using the ``.class_names`` attribute.

.. tab:: Python

   .. code:: python

      class_names = train_ds.class_names
      print(class_names)

.. tab:: Output

   .. code:: none

      ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

Now that we've loaded our data into tensorflow, let's visualize it again after re-sizing images.

.. admonition:: Visualizing images with matplotlib

   While we can visualize our images in a similar way to before, here we use ``matplotlib`` to organize our images into a 3x3 grid. First, we define a 10x10 figure.

   .. tab:: Python

      .. code:: python

         plt.figure(figsize=(10, 10))

   We can grab a single batch from our data with ``train_ds.take(1)``. Converting to a list lets us pull out individual images and labels.

   .. tab:: Python

      .. code:: python

         images, labels = list(train_ds.take(1))[0]

   We iterate over a range of 9 for our 9 total images. ``ax = plt.subplot(3, 3, i + 1)`` makes sure we are plotting on the right set of axes for each image.

   .. tab:: Python

      .. code:: python

         for i in range(9):
             ax = plt.subplot(3, 3, i + 1)

   ``plt.imshow()`` displays image data. We convert each image into an array of integers from 0 to 255, which ``plt.imshow()`` needs to display images. The significance of this range is discussed in greater detail below.

   .. tab:: Python

      .. code:: python

         plt.imshow(images[i].numpy().astype("uint8"))
         plt.title(class_names[labels[i]])
         plt.axis("off")

.. tab:: Python

   .. code:: python

      plt.figure(figsize=(10, 10))
      images, labels = list(train_ds.take(1))[0]
      for i in range(9):
          ax = plt.subplot(3, 3, i + 1)
          plt.imshow(images[i].numpy().astype("uint8"))
          plt.title(class_names[labels[i]])
          plt.axis("off")

.. tab:: Output
   :new-set:

   .. image:: /_static/images/machine-learning/image-recognition/matplotlib_gallery.png
      :align: center
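Notice that each label is an integer that indexes into ``class_names``, which is why ``class_names[labels[i]]`` works above. As a quick illustration (a sketch of ours, not from the original walkthrough), we can tally which classes landed in the one batch we just plotted:

.. tab:: Python

   .. code:: python

      # Sketch: labels are integer class indices, so we can map them back to
      # flower names and count them. Assumes the images/labels batch pulled above.
      from collections import Counter

      batch_counts = Counter(class_names[int(label)] for label in labels)
      print(batch_counts)  # e.g., how many of the 32 images are roses, tulips, ...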
Image batching
~~~~~~~~~~~~~~

Each batch has 32 images, 180x180 pixels, with RGB data. Each image has an accompanying label, as well.

.. tab:: Python

   .. code:: python

      for image_batch, labels_batch in train_ds:
          print(image_batch.shape)
          print(labels_batch.shape)
          break

.. tab:: Output

   .. code:: none

      (32, 180, 180, 3)
      (32,)

We're also going to configure the dataset for performance. ``dataset.cache()`` keeps images in memory so that we don't need to load them each epoch. ``dataset.prefetch()`` prepares upcoming images while the current ones are being worked on, at the cost of some additional memory.

.. tab:: Python

   .. code:: python

      AUTOTUNE = tf.data.AUTOTUNE

      train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
      val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

Image data and normalization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can think of image data as a series of numerical values that are interpreted to create something visual. Here is a simplification, where we have a 2D array of zeros and ones. Zero is interpreted as black (no color), and one is interpreted as white (max color).

.. image:: /_static/images/machine-learning/image-recognition/bitmap.png
   :align: center

In reality, images don't just contain black and white pixels: they have pixels with values for red, green, and blue (RGB) at different intensities. Each pixel has 3 values for RGB intensity, combining to look like a single color to our eyes.

.. image:: /_static/images/machine-learning/image-recognition/rgb_colors.png
   :align: center

The intensity values go from 0 through 255, which we can see in our own data by looking at the maximum and minimum values of an image.

.. tab:: Python

   .. code:: python

      first_image = image_batch[0]
      print(np.min(first_image), np.max(first_image))

.. tab:: Output

   .. code:: none

      0.0 255.0

Neural networks like input values to be small, so we transform them to be between 0.0 and 1.0.

.. image:: /_static/images/machine-learning/image-recognition/three_d_array.png
   :align: center

Here, we test this out by creating a normalization layer and then checking that the normalization works.

.. tab:: Python

   .. code:: python

      normalization_layer = layers.Rescaling(1./255)

      normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
      image_batch, labels_batch = next(iter(normalized_ds))
      first_image = image_batch[0]
      # Notice the pixel values are now in `[0,1]`.
      print(np.min(first_image), np.max(first_image))

.. tab:: Output

   .. code:: none

      0.0 1.0
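To make the rescaling concrete, here is a tiny sketch (ours, not from the original walkthrough) applying the same ``x / 255`` arithmetic that ``layers.Rescaling(1./255)`` performs, on a hand-built two-pixel "image":

.. tab:: Python

   .. code:: python

      # Sketch: a 1x2 "image" with one red and one white pixel, stored as
      # RGB intensities in [0, 255], then rescaled to [0, 1].
      tiny_image = np.array([[[255, 0, 0],        # pure red
                              [255, 255, 255]]],  # white
                            dtype=np.float32)
      print(tiny_image / 255.0)  # every value now lies between 0.0 and 1.0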
Building a model
----------------

Now that we've gone through our preprocessing workflow, we are going to construct a basic Keras model, which contains several **layers**. Layers take information, process it in some way, and then pass the output on to other layers. We are going to build `a sequential model <https://www.tensorflow.org/guide/keras/sequential_model>`__, which puts layers in a defined order, and feeds data through the layers in that order. Each layer will have a single tensor input and a single tensor output.

.. image:: /_static/images/machine-learning/image-recognition/multi_layer_model.jpg
   :align: center

We are starting our basic model with the following layers:

- `Rescaling layer <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Rescaling>`__: works like the example above. Our data contains 3 dimensions: x position, y position, and RGB channel.
- `Flatten layer <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten>`__: collapses those dimensions into a single dimension; it only reformats our data.
- `Dense layer <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense>`__: a layer that is fully connected to the previous layer.

  - Our Dense layer has 32 neurons or nodes. Every node receives information about all pixels.
  - It uses ``relu`` activation. An activation function determines how strongly each neuron "fires": to what degree each node gets used to make predictions.

- The model ends with another Dense layer with 5 nodes, one for each class. It will contain a score for each flower type.

.. tab:: Python

   .. code:: python

      num_classes = len(class_names)

      model = Sequential([
          layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
          layers.Flatten(),
          layers.Dense(32, activation='relu'),
          layers.Dense(num_classes)
      ])

We then compile our model with ``model.compile()``, adding in a few more important options.

Loss is how the training process determines how well it is doing. We want loss to be as close to zero as possible. There are many possible loss functions; here we use one called sparse categorical cross entropy.

Our optimizer tries to decide how to make changes to our model to decrease loss. In the example below, the optimizer is trying to find the lowest point on the parabola. It tries to take larger steps when it's far away from the minimum, and smaller steps when it's near. If it takes steps that are too large, however, the model may have a hard time finding the minimum loss due to overshooting.

.. image:: /_static/images/machine-learning/image-recognition/gradient-descent-learning-rate.png
   :align: center

Reality is more complicated than this simple case. Here we show a more complicated gradient. It contains many places for the minimization process to get stuck (local minima). Therefore, making sure our step size is large enough to get out of local minima is also important.

.. image:: /_static/images/machine-learning/image-recognition/gradient-descent-3d.png
   :align: center

We also will keep track of the accuracy of our model. This is the proportion of images that the model correctly classifies. The model does not use this information; it is purely for us, the users.

.. admonition:: Loss and accuracy metrics

   We use a type of loss called sparse categorical cross entropy. However, there are `many different kinds of loss <https://www.tensorflow.org/api_docs/python/tf/keras/losses>`__ we can use. There are `different metrics we can track <https://www.tensorflow.org/api_docs/python/tf/keras/metrics>`__ besides accuracy, as well. If we add them to the list, we can track multiple metrics at the same time.

   We use an optimizer called "Adam", commonly used in neural networks. `Other optimizers <https://www.tensorflow.org/api_docs/python/tf/keras/optimizers>`__ can be used as well.

.. tab:: Python

   .. code:: python

      model.compile(optimizer='adam',
                    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                    metrics=['accuracy'])
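Before asking Keras for a summary, we can sanity-check the parameter counts by hand: a dense layer has one weight per (input, node) pair plus one bias per node. A quick sketch of the arithmetic (ours, not from the original walkthrough):

.. tab:: Python

   .. code:: python

      # Sketch: expected parameter counts for each layer.
      flat = 180 * 180 * 3        # Flatten output: 97200 values per image
      dense1 = flat * 32 + 32     # weights + biases: 3,110,432
      dense2 = 32 * 5 + 5         # weights + biases: 165
      print(flat, dense1, dense2, dense1 + dense2)  # total: 3,110,597

These numbers match the summary below; note that the Rescaling and Flatten layers contribute no trainable parameters.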
We can print the model summary, which shows our layers and how many parameters we have for each layer.

.. tab:: Python

   .. code:: python

      model.summary()

.. tab:: Output

   .. code:: none

      Model: "sequential"
      _________________________________________________________________
       Layer (type)                Output Shape              Param #
      =================================================================
       rescaling_1 (Rescaling)     (None, 180, 180, 3)       0

       flatten (Flatten)           (None, 97200)             0

       dense (Dense)               (None, 32)                3110432

       dense_1 (Dense)             (None, 5)                 165

      =================================================================
      Total params: 3,110,597
      Trainable params: 3,110,597
      Non-trainable params: 0
      _________________________________________________________________

We are going to run the model for 10 **epochs**. An **epoch** is one iteration through the model pipeline, during which the model can adjust itself. This means that we will pass our entire data set through our model 10 times. After the first epoch, future epochs will build upon the model created in prior epochs and refine it to minimize the **loss**.

Here, we use ``model.fit()`` to actually fit the model that we have defined. We will call the output of the model fitting ``history``, as it will store a record of the fitting process over time.

.. tab:: Python

   .. code:: python

      epochs = 10
      history = model.fit(
          train_ds,
          validation_data=val_ds,
          epochs=epochs
      )

.. tab:: Output

   .. code:: none

      Epoch 1/10
      92/92 [==============================] - 1s 7ms/step - loss: 3.8806 - accuracy: 0.2016 - val_loss: 1.6087 - val_accuracy: 0.2398
      Epoch 2/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6073 - accuracy: 0.2459 - val_loss: 1.6065 - val_accuracy: 0.2398
      Epoch 3/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6051 - accuracy: 0.2459 - val_loss: 1.6048 - val_accuracy: 0.2398
      Epoch 4/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6034 - accuracy: 0.2459 - val_loss: 1.6036 - val_accuracy: 0.2398
      Epoch 5/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6022 - accuracy: 0.2459 - val_loss: 1.6028 - val_accuracy: 0.2398
      Epoch 6/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6014 - accuracy: 0.2459 - val_loss: 1.6023 - val_accuracy: 0.2398
      Epoch 7/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6009 - accuracy: 0.2459 - val_loss: 1.6021 - val_accuracy: 0.2398
      Epoch 8/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6005 - accuracy: 0.2459 - val_loss: 1.6019 - val_accuracy: 0.2398
      Epoch 9/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6003 - accuracy: 0.2459 - val_loss: 1.6019 - val_accuracy: 0.2398
      Epoch 10/10
      92/92 [==============================] - 0s 4ms/step - loss: 1.6002 - accuracy: 0.2459 - val_loss: 1.6017 - val_accuracy: 0.2398

We can visualize the results of our model in matplotlib, looking both at the training and validation sets. For each, we look at accuracy, as well as loss. Here is an example of how we want our plot to look:

.. image:: /_static/images/machine-learning/image-recognition/good_training.png
   :align: center

Here, the validation accuracy slowly increases to around 75%. It is a little smaller than the training accuracy, because we are always more accurate on the data that the model has already seen than on new data.

When we look at training and validation loss, the absolute values are less important. However, we want to see loss decrease as we train the model. Smaller loss is better. We will see validation loss be larger than training loss, similar to how validation accuracy is always smaller than training accuracy.
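Before plotting our own results, we can also ask Keras directly for the model's loss and accuracy on the validation set. This is a minimal sketch (ours, not part of the original walkthrough) using the standard ``model.evaluate()`` method:

.. tab:: Python

   .. code:: python

      # Sketch: evaluate the trained model on the validation set.
      # model.evaluate returns the loss followed by each compiled metric,
      # so with metrics=['accuracy'] we get (loss, accuracy).
      final_val_loss, final_val_acc = model.evaluate(val_ds)
      print(f'validation loss: {final_val_loss:.4f}, validation accuracy: {final_val_acc:.4f}')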
.. admonition:: Model history

   We saved the record of the fitting process and the resulting model in a variable called ``history``. This variable has an attribute ``.history``, which is a dictionary containing information about our fitting. For instance, ``history.history['accuracy']`` contains the training accuracy across our epochs, while ``history.history['val_accuracy']`` contains the validation accuracy. Likewise, ``history.history['loss']`` is the training loss, and ``history.history['val_loss']`` is the validation loss. We then plot each one in its own subplot.

.. tab:: Python

   .. code:: python

      acc = history.history['accuracy']
      val_acc = history.history['val_accuracy']

      loss = history.history['loss']
      val_loss = history.history['val_loss']

      epochs_range = range(epochs)

      plt.figure(figsize=(8, 8))
      plt.subplot(1, 2, 1)
      plt.plot(epochs_range, acc, label='Training Accuracy')
      plt.plot(epochs_range, val_acc, label='Validation Accuracy')
      plt.ylim(0, 1)
      plt.legend(loc='lower right')
      plt.title('Training and Validation Accuracy')

      plt.subplot(1, 2, 2)
      plt.plot(epochs_range, loss, label='Training Loss')
      plt.plot(epochs_range, val_loss, label='Validation Loss')
      plt.legend(loc='upper right')
      plt.title('Training and Validation Loss')
      plt.show()

.. tab:: Output
   :new-set:

   .. image:: /_static/images/machine-learning/image-recognition/model1_training.png
      :align: center

Our model isn't performing particularly well. Next time, we will go over ways to fix it to be more accurate.

Homework: TensorFlow Playground
-------------------------------

Using `TensorFlow Playground <https://playground.tensorflow.org/>`__, create the best model possible for the spiral data set. We will be judging models based on their test loss and the number of epochs it takes to get that loss. You should experiment with using different features, different numbers of nodes and layers, and other settings to create the best model.