Algorithm for Artificial Neural Networks in Jupyter
- alnavar8
- Jul 1, 2020
- 5 min read
Updated: Jul 3, 2020
As the title suggests, this post walks through the algorithm for an ANN, specifically a Convolutional Neural Network (CNN).
Algorithm:
Import NumPy, TensorFlow and other required libraries
Read the data from a CSV file
Check the shape and data type
Reshape the data
Split the data into training and testing datasets
Use to_categorical in Keras to convert the class vector into a binary class matrix
Define the model architecture
Compile the model with an optimizer
Augment the dataset using ImageDataGenerator
Fit the model
Evaluate the model
Plot the loss and accuracy
Print the confusion matrix
Print the model summary

The first step of the CNN algorithm is to import the necessary libraries, such as Keras (within TensorFlow) and NumPy, as shown below. These libraries include features that are useful for building neural networks, such as layers, model compilation and fitting, model architectures and optimizers. TensorFlow is a mathematical library mainly designed for machine learning applications. NumPy (Numerical Python) helps us perform matrix operations; this comes in handy in neural networks because we need to represent the entire image as a matrix and perform computations on it.
```python
# importing the necessary libraries
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Dropout, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
```
Next, we need to import the dataset. TensorFlow has built-in datasets for basic applications that can be used to verify a model; here, the data is read from a CSV file and split into training and validation subsets using the train_test_split function, as shown below. The image data is provided along with the corresponding class, and the CNN learns the weights that create the mapping from the image data to the class. In the test set, only the image data is provided, without the class labels.
```python
# Reading the data and splitting it into train and validation sets
data = pd.read_csv("hmnist_64_64_L.csv")
from sklearn.model_selection import train_test_split
Y = data["label"]                 # label column name assumed from the HMNIST CSV
X = data.drop(columns=["label"])  # the remaining columns hold the pixel values
x_train, x_val, y_train, y_val = train_test_split(X, Y, test_size=0.2, random_state=42)
```
In a neural network, the number of neurons cannot change dynamically, because a dense layer has a fixed number of neurons. We therefore need to ensure that all the images are of the same size; if they are not, we pad and resize them. Hence, our next step is to check the size of the data.
```python
# Displaying the size of each split
print("x_train.shape: ", x_train.shape)
print("x_val.shape: ", x_val.shape)
print("y_train.shape: ", y_train.shape)
print("y_val.shape: ", y_val.shape)
```
The labels provided are usually in the form of a digit or a word. These have to be converted into a vector, where the number of entries in the vector equals the number of classes; for example, with 8 classes the digit 7 can be represented as [0,0,0,0,0,0,0,1]. This is called a one-hot vector, and it is implemented using the to_categorical function.
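The reshape and one-hot encoding steps described above can be sketched as follows. This is a minimal sketch with stand-in data, using NumPy's np.eye in place of Keras's to_categorical, and assuming 64x64 grayscale images stored as flat rows of 4096 pixels with 8 classes:

```python
import numpy as np

# Stand-in data: 100 flattened 64x64 grayscale images with integer labels 0-7
x_train = np.random.randint(0, 256, size=(100, 4096))
y_train = np.random.randint(0, 8, size=(100,))

# Reshape to (samples, height, width, channels) and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 64, 64, 1).astype("float32") / 255.0

# One-hot encoding (what to_categorical does): label 7 -> [0,0,0,0,0,0,0,1]
y_train_onehot = np.eye(8)[y_train]

print(x_train.shape)        # (100, 64, 64, 1)
print(y_train_onehot.shape) # (100, 8)
```

With the real data, x_train comes from the CSV split above and the encoding is done with to_categorical(y_train, num_classes=8).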
Once the data has been pre-processed by splitting it into the required subsets and reshaping it, we can go ahead and create the architecture of our model. The model generally used in neural networks is the Sequential model, which is suitable when the model is built layer by layer. Each layer has its own weights. The add() function is used to add new layers to the model, as shown.
```python
# CNN architecture
model = Sequential()
model.add(Conv2D(filters=128, kernel_size=(5,5), padding='same', activation='relu', input_shape=(64,64,1)))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(32, activation="relu"))
model.add(Dense(8, activation="softmax"))
```
In the first layer, a 2D convolution is performed using Conv2D; the number of kernels, their dimensions and an activation function can all be specified. Batch normalization, in which the inputs of each layer are normalized using their mean and variance, can also be added at this point, although this architecture relies on dropout instead.
Next, MaxPool2D is used: max pooling reduces the variance and the computational complexity, since it shrinks the size of the image. These layers are followed by Dropout, which helps prevent overfitting by randomly ignoring some neurons during training. Too much dropout hurts accuracy, so we need to choose an optimal value. Flatten transforms the 2D output of the convolutional layers into a vector to be fed into the fully connected layers. Dense refers to a fully connected layer, i.e. every neuron is connected to every neuron in the previous layer. These layers can be stacked in any order, keeping in mind the overall architecture of a CNN: a convolution layer is followed by pooling, and the pooling layers are eventually followed by fully connected layers. The more layers, the more feature extraction.
```python
# Compile the model
model.compile(optimizer="Adam", loss="categorical_crossentropy", metrics=["accuracy"])
```
The model is compiled using model.compile, which takes the optimizer, the loss function and the metrics. The optimizer drives the gradient descent algorithm, whose goal is to minimize the cost function. The Adam optimizer is generally used: it is an adaptive-learning-rate algorithm specifically designed for neural networks, which computes an individual learning rate for each parameter. Its default learning rate is 0.001. The loss function generally used is categorical cross-entropy, which applies to single-label classification, i.e. when only one category is valid for each data point. The metric generally used is accuracy; other metrics that can be used are precision, recall or F1 score.
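To make the loss function concrete, categorical cross-entropy for a single data point can be computed by hand. This is a plain-NumPy sketch, not the Keras implementation:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # y_true: one-hot label vector; y_pred: softmax probabilities summing to 1
    eps = 1e-7  # clip to avoid log(0)
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1])       # class 7 of 8
y_pred = np.array([0.01] * 7 + [0.93])            # confident, correct prediction
loss_good = categorical_crossentropy(y_true, y_pred)      # about 0.073

y_pred_bad = np.full(8, 1 / 8)                    # uniform guess over 8 classes
loss_bad = categorical_crossentropy(y_true, y_pred_bad)   # ln(8), about 2.079
```

Because the label is one-hot, only the predicted probability of the true class contributes to the loss, which is exactly why this loss suits single-label classification.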
```python
# Augmenting the dataset
datagen = ImageDataGenerator(
    rotation_range=0.5,
    zoom_range=0.5,
    width_shift_range=0.5,
    height_shift_range=0.5,
    horizontal_flip=True,
    vertical_flip=True)
datagen.fit(x_train)
```
The image data generator is used to increase the size of the dataset by applying small changes to the training images, such as rotations, flips and zooms, as shown.
```python
# Fit the model
history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=200),
                              epochs=20,
                              validation_data=(x_val, y_val),
                              steps_per_epoch=500)
```
The fit_generator step is the most time-consuming one; its run time depends on the number of epochs, the number of trainable parameters and the batch size.
```python
# Evaluate the model
final_loss, final_acc = model.evaluate(x_val, y_val, verbose=0)
print("Final loss: {0:.4f}, final accuracy: {1:.4f}".format(final_loss, final_acc))
```
Next, the loss and accuracy are plotted. If the training loss decreases, the network is learning; if there is not much difference between the training loss and the validation loss, there is no overfitting. When the loss is very low and the accuracy is very high, there is no real bias or variance problem.
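The loss and accuracy curves can be plotted from the history object returned by fitting. The sketch below uses stand-in values in place of history.history (the key names "accuracy"/"val_accuracy" follow recent Keras versions; older versions used "acc"/"val_acc"):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Stand-in for history.history; real values come from model.fit
history = {
    "loss":         [1.8, 1.1, 0.7, 0.5, 0.4],
    "val_loss":     [1.7, 1.2, 0.9, 0.8, 0.8],
    "accuracy":     [0.40, 0.62, 0.75, 0.83, 0.88],
    "val_accuracy": [0.42, 0.60, 0.70, 0.74, 0.75],
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history["loss"], label="train")
ax1.plot(history["val_loss"], label="validation")
ax1.set_title("Loss")
ax1.legend()
ax2.plot(history["accuracy"], label="train")
ax2.plot(history["val_accuracy"], label="validation")
ax2.set_title("Accuracy")
ax2.legend()
fig.savefig("training_curves.png")
```

With a real run, replace the dictionary with history.history and compare the two curves in each panel to judge overfitting.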
```python
# Confusion matrix
y_hat = model.predict(x_val)
y_pred = np.argmax(y_hat, axis=1)
y_true = np.argmax(y_val, axis=1)
cm = confusion_matrix(y_true, y_pred)
print(cm)
```
Finally, we can inspect the confusion matrix, in which the errors are shown; ideally, all the non-diagonal elements should be zero. The entire model summary can also be displayed with model.summary() to observe the layers, the trainable parameters and the non-trainable parameters.
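The argmax-then-confusion-matrix step can be illustrated on its own with stand-in predictions (the real y_val and y_hat come from the trained model; the fake "probability" rows here are constructed only so that argmax recovers a chosen class):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Stand-in one-hot labels and softmax-like outputs: 3 samples, 8 classes.
# True classes are 2, 7, 7; the model "predicts" 2, 7, 5 (one mistake).
y_val = np.eye(8)[[2, 7, 7]]
y_hat = np.eye(8)[[2, 7, 5]] * 0.9 + 0.0125  # each row sums to 1.0

# argmax converts one-hot / probability vectors back to class indices
y_true = np.argmax(y_val, axis=1)  # [2, 7, 7]
y_pred = np.argmax(y_hat, axis=1)  # [2, 7, 5]

cm = confusion_matrix(y_true, y_pred, labels=list(range(8)))
# Off-diagonal entries count errors: here cm[7, 5] == 1,
# meaning one sample of class 7 was misclassified as class 5.
print(cm)
```

Row i of the matrix is the true class and column j the predicted class, so a perfect model leaves everything off the diagonal at zero.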
This algorithm remains the same for any ANN; the only difference lies in the model architecture. This code can be executed to create your very own neural network.
Learnt something new? Then hit the heart! How else would I know you like my content? Don't forget to comment down below and tell me what changes you would make in your model architecture. Also, subscribe to THE AI STUDIO for more AI-related content.