Convolutional Neural Networks: Model Building and Freezing from Scratch

Alex Aman
Towards Data Science
5 min read · Oct 12, 2020


“If you can’t explain it simply, you don’t understand it well enough.” Einstein, as cited in Einstein: The Man and His Achievement by G. J. Whitrow (Dover Press, 1973).

In this article I build a CNN model from scratch on Fruits-360, one of the most popular Kaggle datasets, and reach 98% accuracy.

Step 1: Importing the Dataset from Kaggle into Google Colab

Log in to your Kaggle account, go to My Account, and click Create New API Token to download the kaggle.json file. Then upload that file to Google Colab by following this code gist.

  • Download the whole dataset directly into Google Colab and unzip it.
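A minimal sketch of this step in plain Python, assuming the Kaggle CLI's usual convention of reading the token from ~/.kaggle/kaggle.json. The helper names install_kaggle_token and unzip_dataset are hypothetical, and the dataset slug moltean/fruits and archive name fruits.zip in the comments are assumptions about where Fruits-360 is hosted; check them against the dataset page.

```python
import os
import shutil
import stat
import zipfile
from pathlib import Path


def install_kaggle_token(token_file, kaggle_dir=None):
    """Copy kaggle.json where the Kaggle CLI looks for it (~/.kaggle by default)
    and make it readable/writable by the owner only (mode 600)."""
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copy(token_file, target)
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # the CLI warns if others can read it
    return target


def unzip_dataset(zip_path, dest_dir):
    """Extract the downloaded archive and return its top-level entries."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


# Typical Colab flow (assumed; the "!" line runs as a shell command in a notebook cell):
# install_kaggle_token("kaggle.json")
# !kaggle datasets download -d moltean/fruits   # assumed dataset slug
# unzip_dataset("fruits.zip", "data")
```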

Step 2: Preparing the Training and Test Data

  • The data already contains training and test folders with images of fruits and vegetables inside, so we only have to define the paths to those folders.
  • Next, we read the image dimensions into a NumPy array so that we can scale the images down to 32x32 in the next steps.
  • Let's find out how many types of fruits and vegetables there are in the training data.
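Assuming the archive unzips to fruits-360/Training and fruits-360/Test (hypothetical paths; adjust them to your own layout), each class is simply a sub-folder, so counting the types of fruits and vegetables comes down to listing directories. A sketch:

```python
from pathlib import Path

# Assumed layout after unzipping Fruits-360; adjust to your own paths.
TRAIN_DIR = Path("data/fruits-360/Training")
TEST_DIR = Path("data/fruits-360/Test")


def list_classes(split_dir):
    """Each sub-folder of a split is one fruit/vegetable class."""
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


def images_per_class(split_dir):
    """Count the image files inside each class folder."""
    return {
        name: sum(1 for f in (Path(split_dir) / name).iterdir() if f.is_file())
        for name in list_classes(split_dir)
    }


# Usage (once the data is unzipped):
# print(len(list_classes(TRAIN_DIR)), "classes in the training data")
```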

Here I create a data flow that scales the images down and zooms in on them slightly, a light augmentation that helps the model generalize.

  • This code displays a sample of the final scaled-down, zoomed images.
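In Keras this kind of flow is usually built with ImageDataGenerator (for example rescale=1./255 plus a zoom_range) and flow_from_directory(target_size=(32, 32)). The exact settings used here are not shown, so the NumPy sketch below only illustrates what those operations do to a single image. It uses a fixed center zoom and nearest-neighbour resizing as stand-ins for Keras's random zoom and interpolation, and the zoom factor 1.2 is an assumption.

```python
import numpy as np


def center_zoom(img, zoom=1.2):
    """Zoom in by cropping the central 1/zoom of the image (zoom > 1)."""
    h, w = img.shape[:2]
    ch, cw = int(round(h / zoom)), int(round(w / zoom))
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]


def resize_nearest(img, size=(32, 32)):
    """Nearest-neighbour resize to (height, width); channels are kept."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols[None, :]]


def preprocess(img, zoom=1.2, size=(32, 32)):
    """Zoom in, shrink to 32x32 and rescale uint8 pixels to [0, 1]."""
    small = resize_nearest(center_zoom(img, zoom), size)
    return small.astype(np.float32) / 255.0


# A random stand-in for one 100x100 RGB Fruits-360 image.
sample = np.random.default_rng(0).integers(0, 256, size=(100, 100, 3)).astype(np.uint8)
x = preprocess(sample)
print(x.shape, x.min() >= 0.0, x.max() <= 1.0)  # (32, 32, 3) True True
```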

Step 3: Building and Tuning the CNN Model

Before reading further, it helps to have a basic understanding of pixel matrices, RGB color channels, and artificial neural networks (ANNs).

In beginner terms, a CNN adds convolutional layers in front of the Dense layers. Each convolutional layer slides a small filter matrix (for example 3x3) across the image; at every position it multiplies the filter element-wise with the patch of pixels underneath and adds up the products to get one value of the output matrix. This multiply-and-add at each pixel is called convolution. Because the filter values are learned during training, the output matrix ends up with large values wherever the network detects a change in the image, such as an edge, a shape, or a contour.

In other words, the filter encodes a particular pattern of pixel values and is “convolved” with every region of the image to find where that pattern occurs. As an analogy, imagine laying a dark filter over a painting: most of the painting disappears into black, but the strongest strokes still show through, and those highlighted shapes are what get recorded in the output matrix.
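A minimal NumPy sketch of the multiply-and-add described above. Like deep-learning libraries, it does not flip the kernel (strictly speaking, this is cross-correlation), and the vertical-edge filter is hand-made for illustration rather than learned.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers (no kernel flip):
    slide the kernel over the image, multiply element-wise, and sum."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 6x6 grayscale image: dark left half (0), bright right half (10).
image = np.zeros((6, 6))
image[:, 3:] = 10

# Hand-made vertical-edge filter; a trained CNN learns values like these.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]])

print(conv2d(image, vertical_edge))  # every row is [0, 30, 30, 0]
```

The filter responds only in the columns where dark meets bright, and stays at zero over the flat regions, which is exactly the kind of change the output matrix records.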

Source: https://stackoverflow.com/questions/52067833/how-to-plot-an-animated-matrix-in-matplotlib

Intuitively, as we keep convolving filters over the image, the outputs take very different values at curves, points, and shapes that stand out from the rest of the image, which helps the network find shapes, edges, and contours. As we go deeper into the network, each new convolutional layer convolves the output of the previous one, so these values change more drastically and the shapes and curves become clearer to the network, until it can learn the difference between an apple and a pineapple. In layman's terms, the network draws the image by tracing a real painting, and the convolutional layers are the tracers or highlighters. In this model, the two convolutional layers use 32 and 64 filters respectively, and each filter is a 3x3 matrix.

Photo by Charles Deluvio on Unsplash
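To make the 32 and 64 filters concrete, the sketch below counts the trainable parameters of two stacked 3x3 convolutional layers on an RGB input, assuming each layer has a bias term (the Keras default). Each of the 64 filters in the second layer spans all 32 feature maps produced by the first layer, which is how deeper layers combine simpler patterns into more complex shapes.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Trainable parameters of a Conv2D layer: one kernel_size x kernel_size x
    in_channels weight block per filter, plus one bias per filter."""
    kh, kw = kernel_size
    return (kh * kw * in_channels + 1) * filters


# Assumed stack: RGB input -> 32 filters of 3x3 -> 64 filters of 3x3.
first = conv2d_params((3, 3), in_channels=3, filters=32)    # (27 + 1) * 32
second = conv2d_params((3, 3), in_channels=32, filters=64)  # (288 + 1) * 64
print(first, second)  # 896 18496
```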

Selecting a Network Architecture for CNN

You can use cross-validation to compare well-studied network architectures such as LeNet-5, AlexNet, and VGG, or design your own architecture that avoids exploding or vanishing gradients and does not overfit the data. Here I used a basic CNN architecture. For more advanced models, I suggest studying published network architectures and how filters (kernels) are built, including features such as rotations and horizontal/vertical filters.

Padding

Padding adds an extra border around the pixel matrix, for example a border of 0s, so that the information at the edges and corners is not lost during convolution. As you can see in the image, without padding each corner pixel falls under the filter only once, but with padding the original corner pixels are covered several times during convolution.
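The corner effect can be checked numerically. This sketch pads a 5x5 image with a one-pixel border of zeros using np.pad and counts how many 3x3 windows (stride 1) cover each original pixel, with and without that border; coverage is a hypothetical helper written only for this illustration.

```python
import numpy as np


def coverage(h, w, k=3, pad=0):
    """Count how many k x k windows (stride 1) touch each original pixel."""
    counts = np.zeros((h + 2 * pad, w + 2 * pad), dtype=int)
    for i in range(h + 2 * pad - k + 1):
        for j in range(w + 2 * pad - k + 1):
            counts[i:i + k, j:j + k] += 1
    # Drop the padding border so only the original pixels remain.
    return counts[pad:pad + h, pad:pad + w]


image = np.arange(25).reshape(5, 5)
padded = np.pad(image, 1, mode="constant", constant_values=0)  # zero border
print(padded.shape)                 # (7, 7)
print(coverage(5, 5)[0, 0])         # corner without padding: 1
print(coverage(5, 5, pad=1)[0, 0])  # corner with a 1-pixel zero border: 4
```

With the one-pixel border, a 3x3 convolution also produces a 5x5 output, the same size as the input, which is what padding="same" does in Keras for stride 1.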

Pooling

After each convolution we add a max-pooling layer. Max pooling reduces the dimensions of the convolutional output, in other words the size of the output image, which speeds up processing and makes the detected features less sensitive to small shifts in position.

For a basic intuition of what happens in pooling: we choose a window size, for example 2x2, and divide the output matrix into 2x2 blocks. From each block we keep only the maximum value, the point where a shape or object was recorded most strongly. This shrinks the output image (a 2x2 window halves its width and height), which speeds up processing as we go deeper into the network.
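A NumPy sketch of 2x2 max pooling with stride 2, i.e. non-overlapping windows, which is the default behaviour of a 2x2 max-pooling layer. The feature-map values are made up for illustration.

```python
import numpy as np


def max_pool2d(x, size=2):
    """Non-overlapping max pooling (stride == size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    # Trim leftover rows/cols, split into size x size blocks, keep each block's max.
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))


feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 5],
                        [1, 1, 3, 7]])
print(max_pool2d(feature_map))
# [[6 2]
#  [2 9]]
```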

Source: deeplizard, https://www.youtube.com/watch?v=ZjM_XQa5s6s

To understand how the image dimensions change as layers are added, please refer to Andrew Ng's video on classic CNN networks.
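The dimension changes can be worked out with the standard output-size formula, out = floor((n + 2p - f) / s) + 1, for input size n, filter size f, padding p, and stride s. The sketch below traces a 32x32 RGB image through an assumed stack of Conv(32, 3x3), 2x2 max pool, Conv(64, 3x3), 2x2 max pool, all without padding; the article's exact layer settings may differ.

```python
def conv_out(n, f, p=0, s=1):
    """Output size along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def trace_shapes(size=32, channels=3):
    """Shapes after each layer of the assumed stack (no padding)."""
    shapes = [(size, size, channels)]
    for filters in (32, 64):
        size = conv_out(size, f=3)       # 3x3 convolution, stride 1
        shapes.append((size, size, filters))
        size = conv_out(size, f=2, s=2)  # 2x2 max pool, stride 2
        shapes.append((size, size, filters))
    return shapes


for shape in trace_shapes():
    print(shape)
# (32, 32, 3) -> (30, 30, 32) -> (15, 15, 32) -> (13, 13, 64) -> (6, 6, 64)
```

The stack ends at 6x6x64, i.e. 2,304 values after flattening. With p=1 ("same" padding), each convolution would keep the size instead: conv_out(32, f=3, p=1) is 32.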

  • Now it's time to add the input and output layers and tune the ANN model.
  • Save the model weights so you can test the model on random images from the internet and see whether it can identify the fruits and vegetables.
model.save_weights("cnn_fruit.h5")
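In Keras, the output layer of a classifier like this is typically a Flatten layer followed by Dense(number_of_classes, activation="softmax"). Because save_weights stores only the weights, testing on new images means rebuilding the same architecture, calling model.load_weights("cnn_fruit.h5"), and preprocessing each image the same way as the training data (32x32, scaled to [0, 1]) before calling model.predict. The NumPy sketch below only shows what that final layer computes and how its output maps to a class name; the sizes, class names, and random weights are hypothetical, so its predictions are meaningless until real trained weights are used.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax: turns class scores into probabilities."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def dense_softmax_head(feature_maps, weights, bias):
    """Flatten the last feature maps and apply a Dense layer with softmax."""
    flat = feature_maps.reshape(feature_maps.shape[0], -1)  # (batch, features)
    return softmax(flat @ weights + bias)                    # (batch, classes)


def decode(probs, class_names):
    """Map each row of probabilities to the most likely class name."""
    return [class_names[i] for i in probs.argmax(axis=1)]


# Hypothetical sizes: a batch of 2 images, 6x6x64 feature maps, 3 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 6, 6, 64))
W = rng.normal(scale=0.01, size=(6 * 6 * 64, 3))  # untrained, random weights
b = np.zeros(3)
probs = dense_softmax_head(features, W, b)
print(decode(probs, ["Apple", "Banana", "Pineapple"]))
```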

Results

By adding convolutional layers to the model, I reached 98% accuracy. After making this your first project, you should have a basic, intuitive understanding of CNNs and can delve more deeply into the mathematics, the fundamentals, and network selection and design.

cnn_fruit.h5 Results
CNN Custom Architecture Results

Freezing the Custom CNN Model for Integration into a Web App

This code freezes your custom model into the Keras checkpoint format; from there you can build inference graphs to integrate the model into the app.

For the complete Jupyter notebook and code, see my GitHub repository: https://github.com/Alexamannn/CNN_from_scratch
