Image - CNN clarification

I want to clarify a few points about convolutional neural networks (CNNs). I am implementing image recognition with a CNN in Keras.

1 - Can reducing the size of the images help the model learn faster? I read some blogs in which the image size is reduced from (150,150) to (32,32).

2 - Does increasing the number of layers and nodes help increase accuracy? I started training the model with one convolutional layer, but the accuracy was low. I then added two more convolutional layers, which gave a higher accuracy of about 74. After that, I added one more layer, and the accuracy stayed in the same range, in the 70s.

3 - Is there a way to look at the images after every layer of a CNN using Keras? It would help me study how the CNN processes the image.

Thank you

2 answers

  • answered 2019-09-10 03:16 Pavan Kumar Polavarapu

    Let me attempt to answer your queries.

    1. Reducing the image size helps the model learn faster and reduces the memory requirement. A 150 x 150 pixel input requires more nodes in a single layer of the neural network, and therefore more memory, than a 32 x 32 one. I am not sure how squashing compares with center cropping in terms of accuracy.
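
    To put rough numbers on the memory argument above, here is a minimal sketch. It assumes RGB images (3 channels) and a hypothetical fully connected layer of 128 units attached directly to the flattened input; the helper names `input_values` and `dense_params` are made up for illustration. Note that the weight count of a convolutional layer does not depend on the image size, but the size of its activations and the compute per image do.

```python
def input_values(height, width, channels=3):
    """Number of scalar values in one image of the given shape."""
    return height * width * channels


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


large = input_values(150, 150)  # 67500 values per image
small = input_values(32, 32)    # 3072 values per image

print("values per image:", large, "vs", small)
print("size ratio:", large / small)

# Hypothetical 128-unit dense layer on the flattened image
print("dense params (150x150):", dense_params(large, 128))  # 8640128
print("dense params (32x32):  ", dense_params(small, 128))  # 393344
```

    The smaller input carries roughly 22 times fewer values per image, which is where the savings in memory and training time come from.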

    2. Increasing the number of nodes does not necessarily improve accuracy. In many scenarios, ensemble methods such as bagging and boosting can improve accuracy far more than adding layers. Deeper networks are harder to train and overfit more easily, but dropout, proper data normalization, suitable activation functions, and sufficient training data should help improve accuracy with multiple layers.

    3. The short answer is yes. You can do it by reshaping the feature vectors at the end of each layer to the original image shape, for which you have to define your own neural network.
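
    In Keras, one common alternative to writing a custom network is to build a second model that reuses the trained layers and returns their intermediate outputs. The following is a minimal sketch under stated assumptions: the tiny architecture, the layer names, and the random 32x32 input are placeholders for illustration, not the asker's actual model or data.

```python
import numpy as np
from tensorflow import keras

# Hypothetical small CNN standing in for the real (trained) model
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, activation="relu", name="conv1"),
    keras.layers.MaxPooling2D(name="pool1"),
    keras.layers.Conv2D(16, 3, activation="relu", name="conv2"),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

# Collect the output tensors of all convolutional layers
conv_layers = [layer for layer in model.layers
               if isinstance(layer, keras.layers.Conv2D)]
activation_model = keras.Model(
    inputs=model.inputs,
    outputs=[layer.output for layer in conv_layers],
)

# Random placeholder image; use a real preprocessed image in practice
image = np.random.rand(1, 32, 32, 3).astype("float32")
activations = activation_model.predict(image, verbose=0)

for layer, act in zip(conv_layers, activations):
    print(layer.name, act.shape)
```

    Each channel `activations[i][0, :, :, c]` is a 2D array, so it can be displayed as a grayscale image, for example with matplotlib's `imshow`.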

  • answered 2019-09-10 04:09 eugen

    Here are my inputs regarding your questions:

    1) In an ideal world, you could take a photo or video of anything with a camera as big as you need. This approach is not practical, although it can result in quite good performance: the more noise-free features you provide, the more accurate your neural net tends to be. But where would you use a network that needs such a big camera to generate its input? For all practical purposes you use a camera with a lens no bigger than a tennis ball. At the same time, it is expensive to train on images larger than 64x64 pixels: with any decent neural net, the batch size you can fit on a single GPU shrinks quickly as the images grow. This also increases the training time, and you may end up waiting days for it to finish.

    2) More layers do not necessarily produce better results, and there are other ways to improve accuracy. One problem is that the more layers you have, the more likely you are to face the exploding or vanishing gradient problem. Also make sure you are regularizing your model, providing enough training data, and that the training distribution is similar to the validation/test set distribution. As you can see, many aspects affect the accuracy of a neural net, and these were just some of them.
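
    The vanishing gradient effect can be illustrated with a toy calculation. This is a minimal sketch, not a real CNN: it assumes a chain of scalar sigmoid units, a hypothetical weight of 1.0 on every connection, and an arbitrary input of 0.5. Because the derivative of the sigmoid is at most 0.25, the gradient reaching the input shrinks by at least a factor of 4 per layer in this setup.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient_through_chain(x, weights):
    """d(output)/d(input) for a chain of scalar units a_next = sigmoid(w * a)."""
    a = x
    grad = 1.0
    for w in weights:
        a = sigmoid(w * a)
        # Chain rule: sigmoid'(z) = a * (1 - a), multiplied by the weight w
        grad *= w * a * (1.0 - a)
    return grad


for depth in (1, 2, 4, 8, 16):
    grad = gradient_through_chain(0.5, [1.0] * depth)
    print(f"depth {depth:2d}: gradient w.r.t. input = {grad:.3e}")
```

    Real networks mitigate this with choices such as ReLU activations, careful weight initialization, batch normalization, or residual connections, which is one reason depth alone does not determine accuracy.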

    3) Yes, you can do it. It would take some time to write it up thoroughly, so I am leaving you a link to a Medium article. It is more detailed and has what you need: