Image classification using Convolutional neural network - machine-learning

I'm trying to classify hotel image data using a convolutional neural network.
Below are some highlights:
Image preprocessing:
converting to gray-scale
resizing all images to same resolution
normalizing image data
finding PCA components
Convolutional neural network:
Input- 32*32
convolution- 16 filters, 3*3 filter size
pooling- 2*2 filter size
dropout- dropping with 0.5 probability
fully connected- 256 units
dropout- dropping with 0.5 probability
output- 8 classes
Libraries used:
Lasagne
nolearn
But I'm getting low accuracy on the test data, only around 28%.
Any possible reasons for such low accuracy? Any suggested improvements?
Thanks in advance.
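For reference, the setup described above corresponds roughly to the following nolearn/Lasagne sketch; the layer names, learning rate, and epoch count are assumptions, not values taken from the question:

```python
from lasagne import layers
from lasagne.nonlinearities import softmax
from nolearn.lasagne import NeuralNet

# Rough sketch of the architecture described in the question:
# 32x32 grayscale input -> conv(16, 3x3) -> maxpool(2x2) -> dropout(0.5)
# -> dense(256) -> dropout(0.5) -> softmax over 8 classes.
net = NeuralNet(
    layers=[
        ('input',    layers.InputLayer),
        ('conv1',    layers.Conv2DLayer),
        ('pool1',    layers.MaxPool2DLayer),
        ('dropout1', layers.DropoutLayer),
        ('hidden1',  layers.DenseLayer),
        ('dropout2', layers.DropoutLayer),
        ('output',   layers.DenseLayer),
    ],
    input_shape=(None, 1, 32, 32),   # 1 channel because of the grayscale conversion
    conv1_num_filters=16, conv1_filter_size=(3, 3),
    pool1_pool_size=(2, 2),
    dropout1_p=0.5,
    hidden1_num_units=256,
    dropout2_p=0.5,
    output_num_units=8, output_nonlinearity=softmax,
    update_learning_rate=0.01,       # assumed value
    max_epochs=50,                   # assumed value
    verbose=1,
)
```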

There are several possible reasons for low accuracy on test data, so without more information and a healthy amount of experimentation, it will be impossible to provide a concrete answer. Having said that, there are a few points worth mentioning:
As @lejlot mentioned in the comments, the PCA pre-processing step is suspicious. The fundamental CNN architecture is designed to require minimal pre-processing, and it's crucial that the basic structure of the image remains intact. This is because CNNs need to be able to find useful, spatially-local features.
For detecting complex objects from image data, it's likely that you'll benefit from more convolutional layers. Chances are, given the simple architecture you've described, that it simply doesn't possess the necessary expressiveness to handle the classification task.
Also, you mention you apply dropout after the convolutional layer. In general, the research I've seen indicates that dropout is not particularly effective on convolutional layers. I personally would recommend removing it to see if it has any impact. If you do wind up needing regularization on your convolutional layers, (which in my experience is often unnecessary since the shared kernels often already act as a powerful regularizer), you might consider stochastic pooling.
One of the most important tips I can give is to build a solid mechanism for measuring the quality of the model and then experiment. Try modifying the architecture and then tuning hyper-parameters to see what yields the best results. In particular, make sure to monitor training loss vs. validation loss so that you can identify when the model begins overfitting.
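As a minimal sketch of that monitoring (assuming the nolearn NeuralNet above, which records a train_history_ list of per-epoch dictionaries), you could plot the two losses like this:

```python
# After net.fit(X, y), nolearn keeps one dict per epoch in net.train_history_.
# Plotting train vs. validation loss makes overfitting easy to spot: the
# validation curve starts rising while the training curve keeps falling.
import matplotlib.pyplot as plt

train_loss = [row['train_loss'] for row in net.train_history_]
valid_loss = [row['valid_loss'] for row in net.train_history_]

plt.plot(train_loss, label='training loss')
plt.plot(valid_loss, label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
```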

Since the 2012 ImageNet competition, the convolutional neural networks that perform well (state of the art) all add more convolutional layers; they even use zero padding so that they can stack more of them.
Increase the number of convolutional layers.
Some say that dropout is not that effective on CNNs, but it is not bad to use; however, you should lower the dropout value, maybe to 0.2.
Analyse how much data you have. If it is low, you should use data augmentation techniques.
If you have much more data for one of the labels, you are stuck with the imbalanced-data problem, but you should not worry about that for now.
You can also consider fine-tuning from VGG-Net or some other pre-trained CNN.
Also, don't convert to grayscale; after the image-to-array transformation, you should just divide by 255.
I think you learned CNNs from a tutorial (MNIST) and assumed you should therefore convert to grayscale.
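As a rough illustration of the "divide by 255" and data-augmentation points, here is a plain NumPy sketch; the (N, height, width, channels) array layout is an assumption:

```python
import numpy as np

def preprocess(images):
    """Scale RGB images to [0, 1] instead of converting to grayscale."""
    return images.astype(np.float32) / 255.0

def augment(images, labels):
    """Very simple augmentation: add horizontally flipped copies of every image."""
    flipped = images[:, :, ::-1, :]   # flip along the width axis
    return (np.concatenate([images, flipped]),
            np.concatenate([labels, labels]))

# X: uint8 array of shape (N, H, W, 3), y: integer labels of shape (N,)
# X_aug, y_aug = augment(preprocess(X), y)
```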

Related

What are the benefits of fully convolutional neural networks, compared to CNNs with pooling?

I've been reading up a bit on different CNNs for object detection, and have found that most of the models I'm looking at are fully convolutional networks, like the latest YOLO versions and RetinaNet.
What are the benefits of FCNs over conventional CNNs with pooling, apart from FCNs having fewer distinct layer types? I've read https://arxiv.org/pdf/1412.6806.pdf and, as I read it, the main interest of that paper was to simplify the network's structure. Is this the sole reason that modern detection/classification networks don't use pooling, or are there other benefits?
With FCNs we avoid the use of dense layers, which means fewer parameters, and because of that the network can learn faster.
If you avoid pooling, your output will have the same height/width as your input. But our goal is to reduce the spatial size of the feature maps, because that is much more computationally efficient. Also, pooling lets us go deeper: as we move through higher layers, individual neurons "see" more of the input. In addition, it helps to propagate information across different scales.
Usually those networks consist of a down-sampling path to extract all the necessary features and an up-sampling path to reconstruct the high-level features back to the original image dimensions.
There are some architectures, like "The All Convolutional Net" by Springenberg et al., that avoid pooling in favor of speed and simplicity. In that paper the authors replaced all pooling operations with stride-2 convolutions and used global average pooling at the output layer. The global average pooling operation reduces each feature map of its input to a single value.
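A minimal Lasagne-style sketch of that idea; the filter counts, class count, and input shape are placeholders, not values from the paper:

```python
from lasagne import layers
from lasagne.nonlinearities import softmax

# Conventional block: convolution followed by max pooling.
def conv_pool_block(incoming, num_filters):
    l = layers.Conv2DLayer(incoming, num_filters=num_filters, filter_size=3, pad=1)
    return layers.MaxPool2DLayer(l, pool_size=2)

# "All convolutional" block: a stride-2 convolution does the down-sampling itself.
def all_conv_block(incoming, num_filters):
    l = layers.Conv2DLayer(incoming, num_filters=num_filters, filter_size=3, pad=1)
    return layers.Conv2DLayer(l, num_filters=num_filters, filter_size=3, pad=1, stride=2)

net = layers.InputLayer(shape=(None, 3, 32, 32))
net = all_conv_block(net, 32)
net = all_conv_block(net, 64)
# 1x1 convolution down to one feature map per class, then global average
# pooling collapses each map to a single score; no dense layers are needed.
net = layers.Conv2DLayer(net, num_filters=10, filter_size=1)
net = layers.GlobalPoolLayer(net)
net = layers.NonlinearityLayer(net, nonlinearity=softmax)
```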

Inception V3 Image Classification

How can I understand what features the Google Inception V3 model is using to classify a set of images, i.e. which features or pixels of the images are most significant for classifying them?
For instance, if the classifier were to distinguish between a Cheetah and a Leopard, it would probably do so by judging based on their spots. How can I determine what aspects of my images the classifier values most?
Your question is not easily answerable. Neural nets in general are composed of hierarchical features: in the initial layers the network may learn to detect edges and blobs, and in the deeper layers it learns more abstract features. So in an n-class classification problem, where n might be a large number, it is notoriously difficult to interpret what exactly the network learns and uses to classify images. Having said that, work has obviously been done on this; I will refer you to https://distill.pub/2017/feature-visualization/, which should help you a bit.
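One concrete (if crude) way to probe this is occlusion sensitivity: slide a grey patch over the image and watch how the predicted probability of the target class drops. A framework-agnostic sketch, assuming you have a predict_fn that maps an image to a vector of class probabilities (the patch and stride sizes are arbitrary placeholders):

```python
import numpy as np

def occlusion_map(image, predict_fn, target_class, patch=32, stride=16):
    """Return a heat map of how much occluding each region lowers the
    target-class probability. predict_fn(image) -> 1-D probability vector."""
    h, w = image.shape[:2]
    baseline = predict_fn(image)[target_class]
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = image.mean()  # grey out one patch
            heat[i, j] = baseline - predict_fn(occluded)[target_class]
    return heat  # large values = regions the classifier relies on
```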

Facial recognition and classifying unknowns with neural networks

As far as I understand, neural networks aren't good at classifying 'unknowns', i.e. items that do not belong to a learned class. But how do face detection/recognition approaches usually determine that no face is detected/recognised in a region? Is the predicted probability somehow thresholded?
Summary
It is true that neural networks are inherently not good at classifying 'unknowns', because they tend to overfit to the data they have been trained on if the underlying structure of the network is complex enough. However, there are multiple ways to reduce the effects of overfitting. For example, one technique used for this is called dropout. Another example is batch normalization. Despite these techniques, the best way to reduce the effects of overfitting is to use more data.
For the facial recognition example you have given above, the models that get trained have typically 'seen' a huge amount of data. This means that there are very few 'unknowns', and even when there are, the neural network has learned how to tell whether facial features are present or not. This is because certain neural network structures are really good at telling whether a pattern of features is present in the input data, which helps the network learn whether the input image contains certain features/patterns or not. If these features are found, the input is classified as a face; otherwise it is not.
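On the thresholding part of the question: a common and simple approach is indeed to reject a prediction whose maximum class probability is below some threshold. A minimal sketch, where the 0.8 threshold is an arbitrary placeholder:

```python
import numpy as np

def classify_or_reject(probabilities, threshold=0.8):
    """probabilities: 1-D softmax output. Returns the class index,
    or -1 ('unknown') if the network is not confident enough."""
    best = int(np.argmax(probabilities))
    return best if probabilities[best] >= threshold else -1

# Example: a flat distribution over 8 classes is rejected as 'unknown'.
print(classify_or_reject(np.full(8, 1 / 8)))            # -> -1
print(classify_or_reject(np.array([0.9, 0.05, 0.05])))  # -> 0
```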

Stratified sampling for regression

I need to do regression analysis using SVM kernels on large sets of data. My laptop is not able to handle it, and it takes hours to finish running. Is there any good way to reduce the dataset size without affecting the quality of the model (much)? Will stratified sampling work?
There are dozens of ways of reducing SVM complexity; probably the easiest ones involve approximating the kernel space projection. In particular, libraries such as scikit-learn provide functions to do this kind of explicit projection, which, followed by a linear SVM, can be trained relatively fast.
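A minimal scikit-learn sketch of that idea: approximate the RBF kernel explicitly, then train a linear SVM regressor on the transformed features. The gamma, C, and n_components values are placeholders that would need tuning:

```python
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

# Approximate an RBF kernel map with a few hundred components, then fit a
# linear model on the projected features; this scales far better than a
# full kernel SVM on large datasets.
model = make_pipeline(
    Nystroem(kernel='rbf', gamma=0.1, n_components=300),
    LinearSVR(C=1.0),
)
# model.fit(X_train, y_train); predictions = model.predict(X_test)
```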

Using Caffe to classify "hand-crafted" image features

Does it make any sense to perform feature extraction on images using, e.g., OpenCV, then use Caffe for classification of those features?
I am asking this as opposed to the traditional way of passing the images directly to Caffe, and letting Caffe do the extraction and classification procedures.
Yes, it does make sense, but it may not be the first thing you want to try:
If you have already extracted hand-crafted features that are suitable for your domain, there is a good chance you'll get satisfactory results by using an easier-to-use machine learning tool (e.g. libsvm).
Caffe can be used in many different ways with your features. If they are low-level features (e.g. Histogram of Oriented Gradients), then several convolutional layers may be able to extract the appropriate mid-level features for your problem. You may also use Caffe as an alternative non-linear classifier (instead of an SVM). You have the freedom to try (too) many things, but my advice is to first try a machine learning method with a smaller meta-parameter space, especially if you're new to neural nets and Caffe.
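As a hedged sketch of that "easier tool first" route, here is hand-crafted feature extraction with scikit-image HOG features followed by a scikit-learn SVM instead of Caffe; file loading and label handling are left out, and all images are assumed to share the same resolution so the descriptors have equal length:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def extract_hog(images):
    """images: iterable of same-sized 2-D grayscale arrays -> matrix of HOG descriptors."""
    return np.array([hog(img, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for img in images])

# X_train/X_test: lists of grayscale images, y_train: class labels
# clf = SVC(kernel='rbf', C=1.0).fit(extract_hog(X_train), y_train)
# predictions = clf.predict(extract_hog(X_test))
```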
Caffe is a tool for training and evaluating deep neural networks. It is quite a versatile tool allowing for both deep convolutional nets as well as other architectures.
Of course it can be used to process pre-computed image features.
