Downsampling reduces the spatial dimensions, but why is downsampling needed in the U-Net segmentation architecture?
Most modern convolutional neural network architectures (U-Net among them) use max-pooling to downsample rather than relying only on convolution stride, because pooling introduces a small amount of translation invariance into the network and is faster to compute.
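As a concrete illustration (a minimal tensorflow.keras sketch; the framework is my assumption, not the answer's), both a 2x2 max-pool and a stride-2 convolution halve the spatial resolution of a feature map:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 64, 64, 16))  # batch, height, width, channels

# Max-pooling: fixed, parameter-free downsampling with slight translation invariance.
pooled = layers.MaxPooling2D(pool_size=2)(x)

# Strided convolution: learned downsampling, for when stride alone does the job.
strided = layers.Conv2D(16, 3, strides=2, padding="same")(x)

print(pooled.shape, strided.shape)  # both (1, 32, 32, 16)
```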
The related question below might give you more information.
Related
I've been reading up a bit on different CNNs for object detection, and have found that most of the models I'm looking at are fully convolutional networks, like the latest YOLO versions and RetinaNet.
What are the benefits of FCNs over conventional CNNs with pooling, apart from FCNs having fewer kinds of layers? I've read https://arxiv.org/pdf/1412.6806.pdf and, as I read it, the main interest of that paper was to simplify the network's structure. Is this the sole reason that modern detection/classification networks don't use pooling, or are there other benefits?
With FCNs we avoid dense layers, which means fewer parameters, and because of that the network can learn faster.
If you avoid pooling, your output will have the same height/width as your input. But our goal is to reduce the size of the feature maps, because that is much more computationally efficient. Also, with pooling we can go deeper: as we move through higher layers, individual neurons “see” more of the input. In addition, it helps to propagate information across different scales.
Usually those networks consist of a down-sampling path to extract all the necessary features and an up-sampling path to reconstruct high-level features back to the original image dimensions.
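A minimal sketch of that down/up-sampling structure (in tensorflow.keras; the framework and all layer sizes are my assumptions, chosen only for illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(128, 128, 3))

# Down-sampling path: convolutions + pooling shrink the feature maps.
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)  # 128 -> 64
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)  # 64 -> 32

# Up-sampling path: transposed convolutions restore the resolution.
x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)  # 32 -> 64
x = layers.Conv2DTranspose(8, 3, strides=2, padding="same", activation="relu")(x)   # 64 -> 128
outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel prediction

model = tf.keras.Model(inputs, outputs)
model.summary()
```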
There are architectures, such as "The All Convolutional Net" by Springenberg et al., that in a sense avoid pooling in favor of speed and simplicity. In this paper the authors replaced all pooling operations with stride-2 convolutions and used global average pooling at the output layer. The global average pooling operation collapses each feature map to a single value.
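A sketch of that all-convolutional idea (again in tensorflow.keras, my choice of framework; the paper's actual models are deeper and specified differently):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),  # replaces 2x2 pooling
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),  # replaces 2x2 pooling
    layers.Conv2D(10, 1, padding="same"),  # one feature map per class
    layers.GlobalAveragePooling2D(),       # (batch, 8, 8, 10) -> (batch, 10)
    layers.Softmax(),
])
model.summary()
```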
Can I use a CNN architecture for binary classification on these types of images (posted below)?
Currently I am using 3 convolutional + 2 fully-connected layers but not getting good results. I have a sufficient amount of data as well. I tried transfer learning with Inception V3, but it overfits in all cases of layer locking.
Is there a different way of classifying such images, given that the features to be extracted are limited here?
Semantic segmentation converts images into a kind of pixel-wise class map, but that is a totally different paradigm.
Before being fed to the neural network, kernels are applied to images for feature extraction. But how do we know that a particular kernel will help to extract the features the neural network requires?
There is absolutely no general answer to this question; no principled method to determine these hyperparameters is known. A conventional approach is to look for similar problems and deep learning architectures which have already been shown to work. A suitable architecture can then be developed by experimentation. However, conventional kernel sizes are 3x3, 5x5 and 7x7.
Otherwise, there are papers about this (1 and 2); you may want to take a look to see the art of choosing hyperparameters in CNNs.
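Since the practical advice here is to experiment, a minimal experiment loop comparing the conventional 3x3, 5x5 and 7x7 kernels could look like the sketch below (tensorflow.keras is my assumption; the toy model is purely illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(kernel_size):
    # A deliberately tiny CNN; only the kernel size varies between runs.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, kernel_size, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])

for k in (3, 5, 7):
    model = build_model(k)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
    print(f"kernel {k}x{k}: {model.count_params()} parameters")
```

Pick whichever kernel size gives the best validation accuracy on your own data.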
I need to do regression analysis using SVM kernels on large sets of data. My laptop is not able to handle it, and it takes hours to finish running. Is there any good way to reduce the dataset size without affecting the quality of the model (much)? Will stratified sampling work?
There are dozens of ways of reducing SVM complexity; probably the easiest ones involve approximating the kernel-space projection. In particular, libraries such as scikit-learn provide functions to do this kind of explicit projection which, followed by a linear SVM, can be trained relatively fast.
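A sketch of that approach with scikit-learn: a Nystroem approximation of the RBF kernel feature map followed by a linear SVM regressor (the synthetic data here is only a stand-in for your own):

```python
from sklearn.datasets import make_regression
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

# Stand-in data; substitute your own large regression dataset.
X, y = make_regression(n_samples=50_000, n_features=20, noise=0.1)

model = make_pipeline(
    Nystroem(kernel="rbf", n_components=300),  # explicit approximate kernel projection
    LinearSVR(C=1.0, max_iter=10_000),         # fast linear SVM on the projected features
)
model.fit(X, y)
print(model.score(X, y))
```

This trains far faster than a full kernel SVR, whose cost grows roughly quadratically with the number of samples.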
I'm trying to classify hotel image data using a convolutional neural network.
Below are some highlights:
Image preprocessing:
converting to gray-scale
resizing all images to same resolution
normalizing image data
finding PCA components
Convolutional neural network:
Input: 32x32
Convolution: 16 filters, 3x3 filter size
Pooling: 2x2 filter size
Dropout: dropping with 0.5 probability
Fully connected: 256 units
Dropout: dropping with 0.5 probability
Output: 8 classes
Libraries used:
Lasagne
nolearn
But I'm getting low accuracy on the test data: only around 28%.
Any possible reasons for such low accuracy? Any suggested improvements?
Thanks in advance.
There are several possible reasons for low accuracy on test data, so without more information and a healthy amount of experimentation, it will be impossible to provide a concrete answer. Having said that, there are a few points worth mentioning:
As @lejlot mentioned in the comments, the PCA pre-processing step is suspicious. The fundamental CNN architecture is designed to require minimal pre-processing, and it's crucial that the basic structure of the image remains intact, because CNNs need to be able to find useful, spatially-local features.
For detecting complex objects from image data, it's likely that you'll benefit from more convolutional layers. Chances are, given the simple architecture you've described, that it simply doesn't possess the necessary expressiveness to handle the classification task.
Also, you mention you apply dropout after the convolutional layer. In general, the research I've seen indicates that dropout is not particularly effective on convolutional layers. I personally would recommend removing it to see if that has any impact. If you do wind up needing regularization on your convolutional layers (which in my experience is often unnecessary, since the shared kernels already act as a powerful regularizer), you might consider stochastic pooling.
Among the most important tips I can give is to build a solid mechanism for measuring the quality of the model and then experiment. Try modifying the architecture and then tuning hyper-parameters to see what yields the best results. In particular, make sure to monitor training loss vs. validation loss so that you can identify when the model begins overfitting.
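For instance, a self-contained sketch of that measurement loop (in tensorflow.keras with stand-in random data; the question itself uses Lasagne/nolearn, so treat this only as the shape of the idea):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in data and model so the sketch runs end to end.
x_train = np.random.rand(500, 32, 32, 1).astype("float32")
y_train = np.random.randint(0, 8, size=500)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(8, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

history = model.fit(x_train, y_train, validation_split=0.2,  # hold out 20%
                    epochs=10, verbose=0)

# Diverging curves (falling training loss, rising validation loss) mark the
# onset of overfitting.
for epoch, (tl, vl) in enumerate(zip(history.history["loss"],
                                     history.history["val_loss"])):
    print(f"epoch {epoch}: train {tl:.3f}  val {vl:.3f}")
```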
Since the 2012 ImageNet competition, the convolutional networks that perform best (state of the art) have kept adding convolutional layers; they even use zero padding so the feature maps stay large enough to stack more of them. So: increase the number of convolutional layers.
Some say that dropout is not that effective on CNNs. It is not bad to use, but you should lower the dropout rate and experiment (maybe 0.2).
Analyse your data. If there is too little of it, use data augmentation techniques (a minimal sketch follows below).
If you have much more data for some of the labels than others, you have an imbalanced-data problem, but you should not worry about that for now.
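A minimal augmentation sketch using tensorflow.keras's ImageDataGenerator (my choice of tool; the advice above doesn't name one):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

x_train = np.random.rand(100, 32, 32, 3)      # stand-in images
y_train = np.random.randint(0, 8, size=100)   # stand-in labels

augmenter = ImageDataGenerator(
    rotation_range=15,       # random rotations up to 15 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True,    # mirror images left/right
)

batches = augmenter.flow(x_train, y_train, batch_size=32)
images, labels = next(batches)
print(images.shape)  # (32, 32, 32, 3): a freshly augmented batch
```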
You should also consider fine-tuning from VGG-Net or some other pre-trained CNN (see the sketch below).
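A fine-tuning sketch with the VGG16 weights bundled in tensorflow.keras (the framework and the head layers are my assumptions): freeze the pre-trained convolutional base and train only a small new classification head.

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # lock the pre-trained convolutional layers

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.2),                    # the lower dropout rate suggested above
    layers.Dense(8, activation="softmax"),  # 8 hotel-image classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```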
Also, don't convert to grayscale; after the image-to-array transformation, just divide by 255 to normalize. I suspect you learned CNNs from an MNIST tutorial and assumed you should convert to grayscale, but for natural images like these, color is useful information.