I know it might be a silly question, but I am fairly new to machine learning and ANNs.
Is there any difference between a deep convolutional neural network and a dense convolutional neural network?
Thanks in advance!
A dense CNN is a type of deep CNN in which each layer is connected to every layer deeper than itself.
What does that mean?
In a normal CNN each layer is connected only to the next one. Consider 4 layers: the output of L1 is connected only to L2, the output of L2 only to L3, and the output of L3 only to L4.
In a dense CNN with the same 4 layers, the output of L1 is connected to L2, L3, and L4; the output of L2 to L3 and L4; and the output of L3 to L4.
Here is a figure to illustrate it (the image is taken from this paper):
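The connectivity pattern can be sketched in a few lines of numpy. This is only an illustration of the wiring, not a real DenseNet: `layer` here is just a random linear map standing in for a convolution, and the sizes (16 input features, a growth of 8 per layer) are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, out_dim):
    # Stand-in for a conv layer: a random linear map + ReLU (illustration only).
    w = rng.standard_normal((x.shape[0], out_dim))
    return np.maximum(x @ w, 0)

def dense_block(x, num_layers=4, growth=8):
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features)   # every earlier output feeds this layer
        features.append(layer(inp, growth))
    return np.concatenate(features)      # final output carries all feature maps

x = rng.standard_normal(16)
y = dense_block(x)
# input (16 features) + 4 layers x 8 new features each = 48 features out
```

The key line is the `np.concatenate(features)` before each layer: in a plain CNN each layer would see only the previous layer's output.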
Why do we need to do this?
Nowadays we have neural networks with 100 layers or even more. Neural networks are trained using backpropagation, in which the gradient (derivative) of the cost function is used to update the weights of each layer. As the gradient is propagated back through each additional layer, its magnitude tends to shrink, especially if you are using sigmoid activations. This makes training take longer, or sometimes the network doesn't train at all. This is known as the vanishing gradient problem. The direct connections in a dense CNN alleviate this problem.
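The effect is easy to see numerically: the sigmoid's derivative never exceeds 0.25, and backpropagation multiplies roughly one such factor per layer, so the gradient scale collapses with depth. A small sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # peaks at 0.25 when x == 0

# Backprop multiplies roughly one such factor per layer (weights ignored here),
# so even in the best case the gradient scale shrinks like 0.25 ** depth.
for depth in (5, 20, 100):
    print(depth, 0.25 ** depth)
```

At 20 layers the best-case factor is already below 1e-12, which is why deep sigmoid networks without shortcut connections barely learn in their early layers.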
Dense CNNs are also less prone to overfitting than normal CNNs.
For more, read the paper; it's pretty easy to follow.
my CNN network
Above is my network configuration.
I am training a CNN on pictures of size 192×192.
My target is an 11-class classification network.
However, the loss and the accuracy on the test dataset are very unstable. I have to run 15+ epochs to get a stable accuracy and loss, and the maximum accuracy is only 50%.
What can I do to improve the performance?
I would recommend first looking at widely known models such as VGG-16, LeNet, or VGG-19 and checking how their Conv2D and max-pooling layers are arranged.
Start with a very basic model without any batch normalization or Leaky ReLU layers: keep just the Conv2D and max-pooling layers and train your model for a few epochs.
Next, try other activations, such as ReLU or tanh, and try changing max pooling to average pooling.
If you are solving a classification problem, use a softmax layer at the end. Also, introduce Dense layer(s) after flattening.
Your dataset should be reasonably large, and the targets should be one-hot encoded if you wish to use the softmax layer.
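One-hot encoding itself is a one-liner; here is a minimal numpy sketch for the 11-class case mentioned above (the helper name `one_hot` is mine, not from any framework):

```python
import numpy as np

def one_hot(labels, num_classes=11):
    # One row per label; a single 1.0 in the column of the true class.
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

targets = one_hot([0, 3, 10])   # three samples from the 11 classes
```

Most frameworks provide an equivalent utility, so in practice you would use the built-in one rather than rolling your own.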
I am training a fully convolutional network with an encoder-decoder architecture for image segmentation, and I am currently using binary cross entropy loss for foreground/background prediction.
I have been searching and reading about why cross entropy loss is used instead of L1 or L2 losses. Cross entropy loss fails to capture the overall layout of the image, whereas L1 and L2 take the overall image reconstruction into account.
Essentially, semantic segmentation is a dense classification task: you need to classify every single pixel into one of the classes. Cross entropy loss usually performs better than L1 or L2 loss in classification tasks, while L1 or L2 loss is more suitable for regression problems. This article explains the difference well: Picking Loss Functions - A comparison between MSE, Cross Entropy, and Hinge Loss
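A tiny numpy comparison illustrates the point: on a confidently wrong pixel, cross entropy assigns a large and fast-growing penalty, while L2 (MSE) is bounded, which is one reason cross entropy trains classifiers better. The helper functions below are my own sketch, not from any particular library:

```python
import numpy as np

EPS = 1e-7

def bce(p, y):
    # Mean per-pixel binary cross entropy; clip to avoid log(0).
    p = np.clip(p, EPS, 1.0 - EPS)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def mse(p, y):
    return float(np.mean((p - y) ** 2))

# A confidently wrong pixel: true label 1, predicted foreground probability 0.01.
p = np.array([0.01])
y = np.array([1.0])
print(bce(p, y))   # about 4.6: a large, fast-growing penalty
print(mse(p, y))   # about 0.98: bounded by 1 no matter how wrong
```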
There is another family of loss functions for semantic segmentation: Dice loss and its variants. As far as I know, Dice-based losses dominate in the medical image segmentation field.
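A minimal soft Dice loss can be sketched as follows; the `smooth` term is a common trick to keep the ratio defined when both masks are empty, and exact formulations vary across papers:

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    # Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), with a smoothing term
    # so the ratio is defined even when both masks are empty.
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + smooth) / (np.sum(pred) + np.sum(target) + smooth)

perfect = np.ones(8)
print(dice_loss(perfect, perfect))       # 0.0 for a perfect match
print(dice_loss(np.zeros(8), perfect))   # close to 1.0 for no overlap
```

Unlike per-pixel cross entropy, Dice scores the overlap of the whole predicted mask against the target, which is why it copes better with the heavy foreground/background imbalance common in medical images.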
I am training MNIST on an 8-layer (1568-784-512-256-128-64-32-10) fully-connected deep neural network with a newly created activation function, shown in the figure below. The function looks a bit like ReLU, but it has a slight curve at the "kink".
It worked fine when I used it to train 5-, 6-, and 7-layer fully-connected neural networks. The problem arises with the 8-layer network: it learns for only the first few epochs and then stops learning (the test loss becomes NaN and the test accuracy drops to 9.8%). Why does this happen?
My other settings are as follows: dropout = 0.5, weight initialization = Xavier, learning rate = 0.1.
I believe this is the vanishing gradient problem, which usually occurs in deep networks. There is no hard and fast rule for solving it; my advice would be to reshape your network architecture.
See here: Avoiding vanishing gradient in deep neural networks
In the cs231n handout here, it says:
New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns... Hence, the best idea might be to train a linear classifier on the CNN codes.
I'm not sure what "linear classifier" means here. Does it refer to the last fully connected layer? (For example, AlexNet has three fully connected layers. Is the linear classifier the last fully connected layer?)
Usually when people say "linear classifier" they mean a linear SVM (support vector machine). A linear classifier learns a weight vector w and a threshold (aka "bias") b such that for each example x the sign of
<w, x> + b
is positive for the "positive" class and negative for the "negative" class.
The last (usually fully connected) layer of a neural net can be considered a form of linear classifier.
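As a sketch of the decision rule above, with made-up 4-dimensional "CNN codes" and hand-picked weights standing in for learned ones:

```python
import numpy as np

def linear_classify(w, b, x):
    # sign(<w, x> + b): positive -> class +1, negative -> class -1
    return 1 if np.dot(w, x) + b > 0 else -1

# Hypothetical feature vector sizes and weights (illustration only).
w = np.array([1.0, -2.0, 0.5, 0.0])
b = -0.5
print(linear_classify(w, b, np.array([2.0, 0.0, 1.0, 3.0])))  # 2.5 - 0.5 > 0 -> +1
print(linear_classify(w, b, np.array([0.0, 1.0, 0.0, 0.0])))  # -2.0 - 0.5 < 0 -> -1
```

In the transfer-learning setup the handout describes, x would be the CNN codes extracted from the pretrained network, and only w and b would be trained on the new dataset.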
I have seen several different architectures for convolutional neural networks (CNNs). I am confused about which one is standard and how to decide which to use. I am not confused by the number of layers being used or the number of parameters involved; I am confused by the COMPONENTS of the network.
Let's assume:
CL = convolution layer
SL = subsampling (pooling) layer
CM = convolution map
NN = neural network
Softmax = softmax classifier (similar to a linear classifier)
Architecture 1
https://www.youtube.com/watch?v=n6hpQwq7Inw
CL, SL, CL, SL, CM, Softmax
Architecture 2 (Do we really need NN at the end again?)
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5605630&tag=1
CL, SL, CL, SL, NN, Softmax
Architecture 3
My idea
CL, SL, CL, SL, Softmax
There's no single one-size-fits-all CNN architecture. CNNs are usually designed to capture features of the input data efficiently. These features are assumed to be hierarchical, i.e. high-level features are built from low-level ones. A CNN is just a fancy feature-extraction algorithm; you can put any classifier you want on top of it (NN, softmax, whatever).
So convolutional layers are used to extract features from the input. Subsampling layers then downscale the image to reduce computational complexity and add some shift invariance.
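Subsampling is simple to demonstrate; here is a minimal 2x2 max-pooling sketch in numpy (stride 2, no padding, assuming even dimensions):

```python
import numpy as np

def max_pool_2x2(img):
    # 2x2 max pooling with stride 2: halves each spatial dimension by
    # keeping only the maximum of each non-overlapping 2x2 block.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(img))   # [[ 5.  7.] [13. 15.]]
```

A 4x4 input becomes 2x2, so each pooling stage cuts the spatial computation of the following layers by a factor of four, and small shifts of the input often leave the pooled maxima unchanged.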
A convolution map layer isn't that different from a usual convolutional layer, and I'm not sure this distinction is commonly made. In fact, if you want to handle color information, the input to the first conv layer is not a single image but several (3, for example) images, each being a separate feature map.
Which classifier to use on top of the CNN is completely up to you. You can use logistic regression, SVM, NN, or any other classification (or regression) algorithm.