If I understand correctly, one neuron per layer is enough, since the layer will just be unrolled through time to accommodate a long sequence.
How can a recurrent layer contain several neurons?
Aren't the neurons in one layer essentially the same if they were unrolled through time?
Neural networks (MLPs, CNNs, RNNs) are expected to have multiple neurons across multiple layers. One neuron per layer is hardly enough: it yields a model scarcely more expressive than a linear one, and the architecture will be too small to deal with any kind of real-life problem.
From Brandon Rohrer's video Recurrent Neural Networks and LSTM you can see a very simple structure containing multiple neurons (the dots) in a single layer. Imagine that already simple model with only one neuron per layer: it would perform very poorly.
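For concreteness, here is a minimal plain-numpy sketch (all sizes are made up) of one recurrent layer with several neurons. Unrolling reuses the same weights at every time step, but the layer still computes n_hidden distinct activations per step:

import numpy as np

n_in, n_hidden = 3, 4
W_x = np.random.randn(n_hidden, n_in)      # input-to-hidden weights
W_h = np.random.randn(n_hidden, n_hidden)  # hidden-to-hidden weights
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)  # hidden state: one value per neuron
for x_t in np.random.randn(5, n_in):  # 5 time steps
    # The same weights are reused at every step (unrolling through
    # time), yet each step produces n_hidden distinct outputs.
    h = np.tanh(W_x @ x_t + W_h @ h + b)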
Thank you for viewing my question. I'm trying to do image classification based on some pre-trained models; the images should be classified into 40 classes. I want to use the VGG and Xception pre-trained models to convert each image into two 1000-dimensional vectors and stack them into a 1x2000-dimensional vector as the input of my network, which has a 40-dimensional output. The network has 2 hidden layers, one with 1024 neurons and the other with 512 neurons.
Structure:
image -> VGG (1x1000) + Xception (1x1000) -> concatenated 1x2000 input -> 1024 neurons -> 512 neurons -> 40-dimensional output -> softmax
However, using this structure I can only achieve about 30% accuracy. So my question is: how could I optimize the structure of my network to achieve higher accuracy? I'm new to deep learning, so I'm not quite sure my current design is 'correct'. I'm really looking forward to your advice.
I'm not entirely sure I understand your network architecture, but some pieces don't look right to me.
There are two major transfer learning scenarios:
ConvNet as fixed feature extractor. Take a pretrained network (either VGG or Xception will do; you do not need both), remove the last fully-connected layer (this layer’s outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. For example, in an AlexNet this would compute a 4096-D vector for every image, containing the activations of the hidden layer immediately before the classifier. Once you extract the 4096-D codes for all images, train a linear classifier (e.g. a linear SVM or softmax classifier) for the new dataset.
Tip #1: take only one pretrained network.
Tip #2: no need for multiple hidden layers for your own classifier.
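A minimal sketch of this first scenario, assuming TensorFlow's Keras API (the 224x224 input size and layer sizes are illustrative, not from the question):

import tensorflow as tf

# Pretrained VGG16 without its 1000-way ImageNet head; global average
# pooling turns the last convolutional maps into one 512-D vector.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   pooling="avg", input_shape=(224, 224, 3))
base.trainable = False  # tips #1 and #2: one network, used as a frozen extractor

# A single softmax layer on top is enough for the new 40-class task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(40, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])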
Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but also to fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it’s possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful for many tasks, while later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset.
Tip #3: keep the early pretrained layers fixed.
Tip #4: use a small learning rate for fine-tuning because you don't want to distort other pretrained layers too quickly and too much.
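And a sketch of the fine-tuning scenario, continuing from the Keras code above under the same assumptions and applying tips #3 and #4:

# Unfreeze only the top convolutional block; keep the early, generic
# layers fixed (tip #3).
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False

# Recompile with a small learning rate so the pretrained weights are
# not distorted too quickly (tip #4).
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])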
This architecture much more closely resembles the ones I have seen solving the same kind of problem, and it has a much better chance of reaching high accuracy.
There are a couple of steps you can try when the model is not fitting well:
Increase training time and decrease the learning rate; the optimizer may be stopping at a poor local optimum.
Add additional layers that can extract features specific to the large number of classes.
Create multiple two-class deep networks, one per class (each with a 'yes'/'no' output). This lets each network specialize in a single class, rather than training one single network to learn all 40 classes; see the sketch after this list.
Gather more training samples.
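A minimal sketch of that one-vs-rest idea, assuming scikit-learn (the hidden-layer size is arbitrary):

from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier

# One binary ('yes'/'no') classifier per class, each specializing in
# recognizing a single class against the other 39.
ovr = OneVsRestClassifier(MLPClassifier(hidden_layer_sizes=(256,), max_iter=300))
# ovr.fit(features, labels)  # features: the 1x2000 vectors, labels: 0..39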
I’m trying to figure out whether it’s more efficient to run an RNN on the inputs and then run another RNN on those outputs, repeatedly (one horizontal layer at a time), or to run one time step at a time through all layers (one vertical slice at a time).
I know TensorFlow's MultiRNNCell class does the latter. Why is this method chosen over the former? Is the former equally efficient? Are there cases where going one time step at a time through all layers is preferable?
See http://karpathy.github.io/2015/05/21/rnn-effectiveness/ for reference on multi-layer RNNs.
1: How to easily implement an RNN
Use LSTM cells; they're generally better (they largely avoid the vanishing gradient problem) and TensorFlow makes them very easy to implement:
from tensorflow.python.ops.rnn_cell import BasicLSTMCell
...
# Build one cell object per layer: in newer TensorFlow versions,
# reusing a single cell via [cell] * num_layers makes the layers
# share weights.
cells = [BasicLSTMCell(state_dim) for _ in range(num_layers)]
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
find out more on the tensorflow website: https://www.tensorflow.org/tutorials/recurrent/
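To actually run the stacked cell over a batch of sequences, a minimal sketch using the same TF 1.x API (the shapes are hypothetical):

# batch x time steps x features; the sizes here are made up.
inputs = tf.placeholder(tf.float32, [None, 20, 8])
outputs, final_state = tf.nn.dynamic_rnn(stacked_lstm, inputs, dtype=tf.float32)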
2: Horizontal or deep?
Just like you can have a multi-layer neural network, you can also have a multi-layer RNN. Think of the RNN cell as a layer within your neural network: a special layer that allows you to remember sequential inputs. In my experience you will still have linear transforms (depth) within your network, but whether to stack multiple layers of LSTM cells depends on your network topology, preference, and computational budget (the more the merrier). The number of inputs and outputs depends on your problem, and as far as I can remember there is no such thing as multiple horizontal RNN cells, only depth.
All computation is done depth-wise, one input at a time.
The multi-layer function you referenced is awesome; it handles all the computation for you under the hood. Just tell it how many cells you want and it does the rest.
Good Luck
If you run everything sequentially, there should not be much of a performance difference between the two approaches (unless I am overlooking something about cache locality here). The main advantage of the latter approach is that you can parallelize the computation across layers.
E.g. instead of waiting for the inputs to propagate through 2 layers, you can already start the computation of the next time step in the first layer while the result of the current time step is still propagating through the second layer.
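A runnable toy illustration of the two schedules in plain numpy (run_cell and all sizes are made up; both orders produce identical values, but the vertical order exposes the pipelining opportunity):

import numpy as np

def run_cell(x, h_prev, W_in, W_rec):
    # One tanh RNN step; stands in for any recurrent cell.
    return np.tanh(W_in @ x + W_rec @ h_prev)

num_layers, num_steps, dim = 2, 5, 4
x = np.random.randn(num_steps, dim)  # input sequence
W_in = [np.random.randn(dim, dim) for _ in range(num_layers)]
W_rec = [np.random.randn(dim, dim) for _ in range(num_layers)]
h = np.zeros((num_layers, num_steps + 1, dim))  # h[l][t+1]: layer l after step t

# "Horizontal": finish one layer over the whole sequence, then move up.
for l in range(num_layers):
    for t in range(num_steps):
        below = x[t] if l == 0 else h[l - 1][t + 1]
        h[l][t + 1] = run_cell(below, h[l][t], W_in[l], W_rec[l])

# "Vertical": advance all layers one time step at a time. Layer 0 could
# already start step t+1 while layer 1 is still working on step t.
for t in range(num_steps):
    for l in range(num_layers):
        below = x[t] if l == 0 else h[l - 1][t + 1]
        h[l][t + 1] = run_cell(below, h[l][t], W_in[l], W_rec[l])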
Disclaimer: I would not consider myself a performance expert.
I've been studying machine learning for 4 months, and I understand the concepts behind the MLP. The problem came when I started reading about Convolutional Neural Networks. Let me tell you what I know and then ask what I'm having trouble with.
The core parts of a CNN are:
Convolutional layer: you have "n" filters that you use to generate "n" feature maps.
ReLU layer: applies a non-linearity to the output of the convolutional layer.
Sub-sampling layer: used for generating a new, smaller feature map that represents more abstract concepts.
Repeat the first 3 layers several times, and the last part is a common classifier, such as an MLP.
My doubts are the following:
How do I create the filters used in the convolutional layer? Do I have to create a filter, train it separately, and then put it in the conv layer, or do I train it together with the rest of the network using the backpropagation algorithm?
Imagine I have a conv layer with 3 filters, so it will output 3 feature maps. After applying the ReLU and sub-sampling layers, I will still have 3 (smaller) feature maps. When passing again through a conv layer, how do I calculate the output? Do I have to apply each filter to each feature map separately, or do I perform some kind of operation over the 3 feature maps and then take the sum? I have no idea how to calculate the output of this second conv layer, or how many feature maps it will output.
How do I pass the data from the Conv layers to the MLP (for classification in the last part of the NN)?
If someone knows of a simple implementation of a CNN that doesn't use a framework, I would appreciate it. I think the best way of learning how stuff works is by doing it yourself. Later on, once you already know how stuff works, you can use frameworks, because they save you a lot of time.
You train the filters with the backpropagation algorithm, the same way you train an MLP; there is no separate training step for the filters.
You apply each filter across all of the input feature maps. For example, if you have 10 feature maps in the first layer and a filter in the second layer has shape 3x3, then you apply a 3x3 weight slice to each of the ten feature maps of the first layer, with different weights for each map, and sum the responses into one output map; in this case one filter has 3x3x10 weights.
To understand it more easily, keep in mind that a pixel of a non-grayscale image has three values (red, green and blue), so if you're passing images to a convolutional neural network, then in the input layer you already have 3 feature maps (one per RGB channel), and one value in the next layer will be connected to all 3 feature maps in the first layer.
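A plain-numpy sketch of this point (all sizes hypothetical): one second-layer filter spans all 10 first-layer maps, and its per-map responses are summed into a single output map:

import numpy as np

inputs = np.random.randn(10, 8, 8)   # 10 first-layer feature maps
weights = np.random.randn(10, 3, 3)  # one filter: 3*3*10 = 90 weights
out = np.zeros((6, 6))               # valid convolution: 8 - 3 + 1 = 6
for i in range(6):
    for j in range(6):
        # Multiply the 3x3 patch of every input map by that map's own
        # 3x3 weight slice, then sum everything into one output value.
        out[i, j] = np.sum(inputs[:, i:i+3, j:j+3] * weights)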
You flatten the convolutional feature maps. For example, if you have 10 feature maps of size 5x5, you get a layer with 250 values, and from there on nothing differs from an MLP: you connect all of these artificial neurons to all of the artificial neurons in the next layer by weights.
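The flattening in the same toy style:

import numpy as np

maps = np.random.randn(10, 5, 5)  # 10 pooled feature maps of size 5x5
flat = maps.reshape(-1)           # shape (250,): the MLP's input vector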
Here someone has implemented a convolutional neural network without frameworks.
I would also recommend these lectures.
CrossPost: https://stats.stackexchange.com/questions/103960/how-sensitive-are-neural-networks
I am aware of pruning, and I'm not sure whether it removes the actual neuron or just sets its weights to zero, but I am asking this question as if no pruning process were being used.
On variously sized feedforward neural networks on large datasets with lots of noise:
Is it possible that one (or some trivial number of) extra or missing hidden neurons or hidden layers can make or break a network? Or will a neuron's synapse weights simply decay to zero if it is not necessary, and will the other neurons compensate if the network is missing one or two?
When experimenting, should input neurons be added one at a time or in groups of X? What is X? Increments of 5?
Lastly, should each hidden layer contain the same number of neurons? This is usually what I see in examples. If not, how and why would you adjust their sizes, if not by relying on pure experimentation?
I would prefer to overdo it and wait longer for convergence than undershoot, provided larger networks do adapt themselves to the solution. I have tried numerous configurations, but it is still difficult to gauge an optimum one.
1) Yes, absolutely. For example, if you have too few neurons in your hidden layer, your model will be too simple and have high bias. Similarly, if you have too many neurons, your model will overfit and have high variance. Adding more hidden layers allows you to model very complex problems like object recognition, but there are a lot of tricks needed to make additional hidden layers work; this is the field known as deep learning.
2) In a single-layer neural network, a general rule of thumb is to start with twice as many hidden neurons as inputs. You can determine the increment through a binary-search-like procedure, i.e. run through a few different architectures and see how the accuracy changes.
3) No, definitely not: each hidden layer can contain as many neurons as you want it to. There is no way other than experimentation to determine their sizes; all of the things you mention are hyperparameters which you must tune.
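A sketch of that kind of architecture sweep, assuming scikit-learn (the data and all sizes are placeholders standing in for your own):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Toy data standing in for your own: 20 inputs, 2 classes.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
n_inputs = X.shape[1]
# Start at twice the number of inputs and double, as a crude sweep.
for n_hidden in [2 * n_inputs, 4 * n_inputs, 8 * n_inputs]:
    model = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500)
    scores = cross_val_score(model, X, y, cv=3)
    print(n_hidden, scores.mean())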
I'm not sure if you are looking for a simple answer, but maybe you will be interested in a newer neural network regularization technique called dropout. Dropout randomly "removes" some of the neurons during training, forcing each of the neurons to be a good feature detector. It greatly reduces overfitting, and you can go ahead and set the number of neurons high without worrying too much. Check out this paper for more info: http://www.cs.toronto.edu/~nitish/msc_thesis.pdf
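For illustration, a minimal Keras sketch of dropout (the layer sizes are arbitrary; the technique itself is framework-agnostic):

import tensorflow as tf

# Dropout zeroes a random 50% of the previous layer's activations on
# every training step (and does nothing at inference time).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])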
I would like to implement picture classification using a neural network. I want to know how to select the features from the picture, and how many hidden units or layers to go with.
For now I have the idea of resizing each image to around 50x50 or smaller, so that the number of features is small and all inputs have a constant size. The features would be the RGB values of each of the pixels. Will that be fine, or is there some better way?
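For concreteness, a sketch of that preprocessing, assuming PIL and numpy (the file name is made up):

from PIL import Image
import numpy as np

img = Image.open("photo.jpg").resize((50, 50))  # hypothetical file
features = np.asarray(img, dtype=np.float32).reshape(-1) / 255.0
# shape (7500,): 50 * 50 pixels * 3 RGB channels, scaled to [0, 1]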
Also, I decided to go with 1 hidden layer with half as many units as there are inputs. I can change the number to get better results. Or would I require more layers?
There are numerous image data sets that are successfully learned by neural networks, like
MNIST (here you will find many links to papers)
NORB
and CIFAR-10/100.
Note that you need many training examples. Usually one hidden layer is sufficient, but it can be hard to determine the "right" number of neurons; sometimes the number of hidden neurons should even be greater than the number of inputs. When you use 2 or more hidden layers you will usually need fewer hidden nodes and the training will be faster, but when you have too many hidden layers it can be difficult to train the weights in the first layer.
A kind of neural network designed especially for images is the convolutional neural network. They usually work much better than multilayer perceptrons and are much faster.
A 50x50 RGB image gives 50x50x3 = 7500 input values. Your neural network may memorize your training images, but it will most probably perform poorly on other images.
Therefore this type of problem is more about image processing and feature extraction. Your features will change according to your requirements. See this similar question about image processing and neural networks.
A 1-layer network will only be suitable for linearly separable problems; are you sure your problem is linear? Otherwise you will need a multi-layer neural network.