Lately, I started learning neural networks and I would like to know the difference between Convolutional Deep Belief Networks and Convolutional Neural Networks. There is a similar question here, but it has no exact answer. We know that Convolutional Deep Belief Networks are CNNs + DBNs. I am going to do object recognition, so I want to know which one is better than the other, or how their complexity compares. I searched but couldn't find anything; maybe I am doing something wrong.
I don't know if you still need an answer but anyway I hope you will find this useful.
A CDBN adds the complexity of a DBN, but if you already have some background it's not that much.
If you are worried about computational complexity instead, it really depends on how you use the DBN part. The role of the DBN is usually to initialize the weights of the network for faster convergence. In this scenario, the DBN appears only during pre-training.
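As a rough illustration of that pre-training role (not a full CDBN), here is a minimal sketch assuming scikit-learn is available: an RBM learns features in an unsupervised way, and a simple classifier is then trained on top of them. The dataset, layer size and hyperparameters are arbitrary.

```python
# Minimal sketch: unsupervised RBM "pre-training" followed by a
# discriminative classifier, in the spirit of DBN-style initialization.
# Assumes scikit-learn; layer size and hyperparameters are arbitrary.
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)

pipeline = Pipeline([
    ("scale", MinMaxScaler()),  # RBMs expect inputs in [0, 1]
    ("rbm", BernoulliRBM(n_components=64, n_iter=10, random_state=0)),  # unsupervised feature learning
    ("clf", LogisticRegression(max_iter=1000)),  # discriminative stage on the learned features
])

pipeline.fit(X, y)
print("train accuracy:", pipeline.score(X, y))
```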
You can also use the whole DBN as a discriminative network (keeping the generative power), but the weight initialization it provides is enough for discriminative tasks. So during a hypothetical real-time use, the two systems are equal performance-wise.
Also, the weight initialization provided by the first model really helps for difficult tasks like object recognition (even a good Convolutional Neural Network alone doesn't reach a good success rate, at least compared to a human), so it's generally a good choice.
As far as I understand, neural networks aren't good at classifying 'unknowns', i.e. items that do not belong to a learned class. But how do face detection/recognition approaches usually determine that no face is detected/recognised in a region? Is the predicted probability somehow thresholded?
Summary
It is true that neural networks are inherently not good at classifying 'unknowns' because they tend to overfit to the data they have been trained on, if the underlying structure of the neural network is complex enough. However, there are multiple ways to reduce the effects of overfitting. For example, one technique used for this is called dropout. Another example is batch normalization. Despite these techniques, the best way to reduce the effects of overfitting is to use more data.
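For concreteness, here is a minimal sketch of how dropout and batch normalization are typically inserted into a small classifier, assuming PyTorch; the layer sizes and dropout rate are arbitrary.

```python
# Minimal sketch of dropout and batch normalization in a small classifier.
# Assumes PyTorch; the layer sizes and dropout rate are arbitrary.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # normalizes activations over the batch
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zeroes activations during training
    nn.Linear(256, 10),
)

model.train()                        # dropout and batch-norm active
logits = model(torch.randn(32, 784))
model.eval()                         # dropout off, batch-norm uses running statistics
print(logits.shape)                  # torch.Size([32, 10])
```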
For the facial recognition example you have given above, it is common that the trained models have 'seen' a huge amount of data. This means that there are very few 'unknowns', and even when there are, the neural network has learned how to tell whether facial features are present or not. This is because certain neural network structures are very good at telling whether a pattern of features is present in the input data. This helps the network learn whether the input image contains certain features/patterns or not. If these features are found, the input data is classified as a face; otherwise it is not.
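Regarding the thresholding part of the question: one common (though not the only) approach is to treat a region as 'no face' when the model's top class probability falls below a chosen confidence threshold. A minimal sketch, with made-up probabilities and threshold:

```python
# Minimal sketch of rejecting low-confidence predictions as "unknown".
# The probabilities and the threshold below are made up for illustration.
import numpy as np

def classify_or_reject(probs, threshold=0.8):
    """Return the predicted class index, or None if the model is not confident enough."""
    best = int(np.argmax(probs))
    return best if probs[best] >= threshold else None

print(classify_or_reject(np.array([0.05, 0.92, 0.03])))  # 1    (confident -> accept)
print(classify_or_reject(np.array([0.40, 0.35, 0.25])))  # None (uncertain -> treat as unknown)
```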
As a beginner in ML and AI, I have come across ANNs, RNNs and LSTMs; however, I would like to know how neural networks are classified, ranging from the simplest single-perceptron feedforward network to the most complicated ones.
Maybe you'll be interested in looking at the neural network zoo. It's not a hierarchical classification of all the different neural network types, but it does graphically show a lot of types and also provides short descriptions of them (and also some other models that are not typically considered to be neural networks). I haven't read through it all in detail, so I can't personally vouch for the page's correctness, but it looks good.
I am designing a neural network and am trying to determine if I should write it in such a way that each neuron is its own 'process' in Erlang, or if I should just go with C++ and run a network in one thread (I would still use all my cores by running an instance of each network in its own thread).
Is there a good reason to give up the speed of C++ for the asynchronous neurons that Erlang offers?
I'm not sure I understand what you're trying to do. An artificial neural network is essentially represented by the weights of the connections between nodes. The nodes themselves don't exist in isolation; their values are only calculated (at least in feed-forward networks) through the forward-propagation algorithm, when the network is given input.
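To make the 'network is its weights' view concrete, here is a minimal NumPy sketch of forward propagation through a two-layer network, where the whole model is just two weight matrices and two bias vectors; the sizes and random values are arbitrary.

```python
# Minimal sketch: a feed-forward network is just its weight matrices;
# forward propagation is a couple of matrix multiplications.
# Assumes NumPy; layer sizes and the random weights are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(3)   # input (4) -> hidden (3)
W2, b2 = rng.standard_normal((3, 2)), np.zeros(2)   # hidden (3) -> output (2)

def forward(x):
    h = np.tanh(x @ W1 + b1)   # hidden activations
    return h @ W2 + b2         # output values

print(forward(rng.standard_normal(4)))
```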
The backpropagation algorithm for updating weights is definitely parallelizable, but that doesn't seem to be what you're describing.
The point of having neurons in a Neural Network (NN) is to obtain a multi-dimensional matrix whose coefficients you want to handle (to train them, to change them, to adapt them little by little, so that they fit the problem you want to solve). To this matrix you can apply proven and efficient numerical methods, so as to find an acceptable solution in an acceptable time.
IMHO, with NNs (namely with the back-propagation training method), the goal is to have a matrix that is efficient both at run-time/prediction time and at training time.
I don't grasp the point of having asynchronous neurons. What would they offer? What issue would they solve?
Maybe you could explain clearly what problem you would solve by making them asynchronous?
I am indeed inverting your question: what do you want to gain with asynchronicity compared to traditional NN techniques?
It would depend upon your use case: the neural network computational model and your execution environment. Here is a recent paper (2014) by Plotnikova et al., which uses "Erlang and platform Erlang/OTP with predefined base implementation of actor model functions" and a new model developed by the authors that they describe as "one neuron - one process", using the "Gravitation Search Algorithm" for training:
http://link.springer.com/chapter/10.1007%2F978-3-319-06764-3_52
To briefly cite their abstract, "The paper develops asynchronous distributed modification of this algorithm and presents the results of experiments. The proposed architecture shows the performance increase for distributed systems with different environment parameters (high-performance cluster and local network with a slow interconnection bus)."
Also, most other answers here reference a computational model that uses matrix operations as the basis of training and simulation. The authors of this paper contrast their approach with that model by saying, "this case neural network model [ie matrix operations based] becomes fully mathematical and its original nature (from neural networks biological prototypes) gets lost".
The tests were run on three types of systems:
1. An IBM cluster, represented as 15 virtual machines.
2. A distributed system deployed on the local network, represented as 15 physical machines.
3. A hybrid system, based on system 2 but with four processor cores per physical machine.
They provide the following concrete results, "The presented results evidence a good distribution ability of gravitation search, especially for large networks (801 and more neurons). Acceleration depends on the node count almost linearly. If we use 15 nodes we can get about eight times acceleration of the training process."
Finally, they conclude regarding their model, "The model includes three abstraction levels: NNET, MLP and NEURON. Such architecture allows encapsulating some general features on general levels and some specific for the considered neural networks features on special levels. Asynchronous message passing between levels allow to differentiate synchronous and asynchronous parts of training and simulation algorithms and, as a result, to improve the use of resources."
It depends what you are after.
2nd-generation neural networks are synchronous. They perform computations on an input-output basis without a delay, and can be trained either through reinforcement or back-propagation. This is the prevailing type of ANN at the moment and the easiest to get started with if you are trying to solve a problem via machine learning; lots of literature and examples are available.
3rd-generation neural networks (so-called "Spiking Neural Networks") are asynchronous. Signals propagate internally through the network as a chain reaction of spiking events, and can create interesting patterns and oscillations depending on the shape of the network. While they model biological brains more closely, they are also harder to make use of in a practical setting.
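To give a feel for what 'spiking' means, here is a minimal sketch of a single leaky integrate-and-fire neuron; it assumes only NumPy, and all constants are arbitrary illustration values rather than values from any particular spiking framework.

```python
# Minimal sketch of a leaky integrate-and-fire neuron: the membrane
# potential leaks over time, integrates input current, and emits a
# spike (then resets) when it crosses a threshold.
# Assumes NumPy; all constants are arbitrary illustration values.
import numpy as np

dt, tau, v_thresh, v_reset = 1.0, 20.0, 1.0, 0.0
v, spikes = 0.0, []
current = np.full(100, 0.06)          # constant input current over 100 time steps

for t, i_in in enumerate(current):
    v += dt * (-v / tau + i_in)       # leak + integrate
    if v >= v_thresh:                 # threshold crossed: spike and reset
        spikes.append(t)
        v = v_reset

print("spike times:", spikes)
```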
I think that async computation for NNs might prove beneficial for the (recognition) performance. In fact, the result might be similar (though maybe less pronounced) to that of using dropout.
But a straightforward implementation of async NNs would be much slower, because for synchronous NNs you can use linear algebra libraries, which make good use of vectorization or GPUs.
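As a small illustration of why the vectorized (synchronous) formulation is fast, here is a sketch comparing a per-neuron Python loop with a single matrix product doing the same work; it assumes NumPy and the sizes are arbitrary.

```python
# Minimal sketch: one layer computed neuron-by-neuron versus as a single
# vectorized matrix product. Both give the same result; the vectorized
# version is what linear algebra libraries and GPUs accelerate.
# Assumes NumPy; sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)           # input vector
W = rng.standard_normal((256, 512))    # one weight row per neuron

per_neuron = np.array([W[j] @ x for j in range(W.shape[0])])  # "one neuron at a time"
vectorized = W @ x                                            # whole layer at once

print(np.allclose(per_neuron, vectorized))  # True
```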
Deep learning has been seen as a rebranding of Neural Networks.
Were the issues presented in the paper "Neural Networks and the Bias/Variance Dilemma" by Stuart Geman ever resolved in the architectures in use today?
We have learned a lot about NNs, in particular:
we now learn better representations due to progress in unsupervised/autoregressive learning, such as restricted Boltzmann machines, autoencoders, denoising autoencoders and variational autoencoders, which help us stabilize the learning process and learn from reasonable representations
we have better priors - not necessarily in the strict probabilistic sense, but we know, for example, that in image processing a good architecture is the convolutional one, thus we have models that are smaller (in terms of parameters) but better suited to the problem, and consequently we are less prone to overfitting (see the sketch after this list)
we have better optimization techniques and activation functions - which help us with underfitting (we can learn larger networks); in particular, we can learn deeper networks. Why is deep often better than wide? Because, again, this is another prior: the assumption that the representation should be hierarchical, and it seems to be a valid prior for many modern problems (even if not all of them).
dropout and other techniques brought us better regularization methods (than the previously known and used simple weight priors) - which again limits the problem of overfitting (variance).
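As a quick sketch of the 'smaller but better suited' point from the list above, the following comparison (assuming PyTorch; the image size and channel counts are arbitrary) contrasts the parameter count of a convolutional layer with that of a fully connected layer producing an output of the same size:

```python
# Minimal sketch of the "convolutional prior": a conv layer needs far
# fewer parameters than a fully connected layer of comparable output size.
# Assumes PyTorch; image size and channel counts are arbitrary.
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)  # 3x32x32 -> 16x30x30
dense = nn.Linear(3 * 32 * 32, 16 * 30 * 30)                     # same input/output sizes, no weight sharing

count = lambda m: sum(p.numel() for p in m.parameters())
print("conv parameters: ", count(conv))    # 448
print("dense parameters:", count(dense))   # roughly 44 million
```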
There are many more things that changed, but in general we were simply able to find better architectures and better assumptions, thus we now search in a narrower class of hypotheses. Consequently we overfit less (variance) and underfit less (bias) - yet there is still a lot to be done!
The next thing is, as #david pointed out, the amount of data. We have huge datasets now; we often have access to more data than we can process in a reasonable time, and obviously more data means less variance - even highly overfitting models start to behave well.
Last, but not least - hardware. This is something that every single deep learning expert will tell you - our computers got stronger. We still use the same algorithms, the same architectures (with many little tweaks, but the core is the same), but our hardware is exponentially faster, and this changes a lot.
#lejlot gave a good overview. I want to point to two specific parts of the whole process.
First, neural networks are universal approximators. That means their bias can, in principle, be made arbitrarily small. The problem that was thought to be severe was rather overfitting -- too large a variance.
Now, a common and successful way in Machine Learning to deal with too large a variance is to "average it away" over many different predictions -- which should be as uncorrelated as possible. This worked in Random Forests, for instance, and it is also how I tend to understand current Neural Networks (particularly the maxout+dropout stuff). Of course, this is a narrow view -- there is also this whole representation learning stuff and the not-explaining-away property, etc. -- but it is one I find suitable for your question regarding the bias/variance tradeoff.
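A toy numerical sketch of this 'averaging it away' effect, assuming NumPy and a made-up noise model: many noisy, uncorrelated estimates of the same quantity have much lower variance once averaged.

```python
# Toy sketch: averaging many noisy, (nearly) uncorrelated predictions
# reduces variance, which is the intuition behind ensembles and,
# loosely, behind dropout-style averaging.
# Assumes NumPy; the noise model is made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
predictions = true_value + rng.standard_normal((10_000, 50))  # 50 noisy "models", 10k trials

single_model_var = predictions[:, 0].var()
ensemble_var = predictions.mean(axis=1).var()

print(f"single model variance:     {single_model_var:.3f}")  # about 1.0
print(f"50-model average variance: {ensemble_var:.3f}")      # about 0.02 (roughly 1/50)
```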
Second point: there is no better way to prevent overfitting than having a lot of data. And currently we are in a situation where we can gather a lot of data.
I am currently in the process of learning neural networks and can understand basic examples like AND, OR, Addition, Multiplication, etc.
Right now, I am trying to build a neural network that takes two inputs, x and n, and computes pow(x, n). This would require the neural network to have some form of loop, and I am not sure how I can model a network with a loop.
Can this sort of computation be modelled with a neural network? I am assuming it is possible, based on the recently released Neural Turing Machine paper, but I am not sure how. Any pointers on this would be very helpful.
Thanks!
Feedforward neural nets are not Turing-complete, and in particular they cannot model loops of arbitrary length. However, if you fix the maximum n that you want to handle, then you can set up an architecture which can model loops with up to n repetitions. For instance, you could imagine that each layer acts as one iteration of the loop, so you might need n layers.
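Here is a toy sketch of that unrolling idea in plain Python (it is not a trained network, just an illustration of fixed depth): a loop of at most N_MAX multiplications is unrolled into a fixed stack of identical 'layers', each performing one iteration or passing its input through.

```python
# Toy sketch of "unrolling" a loop into a fixed number of layers:
# each "layer" performs one iteration of pow(x, n), and layers beyond
# the requested n simply pass the value through. This illustrates the
# fixed-depth idea only; it is not a trained neural network.
N_MAX = 8  # maximum exponent the fixed architecture can handle

def layer(value, x, active):
    """One unrolled iteration: multiply by x if this layer is still 'active'."""
    return value * x if active else value

def unrolled_pow(x, n):
    assert 0 <= n <= N_MAX
    value = 1.0
    for i in range(N_MAX):        # fixed depth, like N_MAX stacked layers
        value = layer(value, x, active=(i < n))
    return value

print(unrolled_pow(2.0, 5))  # 32.0
print(unrolled_pow(3.0, 0))  # 1.0
```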
For a more general architecture that can be made Turing-complete, you could use Recurrent Neural Networks (RNNs). One popular instance of this class is the so-called Long Short-Term Memory (LSTM) network by Hochreiter and Schmidhuber. Training such RNNs is quite different from training classical feedforward networks, though.
As you pointed out, Neural Turing Machines seem to work well for learning basic algorithms. For instance, the repeat-copy task, which was implemented in the paper, suggests that an NTM can learn the algorithm itself. As of now, NTMs have been used only for simple tasks, so probing their scope with pow(x, n) will be interesting, given that repeat copy works well. I suggest reading Reinforcement Learning Neural Turing Machines - Revised for a deeper understanding.
Also, recent developments in the area of Memory Networks enable us to perform more complicated tasks. Hence, making a neural network understand pow(x, n) might be possible. So go ahead and give it a shot!