I want to implement a perceptron network and I have a little problem. The first implementation will be very simple. Just three layers, input, one hidden and the output layer. My problem is that how many synapses is optimal for a hidden node beetwen the input and hidden layer? I think not too economical if every node join to every input nodes.
Thanks for the comments.
In very general setting you plug in every single node of a given layer with every node in the next one. This is called "fully connected layer". Obviously this is not the only option, and with more advanced approaches you will find much more sparse connectomes, like receptive fields, cvonolutional layers etc. For a simple experiments, starting with fully connected layers is preferable, as other connection strategies usually assume something about your data (like spatial-temporal relations of inputs), while fully connected layer is the agnostic, generic approach.
Related
I am new to deep learning and trying to understand the concept behind hidden layers, but i am not clear with following things:
If there are suppose 3 hidden layers. When we take output from all the nodes of 2nd layer as input to all the nodes of 3rd layer then what difference it makes in output of nodes of 3rd layer as they are getting same input + same parameters initialization (as per what I read, I assume that all the nodes of one layer gets same random weight for parameters).
Please correct me if I am thinking in wrong direction.
The simple answer is because of random initialization.
If you started with same weights through out the neural network (NN), then all nodes will produce the same output.
This is because when using backprop algorithm the error is spread out based on the activation strength of each node. If they start out the same then the error will spread equally and hence the nodes in the NN will not be able to learn different features.
So basic random initialization makes sure that each node specializes. Hence after learning, the nodes in the hidden layers will produce different outputs even when the input is the same.
Hope this helps.
I want to classify the input as one of 3 possibilities. Is it better to use 3 networks with one output each or 1 network with 3 outputs?
(i.e. 3 networks that output 0 or 1 or 1 network that outputs a one hot vector of length 3 [1,0,0]
Does the answer change depending on how complex the incoming data is to classify?
At what amount of outputs does it make sense to partition the networks (if ever)? For example, if I want to classify into 20 groups, does it make a difference?
I would say it would make more sense to use a single network with multiple outputs.
The main reason is that hidden layers (I'm assuming you'll have at least one hidden layer) can be interpreted as transforming the data from the original space (feature space) into a different space that is more suitable for the task (classification in your case). For example, when training a network to recognize faces from raw pixels, it might use a hidden layer to first detect simple shapes such as small lines based on pixels, then use another hidden layer to detect simple shapes such as eyes/noses based on the lines from the first layer, etc. (it may not be entirely as ''clean'' as this, but this is an easy-to-understand example).
Such a transformation that a network can learn is typically useful for the classification task, regardless of what class the specific example has. For example, it is useful to be able to detect eyes in images regardless of whether or not the actual image contains a face; if you do indeed detect two eyes, you can classify it as a face, and otherwise you classify it as not being a face. In both cases, you were looking for eyes.
So, by splitting up into multiple networks, you may end up learning quite similar patterns in all networks anyway. Then you might as well have saved yourself the computational effort and just learned it once.
Another disadvantage of splitting up into multiple networks would be that you would probably cause your dataset to become imbalanced (or more imbalanced if it already is imbalanced). Suppose you have three classes, with exactly 1/3 of the dataset belonging to each class. If you use three networks for three binary classification tasks, you suddenly always have 1/3 ''1'' classes and 2/3 ''0'' classes. A network may then become biased towards predicting 0s everywhere, since those are the majority classes in each of the three separate problems.
Note that this is all based on my intuition; the best solution if you have time would be to simply try both approaches and test! I don't think I have ever seen someone using multiple networks for a single classification task in practice though, so if you only have time for one approach I'd recommend going for a single network.
I think the only case where it would really make sense to use multiple networks would be if you actually want to predict multiple unrelated values (or at least values that are not strongly related). For example, if, given images, you want to 1) predict whether or not there is a dog on the image, and 2) whether it is a photograph or a painting. Then it may be better to use two networks with two outputs each, instead of a single network with four outputs.
I know how the algorithm works, but I'm not sure how it determines the clusters. Based on images I guess that it sees all the neurons that are connected by edges as one cluster. So that you might have two clusters of two groups of neurons each all connected. But is that really it?
I also wonder.. is GNG really a neural network? It doesn't have a propagation function or an activation function or weighted edges.. isn't it just a graph? I guess that depends on personal opinion a bit but I would like to hear them.
UPDATE:
This thesis www.booru.net/download/MasterThesisProj.pdf deals with GNG-clustering and on page 11 you can see an example of what looks like clusters of connected neurons. But then I'm also confused by the number of iterations. Let's say I have 500 data points to cluster. Once I put them all in, do I remove them and add them again to adapt die existing network? And how often do I do that?
I mean.. I have to re-add them at some point.. when adding a new neuron r, between two old neurons u and v then some data points formerly belonging to u should now belong to r because it's closer. But the algorithm does not contain changing the assignment of these data points. And even if I remove them after one iteration and add them all again, then the false assignment of the points for the rest of that first iteration changes the processing of the network doesn't it?
NG and GNG are a form of self-organizing maps (SOM), which are also referred to as "Kohonen neural networks".
These are based on older, much wider view of neutal networks when they were still inspired by nature rather than being driven by GPU capabilites of matrix operations. Back then, when you did not yet have massive-SIMD architectures yet, there was nothing bad about having neurons self-organize rather than being preorganized in strict layers.
I would not call them clustering although that term is commonly (ab-) used in related work. Because I don't see any strong propery of these "clusters".
SOMs are literally maps as in geography. A SOM is a set of nodes ("neurons") usually arranged in a 2d rectangular or hexagonal grid. (=the map). The positions in the input space are then optimized iteratively to fit the data. Because they influence their neighbors, they cannot move freely. Think of wrapping a net around a tree; the knots of the net are your neurons. NG and GNG appear to be pretty mich the same thing, but with a more flexible structure of nodes. But actually a nice property of SOMs is the 2d map that you can get.
The only approach I remember for clustering was to project the input data to the discrete 2d space of the SOM grid, then run k-means on this projection. It will probably work okayish (as in: it will perform similar to k-means), but I'm not convinced that it's theoretically well supported.
My goal is to solve the XOR problem using a Neural Network. I’ve read countless articles on the theory, proof, and mathematics behind a multi-layered neural network. The theory make sense (math… not so much) but I have a few simple questions regarding the evaluation and topology of a Neural Network.
I feel I am very close to solving this problem, but I am beginning to question my topology and evaluation techniques. The complexities of back propagation aside, I just want to know if my approach to evaluation is correct. With that in mind, here are my questions:
Assuming we have multiple inputs, does each respective input get its’ own node? Do we ever input both values into a single node? Does the order in which we enter this information matter?
While evaluating the graph output, does each node fire as soon as it gets a value? Or do we instead collect all the values from the above layer and then fire off once we’ve consumed all the input?
Does the order of evaluation matter? For example, if a given node in layer “b” is ready to fire – but other nodes in that same layer are still awaiting input – should the ready node fire anyway? Or should all nodes in the layer be loaded up before firing?
Should each layer be connected to all nodes in the following layer?
I’ve attached a picture which should help explain (some of) my questions.
Thank you for your time!
1) Yes, each input gets its own node, and that node is always the node for that input type. The order doesn't matter - you just need to keep it consistent. After all, an untrained neural net can learn to map any set of linearly separable inputs to outputs, so there can't be an order that you need to put the nodes in in order for it to work.
2 and 3) You need to collect all the values from a single layer before any node in the next layer fires. This is important if you're using any activation function other than a stepwise one, because the sum of the inputs will affect the value that is propagated forward. Thus, you need to know what that sum is before you propagate anything.
4) Which nodes to connect to which other nodes is up to you. Since your net won't be excessively large and XOR is a fairly straightforward problem, it will probably be simplest for you to connect all nodes in one layer to all nodes in the next layer (i.e. a fully-connected neural net). There might be specialized cases in other problems where it would be better to not use this topology, but there isn't an easy way to figure it out (most people either use trial and error or a genetic algorithm, as in NEAT), and you don't need to worry about it for the purposes of this problem.
Would someone be able to explain to me or point me to some resources of why (or situations where) more than one hidden layer would be necessary or useful in a neural network?
Basically more layers allow more functions to be represented. The standard book for AI courses, "Artificial Intelligence, A Modern Approach" by Russell and Norvig, goes into some detail of why multiple layers matter in Chapter 20.
One important point is that with a sufficiently large single hidden layer, you can represent every continuous function, but you will need at least 2 layers to be able to represent every discontinuous function.
In practice, though, a single layer is enough at least 99% of the time.
That's more similar to the way the brain works (which might not necessarily be a computational advantage, but a lot of people are researching NN to gain insight about the way the mind works, rather than to solve real world problems.
Its easier to achieve some kinds of invariance using more layers. For example, an image classifier that works regardless of where in the image the object is found, or the object's size. see Bouvrie, J. , L. Rosasco, and T. Poggio. "On Invariance in Hierarchical Models". Advances in Neural Information Processing Systems (NIPS) 22, 2009.
Each layer effectively raises the potential "complexity" of adaptation in an exponential fashion (as opposed to a multiplicative fashion of adding more nodes to a single layer).