Neural networks physical model - machine-learning

Is a neural network the same as a tree data structure? If not, how are neural networks described using a physical model?
For example, an array is considered a collection of similar data under the same name (sequential memory allocation = physical model).

A neural network is not a data structure but a machine-learning model, an oversimplification of how we think the human brain works.
You can use arrays to construct your network, but these will just be a component in a much larger piece.
For instance, in a neural network, one neuron in a given layer is connected to one or more neurons in its adjacent layers. These connections, in turn, have weights, and these weighted connections can be described by an array.
Another way in which arrays can be used in neural networks is for the inputs. Neural networks tend to take vectors as input, which can be described using arrays. For instance, if you have a neural network which operates on numbers, you can convert each number to binary and store the bits in an array. The number 8 (binary 1000), stored least-significant bit first in five bits, becomes [0, 0, 0, 1, 0].
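To make the two uses of arrays concrete, here is a minimal sketch in plain Python. The names to_bits and layer_output, the weight values, and the least-significant-bit-first ordering are all assumptions made for illustration; a real network would normally use a numerical library and learned weights.

```python
def to_bits(n, width):
    """Encode a non-negative integer as a list of bits, least-significant bit first."""
    return [(n >> i) & 1 for i in range(width)]


def layer_output(weights, biases, inputs):
    """Compute one weighted sum per neuron.

    weights[j][i] is the weight of the connection from input i to neuron j.
    """
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Input vector: the number 8 as five bits.
x = to_bits(8, 5)  # [0, 0, 0, 1, 0]

# Hypothetical weights for a layer with 5 inputs and 2 neurons.
weights = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [-0.5, -0.4, -0.3, -0.2, -0.1],
]
biases = [0.0, 1.0]

y = layer_output(weights, biases, x)
print(x, y)
```

The arrays here only hold the numbers; what makes it a network is the computation in layer_output, which is usually followed by an activation function.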

The main difference between the two is that a tree is a data structure, whereas a neural network is a learning algorithm. A tree has one root and several leaves; a neural network has no such structure. They may look similar at first glance, but they are not. Trees can store data, hence the name data structure. A neural network does not; it is a function that takes an input and applies weights and biases to it to determine how close the input is to the possible or expected output(s).

Related

How to feed complex number input to a neural network?

I am trying to train a neural network with complex numbers as input. As generally suggested, I split each complex number into its real and imaginary parts before feeding the data into the network. However, what if I use the modulus (magnitude) and phase (argument) of the complex number instead of the real and imaginary parts? How will this affect my training process? I have not been able to reach a conclusion on this.
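As a rough illustration of the two encodings, the sketch below uses Python's standard cmath module. The helper names as_rect and as_polar are made up for this example. One property worth noticing is that the phase returned by cmath.polar lies in [-pi, pi], so two nearly identical inputs on either side of the negative real axis get very different phase values; whether that matters in practice depends on the data.

```python
import cmath


def as_rect(z):
    """Encode a complex number as [real part, imaginary part]."""
    return [z.real, z.imag]


def as_polar(z):
    """Encode a complex number as [modulus, phase], phase in [-pi, pi]."""
    r, phi = cmath.polar(z)
    return [r, phi]


# Two inputs that are almost the same point, just above and just below
# the negative real axis.
a = complex(-1.0, 0.001)
b = complex(-1.0, -0.001)

rect_gap = max(abs(p - q) for p, q in zip(as_rect(a), as_rect(b)))
polar_gap = max(abs(p - q) for p, q in zip(as_polar(a), as_polar(b)))

print("largest feature difference (real/imag):", rect_gap)
print("largest feature difference (modulus/phase):", polar_gap)
```

With real and imaginary parts the two feature vectors stay close, while the phase feature jumps by almost 2*pi.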

Is it better to make neural network to have hierarchical output?

I'm quite new to neural networks and I recently built a neural network for classifying the digits on vehicle license plates. It has 3 layers: 1 input layer for a 16*24 (384 neurons) digit image at 150 dpi, 1 hidden layer (199 neurons) with a sigmoid activation function, and 1 softmax output layer (10 neurons), one for each digit 0 to 9.
I'm trying to expand my neural network to also classify the letters on license plates. But I'm worried that if I simply add more classes to the output, for example 10 letters for a total of 20 classes, it would be hard for the neural network to separate the features of each class. I also think it might cause a problem when the input is a number and the neural network wrongly classifies it as the letter with the largest single probability, even though the sum of the probabilities of all number outputs exceeds that.
So I wonder if it is possible to build a hierarchical neural network in the following manner:
There are 3 neural networks: 'Item', 'Number', 'Letter'.
The 'Item' neural network classifies whether the input is a number or a letter.
If the 'Item' neural network classifies the input as a number (letter), the input then goes through the 'Number' ('Letter') neural network.
The final output is returned from the 'Number' ('Letter') neural network.
The learning mechanism for each network is as follows:
The 'Item' neural network learns from all images of numbers and letters, so it has 2 outputs.
The 'Number' ('Letter') neural network learns from images of numbers (letters) only.
Which method should I pick to get better classification: simply adding 10 more classes, or building hierarchical neural networks as described above?
I'd strongly recommend training only a single neural network with outputs for all the kinds of images you want to be able to detect (so one output node per letter you want to be able to recognize, and one output node for every digit you want to be able to recognize).
The main reason for this is that recognizing digits and recognizing letters are essentially the same task. Intuitively, you can understand a trained neural network with multiple layers as performing the recognition in multiple steps. In the hidden layer it may learn to detect various kinds of simple, primitive shapes (e.g. the hidden layer may learn to detect vertical lines, horizontal lines, diagonal lines, certain kinds of simple curved shapes, etc.). Then, in the weights between the hidden and output layers, it may learn how to recognize combinations of several of these primitive shapes as a specific output class (e.g. a vertical and a horizontal line in roughly the correct locations may be recognized as a capital letter L).
Those "things" it learns in the hidden layer will be perfectly relevant for digits as well as letters (that vertical line which may indicate an L may also indicate a 1 when combined with other shapes). So, there are useful things to learn that are relevant for both "tasks", and it will probably be able to learn these things more easily if it can learn them all in the same network.
See also this answer I gave to a related question in the past.
I'm trying to expand my neural network to also classify letters in license plate. But I'm worried if I just simply add more classes into output, for example add 10 letters into classification so total 20 classes, it would be hard for neural network to separate feature from each class.
You're far from where it becomes problematic. ImageNet has 1000 classes and is commonly handled by a single network. See the AlexNet paper. If you want to learn more about CNNs, have a look at chapter 2 of "Analysis and Optimization of Convolutional Neural Network Architectures". And while you're at it, see chapter 4 for hierarchical classification. You can read the summary for ... well, a summary of it.
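On the worry that a number might be misread as the single most probable letter even though the number outputs together carry more probability: with one flat softmax layer, the group probabilities can still be recovered by summing the outputs of each group. The sketch below is only one possible post-processing step, not something proposed in the answers above. The 20 labels, the scores, and the function names are hypothetical.

```python
import math

# Hypothetical label set: 10 digits followed by 10 letters.
LABELS = list("0123456789") + list("ABCDEFGHIJ")


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_probabilities(probs):
    """Sum the class probabilities of each group (first 10 are digits)."""
    return {"digit": sum(probs[:10]), "letter": sum(probs[10:])}


def classify_by_group(probs):
    """Pick the more probable group first, then the best class inside it."""
    groups = group_probabilities(probs)
    lo, hi = (0, 10) if groups["digit"] >= groups["letter"] else (10, 20)
    best = max(range(lo, hi), key=lambda i: probs[i])
    return LABELS[best]


# Made-up scores: the network hesitates between the digits 1, 4 and 7,
# while a single letter (I) has the highest individual score.
scores = [0.0] * 20
scores[1], scores[4], scores[7] = 2.2, 2.0, 2.0
scores[18] = 2.5

probs = softmax(scores)
flat_choice = LABELS[max(range(20), key=lambda i: probs[i])]
group_choice = classify_by_group(probs)
print(flat_choice, group_choice, group_probabilities(probs))
```

In this made-up case the plain argmax picks the letter, while the group-first rule picks a digit, using the same single network.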

What does "sparse" mean in the context of neural nets?

I've seen "sparse" and "sparsity" used in a way that suggests it's something that improves a model's accuracy. For example:
I think the unsupervised phase might be not so important if some
sparse connections or neurons are used, such as rectifier units or
convolutional connection, and big training data is available.
From https://www.quora.com/When-does-unsupervised-pre-training-improve-classification-accuracy-for-a-deep-neural-network-When-does-it-not
What does "sparse" mean in this context?
TL;DR: Sparsity means most of the weights are 0. This can lead to an increase in space and time efficiency.
Detailed version: In general, neural networks are represented as tensors. Each layer of neurons is represented by a matrix. Each entry in the matrix can be thought of as representing the connection between two neurons. In a simple neural network, like a classic feed-forward neural network, every neuron in a given layer is connected to every neuron in the subsequent layer. This means that each pair of adjacent layers must have n^2 connections represented, where n is the number of neurons in each of the two layers. In large networks, this can take a lot of memory and a lot of time to propagate. Since different parts of a neural network often work on different subtasks, it can be unnecessary for every neuron to be connected to every neuron in the next layer. In fact, it might make sense for a neural network to have most pairs of neurons with a connection weight of 0. Training a neural network might result in these less significant connection weights adopting values very close to 0, but accuracy would not be significantly affected if the values were exactly 0.
A matrix in which most entries are 0 is called a sparse matrix. These matrices can be stored more efficiently and certain computations can be carried out more efficiently on them provided the matrix is sufficiently large and sparse. Neural networks can leverage the efficiency gained from sparsity by assuming most connection weights are equal to 0.
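As a small illustration of the idea, the sketch below sets near-zero weights to exactly 0 by dropping them, stores only the remaining entries, and multiplies using just those entries. The dictionary-of-coordinates storage, the threshold, and the function names are assumptions for this example; numerical libraries offer more efficient sparse formats.

```python
def prune(dense, threshold):
    """Keep only weights with magnitude >= threshold, as {(row, col): weight}."""
    return {
        (i, j): w
        for i, row in enumerate(dense)
        for j, w in enumerate(row)
        if abs(w) >= threshold
    }


def sparse_matvec(sparse, n_rows, x):
    """Multiply a sparse weight matrix by a vector, touching stored entries only."""
    out = [0.0] * n_rows
    for (i, j), w in sparse.items():
        out[i] += w * x[j]
    return out


# Hypothetical trained weights: most are very close to 0.
dense = [
    [0.9, 0.001, 0.0, -0.002],
    [0.0, 0.0, -0.7, 0.003],
    [0.002, 0.0, 0.0, 0.5],
]

sparse = prune(dense, threshold=0.01)
x = [1.0, 2.0, 3.0, 4.0]
y = sparse_matvec(sparse, n_rows=3, x=x)

print(len(sparse), "of", 12, "weights stored")
print(y)
```

Only 3 of the 12 weights are stored and used, which is where the memory and time savings come from once the matrix is large and sparse enough.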
I must say that neural networks are a complex and diverse topic. There are a lot of approaches used. There are certain kinds of neural networks with different morphologies than the simple layer connections I referenced above. Sparsity can be leveraged in many types of neural networks since matrices are fairly universal to neural network representation.
Sparse, as can be deduced from its everyday English meaning, refers to sparsity in the connections between neurons: basically, most of the weights have non-significant values (close to 0).
In some cases it might also refer to networks where not all connections exist and there are very few connections (fewer weights) to begin with.

Why use a restricted Boltzmann machine rather than a multi-layer perceptron?

I'm trying to understand the difference between a restricted Boltzmann machine (RBM) and a feed-forward neural network (NN). I know that an RBM is a generative model, where the idea is to reconstruct the input, whereas an NN is a discriminative model, where the idea is to predict a label. But what I am unclear about is why you cannot just use an NN for a generative model. In particular, I am thinking about deep belief networks and multi-layer perceptrons.
Suppose the input to my NN is a set of nodes called x, and the output of the NN is a set of nodes y. In a discriminative model, my loss during training would be the difference between y and the value of y that I want x to produce (e.g. ground-truth probabilities for class labels). However, what if I just made the output have the same number of nodes as the input, and then set the loss to be the difference between x and y? In this way, the network would learn to reconstruct the input, like in an RBM.
So, given that a NN (or a multi-layer perceptron) can be used to train a generative model in this way, why would you use an RBM (or a deep belief network) instead? Or in this case, would they be exactly the same?
You can use a NN for a generative model in exactly the way you describe. This is known as an autoencoder, and these can work quite well. In fact, these are often the building blocks of deep belief networks.
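To illustrate the reconstruction loss described in the question, here is a deliberately tiny sketch: a linear autoencoder with two inputs, one hidden unit, and tied encoder/decoder weights, trained with plain gradient descent. All of these choices (the tied weights, the toy data, the learning rate, the function names) are assumptions made to keep the example short; practical autoencoders use nonlinear layers and a library with automatic differentiation.

```python
def reconstruct(w, x):
    """Encode x into one hidden value h = w.x, then decode with the same weights."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return [wi * h for wi in w], h


def reconstruction_loss(w, data):
    """Sum of squared differences between each input x and its reconstruction y."""
    total = 0.0
    for x in data:
        y, _ = reconstruct(w, x)
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total


def train(w, data, lr=0.005, steps=500):
    """Gradient descent on the reconstruction loss (gradient derived by hand)."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in data:
            y, h = reconstruct(w, x)
            r = [yi - xi for yi, xi in zip(y, x)]      # residual y - x
            wr = sum(wi * ri for wi, ri in zip(w, r))  # w . r
            for k in range(len(w)):
                grad[k] += 2.0 * (h * r[k] + wr * x[k])
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Toy data lying on a line, so one hidden unit is enough to reconstruct it.
data = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [0.5, 0.5]]
w0 = [0.5, 0.1]

before = reconstruction_loss(w0, data)
w = train(w0, data)
after = reconstruction_loss(w, data)
print("loss before:", before, "loss after:", after)
```

The target of the loss is the input itself rather than a class label, which is the only change from the discriminative setup in the question.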
An RBM is a quite different model from a feed-forward neural network. They have connections going both ways (forward and backward) that have a probabilistic / energy interpretation. You'll need to read the details to understand.
A deep belief network (DBN) is just a neural network with many layers. This can be a large NN whose layers are a kind of autoencoder, or it can consist of stacked RBMs. You need special methods, tricks and lots of data to train these deep and large networks. Simple back-propagation suffers from the vanishing gradient problem. But if you do manage to train them, they can be very powerful (encoding "higher level" concepts).
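A quick way to see why gradients vanish with sigmoid units: the derivative of the sigmoid is at most 0.25, and back-propagation multiplies one such factor per layer. The sketch below is a simplified scalar model that assumes one unit per layer, a weight of 1, and a pre-activation of 0 everywhere; the function names are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def chained_gradient(depth, z=0.0, weight=1.0):
    """Product of the per-layer factors weight * sigmoid'(z) over `depth` layers."""
    g = 1.0
    for _ in range(depth):
        g *= weight * sigmoid_grad(z)
    return g


for depth in (1, 5, 10):
    print(depth, chained_gradient(depth))
```

Even in this best case for the sigmoid derivative, the factor reaching the first layer of a 10-layer chain is below one millionth.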
Hope this helps to point you in the right directions.

extrapolation with recurrent neural network

I wrote a simple recurrent neural network (7 neurons, each one initially connected to all the neurons) and trained it using a genetic algorithm to learn "complicated", non-linear functions like 1/(1+x^2). As the training set, I used 20 values within the range [-5,5] (I tried using more than 20, but the results did not change dramatically).
The network can learn this range pretty well, and when given other points within this range, it can predict the value of the function. However, it cannot extrapolate correctly and predict the values of the function outside the range [-5,5]. What are the reasons for that, and what can I do to improve its extrapolation abilities?
Thanks!
Neural networks are not extrapolation methods (whether recurrent or not); this is completely outside their capabilities. They are used to fit a function to the provided data, and they are completely free in how the model behaves outside the subspace populated with training points. So, in a not very strict sense, one should think of them as an interpolation method.
To make things clear, a neural network should be capable of generalizing the function inside the subspace spanned by the training samples, but not outside of it.
A neural network is trained only in the sense of consistency with the training samples, while extrapolation is something completely different. A simple example from "H. Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8" shows how NNs behave in this context.
All of these networks are consistent with the training data, but can do anything outside of this subspace.
You should rather reconsider your problem's formulation. If it can be expressed as a regression or classification problem, then you can use an NN; otherwise you should think about a completely different approach.
The only things that can be done to somehow "correct" what is happening outside the training set are to:
add artificial training points in the desired subspace (but this simply grows the training set, and again, outside of this new set the network's behaviour is "random")
add strong regularization, which will force the network to create a very simple model. However, a model's simplicity does not guarantee any extrapolation strength, as two models of exactly the same complexity can have, for example, completely different limits at -/+ infinity.
Combining the above two steps can help build a model which to some extent "extrapolates", but this, as stated before, is not the purpose of a neural network.
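The point that models can agree on the training data and still disagree arbitrarily outside it can be shown without training anything. In the sketch below, model_up and model_down are hand-written stand-ins, not trained networks: both match 1/(1+x^2) exactly on [-5,5] and differ only outside that range. The function names and the quadratic term are assumptions chosen for illustration.

```python
def target(x):
    """The function from the question."""
    return 1.0 / (1.0 + x * x)


def model_up(x):
    """Matches the target on [-5, 5], then bends upwards outside it."""
    excess = max(0.0, abs(x) - 5.0)
    return target(x) + 0.1 * excess ** 2


def model_down(x):
    """Matches the target on [-5, 5], then bends downwards outside it."""
    excess = max(0.0, abs(x) - 5.0)
    return target(x) - 0.1 * excess ** 2


# 20 evenly spaced training points in [-5, 5], as in the question.
train_x = [-5.0 + 10.0 * i / 19 for i in range(20)]


def train_error(model):
    """Sum of squared errors on the training points."""
    return sum((model(x) - target(x)) ** 2 for x in train_x)


print("training error:", train_error(model_up), train_error(model_down))
print("at x = 10:", target(10.0), model_up(10.0), model_down(10.0))
```

Both stand-ins have zero training error, so nothing in the training set can tell them apart, yet at x = 10 they are far from each other and from the true function.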
As far as I know, this is only possible with networks which have the echo property. See Echo State Networks on scholarpedia.org.
These networks are designed for arbitrary signal learning and are capable of remembering their behavior.
You can also take a look at this tutorial.
The nature of your post(s) suggests that what you're referring to as "extrapolation" would be more accurately defined as "sequence recognition and reproduction." Training networks to recognize a data sequence with or without time-series (dt) is pretty much the purpose of a recurrent neural network (RNN).
The training function shown in your post has output limits governed by 0 and 1 (or -1, since x is effectively abs(x) in the context of that function). So, first things first, be certain your input layer can easily distinguish between negative and positive inputs (if it must).
Next, the number of neurons is not nearly as important as how they're layered and interconnected. How many of the 7 were used for the sequence inputs? What type of network was used and how was it configured? Network feedback will reveal the ratios, proportions, relationships, etc. and aid in adjusting the network weights to match the sequence. Feedback can also take the form of a forward-feed depending on the type of network used to create the RNN.
Producing an 'observable' network for the function 1/(1+x^2) should be a decent exercise to cut your teeth on RNNs. 'Observable' means the network is capable of producing results for any input value(s) even though its training data is (far) smaller than all possible inputs. I can only assume that this was your actual objective as opposed to "extrapolation."

Resources