How to feed complex number input to a neural network? - machine-learning

I am trying to train a neural network with complex numbers as input. As is generally suggested, I split each complex number into its real and imaginary parts before feeding the data into the network. However, what if I use the modulus (magnitude) and phase (argument) of the complex number instead of the real and imaginary parts? How will that affect my training process? I have not been able to reach a firm conclusion on this.
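To make the two encodings concrete, here is a minimal sketch using only Python's standard cmath module. The helper names (to_rect, to_polar, to_polar_unwrapped) are hypothetical and not taken from any library. It illustrates one difference that may matter for training: the phase wraps around at +/-pi, so two inputs that are almost identical in the complex plane can end up with very different phase values. Replacing the phase by its cosine and sine is one possible workaround, shown here as an assumption rather than a recommendation from the question.

```python
import cmath
import math

def to_rect(z):
    """Encode z as (real, imaginary)."""
    return (z.real, z.imag)

def to_polar(z):
    """Encode z as (modulus, phase), with the phase in [-pi, pi]."""
    return cmath.polar(z)

def to_polar_unwrapped(z):
    """Encode z as (modulus, cos(phase), sin(phase)) to avoid the jump at +/-pi."""
    r, phi = cmath.polar(z)
    return (r, math.cos(phi), math.sin(phi))

# Two inputs that are very close together in the complex plane,
# sitting just above and just below the negative real axis.
a = complex(-1.0, 1e-6)
b = complex(-1.0, -1e-6)

print(to_rect(a), to_rect(b))                        # nearly identical
print(to_polar(a), to_polar(b))                      # phases differ by almost 2*pi
print(to_polar_unwrapped(a), to_polar_unwrapped(b))  # nearly identical again
```

With the rectangular encoding, nearby inputs stay nearby; with the raw phase they do not near the negative real axis, which is worth keeping in mind when choosing between the two.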

Related

Neural networks physical model

Is a neural network the same thing as a tree data structure? If not, how are neural networks described using a physical model?
For example, an array is considered a collection of similar data under the same name (sequential memory allocation = physical model).
A neural network is not a data structure, but rather a machine learning model that is loosely based on, and an oversimplification of, how we think the human brain works.
You can use arrays to construct your network, but these will just be a component in a much larger piece.
For instance, in a neural network, one neuron from a given layer is connected to one or more neurons from its adjacent layers. These connections, in turn, have weights. These weighted connections can be described by an array.
Another way in which arrays can be used in neural networks is for the inputs. Neural networks tend to take vectors as input, which can be described using arrays. For instance, if you have a neural network which operates on numbers, you can transform your number into binary and then store that binary number in an array. The number 8, stored in five bits with the least significant bit first, would be [0, 0, 0, 1, 0].
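The binary encoding mentioned above can be sketched in a few lines of Python. The function name to_bits and the choice of least-significant-bit-first ordering are assumptions made for this illustration; the opposite bit order would work equally well as long as it is used consistently.

```python
def to_bits(n, width):
    """Return the binary digits of n as a list, least significant bit first."""
    if n < 0 or n >= 2 ** width:
        raise ValueError("n does not fit in the requested width")
    return [(n >> i) & 1 for i in range(width)]

print(to_bits(8, 5))  # [0, 0, 0, 1, 0]
```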
The main difference between the two is that a tree is a data structure, whereas a neural network is a learning algorithm. A tree has one root and several leaves; there is no such structure in a neural network. The two may look similar, but they are not. Trees can store data, hence their type: data structure. A neural network does not; it is a function which takes an input and applies weights and biases to it in order to find how close the input is to the possible or expected output(s).

Can a recurrent neural network learn slightly different sequences at once?

Can a recurrent neural network be used to learn a sequence with slightly different variations? For example, could I get an RNN trained so that it could produce a sequence of consecutive integers or alternate integers if I have enough training data?
For example, if I train using
1,2,3,4
2,3,4,5
3,4,5,6
and so on
and also train the same network using
1,3,5,7
2,4,6,8
3,5,7,9
and so on,
would I be able to predict both sequences successfully for the test set?
What if I have even more variations in the training data like sequences of every three integers or every four integers, et cetera?
Yes, provided there is enough information in the sequence so that it is not ambiguous, a neural network should be able to learn to complete these sequences correctly.
You should note a few details though:
Neural networks, and ML models in general, are bad at extrapolation. A simple network is very unlikely to learn about sequences in general. It will never learn the concept of sequence logic in the way a child quickly would. So if you feed in test data outside of its experience (e.g. steps of 3 between items, when they were not in the training data), it will perform badly.
Neural networks prefer scaled inputs: a common pre-processing step is to normalise each input column to mean 0 and standard deviation 1. Whilst it is possible for a network to accept a larger range of numbers at its inputs, that will reduce the effectiveness of training. With a generated training set such as artificial numeric sequences, you may be able to force your way through that by training for longer with more examples.
You will need more neurons, and more layers, to support a larger variation of sequences.
An RNN will predict badly if the sequence it has processed so far is ambiguous. E.g. if you train on 1,2,3,4 and 1,2,3,5 with equal numbers of samples, it will predict either 4.5 (for regression) or a 50% chance of 4 or 5 (for a classifier) when it is shown the sequence 1,2,3 and asked to predict the next item.
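Two of the points above can be checked with simple arithmetic. The sketch below uses only Python's statistics module; the function name standardise and the example column are made up for illustration. It rescales one input column to mean 0 and standard deviation 1, and it shows where the 4.5 in the ambiguous case comes from: the single prediction that minimises squared error over equally frequent targets is their mean.

```python
import statistics

def standardise(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std for x in column]

# First input column of some consecutive-integer training sequences.
column = [1, 2, 3, 4, 5, 6]
scaled = standardise(column)
print(scaled)

# Ambiguous case: after 1,2,3 the targets 4 and 5 appear equally often.
# The constant prediction that minimises squared error is their mean.
targets = [4, 5]
print(statistics.fmean(targets))  # 4.5
```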

Machine learning with a variable-sized real vector of inputs?

I have a collection of objects with properties that I measure. For each object, I obtain a vector of real numbers describing that object. The vector is always incomplete: there's usually numbers missing from the beginning or end of what would be the complete vector, and sometimes there is information missing in the middle. Hence, each object results in a vector of a different length. I also measure, say, the mass of each object, and I now want to relate the vector of things I've measured to the mass.
It's common in my field (astrophysics) to extract features from this vector of real numbers, e.g. take an average or some linear combinations of the values, and then use those extracted features to infer the mass (or whatever) using, for example, neural networks. It was recently shown, however, that a very complex combination of the elements of the vector results in a much better model of the mass.
There are still residuals in this model, however, even when working on simulated data. Presumably then there is a better way out there to manipulate these variable-length vectors in order to get a better model.
I am wondering if it is possible to do machine learning with real-valued input vectors of all different lengths. I know for text mining there are things like the bag-of-words approach, but it is unclear how such a method would work on real-valued vectors. I know recurrent neural networks work on sentences of variable length, but I'm not sure they work for real-valued vectors. I have also considered imputing the missing data; however, sometimes it is missing for physical reasons, i.e. a value in such-and-such place cannot exist, and so imputing it would violate the physicality of the situation.
Is there any research in this area?
Recurrent Neural Networks (RNNs) are capable of taking a variable-sized input vector of length n and producing a variable sized output vector of length m.
There are many ways to make RNNs work. The most common cell types are called Long short-term memory (LSTM) and Gated Recurrent Unit (GRU).
You might want to read:
The Unreasonable Effectiveness of Recurrent Neural Networks: Nice to get an idea of what RNNs are capable of, especially character predictors. It is easy to read, but not exactly what you're searching for.
Understanding LSTM Networks: More technical; very well written
Sepp Hochreiter, Jurgen Schmidhuber: LONG SHORT-TERM MEMORY
RNNs in TensorFlow
However, training RNNs takes a lot of training data. You might be better off computing a fixed-size feature vector from your measurements. But you never know until you try it ;-)
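To illustrate why an RNN can accept real-valued vectors of different lengths, here is a minimal sketch of a plain (Elman-style) recurrent layer in pure Python. The weights are random and untrained, and the names rnn_encode, w_in, w_rec and bias are hypothetical; in practice you would use an LSTM or GRU layer from a library and train it. The point is only that the final hidden state has a fixed size regardless of how many measurements were fed in.

```python
import math
import random

def rnn_encode(sequence, w_in, w_rec, bias):
    """Run a plain RNN over a sequence of real numbers.

    Returns the final hidden state, whose size depends only on the
    number of hidden units, not on the length of the sequence.
    """
    hidden = [0.0] * len(bias)
    for x in sequence:
        hidden = [
            math.tanh(
                w_in[i] * x
                + sum(w_rec[i][j] * hidden[j] for j in range(len(hidden)))
                + bias[i]
            )
            for i in range(len(hidden))
        ]
    return hidden

# Untrained, randomly initialised weights (for illustration only).
rng = random.Random(0)
n_hidden = 4
w_in = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_rec = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]
bias = [0.0] * n_hidden

shorter = [0.3, 1.2]
longer = [0.3, 1.2, -0.7, 2.5, 0.1]
print(len(rnn_encode(shorter, w_in, w_rec, bias)))  # 4
print(len(rnn_encode(longer, w_in, w_rec, bias)))   # 4
```

The fixed-size state could then be passed to an ordinary regression layer to predict the mass. Values that are missing in the middle of a vector would still need separate handling, for example an extra input flag marking "not measured".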

What is a Recurrent Neural Network, what is a Long Short Term Memory (LSTM) network, and is it always better? [closed]

First, let me apologize for cramming three questions into that title. I'm not sure what better way there is.
I'll get right to it. I think I understand feedforward neural networks pretty well.
But LSTM really escapes me, and I feel maybe this is because I don't have a very good grasp of recurrent neural networks in general. I have gone through Hinton's and Andrew Ng's courses on Coursera. A lot of it still doesn't make sense to me.
From what I understood, recurrent neural networks are different from feedforward neural networks in that past values influence the next prediction. Recurrent neural networks are generally used for sequences.
The example I saw of recurrent neural network was binary addition.
010
+ 011
A recurrent neural network would take the right most 0 and 1 first, output a 1. Then take the 1,1 next, output a zero, and carry the 1. Take the next 0,0 and output a 1 because it carried the 1 from last calculation. Where does it store this 1? In feed forward networks the result is basically:
y = a(w*x + b)
where w = weights of connections to previous layer
and x = activation values of previous layer or inputs
How is a recurrent neural network calculated? I am probably wrong, but from what I understood, recurrent neural networks are pretty much feedforward neural networks with T hidden layers, T being the number of timesteps. Each hidden layer takes the input x at its own timestep, and its outputs are then added to the next respective hidden layer's inputs.
a(l) = a(w*x + b + pa)
where l = current timestep
and x = value at current timestep
and w = weights of connections to input layer
and pa = past activation values of hidden layer
such that neuron i in layer l uses the output value of neuron i in layer l-1
y = o(w*a(l-1) + b)
where w = weights of connections to last hidden layer
But even if I understood this correctly, I don't see the advantage of doing this over simply using past values as inputs to a normal feedforward network (sliding window or whatever it's called).
For example, what is the advantage of using a recurrent neural network for binary addition instead of training a feedforward network with two output neurons, one for the binary result and the other for the carry, and then taking the carry output and plugging it back into the feedforward network?
However, I'm not sure how is this different than simply having past values as inputs in a feedforward model.
It seems to me that the more timesteps there are, the more recurrent neural networks are at a disadvantage compared with feedforward networks because of the vanishing gradient. This brings me to my second question: from what I understood, LSTM is a solution to the problem of the vanishing gradient, but I have no actual grasp of how LSTMs work. Furthermore, are they simply better than plain recurrent neural networks, or are there sacrifices to using an LSTM?
What is a Recurrent neural network?
The basic idea is that recurrent networks have loops. These loops allow the network to use information from previous passes, which acts as memory. The length of this memory depends on a number of factors but it is important to note that it is not indefinite. You can think of the memory as degrading, with older information being less and less usable.
For example, let's say we just want the network to do one thing: Remember whether an input from earlier was 1, or 0. It's not difficult to imagine a network which just continually passes the 1 around in a loop. However every time you send in a 0, the output going into the loop gets a little lower (This is a simplification, but displays the idea). After some number of passes the loop input will be arbitrarily low, making the output of the network 0. As you are aware, the vanishing gradient problem is essentially the same, but in reverse.
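The degrading loop described above can be mimicked with a single self-connection. This is a deliberately simplified sketch: the function name pass_around_loop and the loop weight of 0.5 are made up for illustration, and a real recurrent unit would also apply a nonlinearity and receive new inputs at every step.

```python
def pass_around_loop(initial, loop_weight, steps):
    """Feed a value back through a self-connection `steps` times."""
    value = initial
    history = [value]
    for _ in range(steps):
        value = loop_weight * value  # each pass scales the stored value
        history.append(value)
    return history

history = pass_around_loop(initial=1.0, loop_weight=0.5, steps=10)
print(history[-1])  # 0.0009765625
```

In this linear toy case the derivative of the final value with respect to the initial one is the same product of loop weights, which is why the forward "forgetting" and the vanishing gradient in the backward pass behave alike.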
Why not just use a window of time inputs?
You offer an alternative: a sliding window of past inputs being provided as current inputs. That is not a bad idea, but consider this: while the RNN's memory may erode over time, with a window you will always lose the entirety of your time information once your window ends. And while you would remove the vanishing gradient problem, you would have to increase the number of weights in your network several times over. Having to train all those additional weights will hurt you just as badly as (if not worse than) the vanishing gradient.
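A rough way to see the weight growth is to count the weights involved in each design. The sketch below counts only the first-layer weights of a windowed feedforward network and the input plus recurrent weights of a plain RNN, ignoring biases and any later layers. The sizes (10 inputs per time step, 20 hidden units) and the function names are assumptions chosen purely for illustration.

```python
def window_weights(input_dim, hidden, window):
    """First-layer weights of a feedforward net fed `window` past inputs."""
    return window * input_dim * hidden

def rnn_weights(input_dim, hidden):
    """Input-to-hidden plus hidden-to-hidden weights of a plain RNN."""
    return input_dim * hidden + hidden * hidden

for window in (1, 5, 10, 50):
    print(window, window_weights(10, 20, window), rnn_weights(10, 20))
```

The windowed count grows linearly with the window length, while the recurrent count stays the same no matter how far back the network is asked to look.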
What is an LSTM network?
You can think of LSTM as a special type of RNN. The difference is that LSTM is able to actively maintain self-connecting loops without them degrading. This is accomplished through a somewhat fancy activation, involving an additional "memory" output for the self-looping connection. The network must then be trained to select what data gets put onto this bus. By training the network to explicitly select what to remember, we don't have to worry about new inputs destroying important information, and the vanishing gradient doesn't affect the information we decided to keep.
There are two main drawbacks:
It is more expensive to calculate the network output and apply back propagation. You simply have more math to do because of the complex activation. However this is not as important as the second point.
The explicit memory adds several more weights to each node, all of which must be trained. This increases the dimensionality of the problem, and potentially makes it harder to find an optimal solution.
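The gated memory and the extra weights it requires can be sketched with a single-unit LSTM cell in pure Python. This follows the commonly used LSTM formulation (input, forget and output gates plus a candidate value); the function names and the hand-picked, untrained weights are assumptions for illustration. The weights are chosen so that the forget gate stays open and the input gate stays shut, which lets a stored value survive many steps.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, params):
    """One step of a single-unit LSTM cell with scalar input.

    params maps each of "input", "forget", "output", "candidate" to a
    (w_x, w_h, bias) triple: four sets of weights instead of one.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h + b

    i = sigmoid(pre("input"))         # how much new information to write
    f = sigmoid(pre("forget"))        # how much of the old memory to keep
    o = sigmoid(pre("output"))        # how much of the memory to expose
    g = math.tanh(pre("candidate"))   # the new information itself
    c_new = f * c + i * g             # memory cell ("bus") update
    h_new = o * math.tanh(c_new)      # visible output
    return h_new, c_new

# Hand-picked (not trained) weights: forget gate wide open, input gate shut.
params = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 20.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # pretend a 1 was stored earlier
for x in [0.0] * 100:
    h, c = lstm_step(x, h, c, params)
print(c)  # still very close to 1 after 100 steps
```

Each unit carries four (w_x, w_h, bias) triples rather than one, which is the additional cost in computation and trainable weights that the two drawbacks describe.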
Is it always better?
Which structure is better depends on a number of factors, like the number of nodes you need for your problem, the amount of available data, and how far back you want your network's memory to reach. If you only want the theoretical answer, I would say that, given infinite data and computing speed, an LSTM is the better choice. However, one should not take this as practical advice.
A feed forward neural network has connections from layer n to layer n+1.
A recurrent neural network allows connections from layer n to layer n as well.
These loops allow the network to perform computations on data from previous cycles, which creates a network memory. The length of this memory depends on a number of factors and is an area of active research, but could be anywhere from tens to hundreds of time steps.
To make it a bit more clear, the carried 1 in your example is stored in the same way as the inputs: in a pattern of activation of a neural layer. It's just the recurrent (same layer) connections that allow the 1 to persist through time.
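As a concrete picture of where the carried 1 lives, here is a hand-written (not learned) version of the binary addition example. The variable carry stands in for the recurrent state: it is the only thing passed from one time step to the next. A trained RNN would have to represent the same information as a pattern of activations; the function name add_binary is hypothetical.

```python
def add_binary(a_bits, b_bits):
    """Add two equal-length bit strings, processing right to left.

    `carry` plays the role of the recurrent state: it is the only
    information passed from one time step to the next.
    """
    carry = 0
    out = []
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        total = int(a) + int(b) + carry
        out.append(str(total % 2))  # output for this time step
        carry = total // 2          # state handed to the next step
    if carry:
        out.append("1")
    return "".join(reversed(out))

print(add_binary("010", "011"))  # 101
```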
Obviously it would be infeasible to replicate every input stream for more than a few past time steps, and choosing which historical streams are important would be very difficult (and lead to reduced flexibility).
LSTM is a very different model which I'm only familiar with by comparison to the PBWM model, but in that review LSTM was able to actively maintain neural representations indefinitely, so I believe it is more intended for explicit storage. RNNs are more suited to non-linear time series learning, not storage. I don't know if there are drawbacks to using LSTM rather than RNNs.
Both RNN and LSTM can be sequence learners. RNN suffers from the vanishing gradient problem. This problem causes the RNN to have trouble remembering the values of past inputs after more than approximately 10 timesteps; in other words, an RNN can remember previously seen inputs for a few time steps only.
LSTM is designed to solve the vanishing gradient problem in RNN. LSTM has the capability of bridging long time lags between inputs. In other words, it is able to remember inputs from up to 1000 time steps in the past (some papers even claim it can go beyond this). This capability gives LSTM an advantage when learning long sequences with long time lags. Refer to Alex Graves's Ph.D. thesis, Supervised Sequence Labelling with Recurrent Neural Networks, for some details. If you are new to LSTM, I recommend Colah's blog for a super simple and easy explanation.
However, recent advances in RNN also claim that, with careful initialization, an RNN can learn long sequences with performance comparable to LSTM. See A Simple Way to Initialize Recurrent Networks of Rectified Linear Units.
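The initialization proposed in that paper is, in essence, to use rectified linear units and start the recurrent weight matrix as the identity with zero biases. The sketch below is a pure-Python illustration of why that helps at initialization. The helper names are hypothetical, and the input weights are set to zero only to isolate the recurrent path. With an identity recurrent matrix, a non-negative hidden state is copied unchanged from step to step instead of shrinking.

```python
def identity_matrix(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def relu_rnn_step(hidden, x, w_rec, w_in, bias):
    """One step of a plain RNN with ReLU activation and scalar input."""
    n = len(hidden)
    return [
        max(0.0, sum(w_rec[i][j] * hidden[j] for j in range(n)) + w_in[i] * x + bias[i])
        for i in range(n)
    ]

n = 3
w_rec = identity_matrix(n)  # identity recurrent weights
w_in = [0.0] * n            # zero input weights, only to isolate the recurrent path
bias = [0.0] * n            # zero biases

hidden = [0.5, 1.0, 2.0]
for _ in range(1000):
    hidden = relu_rnn_step(hidden, 0.0, w_rec, w_in, bias)
print(hidden)  # [0.5, 1.0, 2.0]
```

Training then moves the weights away from the identity, so this only describes the starting point, not the behaviour of the trained network.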

extrapolation with recurrent neural network

I wrote a simple recurrent neural network (7 neurons, each one initially connected to all the neurons) and trained it using a genetic algorithm to learn "complicated", non-linear functions like 1/(1+x^2). As the training set, I used 20 values within the range [-5,5] (I tried using more than 20, but the results did not change dramatically).
The network can learn this range pretty well, and when given examples of other points within this range, it can predict the value of the function. However, it cannot extrapolate correctly and predict the values of the function outside the range [-5,5]. What are the reasons for that, and what can I do to improve its extrapolation abilities?
Thanks!
Neural networks are not extrapolation methods (whether recurrent or not); this is completely outside their capabilities. They are used to fit a function to the provided data, and they are completely free to build any model outside the subspace populated with training points. So, in a not very strict sense, one should think of them as an interpolation method.
To make things clear, a neural network should be capable of generalizing the function inside the subspace spanned by the training samples, but not outside of it.
A neural network is trained only in the sense of consistency with the training samples, while extrapolation is something completely different. A simple example from "H. Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8" shows how NNs behave in this context.
All of these networks are consistent with training data, but can do anything outside of this subspace.
You should rather reconsider your problem's formulation, and if it can be expressed as a regression or classification problem then you can use NN, otherwise you should think about some completely different approach.
The only things that can be done to somehow "correct" what is happening outside the training set are to:
add artificial training points in the desired subspace (but this simply grows the training set, and again, outside of this new set the network's behaviour is "random")
add strong regularization, which will force the network to create a very simple model; but a model's simplicity does not guarantee any extrapolation strength, as two models of exactly the same complexity can have, for example, completely different limits at -/+ infinity.
Combining the above two steps can help build a model which to some extent "extrapolates", but this, as stated before, is not the purpose of a neural network.
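The remark about limits can be made concrete for a one-hidden-layer tanh network. Because tanh saturates at +/-1, the output of such a network tends to a constant far from the origin, and that constant is set by the output weights rather than by the data. The two networks below are hypothetical, hand-written examples of identical size (they are not fitted to anything), and the function names are made up for this sketch.

```python
import math

def tanh_net(x, hidden, out_bias):
    """One-hidden-layer tanh network; hidden is a list of (w, b, v) triples."""
    return out_bias + sum(v * math.tanh(w * x + b) for w, b, v in hidden)

def limit_at_plus_infinity(hidden, out_bias):
    """tanh saturates at +/-1, so the output tends to a constant (assumes w != 0)."""
    return out_bias + sum(v * math.copysign(1.0, w) for w, b, v in hidden)

# Two networks of identical size, differing only in the sign of one output weight.
net_a = [(1.0, 0.0, 0.5), (2.0, -1.0, 0.5)]
net_b = [(1.0, 0.0, 0.5), (2.0, -1.0, -0.5)]

print(limit_at_plus_infinity(net_a, 0.0))  # 1.0
print(limit_at_plus_infinity(net_b, 0.0))  # 0.0
print(1.0 / (1.0 + 1e6 ** 2))              # the target 1/(1+x^2) is almost 0 out here
```

Only one of these limits happens to match the target function far from the training range, and nothing in the training data between -5 and 5 forces a fitted network to end up with the matching one.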
As far as I know this is only possible with networks which do have the echo property. See Echo State Networks on scholarpedia.org.
These networks are designed for arbitrary signal learning and are capable of remembering their behavior.
You can also take a look at this tutorial.
The nature of your post(s) suggests that what you're referring to as "extrapolation" would be more accurately defined as "sequence recognition and reproduction." Training networks to recognize a data sequence with or without time-series (dt) is pretty much the purpose of Recurrent Neural Network (RNN).
The training function shown in your post has outputs bounded between 0 and 1, and it is even: x appears only as x^2, so x is effectively abs(x) in the context of that function. So, first things first, be certain your input layer can easily distinguish between negative and positive inputs (if it must).
Next, the number of neurons is not nearly as important as how they're layered and interconnected. How many of the 7 were used for the sequence inputs? What type of network was used and how was it configured? Network feedback will reveal the ratios, proportions, relationships, etc. and aid in adjusting the network weights to match the sequence. Feedback can also take the form of a forward-feed depending on the type of network used to create the RNN.
Producing an 'observable' network for the decaying function 1/(1+x^2) should be a decent exercise to cut your teeth on RNNs. 'Observable' here means the network is capable of producing results for any input value(s) even though its training data is (far) smaller than all possible inputs. I can only assume that this was your actual objective as opposed to "extrapolation."
