How to model a for loop in a neural network - machine-learning

I am currently in the process of learning neural networks and can understand basic examples like AND, OR, Addition, Multiplication, etc.
Right now, I am trying to build a neural network that takes two inputs x and n and computes pow(x, n). This would require the neural network to have some form of a loop, and I am not sure how to model a network with a loop.
Can this sort of computation be modelled with a neural network? I am assuming it is possible, based on the recently released Neural Turing Machine paper, but I'm not sure how. Any pointers on this would be very helpful.
Thanks!

Feedforward neural nets are not Turing-complete, and in particular they cannot model loops with an arbitrary number of iterations. However, if you fix the maximum n that you want to handle, then you can set up an architecture which can model loops with up to n repetitions. For instance, you could let each layer act as one iteration of the loop, so you might need n layers.
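To make the "one layer per iteration" idea concrete, here is a minimal sketch (plain Python, all names and the choice of N_MAX are mine) of a hand-wired network whose depth is fixed at the maximum exponent: each stage multiplies the running product by x once more, and the input n only selects which stage's output is read out. It is not trained; it just illustrates how a bounded loop can be unrolled into a fixed feedforward depth.

```python
N_MAX = 8  # maximum exponent this unrolled "network" can handle (assumption)

def pow_unrolled(x: float, n: int) -> float:
    """Unroll 'multiply by x, n times' into N_MAX fixed stages.

    Each stage plays the role of one layer; the exponent n only decides
    which stage's activation is read out, so no loop of variable length
    exists at inference time.
    """
    assert 0 <= n <= N_MAX
    activations = [1.0]          # stage 0 holds x**0
    for _ in range(N_MAX):       # fixed depth, independent of n (the "layers")
        activations.append(activations[-1] * x)
    return activations[n]        # readout selects the n-th stage

print(pow_unrolled(2.0, 5))      # 32.0
```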
For a more general architecture that can be made Turing-complete, you could use Recurrent Neural Networks (RNNs). One popular architecture in this class is the Long Short-Term Memory (LSTM) network by Hochreiter and Schmidhuber. Training such RNNs is quite different from training classical feedforward networks, though.
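By contrast, a recurrence expresses the same loop with a single cell applied repeatedly, which is essentially what an RNN does under the hood. A toy sketch (not an LSTM, just a hand-written transition, purely illustrative):

```python
def pow_recurrent(x: float, n: int) -> float:
    """One recurrent 'cell' applied n times: h_{t+1} = h_t * x, with h_0 = 1.

    A trained RNN/LSTM would have to learn this transition from data;
    here it is written by hand only to show the loop-as-recurrence idea.
    """
    h = 1.0                      # hidden state
    for _ in range(n):           # the same cell/weights applied at every step
        h = h * x
    return h

print(pow_recurrent(3.0, 4))     # 81.0
```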

As you pointed out, Neural Turing Machines seem to work well at learning basic algorithms. For instance, the repeat-copy task implemented in the paper suggests that an NTM can learn an algorithm itself. As of now, NTMs have only been used for simple tasks, so probing their scope with pow(x, n) will be interesting, given that repeat-copy works well. I suggest reading Reinforcement Learning Neural Turing Machines - Revised for a deeper understanding.
Also, recent developments in the area of Memory Networks let us perform more complicated tasks. Hence, getting a neural network to learn pow(x, n) might be possible. So go ahead and give it a shot!

Related

Neural networks - why everybody has different approach with XOR

Currently I'm trying to learn how to work with neural networks by reading books, but mostly internet tutorials.
I often see that "XOR is 'Hello World' of neural networks".
But here is the thing: the author of one tutorial says that for a neural network that calculates the XOR value we should use one hidden layer with 2 neurons. He also uses backpropagation with deltas to adjust the weights.
I implemented this, but even after 1 million epochs I have the problem that the network gets stuck on the input 1 and 1. The answer should be "0", but it is usually 0.5-something. I checked my code and it is correct.
If I add just one more neuron to the hidden layer, the network successfully calculates XOR after ~50,000 epochs.
At the same time some people say that "XOR is not a trivial task and we should use a network with 2-3 or more layers". Why?
Come on, if XOR creates so many problems, maybe we shouldn't use it as the 'hello world' of neural networks? Please explain what is going on.
So neural networks are really interesting. There's a proof that says that a single perceptron can learn any linearly separable function given enough time. Even more impressive, a neural network with one hidden layer can apparently approximate any function (the universal approximation theorem), though I've yet to see a proof of that one.
XOR is a good function for teaching neural networks because, as CS students, those in the class are likely already familiar with it. In addition, it is not trivial, in the sense that a single perceptron cannot learn it: it isn't linearly separable. See this graphic I put together.
There is no single line that separates these values. Yet it is simple enough for humans to understand, and more importantly, a human can understand the neural network that solves it. NNs are very black-box-y; it quickly becomes hard to tell why they work. Hell, here is another network config that can solve XOR.
Your example of a more complicated network solving it faster shows the power that comes from combining more neurons and more layers. It's absolutely unnecessary to use 2-3 hidden layers to solve it, but it sure helps speed up the process.
The point is that it is a simple enough problem to solve by human and on a black-board in class, while also being slightly more challenging than a given linear function.
EDIT: Another fantastic example for teaching NNs practically is the MNIST hand-drawn digit classification dataset. It very easily shows a problem that is simultaneously very simple for humans to understand, very hard to write a non-learning program for, and a very practical use case for machine learning. The problem is that the network structure is too large to draw on a blackboard and trace what is happening in a way that is practical for a class, which XOR does allow.
EDIT 2: Also, without the code it will probably be hard to diagnose why it isn't converging. Did you write the neurons yourself? What about the optimization function, etc?
EDIT 3: If the output of your final node is 0.5, try using a step squashing function that maps all values below 0.5 to 0 and all values above 0.5 to 1. You only have binary output anyway, so why bother with a continuous activation on the last node?
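For reference, here is a minimal NumPy sketch of the setup discussed above: a 2-3-1 network (three hidden neurons, since the 2-neuron version can stall as the question observed) trained with plain backpropagation, with the final continuous output thresholded at 0.5 as suggested in EDIT 3. The learning rate, epoch count and seed are arbitrary choices of mine; a different seed may need more epochs or a restart.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 inputs -> 3 hidden neurons -> 1 output, with separate bias vectors
W1 = rng.normal(size=(2, 3)); b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1)); b2 = np.zeros(1)
lr = 1.0

for epoch in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass (mean squared error, sigmoid derivatives)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out);  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);    b1 -= lr * d_h.sum(axis=0)

# threshold the continuous output, as suggested in EDIT 3
pred = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(pred.ravel())  # expected: [0 1 1 0]
```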

Why is lift for neural network that stable in SAS Viya demo?

I'm looking at the SAS Viya machine learning demo. It races some machine learning algorithms against each other on a given dataset. All models produce almost equally good "lift", as shown in the lift diagrams in the output.
If you tweak the training to run on a smaller subset of the data, only 0.002% of the total dataset (proc partition data=&casdata partition samppct=0.002;), most algorithms get into problems producing lift.
But the neural network is still performing very well. Feature or bug? I could imagine that the script does not re-initialize the network, but it is hard to tell from the calls alone.
I got good answers over at the SAS Community, posted by BrettWujek and Xinmin:
Mats - the short answer without running some studies of my own is that neural networks are highly adaptive and can train very accurate models with far fewer observations than many other techniques. The tree-based models are going to be quite unstable with very few observations. In this case you sampled all the way down to around 20 observations...even that might be sufficient for a neural network if the space is not overly nonlinear.
As for your last comment - it seems you are referring to what is known as warm start, where a previously trained model can be used as a starting point and refined by providing new observations. That is NOT what is happening here, as that capability is only becoming available in our upcoming release, which is just over a month away.
Brett
And I've got some detail on this from Xinmin:
Mats, PROC NNET initializes weights randomly; if you specify a seed in the TRAIN statement, the initial weights are repeatable. NNET training is powered by a sophisticated nonlinear optimization solver; if the log shows a "converged" status, it means the model fits very well.
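The point about the seed is general, not SAS-specific: random weight initialization gives a different starting point (and possibly a different final model) on every run unless the seed is fixed. A tiny Python illustration of the same idea (names and sizes are my own):

```python
import numpy as np

def init_weights(n_in, n_out, seed=None):
    """Random initial weights; passing a seed makes the draw repeatable."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=0.1, size=(n_in, n_out))

a = init_weights(4, 2, seed=42)
b = init_weights(4, 2, seed=42)
c = init_weights(4, 2)            # no seed: different on every run
print(np.allclose(a, b))          # True - same seed, same starting weights
```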

Clarification on a Neural Net that plays Snake

I'm new to neural networks/machine learning/genetic algorithms, and for my first implementation I am writing a network that learns to play Snake (an example in case you haven't played it before). I have a few questions about things I don't fully understand:
Before my questions I just want to make sure I understand the general idea correctly. There is a population of snakes, each with randomly generated DNA. The DNA is the weights used in the neural network. Each time the snake moves, it uses the neural net to decide where to go (using a bias). When the population dies, select some parents (maybe highest fitness), and crossover their DNA with a slight mutation chance.
1) If given the whole board as an input (about 400 spots), enough hidden layers (no idea how many, maybe 256-64-32-2?), and enough time, would it learn to not box itself in?
2) What would be good inputs? Here are some of my ideas:
400 inputs, one for each space on the board. Positive if the snake should go there (the apple) and negative if it is a wall/your body. The closer the value is to -1/1, the closer that thing is.
6 inputs: game width, game height, snake x, snake y, apple x, and apple y (it may learn to play on different-size boards if trained that way, but I'm not sure how to input its body, since it changes size)
Give it a field of view (maybe a 3x3 square in front of the head) that can alert the snake to a wall, apple, or its body. (The snake would only be able to see what's right in front of it unfortunately, which could hinder its learning ability.)
3) Given the input method, what would be a good starting place for hidden layer sizes (I plan on tweaking this of course, I just don't know what a good starting place would be)?
4) Finally, the fitness of the snake. Besides time to get the apple, its length, and its lifetime, should anything else be factored in? In order to get the snake to learn not to block itself in, is there anything else I could add to the fitness to help with that?
Thank you!
In this post, I will advise you of:
How to map navigational instructions to action sequences with an LSTM neural network
Resources that will help you learn how to use neural networks to accomplish your task
How to install and configure neural network libraries, based on what I needed to learn the hard way
General opinion of your idea:
I can see what you're trying to do, and I believe that your game idea (using randomly generated parameters to control each snake's behaviour, so that each one uses the network to behave intelligently in its own way) has a lot of potential.
Mapping navigational instructions to action sequences with a neural network
For processing your game board, because it involves dense (as opposed to sparse) data, you could find a Convolutional Neural Network (CNN) to be useful. However, because you need to translate the map to an action sequence, sequence-optimized neural networks (such as Recurrent Neural Networks) will likely be the most useful for you. I did find some studies that use neural networks to map navigational instructions to action sequences, construct the game map, and move a character through a game with many types of inputs (a rough sketch of such a board-to-action network follows the reference list below):
Mei, H., Bansal, M., & Walter, M. R. (2015). Listen, attend, and walk: Neural mapping of navigational instructions to action sequences. arXiv preprint arXiv:1506.04089. Available at: Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences
Summerville, A., & Mateas, M. (2016). Super Mario as a String: Platformer Level Generation Via LSTMs. arXiv preprint arXiv:1603.00930. Available at: Super Mario as a String: Platformer Level Generation Via LSTMs
Lample, G., & Chaplot, D. S. (2016). Playing FPS games with deep reinforcement learning. arXiv preprint arXiv:1609.05521. Available at: Playing FPS Games with Deep Reinforcement Learning
Schulz, R., Talbot, B., Lam, O., Dayoub, F., Corke, P., Upcroft, B., & Wyeth, G. (2015, May). Robot navigation using human cues: A robot navigation system for symbolic goal-directed exploration. In Robotics and Automation (ICRA), 2015 IEEE International Conference on (pp. 1100-1105). IEEE. Available at: Robot Navigation Using Human Cues: A robot navigation system for symbolic goal-directed exploration
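As promised above, here is a rough Keras sketch of a policy network that takes the game grid as a small image and outputs a distribution over four moves. The board size, channel layout, and layer sizes are all assumptions of mine, and for the sequential aspect you would typically wrap something like this in a recurrent or reinforcement-learning setup, as the papers above do.

```python
import numpy as np
from tensorflow import keras

BOARD_H, BOARD_W = 20, 20   # assumed grid size
N_CHANNELS = 3              # e.g. one plane each for body, head, apple (assumption)
N_ACTIONS = 4               # up, down, left, right

model = keras.Sequential([
    keras.layers.Input(shape=(BOARD_H, BOARD_W, N_CHANNELS)),
    keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
    keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(N_ACTIONS, activation="softmax"),  # move probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# One empty board just to show the input/output shapes.
board = np.zeros((1, BOARD_H, BOARD_W, N_CHANNELS), dtype=np.float32)
print(model.predict(board, verbose=0).shape)  # (1, 4)
```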
General opinion of what will help you
It sounds like you're missing some basic understanding of how neural networks work, so my primary recommendation to you is to study more of the underlying mechanics behind neural networks in general. It's important to keep in mind that a neural network is a type of machine learning model. So, it doesn't really make sense to just construct a neural network with random parameters. A neural network is a machine learning model that is trained from sample data, and once it is trained, it can be evaluated on test data (e.g. to perform predictions).
Machine learning is heavily rooted in Bayesian statistics, so you might benefit from getting a textbook on Bayesian statistics to gain a deeper understanding of how machine-based classification works in general.
It will also be valuable for you to learn the differences between different types of neural networks, such as Long Short Term Memory (LSTM) and Convolutional Neural Networks (CNNs).
If you want to tinker with how neural networks can be used for classification tasks, try this:
Tensorflow Playground
To learn the math:
My professional opinion is that learning the underlying math of neural networks is very important. If it's intimidating, I give you my testimony that I was able to learn all of it on my own. But if you prefer learning in a classroom environment, then I recommend that you try that. A great resource and textbook for learning the mechanics and mathematics of neural networks is:
Neural Networks and Deep Learning
Tutorials for neural network libraries
I recommend that you try working through the tutorials for a neural network library, such as:
TensorFlow tutorials
Deep Learning tutorials with Theano
CNTK tutorials (CNTK 205: Artistic Style Transfer is particularly cool.)
Keras tutorial (Keras is a powerful high-level neural network library that can use either TensorFlow or Theano.)
I have seen similar applications. The inputs were usually the snake coordinates, the apple coordinates, and some sensory data (in your case, whether or not a wall is next to the snake's head).
Using a genetic algorithm is a good idea in this case. You are doing only parametric learning (finding a set of weights), while the structure is based on your own estimate. A GA can also be used for structure learning (finding the topology of the ANN), but using a GA for both would be computationally very hard.
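Here is a minimal sketch of that parametric search (plain NumPy; all names and the toy fitness function are mine): each genome is a flat weight vector, the fittest genomes are kept as parents, and children are produced by one-point crossover plus a small mutation, exactly the loop described in the question. In the real setup the fitness would come from actually playing a game of Snake with those weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N_WEIGHTS = 30          # size of the flattened weight vector (depends on your net)
POP_SIZE = 50
MUTATION_STD = 0.1

def fitness(genome):
    """Placeholder: in the real setup this would run one Snake game with
    these weights and return e.g. apples eaten plus survival time."""
    return -np.sum(genome ** 2)   # toy objective: drive weights toward zero

population = rng.normal(size=(POP_SIZE, N_WEIGHTS))

for generation in range(100):
    scores = np.array([fitness(g) for g in population])
    parents = population[np.argsort(scores)[-10:]]        # keep the 10 fittest
    children = []
    while len(children) < POP_SIZE:
        pa, pb = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, N_WEIGHTS)                   # one-point crossover
        child = np.concatenate([pa[:cut], pb[cut:]])
        child += rng.normal(scale=MUTATION_STD, size=N_WEIGHTS)  # mutation
        children.append(child)
    population = np.array(children)

best = max(population, key=fitness)
print(fitness(best))
```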
Professor Floreano did something similar. He used a GA to find the weights of a neural network controller for a robot. The robot was in a labyrinth and had to perform a task. The network's hidden layer was a single neuron with recurrent connections to the inputs and one lateral connection to itself. There were two outputs, which were connected to the input layer and to the hidden layer (the single neuron mentioned above).
But Floreano did something more interesting. His point was that we are not born with fixed synapses; our synapses change during our lifetime. So he used the GA to find the rules by which synapses change. These rules were based on Hebbian learning. He used node encoding (the same rule applies to all weights connected to a neuron). At the beginning, he initialized the weights to small random values. Finding rules instead of numerical synapse values led to better results.
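For readers unfamiliar with it, the Hebbian idea referenced here is roughly "neurons that fire together wire together": the weight change is proportional to the product of pre- and post-synaptic activity, and in Floreano's setup the GA searches over the parameters of such rules rather than over the weights themselves. A tiny sketch of the basic rule (illustrative only, names are mine):

```python
import numpy as np

def hebbian_update(w, pre, post, eta=0.01):
    """Basic Hebbian rule: dw = eta * pre * post (outer product for a layer)."""
    return w + eta * np.outer(pre, post)

w = np.zeros((3, 2))
pre = np.array([1.0, 0.0, 1.0])     # pre-synaptic activations
post = np.array([0.5, 1.0])         # post-synaptic activations
w = hebbian_update(w, pre, post)
print(w)   # weights grow only where pre and post are both active
```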
Here is one of Floreano's articles.
And finally, from my own experience: last semester my schoolmate and I were given the task of finding synapse rules with a GA, but for a spiking neural network. Our SNN was a controller for a kinematic model of a mobile robot, and the task was to lead the robot to a chosen point. We obtained some results, but not the ones we expected. You can see the results here. So I recommend you use an "ordinary" ANN instead of an SNN, because SNNs bring in new phenomena.

Should the neurons in a neural network be asynchronous?

I am designing a neural network and am trying to determine if I should write it in such a way that each neuron is its own 'process' in Erlang, or if I should just go with C++ and run a network in one thread (I would still use all my cores by running an instance of each network in its own thread).
Is there a good reason to give up the speed of C++ for the asynchronous neurons that Erlang offers?
I'm not sure I understand what you're trying to do. An artificial neural network is essentially represented by the weights of the connections between nodes. The nodes themselves don't exist in isolation; their values are only calculated (at least in feed-forward networks) through the forward-propagation algorithm, when the network is given input.
The backpropagation algorithm for updating weights is definitely parallelizable, but that doesn't seem to be what you're describing.
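To illustrate why the nodes "don't exist in isolation": a feed-forward pass is just a couple of matrix multiplications, so the natural unit of work is a whole layer rather than an individual neuron. A minimal NumPy sketch (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                            # input vector
W1 = rng.normal(size=(4, 8)); b1 = np.zeros(8)    # input -> hidden weights
W2 = rng.normal(size=(8, 2)); b2 = np.zeros(2)    # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = sigmoid(x @ W1 + b1)   # all hidden "neurons" computed in one matrix op
y = sigmoid(h @ W2 + b2)   # likewise for the output layer
print(y)
```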
The usefulness of having neurons in a Neural Network (NN) is to have a multi-dimensional matrix whose coefficients you want to handle (to train them, change them, adapt them little by little, so that they fit the problem you want to solve). To this matrix you can apply numerical methods (proven and efficient) to find an acceptable solution in an acceptable time.
IMHO, with an NN (namely with the back-propagation training method), the goal is to have a matrix which is efficient both at run/predict time and at training time.
I don't grasp the point of having asynchronous neurons. What would it offer? What issue would it solve?
Maybe you could explain clearly what problem you would solve by making them asynchronous?
I am indeed inverting your question: what do you want to gain from asynchronicity compared to traditional NN techniques?
It would depend upon your use case: the neural network computational model and your execution environment. Here is a recent paper (2014) by Plotnikova et al, that uses "Erlang and platform Erlang/OTP with predefined base implementation of actor model functions" and a new model developed by the authors that they describe as “one neuron—one process” using "Gravitation Search Algorithm" for training:
http://link.springer.com/chapter/10.1007%2F978-3-319-06764-3_52
To briefly cite their abstract, "The paper develops asynchronous distributed modification of this algorithm and presents the results of experiments. The proposed architecture shows the performance increase for distributed systems with different environment parameters (high-performance cluster and local network with a slow interconnection bus)."
Also, most other answers here reference a computational model that uses matrix operations as the basis of training and simulation, which the authors of this paper contrast by saying, "this case neural network model [i.e. matrix-operations based] becomes fully mathematical and its original nature (from neural networks biological prototypes) gets lost".
The tests were run on three types of systems:
An IBM cluster represented as 15 virtual machines.
A distributed system deployed on the local network, represented as 15 physical machines.
A hybrid system based on system 2, but where each physical machine has four processor cores.
They provide the following concrete results, "The presented results evidence a good distribution ability of gravitation search, especially for large networks (801 and more neurons). Acceleration depends on the node count almost linearly. If we use 15 nodes we can get about eight times acceleration of the training process."
Finally, they conclude regarding their model, "The model includes three abstraction levels: NNET, MLP and NEURON. Such architecture allows encapsulating some general features on general levels and some specific for the considered neural networks features on special levels. Asynchronous message passing between levels allow to differentiate synchronous and asynchronous parts of training and simulation algorithms and, as a result, to improve the use of resources."
It depends what you are after.
2nd Generation of Neural Networks are synchronous. They perform computations on an input-output basis without a delay, and can be trained either through reinforcement or back-propagation. This is the prevailing type of ANN at the moment and the easiest to get started with if you are trying to solve a problem via machine learning, lots of literature and examples available.
3rd Generation of Neural Networks (so-called "Spiking Neural Networks") are asynchronous. Signals propagate internally through the network as a chain-reaction of spiking events, and can create interesting patterns and oscillations depending on the shape of the network. While they model biological brains more closely they are also harder to make use of in a practical setting.
I think that async computation for NNs might prove beneficial for the (recognition) performance. In fact, the result might be similar (maybe less pronounced) to using dropout.
But a straight-forward implementation of async NNs would be much slower, because for synchronous NNs you can use linear algebra libraries, which make good use of vectorization or GPUs.

Convolutional Deep Belief Networks (CDBN) vs. Convolutional Neural Networks (CNN)

Lately, I started to learn neural networks and I would like to know the difference between Convolutional Deep Belief Networks and Convolutional Neural Networks. Here there is a similar question, but there is no exact answer to it. We know that Convolutional Deep Belief Networks are CNNs + DBNs. I am going to do object recognition, and I want to know which one is better than the other, and what their complexity is. I searched but couldn't find anything; maybe I am doing something wrong.
I don't know if you still need an answer but anyway I hope you will find this useful.
A CDBN adds the complexity of a DBN, but if you already have some background it's not that much.
If you are worried about computational complexity instead, it really depends on how you use the DBN part. The role of DBN usually is to initialize the weights of the network for faster convergence. In this scenario, the DBN appears only during pre-training.
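As a rough illustration of "DBN only during pre-training": each layer can be pre-trained as an RBM with contrastive divergence, and the learned weights are then copied into the feed-forward net before normal supervised training. A very condensed single-layer sketch (binary units, CD-1, biases omitted; all sizes and rates are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pretrain_rbm(data, n_hidden=64, lr=0.05, epochs=10):
    """CD-1 pre-training of one RBM layer; returns weights to reuse
    as the initial weights of the corresponding feed-forward layer."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W)                              # hidden probabilities
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ W.T)                     # reconstruction
        h1 = sigmoid(v1 @ W)
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)     # CD-1 update
    return W

data = (rng.random((200, 100)) < 0.3).astype(float)       # fake binary inputs
W_init = pretrain_rbm(data)        # would seed the first layer of the network
print(W_init.shape)                # (100, 64)
```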
You can also use the whole DBN as a discriminative network (keeping its generative power), but the weight initialization it provides is usually enough for discriminative tasks. So, during a hypothetical real-time utilization, the two systems are equal performance-wise.
Also, the weight initialization provided by the first model really helps for difficult tasks like object recognition (even a good convolutional neural network alone doesn't reach a good success rate, at least compared to a human), so it's generally a good choice.
