I have worked with Neural Networks before and know most of the basics. I especially have experience with regular Multi-Layer Perceptrons. I was asked by someone whether the following is possible, and I somehow feel challenged to master the problem :)
The Situation
Let's assume I have a program that can encrypt and decrypt regular ASCII-coded files. I have no idea at all about the specific encryption method or the key used. All I know is that the program can reverse the encryption and thus read the original content.
What I want
Now my question is: do you think it is possible to train (some kind of) Neural Network that replicates the exact decryption algorithm with acceptable effort?
My ideas and work so far
I don't have much experience with encryption. Someone suggested simply assuming AES encryption, so I could write a little program to batch-encrypt ASCII-coded files. That would cover gathering the data for supervised learning: the encrypted files as input to the neural network, the original files as target output. With that I could train any net. But now I am stuck: how would you feed the input and output data to the Neural Network? How many input and output neurons would you use?
Since I have no idea what the encrypted files will look like, it is probably best to pass the data in binary form. But I can't just use thousands of input and output neurons and pass all bits at the same time. Maybe a recurrent network that is fed one bit after another? That doesn't sound very effective either.
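For illustration, here is roughly the binary encoding I have in mind (a sketch assuming NumPy; all names are my own, and the 16-byte block size is just the AES block size):

```python
import numpy as np

def bytes_to_bits(block: bytes) -> np.ndarray:
    # np.unpackbits expands each byte into its 8 bits (most significant first)
    return np.unpackbits(np.frombuffer(block, dtype=np.uint8)).astype(np.float32)

def bits_to_bytes(bits: np.ndarray) -> bytes:
    # Round the network outputs to 0/1 and pack them back into bytes
    return np.packbits(bits.round().astype(np.uint8)).tobytes()

block = b"0123456789abcdef"        # one 16-byte block -> 128 input neurons
x = bytes_to_bits(block)           # shape (128,), values 0.0 or 1.0
assert bits_to_bytes(x) == block   # the round trip is lossless
```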
Another problem is that you can't decrypt partially, meaning you can't be roughly correct: you either get it right or you don't. In other words, in the end the net error has to be exactly zero. From what I have experienced so far with ANNs, this is nearly impossible to achieve for big networks. So is this problem solvable?
Another problem is that you can't decrypt partially, meaning you can't be roughly correct: you either get it right or you don't.
That's exactly the problem. Neural Networks can approximate continuous functions, meaning that a small change in the input values causes only a small change in the output value, while encryption functions/algorithms are designed to be as far from continuous as possible: ideally, flipping a single input bit changes about half of the output bits (the avalanche effect).
I think if that worked, people would be doing it. As far as I know, they aren't.

Seriously, if you could just throw a lot of plaintext/ciphertext pairs at a neural network and construct a decrypter, it would be a very effective known-plaintext or chosen-plaintext attack. Yet the attacks of that kind we have against current ciphers are not very effective at all. That means that either the entire open cryptographic community has missed the idea, or it doesn't work. I realise that this is far from a conclusive argument (it's effectively an argument from authority), but I would suggest it's indicative that this approach won't work.
Say you have two keys A and B that decrypt ciphertext K into Pa and Pb respectively. Pa and Pb are both "correct" decryptions of ciphertext K. So if your neural network has only K as input, it has no means of actually predicting the correct answer. Most approaches to breaking encryption involve looking at the result to see whether it looks like what you're after; for example, readable text is more likely to be the plaintext than apparently random junk. A neural network would need to be good at guessing whether it got the right answer according to what the user expects the contents to be, and that could never be 100% correct.
However, neural networks can in theory learn any function. So if you have enough ciphertext/plaintext pairs for a particular encryption key, then a sufficiently complex neural network can learn to be exactly the decryption algorithm for that particular key.

As for the continuous-vs-discrete problem, this is basically solved. The outputs go through something like the sigmoid function, so you just have to pick a threshold to separate 1 from 0; 0.5 could work. With enough training you could in theory get the correct 1-vs-0 answer 100% of the time.
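For example, a minimal sketch of that thresholding step (assuming NumPy):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

raw = sigmoid(np.array([-3.2, 0.1, 4.7]))  # network outputs, each in (0, 1)
bits = (raw >= 0.5).astype(int)            # threshold at 0.5 -> array([0, 1, 1])
```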
The above assumes that you have one network big enough to process the entire file at once. For arbitrarily sized ciphertext, you would probably need to process blocks one at a time with an RNN, but I don't know whether that retains the same "compute any function" properties as a traditional network.
None of this is to say that such a solution is practically doable.
Related
Currently I'm trying to learn how to work with neural networks, mostly from internet tutorials but also from books.

I often see the claim that "XOR is the 'Hello World' of neural networks".

But here is the thing: the author of one tutorial says that a neural network that computes XOR should use one hidden layer with 2 neurons. He also uses backpropagation with deltas to adjust the weights.
I implemented this, but even after 1 million epochs the network is stuck on the input (1, 1). The answer should be 0, but it is usually 0.5-something. I checked my code; it is correct.

If I add just one more neuron to the hidden layer, the network successfully computes XOR after ~50,000 epochs.

At the same time, some people say that "XOR is not a trivial task and we should use a network with 2-3 or more layers". Why?

Come on, if XOR causes so many problems, maybe we shouldn't use it as the 'Hello World' of neural networks? Please explain what is going on.
So neural networks are really interesting. There's a proof that a single perceptron can learn any linearly separable function, given enough time. Even more impressively, a neural network with one hidden layer can apparently learn any function (the universal approximation theorem), though I've yet to see a proof of that one.

XOR is a good function for teaching neural networks because, as CS students, those in the class are likely already familiar with it. In addition, it is not trivial, in the sense that a single perceptron cannot learn it: it isn't linearly separable. See this graphic I put together.

There is no single line that separates these values. Yet it is simple enough for humans to understand, and more importantly, a human can understand a neural network that solves it. NNs are very black-box-y; it quickly becomes hard to tell why they work. Hell, here is another network config that can solve XOR.
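To make that concrete, here is a minimal hand-wired 2-2-1 network with step activations that computes XOR (the weights are chosen by hand rather than trained; this is my own sketch, not the config from the graphic):

```python
def step(z):
    # Hard threshold activation: fires (1) when the weighted sum is >= 0
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # hidden neuron 1: fires if x1 OR x2
    h_and = step(x1 + x2 - 1.5)      # hidden neuron 2: fires if x1 AND x2
    return step(h_or - h_and - 0.5)  # output: OR but not AND, i.e. XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))  # prints 0, 1, 1, 0
```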
Your example of a more complicated network solving it faster shows the power that comes from combining more neurons and more layers. It's absolutely unnecessary to use 2-3 hidden layers to solve it, but it sure helps speed up the process.

The point is that it is a problem simple enough for a human to solve on a blackboard in class, while also being slightly more challenging than a plain linear function.

EDIT: Another fantastic example for teaching NNs practically is the MNIST hand-drawn digit classification dataset. It very easily shows a problem that is simultaneously very simple for humans to understand, very hard to write a non-learning program for, and a very practical use case for machine learning. The problem is that its network structure is impossible to draw on a blackboard and trace in a way that is practical for a class; XOR is small enough for exactly that.
EDIT 2: Also, without the code it will probably be hard to diagnose why it isn't converging. Did you write the neurons yourself? What about the optimization function, etc?
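In case a reference point helps, here is a minimal NumPy sketch of the 2-3-1 setup with plain backpropagation (my own code, not yours, so treat it as illustrative). Note that it includes bias terms; leaving the biases out is a classic cause of getting stuck at 0.5-something on (1, 1):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-3-1 network with bias terms
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
lr = 0.5

for epoch in range(20_000):
    h = sigmoid(X @ W1 + b1)             # forward pass, hidden layer
    out = sigmoid(h @ W2 + b2)           # forward pass, output layer
    d_out = (out - y) * out * (1 - out)  # output-layer deltas
    d_h = (d_out @ W2.T) * h * (1 - h)   # hidden-layer deltas
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())  # should be close to [0, 1, 1, 0]
```

In my experience a setup like this usually converges well within those 20,000 epochs; an unlucky random initialization can still stall, in which case rerunning with a different seed helps.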
EDIT 3: If the output of your network's last node is 0.5, try using a step squashing function that maps all values below 0.5 to 0 and all values of 0.5 and above to 1. You only have binary output anyway, so why bother with a continuous activation on the last node?
Is there software out there that optimises the best combination of learning rate, weight ranges, and hidden layer structure for a given task, presumably by trying and failing with different combinations? What is this called? As far as I can tell, we just do it manually at the moment...

I know this is not directly code-related, but I am sure it will help many others too. Cheers.
This is usually called hyperparameter optimisation, and it is a multivariate optimisation problem: use an optimisation algorithm and check the results. Particle Swarm Optimization would do it (there are, however, considerations around when to use this algorithm), as long as you have a cost function to optimise, for example the error rate of the network output.
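As a rough sketch of the idea (the evaluate_network function below is a hypothetical stand-in for "train a network with these hyperparameters and return its validation error"):

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_network(params):
    # Hypothetical cost function: in practice, train a network with these
    # hyperparameters (learning rate, hidden size) and return its error.
    lr, hidden = params
    return (lr - 0.05) ** 2 + (hidden - 32) ** 2 / 1000.0

n_particles, n_dims, n_iters = 20, 2, 100
w, c1, c2 = 0.7, 1.5, 1.5  # inertia and attraction coefficients

# Particles live in hyperparameter space: lr in [0.001, 1], hidden in [2, 128]
pos = rng.uniform([0.001, 2], [1.0, 128], size=(n_particles, n_dims))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_cost = np.array([evaluate_network(p) for p in pos])
gbest = pbest[pbest_cost.argmin()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, n_dims))
    # Each particle is pulled toward its own best and the swarm's best
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    cost = np.array([evaluate_network(p) for p in pos])
    improved = cost < pbest_cost
    pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
    gbest = pbest[pbest_cost.argmin()].copy()

print("best hyperparameters found:", gbest)
```

(For a real network you would round the hidden-layer size to an integer inside the cost function; the mechanics of the swarm stay the same.)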
I'm an aspiring data scientist (presently just a software developer) and I've just had this (foolish?) idea.

So far (as far as I know), we've been using compression algorithms that replace the standard encoding of the data with a smarter one. What if we could compress data by comprehension? For example, by generating a kind of abstract from which we can recover the original data.

Just think of how our minds work: by associating ideas with one another.

Can machine learning techniques learn and understand the data (and how it is represented on disk) so that it can be regenerated from an abstract produced by the algorithm?
Sure, but then you would have to transfer a representation of the associations and "comprehension" to the other end in order to decompress. That representation will likely be much larger than the data you were trying to compress.
There are actually similar ideas that are, at least to some degree, already realized.

For instance, an Autoencoder allows for the compression (the encoder part) and reconstruction of the original data (the decoder part).

This technique, coupled with the idea of a Thought Vector, which, in a sense, "encodes" a concept/meaning/comprehension in a single vector, would result in something like what you have described.
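As an illustrative sketch, here is a tiny autoencoder in Keras (assuming TensorFlow is installed; the 784/32 layer sizes and the random placeholder data are my own choices, not anything from the question):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

encoder = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),      # the 32-number "abstract"
])
decoder = keras.Sequential([
    keras.Input(shape=(32,)),
    layers.Dense(784, activation="sigmoid"),  # reconstruct the original
])
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# The training target is the input itself: learn to compress and recover it.
X = np.random.rand(1000, 784).astype("float32")  # placeholder data in [0, 1]
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)

code = encoder.predict(X[:1])            # the compressed representation
reconstruction = decoder.predict(code)   # recovered 784-dim data
```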
I have a target solution for a system of differential equations that has some unknown parameters. I want to find the values of these parameters for which the solution is closest to the target. Can I do this with neural networks? If yes, how?

I am asking because a paper I'm reading (unfortunately in Greek) appears to be doing this very thing.
The paper presents a system of differential equations with a desired output, and the control input u has some unknown non-linearities in it, which the paper states are approximated using neural networks. Since there is no data to train a network, I wasn't able to understand how this is done. Any ideas?
I don't think so; the typical usage of a NN is to learn a pattern from a set of examples, so that it can properly classify examples it hasn't seen. That description doesn't seem to fit your problem.
Update (after question was edited): I don't think the specifics of the equation are relevant. As you say, there is no data with which to train a network, so it would be hard (if not impossible) to evaluate that aspect, as it might as well have been done by flipping coins. Thus, I think you'd have to focus on the other aspects of the paper (assuming there are some).
I am currently trying to set up a Neural Network for information extraction, and I am pretty fluent with the (basic) concepts of Neural Networks, except for one which seems to puzzle me. It is probably pretty obvious, but I can't seem to find information about it.

Where/how do Neural Networks store their memory? (Machine Learning)

There is quite a bit of information available online about Neural Networks and Machine Learning, but it all seems to skip over memory storage. For example, after restarting the program, where does it find its memory to continue learning/predicting? Many examples online don't seem to 'retain' memory, but I can't imagine this being 'safe' for real, big-scale deployment.
I have a difficult time wording my question, so please let me know if I need to elaborate a bit more.
Thanks,
EDIT: To follow up on the answers below:

"Every Neural Network will have edge weights associated with it. These edge weights are adjusted during the training session of a Neural Network."
This is exactly where I am struggling: how should I envision this secondary memory?

Is it like RAM? That doesn't seem logical. I ask because I haven't encountered an example online that defines or specifies this secondary memory in concrete terms (for example as an XML file, or maybe even a huge array).
Memory storage is implementation-specific and not part of the algorithm per se. It is probably more useful to think about what you need to store rather than how to store it.
Consider a 3-layer multi-layer perceptron (fully connected) that has 3, 8, and 5 nodes in the input, hidden, and output layers, respectively (for this discussion, we can ignore bias inputs). Then a reasonable (and efficient) way to represent the needed weights is by two matrices: a 3x8 matrix for weights between the input and hidden layers and an 8x5 matrix for the weights between the hidden and output layers.
For this example, you need to store the weights and the network shape (number of nodes per layer). There are many ways you could store this information: it could be an XML file or a user-defined binary file. If you were using Python, you could save both matrices to a binary .npz archive (or to separate .npy files) and encode the network shape in the file name. If you implemented the algorithm yourself, it is up to you how to store the persistent data. If, on the other hand, you are using an existing machine learning software package, it probably has its own I/O functions for storing and loading a trained network.
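For instance, a minimal sketch with NumPy (the file and variable names are my own):

```python
import numpy as np

# Hypothetical trained weights for the 3-8-5 network described above
w_in_hidden = np.random.randn(3, 8)
w_hidden_out = np.random.randn(8, 5)

# Persist both matrices (their shapes travel with the data) in one archive
np.savez("mlp_weights.npz", w_in_hidden=w_in_hidden, w_hidden_out=w_hidden_out)

# Later, e.g. after restarting the program, restore the trained state
data = np.load("mlp_weights.npz")
w_in_hidden = data["w_in_hidden"]
w_hidden_out = data["w_hidden_out"]
print(w_in_hidden.shape, w_hidden_out.shape)  # (3, 8) (8, 5)
```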
Every Neural Network will have edge weights associated with it. These edge weights are adjusted during the training session of a Neural Network. I suppose your doubt is about storing these edge weights. Well, these values are stored separately in secondary memory so that they can be retained for future use in the Neural Network.
I would expect the discussion of the design of the model (the neural network) to be kept separate from the discussion of the implementation, where data requirements like durability are addressed.
A particular library or framework might have a specific answer about durable storage, but if you're rolling your own from scratch, then it's up to you.
For example, why not just write the trained weights and topology to a file? Something like YAML or XML could serve as a format.
Also, while we're talking about state/storage and neural networks, you might be interested in investigating associative memory.
This may be answered in two steps:
What is "memory" in a Neural Network (referred to as NN)?
As a neural network (NN) is trained, it builds a mathematical model that tells the NN what to give as output for a particular input. Think of what happens when you train someone to speak a new language: the human brain creates a model of the language. Similarly, an NN creates a mathematical model of what you are trying to teach it, representing the mapping from input to output as a series of functions. This mathematical model is the memory; concretely, it consists of the weights of the different edges in the network. Often, an NN is trained and these weights/connections are then written to the hard disk (XML, YAML, CSV, etc.). Whenever the NN needs to be used, these values are read back and the network is recreated.
How can you make a network forget its memory?
Think of someone who has been taught two languages. Say the individual never speaks one of these languages for 15-20 years, but uses the other one every day. It is very likely that several new words will be learnt each day and many words of the less frequently used language forgotten. The critical part here is that a human being is "learning" every day. In an NN, a similar phenomenon can be observed by training the network on new data: if the old data is not included in the new training samples, the underlying mathematical model will change so much that the old training data is no longer represented in it (this is known as catastrophic forgetting). It is possible to prevent an NN from "forgetting" the old model by changing the training process, but this has the side effect that such an NN cannot fully learn completely new data samples.
I would say your approach is wrong. Neural Networks are not memory dumps as we see on a computer. There are no addresses where a particular chunk of memory resides. All the neurons together make sure that a given input leads to a particular output.

Let's compare it with your brain. When you taste sugar, your tongue's taste buds are the input nodes: they read chemical signals and transmit electrical signals to the brain. The brain then determines the taste from the various combinations of electrical signals.

There are no lookup tables. There is no primary or secondary memory, only short-term and long-term memory.