Machine learning: cleaning a dataset

I am a noob with machine learning, but I am really excited about it, to be honest.
My question is: I have built a dataset related to DDoS attacks. In my topology I have two PCs, pc1 and pc2. Pc1 sends legitimate traffic; pc2 sends a DDoS attack.
Now that I have my dataset, I started with the basics: cleaning.
In this case, can I remove all columns that contain only 0 values?
I think that if a column has only 0 values, it is not useful for my analysis, because nothing in it differentiates the traffic between these two PCs. Does that make sense?
And what else should I do before I apply a machine learning algorithm? I don't see any missing values in my dataset.
Any recommendations about algorithms? My dataset is labeled, and I was thinking about a decision tree or random forest regression.
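A minimal sketch of the column check described above, assuming the dataset fits in memory as a mapping from column name to a list of values. The column names and numbers are made up for illustration. It drops every column that holds a single constant value (all zeros included), since such a column cannot separate the two classes.

```python
# Hypothetical toy dataset: column name -> values, one entry per captured flow.
dataset = {
    "pkt_count": [12, 15, 900, 870],
    "urg_flags": [0, 0, 0, 0],        # only zeros: carries no information
    "ttl":       [64, 64, 64, 64],    # constant but non-zero: equally useless
    "label":     [0, 0, 1, 1],        # 0 = legitimate (pc1), 1 = DDoS (pc2)
}


def drop_constant_columns(columns):
    """Return a copy without columns whose values are all identical."""
    return {
        name: values
        for name, values in columns.items()
        if len(set(values)) > 1
    }


cleaned = drop_constant_columns(dataset)
print(sorted(cleaned))  # remaining column names
```

With pandas, the same filter could probably be written as `df.loc[:, df.nunique() > 1]`.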


How to interpret weight distributions of neural net layers

I have designed a 3-layer neural network whose inputs are the concatenated features from a CNN and an RNN. The weights learned by the network take very small values. What is a reasonable explanation for this? And how should I interpret the weight histograms and distributions in TensorFlow? Is there any good resource for it?
This is the weight distribution of the first hidden layer of the 3-layer neural network, visualized using TensorBoard. How should I interpret this? Are all the weights taking the value zero?
This is the weight distribution of the second hidden layer of the 3-layer neural network:
how to interpret the weight histograms and distributions in Tensorflow?
Well, you probably didn't realize it, but you have just asked the 1 million dollar question in ML & AI...
Model interpretability is a hyper-active and hyper-hot area of current research (think of holy grail, or something), which has been brought forward lately not least due to the (often tremendous) success of deep learning models in various tasks; these models are currently only black boxes, and we naturally feel uncomfortable about it...
Any good resource for it?
Probably not exactly the kind of resources you were thinking of, and we are well off a SO-appropriate topic here, but since you asked...:
A recent (July 2017) article in Science provides a nice overview of the current status & research: How AI detectives are cracking open the black box of deep learning (no in-text links, but googling names & terms will pay off)
DARPA itself is currently running a program on Explainable Artificial Intelligence (XAI)
There was a workshop in NIPS 2016 on Interpretable Machine Learning for Complex Systems
On a more practical level:
The Layer-wise Relevance Propagation (LRP) toolbox for neural networks (paper, project page, code, TF Slim wrapper)
FairML: Auditing Black-Box Predictive Models, by Fast Forward Labs (blog post, paper, code)
A very recent (November 2017) paper by Geoff Hinton, Distilling a Neural Network Into a Soft Decision Tree, with an independent PyTorch implementation
SHAP: A Unified Approach to Interpreting Model Predictions (paper, authors' code)
These should be enough for starters, and to give you a general idea of the subject about which you asked...
UPDATE (Oct 2018): I have put up a much more detailed list of practical resources in my answer to the question Predictive Analytics - “Why” factor?
The weights learned by network take very small values. What is the reasonable explanation for this? How to interpret this? all the weights are taking up zero value?
Not all weights are zero, but many are. One reason is regularization in combination with a large network (i.e., one with wide layers). Regularization, both L1 and L2, makes weights small. If your network is large, most weights are not needed, i.e., they can be set to zero and the model still performs well.
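To illustrate the difference between the two penalties, here is a minimal sketch that treats one weight at a time. It assumes a simplified setting that is not from the question: each `w0` is the value the unregularized loss would prefer, the data term is `0.5 * (w - w0)**2`, and `lam` is a hypothetical penalty strength. Under those assumptions the minimizers have closed forms: L2 scales every weight toward zero, while L1 (soft-thresholding) sets small weights to exactly zero.

```python
def l2_shrink(w0, lam):
    """Minimizer of 0.5*(w - w0)**2 + 0.5*lam*w**2: uniform scaling."""
    return w0 / (1.0 + lam)


def l1_shrink(w0, lam):
    """Minimizer of 0.5*(w - w0)**2 + lam*abs(w): soft-thresholding."""
    magnitude = max(abs(w0) - lam, 0.0)
    return magnitude if w0 >= 0 else -magnitude


unregularized = [3.0, -0.4, 0.05, 1.2]   # made-up weights
lam = 0.5                                # made-up penalty strength

l2_weights = [l2_shrink(w, lam) for w in unregularized]
l1_weights = [l1_shrink(w, lam) for w in unregularized]

print("L2:", l2_weights)  # all smaller in magnitude, none exactly zero
print("L1:", l1_weights)  # the two small weights become zero
```

In a histogram, the L1 case would show up as a spike at zero, and the L2 case as a distribution squeezed toward zero.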
How to interpret the weight histograms and distributions in Tensorflow? Any good resource for it?
I am not so sure about weight distributions. There is some work that analyzes them, but I am not aware of a general interpretation. For CNNs, for example, it is known that the center weights of a filter/feature usually have larger magnitude than those in the corners; see [Locality-Promoting Representation Learning, 2021, ICPR].
For CNNs you can also visualize weights directly if you have large filters. For example, for simple networks you can see that the weights first converge towards some kind of class average before overfitting starts. This is shown in Figure 2 of [The learning phases in NN: From Fitting the Majority to Fitting a Few, 2022].
Rather than going for weights, you can also look at what samples trigger the strongest activations for specific features. If you don't want to look at single features, there is also the possibility to visualize what the network actually remembers of the input, e.g., see [Explaining Neural Networks by Decoding Layer Activations].
These are just a few examples (disclaimer: I authored these works); there are thousands of other works on explainability out there.

Why is lift for neural network that stable in SAS Viya demo?

I'm looking at the SAS Viya machine learning demo. It races several machine learning algorithms against each other on a given dataset. All models produce almost equally good "lift", as shown in the lift diagrams in the output.
If you tweak the learning to run on a much smaller subset of the data, only 0.002% of the total data set (proc partition data=&casdata partition samppct=0.002;), most algorithms have trouble producing lift.
But the neural network still performs very well. Feature or bug? I could imagine that the script does not re-initialize the network, but it is hard to tell from the calls alone.
I got good answers over at the SAS Community posted by BrettWujek and Xinmin there:
Mats - the short answer without running some studies of my own is that neural networks are highly adaptive and can train very accurate models with far fewer observations than many other techniques. The tree-based models are going to be quite unstable with very few observations. In this case you sampled all the way down to around 20 observations...even that might be sufficient for a neural network if the space is not overly nonlinear.
As for your last comment - it seems you are referring to what is known as warm start, where a previously trained model can be used as a starting point and refined by providing new observations. That is NOT what is happening here, as that capability is only coming available in our upcoming release which is just over a month away.
And I've got some detail on this from Xinmin:
Mats, PROC NNET initializes weights randomly; if you specify a seed in the train statement, the initial weights are repeatable. NNET training is powered by a sophisticated nonlinear optimization solver; if the log shows "converged" status, it means the model is fit very well.

cluster analysis? label the cluster

I am quite confused about the following two problems:
I have a 15-dimensional dataset which should be clustered to find out how many types of attacks it contains.
1. I have already clustered my dataset into 5 clusters (5 attacks). Does anyone know how I can tell which cluster is which attack? (That is, how to label the clusters with something other than just "cluster 1, cluster 2, ...".)
2. In supervised classification, we have a training dataset and a testing dataset, and testing is conducted with the classifier built from the training dataset. My question is: can the same approach be used for clustering? That is, can I build a model with a clustering algorithm and then automatically assign a new instance to a specific cluster? Is this achievable?
How should an unsupervised method be able to identify named attacks?
The human-assigned name is not in the data!
For some clustering algorithms you can assign new instances automatically, but in general you cannot (not without knowing the model used by the clustering). In the worst case, a new observation could even cause two clusters to merge into one. What are you going to do then?
If you want classification, use classification, not clustering.
Clustering has a very different mind-set. If you approach it from a classification point of view, you will not really understand it. You use clustering for finding something unknown in data, classification for generalizing something known to new data.
If necessary, you can also train a classifier on your clusters. But don't do this blindly. First make sure that the clusters actually are something useful. It's much easier to come up with a completely meaningless clustering result than with a good clustering. Training a classifier on worthless clusters won't produce a meaningful output.
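For centroid-based results such as k-means, assigning a new instance is straightforward: pick the nearest centroid. A minimal sketch, assuming the clustering has already produced one centroid per cluster. The 2-D centroids below are made up (the question's data has 15 dimensions), and the attack names are hypothetical labels that a human would have to attach after inspecting each cluster.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_cluster(point, centroids):
    """Return the key of the centroid closest to `point`."""
    return min(centroids, key=lambda name: squared_distance(point, centroids[name]))


# Hypothetical centroids; the names come from manual inspection, not the data.
centroids = {
    "syn_flood": (9.0, 1.0),
    "port_scan": (1.0, 8.0),
    "normal":    (1.0, 1.0),
}

new_instance = (8.2, 1.5)
print(assign_cluster(new_instance, centroids))
```

This rule only applies when the clustering model is a set of centroids; density-based or hierarchical results generally offer no such shortcut, which is the caveat raised above.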

Neural Network / Machine Learning memory storage

I am currently trying to set up a neural network for information extraction, and I am pretty fluent with the (basic) concepts of neural networks, except for one which seems to puzzle me. It is probably pretty obvious, but I can't seem to find information about it.
Where/How do Neural Networks store their memory? ( / Machine Learning)
There is quite a bit of information available online about Neural Networks and Machine Learning but they all seem to skip over memory storage. For example after restarting the program, where does it find its memory to continue learning/predicting? Many examples online don't seem to 'retain' memory but I can't imagine this being 'safe' for real/big-scale deployment.
I have a difficult time wording my question, so please let me know if I need to elaborate a bit more.
EDIT: - To follow up on the answers below
Every Neural Network will have edge weights associated with them. These edge weights are adjusted during the training session of a Neural Network.
This is exactly where I am struggling: how do/should I envision this secondary memory?
Is this like RAM? That doesn't seem logical. The reason I ask is that I haven't encountered an example online that defines or specifies this secondary memory (for example, as something more concrete such as an XML file, or maybe even a huge array).
Memory storage is implementation-specific and not part of the algorithm per se. It is probably more useful to think about what you need to store rather than how to store it.
Consider a 3-layer multi-layer perceptron (fully connected) that has 3, 8, and 5 nodes in the input, hidden, and output layers, respectively (for this discussion, we can ignore bias inputs). Then a reasonable (and efficient) way to represent the needed weights is by two matrices: a 3x8 matrix for weights between the input and hidden layers and an 8x5 matrix for the weights between the hidden and output layers.
For this example, you need to store the weights and the network shape (number of nodes per layer). There are many ways you could store this information. It could be in an XML file or a user-defined binary file. If you were using Python, you could save each matrix to a binary .npy file (or both to a single .npz archive) and encode the network shape in the file name. If you implemented the algorithm, it is up to you how to store the persistent data. If, on the other hand, you are using an existing machine learning software package, it probably has its own I/O functions for storing and loading a trained network.
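As one possible sketch of the "store the weights and the network shape" idea, the snippet below writes both to a JSON file and reads them back. JSON, the file name `mlp.json`, and the helper names are assumptions made for illustration; any of the formats mentioned above would serve equally well. Random numbers stand in for trained weights, and biases and activations are ignored, as in the example above.

```python
import json
import os
import random
import tempfile

rng = random.Random(0)
layer_sizes = [3, 8, 5]  # input, hidden, output nodes


def random_matrix(rows, cols):
    """Stand-in for trained weights: a rows x cols list of lists."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


model = {
    "layer_sizes": layer_sizes,
    "weights": [random_matrix(3, 8), random_matrix(8, 5)],
}

# Persist the network so it survives a program restart.
path = os.path.join(tempfile.mkdtemp(), "mlp.json")
with open(path, "w") as f:
    json.dump(model, f)

# Later (for example after a restart): rebuild the network from disk.
with open(path) as f:
    restored = json.load(f)


def forward(x, weights):
    """Linear forward pass (no activation, no bias) through each matrix."""
    for matrix in weights:
        x = [sum(x[i] * matrix[i][j] for i in range(len(x)))
             for j in range(len(matrix[0]))]
    return x


output = forward([1.0, 0.5, -0.5], restored["weights"])
print(len(output))  # one value per output node
```

While the program runs, the weights live in ordinary RAM as arrays; the file on disk is only needed to carry them across restarts.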
Every Neural Network will have edge weights associated with them. These edge weights are adjusted during the training session of a Neural Network. I suppose your doubt is about storing these edge weights. Well, these values are stored separately in a secondary memory so that they can be retained for future use in the Neural Network.
I would expect discussion of the design of the model (neural network) would be kept separate from the discussion of the implementation, where data requirements like durability are addressed.
A particular library or framework might have a specific answer about durable storage, but if you're rolling your own from scratch, then it's up to you.
For example, why not just write the trained weights and topology in a file? Something like YAML or XML could serve as a format.
Also, while we're talking about state/storage and neural networks, you might be interested in investigating associative memory.
This may be answered in two steps:
What is "memory" in a Neural Network (referred to as NN)?
As a neural network (NN) is trained, it builds a mathematical model that tells the NN what to give as output for a particular input. Think of what happens when you train someone to speak a new language. The human brain creates a model of the language. Similarly, a NN creates a mathematical model of what you are trying to teach it. It represents the mapping from input to output as a series of functions. This math model is the memory. This math model is the weights of the different edges in the network. Often, a NN is trained and these weights/connections are written to the hard disk (XML, YAML, CSV, etc.). Whenever a NN needs to be used, these values are read back and the network is recreated.
How can you make a network forget its memory?
Think of someone who has been taught two languages. Let us say the individual never speaks one of these languages for 15-20 years, but uses the other one every day. It is very likely that several new words will be learnt each day and many words of the less frequently used language forgotten. The critical part here is that a human being is "learning" every day. In a NN, a similar phenomenon can be observed by training the network using new data. If the old data are not included in the new training samples, then the underlying math model will change so much that the old training data will no longer be represented in the model. It is possible to prevent a NN from "forgetting" the old model by changing the training process. However, this has the side effect that such a NN cannot learn completely new data samples.
I would say your approach is wrong. Neural Networks are not dumps of memory as we see on the computer. There are no addresses where a particular chunk of memory resides. All the neurons together make sure that a given input leads to a particular output.
Let's compare it with your brain. When you taste sugar, your tongue's taste buds are the input nodes, which read chemical signals and transmit electric signals to the brain. The brain then determines the taste using the various combinations of electric signals.
There are no lookup tables. There are no primary and secondary memories, only short-term and long-term memory.

What is machine learning? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
What is machine learning?
What does machine learning code do?
When we say that the machine learns, does it modify its own code, or does it modify a history (database) that contains the experience of the code for a given set of inputs?
What is machine learning?
Essentially, it is a method of teaching computers to make and improve predictions or behaviors based on some data. What is this "data"? Well, that depends entirely on the problem. It could be readings from a robot's sensors as it learns to walk, or the correct output of a program for certain input.
Another way to think about machine learning is that it is "pattern recognition" - the act of teaching a program to react to or recognize patterns.
What does machine learning code do?
Depends on the type of machine learning you're talking about. Machine learning is a huge field, with hundreds of different algorithms for solving myriad different problems - see Wikipedia for more information; specifically, look under Algorithm Types.
When we say the machine learns, does it modify its own code, or does it modify a history (database) that contains the experience of the code for a given set of inputs?
Once again, it depends.
One example of code actually being modified is Genetic Programming, where you essentially evolve a program to complete a task (of course, the program doesn't modify itself - but it does modify another computer program).
Neural networks, on the other hand, modify their parameters automatically in response to prepared stimuli and expected responses. This allows them to produce many behaviors (theoretically, with enough hidden units they can approximate any continuous function on a bounded domain to arbitrary precision).
I should note that your use of the term "database" implies that machine learning algorithms work by "remembering" information, events, or experiences. This is not necessarily (or even often!) the case.
Neural networks, which I already mentioned, only keep the current "state" of the approximation, which is updated as learning occurs. Rather than remembering what happened and how to react to it, neural networks build a sort of "model" of their "world." The model tells them how to react to certain inputs, even if the inputs are something that it has never seen before.
This last ability - the ability to react to inputs that have never been seen before - is one of the core tenets of many machine learning algorithms. Imagine trying to teach a computer driver to navigate highways in traffic. Using your "database" metaphor, you would have to teach the computer exactly what to do in millions of possible situations. An effective machine learning algorithm would (hopefully!) be able to learn similarities between different states and react to them similarly.
The similarities between states can be anything - even things we might think of as "mundane" can really trip up a computer! For example, let's say that the computer driver learned that when a car in front of it slowed down, it had to slow down too. For a human, replacing the car with a motorcycle doesn't change anything - we recognize that the motorcycle is also a vehicle. For a machine learning algorithm, this can actually be surprisingly difficult! A database would have to store information separately about the case where a car is in front and where a motorcycle is in front. A machine learning algorithm, on the other hand, would "learn" from the car example and be able to generalize to the motorcycle example automatically.
Machine learning is a field of computer science, probability theory, and optimization theory which allows complex tasks to be solved for which a logical/procedural approach would not be possible or feasible.
There are several different categories of machine learning, including (but not limited to):
Supervised learning
Reinforcement learning
Supervised Learning
In supervised learning, you have some really complex function (mapping) from inputs to outputs and lots of examples of input/output pairs, but you don't know what that complicated function is. A supervised learning algorithm makes it possible, given a large data set of input/output pairs, to predict the output value for some new input value that you may not have seen before. The basic method is that you break the data set down into a training set and a test set. You have some model with an associated error function which you try to minimize over the training set, and then you make sure that your solution works on the test set. Once you have repeated this with different machine learning algorithms and/or parameters until the model performs reasonably well on the test set, you can attempt to use the result on new inputs. Note that in this case the program does not change; only the model (data) changes. One could, theoretically, output a different program, but that is not done in practice, as far as I am aware. An example of supervised learning is the digit recognition system used by the post office, which maps pixels to labels in the set 0...9, using a large set of pictures of digits that were labeled by hand as being in 0...9.
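The train/test procedure described above can be sketched as follows. This is a minimal illustration under made-up assumptions: the "unknown" function is `y = 2x + 1` with a little noise, the model is a straight line fitted by least squares, and the error function is mean squared error. Only the model parameters (`slope`, `intercept`) change during learning; the program stays the same.

```python
import random

# Made-up input/output pairs: y = 2x + 1 plus small alternating noise.
pairs = [(i / 10, 2 * (i / 10) + 1 + (0.1 if i % 2 == 0 else -0.1))
         for i in range(50)]

# Break the data set into a training set and a test set (80/20).
random.Random(0).shuffle(pairs)
train, test = pairs[:40], pairs[40:]


def fit_line(data):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(data, slope, intercept):
    """Error function: average squared gap between target and prediction."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in data) / len(data)


slope, intercept = fit_line(train)                      # minimize error on train
test_mse = mean_squared_error(test, slope, intercept)   # check on unseen data
print(slope, intercept, test_mse)
```

If the test error were much larger than the training error, that would be the signal to try a different model or different parameters, as described above.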
Reinforcement Learning
In reinforcement learning, the program is responsible for making decisions, and it periodically receives some sort of reward/utility for its actions. However, unlike in the supervised learning case, the results are not immediate; the algorithm could prescribe a long sequence of actions and only receive feedback at the very end. In reinforcement learning, the goal is to build up a good model such that the algorithm will generate the sequence of decisions that leads to the highest long-term utility/reward. A good example of reinforcement learning is teaching a robot how to navigate by giving it a penalty (negative reward) whenever its bump sensor detects that it has bumped into an object. If coded correctly, it is possible for the robot to eventually correlate its range finder sensor data with its bumper sensor data and the directions that it sends to the wheels, and ultimately choose a form of navigation that results in it not bumping into objects.
More Info
If you are interested in learning more, I strongly recommend that you read Pattern Recognition and Machine Learning by Christopher M. Bishop or take a machine learning course. You may also be interested in reading, for free, the lecture notes from CIS 520: Machine Learning at Penn.
Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. Read more on Wikipedia
Machine learning code records "facts" or approximations in some sort of storage, and with the algorithms calculates different probabilities.
The code itself will not be modified when a machine learns, only the database of what "it knows".
Machine learning is a methodology to create a model based on sample data and use the model to make a prediction or strategy. It belongs to artificial intelligence.
Machine learning is simply a generic term for a variety of learning algorithms that learn from examples (labeled or unlabeled). The actual accuracy/error is largely determined by the quality of the training/test data you provide to your learning algorithm. This can be measured using a convergence rate. The reason you provide examples is that you want the learning algorithm of your choice to be able to generalize, guided by those examples. The algorithms can be classed into two main areas: supervised learning (classification) and unsupervised learning (clustering) techniques. It is extremely important that you make an informed decision on how you plan to separate your training and test data sets, as well as on the quality of the data that you provide to your learning algorithm. When you provide data sets, you also want to be aware of things like overfitting and maintaining a sense of healthy bias in your examples. The algorithm then learns on the basis of the generalization it achieves from the data you have provided for training, and in testing you check how it handles new examples in line with your targeted training. In clustering there is very little informative guidance: the algorithm basically tries to build related sets of clusters from measures of patterns in the data, e.g., k-means. (k-nearest neighbor, often mentioned alongside it, is a supervised method rather than a clustering algorithm.)
some good books:
Introduction to ML (Nilsson/Stanford),
Gaussian Process for ML,
Introduction to ML (Alpaydin),
Information Theory Inference and Learning Algorithms (very useful book),
Machine Learning (Mitchell),
Pattern Recognition and Machine Learning (standard ML course book at Edinburgh and various Unis but relatively a heavy reading with math),
Data Mining and Practical Machine Learning with Weka (work through the theory using weka and practice in Java)
For reinforcement learning, there is a free book online you can read.
IR, IE, recommenders, and text/data/web mining in general use a lot of machine learning principles. You can even apply metaheuristic/global optimization techniques here to further automate your learning processes, e.g., apply an evolutionary technique like a GA (genetic algorithm) to optimize your neural-network-based approach (which may use some learning algorithm). You can also approach it purely as probabilistic machine learning, for example Bayesian learning. Most of these algorithms make very heavy use of statistics. Concepts of convergence and generalization are important to many of these learning algorithms.
Machine learning is the study, in computing science, of making algorithms that are able to classify information they haven't seen before by learning patterns from training on similar information. There are all sorts of "learners" in this sense. Neural networks, Bayesian networks, decision trees, k-clustering algorithms, hidden Markov models, and support vector machines are examples.
Based on the learner, they each learn in different ways. Some learners produce human-understandable frameworks (e.g. decision trees), and some are generally inscrutable (e.g. neural networks).
Learners are all essentially data-driven, meaning they save their state as data to be reused later. They aren't self-modifying as such, at least in general.
I think one of the coolest definitions of machine learning that I've read is from this book by Tom Mitchell. Easy to remember and intuitive.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E
Shamelessly ripped from Wikipedia: Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases.
Quite simply, machine learning code accomplishes a machine learning task. That can be a number of things from interpreting sensor data to a genetic algorithm.
I would say it depends. No, modifying code is not normal, but it is not outside the realm of possibility. I would also not say that machine learning always modifies a history. Sometimes we have no history to build off of. Sometimes we simply want to react to the environment, but not actually learn from our past experiences.
Basically, machine learning is a very wide-open discipline that contains many methods and algorithms that make it impossible for there to be 1 answer to your 3rd question.
Machine learning is a term that is taken from the real world of a person, and applied on something that can't actually learn - a machine.
To add to the other answers - machine learning will not (usually) change the code, but it might change its execution path and decisions based on previous data or newly gathered data, hence the "learning" effect.
There are many ways to "teach" a machine. For example, you give weights to the many parameters of an algorithm and then have the machine solve many cases; each time, you give it feedback about its answer, and the machine adjusts the weights according to how close its answer was to yours, according to the score you gave its answer, or according to some result-testing algorithm.
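The feedback loop described above can be sketched with a perceptron, one of the simplest learners that adjusts its weights based on how far its answer was from the expected one. This is a minimal illustration, not the only scheme: the task (logical AND), the learning rate of 1, and the epoch limit are assumptions chosen to keep the example small.

```python
# Training cases: inputs and the answer we expect (logical AND).
cases = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0, 0]
bias = 0


def predict(inputs):
    """The machine's answer: 1 if the weighted sum is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


for epoch in range(100):
    mistakes = 0
    for inputs, expected in cases:
        error = expected - predict(inputs)   # feedback: -1, 0, or +1
        if error != 0:
            mistakes += 1
            # Nudge each weight in the direction that reduces the error.
            for i, x in enumerate(inputs):
                weights[i] += error * x
            bias += error
    if mistakes == 0:   # every case answered correctly: stop learning
        break

print([predict(inputs) for inputs, _ in cases])
```

Note that the code never changes while it runs; only the numbers in `weights` and `bias` do.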
This is one way of learning and there are many more...
