Neural Network: Handling unavailable inputs (missing or incomplete data) [closed] - machine-learning

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
Hopefully the last NN question you'll get from me this weekend, but here goes :)
Is there a way to handle an input that you "don't always know", so that it doesn't affect the weights somehow?
Soo... if I ask someone if they are male or female and they would rather not answer, is there a way to disregard this input? Perhaps by placing it squarely in the centre (i.e. at 0.5, assuming the inputs are 1 and 0)?
Thanks

You probably know this or suspect it, but there's no statistical basis for guessing or supplying the missing values by averaging over the range of possible values, etc.
For NNs in particular, there are quite a few techniques available. The technique I use, and have coded, is one of the simpler ones, but it has a solid statistical basis and is still used today. The academic paper that describes it is here.
The theory that underlies this technique is weighted integration over the incomplete data. In practice, no integrals are evaluated; instead, they are approximated by closed-form solutions of Gaussian Basis Function networks. As you'll see in the paper (which is a step-by-step explanation), it's simple to implement in your backprop algorithm.
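To make the "closed-form" part concrete, here is a minimal sketch of the prediction step only (not the training procedure from the paper), under my own assumptions: a normalized Gaussian basis function network whose basis functions, weighted by priors, also serve as the model of the input density, with axis-aligned (diagonal) Gaussians, and the 0/1 answer treated as a continuous input for simplicity. Under those assumptions, integrating the output over a missing input, weighted by that density, works out to leaving the missing dimension out of every basis function. The function names and the use of `None` as the "missing" marker are hypothetical.

```python
import math

def gaussian_1d(x, mu, sigma):
    """Density of a 1-D normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gbf_predict(x, centers, sigmas, weights, priors=None):
    """Output of a normalized Gaussian basis function network.

    Entries of x that are None are treated as missing. Assumes the basis
    functions (times their priors) also model the input density, so
    integrating the output over a missing input reduces to dropping that
    dimension from every diagonal Gaussian.
    """
    if priors is None:
        priors = [1.0 / len(centers)] * len(centers)
    activations = []
    for mu, sig, prior in zip(centers, sigmas, priors):
        a = prior
        for xj, mj, sj in zip(x, mu, sig):
            if xj is None:
                continue  # missing input: its Gaussian factor integrates to 1
            a *= gaussian_1d(xj, mj, sj)
        activations.append(a)
    total = sum(activations)
    return sum(w * a for w, a in zip(weights, activations)) / total

# Two basis functions over inputs [gender (0/1), some second feature].
centers = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [[0.5, 0.5], [0.5, 0.5]]
weights = [0.0, 1.0]

print(gbf_predict([1.0, 1.0], centers, sigmas, weights))    # both known: ~0.982
print(gbf_predict([None, 1.0], centers, sigmas, weights))   # gender missing: ~0.881
print(gbf_predict([None, None], centers, sigmas, weights))  # nothing known: 0.5
```

The missing answer is neither guessed nor set to 0.5; the network simply falls back on what the remaining inputs say.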

Neural networks are fairly resistant to noise; that's one of their big advantages. You may want to try encoding the inputs as -1.0 and 1.0 instead, though, with 0 for the missing ("no answer") input. That way the input to the weights from that neuron is 0.0, meaning that no learning will occur on those weights for that example.
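A small sketch of why a 0 input means no weight update, assuming a single sigmoid unit trained with cross-entropy loss (my choice for illustration; the same holds in any layer, because the gradient of a weight is proportional to the input it multiplies). The encoding dictionary and function names are hypothetical.

```python
import math

# Hypothetical encoding for the gender question: -1.0 / +1.0 for the two
# answers and 0.0 for "prefer not to say".
ENCODING = {"male": -1.0, "female": 1.0, None: 0.0}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_grads(weights, bias, x, target):
    """Cross-entropy gradients of a single sigmoid unit on one example."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    err = sigmoid(z) - target          # dL/dz for sigmoid + cross-entropy
    grad_w = [err * xi for xi in x]    # dL/dw_j = err * x_j
    return grad_w, err                 # dL/db = err

weights, bias = [0.3, -0.8], 0.1
x = [ENCODING[None], 0.5]              # gender unknown, other feature 0.5
grad_w, grad_b = logistic_grads(weights, bias, x, target=1.0)
# grad_w[0] is zero: the gender weight is not updated by this example,
# while the other weight and the bias still are.
print(grad_w, grad_b)
```

Note that only the weight attached to the missing input is frozen for that example; the bias and the other weights keep learning from it.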
Probably the best book I've ever had the misfortune of not finishing (yet!) is Neural Networks and Learning Machines by Simon S. Haykin. In it, he talks about all kinds of issues, including the way you should distribute your inputs/training set for the best training, etc. It's a really great book!

Related

Is it a bad idea to always standardize all features by default? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
Is there a reason not to standardize all features by default? I realize it may not be necessary for, e.g., decision trees, but it is for certain algorithms such as KNN, SVM and K-Means. Would there be any harm in routinely doing this for all of my features?
Also, it seems the consensus is that standardization is preferable to normalization. When would this not be a good idea?
Standardization and normalization, in my experience, have the most (positive) impact when your dataset consists of features with very different ranges (for instance, age vs. number of dollars per house).
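A minimal sketch of both transformations (standardization = z-score, normalization = min-max scaling onto [0, 1]) on made-up age and house-price values, showing why very different ranges matter for distance-based methods such as the KNN and K-Means mentioned in the question. The helper names are mine.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Min-max normalization: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Made-up example data: age in years vs. dollars per house.
ages = [25, 35, 45, 55]
dollars = [150_000, 250_000, 350_000, 450_000]

raw = list(zip(ages, dollars))
scaled = list(zip(standardize(ages), standardize(dollars)))

# Unscaled, the Euclidean distance is almost entirely the dollar difference;
# after standardization both features contribute equally.
print(math.dist(raw[0], raw[1]))
print(math.dist(scaled[0], scaled[1]))
print(min_max(ages), min_max(dollars))
```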
In my professional experience, while working on a project with car sensor data (time series), I noticed that normalization (min-max scaling), even though it was applied to a neural network, had a negative impact on the training process and, of course, on the final results. Admittedly, the sensor values were very close to one another. It was an interesting result, considering that I was working with time series, where most data scientists scale by default (the models are neural networks, after all, and scaling goes along with the theory).
In principle, standardization is the better choice when the dataset contains outliers, since normalization produces smaller standard deviations. As far as I know, this robustness to outliers is the main reason standardization tends to be favored over normalization.
Three years ago, if someone had asked me this question, I would have said "standardization" is the way to go. Now I say: follow the principles, but test every hypothesis before jumping to a conclusion.

Which supervised machine learning classification method suits for randomly spread classes? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
If the classes are randomly spread or the data is noisier, which type of supervised ML classification model will give better results, and why?
It is difficult to say which classifier will perform best on general problems. It often requires testing a variety of algorithms on a given problem to determine which classifier performs best.
Best performance also depends on the nature of the problem. There is a great answer in this Stack Overflow question which looks at various scoring metrics. For each problem, one needs to understand and consider which scoring metric will be best.
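Why the metric matters shows up most clearly with imbalanced classes. As a sketch (plain Python, hypothetical helper names, made-up labels), a classifier that always predicts the majority class scores well on accuracy, while precision, recall and F1 reveal that it never finds the positive class.

```python
def confusion(y_true, y_pred, positive=1):
    """Counts of true positives, false positives, false negatives, true negatives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def scores(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up imbalanced labels: 9 negatives, 1 positive.
y_true = [0] * 9 + [1]
always_negative = [0] * 10
print(scores(y_true, always_negative))
# accuracy is 0.9, yet recall and F1 are 0.0
```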
All of that said, neural networks, Random Forest classifiers, Support Vector Machines, and a variety of others are all candidates for creating useful models given that the classes are, as you indicated, equally distributed. When classes are imbalanced, the rules shift slightly, as most ML algorithms implicitly assume roughly balanced classes.
My suggestion would be to try a few different algorithms and tune their hyperparameters, to compare them for your specific application. You will often find that one algorithm is better, but not remarkably so. In my experience, what is often of far greater importance is how your data are preprocessed and how your features are prepared. Once again, this is a highly generic answer, as it depends greatly on your given application.
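A minimal sketch of the "try a few algorithms and compare" workflow: k-fold cross-validation, so that every candidate is scored on the same folds with the same metric. The two candidates (1-nearest-neighbour and a majority-class baseline), all function names, and the toy 1-D data are my own illustrative choices; in practice you would shuffle the data before splitting and use a proper library.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds (shuffle the data first in practice)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nearest_neighbor(train_x, train_y):
    """1-NN on a single numeric feature."""
    def predict(x):
        i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[i]
    return predict

def majority_class(train_x, train_y):
    """Baseline that always predicts the most frequent training label."""
    label = max(set(train_y), key=train_y.count)
    return lambda x: label

def cross_val_accuracy(fit, xs, ys, k=5):
    accs = []
    for test_idx in k_fold_indices(len(xs), k):
        test = set(test_idx)
        train = [i for i in range(len(xs)) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return mean(accs)

# Toy data: class 1 above 0.5, class 0 below, interleaved so every fold has both.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
for fit in (nearest_neighbor, majority_class):
    print(fit.__name__, cross_val_accuracy(fit, xs, ys))
```

Including a trivial baseline is a cheap sanity check: a "real" model that cannot beat it is not learning anything useful.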

When to use regression trees/forests? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
While looking for a good regression algorithm for my problem, I found out that one can also do regression with simple decision trees, which are usually used for classification. The output would be something like:
The red noise would be the predicted values of such a tree or forest.
Now my question is: why use this method at all, when there are alternatives that really try to figure out the underlying equation (such as the famous support vector machines, SVMs)? Are there any positive or unique aspects, or is a regression tree more of a nice-to-have algorithm?
The image you posted conveys a smooth function of y in x. A regression tree is certainly not the best technique to estimate such a function and I probably wouldn't use SVMs either. This looks like a good application for splines, e.g., by using a GAM (generalized additive model).
A regression tree on the other hand is a handy tool if you haven't got such smooth functions and if you don't know which explanatory variable will have which effect on the response. It will be particularly useful if there are jumps in the response or interactions - especially if the jump points and interaction patterns are not known in advance.
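To illustrate the point about jumps, here is a minimal sketch of a one-dimensional regression tree (greedy splits that minimize the sum of squared errors, leaves predicting the mean of their targets), which is the basic CART-style idea. The function names and toy data are made up, and the sketch assumes distinct x values. On data with a step, the first split lands right at the jump, which a smooth fit would tend to blur.

```python
def sse(ys):
    """Sum of squared errors around the mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(xs, ys, max_depth=2, min_leaf=2):
    """Tiny 1-D regression tree as nested dicts; leaves hold the mean of y."""
    if max_depth == 0 or len(xs) < 2 * min_leaf:
        return {"value": sum(ys) / len(ys)}
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, i, threshold)
    _, i, threshold = best
    lx, ly = zip(*pairs[:i])
    rx, ry = zip(*pairs[i:])
    return {
        "threshold": threshold,
        "left": fit_tree(list(lx), list(ly), max_depth - 1, min_leaf),
        "right": fit_tree(list(rx), list(ry), max_depth - 1, min_leaf),
    }

def predict(tree, x):
    while "value" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]

# Toy response with a jump between x = 4 and x = 5.
xs = list(range(10))
ys = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.1, 4.9, 5.2, 4.8]
tree = fit_tree(xs, ys, max_depth=1)
print(tree["threshold"], predict(tree, 2), predict(tree, 7))
```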

Survey to determine satisfaction: how to find the questions that mattered? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
If a survey is given to determine overall customer satisfaction, and there are 20 general questions and a final summary question: "What's your overall satisfaction 1-10", how could it be determined which questions are most significantly related to the summary question's answer?
In short, which questions actually mattered and which ones were just wasting space on the survey...
The weights of a linear classification or regression model carry information about the relevance of the associated features (as long as the features are on comparable scales).
For your specific application, you could try training an L1 or L0 regularized regressor (http://en.wikipedia.org/wiki/Least-angle_regression, http://en.wikipedia.org/wiki/Matching_pursuit). These regularizers force many of the regression weights to zero, which means that the features associated with these weights can be effectively ignored.
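A minimal sketch of an L1-regularized (lasso) regressor fitted by cyclic coordinate descent, which is a different fitting algorithm from the least-angle regression linked above but targets the same kind of L1-penalized objective. The function names and the synthetic, already-centered survey answers are made up; with real survey data you would standardize the answers first.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - Xw||^2 + lam*||w||_1.

    Assumes the columns of X and y are already centered (no intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the residual that excludes feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w

# Synthetic, centered answers to three questions (rows = respondents).
q_a = [1, 1, 1, 1, -1, -1, -1, -1]
q_b = [1, 1, -1, -1, 1, 1, -1, -1]
q_c = [1, -1, 1, -1, 1, -1, 1, -1]
X = [list(row) for row in zip(q_a, q_b, q_c)]
overall = [2.0 * a + 0.1 * b + 0.05 * c for a, b, c in zip(q_a, q_b, q_c)]

w = lasso(X, overall, lam=0.2)
print(w)  # q_b and q_c get weight exactly 0.0; q_a keeps a shrunken weight
```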
There are many different approaches for answering this question and at varying levels of sophistication. I would start by calculating the correlation matrix for all pair-wise combinations of answers, thereby indicating which individual questions are most (or most negatively) correlated with the overall satisfaction score. This is pretty straightforward in Excel with the Analysis ToolPak.
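The same correlation idea takes only a few lines of Python instead of Excel. This sketch computes just the row of the correlation matrix against the overall score; the question names and the five respondents' answers are made up. Ranking by absolute correlation keeps strongly negative relationships near the top as well.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up answers (1-10) from five respondents to two hypothetical questions,
# plus their overall-satisfaction scores.
answers = {
    "q1_staff": [9, 7, 4, 8, 2],
    "q2_parking": [5, 3, 5, 3, 4],
}
overall = [9, 7, 3, 8, 2]

ranked = sorted(answers, key=lambda q: abs(pearson(answers[q], overall)), reverse=True)
for q in ranked:
    print(q, round(pearson(answers[q], overall), 3))
```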
Next, I would look into clustering techniques, starting simple and moving up in sophistication only if necessary. Not knowing anything about the domain to which this survey data applies, it is hard to say which algorithm would be the most effective, but for starters I would look at k-means and its variants if your clusters are likely to all be similarly sized. However, if a vast majority of the responses are very similar, I would look into expectation-maximization-based algorithms. A good open-source toolkit for exploring data and testing the efficacy of various algorithms is Weka.
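For the clustering step, here is a bare-bones k-means (Lloyd's algorithm) sketch in plain Python. The toy points stand in for respondents' answers to two hypothetical questions, and initializing with the first k points is a simplification; a toolkit such as Weka or a library implementation with better initialization is the more sensible route for real data.

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm, initialized with the first k points
    (k-means++ or several random restarts are preferable in practice)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its points
        new_centroids = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters

# Toy respondents: (score on question A, score on question B).
points = [(9, 8), (2, 3), (8, 9), (1, 2), (9, 9), (2, 2)]
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```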

Why is Bayesian filtering better than Neural Networks when classifying spam? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
According to several people on Stack Overflow, Bayesian filtering is better than Neural Networks for detecting spam.
According to the literature I've read that shouldn't be the case. Please explain!
There is no mathematical proof or explanation of why applications of Neural Networks have not been as good at detecting spam as Bayesian filters. This does not mean that Neural Networks could not produce similar or better results, but the time it would take to tweak the Neural Network topology and train it to get even approximately the same results as a Bayesian filter is simply not justified. At the end of the day, people care about results and about minimizing the time and effort needed to achieve them. When it comes to spam detection, Bayesian filters get you the best results with the least amount of effort and time. If a spam detection system using Bayesian filters detects 99% of the spam correctly, there is very little incentive for people to spend a lot of time adjusting Neural Networks just to eke out an extra 0.5% or so.
"According to the literature I've read that shouldn't be the case."
It's technically correct. If properly configured, a Neural Network would get results as good as or even better than the Bayesian filters, but it's the cost/benefit ratio that makes the difference and ultimately sets the trend.
Neural networks work mostly as a black-box approach. You determine your inputs and outputs. After that, finding a suitable architecture (a multilayer perceptron with two hidden layers, an RBF network, etc.) is done mostly empirically. There are suggestions for determining the architecture, but they are, well, suggestions.
This is good for some problems, since we, the domain analysts, do not always have enough information about the problem itself. In those cases, the NN's ability to find an answer on its own is a desirable thing.
A Bayesian network, on the other hand, is designed mostly by a domain analyst. Since spam classification is a well-known problem, a domain analyst can tweak the architecture more easily, so a Bayesian network gets better results more easily in this case.
Also, most NNs do not cope well with changing features and therefore almost always need to be re-trained, which is an expensive operation. A Bayesian network, on the other hand, may only need to change its probabilities.
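To show what "only changing probabilities" can look like in practice, here is a minimal naive Bayes spam filter, the simplest kind of Bayesian filter (a full Bayesian network is more general). This is a sketch, not any particular product's implementation: the class and method names are hypothetical and the training messages are made up. Its probabilities are derived from word counts, so adding a newly labelled message is just a count update rather than a retraining run.

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing.

    Training only updates word and document counts, so new examples can be
    added at any time without retraining from scratch.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def update(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        # log prior + sum of smoothed log likelihoods of each word
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))

f = NaiveBayesSpamFilter()
f.update("win money now", "spam")
f.update("cheap money offer", "spam")
f.update("meeting agenda for monday", "ham")
f.update("lunch on monday", "ham")
print(f.classify("win cheap money"))  # spam
print(f.classify("monday meeting"))   # ham
```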
