While looking for a good regression algorithm for my problem, I found out that this can also be done with simple decision trees, which are usually used for classification. The output would look something like this:
The red noise in the plot would be the predictions of such a tree or forest.
Now my question is: why use this method at all, when there are alternatives that actually try to figure out the underlying equation (such as the famous support vector machine, SVM)? Are there any positive / unique aspects, or is a regression tree more of a nice-to-have algorithm?
The image you posted conveys a smooth function of y in x. A regression tree is certainly not the best technique to estimate such a function and I probably wouldn't use SVMs either. This looks like a good application for splines, e.g., by using a GAM (generalized additive model).
A regression tree on the other hand is a handy tool if you haven't got such smooth functions and if you don't know which explanatory variable will have which effect on the response. It will be particularly useful if there are jumps in the response or interactions - especially if the jump points and interaction patterns are not known in advance.
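As a rough sketch (not from the original answer) of why the fits look so different, here is a regression tree's piecewise-constant prediction next to a smoothing spline on made-up smooth data; the dataset, tree depth, and smoothing factor are arbitrary illustration choices:

```python
# Compare a regression tree with a smoothing spline on a smooth 1-D signal.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)   # smooth signal + noise

# Regression tree: piecewise-constant predictions (the "steps" in such plots)
tree = DecisionTreeRegressor(max_depth=4).fit(x.reshape(-1, 1), y)
tree_pred = tree.predict(x.reshape(-1, 1))

# Smoothing spline: a smooth estimate, closer to what a GAM would give
spline = UnivariateSpline(x, y, s=len(x) * 0.04)
spline_pred = spline(x)

print("tree MSE:  ", np.mean((tree_pred - np.sin(x)) ** 2))
print("spline MSE:", np.mean((spline_pred - np.sin(x)) ** 2))
```

With a deeper tree the steps only get finer; the prediction stays piecewise constant, while the spline recovers the smooth shape directly, which is the point made above.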
I am currently learning ML, and I noticed that in multiple linear regression we don't need to scale our independent variables. I don't understand why.
Whether feature scaling is useful or not depends on the training algorithm you are using.
For example, to find the best parameter values of a linear regression model, there is a closed-form solution, called the Normal Equation. If your implementation makes use of that equation, there is no stepwise optimization process, so feature scaling is not necessary.
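As a minimal sketch (not from the original answer) of that closed-form fit, assuming synthetic data with deliberately unscaled features:

```python
# Normal Equation fit on synthetic, deliberately unscaled features; no scaling needed.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2)) * [1.0, 1000.0]   # two features on very different scales
y = 3.0 + 2.0 * X[:, 0] + 0.005 * X[:, 1] + rng.normal(scale=0.1, size=100)

Xb = np.c_[np.ones(len(X)), X]                  # prepend an intercept column
# theta = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(theta)                                    # ~[3, 2, 0.005] despite the scale mismatch
```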
However, you could also find the best parameter values with a gradient descent algorithm. This could be a better choice in terms of speed if you have many training instances. If you use gradient descent, feature scaling is recommended, because otherwise the algorithm might take much longer to converge.
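And a hedged sketch of the gradient-descent side, reusing the same kind of synthetic data; the learning rate and iteration count are arbitrary, chosen only to show the effect of scaling:

```python
# Batch gradient descent on raw vs. standardized features (illustrative sketch).
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iter=1000):
    Xb = np.c_[np.ones(len(X)), X]
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = 2 / len(y) * Xb.T @ (Xb @ theta - y)   # gradient of the MSE
        theta -= lr * grad
    return theta, np.mean((Xb @ theta - y) ** 2)

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2)) * [1.0, 1000.0]
y = 3.0 + 2.0 * X[:, 0] + 0.005 * X[:, 1] + rng.normal(scale=0.1, size=100)

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature

# At this learning rate the raw-feature run blows up (overflows); it would need
# a tiny learning rate and far more iterations. The scaled run converges quickly.
_, mse_raw = gradient_descent(X, y)
_, mse_scaled = gradient_descent(X_scaled, y)
print(mse_raw, mse_scaled)
```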
My question is: given a particular dataset and a binary classification task, is there a way to choose the type of model that is likely to work best? For example, consider the Titanic dataset on Kaggle: https://www.kaggle.com/c/titanic. Just by analyzing graphs and plots, are there any general rules of thumb for picking Random Forests vs. KNNs vs. neural nets, or do I just need to test them all and pick the best-performing one?
Note: I'm not talking about image data, since CNNs are obviously best for those.
No, you need to test different models to see how they perform.
Based on papers and Kaggle results, the top algorithms seem to be boosting methods (XGBoost, LightGBM, AdaBoost), stacks of all of those together, or just Random Forests in general. But there are instances where Logistic Regression can outperform them.
So just try them all. If the dataset is under ~100k rows, you're not going to lose that much time, and you might learn something valuable about your data.
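A minimal sketch of that "just try them all" loop, assuming scikit-learn and using a small built-in dataset as a stand-in for your own tabular data (the model list and settings are placeholders, not recommendations from the answer):

```python
# Cross-validate several standard classifiers on the same tabular data and compare.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # stand-in binary classification dataset

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:>20}: {scores.mean():.3f} +/- {scores.std():.3f}")
```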
One anomaly detection approach is to use a multivariate Gaussian to construct a probability density, as in Andrew Ng's Coursera lecture.
What if the data show cluster structure (not a single chunk)? In that case, do we resort to unsupervised clustering to construct the density? If so, how do we do it? Are there other systematic ways to discover whether such a case exists?
You can just use regular GMM and use a threshold on the likelihood to identify outliers. Points that don't fit the model well are outliers.
This works okay as long as your data really is composed of Gaussians.
Furthermore, clustering is fairly expensive. It will usually be faster to use a nonparametric outlier model such as kNN, LOF, or LoOP directly.
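A rough sketch of both ideas, assuming scikit-learn and synthetic clustered data; the component count and the 2% threshold are arbitrary assumptions:

```python
# Fit a Gaussian mixture to multimodal data, threshold the per-point
# log-likelihood to flag outliers, and compare with Local Outlier Factor.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
clusters = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2)) for c in (-3, 0, 3)])
outliers = rng.uniform(-8, 8, size=(10, 2))
X = np.vstack([clusters, outliers])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
log_lik = gmm.score_samples(X)                  # log-likelihood of each point
threshold = np.quantile(log_lik, 0.02)          # flag the lowest-scoring ~2%
gmm_flags = log_lik < threshold

lof_flags = LocalOutlierFactor(n_neighbors=20).fit_predict(X) == -1   # nonparametric alternative

print("GMM outliers:", gmm_flags.sum(), " LOF outliers:", lof_flags.sum())
```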
I have a question about reinforcement learning. If we use a mechanism to find the response of the environment within an unsupervised method, in order to improve its performance, is the method still unsupervised?
In other words, if a method uses the response of the environment, is it supervised, or can it still be done in an unsupervised manner? If so, how?
I have to disagree with #phs. Reinforcement learning is treated in the literature either as:
a completely separate, third method of training, so it is neither supervised nor unsupervised; it is simply reinforcement learning, or
sometimes marked as supervised, due to its much stronger similarities to that paradigm.
So, if the algorithm is trained in the reinforcement fashion and is unsupervised, you can call it an unsupervised-reinforcement hybrid or something similar, but no longer "unsupervised", because reinforcement learning requires additional knowledge about the world beyond what is encoded in the data representation (the feedback is not stored in the data representation; it acts much more like "true labels").
Unsupervised learning describes a class of problems where the model is not provided "answers" during its training phase, whatever that might mean in the current context.
Clustering is a canonical example. In a clustering problem one is only looking for inherent structure or grouping in the training data, and not seeking to distinguish "right" data points from "wrong" ones.
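A tiny illustration of that point (my example, not from the answer), assuming scikit-learn: k-means is handed only the points, never any "answers", and still finds the grouping:

```python
# k-means on unlabeled two-blob data: structure is found without any labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.3, size=(50, 2)),
               rng.normal(2, 0.3, size=(50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)
print(np.bincount(labels))   # roughly 50/50 split, discovered purely from structure
```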
Your question is vague, but I believe you are asking whether we can call a training method unsupervised even if we have a prescribed algorithm for performing the training. The answer is yes; the word is just a word. All learning algorithms have an inherent prescribed structure (the algorithm itself) and so are, in some sense, "supervised".
Hopefully the last NN question you'll get from me this weekend, but here goes :)
Is there a way to handle an input that you "don't always know"... so it doesn't affect the weightings somehow?
Soo... if I ask someone if they are male or female and they would not like to answer, is there a way to disregard this input? Perhaps by placing it squarely in the centre? (assuming 1,0 inputs at 0.5?)
Thanks
You probably know this or suspect it, but there's no statistical basis for guessing or supplying the missing values by averaging over the range of possible values, etc.
For NNs in particular, there are quite a few techniques available. The technique I use (which I've coded) is one of the simpler ones, but it has a solid statistical basis and it's still used today. The academic paper that describes it is here.
The theory that underlies this technique is weighted integration over the incomplete data. In practice, no integrals are evaluated; instead, they are approximated by closed-form solutions of Gaussian Basis Function networks. As you'll see in the paper (which is a step-by-step explanation), it's simple to implement in your backprop algorithm.
Neural networks are fairly resistant to noise - that's one of their big advantages. You may want to try putting the inputs at (-1.0, 1.0) instead, though, with 0 as the "no answer" input. That way the input to the weights from that neuron is 0.0, meaning that no learning will occur there.
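A minimal sketch of that encoding, assuming a hypothetical gender question where "prefer not to say" maps to 0 so it contributes nothing to the weighted sum:

```python
# Map the two known answers to -1/+1 and a missing answer to 0.
import numpy as np

ENCODING = {"male": -1.0, "female": 1.0, None: 0.0}   # None = declined to answer

def encode(answers):
    """Turn raw answers into a network-ready input column."""
    return np.array([ENCODING[a] for a in answers]).reshape(-1, 1)

x = encode(["male", None, "female", None])
w = np.array([[0.7]])        # an arbitrary input weight
print(x @ w)                 # the rows encoded as 0.0 contribute nothing to the activation
```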
Probably the best book I've ever had the misfortune of not finishing (yet!) is Neural Networks and Learning Machines by Simon S. Haykin. In it, he talks about all kinds of issues, including the way you should distribute your inputs/training set for the best training, etc. It's a really great book!