Why upsampling over downsampling? [closed] - machine-learning

I have a dataset of 191 samples and have created a logistic regression model. I first ran the model on the raw data and then moved on to upsampling.
What I am not able to understand is:
Why do upsampling rather than downsampling, or both up- and downsampling?
If upsampling creates a problem of overfitting, can it be handled by scaling the data?
After upsampling (or any other sampling), which parameters should I look at to decide whether to proceed with another sampling method, e.g. downsampling or combined up- and downsampling?
I would appreciate help understanding the above.

Downsampling always means a loss of information, which is why it is generally best avoided.
Scaling is actually the best alternative. Typically the minority class is upsampled because it is underrepresented in the data compared to the majority class. Since many algorithms try to minimise the empirical risk (the probability of a misclassification), they focus more on the majority class. The reason for upsampling/downsampling is that either the class distribution in the training data is not representative, or the cost of misclassifying the minority class is much higher, e.g. in predictive maintenance. The best way to correct for this is actually a cost matrix. However, since quite a few algorithms have no out-of-the-box mechanism for cost functions, upsampling/downsampling is often used as an approximation. Hence, upsampling is only to be preferred if additional "noise" can be introduced during the sampling process.
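To make this concrete, here is a minimal sketch (assuming scikit-learn and a small synthetic imbalanced dataset, neither of which comes from the question) contrasting cost-sensitive class weights with plain upsampling for a logistic regression:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils import resample

    # Synthetic stand-in for the 191-sample imbalanced dataset
    X, y = make_classification(n_samples=191, weights=[0.85, 0.15], random_state=0)

    # Option 1: cost-sensitive learning via class weights (a simple cost matrix),
    # no resampling required
    weighted_model = LogisticRegression(class_weight="balanced").fit(X, y)

    # Option 2: upsample the minority class until it matches the majority class
    X_maj, y_maj = X[y == 0], y[y == 0]
    X_min, y_min = X[y == 1], y[y == 1]
    X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                                  n_samples=len(y_maj), random_state=0)
    upsampled_model = LogisticRegression().fit(np.vstack([X_maj, X_min_up]),
                                               np.concatenate([y_maj, y_min_up]))

Either way, the models should then be judged on held-out data with metrics that respect the imbalance (e.g. precision/recall or F1) rather than raw accuracy.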

Is it a bad idea to always standardize all features by default? [closed]

Is there a reason not to standardize all features by default? I realize it may not be necessary for, e.g., decision trees, but it is for certain algorithms such as KNN, SVM and k-means. Would there be any harm in just routinely doing this for all of my features?
Also, it seems to be the consensus that standardization is preferable to normalization. When would this not be a good idea?
Standardization and normalization, in my experience, have the most (positive) impact when your dataset consists of features that have very different ranges (for instance, age vs. house price in dollars).
In my professional experience, while working on a project with car sensors (time series), I noticed that normalization (min-max scaling), even when applied to a neural network, had a negative impact on the training process and, of course, the final results. Admittedly, the sensor values were very close to one another. It was a very interesting result to observe, considering that I was working with time series, where most data scientists resort to scaling by default (they are neural networks in the end, as the theory goes).
In principle, standardization is better applied when the dataset contains outliers, since normalization generates smaller standard deviations. To my knowledge, this is the main reason standardization tends to be favored over normalization: its robustness to outliers.
Three years ago, if someone had asked me this question, I would have said "standardization" is the way to go. Now I say: follow the principles, but test every hypothesis before jumping to a conclusion.
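As an illustration of the outlier point above, here is a minimal sketch (assuming scikit-learn; the toy age values are made up) comparing standardization with min-max normalization on a feature containing one outlier:

    import numpy as np
    from sklearn.preprocessing import StandardScaler, MinMaxScaler

    ages = np.array([[22.0], [25.0], [30.0], [35.0], [40.0], [120.0]])  # 120 is an outlier

    standardized = StandardScaler().fit_transform(ages)  # zero mean, unit variance
    normalized = MinMaxScaler().fit_transform(ages)      # squeezed into [0, 1]

    # The outlier pushes the normalized values of the ordinary ages into a narrow
    # band near 0, while standardization preserves more of their spread.
    print(normalized.ravel().round(2))
    print(standardized.ravel().round(2))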

Which supervised machine learning classification method suits randomly spread classes? [closed]

If classes are randomly spread or the data is very noisy, which type of supervised ML classification model will give better results, and why?
It is difficult to say which classifier will perform best on general problems. It often requires testing a variety of algorithms on a given problem to determine which classifier performs best.
Best performance is also dependent on the nature of the problem. There is a great answer in this stackoverflow question which looks at various scoring metrics. For each problem, one needs to understand and consider which scoring metric will be best.
All of that said, neural networks, Random Forest classifiers, Support Vector Machines, and a variety of others are all candidates for creating useful models given that classes are, as you indicated, equally distributed. When classes are imbalanced, the rules shift slightly, as most ML algorithms assume balance.
My suggestion would be to try a few different algorithms and tune their hyperparameters to compare them for your specific application. You will often find one algorithm is better, but not remarkably so. In my experience, how your data are preprocessed and how your features are prepared is often of far greater importance. Once again, this is a highly generic answer, as it depends greatly on your given application.
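As a rough illustration of that workflow, here is a minimal sketch (assuming scikit-learn and a synthetic noisy dataset, since the question gives no data) that compares a few common classifiers with cross-validation:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # flip_y injects label noise to mimic "randomly spread" classes
    X, y = make_classification(n_samples=500, n_informative=5, flip_y=0.2, random_state=0)

    models = {
        "random_forest": RandomForestClassifier(random_state=0),
        "svm_rbf": make_pipeline(StandardScaler(), SVC()),
        "mlp": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="f1")
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

The scoring metric ("f1" here) is itself a choice to revisit for your specific problem, as noted above.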

Why don't we need feature scaling in multiple linear regression? [closed]

I am currently learning ML, and I noticed that in multiple linear regression we don't need scaling for our independent variables.
I don't understand why.
Whether feature scaling is useful or not depends on the training algorithm you are using.
For example, to find the best parameter values of a linear regression model, there is a closed-form solution, called the Normal Equation. If your implementation makes use of that equation, there is no stepwise optimization process, so feature scaling is not necessary.
However, you could also find the best parameter values with a gradient descent algorithm. This could be a better choice in terms of speed if you have many training instances. If you use gradient descent, feature scaling is recommended, because otherwise the algorithm might take much longer to converge.
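A minimal sketch (using NumPy; the data is synthetic) of the Normal Equation, theta = (X^T X)^{-1} X^T y, which fits the model in one step regardless of feature scale:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3)) * [1.0, 100.0, 10000.0]      # wildly different scales
    y = X @ np.array([2.0, -0.5, 0.03]) + rng.normal(size=100)

    X_b = np.hstack([np.ones((100, 1)), X])                    # add intercept column
    theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)            # exact least-squares solution
    print(theta)

    # A gradient-descent update such as
    #   theta -= lr * X_b.T @ (X_b @ theta - y) / len(y)
    # would converge very slowly (or diverge) on unscaled features like these,
    # which is why scaling is recommended for iterative solvers.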

Does the use of kernels in SVMs increase the chances of overfitting? [closed]

When using kernels to delimit nonlinear domains in SVMs, we introduce new features based on the training examples. We then have as many features as training examples. But having as many features as examples increases the chances of overfitting, right? Should we drop some of these new features?
You really can't drop any of the kernel-generated features; in many cases you don't know which features are being used or what weight is being given to them. In addition to the use of kernels, SVMs apply regularization, and this regularization decreases the possibility of overfitting.
You can read about the connection between the formulation of SVMs and statistical learning theory, but the high level summary is that the SVM doesn't just find a separating hyperplane but finds one that maximizes the margin.
The Wikipedia article on SVMs is very good and provides excellent links on regularization, parameter search, and many other important topics.
Increasing the number of features does increase the chances of overfitting. You could use cross-validation (which libsvm includes) to test whether the model you trained is overfitting or not,
and use feature selection tools to select features: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/fselect/fselect.py
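To tie both answers together, here is a minimal sketch (assuming scikit-learn rather than libsvm directly, with synthetic data) that checks an RBF-kernel SVM for overfitting via cross-validation instead of dropping kernel features:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    # C (and gamma) control the regularization / model complexity
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    cv_scores = cross_val_score(model, X, y, cv=5)
    train_score = model.fit(X, y).score(X, y)

    # A large gap between training accuracy and cross-validated accuracy signals
    # overfitting; lowering C or gamma shrinks the gap without removing features.
    print(train_score, cv_scores.mean())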

LaSVM documentation and information [closed]

I have thousands of samples for training and testing, and I want to use an SVM with an RBF kernel to classify them. The problem is that LIBSVM's implementation of the RBF kernel is very slow when using 10k or more data points. The main source of the slowness is the grid search.
I read about LIBLINEAR and LaSVM. But LIBLINEAR is not what I want, because SVMs with a linear kernel usually have lower accuracy than an RBF kernel.
I was searching for LaSVM and I can't find useful information about it. The project site has very little information. I want to know whether LaSVM can use the RBF kernel or only a specific kind of kernel, whether I should scale the test and training data, and whether I can do a grid search for my kernel parameters with cross-validation.
LaSVM has an RBF kernel implementation too. Based on my experience on large data (>100,000 instances in >1,000 dimensions), it is no faster than LIBSVM though. If you really want to use a nonlinear kernel for huge data, you could try EnsembleSVM.
If your data is truly huge and you are not familiar with ensemble learning, LIBLINEAR is the way to go. If you have a high number of input dimensions, the linear kernel is usually not much worse than RBF while being orders of magnitude faster.
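For reference, here is a minimal sketch (using scikit-learn as a stand-in, not LaSVM itself; the data is synthetic) of the usual recipe: scale the features, then grid-search C and gamma for an RBF SVM with cross-validation:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1]}
    search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)

For truly large datasets, LinearSVC (the LIBLINEAR route mentioned above) with a grid over C alone is orders of magnitude faster.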
