I am new to Machine Learning. Can anyone tell me the major difference between classification and regression in machine learning?
Regression aims to predict a continuous output value. For example, say that you are trying to predict the revenue of a certain brand as a function of many input parameters. A regression model would literally be a function which can output potentially any revenue number based on certain inputs. It could even output revenue numbers which never appeared anywhere in your training set.
Classification aims to predict which class (a discrete integer or categorical label) the input corresponds to. For example, say that you had divided the sales into Low and High, and you were trying to build a model which could predict Low or High sales (binary/two-class classification). The inputs might even be the same as before, but the output would be different. In the case of classification, your model would output either "Low" or "High," and in theory every input would generate only one of these two responses.
(This answer is true for any machine learning method; my personal experience has been with random forests and decision trees).
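As a minimal sketch of the contrast, using scikit-learn on invented sales-style numbers (the features, revenues, and labels below are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical inputs: [advertising spend, number of stores] -- invented numbers.
X = np.array([[10, 2], [20, 3], [30, 5], [40, 6], [50, 8]])

# Regression target: revenue, a continuous quantity.
revenue = np.array([105.0, 210.0, 320.0, 405.0, 515.0])
reg = LinearRegression().fit(X, revenue)
print(reg.predict([[35, 5]]))  # can be any real number, even one never seen in training

# Classification target: "Low"/"High" sales, a discrete label.
sales_class = np.array(["Low", "Low", "High", "High", "High"])
clf = LogisticRegression().fit(X, sales_class)
print(clf.predict([[35, 5]]))  # always one of the two labels
```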
Regression - the output variable takes continuous values.
Example: Given a picture of a person, we have to predict their age from the picture.
Classification - the output variable takes class labels.
Example: Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
I am a beginner in the Machine Learning field, but as far as I know, regression is for "continuous values" and classification is for "discrete values". With regression, you fit a line to your continuous values and can see whether your model is a good or bad fit. With classification, on the other hand, you see how discrete values take on meaning as separate classes. If I am wrong, please feel free to correct me.
I've seen examples of real time scoring used in credit card fraud detection, but I'm not seeing how scoring can achieve such a task. I think I'm fundamentally misunderstanding scoring.
My understanding is: "scoring a model" (in the case of classification models) means predicting on a series of datasets (where we already know the answers) and evaluating the predictions by comparing the wrong predictions against the correct ones. i.e. if a model made 50 mistakes out of 100 predictions, the model is 50% accurate -- thus the score.
But I don't get how doing this in real time can detect fraud. If we don't know if the transaction is a fraud or not (since it's not historical data), how can scoring achieve fraud detection?
Or is scoring actually the "confidence" of the prediction? i.e. when I make a prediction on an unseen dataset, a classification model might tell me that the confidence for the prediction is 80% (the model is 80% sure it has the correct prediction). Is the score 80% in this case?
I've also seen scoring defined as applying a model to a new dataset. Isn't that the same as a prediction?
Firstly, scoring depends on what metric you have defined to measure the performance of your model. It can be anything: confidence, accuracy, or any other model-evaluation metric. You have to decide which metric to use and which works best; its output is what gets called the score.
The difference between Real Time Scoring and Batch Scoring:
Let us say you are building a fraud detection model. You will have to assign a score to each transaction. There are two ways to do it.
Real Time Scoring - You receive the features in real time, do all the preprocessing, and pass them through the model to get predictions. All of this happens in real time and gives immediate results. The pro is that users or systems do not have to wait for the results.
Batch Scoring - When you create a model which makes its predictions in batches, periodically, it is called batch inferencing or batch scoring. Imagine you run your predictions every hour or every midnight; then the scoring is done in batches.
They both have their pros and cons, but generally these decisions depend on business stakeholders and business requirements.
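A rough sketch of the two modes with a generic scikit-learn classifier; the transaction features, labels, threshold, and the model itself are stand-ins, not a real fraud pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Train once, offline, on historical transactions where the fraud label is known.
rng = np.random.default_rng(0)
X_hist = rng.random((1000, 4))              # invented transaction features
y_hist = (X_hist[:, 0] > 0.95).astype(int)  # invented fraud labels
model = RandomForestClassifier(random_state=0).fit(X_hist, y_hist)

# Real-time scoring: one incoming transaction, scored the moment it arrives.
# The "score" here is the model's output for that transaction (a fraud
# probability), not an accuracy figure -- no true label is needed yet.
incoming = rng.random((1, 4))
fraud_score = model.predict_proba(incoming)[0, 1]
if fraud_score > 0.9:
    print("flag transaction for manual review")

# Batch scoring: score a whole set of transactions on a schedule (e.g. nightly).
batch = rng.random((500, 4))
batch_scores = model.predict_proba(batch)[:, 1]
```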
What is the difference between using K-nearest neighbours for classification and using it for regression?
And when KNN is used in a recommendation system, is that considered classification or regression?
In classification tasks, the user seeks to predict a category, which is usually represented as an integer label but stands for a category of "things". For instance, you could try to classify pictures as "cat" or "dog" and use label 0 for "cat" and 1 for "dog".
The KNN algorithm for classification will look at the k nearest neighbours of the input you are trying to make a prediction on. It will then output the most frequent label among those k examples.
In regression tasks, the user wants to output a numerical value (usually continuous). It may be, for instance, estimating the price of a house, or giving an evaluation of how good a movie is.
In this case, the KNN algorithm would collect the values associated with the k closest examples to the one you want to make a prediction on and aggregate them to output a single value. Usually you would choose the average of the k neighbours' values, but you could choose the median or a weighted average (or actually anything that makes sense to you for the task at hand).
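A minimal sketch of both variants with scikit-learn, on a tiny invented 1-D dataset:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Tiny invented 1-D dataset, just to show the two aggregation rules.
X = np.array([[1.0], [1.2], [3.0], [3.1], [5.0], [5.2]])
labels = np.array([0, 0, 1, 1, 1, 0])                     # discrete class labels
values = np.array([10.0, 12.0, 30.0, 31.0, 50.0, 52.0])   # continuous targets

k = 3
clf = KNeighborsClassifier(n_neighbors=k).fit(X, labels)
print(clf.predict([[3.05]]))   # most frequent label among the 3 nearest neighbours

reg = KNeighborsRegressor(n_neighbors=k).fit(X, values)
print(reg.predict([[3.05]]))   # average of the 3 nearest neighbours' values

# A weighted average instead of the plain mean, as mentioned above:
wreg = KNeighborsRegressor(n_neighbors=k, weights="distance").fit(X, values)
print(wreg.predict([[3.05]]))
```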
For your specific problem, you could use both, but regression makes more sense to me, in order to predict some kind of "matching percentage" between the user and the item you want to recommend to them.
I'm working on a deep learning classifier (Keras and Python) that classifies time series into three categories. The loss function that I'm using is the standard categorical cross-entropy. In addition to this, I also have an attention map which is being learnt within the same model.
I would like this attention map to be as small as possible, so I'm using a regularizer. Here comes the problem: how do I set the right regularization parameter? What I want is for the network to reach its maximum classification accuracy first, and then start minimising the intensity of the attention map. For this reason, I train my model once without the regulariser and a second time with the regulariser on. However, if the regulariser parameter (lambda) is too high, the network completely loses accuracy and only minimises the attention, while if the regulariser is too small, the network only cares about the classification error and won't minimise the attention, even when the accuracy is already at its maximum.
Is there a smarter way to combine the categorical cross-entropy with the regulariser? Maybe something that considers the variation of categorical cross-entropy in time, and if it doesn't go down for, say N iterations, it only considers the regulariser?
Thank you
Regularisation is a way to fight overfitting. So, you should first check whether your model overfits. A simple way to do it: compare the f1 score on the train and test sets. If the f1 score is high on the train set and low on the test set, it seems you have overfitting, so you need to add some regularisation.
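A sketch of that train/test comparison, using a scikit-learn classifier on generated stand-in data; with a Keras model you would compute the same two scores from its own predictions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Stand-in 3-class data; replace with your own time-series features and labels.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

f1_train = f1_score(y_train, model.predict(X_train), average="macro")
f1_test = f1_score(y_test, model.predict(X_test), average="macro")
print(f"train F1: {f1_train:.2f}  test F1: {f1_test:.2f}")
# A large gap (high train F1, much lower test F1) points to overfitting and is
# the situation where adding or strengthening regularisation is most useful.
```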
According to an article I read here, Machine Learning is to do with teaching a machine how to do certain tasks through 'learning' input/output relations.
What is a more accurate definition of machine learning?
Machine Learning is to do with teaching a machine how to do certain tasks through input/output relations. Is this kind of correct?
The short answer is yes, kind of. Read on.
Definition of Machine Learning
To understand what Machine Learning is let's first define the term Learning. The often quoted definition by Tom M. Mitchell (1) is as follows:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Meaning?
This sounds quite formal; however, it just says that computers learn from experience, which is presented to them in the form of data. The data that enables learning exists relative to a specific task and involves several parameters:
T, a task to accomplish, e.g. predicting housing prices
E, some experience, e.g. the prices observed so far
P, some measure of performance, e.g. how accurately prices are predicted
Example: Housing prices
Once a program has learnt from these inputs, it can take a new, previously unseen experience and from that predict, in our example, the specific housing price. The housing price might be strongly correlated with, say, the location, age and size of the house or apartment, and the luxury of its interiors.
What is the result of a learning algorithm?
In its simplest form then a machine learning algorithm for housing prices might implement a multi-variate regression analysis. It takes as input a body of data that relates real, observed prices to the four features location, age, size, luxury. The process of learning produces a regression model that in essence assigns a weight to each feature, of the form
y^ = w_location * location + w_age * age + w_size * size + w_luxury * luxury
That is, the weights w_* are learned from the input data, and y^ is the predicted price. The learning is considered successful once the formula is able to predict housing prices from a list of features alone. Usually a prediction is considered successful if it falls within a certain bound (%-range) of the real price.
Note that the definition of successful very much depends on the kind of task that the program must learn, however the result needs to be substantially better than a pure random guess (that is, the ratio of correct results needs to be statistically significant).
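As a concrete illustration of the multi-variate regression sketched above, with scikit-learn and invented housing numbers (note that a fitted model also learns an intercept term, which the formula above leaves out):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented experience E: [location_index, age, size_m2, luxury_score] -> observed price.
X = np.array([
    [3, 10,  80, 2],
    [1, 30,  60, 1],
    [5,  2, 120, 4],
    [2, 15,  90, 2],
    [4,  5, 100, 3],
])
prices = np.array([300_000, 150_000, 550_000, 280_000, 420_000])

# Learning produces the weights w_location, w_age, w_size, w_luxury
# (plus the intercept).
model = LinearRegression().fit(X, prices)
print(model.coef_, model.intercept_)

# Performance P: e.g. the share of predictions that fall within 10% of the real price.
predicted = model.predict(X)
print(np.mean(np.abs(predicted - prices) / prices < 0.10))
```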
Is there more to it?
Yes, a lot. Some pointers can be found in this Wikipedia article. If you are keen to get into the subject, professor Andrew Ng's Stanford lecture is quite famous, although there are many more courses if you look for them. Pick the one that best suits your interests.
References
(1): Mitchell, T. (1997). Machine Learning. McGraw Hill. ISBN 0-07-042807-7, p. 2, as referenced by Wikipedia.
For a multiclass problem, should the data be balanced for machine learning algorithms such as random forests and random ferns, or is it OK for it to be imbalanced to a certain extent?
The issue with imbalanced classes arises when the disproportion alters the separability of the class instances. But this does not happen in every imbalanced dataset: sometimes, the more data you have from one class, the better you can differentiate the scarce data from it, since it lets you find more easily which features are meaningful for creating a discriminating plane (even if you are not using discriminant analysis, the point is still to classify, i.e. separate the instances according to their classes).
For example, I can remember the KDD Cup 2004 protein classification task, in which one class had 99.1% of the instances in the training set, but if you tried to use undersampling methods to alleviate the imbalance you would only get worse results. That means that the large amount of data from the first class helped define the data in the smaller one.
Concerning random forests, and decision trees in general, they work by selecting, at each step, the most promising feature that can partition the set into two (or more) class-meaningful subsets. Having inherently more data about one class does not bias this partitioning by default (i.e. always), but only when the imbalance is not representative of the classes' real distributions.
So I suggest that you first run a multivariate analysis to gauge the extent of the imbalance among the classes in your dataset, and then run a series of experiments with different undersampling ratios if you are still in doubt.
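One way to run those undersampling experiments, sketched with scikit-learn on generated stand-in data (the ratios and the macro-F1 metric are just example choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in imbalanced data (roughly 95% / 5%); replace with your own dataset.
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95, 0.05],
                           random_state=0)

rng = np.random.default_rng(0)
majority = np.where(y == 0)[0]
minority = np.where(y == 1)[0]

# Keep different fractions of the majority class and compare macro-F1.
# (For a rigorous comparison you would undersample only the training folds,
# not the evaluation folds; this is just a quick screening experiment.)
for keep_fraction in (1.0, 0.5, 0.25, 0.1):
    kept = rng.choice(majority, size=int(len(majority) * keep_fraction), replace=False)
    idx = np.concatenate([kept, minority])
    scores = cross_val_score(RandomForestClassifier(random_state=0),
                             X[idx], y[idx], cv=5, scoring="f1_macro")
    print(f"keep {keep_fraction:.0%} of majority class: macro-F1 = {scores.mean():.3f}")
```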
I have used random forests for this kind of task before. The data don't need to be balanced; however, if the positive samples are too few, the pattern in the data may be drowned out by the noise. Most classification methods (even random forests and AdaBoost) have this flaw to some extent. Oversampling may be a good way to deal with this problem.
Perhaps the paper Logistic Regression in Rare Events Data is useful for this sort of problem, although its topic is logistic regression.