I want to ask which metrics can be used to evaluate my CNN model for multi-class classification. I have 3 classes for now, and I'm just using accuracy and a confusion matrix, and I also plot the model's loss. Are there any other metrics I can use to evaluate my model's performance?
Evaluating the performance of a model is one of the most crucial phases of any machine learning project and must be done effectively. You mentioned that you are using accuracy and a confusion matrix for the evaluation, so I would like to add some points for developing a better evaluation strategy:
Suppose you are developing a classifier that labels an email as SPAM or NON-SPAM (HAM). One possible evaluation criterion is the false positive rate, because it can be really annoying if a non-spam email ends up in the spam category (which means you will miss a valuable email).
So I recommend choosing metrics based on the problem you are targeting. There are many metrics, such as F1 score, recall, and precision, that you can choose from depending on the problem you have.
You can visit: https://medium.com/apprentice-journal/evaluating-multi-class-classifiers-12b2946e755b for better understanding.
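As a rough illustration of the metrics mentioned above, here is a minimal pure-Python sketch that computes per-class precision, recall, and F1, plus a macro-averaged F1, for a 3-class problem. The labels are made up for the example and the helper names (per_class_report, macro_f1) are hypothetical, not from any library.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # predicted as the label and actually the label
    fp = Counter()  # predicted as the label but actually another class
    fn = Counter()  # actually the label but predicted as another class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    report = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1)
    return report

def macro_f1(report):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in report.values()) / len(report)

# Toy labels for a 3-class problem (invented for illustration).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

report = per_class_report(y_true, y_pred)
for label, (precision, recall, f1) in report.items():
    print(label, round(precision, 3), round(recall, 3), round(f1, 3))
print("macro F1:", round(macro_f1(report), 3))
```

Macro averaging treats every class equally, which is usually what you want when a rare class matters as much as a common one.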
I'm using a grid search to tune the hyperparameters of my DNN, which has 2 depth layers. I'm currently scoring each model based on the average loss in the test set, but I'm not sure if this is the best approach. Would it be better to use the accuracy, or both the loss and accuracy, as a scoring metric? How do other people typically score their models during hyperparameter tuning? Any advice or insights would be greatly appreciated.
The first issue in your experimental setup is that you are using the test set for hyperparameter tuning. You should train your model on the training set and tune your hyperparameters on a validation set. Only after this process is finished should you use the test set to get the final model score; that is the correct way to split and use the dataset.
The second part of your question is very open-ended, but you may benefit from the following tips:
Different metrics suit different tasks, so it is important to choose the right one. For instance, in some classification tasks you would track accuracy, and in others recall or precision (or you can track multiple metrics to understand your model's behavior more deeply).
Recent work on this topic is generally referred to as AutoML, and there are many applications, libraries, and methodologies for hyperparameter tuning, so you may also want to look into methods other than grid search. If you want to continue with grid search, you can switch to GridSearchCV, which evaluates each parameter combination several times on different parts of the dataset (cross-validation) and makes your hyperparameter tuning more robust.
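The setup described above could look roughly like the following sketch with scikit-learn. The synthetic dataset, the parameter grid, and the use of MLPClassifier as a stand-in for the asker's two-layer network are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hold out a test set that is never touched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Small illustrative grid; the values are arbitrary examples.
param_grid = {
    "hidden_layer_sizes": [(8, 8), (16, 16)],
    "alpha": [1e-4, 1e-2],
}

# cv=3 means each combination is validated on 3 different folds
# of the training data, so no separate validation split is needed.
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# The test set is used once, at the very end.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(search.best_score_, 3), round(test_accuracy, 3))
```

The scoring argument is where you would swap accuracy for another metric, for example "f1_macro" or "neg_log_loss", depending on what matters for the task.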
I am a beginner in data science and need help with a topic.
I have a data set about the customers of an institution. My goal is to first find out which customers will pay to this institution and then find out how much money the paying customers will pay.
In this context, I think that I can first find out which customers will pay by "classification" and then how much will pay by applying "regression".
So, first I want to apply "classification" and then apply "regression" to this output. How can I do that?
Sure, you can definitely apply a classification method followed by regression analysis. This is actually a common pattern during exploratory data analysis.
For your use case, based on the basic info you are sharing, I would intuitively go for 1) logistic regression and 2) multiple linear regression.
Logistic regression is actually a classification tool, even though the name suggests otherwise. In a binary logistic regression model, the dependent variable has two levels (categorical), which is what you need to predict if your customers will pay vs. will not pay (binary decision)
The multiple linear regression, applied to the same independent variables from your available dataset, will then provide you with a linear model to predict how much your customers will pay (i.e. the output of the inference will be a continuous variable: the actual expected dollar value).
That would be the approach I would recommend to implement, since you are new to this field. Now, there are obviously many different other ways to define these models, based on available data, nature of the data, customer requirements and so on, but the logistic + multiple regression approach should be a sure bet to get you going.
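A minimal sketch of the logistic + linear regression cascade described above, using scikit-learn on synthetic data. The feature matrix, the will_pay and amount columns, and the predict_amount helper are all hypothetical. Fitting the regression only on customers who actually paid is one reasonable choice here, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic customers (assumption): 3 numeric features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
will_pay = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
amount = np.where(will_pay == 1,
                  100 + 20 * X[:, 1] + rng.normal(size=500),
                  0.0)

# Stage 1: classify who will pay.
clf = LogisticRegression().fit(X, will_pay)

# Stage 2: regress the amount, fitted on paying customers only.
payers = will_pay == 1
reg = LinearRegression().fit(X[payers], amount[payers])

def predict_amount(X_new):
    """Predicted payment: 0 for predicted non-payers, regression otherwise."""
    pays = clf.predict(X_new) == 1
    out = np.zeros(len(X_new))
    if pays.any():
        out[pays] = reg.predict(X_new[pays])
    return out

predicted = predict_amount(X[:10])
print(predicted.round(1))
```

In a real project both stages should be evaluated on held-out data, the classifier with classification metrics and the regressor with an error measure such as mean absolute error.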
Another approach would be to make it a pure regression problem, without a cascade of models, which is simpler to handle.
For example, you could assign a spent amount of 0 to the people who are not willing to pay, and fit the model on these instances as well.
For the business, you could then apply a threshold: if the predicted amount is below a more or less fixed value, you classify the user as "not willing to pay".
Of course you can do it by vertically stacking models. Assuming that you are using binary classification, after prediction you will have a dataframe with target values 0 and 1. You are going to filter where target==1 and create a new dataframe. Then run the regression.
Also, if you don't have labels, you can use clustering instead of classification, since the cost is lower.
I am developing software to automate machine learning.
I have observed that on some datasets with a small number of features (4 or 5), applying feature selection before training my classifiers actually decreases performance (due to the loss of information). But on datasets with a larger number of features, applying feature selection actually improves performance.
So I am looking for a heuristic to determine whether or not to apply feature selection.
Is there any paper or work that addresses this issue: when to apply feature selection and when not to?
There are quite a few heuristics. I don't know a single paper or source that addresses them all in a trivial amount of time.
When you say 'performance' I'm assuming you're referring to the accuracy of prediction for your test data set by your model which has been trained and cross validated by a training data set and cross validation data set.
There are a large number of ML algorithms as well, and feature selection may not affect them all in the same way. Which are you using?
For example, applying feature selection for a neural network will result in changes that affect the bias and variance of your model, which in turn will affect the accuracy of prediction on the test set:
too many features may result in overfitting (depending on training sample size) due to high variance
too few and you may end up underfitting, i.e. high bias (regardless of training sample size)
Either will cause prediction on test sets to suffer. Also, accuracy alone isn't enough when 'tuning' a model (choosing features, polynomial degrees, regularization lambdas, etc.). To figure out what's best, you'll also need to look at the precision and recall of your model.
Unfortunately, there's no quick-and-easy way I can explain in a short SO answer in detail what you need to do to optimize your model.
I suggest you spend the time to take something like Andrew Ng's intro to machine learning course https://www.coursera.org/learn/machine-learning/home/welcome. Chapter 6 discusses how to determine how to optimize NN model.
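One simple empirical heuristic for the question above, sketched here as an assumption rather than an established rule, is to let cross-validation decide: score the same classifier with and without a selection step and keep whichever does better. The example uses scikit-learn with a synthetic dataset; compare_feature_selection is a hypothetical helper, and SelectKBest with k=5 is just one possible selector.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def compare_feature_selection(X, y, k, cv=5):
    """Mean CV accuracy without and with univariate feature selection."""
    base = make_pipeline(LogisticRegression(max_iter=1000))
    # The selector sits inside the pipeline, so it is refit on each
    # training fold and never sees the held-out fold.
    selected = make_pipeline(
        SelectKBest(f_classif, k=k),
        LogisticRegression(max_iter=1000),
    )
    score_all = cross_val_score(base, X, y, cv=cv).mean()
    score_selected = cross_val_score(selected, X, y, cv=cv).mean()
    return score_all, score_selected

# Synthetic data (assumption): 40 features, only 5 informative.
X, y = make_classification(
    n_samples=400, n_features=40, n_informative=5,
    n_redundant=0, random_state=0,
)
score_all, score_selected = compare_feature_selection(X, y, k=5)
use_selection = score_selected > score_all
print(round(score_all, 3), round(score_selected, 3), use_selection)
```

For an automated tool, the same comparison can be run per dataset, which sidesteps the need for a fixed rule based on the number of features.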
I am new to machine learning. I am working on a project where machine learning concepts need to be applied.
Problem Statement:
I have a large number (say 3000) of keywords. These need to be classified into seven fixed categories. Each category has training data (sample keywords). I need to come up with an algorithm that, when a new keyword is passed to it, predicts which category the keyword belongs to.
I am not aware of which text classification technique needs to be applied for this. Are there any tools that can be used?
Please help.
Thanks in advance.
This comes under linear classification. You can use a naive Bayes classifier for this. Most ML frameworks have an implementation of naive Bayes, e.g. Mahout.
Yes, I would also suggest using Naive Bayes, which is more or less the baseline classification algorithm here. On the other hand, there are obviously many other algorithms. Random forests and Support Vector Machines come to mind. See http://machinelearningmastery.com/use-random-forest-testing-179-classifiers-121-datasets/ If you use a standard toolkit, such as Weka, Rapidminer, etc., these algorithms should be available. There is also OpenNLP for Java, which comes with a maximum entropy classifier.
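As a minimal sketch of the Naive Bayes baseline, here is one way it might look in scikit-learn (used here instead of the Java toolkits named above, purely for brevity). The keywords and the two categories are invented, and the choice of character n-grams as features is an assumption that tends to suit very short texts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training keywords and categories (assumption).
train_keywords = [
    "running shoes", "football boots", "tennis racket",
    "laptop charger", "usb cable", "wireless mouse",
]
train_categories = [
    "sports", "sports", "sports",
    "electronics", "electronics", "electronics",
]

# Character n-grams inside word boundaries give short keywords
# more features than whole-word counts would.
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    MultinomialNB(),
)
model.fit(train_keywords, train_categories)

predicted = model.predict(["football shoes"])[0]
print(predicted)
```

With seven categories the code is the same; only the training lists grow.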
You could use Word2Vec cosine distance between the descriptions of each of your categories and the keywords in the dataset, and then simply match each keyword to the category with the closest distance.
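The matching step can be sketched as follows. The 3-dimensional vectors are toy values invented for the example; in practice they would come from a trained Word2Vec model, which is not loaded here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (assumption): real ones would come from Word2Vec.
category_vectors = {
    "sports": [0.9, 0.1, 0.0],
    "electronics": [0.1, 0.9, 0.2],
    "food": [0.0, 0.2, 0.9],
}
keyword_vectors = {
    "football": [0.8, 0.2, 0.1],
    "laptop": [0.2, 0.8, 0.1],
    "pizza": [0.1, 0.1, 0.95],
}

def closest_category(vector):
    """Category whose vector has the highest cosine similarity."""
    return max(
        category_vectors,
        key=lambda name: cosine_similarity(vector, category_vectors[name]),
    )

assignments = {kw: closest_category(vec) for kw, vec in keyword_vectors.items()}
print(assignments)
```

The highest cosine similarity corresponds to the smallest cosine distance, since distance is one minus similarity.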
Alternatively, you could create a training dataset from keywords already matched to a category and use any ML classifier, for example one based on artificial neural networks, using the vector of each keyword's cosine distances to every category as the input to your model. But it could require a large amount of training data to reach good accuracy. For example, the MNIST dataset contains 70,000 samples, and it allowed me to reach 99.62% cross-validation accuracy with a simple CNN; for another dataset with only 2,000 samples I was able to reach only about 90% accuracy.
There are many classification algorithms. Your example looks like a text classification problem; some good classifiers to try would be SVM and naive Bayes. For SVM, the liblinear and libshorttext classifiers are good options (and have been used in many industrial applications):
liblinear: https://www.csie.ntu.edu.tw/~cjlin/liblinear/
libshorttext: https://www.csie.ntu.edu.tw/~cjlin/libshorttext/
They are also included with ML tools such as scikit-learn and WEKA.
With classifiers, it still takes some work to build and validate a practically useful classifier. One of the challenges is to mix
discrete (boolean and enumerable)
and continuous ('numbers')
predictive variables seamlessly. Some algorithmic preprocessing is generally necessary.
Neural networks do offer the possibility of using both types of variables. However, they require skilled data scientists to yield good results. A straight-forward option is to use an online classifier web service like Insight Classifiers to build and validate a classifier in one go. N-fold cross validation is being used there.
You can represent the presence or absence of each word in a separate column. The outcome variable is the desired category.
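A small sketch of that presence/absence encoding in plain Python, with invented keywords. Each column corresponds to one vocabulary word, and each row holds 1 where the keyword contains that word.

```python
# Invented keywords (assumption).
keywords = ["cheap flights", "flights to paris", "cheap hotels"]

# One column per distinct word, in alphabetical order.
vocabulary = sorted({word for kw in keywords for word in kw.split()})

# 1 if the word occurs in the keyword, else 0.
rows = [
    [1 if word in kw.split() else 0 for word in vocabulary]
    for kw in keywords
]

print(vocabulary)
for kw, row in zip(keywords, rows):
    print(row, kw)
```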
I would like to know what are the various techniques and metrics used to evaluate how accurate/good an algorithm is and how to use a given metric to derive a conclusion about a ML model.
One way to do this is to use precision and recall, as defined here in Wikipedia.
Another way is to use the accuracy metric as explained here. So, what I would like to know is whether there are other metrics for evaluating an ML model?
I've compiled, a while ago, a list of metrics used to evaluate classification and regression algorithms, under the form of a cheatsheet. Some metrics for classification: precision, recall, sensitivity, specificity, F-measure, Matthews correlation, etc. They are all based on the confusion matrix. Others exist for regression (continuous output variable).
The technique is mostly to run an algorithm on some data to get a model, and then apply that model on new, previously unseen data, and evaluate the metric on that data set, and repeat.
Some techniques (actually resampling techniques from statistics):
Jackknife
Cross-validation
K-fold validation
Bootstrap
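To make one of these concrete, here is a minimal pure-Python sketch of how K-fold validation partitions the data. The k_fold_indices helper is hypothetical; libraries such as scikit-learn provide equivalents that also handle shuffling and stratification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

The model is trained and scored once per fold, and the fold scores are averaged to get an estimate that depends less on one particular split.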
Talking about ML in general covers quite a vast field, but I'll try to answer anyway. The Wikipedia definition of ML is the following:
Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
In this context, learning can be defined as the parameterization of an algorithm. The parameters of the algorithm are derived using input data with a known output. When the algorithm has "learned" the association between input and output, it can be tested with further input data for which the output is also known.
Let's suppose your problem is to obtain words from speech. Here the input is some kind of audio file containing one word (not necessarily, but I assume this case to keep it simple). You'd record X words N times and then use (for example) N/2 of the repetitions to parameterize your algorithm, disregarding, for the moment, what your algorithm would look like.
Now on the one hand, depending on the algorithm, if you feed it one of the remaining repetitions, it may give you some certainty estimate, which may be used to characterize the recognition of just that one repetition. On the other hand, you may use all of the remaining repetitions to test the learned algorithm. You pass each of the repetitions to the algorithm and compare the expected output with the actual output. In the end you'll have an accuracy value for the learned algorithm, calculated as the quotient of correct and total classifications.
Anyway, the actual accuracy will depend on the quality of your learning and test data.
A good place to start reading would be Pattern Recognition and Machine Learning by Christopher M. Bishop.
There are various metrics for evaluating the performance of an ML model, and there is no rule that there are only 20 or 30 of them. You can create your own metrics depending on your problem. There are various cases, when solving real-world problems, where you would need to create your own custom metrics.
Coming to the existing ones, they are already listed in the first answer; I will just highlight each metric's merits and demerits to give a better understanding.
Accuracy is the simplest metric and is commonly used. It is the number of correctly classified points divided by the total number of points in your dataset. In a 2-class problem, for example, some points belong to class 1 and some to class 2, and accuracy counts the correct predictions over both. It is not preferred when the dataset is imbalanced, because a model that mostly predicts the majority class can still score highly, and it is not very interpretable on its own.
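A small pure-Python sketch of why accuracy can mislead on imbalanced data. The labels are invented: 5 positives out of 100, scored against a trivial "model" that always predicts the negative class.

```python
def accuracy(y_true, y_pred):
    """Correct predictions divided by total predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Imbalanced toy data (assumption): 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # never predicts the positive class

acc = accuracy(y_true, always_negative)

# Recall on the positive class: found positives / actual positives.
found = sum(t == 1 and p == 1 for t, p in zip(y_true, always_negative))
recall_pos = found / sum(t == 1 for t in y_true)

print(acc, recall_pos)
```

The accuracy looks high even though not a single positive example is detected, which is why recall or a similar per-class metric is worth reporting alongside it.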
Log loss is a metric that works on predicted probability scores, which gives you a better understanding of how confidently a specific point is assigned to class 1. The best part of this metric is that it is built into logistic regression, which is a well-known ML technique.
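For reference, binary log loss can be written in a few lines of plain Python. The sketch below uses invented probabilities to show that two models with identical accuracy at a 0.5 threshold can have quite different log loss, because the metric rewards confident correct predictions. The eps clipping is a common convention to avoid log(0), assumed here.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # predicted P(class 1), invented values
hesitant = [0.6, 0.4, 0.6, 0.4]   # same decisions at 0.5, less certain

loss_confident = log_loss(y_true, confident)
loss_hesitant = log_loss(y_true, hesitant)
print(round(loss_confident, 4), round(loss_hesitant, 4))
```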
A confusion matrix is most easily read for a 2-class classification problem, where it gives four numbers, and the diagonal numbers give an idea of how good your model is. From this matrix you can derive other metrics such as precision, recall, and F1 score, which are interpretable.
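As a sketch of how those four numbers arise and how the derived metrics follow from them, here is a pure-Python version with invented labels. Rows are true classes and columns are predicted classes, which is one common convention (assumed here).

```python
def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of points with true label i predicted as j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Invented binary labels (assumption).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
(tn, fp), (fn, tp) = matrix  # the diagonal (tn, tp) holds the correct ones

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(matrix, precision, recall, round(f1, 3))
```

The same function works unchanged for more than two classes; only the unpacking into tn, fp, fn, tp is specific to the binary case.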