How to learn/detect the error/offset pattern with Machine Learning - machine-learning

Is there a way to use Machine Learning tools/techniques to analyze the offset/error between the true values and an existing prediction model's output?
Which techniques can learn/detect the error pattern, and thereby give a clue about situations the prediction model cannot cover?
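One common way to approach this is residual modeling: train a second model on the errors of the existing predictor and inspect where the predicted error is large. Below is a minimal sketch, assuming scikit-learn; the base model, data, and names (base_model, error_model) are illustrative stand-ins, not a prescribed method.

```python
# Minimal sketch: learn the error (residual) pattern of an existing model.
# Assumes scikit-learn; `base_model` stands in for the existing prediction model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] * 2.0 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Existing prediction model (here: a simple linear model).
base_model = LinearRegression().fit(X_train, y_train)

# Residuals = true value - base prediction; a second model learns their pattern.
residuals = y_train - base_model.predict(X_train)
error_model = GradientBoostingRegressor(random_state=0).fit(X_train, residuals)

# Large predicted residuals flag regions the base model does not cover well.
predicted_error = error_model.predict(X_test)
print("Worst-covered test samples:", np.argsort(-np.abs(predicted_error))[:5])
```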

Related

Evaluate CNN model for multiclass image classification

I want to ask what metrics can be used to evaluate my CNN model for multi-class classification. I have 3 classes for now, and I'm just using accuracy and a confusion matrix and plotting the model's loss. Is there any other metric I can use to evaluate my model's performance?
Evaluating the performance of a model is one of the most crucial phases of any Machine Learning project cycle and must be done effectively. Since you have mentioned that you are using accuracy and a confusion matrix for evaluation, I would like to add some points for developing a better evaluation strategy:
Consider you are developing a classifier that classifies an EMAIL into SPAM or NON-SPAM (HAM). One possible evaluation criterion is the FALSE POSITIVE RATE, because it is really annoying if a non-spam email ends up in the spam category (which means you will miss a valuable email).
So, I recommend you choose metrics based on the problem you are targeting. There are many metrics, such as F1 score, recall, and precision, that you can choose based on the problem you have.
You can visit https://medium.com/apprentice-journal/evaluating-multi-class-classifiers-12b2946e755b for a better understanding.
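For reference, here is a minimal sketch of computing these metrics with scikit-learn; y_true and y_pred are placeholder arrays standing in for your test labels and the CNN's predictions.

```python
# Minimal sketch (scikit-learn): common metrics for a 3-class classifier.
# `y_true` / `y_pred` stand in for your test labels and the CNN's predictions.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 1, 2, 2, 1, 0, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2, 0]

print(confusion_matrix(y_true, y_pred))
# Per-class precision, recall and F1, plus macro/weighted averages.
print(classification_report(y_true, y_pred,
                            target_names=["class_0", "class_1", "class_2"]))
```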

How does Azure ML give an output for a value that was not used when training the model?

I am trying to predict the price of a house, so I added no-of-rooms as one variable in the prediction. The values for that variable when I was training the model were (3, 2, 1). Now I am passing no-of-rooms as "6" to get an output (a value that was not used before to get the predicted value). How will it produce the output for a new value? Does it only consider the variables other than no-of-rooms? I used Boosted Decision Tree Regression as the model.
The short answer is that when you train your model on a set of features and then use a test set to run predictions, yes, it will be able to utilize feature values that the model hasn't previously seen during training. If your test set contains large outliers that differ significantly from what the model saw during training, accuracy will suffer, but the model will still attempt a prediction.
This is less of an Azure Machine Learning question and more about machine learning basics (or really just the basics of how regression works). I would do some research on both "linear regression" and "overfitting in machine learning". These are two very basic conceptual topics that will help with your understanding. Understanding regression will help you see why a model can use a value it hasn't previously seen to create a prediction.
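As a rough illustration of this behavior, here is a minimal sketch using scikit-learn's GradientBoostingRegressor as a stand-in for Azure ML's Boosted Decision Tree Regression; the tiny housing table is made up for the example.

```python
# Minimal sketch (scikit-learn stands in for Azure ML's Boosted Decision Tree Regression):
# the model still returns a prediction for a no-of-rooms value it never saw in training.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

train = pd.DataFrame({
    "no_of_rooms": [1, 2, 3, 2, 3, 1],
    "area_sqft":   [500, 800, 1200, 850, 1100, 450],
    "price":       [100, 180, 260, 190, 250, 90],
})
model = GradientBoostingRegressor(random_state=0).fit(
    train[["no_of_rooms", "area_sqft"]], train["price"]
)

# 6 rooms was never seen during training; tree-based models effectively clamp the
# value to the nearest learned split, so a prediction is made but may be inaccurate.
print(model.predict(pd.DataFrame({"no_of_rooms": [6], "area_sqft": [2000]})))
```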

In stacking for machine learning which order should you train the models in?

I am currently learning to do stacking in a machine learning problem. I am going to get the outputs of the first model and use these outputs as features for the second model.
My question is: does the order matter? I am using a lasso regression model and a boosted tree. In my problem the regression model outperforms the boosted tree, so I am thinking I should use the lasso regression second and the boosted tree first.
What are the factors I need to think about when making this decision?
Why don't you try feature engineering to create more features?
Don't use predictions from one model as features for another model.
You can try using K-means to cluster similar training samples.
For stacking, just train different models and then average their results (assuming that you have a continuous y variable), as in the sketch below.
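Here is a minimal sketch of that averaging (blending) approach, assuming scikit-learn and a continuous target; the synthetic dataset is only for illustration.

```python
# Minimal sketch of averaging (blending) two regressors, assuming scikit-learn
# and a continuous target y; training order of the two models does not matter.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lasso = Lasso(alpha=0.1).fit(X_train, y_train)
gbt = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Average the two models' predictions instead of feeding one into the other.
blended = (lasso.predict(X_test) + gbt.predict(X_test)) / 2.0
print("MSE of blended prediction:", mean_squared_error(y_test, blended))
```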

Machine learning algorithm that says which training data caused the current decision

I need a learning model that, when we test it with a data sample, tells us which training data caused the answer.
Is there anything that does this?
(I already know KNN will do this.)
Thanks.
Look for generative models.
"It asks the question: based on generation assumptions, which category is most likely to generate this signal?"
This is not a very well-worded question:
Which train data cause the answer? I already know KNN will do this
KNN will tell you what the K nearest neighbors are, but it's not just those K training samples that cause the answer; all the other training samples contribute too, by being farther away.
The objective of machine learning is to generalize from the whole of the training dataset, so all samples in the training dataset (after outlier filtering and dataset reduction steps) cause the answer.
If your question is 'Which class of machine learning algorithms makes a decision by comparing a new instance to instances seen in the training data, and can list the training examples which most strongly informed the decision?', the answer is: instance-based learning https://en.wikipedia.org/wiki/Instance-based_learning
(e.g. KNN, kernel machines, RBF networks)
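For the KNN case specifically, here is a minimal sketch (scikit-learn) of how an instance-based model can report which training samples most strongly informed a prediction; the Iris data is just a convenient example.

```python
# Minimal sketch (scikit-learn): an instance-based model reporting which training
# samples most strongly informed a prediction.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

query = X[:1]                        # one query sample
print("prediction:", knn.predict(query))

# kneighbors returns the distances and indices of the training samples used.
distances, indices = knn.kneighbors(query)
print("training indices behind this decision:", indices[0])
```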

The difference between supervised and unsupervised learning when using PCA

I have read the answer here. But I can't apply it to one of my examples, so I probably still don't get it.
Here is my example:
Suppose that my program is trying to learn PCA (principal component analysis).
Or the diagonalization process.
I have a matrix, and the answer is its diagonalization:
A = PDP⁻¹
If I understand correctly:
In supervised learning I will have all the trials together with their errors.
My question is:
What will I have in unsupervised learning?
Will I get the error for each trial as I go along, rather than all errors in advance? Or is it something else?
First of all, PCA is used neither for classification nor for clustering. It is an analysis tool where you find the principal components of the data. This can be used, e.g., for dimensionality reduction. Supervised vs. unsupervised learning has no relevance here.
However, PCA can often be applied to data before a learning algorithm is used.
In supervised learning, you have (as you say) a labeled set of data with "errors".
In unsupervised learning you don't have any labels, i.e., you can't validate anything at all. All you can do is cluster the data somehow. The goal is often to achieve clusters that are internally more homogeneous. Success can be measured, e.g., using the within-cluster variance metric.
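To make that concrete, here is a minimal sketch, assuming scikit-learn, of PCA used as unsupervised preprocessing before a clustering step whose fit is judged by within-cluster variance rather than by labels; the digits dataset and the choice of 10 components/clusters are only illustrative.

```python
# Minimal sketch (scikit-learn): PCA as unsupervised preprocessing, followed by
# K-means clustering evaluated via within-cluster variance (inertia), not labels.
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)          # labels ignored: unsupervised setting
X_reduced = PCA(n_components=10).fit_transform(X)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_reduced)
print("within-cluster variance (inertia):", kmeans.inertia_)
```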
Supervised Learning:
-> You give labeled example data as input along with the correct answers.
-> The algorithm learns from them and starts predicting the correct result for new inputs.
Example: email spam filter
Unsupervised Learning:
-> You give just the data and don't provide any labels or correct answers.
-> The algorithm automatically analyses patterns in the data.
Example: Google News
