Difference between SVM and MPM (example) - machine-learning

I know the theoretical difference between these methods, but could someone give an example that shows SVM != MPM? I think they are the same thing.
An image would be awesome.
SVM: Maximize the margin between the two classes of samples.
MPM: Minimize the worst-case probability of misclassification.
Thank you.

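To see why they are not the same, here is a rough sketch (assuming scikit-learn, NumPy and SciPy; the MPM part is a hand-rolled toy based on the Lanckriet et al. formulation, not a library implementation) that fits both a linear SVM and a linear MPM on the same 2-D data:

```python
# Illustrative sketch: a linear SVM vs. a simple linear Minimax Probability Machine (MPM).
# Assumes scikit-learn, NumPy and SciPy; the MPM part is a hand-rolled toy based on the
# formulation "minimize sqrt(a'S+ a) + sqrt(a'S- a) subject to a'(mu+ - mu-) = 1"
# (Lanckriet et al.), not an off-the-shelf implementation.
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Two Gaussian classes with different covariance shapes.
X_neg = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 0.2]], size=200)  # class 0
X_pos = rng.multivariate_normal([3, 3], [[0.2, 0.0], [0.0, 1.0]], size=200)  # class 1
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 200 + [1] * 200)

# --- SVM: maximizes the margin; the solution depends only on points near the boundary.
svm = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
print("SVM weights:", svm.coef_[0], "bias:", svm.intercept_[0])

# --- MPM: minimizes the worst-case misclassification probability;
#     the solution depends on each class's mean and covariance.
mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
S_pos, S_neg = np.cov(X_pos.T), np.cov(X_neg.T)

def objective(a):
    return np.sqrt(a @ S_pos @ a) + np.sqrt(a @ S_neg @ a)

# The equality constraint fixes the scale of the weight vector.
cons = {"type": "eq", "fun": lambda a: a @ (mu_pos - mu_neg) - 1.0}
a = minimize(objective, x0=np.ones(2), constraints=[cons]).x
b = a @ mu_pos - np.sqrt(a @ S_pos @ a) / objective(a)  # decision threshold a.x >= b
print("MPM weights:", a, "bias:", b)

# The two separating directions generally differ: adding correctly classified points
# far from the boundary changes the class means/covariances (and hence the MPM),
# but leaves the SVM solution untouched as long as the margin points stay the same.
```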

Related

What is the mathematical principle of setting class weight in logistic regression in scikit-learn?

In the logistic regression algorithm in Python's scikit-learn library, there is a "class_weight" argument. I would like to know the mathematical principle behind setting class_weight during model fitting. Does it modify the objective function:
https://drive.google.com/open?id=16TKZFCwkMXRKx_fMnn3d1rvBWwsLbgAU
And what is the specific modification?
Thank you in advance!
I would appreciate any help from you!
Yes, it affects the loss function, and it is very commonly used when your labels are imbalanced. Mathematically, the loss function just becomes a weighted average of per-sample losses, where the weights depend on the class of the given sample. If no class_weight is used, then all samples are weighted uniformly (as in the picture you attached).
The idea is to punish mistakes on predictions of underrepresented classes more than mistakes on the overrepresented classes.
See a more detailed discussion here.
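For concreteness, here is a minimal sketch (assuming scikit-learn; the weights and data are arbitrary examples) showing that class_weight behaves like a per-class multiplier on each sample's loss term:

```python
# Sketch (assuming scikit-learn): class_weight acts as a per-class multiplier on each
# sample's contribution to the loss; the numbers below are arbitrary examples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Penalize mistakes on the rare class 1 five times as much as on class 0.
clf_cw = LogisticRegression(class_weight={0: 1.0, 1: 5.0}).fit(X, y)

# Equivalent: pass the same per-class factors as per-sample weights.
sample_weight = np.where(y == 1, 5.0, 1.0)
clf_sw = LogisticRegression().fit(X, y, sample_weight=sample_weight)

print("class_weight fit: ", clf_cw.coef_)
print("sample_weight fit:", clf_sw.coef_)  # essentially identical fits

# class_weight="balanced" sets each class's weight to n_samples / (n_classes * count),
# so minority-class samples are up-weighted and each class contributes equally overall.
clf_bal = LogisticRegression(class_weight="balanced").fit(X, y)
```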

How do sample weights work in classification models?

What does it mean to provide weights to each sample for classification? How does a classification algorithm like logistic regression or an SVM use weights to emphasize certain examples more than others? I would love to go into the details of how these algorithms leverage sample weights.
If you look at the sklearn documentation for logistic regression, you can see that the fit function has an optional sample_weight parameter which is defined as an array of weights assigned to individual samples.
This option is meant for imbalanced datasets. Let's take an example: I have a lot of data and some of it is just noise, but other points are really important to me and I would like my algorithm to consider them much more than the other points. So I assign a weight to them to make sure they are dealt with properly.
It changes the way the loss is calculated. The errors (residuals) are multiplied by the weight of the point, and thus the minimum of the objective function is shifted. I hope that's clear enough. I don't know if you're familiar with the math behind it, so I'm including a small introduction here so you have everything at hand (apologies if this was not needed):
https://perso.telecom-paristech.fr/rgower/pdf/M2_statistique_optimisation/Intro-ML-expanded.pdf
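As a small illustration (a sketch assuming scikit-learn; the data and weights are made up), up-weighting a handful of points multiplies their error terms in the objective and shifts the fitted boundary:

```python
# Sketch (assuming scikit-learn; data and weights are made up): per-sample weights
# multiply each point's error term in the objective, shifting the fitted boundary.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Uniform weights: every point's error counts the same.
clf_uniform = SVC(kernel="linear").fit(X, y)

# Up-weight ten points we consider especially important (e.g. carefully verified labels):
# their slack/errors cost 10x as much, so the optimum moves to respect them.
weights = np.ones(len(y))
weights[:10] = 10.0
clf_weighted = SVC(kernel="linear").fit(X, y, sample_weight=weights)

print("uniform  boundary:", clf_uniform.coef_[0], clf_uniform.intercept_[0])
print("weighted boundary:", clf_weighted.coef_[0], clf_weighted.intercept_[0])
```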
See a good explanation here: https://www.kdnuggets.com/2019/11/machine-learning-what-why-how-weighting.html .

people detection with haar cascade

I am working on a project at my school to detect how many students are in the classroom, like in this picture.
I have been trying to use the Haar cascade in OpenCV for face detection to detect people, but the results are very bad, like this:
I took thousands of pictures in the classroom and cropped out the people manually. There are about 4000 positive samples and 12000 negative samples. I was wondering what I did wrong.
When I crop the images, should I crop only the head, like this?
Or like this, with the body?
I think I had enough training samples, and I followed the exact procedure from this post:
http://note.sonots.com/SciSoftware/haartraining.html#v6f077ba
which should be working.
Or should I use a different algorithm like HOG or SVM? Any suggestion would be great; I have been stuck on this for months and don't have any clue. Thanks a lot!
Haar is better for human faces. HOG with SVM is the classic approach for human detection, and there are lots of sources and blog posts about them; it's not hard to train a classifier. For your scene, I think 'head and shoulders' is better than 'head alone', but your multi-view samples increase the difficulty; a front-facing camera would be better. Add more hard negative samples if you keep getting a lot of false positive alarms.
This paper may help:
http://irip.buaa.edu.cn/~zxzhang/papers/icip2009-1.pdf
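If you want to try the HOG + SVM route mentioned above, here is a minimal sketch using OpenCV's built-in pedestrian detector (assuming opencv-python is installed; "classroom.jpg" is a placeholder file name):

```python
# Minimal HOG + linear SVM people detection with OpenCV's built-in pedestrian model.
# "classroom.jpg" is just a placeholder name for one of your classroom photos.
import cv2

img = cv2.imread("classroom.jpg")

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# winStride/padding/scale trade off speed against recall; these are common starting values.
rects, weights = hog.detectMultiScale(img, winStride=(8, 8), padding=(8, 8), scale=1.05)

for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

print(f"detected {len(rects)} people")
cv2.imwrite("classroom_detections.jpg", img)
```

Note that the built-in model is trained on full-body upright pedestrians, so for seated students you would probably still need to train your own head-and-shoulders detector.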
Normally, with a Haar cascade, the result differs a lot when you change the parameters used to train the classifier. In my case the object was very simple, yet it could not be detected either:
opencv haar cascade for Varikont detection
When I changed the parameters, it detected very well.
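For reference, the detection-time parameters alone already change the result a lot; a minimal sketch (the cascade file and image names are placeholders):

```python
# Sketch: detection-time parameters of a trained Haar cascade often matter as much
# as the training itself. File names below are placeholders.
import cv2

cascade = cv2.CascadeClassifier("my_trained_cascade.xml")
gray = cv2.cvtColor(cv2.imread("classroom.jpg"), cv2.COLOR_BGR2GRAY)

# scaleFactor: image pyramid step (smaller = slower but finds more sizes);
# minNeighbors: overlapping detections required to keep a hit (higher = fewer false positives);
# minSize: ignore detections smaller than this window.
heads = cascade.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=5, minSize=(24, 24))
print(f"{len(heads)} detections")
```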
You can have a look here: https://inspirit.github.io/jsfeat/sample_haar_face.html
Or, for a more specialized and professional approach, you can look into Bag of Visual Words (SIFT/SURF + SVM or SGD).
Personally, I don't think you need such a complex method for person detection.
Regards,

Naive bayes classifier poor accuracy for positive negative classes

I am new to the concept of machine learning and I am trying to figure out this problem. I am using WEKA for this. I have 4 clusters whose means roughly form a square. The training dataset that I provide to Naive Bayes has 2 classes, where opposite clusters (across the center of the entire plot) are in the same class. The accuracy of this model is not even 50 percent, but when I change the classes from opposite to the same side, the accuracy becomes 100 percent. Why is this so?
Naive Bayes cannot represent the solution to that problem.
There is more than one form of Naive Bayes, but none of them can handle that problem; the solutions each one can represent are somewhat different.
Try to ask yourself what properties of the solution change when you "change the classes from opposite to the same side", and what it would take to represent that solution.
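A quick way to see this representational limit (a sketch assuming scikit-learn, with four Gaussian clusters at the corners of a square):

```python
# Sketch: Gaussian Naive Bayes on four clusters at the corners of a square.
# When diagonally opposite clusters share a class (XOR layout), each class's
# per-feature marginals are identical, so Naive Bayes cannot separate them.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
corners = np.array([[0, 0], [0, 4], [4, 0], [4, 4]])
X = np.vstack([rng.normal(c, 0.5, (100, 2)) for c in corners])

# XOR layout: diagonally opposite clusters share a class -> accuracy near 50%.
y_opposite = np.array([0] * 100 + [1] * 100 + [1] * 100 + [0] * 100)
print(cross_val_score(GaussianNB(), X, y_opposite, cv=5).mean())

# Same-side layout: one feature separates the classes -> accuracy near 100%.
y_same_side = np.array([0] * 100 + [0] * 100 + [1] * 100 + [1] * 100)
print(cross_val_score(GaussianNB(), X, y_same_side, cv=5).mean())
```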
Start by loading a snapshot of the data into WEKA to understand the data, then decide whether Naive Bayes is the right algorithm for the problem.

what is the main difference between linear discriminant analysis and principal component analysis

"The Principal Component Analysis (PCA), which is the core of the Eigenfaces method, finds a linear combination of features that maximizes the total variance in data. While this is clearly a powerful way to represent data, it doesn’t consider any classes and so a lot of discriminative information may be lost when throwing components away." (Open CV)
What is mean by "CLASSES" here????
"
Linear Discriminant Analysis maximizes the ratio of between-classes to within-classes scatter, instead of maximizing the overall scatter. The idea is simple: same classes should cluster tightly together, while different classes are as far away as possible from each other in the lower-dimensional representation.
in here also what is mean by CLASSES????
Can some one please explain this in image processing view thanx
Classes in these contexts means groups or classifications, like 'faces' or 'letters': things that have a set of geometric properties that can be identified with some degree of generality. PCA tries to classify objects in an image by themselves, while LDA tries to classify things with some consideration of how many of the same thing they are near.
An example might be a picture of the ball "Wilson". By itself it doesn't look much like a face, and PCA would give it a low likelihood of being a face; but an LDA approach, if the picture included Tom Hanks right next to it, would classify Tom Hanks as having a face and cause Wilson to be more likely classified as a face as well. As you can see from this contrived example, depending on what you are trying to achieve (and how good your data is), each approach has its upsides and downsides.
To make it simple: PCA tries to represent the total data in the minimum number of dimensions. LDA also tries to do the same, but it additionally makes sure that the different classes can be differentiated (classification). PCA does not help with classification; it helps only with dimensionality reduction. So LDA = PCA + classification.
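To make the difference concrete, a small sketch (assuming scikit-learn and its digits dataset) that reduces the same labeled data to two dimensions with both methods; PCA never sees the labels, LDA does:

```python
# Sketch: PCA vs. LDA reduction to 2 dimensions on scikit-learn's digits dataset.
# PCA is unsupervised (maximizes total variance, ignores labels);
# LDA is supervised (maximizes between-class vs. within-class scatter, uses labels).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
clf = LogisticRegression(max_iter=2000)

pca_pipe = make_pipeline(PCA(n_components=2), clf)                         # labels ignored by PCA
lda_pipe = make_pipeline(LinearDiscriminantAnalysis(n_components=2), clf)  # labels used by LDA

print("PCA 2-D accuracy:", cross_val_score(pca_pipe, X, y, cv=5).mean())
print("LDA 2-D accuracy:", cross_val_score(lda_pipe, X, y, cv=5).mean())
# LDA's projection typically preserves more class-discriminative information,
# because it is allowed to look at the class labels when choosing directions.
```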
