What is the relation between variational inference, variational Bayes, and variational EM?

This short answer assumes you are familiar with the basics of variational inference and the EM algorithm.
Variational inference and variational Bayes are the same thing.
Variational EM, on the other hand, is a modification of the EM algorithm in which the E-step uses an approximate distribution q(z) in place of the exact posterior p(z|x).
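
To see why this counts as a modification, recall the standard decomposition of the log-evidence (a sketch of the usual derivation; here q(z) is any distribution over the latent variables):

```latex
% For any distribution q(z), the log-evidence splits into two terms:
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)
```

Standard EM sets q(z) = p(z|x) in the E-step, which makes the KL term exactly zero; variational EM instead maximizes the ELBO over a restricted family of distributions, so the KL term only becomes as small as that family allows.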

Related

Variational Autoencoders: MSE vs BCE

I'm working with a variational autoencoder and I have seen that some people use MSE loss while others use BCE loss. Does anyone know whether one is more correct than the other, and why?
As far as I understand, if you assume that the data modelled by the VAE follows a Gaussian distribution, you should use MSE loss; if you assume it follows a Bernoulli distribution, you should use BCE. Also, BCE is biased towards 0.5.
Could someone clarify this concept for me? I know it's related to the expectation term in the variational lower bound...
Thank you so much!
In short: maximizing the likelihood of a model whose predictions are Gaussian (respectively Bernoulli) distributions is equivalent to minimizing MSE (respectively BCE).
Mathematical details:
The real reason you use MSE and cross-entropy loss functions
DeepMind has an excellent lecture on modern latent variable models (mainly about variational autoencoders); you can find everything you need there.
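
As a quick numerical illustration of that equivalence (a minimal sketch with made-up values, assuming unit variance for the Gaussian decoder):

```python
import numpy as np

# Toy check of the claim above; x plays the pixel value, mu and p play
# the decoder outputs. All values are made up.
rng = np.random.default_rng(0)
x = rng.random(5)    # targets in [0, 1]
mu = rng.random(5)   # Gaussian mean predicted by the decoder
p = rng.random(5)    # Bernoulli probability predicted by the decoder

# Gaussian likelihood with fixed unit variance:
#   -log N(x; mu, 1) = 0.5*(x - mu)**2 + 0.5*log(2*pi)
# i.e. (half the) squared error plus a constant, so minimizing the
# negative log-likelihood is the same as minimizing MSE.
gauss_nll = 0.5 * (x - mu) ** 2 + 0.5 * np.log(2 * np.pi)
print(np.allclose(gauss_nll - 0.5 * (x - mu) ** 2,
                  0.5 * np.log(2 * np.pi)))  # True

# Bernoulli likelihood:
#   -log Bern(x; p) = -(x*log(p) + (1 - x)*log(1 - p))
# which is binary cross-entropy by definition.
bce = -(x * np.log(p) + (1 - x) * np.log(1 - p))
print(bce.mean())
```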

Should I use multinomial logistic regression or linear discriminant analysis?

In a classification problem, I cannot use a simple logit model if my data label (a.k.a. the dependent variable) has more than two categories. That leaves me with multinomial regression, Linear Discriminant Analysis (LDA), and the like. Why is it that the multinomial logit is not as popular as LDA in machine learning? What particular advantage does LDA offer?
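
Both are cheap to try side by side, which is often the quickest way to see which fits a given dataset (a minimal sketch using scikit-learn on synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic three-class problem, just to compare the two models.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, n_classes=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# LogisticRegression handles the multi-class (multinomial) case directly;
# LinearDiscriminantAnalysis is the generative counterpart.
for model in (LogisticRegression(max_iter=1000),
              LinearDiscriminantAnalysis()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(model.score(X_te, y_te), 3))
```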

Why is naïve Bayes generative?

I am working on a document which should contain the key differences between using Naive Bayes (generative) and Logistic Regression (discriminative) models for text classification.
During my research, I ran into this definition for Naive Bayes model: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
The probability of a document d being in class c is computed as ... where p(tk|c) is the conditional probability of term tk occurring in a document of class c...
When I got to the part of comparing Generative and Discriminative models, I found this explanation on StackOverflow as accepted: What is the difference between a Generative and Discriminative Algorithm?
A generative model learns the joint probability distribution p(x,y) and a discriminative model learns the conditional probability distribution p(y|x) - which you should read as "the probability of y given x".
At this point I got confused: Naive Bayes is a generative model, yet it uses conditional probabilities, while the discriminative models were described as learning conditional probabilities as opposed to the joint probabilities learned by generative models.
Can someone shed some light on this please?
Thank you!
It is generative in the sense that you don't model the posterior p(y|x) directly; rather, you learn a model of the joint probability p(x,y), which can also be expressed as p(x|y) * p(y) (likelihood times prior), and then use Bayes' rule to find the most probable y.
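
That distinction is easy to see in code: fitting a naive Bayes classifier means estimating p(y) and p(x|y) by counting, and Bayes' rule only enters at prediction time (a toy sketch with one made-up binary feature):

```python
import numpy as np

# Made-up training data: one binary feature per document, two classes.
X = np.array([0, 0, 1, 1, 1, 0])
y = np.array([0, 0, 0, 1, 1, 1])

# Learn the pieces of the joint p(x, y) = p(x|y) * p(y) by counting.
prior = np.array([np.mean(y == c) for c in (0, 1)])        # p(y)
lik = np.array([[np.mean(X[y == c] == v) for v in (0, 1)]  # p(x|y)
                for c in (0, 1)])

# Classify a new x via Bayes' rule: p(y|x) is proportional to p(x|y)*p(y).
x_new = 1
joint = lik[:, x_new] * prior    # unnormalized posterior over y
posterior = joint / joint.sum()
print(posterior, "-> predict class", posterior.argmax())
```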
A good read I can recommend in this context is "On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes" (Ng & Jordan, 2001).

How to use Naive Bayes Binary Classification in CoreML?

I am trying to build a number classification model in CoreML and want to use the Naive Bayes classifier, but I am not able to find out how to use it. My algorithm uses Naive Bayes.
At the moment, coremltools supports only the following types of classifiers:
SVMs (scikit-learn)
Neural networks (Keras, Caffe)
Decision trees and their ensembles (scikit-learn, xgboost)
Linear and logistic regression (scikit-learn)
However, implementing Naive Bayes in Swift yourself is not that hard; see this implementation, for example.
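
If you would rather stay within the supported list, one workaround is to train a supported model type and convert that instead (a sketch, assuming coremltools' scikit-learn converter; the exact API varies by version):

```python
import coremltools
from sklearn.linear_model import LogisticRegression

# Toy stand-in data; logistic regression is on the supported list above.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
y = [1, 0, 1, 0]
model = LogisticRegression().fit(X, y)

# Convert and save as a .mlmodel usable from Swift.
mlmodel = coremltools.converters.sklearn.convert(
    model, input_features=["x1", "x2"], output_feature_names="label")
mlmodel.save("classifier.mlmodel")
```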

Difference between Data Mining algorithms and methods

I have read many times in the literature that there are several data mining methods (for example: decision trees, k-nearest neighbours, SVM, Bayes classification) and likewise several data mining algorithms (the k-nearest neighbours algorithm, the Naive Bayes algorithm).
Does a data mining method use different data mining algorithms, or are they the same thing?
An example to clarify: is there any difference between the two statements below?
I'm using the Naive Bayes classification method.
I'm using the Naive Bayes classification algorithm.
Or is "Bayes" the method and "Naive Bayes" the algorithm?
