In topic modeling, can I use a logistic normal distribution as the prior and, instead of a single Gaussian, replace it with a Gaussian mixture component?
Modelling with a logistic normal distribution means that you are using a generalized linear model (GLM) and assuming that the errors follow a normal distribution. A model that uses a Gaussian mixture component is essentially different from one that uses a logistic normal distribution.
A model using a Gaussian mixture component is a type of generative model, which means you cannot simply replace the logistic normal distribution with a Gaussian mixture component; you end up with an entirely different model, with different assumptions, parameters, and equations.
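As a concrete illustration of the difference (a minimal sketch of my own, assuming NumPy; the topic count K and the mixture parameters are hypothetical): a logistic normal prior draws topic proportions by pushing a single Gaussian draw through a softmax, whereas a Gaussian mixture first picks a component and then draws from that component's Gaussian, which adds an extra latent variable to the generative story.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of topics (hypothetical)

# Logistic normal prior: one Gaussian draw, mapped to the simplex by a softmax
mu, cov = np.zeros(K), np.eye(K)
eta = rng.multivariate_normal(mu, cov)
theta_logistic_normal = np.exp(eta) / np.exp(eta).sum()

# Gaussian mixture: first choose a component, then draw from that component's Gaussian
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[-2.0] * K, [0.0] * K, [2.0] * K])
component = rng.choice(len(weights), p=weights)
eta_mix = rng.multivariate_normal(means[component], np.eye(K))
theta_mixture = np.exp(eta_mix) / np.exp(eta_mix).sum()
```

The extra component indicator is exactly the kind of additional latent structure that changes the model's assumptions and equations rather than just swapping one prior for another.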
Do models such as MARS and GAM assume homoscedasticity and i.i.d. errors? There seems to be disagreement in the literature about certain assumptions. MARS appears to be more robust than GAM, but this is not clearly stated in the original papers.
If normality is an issue, should one use transformed data (Box-Cox or Yeo-Johnson) for the regression?
GAMs don't assume conditional normality; the "G" stands for generalised and indicates that these models build on the generalized linear model framework, which traditionally models the data as draws from a distribution in the exponential family.
If you fit a GAM with a Gaussian conditional distribution, then that model would assume conditional normality, but the general class of GAMs does not as one can choose an appropriate distribution for the response.
The Gaussian GAM also assumes the observations are conditionally homoscedastic. Other response distributions imply mean-variance relationships; e.g. with a Poisson response distribution, the variance equals the mean and hence larger counts are assumed to have higher variance.
GAMs do assume that the observations are i.i.d.; GEEs, GLMMs, and GAMMs are extensions that relax the assumption of independence.
MARS was originally fitted via OLS, so it would pick up some of the assumptions of the general linear model, but typically one uses some form of cross-validation to assess the model fit. As long as the cross-validation scheme reflects the properties of the data, the classical assumptions of the linear model don't really apply, because you're not relying on the theory to do inference.
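To make the point about choosing the response distribution concrete, here is a minimal sketch (my own addition, not from the answer above), assuming the pygam package is installed; the Gaussian and Poisson fits encode different mean-variance assumptions, as discussed above.

```python
import numpy as np
from pygam import LinearGAM, PoissonGAM, s

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))

# Gaussian GAM: constant variance around a smooth mean
y_gauss = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
gam_gauss = LinearGAM(s(0)).fit(X, y_gauss)

# Poisson GAM: count response, where the variance equals the mean
y_pois = rng.poisson(lam=np.exp(0.5 * np.sin(X[:, 0]) + 1.0))
gam_pois = PoissonGAM(s(0)).fit(X, y_pois)

gam_gauss.summary()
gam_pois.summary()
```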
I am trying out a basic Variational Autoencoder (VAE) architecture in my project. This generative model is not used for images, however, but for words. I came across the term "sampling from the normal distribution". What does this sampling mean, and what is its purpose?
The normal distribution is a continuous probability distribution.
Sampling from a normal distribution means drawing a concrete value (or a set of values) whose frequencies follow this distribution.
This sampling is generally achieved by simple algorithms such as the Box-Muller transform. NumPy and CUDA have facilities that can generate N values from a given normal distribution.
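As a small illustration (a sketch of my own, assuming NumPy; the VAE-specific part at the end is the usual reparameterization trick, which the answer above does not mention explicitly):

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw 5 values from a normal distribution with mean 0 and standard deviation 1
samples = rng.normal(loc=0.0, scale=1.0, size=5)

# Box-Muller transform: turn two uniform draws into two independent standard normal draws
u1, u2 = rng.uniform(size=1000), rng.uniform(size=1000)
z1 = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
z2 = np.sqrt(-2.0 * np.log(u1)) * np.sin(2.0 * np.pi * u2)

# In a VAE, the latent code is typically obtained via the reparameterization trick:
# z = mu + sigma * epsilon, where epsilon is sampled from a standard normal
mu, sigma = np.array([0.5, -1.0]), np.array([0.2, 0.7])  # hypothetical encoder outputs
epsilon = rng.standard_normal(size=mu.shape)
z = mu + sigma * epsilon
```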
So as I understand it, to implement an unsupervised Naive Bayes, we assign random probability to each class for each instance, then run it through the normal Naive Bayes algorithm. I understand that, through each iteration, the random estimates get better, but I can't for the life of me figure out exactly how that works.
Anyone care to shed some light on the matter?
The variant of Naive Bayes in unsupervised learning that I've seen is basically an application of the Gaussian mixture model (GMM), usually fitted with the Expectation-Maximization (EM) algorithm, to determine the clusters in the data.
In this setting, it is assumed that the data can be classified, but the classes are hidden. The problem is to determine the most probable classes by fitting a Gaussian distribution per class. The Naive Bayes assumption defines the particular probabilistic model to use, in which the attributes are conditionally independent given the class.
From "Unsupervised naive Bayes for data clustering with mixtures of
truncated exponentials" paper by Jose A. Gamez:
From the previous setting, probabilistic model-based clustering is
modeled as a mixture of models (see e.g. (Duda et al., 2001)), where
the states of the hidden class variable correspond to the components
of the mixture (the number of clusters), and the multinomial
distribution is used to model discrete variables while the Gaussian
distribution is used to model numeric variables. In this way we move
to a problem of learning from unlabeled data and usually the EM
algorithm (Dempster et al., 1977) is used to carry out the learning
task when the graphical structure is fixed and structural EM
(Friedman, 1998) when the graphical structure also has to be
discovered (Pena et al., 2000). In this paper we focus on the
simplest model with fixed structure, the so-called Naive Bayes
structure (fig. 1) where the class is the only root variable and all
the attributes are conditionally independent given the class.
See also this discussion on CV.SE.
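If you want to see this in practice, here is a minimal sketch (my own addition, assuming scikit-learn and synthetic data) of an "unsupervised naive Bayes" for continuous attributes: a Gaussian mixture with diagonal covariances, so that within each hidden class the attributes are treated as conditionally independent, fitted with EM.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two hidden classes, each with independent (diagonal) Gaussian attributes
X = np.vstack([
    rng.normal(loc=[0, 0], scale=[1.0, 0.5], size=(100, 2)),
    rng.normal(loc=[4, 3], scale=[0.5, 1.0], size=(100, 2)),
])

# covariance_type="diag" encodes the naive Bayes (conditional independence) assumption;
# EM alternates soft class assignments (E-step) and parameter updates (M-step)
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(X)

cluster_labels = gmm.predict(X)          # most probable hidden class per instance
responsibilities = gmm.predict_proba(X)  # soft class probabilities per instance
```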
I have been using scikit-learn's Dirichlet process Gaussian mixture model to cluster my dataset, following this excellent tutorial: http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/
In the end, the author uses a dataset that clusters food items using their nutritional values (e.g. total fat, vitamin D, vitamin C, etc.) as features. Before running the algorithm, the author normalizes these features. What is the importance of this normalization? Do the features of every item in the dataset need to follow a Gaussian distribution? Is that an underlying assumption?
Any help would be appreciated. Thanks!
A Dirichlet process Gaussian mixture model is the infinite limit of a finite Gaussian mixture model and therefore assumes your data are Gaussian-distributed within each component; recall the generative process of a Gaussian mixture model. The formulation of a Dirichlet process mixture model itself, however, is independent of the observation distribution.
Normalisation of the data, e.g. a z-score transform, is not necessary if you parametrize the base distribution of the model appropriately. If that is not possible in the implementation you use, then normalisation is required.
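As a hedged sketch (my own, not the tutorial's code), scikit-learn's BayesianGaussianMixture with a Dirichlet process weight prior can stand in for the older DPGMM class, and z-score normalisation can be done with StandardScaler when you cannot tune the base distribution directly:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5)) * [1.0, 10.0, 100.0, 0.1, 5.0]  # features on very different scales

# z-score normalisation puts all features on a comparable scale
X_std = StandardScaler().fit_transform(X)

# Truncated Dirichlet process mixture: n_components is only an upper bound,
# and the model effectively switches off components it does not need
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(X_std)

labels = dpgmm.predict(X_std)
```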
I am a novice to machine learning. I have read about HMMs, but I still have a few questions:
When applying an HMM for machine learning, how can the initial, emission, and transition probabilities be obtained?
Currently I have a set of values (the angles of a hand, which I would like to classify via an HMM); what should my first step be?
I know that there are three problems associated with an HMM (forward-backward, Baum-Welch, and Viterbi), but what should I do with my data?
In the literature that I have read, I never encountered the use of distribution functions within an HMM, yet the constructor that JaHMM uses for an HMM consists of:
number of states
Probability Distribution Function factory
Constructor Description:
Creates a new HMM. Each state has the same pi value and the transition probabilities are all equal.
Parameters:
nbStates The (strictly positive) number of states of the HMM.
opdfFactory A pdf generator that is used to build the pdfs associated to each state.
What is this used for? And how can I use it?
Thank you
You have to somehow model and learn the initial, emission, and transition probabilities such that they represent your data.
In the case of discrete distributions and not too many variables/states, you can obtain them from maximum likelihood fitting, or you can train a discriminative classifier that gives you a probability estimate, such as random forests or naive Bayes. For continuous distributions, have a look at Gaussian processes or other methods such as Gaussian mixture models or regression forests.
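As a hedged sketch of what this looks like in practice (my own addition, assuming the hmmlearn Python package rather than JaHMM, with made-up hand-angle data): Baum-Welch learns the initial, transition, and emission parameters from your sequences, and Viterbi then decodes the most likely state sequence.

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

# Hypothetical data: two recorded sequences of hand-angle feature vectors (3 angles per frame)
seq1 = rng.normal(size=(50, 3))
seq2 = rng.normal(size=(80, 3))
X = np.vstack([seq1, seq2])
lengths = [len(seq1), len(seq2)]

# Gaussian-emission HMM: each hidden state emits angle vectors from its own Gaussian.
# fit() runs Baum-Welch (EM) to estimate startprob_, transmat_, means_ and covars_.
model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=100, random_state=0)
model.fit(X, lengths)

print(model.startprob_)  # learned initial state probabilities
print(model.transmat_)   # learned transition probabilities

# Viterbi decoding: most likely hidden state at each frame of the first sequence
states = model.predict(seq1)
```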
Regarding your second and third questions: they are too general and fuzzy to be answered here. You may want to refer to the following books: "Pattern Recognition and Machine Learning" by Bishop and "Probabilistic Graphical Models" by Koller and Friedman.