Let's assume that I have a Poisson distribution with gamma=10. I would like to fit a Gaussian distribution that minimizes the KL divergence to the Poisson distribution.
This is possible with variational inference. How can I use Stan to do this optimization?
The reference manual has a chapter on VI but only provides some high level information on how it is implemented internally, not how to use it.
The user guide mentions VI in chapter 22.2 but only with some general remarks on its efficiency.
A related question here on SO might be: Variational inference in PyStan API?
But that only asks whether ADVI has been implemented in PyStan (it has). There is no additional information.
CmdStanPy contains an example notebook:
https://cmdstanpy.readthedocs.io/en/latest/variational_bayes.html
https://github.com/stan-dev/cmdstanpy/blob/master/docs/notebooks/Variational%20Inference.ipynb
I have been reading about probability distributions lately and got confused: what actually is the difference between a probability distribution and a data distribution, or are they the same? Also, what is the importance of probability distributions in machine learning?
Thanks
A data distribution is a function or a listing that shows all the possible values (or intervals) of the data. This can help you decide whether the set of data that you have is good enough to apply any techniques to it. You want to avoid skewed data.
A probability distribution is a statistical function that describes all the possible values that a random variable can take within a given range, together with their likelihoods. This helps you decide what type of statistical methods you can apply to your data. Example: if your data follow a Gaussian distribution, then you already know what values look like when they are one standard deviation away from the mean, and what the probability is of observing a value more than one standard deviation away.
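To make the Gaussian example concrete, here is a minimal sketch using only Python's standard library. It assumes the data are exactly normally distributed, and the function name prob_beyond is made up for illustration.

```python
import math


def prob_beyond(k: float) -> float:
    """Probability that a normal variable lies more than k standard
    deviations from its mean (both tails together)."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return 1.0 - math.erf(k / math.sqrt(2.0))


# Roughly 32% of values fall more than one standard deviation from the mean.
print(round(prob_beyond(1.0), 4))
```

The same function gives the familiar figure of roughly 5% for two standard deviations.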
NOTE: You may want to learn about how hypothesis testing is done for ML models.
I am referring to Google's machine learning DataPrep course. In this lecture, https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data, about solving the class imbalance problem, the technique mentioned is to first downsample and then upweight. The lecture covers the theory, but I couldn't find a practical implementation. Can someone guide me?
Upweighting is done to calibrate the probabilities provided by probabilistic classifiers so that the output of the predict_proba method can be directly interpreted as a confidence level.
Python implementation of the two calibration methods is provided here - https://scikit-learn.org/stable/auto_examples/calibration/plot_calibration.html#sphx-glr-auto-examples-calibration-plot-calibration-py
More details about probability calibration are provided here - https://scikit-learn.org/stable/modules/calibration.html
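The downsample-then-upweight recipe from the question can also be sketched directly. The following is a minimal illustration, not the course's reference implementation: the function name, the choice of label 0 as the majority class, and the factor of 10 are all assumptions made for the example.

```python
import random


def downsample_and_upweight(labels, majority_label, factor, seed=0):
    """Return (kept_indices, weights) after downsampling the majority class.

    Each majority example is kept with probability 1/factor and, if kept,
    receives weight `factor`. Minority examples are all kept with weight 1.
    """
    rng = random.Random(seed)
    kept, weights = [], []
    for i, y in enumerate(labels):
        if y == majority_label:
            if rng.random() < 1.0 / factor:
                kept.append(i)
                weights.append(float(factor))
        else:
            kept.append(i)
            weights.append(1.0)
    return kept, weights


# Toy data: 1000 majority examples (label 0) and 50 minority examples (label 1).
labels = [0] * 1000 + [1] * 50
kept, weights = downsample_and_upweight(labels, majority_label=0, factor=10)
```

The resulting weights can then be passed as sample_weight to the fit method of scikit-learn estimators that support it, so that the kept majority examples count for the ones that were dropped.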
I have some questions about SVM :
1- Why use SVM? In other words, what motivated its development?
2- What is the state of the art (2017)?
3- What improvements have they made?
SVMs work very well. In many applications, they are still among the best performing algorithms.
We've seen some progress in particular on linear SVMs, which can be trained much faster than kernel SVMs.
Read more literature. Don't expect an exhaustive answer in this QA format. Show more effort on your behalf.
SVMs are most commonly used for classification problems where labeled data is available (supervised learning) and are useful for modeling with limited data. For problems with unlabeled data (unsupervised learning), support vector clustering is a commonly employed algorithm. SVMs tend to perform better on binary classification problems since the decision boundaries will not overlap. Your 2nd and 3rd questions are very ambiguous (and need lots of work!), but suffice it to say that SVMs have found wide applicability in medical data science. Here's a link to explore more about this: Applications of Support Vector Machine (SVM) Learning in Cancer Genomics
I have little background knowledge of Machine Learning, so please forgive me if my question seems silly.
Based on what I've read, the best model-free reinforcement learning algorithm to date is Q-Learning, where each (state, action) pair in the agent's world is given a q-value, and at each state the action with the highest q-value is chosen. The q-value is then updated as follows:
Q(s,a) = (1-α)Q(s,a) + α(R(s,a,s') + γ max_a' Q(s',a')) where α is the learning rate and γ is the discount factor.
Apparently, for problems with high dimensionality, the number of states becomes astronomically large, making q-value table storage infeasible.
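As a rough illustration of this update rule, here is a minimal tabular sketch in Python. The function name q_update, the state and action names, and the default values of alpha and gamma are assumptions for the example; gamma is the discount factor that the standard form of the update applies to the best next q-value.

```python
from collections import defaultdict


def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to Q in place and return the new value."""
    # Best q-value reachable from the next state over all actions.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    # Blend the old estimate with the new target, weighted by the learning rate.
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (reward + gamma * best_next)
    return Q[(s, a)]


# Unseen (state, action) pairs default to a q-value of 0.
Q = defaultdict(float)
Q[("s1", "left")] = 2.0
new_value = q_update(Q, "s0", "right", reward=1.0, s_next="s1",
                     actions=["left", "right"])
```

With these numbers the new q-value for ("s0", "right") is 0.1 * (1.0 + 0.9 * 2.0), since the old estimate was 0.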
So the practical implementation of Q-Learning requires using Q-value approximation via generalization of states aka features. For example if the agent was Pacman then the features would be:
Distance to closest dot
Distance to closest ghost
Is Pacman in a tunnel?
And then instead of q-values for every single state, you would only need one learned value (a weight) for every feature.
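One common way to realize this idea is linear function approximation, where the q-value is a weighted sum of feature values and the weights are what gets learned. The sketch below assumes that setup; the feature names, the numeric values, and the helper names q_value and update_weights are invented for illustration.

```python
def q_value(weights, features):
    """Approximate Q(s, a) as a weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())


def update_weights(weights, features, reward, best_next_q, alpha=0.01, gamma=0.9):
    """Nudge each weight along its feature value, scaled by the TD error."""
    # TD error: how far the current estimate is from the one-step target.
    td_error = reward + gamma * best_next_q - q_value(weights, features)
    for name, value in features.items():
        weights[name] += alpha * td_error * value
    return td_error


# Hypothetical feature values for one (state, action) pair in a Pacman-like game.
weights = {"dist_to_dot": 0.0, "dist_to_ghost": 0.0, "in_tunnel": 0.0}
features = {"dist_to_dot": 0.5, "dist_to_ghost": 0.2, "in_tunnel": 1.0}
td_error = update_weights(weights, features, reward=1.0, best_next_q=0.0)
```

Storage then grows with the number of features rather than the number of states, which is the point of the approximation.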
So my question is:
Is it possible for a reinforcement learning agent to create or generate additional features?
Some research I've done:
This post mentions A Geramifard's iFDD method
http://www.icml-2011.org/papers/473_icmlpaper.pdf
http://people.csail.mit.edu/agf/Files/13RLDM-GQ-iFDD+.pdf
which is a way of "discovering feature dependencies", but I'm not sure if that is feature generation, as the paper assumes that you start off with a set of binary features.
Another paper that I found apropos is Playing Atari with Deep Reinforcement Learning, which "extracts high level features using a range of neural network architectures".
I've read over the paper but still need to flesh out/fully understand their algorithm. Is this what I'm looking for?
Thanks
It seems like you already answered your own question :)
Feature generation is not part of the Q-learning (and SARSA) algorithm. In a process which is called preprocessing you can however use a wide array of algorithms (of which you showed some) to generate/extract features from your data. Combining different machine learning algorithms results in hybrid architectures, which is a term you might look into when researching what works best for your problem.
Here is an example of using features with SARSA (which is very similar to Q-learning).
Whether the papers you cited are helpful for your scenario, you'll have to decide for yourself. As always with machine learning, your approach is highly problem-dependent. If you're in robotics and it's hard to define discrete states manually, a neural network might be helpful. If you can think of heuristics by yourself (like in the pacman example) then you probably won't need it.
I was reading the paper on Relational Fisher Kernel which involves Bayesian Logic Programs to calculate the Fisher score and then uses SVM to obtain the class labels for each data item.
I don't have a strong background in machine learning. Can someone please explain how to go about implementing an end-to-end Relational Fisher Kernel and what sort of input it would expect? I could not find any easy step-by-step flow showing this implementation. I am OK with using libraries for SVM etc. (e.g. libsvm), but I would like to know the end-to-end flow (in as easy language as possible). Any help will be highly appreciated.
libsvm does not implement the Relational Fisher Kernel; however, you can calculate the Fisher information matrix as described in the paper, and then use it as the precomputed kernel input to libsvm. See: using precomputed kernels with libsvm
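As a rough sketch of the last step, the code below turns a kernel matrix into the text format that libsvm reads in precomputed kernel mode (-t 4), where column 0 of each line holds the 1-based sample ID. It does not compute Fisher scores from a Bayesian Logic Program. The score vectors are made-up numbers, both function names are hypothetical, and taking the Fisher information as the identity (so the kernel is a plain dot product of score vectors) is a simplification rather than the paper's method.

```python
def fisher_kernel_matrix(scores):
    """Gram matrix of Fisher score vectors.

    Simplifying assumption: the Fisher information matrix is the identity,
    so K(i, j) is just the dot product of the two score vectors.
    """
    return [[sum(u * v for u, v in zip(a, b)) for b in scores] for a in scores]


def to_libsvm_precomputed(labels, kernel):
    """Format a kernel matrix as lines for libsvm's precomputed kernel mode."""
    lines = []
    for i, (label, row) in enumerate(zip(labels, kernel), start=1):
        # Column 0 holds the 1-based sample ID; columns 1..n hold K(x_i, x_j).
        cols = [f"0:{i}"] + [f"{j}:{k}" for j, k in enumerate(row, start=1)]
        lines.append(f"{label} " + " ".join(cols))
    return lines


# Made-up Fisher score vectors for three data items, with class labels.
scores = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
labels = [1, 1, -1]
lines = to_libsvm_precomputed(labels, fisher_kernel_matrix(scores))
print("\n".join(lines))
```

For prediction, the test file uses the same layout, with each row holding kernel values between a test item and every training item.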