I would like to use a Restricted Boltzmann Machine for pattern recognition.
It has come to my attention that they are actually used for finding distributions in patterns rather than pattern recognition. I looked at the following paper: http://www.cs.toronto.edu/~hinton/absps/uai_crbms.pdf which seems to use an extension of the RBM called the Conditional RBM. I would like to implement that. I have already used Contrastive Divergence to implement the RBM, and I would like to stick with it for the CRBM, for simplicity. The paper focuses on replacing contrastive divergence with more accurate algorithms.
From what I see in the paper, I now need to create three weight matrices (as I now also have to include the classification vectors; see Figure 1 in the paper), and I am not sure how to update each of them (i.e. how to create the vectors that will drive the change of each matrix).
Could someone please clarify this for me, or suggest an algorithm for classification using the simple RBM I have already implemented?
Thanks.
I found the following paper which clarifies the issue: http://uai.sis.pitt.edu/papers/11/p463-louradour.pdf . The poster here is also very helpful, especially for implementation: http://www.dmi.usherb.ca/~larocheh/publications/drbm-mitacs-poster.pdf . Instead of using three weight matrices it is enough to use two: one for the classification vectors and one for the actual patterns.
The formulas for the activation probabilities change, but the idea is the same.
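For what it's worth, here is a minimal NumPy sketch of that two-matrix layout, in the spirit of the discriminative RBM from the poster; the names, shapes and bias vectors are my own assumptions, not code from the papers.

```python
import numpy as np

# Assumed layout (not taken from the papers):
#   W : (n_hidden, n_visible)  -- connects the input pattern x to the hidden units
#   U : (n_hidden, n_classes)  -- connects the one-hot class vector y to the hidden units
#   c : (n_hidden,), d : (n_classes,)  -- hidden and class biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_probs(x, y_onehot, W, U, c):
    """p(h_j = 1 | x, y) = sigmoid(c_j + (W x)_j + (U y)_j)."""
    return sigmoid(c + W @ x + U @ y_onehot)

def predict_proba(x, W, U, c, d):
    """p(y | x), computed exactly by summing out the hidden units."""
    n_classes = U.shape[1]
    log_scores = np.empty(n_classes)
    for k in range(n_classes):
        pre = c + W @ x + U[:, k]                             # hidden pre-activations if the class were k
        log_scores[k] = d[k] + np.logaddexp(0.0, pre).sum()   # d_k + sum_j log(1 + exp(pre_j))
    log_scores -= log_scores.max()                            # numerical stability
    p = np.exp(log_scores)
    return p / p.sum()
```

With this layout, contrastive divergence can update W from the (x, h) statistics and U from the (y, h) statistics, using the same positive-minus-negative-phase rule as for a plain RBM.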
I am currently working on my final-year project on brain tumor classification. I extracted features using the wavelet transform, GLCM, polynomial transform, etc.
Is it right to append these feature vectors (column-wise) for training, i.e. to use combinations of these feature vectors such as GLCM + wavelet?
Can you suggest any papers related to this?
Thank you for the help.
Yes, this method is known as early fusion.
In other words, early fusion is when you concatenate two or more feature sets prior to model training.
There are a number of other methods for feature fusion, including model fusion and late fusion.
Take a look at these papers, which might help you:
One is specific to a health-based application.
There is also a figure which might help you grasp the concept.
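For the concatenation step itself, here is a minimal NumPy sketch (the feature names and shapes are placeholders, not from your pipeline):

```python
import numpy as np

# Hypothetical per-sample feature matrices, one row per image:
glcm_features = np.random.rand(100, 20)      # e.g. 20 GLCM statistics
wavelet_features = np.random.rand(100, 64)   # e.g. 64 wavelet coefficients

# Early fusion: concatenate the feature sets column-wise, so each sample keeps
# a single row that now contains all feature groups.
fused = np.hstack([glcm_features, wavelet_features])   # shape (100, 84)

# 'fused' is then fed to whatever classifier you train; it is usually worth
# scaling the groups (e.g. with sklearn's StandardScaler) since they live on
# different numeric ranges.
```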
What does it mean to provide weights to each sample for classification? How does a classification algorithm like logistic regression or an SVM use weights to emphasize certain examples more than others? I would love to go into detail and unpack how these algorithms leverage sample weights.
If you look at the sklearn documentation for logistic regression, you can see that the fit function has an optional sample_weight parameter which is defined as an array of weights assigned to individual samples.
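As a quick illustration, here is a minimal sketch of passing that parameter to scikit-learn (the data and the choice of weights are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(200, 5)                  # toy features
y = np.random.randint(0, 2, size=200)       # toy binary labels

# One weight per sample: here the second half of the samples count twice as much.
sample_weight = np.ones(200)
sample_weight[100:] = 2.0

clf = LogisticRegression()
clf.fit(X, y, sample_weight=sample_weight)  # each sample's loss term is scaled by its weight
```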
This option is meant for imbalanced datasets. Let's take an example: I have a lot of data and some of it is just noise, but other points are really important to me and I would like my algorithm to consider them much more than the other points. So I assign a weight to them in order to make sure they are dealt with properly.
It changes the way the loss is calculated. The error (residual) of each point is multiplied by that point's weight, and thus the minimum of the objective function is shifted. I hope that is clear enough. I don't know whether you are familiar with the math behind it, so here is a small introduction to have everything at hand (apologies if this was not needed):
https://perso.telecom-paristech.fr/rgower/pdf/M2_statistique_optimisation/Intro-ML-expanded.pdf
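To make the "residual multiplied by the weight" idea concrete, here is a minimal sketch of a weighted log loss; the exact loss and normalization depend on the library, so treat it as an illustration only.

```python
import numpy as np

def weighted_log_loss(y_true, p_pred, weights):
    """Binary log loss where each sample's term is scaled by its weight."""
    eps = 1e-12
    p_pred = np.clip(p_pred, eps, 1 - eps)
    per_sample = -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
    # Heavier samples contribute more, so the minimizer is pulled towards fitting them.
    return np.sum(weights * per_sample) / np.sum(weights)
```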
See a good explanation here: https://www.kdnuggets.com/2019/11/machine-learning-what-why-how-weighting.html .
I was learning about different techniques for classification, like probabilistic classifiers etc., and stumbled upon the following question: why can't we implement a binary classifier as a regression function of all the attributes and classify on the basis of the output of the function, say, if the output is less than a certain value it belongs to class A, else to class B? Is there any limitation to this method compared to the probabilistic approach?
You can do this and it is often done in practice, for example in Logistic Regression. It is not even limited to binary classes. There is no inherent limitation compared to a probabilistic approach, although you should keep in mind that both are fundamentally different approaches and hard to compare.
I think you have some misunderstanding about classification. No matter what kind of classifier you are using (SVM or logistic regression), you can always view the output model as
f(x) > b  ===> positive
f(x) <= b ===> negative
This applies to both probabilistic and non-probabilistic models. In fact, this is related to risk minimization, which yields the cut-off rule naturally.
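As a minimal sketch of that cut-off rule with a concrete score function (using scikit-learn's logistic regression; the data and threshold here are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(300, 4)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)    # toy labels

clf = LogisticRegression().fit(X, y)

scores = clf.decision_function(X)            # f(x) = w.x + w0, a real-valued score
b = 0.0                                      # the usual threshold for a linear score
predictions = (scores > b).astype(int)       # f(x) > b -> positive, else negative
```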
Yes, this is possible. For example, a perceptron does exactly that.
However, a single perceptron is limited to linearly separable problems. But multiple perceptrons can be combined into general neural networks to solve arbitrarily complex problems.
Another machine learning technique, the SVM, works in a similar way. It first transforms the input data into some high-dimensional space and then separates it via a linear function.
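A small sketch of both ideas with scikit-learn (toy data and my own parameter choices):

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

X = np.random.rand(200, 2)
y = (X[:, 0] > X[:, 1]).astype(int)     # linearly separable toy problem

# A perceptron thresholds a linear function of the inputs directly.
perceptron = Perceptron().fit(X, y)

# An SVM with a nonlinear kernel implicitly maps the data into a higher-dimensional
# space and separates it there with a linear, maximum-margin function.
svm = SVC(kernel="rbf").fit(X, y)
```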
I would like to classify text documents into four categories. I also have a lot of samples which are already classified that can be used for training. I would like the algorithm to learn on the fly; please suggest an optimal algorithm that works for this requirement.
If by "on the fly" you mean online learning (where training and classification can be interleaved), I suggest the k-nearest neighbor algorithm. It's available in Weka and in the package TiMBL.
A perceptron will also be able to do this.
"Optimal" isn't a well-defined term in this context.
There are several algorithms that can learn on the fly, for example k-nearest neighbors, naive Bayes, and neural networks. You can try how appropriate each of these methods is on a sample corpus.
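If you end up in the scikit-learn ecosystem, one hedged sketch of interleaved training and classification is a linear model trained with partial_fit on hashed text features (the category names and feature size are placeholders):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

CLASSES = ["category_a", "category_b", "category_c", "category_d"]  # hypothetical labels

# HashingVectorizer needs no fitted vocabulary, so it works on streaming text.
vectorizer = HashingVectorizer(n_features=2**18)
clf = SGDClassifier()   # linear classifier trained by stochastic gradient descent

def learn_batch(texts, labels):
    """Online update: can be called repeatedly, interleaved with classification."""
    X = vectorizer.transform(texts)
    clf.partial_fit(X, labels, classes=CLASSES)

def classify(texts):
    return clf.predict(vectorizer.transform(texts))
```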
Since you have unlabeled data you might want to use a model where this helps. The first thing that comes to my mind is nonlinear NCA: Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure (Salakhutdinov and Hinton).
Well... I have to say that document classification is somewhat different from what you are thinking.
Typically, in document classification, the test data after preprocessing is extremely large, for example O(N^2), so it might be too computationally expensive.
Another typical classifier that comes to mind is a discriminant classifier, which does not need a generative model of your dataset. After training, all you have to do is feed a single entry to the algorithm and it will be classified.
Good luck with this. For a reference, you can check E. Alpaydin's book, Introduction to Machine Learning.
I'm doing research which involves "unsupervised classification".
Basically I have a training set and I want to cluster the data into X classes in an unsupervised way. The idea is similar to what k-means does.
Let's say:
Step 1) featureSet is a [1057x10] matrix and I want to cluster it into 88 clusters.
Step 2) Use the previously computed clusters to determine how testData gets classified.
Questions
- Is it possible to do this with an SVM or a neural network? Anything else?
- Any other recommendations?
There are many clustering algorithms out there, and the web is awash with information on them and sample implementations. A good starting point is the Wikipedia entry on cluster analysis.
As you have a working k-means implementation, you could try one of its many variants to see if they yield better results (k-means++ perhaps, seeing as you mentioned SVM). If you want a completely different approach, have a look at Kohonen Maps, also called Self-Organising Feature Maps. If that looks too tricky, a simple hierarchical clustering would be easy to implement (find the nearest two items, combine, rinse and repeat).
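For the two steps you describe, here is a minimal scikit-learn sketch (the random arrays stand in for your featureSet and testData):

```python
import numpy as np
from sklearn.cluster import KMeans

feature_set = np.random.rand(1057, 10)   # stand-in for the [1057x10] featureSet
test_data = np.random.rand(200, 10)      # stand-in for testData

# Step 1: learn 88 clusters from the training features
# (scikit-learn uses k-means++ initialisation by default).
kmeans = KMeans(n_clusters=88, n_init=10, random_state=0).fit(feature_set)

# Step 2: assign each test point to the nearest learned centroid.
test_labels = kmeans.predict(test_data)
```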
This sounds like a classic clustering problem. Neither SVMs nor neural networks are going to solve this problem directly. You can use either approach for dimensionality reduction, for example to embed your 10-dimensional data in two-dimensional space, but they will not put the data into clusters for you.
There are a huge number of clustering algorithms besides k-means. If you wanted a contrasting approach, you might want to try an agglomerative clustering algorithm. I don't know what kind of computing environment you are using, but I quite like R and this (very) short guide on clustering.
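If you want to try agglomerative clustering without leaving Python, here is a minimal scikit-learn sketch (an alternative to the R workflow above; the shapes are placeholders):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

feature_set = np.random.rand(1057, 10)

# Bottom-up clustering: start with every point as its own cluster and
# repeatedly merge the two closest clusters until 88 remain.
agg = AgglomerativeClustering(n_clusters=88, linkage="ward")
train_labels = agg.fit_predict(feature_set)

# Note: unlike k-means there is no predict() for new points; test data is
# usually assigned to the nearest cluster centroid computed from these labels.
```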