Using PyCaret for clustering - machine-learning

How can I use the compare_models function from PyCaret for clustering algorithms? I am unable to run multiple algorithms together to compare the performance of each one. Is there another low-code tool that provides this functionality?
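PyCaret's compare_models is designed for supervised modules; for clustering there is no single ground-truth accuracy to rank by. A common workaround is to loop over algorithms yourself and score each with an internal metric such as the silhouette score. A minimal sketch using scikit-learn rather than PyCaret's API (the dataset and model choices are illustrative):

```python
# Compare several clustering algorithms on one dataset using the
# silhouette score (higher is better), since unsupervised models
# have no labels to compute accuracy against.
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

models = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=42),
    "agglomerative": AgglomerativeClustering(n_clusters=4),
    "birch": Birch(n_clusters=4),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = silhouette_score(X, labels)

# Rank the algorithms from best to worst silhouette.
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The same loop works with any estimator that implements fit_predict, so you can extend the dictionary with other algorithms you want to compare.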

Related

Importance of features in a dataset using machine learning?

How can we calculate the importance of features in a data set using machine learning? Which algorithm is better, and why?
There are several methods that fit a model to the data and, based on the fit, rank the features from most to least relevant. If you want to know more, just google feature selection.
I don't know which language you're using but here's a link to a python page about it:
http://scikit-learn.org/stable/modules/feature_selection.html
You can use this function:
http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE
This will eliminate the less meaningful features from your dataset based on a classifier's fit; you can choose, for instance, logistic regression or an SVM, and select how many features you want left.
I think the choice of the best method depends on the data, so more information is necessary.
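A short example of the RFE approach linked above, using a logistic regression fit as the answer suggests (the dataset and the number of features to keep are illustrative):

```python
# Recursive feature elimination (RFE): repeatedly fit the estimator
# and drop the weakest features until the requested number remains.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Keep the 5 strongest features according to the logistic regression fit.
selector = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)
selector.fit(X, y)

# support_ is a boolean mask over the original feature columns.
print("selected feature indices:",
      [i for i, kept in enumerate(selector.support_) if kept])
```

Swapping the estimator for an SVM (e.g. a linear SVC) works the same way, as long as the model exposes coefficients or feature importances for RFE to rank by.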

Clustering Analysis on images

I have a large set of scanned documents that I need to index; however, the documents of interest are a small proportion of the entire package my classifier needs to identify. To get an idea of the optimum number of classes and how best to merge documents into a class, I wanted to run an unsupervised clustering analysis.
Which distance metric would best capture the structural information? Also, would agglomerative hierarchical clustering be the best clustering approach for this task? Thanks
An unsupervised clustering technique fails on raw scanned documents, since it cannot grasp the underlying structure and ends up producing nonsensical clusters; the approach is fundamentally flawed on raw pixels. However, classification using deep convolutional neural networks, with sufficient data and carefully chosen distinct classes, can outperform OCR-based techniques if the documents have a distinct structure.
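That said, agglomerative clustering can still be useful as an exploratory step when applied to learned feature vectors rather than raw pixels. A hedged sketch where the CNN embedding step is a placeholder (random vectors stand in for per-page embeddings; in practice you would use a pretrained network):

```python
# Agglomerative clustering on document feature vectors to probe the
# "optimum number of classes". The feature matrix is a stand-in for
# CNN embeddings of scanned pages (e.g., 128-d vectors per page).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 128))  # placeholder embeddings

# Try a few cluster counts and inspect the resulting cluster sizes.
for k in (2, 4, 8):
    labels = AgglomerativeClustering(n_clusters=k,
                                     linkage="ward").fit_predict(features)
    print(k, np.bincount(labels))
```

Ward linkage uses Euclidean distance; with embeddings, cosine distance (linkage="average", metric="cosine") is a common alternative worth comparing.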

Is it possible to use a support vector machine in combination with an agglomerative clusterer?

Is it possible to use a support vector machine in combination with a clustering algorithm somehow? What is a sample use-case where both of them need to communicate with each other?
You can always use clustering to partition your data set and learn multiple classifiers, then use ensemble methods to combine the classification results.
If a class consists of multiple clusters, it can improve accuracy to learn the sub-classes separately and merge them after classification.
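A minimal sketch of this cluster-then-classify idea: partition the training data with a clustering algorithm, train one SVM per cluster, and route each test point to the expert for its assigned cluster. K-means is used here for the partition because it gives a cheap predict for new points; the dataset is synthetic and illustrative.

```python
# "Cluster then classify": one local SVM expert per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr)

experts = {}
for c in range(3):
    mask = km.labels_ == c
    # A cluster may happen to contain only one class; fall back to a
    # constant predictor in that degenerate case.
    model = (SVC() if np.unique(y_tr[mask]).size > 1
             else DummyClassifier(strategy="most_frequent"))
    experts[c] = model.fit(X_tr[mask], y_tr[mask])

# Each test point is classified by the expert of its nearest centroid.
assignments = km.predict(X_te)
pred = np.array([experts[c].predict(x.reshape(1, -1))[0]
                 for c, x in zip(assignments, X_te)])
print("ensemble accuracy:", (pred == y_te).mean())
```

The merging step the answer describes would then map each expert's sub-class predictions back to the original class labels.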

How to categorize continuous data?

I have two dependent continuous variables and I want to use their combined values to predict the value of a third, binary variable. How do I go about discretizing/categorizing the values? I am not looking for clustering algorithms; I'm specifically interested in obtaining 'meaningful' discrete categories I can subsequently use in a Bayesian classifier.
Pointers to papers, books, online courses, all very much appreciated!
That is the essence of machine learning, and one of the most studied problems.
Least-squares regression, logistic regression, SVMs, and random forests are widely used for this type of problem, which is called binary classification.
If your goal is to pragmatically classify your data, several libraries are available, like scikit-learn in Python and Weka in Java; both have great documentation.
But if you want to understand the intrinsics of machine learning, just search (here or on Google) for machine learning resources.
If you wanted to be a real nerd, you could generate a bunch of different possible discretizations, train a classifier on each, then characterize the discretizations by their features, run a classifier on that, and see what sort of discretizations are best!
In general, discretizing is more of an art; it requires a good understanding of what the input variable ranges mean.

K-Means alternatives and performance

I've been reading about similarity measures and image feature extraction; most of the papers refer to k-means as a good uniform clustering technique. My question is: is there any alternative to k-means clustering that performs better for a specific data set?
You may want to look at MeanShift clustering which has several advantages over K-Means:
Doesn't require a preset number of clusters
K-Means clusters converge to an n-dimensional Voronoi partition, while MeanShift allows other cluster shapes
MeanShift is implemented in OpenCV in the form of CAMShift, a MeanShift adaptation for tracking objects in a video sequence.
If you need more info, you can read this excellent paper about MeanShift and Computer Vision:
Mean shift: A robust approach toward feature space analysis
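A short example of the first advantage listed above: MeanShift infers the number of clusters from a bandwidth (kernel width) rather than a preset k. The blob data is illustrative; scikit-learn is used here rather than OpenCV's CAMShift variant:

```python
# MeanShift discovers the number of clusters from a bandwidth
# estimate instead of a user-supplied k.
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6,
                  random_state=7)

# The bandwidth (kernel width) drives how many modes are found.
bw = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bw).fit(X)
print("clusters found:", len(ms.cluster_centers_))
```

Smaller bandwidths split the density into more modes; larger ones merge them, so the bandwidth plays the role that k plays for k-means.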
As a simple first step, you could generalize k-means to EM. But there are tons of clustering methods available, and the kind of clustering you need depends on your data (features) and the application. In some cases even the distance measure you use matters, so you might have to apply some sort of distance transformation if the data is not in the kind of space you want it to be in.
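The k-means-to-EM step the answer mentions corresponds to fitting a Gaussian mixture model: clusters get full covariances (ellipsoids instead of spheres) and soft membership probabilities instead of hard assignments. A sketch on illustrative blob data:

```python
# A Gaussian mixture fitted with EM generalizes k-means: full
# covariances allow elliptical clusters, and predict_proba gives
# soft memberships instead of hard assignments.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=3)

gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=3).fit(X)
probs = gmm.predict_proba(X)  # one row of membership probabilities per point
print(probs.shape, probs[0].round(3))
```

With covariance_type="spherical" and vanishing variances, the model collapses back toward hard k-means-style assignments, which is why EM is a natural generalization.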
