I have a dataset with two cardinal attributes of comparable scale. I want to divide the data points into 4 clusters such that attribute 1 is completely segmented, i.e. the clusters partition its range into contiguous, non-overlapping intervals, while minimizing the variance within attribute 2.
E.g.: If plotting attribute 1 on the x-axis and attribute 2 on the y-axis, the resulting clusters should represent vertical cuts through the data set, which are sized horizontally so as to minimize the variance in attribute 2.
The only approach I have come up with so far is to apply k-means clustering and scale up attribute 1 so that it becomes the dominant factor in the distance function.
Any other suggestions for suitable unsupervised learning / clustering algorithms?
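For reference, a minimal sketch of the workaround mentioned above: scaling up attribute 1 before running k-means so that it dominates the distance computation. The data and the scale factor below are illustrative, not taken from the original dataset.

```python
# Sketch: make attribute 1 dominate the k-means distance by scaling it up.
# Data and scale factor are illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
attr1 = rng.uniform(0, 10, size=300)   # attribute 1 (x-axis)
attr2 = rng.normal(size=300)           # attribute 2 (y-axis)

scale = 100.0  # large factor so attribute 1 dominates the squared-distance objective
X_scaled = np.column_stack([attr1 * scale, attr2])

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)
# With a large enough scale, the clusters approximate vertical cuts along attribute 1.
```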
Related
I am using several U-Net variants for a brain tumor segmentation task. I get the following values for the performance measures, including Dice, IoU, the area under the receiver operating characteristic curve (AUC), and the area under the precision-recall curve (AUPRC), otherwise called the average precision (AP), computed for IoU thresholds in the range [0.5:0.95] in steps of 0.05.
From the above table, I observe that Model-2 gives better values for the IoU and Dice metrics. I understand that the Dice coefficient gives more weight to the TPs. However, Model-1 gives superior values for the AUC and AP@[0.5:0.95] metrics. Which metrics should be given higher importance in model selection under these circumstances?
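For context, Dice and IoU are computed from the same confusion counts, with Dice counting each true positive twice, which is why it weights TPs more heavily. A small illustration with made-up counts:

```python
# Illustrative comparison of Dice and IoU from the same confusion counts
# (the numbers are made up for the example).
tp, fp, fn = 80, 10, 20

iou = tp / (tp + fp + fn)            # Jaccard index
dice = 2 * tp / (2 * tp + fp + fn)   # Dice counts TP twice, so it rewards TPs more

print(f"IoU  = {iou:.3f}")   # 0.727
print(f"Dice = {dice:.3f}")  # 0.842
```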
My question is: if we have 10 columns of continuous variables,
can we use k-means to shrink the 10 columns down to 1 column containing the corresponding cluster labels,
and then fit a decision tree or logistic regression on that label?
If new data comes in, we would use the k-means result to determine its cluster label and then pass that to the machine learning model.
K-means is absolutely not a dimensionality reduction technique. Dimensionality reduction algorithms map the input space to a lower-dimensional input space, whereas what you are proposing maps the input space directly to an output space consisting of the set of all integer cluster labels.
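For concreteness, a minimal sketch of the pipeline the question describes (the data, the number of clusters, and the downstream model are illustrative). It makes clear that the single "column" produced is a categorical cluster label, not a compressed representation of the 10 inputs:

```python
# Sketch of the proposed pipeline: 10 continuous columns -> k-means label -> classifier.
# Data and cluster count are purely illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))          # 10 continuous columns
y_train = (X_train[:, 0] > 0).astype(int)     # made-up binary target

# Step 1: replace the 10 columns with a single k-means cluster label.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_train)
labels_train = km.labels_.reshape(-1, 1)      # categorical labels, not an embedding

# Step 2: fit a downstream model on that single label column
# (note: the model treats the nominal label as if it were numeric).
clf = LogisticRegression().fit(labels_train, y_train)

# Step 3: for new data, assign a cluster first, then predict.
X_new = rng.normal(size=(3, 10))
labels_new = km.predict(X_new).reshape(-1, 1)
print(clf.predict(labels_new))
```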
I'd like to train a neural net to return a normalized color distance between two RGB pixels, trained (at first) on a simple Euclidean distance with each color component in the range 0-255. There are 6 inputs (2 pixels x R, G, B components) and 1 output (a normalized 0-1 color distance). Or should there be 48 binary inputs (2 pixels x 24 bits per pixel)?
The training set will probably consist of 1k-10k randomly generated pixel pairs and their computed distances. I'm open to creating/chaining several simple nets rather than training one perfect one, e.g. splitting by a hue1/hue2 combination that can be cheaply determined in advance.
I'm new to ML and was hoping to get some advice, since my intuition is basically zero for what type of net I need and a ballpark for what the topology and parameters should be.
Most supervised neural nets seem to do classification into buckets, but I don't quite understand how to get a net to predict a value that is not just one out of several categories. Can I still use a simple feed-forward network for this task, or do I need something different?
Thanks for any advice!
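A minimal sketch of the setup described above, assuming scikit-learn's MLPRegressor as the feed-forward regressor; the hidden-layer size, sample count, and train/test split are illustrative guesses, not recommendations:

```python
# Sketch: generate random pixel pairs and fit a small feed-forward regressor
# to the normalized Euclidean color distance. Topology is an illustrative guess.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples = 10_000

# 6 inputs: two pixels x (R, G, B), scaled from 0-255 down to 0-1.
X = rng.integers(0, 256, size=(n_samples, 6)) / 255.0

# Target: Euclidean distance between the two pixels, normalized to 0-1
# by the maximum possible distance sqrt(3) in the unit cube.
dist = np.linalg.norm(X[:, :3] - X[:, 3:], axis=1)
y = dist / np.sqrt(3.0)

# One hidden layer with a linear (identity) output: regression, not classification.
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
net.fit(X[:9000], y[:9000])
print("test MSE:", np.mean((net.predict(X[9000:]) - y[9000:]) ** 2))
```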
I'm looking to set up a linear regression using 2D Gaussian basis functions. My input training variables cover a two-dimensional space. Before applying the machine learning (Bayesian linear regression), I need to select parameters for the Gaussians, namely the means and variances, and also decide how many basis functions to use.
I am currently spacing the means (of a preallocated number of basis Gaussians) evenly over a grid, and just assuming constant variance. This is obviously not the best approach.
Any ideas on how to choose these parameters?
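For reference, a minimal sketch of the current approach described above: an evenly spaced grid of 2D Gaussian centres with a single shared variance, used to build the design matrix for the subsequent (Bayesian) linear regression. The grid size, data, and variance value are illustrative.

```python
# Sketch: design matrix from 2D isotropic Gaussian basis functions whose means
# lie on an even grid and which share one constant variance.
import numpy as np

def gaussian_design_matrix(X, centres, variance):
    """X: (n, 2) inputs, centres: (m, 2) basis means, variance: shared scalar."""
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)  # (n, m)
    return np.exp(-sq_dists / (2.0 * variance))

# Illustrative data and a 5 x 5 grid of basis centres over the unit square.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
g = np.linspace(0, 1, 5)
centres = np.array([(a, b) for a in g for b in g])    # (25, 2) grid of means
variance = 0.05                                       # constant, shared by all bases

Phi = gaussian_design_matrix(X, centres, variance)    # (200, 25) design matrix
print(Phi.shape)
```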
In the original HOG (Histogram of Oriented Gradients) paper, http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf, there are images showing the HOG representation of an image (Figure 6). In this figure, parts f and g are captioned "HOG descriptor weighted by respectively the positive and the negative SVM weights".
I don't understand what this means. I understand that when I train an SVM, I get a weight vector, and to classify, I have to use the features (HOG descriptors) as the input of the decision function. So what do they mean by positive and negative weights? And how would I plot them like in the paper?
Thanks in advance.
The weights tell you how significant a specific element of the feature vector is for a given class. That means that if you see a high value in your feature vector, you can look up the corresponding weight:
If the weight is a large positive number, it is more likely that your object belongs to the class.
If the weight is a large negative number, it is more likely that your object does NOT belong to the class.
If the weight is close to zero, this position is mostly irrelevant for the classification.
Now you are using those weights to scale the feature vector, where the lengths of the gradients are mapped to the color intensity. Because you can't display negative color intensities, they decided to split the visualization into positive and negative parts. In the visualizations you can now see which parts of the input image contribute to the class (positive) and which count against it (negative).
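A minimal sketch of that weighting step, assuming a linear SVM weight vector w with the same length as the HOG descriptor; the arrays here are placeholders, and the actual visualization would reshape each weighted descriptor back into its cell/orientation layout before drawing it:

```python
# Sketch: split a linear SVM weight vector into its positive and negative parts
# and use each part to scale the HOG descriptor before visualization.
import numpy as np

# Placeholders: in practice these come from your trained SVM and your HOG extractor.
w = np.random.default_rng(0).normal(size=3780)     # e.g. 3780-d weights for a 64x128 window
hog = np.random.default_rng(1).uniform(size=3780)  # HOG descriptor of one image window

w_pos = np.clip(w, 0, None)    # keep only the positive weights
w_neg = np.clip(-w, 0, None)   # magnitudes of the negative weights

hog_pos = hog * w_pos   # "HOG descriptor weighted by the positive SVM weights" (part f)
hog_neg = hog * w_neg   # "HOG descriptor weighted by the negative SVM weights" (part g)

# Each weighted descriptor would then be drawn as oriented strokes per cell,
# with brightness proportional to the weighted bin value.
print(hog_pos.shape, hog_neg.shape)
```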