What are the common algorithms for calculating similarity between zones of images? - image-processing

I have already tried mean squared error and cross-correlation, but they don't give me very good results. I'm doing this for brain MRI. Thank you.

I have seen principal component analysis used to compare separate brain scan images.
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5874201
This might be useful, but I am not entirely sure what you are trying to do with similarity between zones.
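For what it's worth, here is a minimal sketch of that idea in Python, assuming the zones are same-sized grayscale patches flattened into vectors. The patch size, component count, and the random stand-in data are illustrative choices, not anything from the linked paper:

```python
# Sketch: compare image zones in a PCA subspace instead of pixel space.
# Assumes zones are same-sized grayscale patches flattened into rows.
import numpy as np
from sklearn.decomposition import PCA

def pca_zone_distance(reference_zones, zone_a, zone_b, n_components=20):
    """reference_zones: (n_samples, h*w) array of flattened patches."""
    pca = PCA(n_components=n_components).fit(reference_zones)
    a = pca.transform(zone_a.reshape(1, -1))
    b = pca.transform(zone_b.reshape(1, -1))
    # Smaller distance in the PCA subspace = more similar zones.
    return np.linalg.norm(a - b)

# Random data standing in for 32x32 MRI patches:
rng = np.random.default_rng(0)
zones = rng.random((100, 32 * 32))
print(pca_zone_distance(zones, zones[0], zones[1]))
```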

Related

Are there any ways to build an ML model using CBIR and SIFT for image comparison in my case?

I have this project I'm working on. A part of the project involves multiple test runs during which screenshots of an application window are taken. Now, we have to ensure that screenshots taken between consecutive runs match (barring some allowable changes). These changes could be things like filenames, dates, different logos, etc. within the application window that we're taking a screenshot of.
I had the bright idea to automate this checking. Essentially, my idea was this: if I could somehow mathematically quantify the difference between a screenshot from the (N-1)th run and the Nth run, I could create a binary labelled dataset that mapped feature vectors of some sort to a label (0 for pass, or 1 for fail if the images do not adequately match up). The point of all this was that the labelled data would help the model understand what scale of change is acceptable, because there are so many kinds of acceptable change.
Now let's say I have access to lots of data that I have meticulously labelled, in the thousands. So far I have tried using SIFT in OpenCV with keypoint matching to determine a similarity score between images. But this isn't an intelligent, learning process. Is there some way I could take some information from SIFT and use it as my x-value in my dataset?
Here are my questions:
What would be the information I need as my x-value? It needs to be something that represents the difference between two images. So maybe the difference between feature vectors from SIFT? What do I do when those vectors have slightly different dimensions?
Am I on the right track with thinking about using SIFT? Should I look elsewhere and if so where?
Thanks for your time!
The approach that is being suggested in the question goes like this -
Find SIFT features of two consecutive images.
Use those to somehow quantify the similarity between two images (sounds reasonable)
Use this metric to first classify the images into similar and non-similar.
Use this dataset to train a NN to do the same job.
I am not completely convinced that this is a good approach. Let's say that you created the initial classifier with SIFT features. You are then using this data to train a NN. But this data will definitely have a lot of wrong labels, because if it didn't, what would stop you from using your original SIFT-based classifier as your final solution?
So if your SIFT-based classification is good, why even train a NN? On the other hand, if it's bad, you are feeding a lot of wrongly labelled data to the NN for training. I think the latter is probably a bad idea. I say probably because there is a possibility that the wrong labels just encourage the NN to generalize better, but that would require a lot of data, I imagine.
Another way to look at this: let's say that your initial classifier is 90% accurate. That's probably the upper limit of the performance you can expect from a NN trained on its labels.
You said that the issue with your first approach is that 'it's not an intelligent, learning process'. I think it's wrong to assume that the former approach is always inferior to the latter. SIFT is a powerful tool that can solve a lot of problems without all the 'black-boxness' of a NN. If this problem can be solved with sufficient accuracy using SIFT, a learning-based approach is not the way to go, because again, a learning-based approach isn't necessarily superior.
However, if the SIFT approach isn't giving you good enough results, definitely start thinking of NN stuff, but at that point, using the "bad" method to label the data is probably a bad idea.
Also, related to this, I think you may be underestimating the amount of data needed. You mentioned data in the thousands, but honestly, that's not a lot. You would need a lot more, I think.
Here is how I would think about doing this instead (a rough sketch of the matching step follows below):
Do SIFT keypoint detection on a sample reference image.
Manually filter out keypoints that do not belong to the invariant parts of the image. That is, keep only keypoints at locations that are guaranteed (or very likely) to always be present.
When you get a new image, compute its keypoints and match them against the reference image.
Set some threshold on the ratio of good matches to the total number of matches.
Depending on your application, this might give you good enough results.
If not, and if you really want your solution to be NN-based, I would say you need to manually label the dataset rather than labelling it with SIFT.
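A minimal sketch of the matching and thresholding steps with OpenCV's SIFT and Lowe's ratio test. The manual keypoint filtering step is omitted, and the 0.75 ratio and 0.6 pass threshold are illustrative values you would tune for your screenshots:

```python
# Sketch: decide whether a new screenshot matches a reference screenshot
# by the fraction of SIFT keypoints that find a good match.
import cv2

def screenshot_matches(reference_path, new_path, pass_threshold=0.6):
    ref = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    new = cv2.imread(new_path, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp_ref, des_ref = sift.detectAndCompute(ref, None)
    kp_new, des_new = sift.detectAndCompute(new, None)

    matches = cv2.BFMatcher().knnMatch(des_ref, des_new, k=2)

    # Lowe's ratio test keeps only unambiguous matches.
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

    ratio = len(good) / max(len(matches), 1)
    return ratio >= pass_threshold, ratio
```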

Non-Convex Optimizations

I have a gradient descent (GD) algorithm and I am trying to come up with a non-convex univariate optimization problem. I want to plot the function in Python and then show two runs of GD, one where it gets caught in a local minimum and one where it manages to make it to the global minimum. I am thinking of using different starting points to accomplish this.
That being said, I am somewhat clueless about coming up with such a function or choosing two such starting points; any help is appreciated.
Your question is really broad and really hard to answer, because non-convex optimization is rather complicated, and so is any iterative algorithm that solves such problems. As a quick hint, you can use the Mexican Hat function (or a simple polynomial that gives you what you want) for your test case. These papers can also give you some context: Paper1 Paper2
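Along the 'simple polynomial' line, here is a minimal sketch: the quartic below has one local and one global minimum, and the two starting points land GD in different basins. The step size and starting points are illustrative choices:

```python
# Sketch: gradient descent on a non-convex quartic from two starts.
import numpy as np
import matplotlib.pyplot as plt

f = lambda x: x**4 - 3 * x**2 + x        # two basins: minima near -1.30 and 1.13
grad = lambda x: 4 * x**3 - 6 * x + 1    # its derivative

def gd(x0, lr=0.01, steps=200):
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * grad(xs[-1]))
    return np.array(xs)

run_local = gd(2.0)     # slides into the local minimum near x = 1.13
run_global = gd(-2.0)   # reaches the global minimum near x = -1.30

xs = np.linspace(-2.2, 2.2, 400)
plt.plot(xs, f(xs), label="f(x) = x^4 - 3x^2 + x")
plt.plot(run_local, f(run_local), "o-", label="GD from x0 = 2.0 (local)")
plt.plot(run_global, f(run_global), "o-", label="GD from x0 = -2.0 (global)")
plt.legend()
plt.show()
```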
Good luck.

Catalog of Features. Feature extraction from images for SVM

I'm looking for reliable features for classification of cell types in microscope images. I wonder what is the best approach.
1) I've tried the approach described by Pontil & Verri: using each pixel of a normalized image as a feature. It is easy to implement, but the results are not fully satisfactory. Another problem is that the classification happens through some kind of statistical magic, and I can't understand why some results are bad.
2) I've tried to extract high-level features such as peaks and holes. My implementation is slow, but the advantage is that I understand why one cell is identified as such and another is not, as you can visualize these features in test images.
3) Recently I've found in an article the following features (the names read like rough translations of Haralick's grey-level co-occurrence matrix (GLCM) texture features): angular second moment, contrast, correlation, entropy, inverse difference moment, sum average, difference average, sum entropy, difference entropy, variance, sum variance, difference variance.
I wonder whether there are standard libraries for extracting these features (preferably in C/C++)?
Is there a catalogue of feature types with pros/cons, use-case descriptions, etc.?
Thank you in advance for any suggestions!
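Not C/C++, but for reference, here is a minimal Python sketch of extracting these co-occurrence features with scikit-image (spelled greycomatrix/greycoprops in older releases). The distances, angles, and the set of pooled properties are illustrative:

```python
# Sketch: Haralick-style GLCM texture features for one grayscale region.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_image):
    """gray_image: 2D uint8 array (e.g. one cell region)."""
    glcm = graycomatrix(gray_image, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    feats = {prop: graycoprops(glcm, prop).mean()
             for prop in ("contrast", "homogeneity", "energy",
                          "correlation", "ASM", "dissimilarity")}
    # Entropy is not a graycoprops property; average it over the angle planes.
    ent = 0.0
    for i in range(glcm.shape[3]):
        p = glcm[:, :, 0, i]
        p = p[p > 0]
        ent -= np.sum(p * np.log2(p))
    feats["entropy"] = ent / glcm.shape[3]
    return feats
```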
I can recommend this article by Lindblad et al., published in the scientific journal Cytometry. It covers some aspects of feature extraction and classification of cells. It does not use any standard libraries for feature extraction/classification, but it contains some information on how to build a classifier based on general features.
This might not solve your problem completely, but I hope it might help you move towards a better solution.
You should try the Gabor feature extraction technique, as it is supposed to extract features very similar to the responses of human visual cortical cells: set up filters at different orientations and scales, then extract features from each setup.
You can start learning from Wikipedia.
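A minimal sketch of that filter-bank idea with OpenCV; the kernel size, sigma, wavelength, and the mean/std pooling are illustrative choices:

```python
# Sketch: a small Gabor filter bank; pool each response to two numbers.
import cv2
import numpy as np

def gabor_features(gray_image, n_orientations=4, scales=(4.0, 8.0)):
    feats = []
    for sigma in scales:                      # one scale per sigma
        for i in range(n_orientations):       # evenly spaced orientations
            theta = i * np.pi / n_orientations
            # args: ksize, sigma, theta, lambda (wavelength), gamma, psi
            kernel = cv2.getGaborKernel((31, 31), sigma, theta, 10.0, 0.5, 0)
            response = cv2.filter2D(gray_image, cv2.CV_32F, kernel)
            feats.extend([response.mean(), response.std()])
    return np.array(feats)
```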
I think that the Insight Segmentation and Registration Toolkit (ITK) or Visualization Toolkit (VTK) would work well.
Some other options (that might not necessarily include all the features you want) are
http://opencv.org/
http://gdal.org/
http://www.vips.ecs.soton.ac.uk/index.php?title=VIPS
http://www.xdp.it/cximage.htm
Finally I've found what I was searching for and would like to share:
https://sites.google.com/site/cvonlinewiki/home/geometric-feature-extraction-methods
The list looks pretty mature and complete.
EDIT
Another good article for features in biological cells is:
A feature set for cytometry on digitized microscopic images
A good description of shape features:
http://www.math.uci.edu/icamp/summer/research_11/park/shape_descriptors_survey.pdf

How to calculate distance with a sparse dataset in K-nearest neighbour

I am implementing the K-nearest-neighbour algorithm for very sparse data. I want to calculate the distance between a test instance and each sample in the training set, but I am confused,
because most of the features in the training samples don't exist in the test instance, or vice versa (missing features).
How can I compute the distance in this situation?
To make sure I'm understanding the problem correctly: each sample forms a very sparsely filled vector. The missing data is different between samples, so it's hard to use any Euclidean or other distance metric to gauge similarity of samples.
If that is the scenario, I have seen this problem show up before in machine learning - in the Netflix prize contest, but not specifically applied to KNN. The scenario there was quite similar: each user profile had ratings for some movies, but almost no user had seen all 17,000 movies. The average user profile was quite sparse.
Different folks had different ways of solving the problem, but the way I remember it, they plugged in dummy values for the missing entries, usually the mean of that feature across all samples that had data, and then used Euclidean distance, etc. as normal. You can probably still find discussions of this missing-value problem on those forums. It was a particularly common problem for those trying to implement singular value decomposition, which became quite popular and so was discussed quite a bit, if I remember right.
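Here is a minimal sketch of that mean-imputation idea in Python, assuming missing features are marked with NaN; it illustrates the trick rather than reproducing anyone's contest code:

```python
# Sketch: fill missing values with per-feature means, then use plain
# Euclidean distance for KNN. NaN marks a missing feature.
import numpy as np

def distances_with_imputation(train, test_instance):
    """train: (n_samples, n_features) with NaNs; test_instance: (n_features,)."""
    col_means = np.nanmean(train, axis=0)
    filled_train = np.where(np.isnan(train), col_means, train)
    filled_test = np.where(np.isnan(test_instance), col_means, test_instance)
    # Distance from the test instance to every training sample.
    return np.linalg.norm(filled_train - filled_test, axis=1)
```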
You may wish to start here:
http://www.netflixprize.com//community/viewtopic.php?id=1283
You're going to have to dig for a bit. Simon Funk had a slightly different approach, but it was more specific to SVDs. You can find it here: http://www.netflixprize.com//community/viewtopic.php?id=1283
He calls them blank spaces if you want to skip to the relevant sections.
Good luck!
If you work in a very high-dimensional space, it is better to do dimensionality reduction using SVD, LDA, pLSA or similar on all available data, and then train the algorithm on data transformed that way. Some of those algorithms are scalable, so you can find implementations in the Mahout project. Personally, I prefer more general features over such transformations, because they make debugging and feature selection easier. For that purpose, combine features, use stemmers, and think more generally.
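A minimal sketch of that reduce-then-neighbour idea using scikit-learn rather than Mahout; the toy sparse data, component count, and neighbour count are illustrative:

```python
# Sketch: project sparse data into a low-dimensional dense space with
# truncated SVD, then run nearest-neighbour search there.
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

X = sparse_random(1000, 5000, density=0.01, format="csr", random_state=0)

svd = TruncatedSVD(n_components=50, random_state=0)
X_dense = svd.fit_transform(X)                 # sparse -> 50-dim dense

knn = NearestNeighbors(n_neighbors=5).fit(X_dense)
# Neighbours of the first sample (which includes the sample itself).
distances, indices = knn.kneighbors(svd.transform(X[:1]))
print(indices)
```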

Relating machine learning techniques to solving optimization problems

Consider an optimization problem of some dimension n: given a set of linear equations (or inequalities) as constraints on the inputs, forming a convex region, find the maximum/minimum value of some expression that is a linear combination of the inputs (or dimensions).
For larger dimensions, these optimization problems take a long time to solve exactly.
So, can we use machine learning techniques to get an approximate solution in less time?
If we can, what should the training set look like?
Do you mean "How big should the training set be?" If so, then that is very much a "how long is a piece of string" question. It needs to be large enough for the algorithm being used, and to represent the data that is being modeled.
This doesn't strike me as being especially focused on machine learning, as is typically meant by the term anyway. It's just a straightforward constrained optimization problem. You say that it takes too long to find solutions now, but you don't mention how you're trying to solve the problem.
The simplex algorithm is designed for this sort of problem, but it's exponential in the worst case. Is that what you're trying that's taking too long? If so, there are tons of metaheuristics that might perform well. Tabu search, simulated annealing, evolutionary algorithms, variable depth search, even simple multistart hill climbers. I would probably try something along those lines before I tried anything exotic.
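For reference, what the question describes is a linear program, and here is a minimal sketch of the exact-solver baseline with SciPy (recent versions use the HiGHS solvers); the toy objective and constraints are illustrative:

```python
# Sketch: maximize x + 2y subject to x + y <= 4, x <= 3, x >= 0, y >= 0.
# linprog minimizes, so the objective is negated.
from scipy.optimize import linprog

result = linprog(c=[-1, -2],
                 A_ub=[[1, 1], [1, 0]],
                 b_ub=[4, 3],
                 bounds=[(0, None), (0, None)],
                 method="highs")
print(result.x, -result.fun)   # optimal point (0, 4) and maximum 8
```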
