I had never used R before, but I have spent the last few months working through spatstat to analyse cell distributions in cancer tissue. I extract the xy co-ordinates of positively stained cells for two different cell populations to generate a marked ppp. Exploring the data visually through density plots, K/G cross functions etc. identifies some cases where the distributions of the two cell populations differ in terms of intensity. Furthermore, in some cases the cells appear to be exclusive of each other, whilst other cases are characterised by 'hotspots' shared by both populations. However, I can't work out how best to demonstrate this apart from having side-by-side images as illustrations. Is there any way to integrate two density plots or create a graphic of shared regions of intensity? Or is there another function in spatstat that I have missed?
Apologies if this is a silly question, but I am struggling to make progress and would be very grateful for advice.
Regards,
Matt
Tests for significance (= non-constant ratio of the two intensities) are provided in the sparr package, which extends spatstat. Try the function tolerance.
I have 96 features and the labels are represented by 1 and -1 for inputting to a deep learning model.
1- PCA
Here the three axes represent the first three principal components. The blue cloud represents label 1 and the red cloud represents label -1.
Even though we can identify two different clouds visually, they are stuck together. I think this could cause a problem during the training phase.
2- t-SNE
For the same features and labels with t-SNE, we can still distinguish two clouds, but again they are stuck together.
Questions :
1- Can the fact that the two clouds of dots are stuck together affect the accuracy (%) during the training and testing phases?
2- When we remove the red and blue colors, we are left with more or less one big cloud. Is there a way to work around the problem of the two clouds being ''stuck'' together?
What you call sticking together means that, in this space, your data isn't linearly separable. It doesn't seem to be nonlinearly separable either. With these components, I would expect you to get poor accuracy for sure.
The way to work around the problem is more or different data. You have some options.
1) What about including more principal components? Maybe, 4, 5, 10 components would solve your problem. That might not work depending on your dataset, but it's the most obvious thing to try first.
2) You could try alternative matrix decomposition techniques. PCA isn't the only one. There's NMF, kernel PCA, LSA, and many others. Which one works best for you will fundamentally be determined by the distribution of your data.
3) Use any other type of feature selection. Frankly, 96 isn't that many to begin with. You intend to do deep learning? Wouldn't you normally put all 96 features into a deep learning model? There are many other ways to do feature selection besides matrix decomposition if you need to.
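As a rough illustration of option 1, here is a minimal sketch of projecting onto a variable number of principal components with NumPy and checking how much variance each choice retains. The data is a random stand-in for the 96-feature set, and `pca_project` is a hypothetical helper name, not something from the question. Retained variance does not guarantee class separability, so this only shows the mechanics of trying more components.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the first k principal components."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    # SVD of the centred data: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)        # variance ratio per component
    return Xc @ Vt[:k].T, np.cumsum(explained)[:k]

# Synthetic stand-in for the 96-feature dataset (assumption, not real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 96))

for k in (3, 5, 10):
    Z, cum = pca_project(X, k)
    print(k, Z.shape, round(float(cum[-1]), 3))  # components kept, shape, variance retained
```

The projected matrix `Z` could then be fed to the classifier in place of the first three components only.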
Good luck.
I have implemented a PCA in order to assign rotation information to connected 2D points extracted from images (edge fragments, see data points in image below for examples). I want the information to be robustly reproducible under rotation of the data so that I can use it for recognition purposes (comparable to 1). For this purpose, I want the principal components (eigenvectors) to rotate with the points (+- 180 deg).
My implementation includes mean centring of the data. I have also tested the OpenCV implementation and one in Python, which yield the same results. This is why I assume that my implementation is correct and that the problem is the method itself. I had quite good results for other 2D distributions. Nonetheless, for these specific data points, it does not seem to work.
I have done all the tests with and without normalization to the standard deviation (ie., dividing the data of the x and y values by their standard deviations).
Here are my results for different rotations of the data (extracted from images):
PCA Results
As can be seen, the method does not find a reproducible rotation. The data is affected by quantization (because it is extracted from images), which is why I suspected that this was the origin of the problem. I therefore repeated the experiment with added random noise (4th column). As can be seen, quantization does not seem to be the problem.
I have no precise idea how to explain the displayed effects. I note that the general orientation of the principal axes seems to be similar in the first and second row, respectively. I think that this means something, but what exactly? Can I somehow solve the problem or are there possibly better methods for such a problem? Due to some preprocessing it can be assumed that there are no outliers.
Thanks for your help!
For symmetrical shapes like the ones you have shown, you can try a symmetry detector like this: https://github.com/subokita/Sandbox/tree/master/FSD
On examples it gives results like this:
Hi! I'm kinda new to OpenCV and Image processing. I've tried following approaches until now, but I believe there's gotta be a better approach.
1). Finding the color range (HSV) manually, using GColor2/the Gimp tool/a trackbar, from a reference image which contains a single fruit (banana) with a white background. Then I used inRange(), findContours(), drawContours() on both the reference banana image & the target image (fruit platter), and matchShapes() to compare the contours in the end.
It works fine as long as the color range chosen is appropriate. (See 2nd image). But since these fruits don't have a uniform solid color, this didn't seem like an ideal approach to me. I don't want to hard-code the color range (Scalar values) inside inRange().
2). Manual thresholding and contour matching.
Same issue as (1). Don't wanna hard-code the threshold value.
3). OTSU thresholding and canny edge detection.
Doesn't work well for banana, apple and lemon.
4). Dynamically finding colors. I used the cropped banana reference image and calculated the mean & standard deviation of the image.
I don't know how to ignore the white background pixels in my mean/std-dev calculation without looping through each (x, y) pixel. Any suggestions on this are welcome.
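One way to skip the white background without an explicit pixel loop is boolean-mask indexing. The sketch below is a minimal NumPy version under stated assumptions: `masked_mean_std` and the `white_thresh` cut-off are hypothetical, and the tiny array stands in for the cropped reference image.

```python
import numpy as np

def masked_mean_std(img, white_thresh=250):
    """Per-channel mean and std over the non-white pixels of an H x W x 3 array."""
    # A pixel counts as background when every channel is near white.
    # white_thresh is a hypothetical cut-off; tune it for the real image.
    foreground = ~np.all(img >= white_thresh, axis=2)
    pixels = img[foreground].astype(np.float64)   # shape (N, 3)
    if pixels.size == 0:
        raise ValueError("no foreground pixels found")
    return pixels.mean(axis=0), pixels.std(axis=0)

# Tiny synthetic example: white canvas with a 2 x 2 coloured patch
img = np.full((4, 4, 3), 255, dtype=np.uint8)
img[1:3, 1:3] = (20, 200, 220)   # BGR order, as OpenCV would load it
mean, std = masked_mean_std(img)
print(mean, std)
```

OpenCV's `meanStdDev()` also accepts an optional mask argument, so the same foreground mask (converted to an 8-bit image) could be passed there instead.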
5). Haar Cascade training gives inaccurate results. (See the image below). I believe proper training might give better results. But not interested in this for now.
Other approaches I’m considering:
6). Using floodfill to find all the connected pixels and calculating the average and standard deviation of the same.
I haven't been successful in this. I'm not sure how to get all the connected pixels. I dumped the mask (imwrite) and got the banana (from the reference banana image) in black & white form. Any suggestions on this are welcome.
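To make the idea of "all the connected pixels" concrete, here is a minimal sketch of a breadth-first flood fill over a small grayscale grid, followed by the mean and standard deviation of the collected region. It is a plain-Python illustration, not OpenCV's floodFill(); the function name `flood_region`, the 4-connectivity, and the tolerance `tol` are assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev

def flood_region(gray, seed, tol=10):
    """Return sorted (row, col) pixels 4-connected to seed and within tol of the seed value."""
    rows, cols = len(gray), len(gray[0])
    seed_val = gray[seed[0]][seed[1]]
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours of the current pixel
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                if abs(gray[nr][nc] - seed_val) <= tol:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return sorted(seen)

gray = [
    [255, 255, 255, 255],
    [255, 100, 104, 255],
    [255,  98, 255, 255],
    [255, 255, 255, 102],   # similar value, but not connected to the seed
]
region = flood_region(gray, seed=(1, 1))
values = [gray[r][c] for r, c in region]
print(region, mean(values), pstdev(values))
```

With a real image, the mask written out by floodFill() plays the role of `region`: its non-zero pixels select the values to average.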
7). Hist backprojection:- not sure how it would help me.
8). K-Means: not tried yet. Let me know if it's better than step (4).
9). meanshift/camshift → not sure whether it will help. Suggestions are welcome.
10). feature detection -- SIFT/SURF -- not tried yet.
Any help, tips, or suggestions will be highly appreciated.
Answers to such generic questions (object detection), especially to ones like this that are very active research topics, essentially boil down to a matter of preference. That said, of the 10 "approaches" you mentioned, feature detection/extraction is probably the one deserving the most attention, as it's the fundamental building block of a variety of computer vision problems, including but not limited to object recognition/detection.
A very simple but effective approach you can try is the Bag-of-Words model, very commonly used in early attempts at fast object detection, with all global spatial relationship information lost.
The recent trend in object detection research, from what I have observed in annual computer vision conference proceedings, is to encode each object as a graph that stores feature descriptors in the nodes and spatial relationship information in the edges. Part of the global information is thus preserved, as we can now match not only the distance between feature descriptors in feature space but also the spatial distance between them in image space.
One common pitfall specific to the problem you described is that the homogeneous texture of banana and apple skins may not yield a healthy distribution of features, and most features you detect will lie on the intersections of (most commonly) 3 or more objects, which are not generally regarded as "good" features. For this reason I suggest looking into superpixel object recognition approaches (Just Google it. Seriously.), so that the mathematical model of the class "Apple" or "Banana" will be a block of interconnected superpixels, stored in a graph, with each edge storing spatial relationship information and each node storing information about the color distribution etc. of the neighborhood specified by the superpixel. Recognition then becomes a (partial) graph matching problem, or a problem related to probabilistic graphical models, on which a lot of research already exists.
I need to test whether two histograms are significantly different from each other in terms of mean and variance. Both histograms consist of only two bars. (When) should I use the two-sample Kolmogorov-Smirnov test or (Pearson's) chi-square test? How big should the sample size be for each? Are there better alternatives?
For future reference: the Kolmogorov-Smirnov (KS) test is for continuous data, while the chi-square (CS) test works for categorical data. Therefore, for comparing two histograms, CS works better. However, it also needs a sufficient sample size: as a rule of thumb, an expected count of at least 5 observations per cell.
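For two histograms with two bars each, the chi-square test reduces to a test on a 2 x 2 table of counts. The following is a minimal standard-library sketch; the function name `chi_square_2x2` and the counts are made up for illustration, and the p-value uses the closed form for one degree of freedom.

```python
import math

def chi_square_2x2(a, b):
    """Pearson chi-square test of homogeneity for two 2-bar histograms.

    a and b are (count_bar1, count_bar2) for each sample.
    Returns (statistic, p_value, smallest_expected_count).
    """
    table = [list(a), list(b)]
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_tot)
    stat, min_exp = 0.0, float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            if expected == 0:
                raise ValueError("a row or column has no observations")
            min_exp = min(min_exp, expected)
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, min_exp

# Hypothetical counts: sample A has 30/70 in its two bars, sample B has 50/50
stat, p, min_exp = chi_square_2x2((30, 70), (50, 50))
print(stat, p, min_exp)
```

The third return value makes the rule of thumb easy to check: if the smallest expected count falls below 5, the chi-square approximation becomes unreliable.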
I'm searching for algorithms/methods that are used to classify or differentiate between two outdoor environments. Given an image with vehicles, I need to be able to detect whether the vehicles are in a natural desert landscape, or whether they're in the city.
I've searched but can't seem to find relevant work on this. Perhaps because I'm new at computer vision, I'm using the wrong search terms.
Any ideas? Is there any work (or related) available in this direction?
I'd suggest reading Prince's Computer Vision: Models, Learning, and Inference (free PDF available). It covers image classification, as well as many other areas of CV. I was fortunate enough to take the Machine Vision course at UCL which the book was designed for and it's an excellent reference.
Addressing your problem specifically, a simple MAP or MLE model on pixel colours will probably provide a reasonable benchmark. From there you could look at more involved models and feature engineering.
Seemingly complex classifications similar to "civilization" vs "nature" might be able to be solved simply with the help of certain heuristics along with classification based on color. Like Gilevi said, city scenes are sure to contain many flat lines and right angles, while desert scenes are dominated by rolling dunes and so on.
To address this directly, you could use OpenCV's Hough lines algorithm on the images (tuned for this problem, of course) and look at:
a) how many lines are fit to the image at a given threshold
b) of the lines that are fit, what the expected angle between two of them is; if the angles are uniformly distributed then chances are it's nature, but if the angles are clumped around multiples of pi/2 (more right angles and straight lines) then it is more likely to be a cityscape.
Color components, textures, and degree of smoothness (variation or gradient of the image) may differentiate the desert and city backgrounds. You may also try the Hough transform, which is used for line detection; lines can be viewed as a city feature (buildings, roads, bridges, cars, etc.).
I would recommend this research, which is very similar to your project. The article presents a comparison of different classification techniques to obtain a scene classifier (urban, highway, and rural) based on images.
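Heuristic (b) can be sketched without any image code: given the orientation angles of the fitted lines (for example the theta values that a Hough transform returns, in radians), count how many line pairs meet at roughly a multiple of pi/2. The function name `right_angle_fraction`, the 5-degree tolerance, and the sample angles are assumptions for illustration.

```python
import math
from itertools import combinations

def right_angle_fraction(angles, tol=math.radians(5)):
    """Fraction of line pairs whose relative angle is within tol of a multiple of pi/2."""
    pairs = list(combinations(angles, 2))
    if not pairs:
        return 0.0
    hits = 0
    for a, b in pairs:
        d = abs(a - b) % (math.pi / 2)     # fold the difference into [0, pi/2)
        d = min(d, math.pi / 2 - d)        # distance to the nearest multiple of pi/2
        if d <= tol:
            hits += 1
    return hits / len(pairs)

# Hypothetical line orientations in radians
city = [0.0, 0.02, math.pi / 2, math.pi / 2 + 0.03, 0.01]   # near-horizontal and near-vertical
nature = [i * math.pi / 7 for i in range(7)]                # spread evenly over [0, pi)

print(right_angle_fraction(city), right_angle_fraction(nature))
```

For uniformly distributed angles the fraction stays close to tol / (pi/4), about 0.11 at a 5-degree tolerance, so values well above that hint at a man-made scene.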
See my answer here: How to match texture similarity in images?
You can use the same method. I already solved in the past problems like the one you described with this method.
The problem you are describing is that of scene categorization. Search for works that use the SUN database.
However, you are only working with two relatively different categories, so I don't think you need to kill yourself implementing state-of-the-art algorithms. I think taking GIST features + color features and training a non-linear SVM would do the trick.
Urban environments are usually characterized by a lot of horizontal and vertical lines, and GIST captures that information.