I know tl;dr;
I'll try to explain my problem without bothering you with tons of crappy code. I'm working on a school assignment. We have pictures of smurfs and we have to find them with foreground/background analysis. I have a decision tree in Java that starts with all the data (HSV histograms) in one single node. It then tries to find the best attribute (from the histogram data) to split the tree on, executes the split, and creates a left and a right subtree with the data divided between them. All the data is still kept in the main tree so that the Gini index can be calculated.
So after 26 minutes of analysing smurfs my PC has a giant tree with splits and other data. Now my question is: can anyone give me a global idea of how to analyse a new picture and determine which pixels could be "smurf pixels"? I know I have to generate a new array of data points with the HSV histograms of the new smurf, and then I need to use the generated tree to determine which pixels belong to a smurf.
Can anyone give me a pointer on how to do this?
Some additional information: every decision tree object has a Split object that holds the best attribute to split on, the value to split on, and a Gini index.
If I need to provide any additional information, I'd like to hear it.
OK. Basically, in unoptimized pseudo-code, in order to label pixels in a new image:
For each pixel in the new image:
    Calculate the pixel's HSV features.
    Recursively, starting from the tree's root:
        Is this a leaf? If it is, give the pixel the dominant label of the node.
        Otherwise, check the splitting criterion against the pixel's features, and go to the right or left child accordingly.
I hope this makes sense in your context.
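The traversal above could be sketched roughly as follows. This is only an illustration in Python rather than the asker's Java, and the `Node` layout (`attribute`, `threshold`, `label`, `left`, `right`) is a hypothetical stand-in for the Decision Tree and Split objects described in the question. It assumes "less than or equal" goes left, and it uses a loop instead of recursion, which visits the same nodes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a trained binary decision tree (hypothetical layout)."""
    label: Optional[str] = None        # dominant label; used on leaves
    attribute: Optional[int] = None    # index of the feature to test
    threshold: Optional[float] = None  # split value for that feature
    left: Optional["Node"] = None      # taken when feature <= threshold
    right: Optional["Node"] = None     # taken when feature > threshold

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def classify(node: Node, features: Sequence[float]) -> str:
    """Walk from the given node down to a leaf and return the leaf's label."""
    while not node.is_leaf():
        if features[node.attribute] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


def label_image(root: Node, pixel_features):
    """Label every pixel; pixel_features is a 2D grid of feature vectors."""
    return [[classify(root, f) for f in row] for row in pixel_features]
```

With a one-split tree such as `Node(attribute=0, threshold=0.5, left=Node(label="background"), right=Node(label="smurf"))`, calling `label_image` on a grid of per-pixel feature vectors returns a grid of labels of the same shape.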
I have a project to build a 3D model of the spinal roots in order to simulate the stimulation by an electrode. For the moment I've been handed two things: the extracted positions of the spinal roots (from the CT scans) and the selected segments out of the points (see both pictures down below). The data I'm provided is in 3D and all the segments are clearly distinct, although it does not look like it on the figures below as they are zoomed out.
Points and segments extracted from the spinal cord CT scans:
Selected segments out of the points:
I'm now trying to connect these segments so as to have the centrelines for all the spinal roots at the end. The segments are not classified, simply of different colors to differentiate them on the plot. The task is then about vertically connecting the segments that look to be part of the same root path.
I've been reviewing the literature on how I could tackle that issue. As I'm still quite new to the field I don't have much intuition on what could work and what could not. I have two subtasks to solve here, connecting the lines and classifying the roots, and while connecting the segments after classification seems no big deal, classifying them seems decently harder. So I'm not sure in which order to proceed.
Here are the few options I'm considering to deal with the task :
Use a Kalman filter to extract the vertical lines from the selected segments and the missing parts
Use a Hough transform to detect vertical lines, by expressing the spinal root segments in the parametric space, seeing how they cluster, and checking whether anything can be inferred from there.
Apply some sort of SVM classification algorithm on the segments to classify them by roots. I could characterize each segment by its orientation and position, and classify them based on similarities in the parameters I'm selecting, and then connect the segments. Or use the endpoint position of each segment and connect it to one of the nearest neighbours if their orientation/position is matching.
I'm open to any suggestions, any piece of advice, or any other ideas on how to deal with the current problem.
Thanks for taking your time to read this.
We have fixed stereo pairs of cameras looking into a closed volume. We know the dimensions of the volume and have the intrinsic and extrinsic calibration values
for the camera pairs. The objective is to identify the 3D positions of multiple duplicate objects accurately.
This naturally leads to what is described as the correspondence problem in the literature. We need a fast technique to match ball A from image 1 with ball A from image 2 and so on.
At the moment we use the properties of epipolar geometry (fundamental matrix) to match the balls from different views in a crude way. It works OK when the objects are sparse,
but gives a lot of false positives if the objects are densely scattered. Since ball A in image 1 can lie anywhere on the epipolar line going across image 2, it leads to mismatches
when multiple objects lie on that line and look similar.
Is there a way to re-model this into a 3D line intersection problem or something? Since ball A in image 1 can only take a bounded range of 3D values, is there a way to represent
it as a line in 3D, and do an intersection test to find the closest matching ball in image 2?
Or is there a way to generate a sparse list of 3d values which correspond to each 2d grid of pixels in image 1 and 2, and do a intersection test
of these values to find the matching objects across two cameras?
Because the objects can be identical, OpenCV feature matching algorithms like FLANN and ORB don't work.
Any ideas in the form of formulae or code are welcome.
Thanks!
Sak
You've set yourself quite a difficult task. Because one point can occlude another in a view, it's not generally possible even to count the number of points. If each view has two points, but those points fall on the same epipolar line on the other view, then you can count anywhere between 2 and 4 points.
Assuming you want to minimize the points, this starts to look like Minimum Vertex Cover in a dense bipartite graph, with each edge representing the association of a point from each view, and the weight of each edge taken from the registration error of associating the corresponding points (vertices) from each view. MVC is, of course, NP-hard, and if you treat the problem as a general MVC problem then you'll never do better than O(n^2) because that's how many edges there are to examine.
Your particular MVC problem might have structure that can be exploited to perform a more efficient approximation. In particular, I might suggest calculating the epipolar lines in one view, ordering them by angle from the epipole, and similarly sorting the points in that view from the epipole. You can then iterate over the two sorted lists roughly in parallel, greedily associating each point with a nearby epipolar line. Then you can do the same in the other view, but only looking at points in that view which had not yet been associated during the previous pass. I think that a more regimented and provably optimal approach might be possible with dynamic programming (particularly if you strictly bound the registration error) which wouldn't require the second pass, but I can't sketch it out offhand.
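As a rough illustration of the greedy idea, here is a simplified sketch. It is not the angle-sorted sweep described above: it builds the full table of registration errors (the distance from each point in view 2 to the epipolar line of each point in view 1) and then pairs points cheapest-first, so it still examines all O(n^2) edges. The function names and the `max_dist` threshold are assumptions, and `F` is assumed to follow the convention x2^T F x1 = 0.

```python
import numpy as np


def epipolar_distances(F, pts1, pts2):
    """Return D where D[i, j] is the pixel distance from point j of view 2
    to the epipolar line induced by point i of view 1."""
    pts1 = np.asarray(pts1, dtype=float)
    pts2 = np.asarray(pts2, dtype=float)
    x1 = np.column_stack([pts1, np.ones(len(pts1))])  # homogeneous, (n1, 3)
    x2 = np.column_stack([pts2, np.ones(len(pts2))])  # homogeneous, (n2, 3)
    lines = x1 @ F.T                            # row i is the line F @ x1[i]
    norms = np.hypot(lines[:, 0], lines[:, 1])  # sqrt(a^2 + b^2) per line
    return np.abs(lines @ x2.T) / norms[:, None]


def greedy_match(F, pts1, pts2, max_dist=2.0):
    """Pair points across views, taking the smallest errors first."""
    D = epipolar_distances(F, pts1, pts2)
    used1, used2, pairs = set(), set(), []
    for flat in np.argsort(D, axis=None):       # ascending registration error
        i, j = divmod(int(flat), D.shape[1])
        if D[i, j] > max_dist:
            break                               # every remaining edge is worse
        if i not in used1 and j not in used2:
            pairs.append((i, j))
            used1.add(i)
            used2.add(j)
    return pairs
```

Points that share an epipolar line remain ambiguous under this scheme; the sketch simply takes whichever candidate has the smaller error and leaves unmatched points out of the result.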
For different types of objects it's easy to find the match using sum-of-absolute-differences. For similar objects, a good solution could be worth publishing as a paper. Anyway, here's one quick algorithm:
1. Detect the two balls in the first image (using object detection methods).
2. Divide the image into two segments, each containing one ball.
3. Repeat steps 1 & 2 for the second image.
4. The direction of the segments in the two images should give the correspondence of the two balls.
Try this, it should work for two balls.
Our core aim is:
A. to use Image Processing to read/scan an architectural Floor Plan Image (exported from a CAD software)
B. to extract the various lines and curves and group them into Structural Entities like walls, columns, beams etc. (e.g. ‘Wall_01’, ‘Beam_03’ and so on)
C. to extract the dimensions of each of these Entities based on the scale and the length of the lines in the Floor Plan Image (since AutoCAD lines are dimensionally accurate as per the specified Scale)
D. and to associate each of these Structural Entities (and their dimensions) with a ‘Room’.
We have flexibility in that we can define the exact shapes of the different Structural Entities in the Floor Plan Image (rectangles for doors, rectangles with hatch lines for windows etc.) and export them into a set of images for each Structural Entity (e.g. one image for walls, one for columns, one for doors etc.).
For point ‘B’ above, our current approach based on OpenCV is as follows:
Export each Structural Entity into its own image
Use Canny and HoughLine Transform to identify lines within the image
Group these lines into individual Structural Elements (like ‘Wall_01’)
We have managed to detect/identify the line segments using Canny+HoughLine Transform with a reasonable amount of accuracy.
Original Floor Plan Image
Individual ‘Walls’ Image:
Line Segments identified using Canny+HoughLine:
(I don't have enough reputation to post images yet)
So the current question is - what is the best way to group these lines together into a logical Structural Entity like ‘Wall_01’?
Moreover, are there any specific OpenCV based techniques that can help us group the line segments into logical Entities? Are we approaching the problem correctly? Is there a better way to solve the problem?
Update:
Adding another image of a valid wall input.
You mention "exported from a CAD software". If the export format is PDF, it contains vector data for all graphic elements. You might be better off trying to extract and interpret that. Seems a bit cumbersome to go from a vector format to a pixel format which you then try to bring back to a numerical model.
If you have clearly defined constraints as to what your walls, doors, etc. will look like in your image, you should use exactly those. If you are generating the CAD exports yourself, modify the settings there so as to facilitate this.
For instance, the doors are all brown and are closed figures.
Same for grouping the walls. In the figures, it looks like you can group based on proximity (i.e, anything within X pixels of each other is one group). Although, the walls to the right of the text 'C7' and below it may get grouped into one.
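A minimal sketch of such proximity grouping is shown below, assuming each detected line segment is given as a pair of endpoints `((x1, y1), (x2, y2))`. It merges segments transitively with a small union-find. Note that it only compares endpoints, which is an assumed simplification: a segment that ends in the middle of another one (a T-junction) would need a true segment-to-segment distance. `max_gap` plays the role of the "X pixels" above.

```python
from itertools import combinations
from math import hypot


def endpoint_gap(seg_a, seg_b):
    """Smallest distance between any endpoint of seg_a and any of seg_b."""
    return min(hypot(px - qx, py - qy)
               for (px, py) in seg_a for (qx, qy) in seg_b)


def group_segments(segments, max_gap):
    """Group segment indices whose endpoints lie within max_gap pixels,
    directly or through a chain of other segments."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in combinations(range(len(segments)), 2):
        if endpoint_gap(segments[a], segments[b]) <= max_gap:
            parent[find(a)] = find(b)      # merge the two groups

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned list of indices could then be given a name such as ‘Wall_01’, ‘Wall_02’ and so on. As noted above, walls that come close to each other will be merged if `max_gap` is chosen too large.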
If you do not have clear definitions, you may be looking at a generic image recognition problem, which means AI or machine learning. This would require a large variety of inputs for it to learn from, and may get very complex.
I want to design an algorithm that would find matches in images of the same apartment, when put up by different real estate agents.
The photos are taken at roughly the same time, so the interior of the rooms should not change much, but of course every agent takes different pictures from different angles, etc.
(TL;DR: an apartment goes up for sale, different real estate agents come in and take their own pictures, and I want to know whether the pictures from the various agents show the same place.)
I know that the choice of image processing and recognition algorithms depends heavily on the use case, so could you point me in the right direction given my use case?
http://reality.bazos.sk/inzerat/56232813/Prenajom-1-izb-bytu-v-sirsom-centre.php
http://reality.bazos.sk/inzerat/56371292/-PRENAJOM-krasny-1i-byt-rekonstr-Kupeckeho-Ruzinov-BA-II.php
You can actually use Clarifai's Custom Training API endpoint, fairly simple and straightforward. All you would have to do is train the initial image and then compare the second to it. If the probability is high, it is likely the same apartment. For example:
In JavaScript, to declare a positive example:
clarifai.positive('http://example.com/apartment1.jpg', 'firstapartment', callback);
And a negative is:
clarifai.negative('http://example.com/notapartment1.jpg', 'firstapartment', callback);
You don't necessarily have to provide a negative, but it can only help. Then, when you are comparing images to the first apartment, you do:
clarifai.predict('http://example.com/someotherapartment.jpg', 'firstapartment', callback);
This will give you a probability regarding the likeness of the photo to what you've trained ('firstapartment'). This API is basically doing machine learning without the hassle of the actual machine. Clarifai's API also has a tagging input that is extremely accurate with some basic tags. The API is free for a certain number of calls/month. Definitely worth it to check out for this case.
As user Shaked mentioned in a comment, this is a difficult problem. Even if you knew the position and orientation of each camera in space, and also the characteristics of each camera, it wouldn't be a trivial problem to match the images.
A "bag of words" (BoW) approach may be of use here. Rather than try to identify specific objects and/or deduce the original 3D scene, you determine what "feature descriptors" can distinguish objects from one another in your image sets.
https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision
Imagine you could describe the two images by the relative locations of textures and colors:
horizontal-ish line segments at far left
red blob near center left
green clumpy thing at bottom left
bright round object near top left
...
then for a reasonably constrained set of images (e.g. photos just within a certain zip code), you may be able to yield a good match between the two images above.
The Wikipedia article on BoW may look a bit daunting, but I think if you hunt around you'll find an article that describes "bag of words" for image processing clearly. I've seen a very good demo of a BoW approach used to identify objects such as boats and delivery vans in arbitrary video streams, and it worked impressively well. I wish I had a copy of the presentation to pass along.
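To make the idea a little more concrete, here is a minimal sketch of the histogram step of a bag-of-words comparison. It assumes you already have local feature descriptors for each image and a "vocabulary" of visual words (typically the cluster centres from running k-means over many training descriptors); both are inputs here, and the function names are made up for illustration. Note that a plain BoW histogram discards where in the image each feature occurred.

```python
import numpy as np


def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and return the
    normalised word-count histogram."""
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Squared Euclidean distance from every descriptor to every word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()


def cosine_similarity(h1, h2):
    """Similarity of two histograms; values near 1.0 mean a similar word mix."""
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2)))
```

Two photos of the same room would be expected to produce histograms with a high cosine similarity, while photos of different rooms should score lower. How well that separates apartments in practice depends on the descriptors and the vocabulary size.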
If you don't expect the scene to change much, you could try the standard first step of any structure-from-motion algorithm to establish a notion of similarity between a pair of images. A pair of images is considered similar if it contains more than a threshold number of matching image features that also satisfy the geometrical constraint of the scene. For a general scene, that geometrical constraint is given by a fundamental matrix F computed from a subset of the matching features.
Here are the steps. I have inserted the OpenCV method for each step, but you could write your own methods too:
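Read the pair of images. Use img1 = cv2.imread(filename1) and img2 = cv2.imread(filename2).
Use SIFT/SURF to detect image features/descriptors in both images.
sift = cv2.xfeatures2d.SIFT_create()  # cv2.SIFT_create() in OpenCV 4.4 and later
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
Match features using the descriptors.
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)  # SIFT/SURF descriptors are float vectors, so use the L2 norm; NORM_HAMMING is for binary descriptors such as ORB
matches = bf.match(des1, des2)
Use RANSAC to compute the fundamental matrix.
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])  # requires import numpy as np
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3, 0.99)
mask marks the inliers (1 for an inlier, 0 for an outlier). Simply count the nonzero entries to determine whether the number of matches satisfying the geometrical constraint is large enough.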
CAUTION: In case of a planar scene, we use homography instead of a fundamental matrix and the steps described above work out pretty nicely because homography takes a point to a corresponding point in the other image. However, Fundamental matrix takes a point to the corresponding epipolar line in the other image, which makes the entire process a bit less stable. So I would recommend trying these steps a few more times with a little bit of jitter to the feature locations and collating the evidence over more than one trial to make the decision. You can also use more advanced steps to introduce robustness to this process but only if the steps described above don't yield the results you need.
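The "jitter and collate" suggestion could look roughly like the sketch below. It is deliberately independent of OpenCV: `count_inliers` is a hypothetical callable that you would supply yourself (for example a wrapper that runs the RANSAC step on the two point lists and returns the number of inliers), and `trials`, `sigma` and `min_inliers` are assumed tuning parameters, not recommended values.

```python
import random


def same_scene_vote(pts1, pts2, count_inliers, min_inliers,
                    trials=10, sigma=0.5, seed=0):
    """Return the fraction of jittered trials whose inlier count reaches
    min_inliers. pts1 and pts2 are lists of matched (x, y) locations."""
    rng = random.Random(seed)

    def jitter(pts):
        # Perturb every feature location with Gaussian noise of std sigma.
        return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                for x, y in pts]

    passed = sum(
        count_inliers(jitter(pts1), jitter(pts2)) >= min_inliers
        for _ in range(trials)
    )
    return passed / trials
```

A pair of images could then be accepted as the same scene when, say, most of the trials pass, rather than relying on a single RANSAC run.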
Is there any way to reduce the dimension of the following features from 2D coordinate (x,y) to one dimension?
Yes. In fact, there are infinitely many ways to reduce the dimension of the features. It's by no means clear, however, how they perform in practice.
Feature reduction is usually done via principal component analysis (PCA), which involves a singular value decomposition. It finds the directions with the highest variance, that is, those directions in which "something is going on".
In your case, a PCA might find the black line as one of the two principal components:
The projection of your data onto this one-dimensional subspace then yields the reduced form of your data.
Even by eye one can see that the three feature sets can be separated on this line; I coloured the three ranges accordingly. For your example, it is even possible to separate the data sets completely. A new data point would then be classified according to the range in which its projection onto the black line (or, more generally, onto the principal component subspace) lies.
Formally, one could obtain a division with further methods that use the PCA-reduced data as input, such as for example clustering methods or a K-nearest neighbour model.
So, yes, in case of your example it could be possible to make such a strong reduction from 2D to 1D, and, at the same time, even obtain a reasonable model.
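For illustration, a minimal NumPy sketch of this 2D-to-1D reduction follows. It centres the data, takes the first right singular vector as the principal direction (the "black line"), and projects every point onto it. The function name is made up, and the sign of the returned direction is arbitrary, as is usual for PCA.

```python
import numpy as np


def pca_project_1d(points):
    """Project 2D points onto their first principal component.

    Returns the 1D coordinates, the data mean, and the unit direction."""
    X = np.asarray(points, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    direction = Vt[0]
    return (X - mean) @ direction, mean, direction
```

A new point `p` is reduced the same way, as `(np.asarray(p) - mean) @ direction`, and can then be assigned to whichever range of the line it falls into, or passed to a clustering or K-nearest-neighbour model as mentioned above.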