I need to stitch images without overlaps.
The task should be clearer from the example:
Source:
Target:
Basically I need a method that determines how well two images join to each other.
UPDATE
Using a random forest from the OpenCV library reaches about 80% successful responses. The trained forest scores how well two puzzle pieces fit each other.
Assuming you don't want the software to have a 5-year-old's encyclopedic knowledge of Disney characters - then your match is based on the point at which lines meet?
Just store a list of coordinates where a line hits the edge of a square, then compare each pair of squares, minimising the difference in hit positions.
P.S. Assuming the squares don't rotate, just store, for each side of the square, a list of distances along that side.
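A minimal sketch of this idea in Python, assuming each piece stores, per side, the fractional positions at which lines hit that side (the data layout and names here are just illustrative):

def join_cost(hits_a, hits_b):
    # Compare the hit positions on two edges that would be glued together.
    if len(hits_a) != len(hits_b):
        return float('inf')  # different number of lines: the edges cannot join cleanly
    return sum(abs(a - b) for a, b in zip(sorted(hits_a), sorted(hits_b)))

def best_right_neighbour(piece, candidates):
    # Pick the candidate whose left edge best continues this piece's right edge.
    return min(candidates, key=lambda c: join_cost(piece['right'], c['left']))

p1 = {'right': [0.2, 0.7]}                 # hit positions as fractions of the side length
p2 = {'left': [0.21, 0.69], 'name': 'p2'}  # lines continue almost exactly
p3 = {'left': [0.5], 'name': 'p3'}
print(best_right_neighbour(p1, [p2, p3])['name'])   # -> p2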
You may consider dilating the edges in each fragment, which could help recover the edge pixels lost along the cut lines, and then stitching the fragments from that point.
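If you try that, a minimal OpenCV sketch (the file name and kernel size are placeholders to tune):

import cv2
import numpy as np

gray = cv2.imread('fragment.png', cv2.IMREAD_GRAYSCALE)   # placeholder file name
edges = cv2.Canny(gray, 50, 150)
kernel = np.ones((3, 3), np.uint8)                         # small structuring element
thick_edges = cv2.dilate(edges, kernel, iterations=1)      # thickens thin/broken edges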
I am trying to train a learning model to recognize which part (left/central/right) of a known object is represented in an image, assuming that the model's input can be one of the following:
an image of the left part of the object (the whole left part or some smaller portion of it);
an image of its central part (the whole central part or some smaller portion of it);
an image of its right part (the whole right part or some smaller portion of it).
The position of the object is always fixed, so all images are taken in front of the object, and this will also be the case when the model is asked to make a prediction.
I've collected a few thousand images belonging to the left, central and right parts of the object; for each part, as explained, some images represent the whole part while others represent a smaller portion of it.
Anyway, I'm just interested in predicting which part of the object (left/central/right) the image belongs to, so I've tackled this as a classification task over 3 labels, but I'm wondering whether a convolutional neural network is the best choice for this task or some other approach would be better.
Any suggestion is much appreciated.
Thanks
Since you are trying to distinguish between three independent states, a classification approach is a sensible place to start. You could start with this tutorial and then, as you suggested, add some convolutional layers.
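For instance, a minimal 3-class convolutional network in Keras; the layer sizes and the 128x128 input size are arbitrary placeholders rather than tuned values:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),   # left / central / right
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=10, validation_split=0.1)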
There are alternative approaches: a classification task normally means that each of the wrong answers is equally wrong. If the correct answer is 'left' and your algorithm gives the answer 'right', is this worse than giving the answer 'middle'? If so, you might consider treating this as a regression problem.
Lastly, you may prefer to use the AI stack exchange forum for conceptual questions, as stackoverflow is normally reserved for specific coding questions.
This might be a very broad question, so I'm sorry in advance. I'd also like to point out that I'm new to the CV field, so my insight into it is minimal.
I am trying to find correspondences between points from a FLIR image and a VIS image. I'm currently building 40x40-pixel regions around keypoints and applying the LoG over them. I'm trying to compare them to find the most similar regions.
For example, I have these data sets:
Where the columns represent, in this order:
the image for which I'm trying to find a correspondent
the candidate images
the LoG of the first column
the LoG of the second column
It is very clear, to the human eye, that the third image is the best match for the first set, while the first image is the best match for the second set.
I have tried various ways of expressing a similarity/dissimilarity between these images, such as SSD, cross-correlation, or mutual information, but they all fail to be consistent (they only work in some cases).
Now, my actual question is:
What should I use to express the similarity between images in a more semantic way, such that shapes would be more important in deciding the best match, rather than actual intensities of the pixels? Do you know of any technique that would aid me in my quest of finding these matches?
Thank you!
Note: I'm using OpenCV with Python right now, but the programming language and library is not important.
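For reference, a minimal sketch of the pipeline described above (40x40 patches, LoG, then a zero-mean normalized comparison); the file names, keypoint coordinates and sigma value are placeholders:

import cv2
import numpy as np

def log_patch(img, x, y, size=40, sigma=2.0):
    # Cut a size x size patch around (x, y) and apply a Laplacian of Gaussian.
    half = size // 2
    patch = img[y - half:y + half, x - half:x + half].astype(np.float32)
    return cv2.Laplacian(cv2.GaussianBlur(patch, (0, 0), sigma), cv2.CV_32F)

def zncc(a, b):
    # Zero-mean normalized cross-correlation: less sensitive to absolute intensities.
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

flir = cv2.imread('flir.png', cv2.IMREAD_GRAYSCALE)   # placeholder file names
vis = cv2.imread('vis.png', cv2.IMREAD_GRAYSCALE)
ref = log_patch(flir, 100, 120)
candidates = [(95, 118), (300, 40)]
best = max(candidates, key=lambda p: zncc(ref, log_patch(vis, p[0], p[1])))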
I want to design an algorithm that would find matches in images of the same apartment, when put up by different real estate agents.
Photos are taken at roughly the same time, so the interior of the rooms should not change that much, but of course every agent takes different pictures from different angles, etc.
(TL;DR: an apartment goes up for sale, different real estate agents come in and take their own pictures, and I want to know whether the pictures from the various agents show the same place.)
I know that image processing and recognition algorithm selection highly depends on the use case, so could you point me in the right direction given my use case?
http://reality.bazos.sk/inzerat/56232813/Prenajom-1-izb-bytu-v-sirsom-centre.php
http://reality.bazos.sk/inzerat/56371292/-PRENAJOM-krasny-1i-byt-rekonstr-Kupeckeho-Ruzinov-BA-II.php
You can actually use Clarifai's Custom Training API endpoint, which is fairly simple and straightforward. All you would have to do is train on the initial image and then compare the second to it. If the probability is high, it is likely the same apartment. For example:
In JavaScript, to declare a positive it is:
clarifai.positive('http://example.com/apartment1.jpg', 'firstapartment', callback);
And a negative is:
clarifai.negative('http://example.com/notapartment1.jpg', 'firstapartment', callback);
You don't necessarily have to do a negative, but it could only help. Then, when you are comparing images to the first apartment, you do:
clarifai.predict('http://example.com/someotherapartment.jpg', 'firstapartment', callback);
This will give you a probability regarding the likeness of the photo to what you've trained ('firstapartment'). This API is basically doing machine learning without the hassle of the actual machine. Clarifai's API also has a tagging input that is extremely accurate with some basic tags. The API is free for a certain number of calls/month. Definitely worth it to check out for this case.
As user Shaked mentioned in a comment, this is a difficult problem. Even if you knew the position and orientation of each camera in space, and also the characteristics of each camera, it wouldn't be a trivial problem to match the images.
A "bag of words" (BoW) approach may be of use here. Rather than try to identify specific objects and/or deduce the original 3D scene, you determine what "feature descriptors" can distinguish objects from one another in your image sets.
https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision
Imagine you could describe the two images by the relative locations of textures and colors:
horizontal-ish line segments at far left
red blob near center left
green clumpy thing at bottom left
bright round object near top left
...
then for a reasonably constrained set of images (e.g. photos just within a certain zip code), you may be able to obtain a good match between the two images above.
The Wikipedia article on BoW may look a bit daunting, but I think if you hunt around you'll find an article that describes "bag of words" for image processing clearly. I've seen a very good demo of a BoW approach used to identify objects such as boats and delivery vans in arbitrary video streams, and it worked impressively well. I wish I had a copy of the presentation to pass along.
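If you want to experiment with this, OpenCV's contrib build exposes a basic BoW pipeline; here is a rough sketch (the vocabulary size and file names are placeholders, and the results depend heavily on the training set):

import cv2
import numpy as np

sift = cv2.xfeatures2d.SIFT_create()

# Build a visual vocabulary by clustering descriptors from some training images.
bow_trainer = cv2.BOWKMeansTrainer(200)                     # vocabulary size is a guess
for path in ['apt_a_1.jpg', 'apt_a_2.jpg', 'apt_b_1.jpg']:  # placeholder file names
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = sift.detectAndCompute(img, None)
    if des is not None:
        bow_trainer.add(des)
vocabulary = bow_trainer.cluster()

# Describe each image as a histogram over the vocabulary, then compare histograms.
bow_extractor = cv2.BOWImgDescriptorExtractor(sift, cv2.BFMatcher(cv2.NORM_L2))
bow_extractor.setVocabulary(vocabulary)

def bow_histogram(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return bow_extractor.compute(img, sift.detect(img, None)).ravel()

h1, h2 = bow_histogram('listing1_photo.jpg'), bow_histogram('listing2_photo.jpg')
similarity = float(np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-8))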
If you don't expect the scene to change much, you could try the standard first step of any structure-from-motion algorithm to establish a notion of similarity between a pair of images. A pair of images is similar if it contains more than a threshold number of matching image features that also satisfy the geometric constraint of the scene. For a general scene, that geometric constraint is given by a fundamental matrix F computed from a subset of the matching features.
Here are the steps. I have included the OpenCV method for each step, but you could write your own methods too:
Read the pair of images. Use img = cv2.imread(filename).
Use SIFT/SURF to detect image features/descriptors in both images.
sift = cv2.xfeatures2d.SIFT_create()
kp, des = sift.detectAndCompute(img,None)
Match features using the descriptors (SIFT/SURF descriptors are floating point, so use NORM_L2 rather than NORM_HAMMING, which is meant for binary descriptors).
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = bf.match(des1, des2)
Use RANSAC to compute the fundamental matrix.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3, 0.99)
mask contains all the inliers. Simply count them to determine if the number of matches satisfying geometrical constraint is large enough.
CAUTION: In case of a planar scene, we use homography instead of a fundamental matrix and the steps described above work out pretty nicely because homography takes a point to a corresponding point in the other image. However, Fundamental matrix takes a point to the corresponding epipolar line in the other image, which makes the entire process a bit less stable. So I would recommend trying these steps a few more times with a little bit of jitter to the feature locations and collating the evidence over more than one trial to make the decision. You can also use more advanced steps to introduce robustness to this process but only if the steps described above don't yield the results you need.
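For the planar case mentioned in the caution above, a minimal sketch (the RANSAC threshold and inlier cutoff are guesses to tune):

import cv2

def planar_inlier_count(pts1, pts2, reproj_thresh=3.0):
    # pts1, pts2: Nx2 float32 arrays of matched feature locations from the steps above.
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, reproj_thresh)
    return 0 if mask is None else int(mask.sum())

# e.g. treat the pair as "the same scene" if enough matches survive RANSAC:
# is_same_scene = planar_inlier_count(pts1, pts2) > 30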
I am doing a project on detecting holes in roads. I am using a laser to emit a beam onto the road and a camera to take an image of the road. The image may look like this:
Now I want to process this image and determine whether the line is straight or not, and if it is curved, how big the curve is.
I don't understand how to do this. I have searched a lot but can't find an appropriate approach. Can anyone help me with this?
This is rather complicated and your question is very broad, but let's have a try:
1. First you have to identify the dots in the pixel image. There are several options for this, but I'd smooth the image with a blur filter and then find the most red pixels (which are assumed to be the centers of the dots). Store these coordinates in an array (of x and y values).
2. I'd use spline interpolation between the dots. This way you can easily get the local derivative of a curve touching each point.
If the maximum of the first derivative is small, the dots are in a line. If you believe the dots belong to a single curve, the second derivative is your curvature.
For 1. you may also rely on some libraries specialized in image processing (this is the image processing part of your challenge). One such library is OpenCV.
For 2. I'd use some math toolkit, either Octave or a math library for a native language.
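A rough sketch of both steps in Python (red-dot detection with OpenCV, spline derivatives with SciPy); the file name, colour threshold and smoothing factor are placeholders:

import cv2
import numpy as np
from scipy.interpolate import UnivariateSpline

img = cv2.imread('laser_dots.jpg')                        # placeholder file name
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# Step 1: "most red" pixels = red channel clearly above the other two channels.
b, g, r = cv2.split(blurred.astype(np.int32))
mask = ((r - np.maximum(b, g)) > 50).astype(np.uint8) * 255

# Dot centers = centroids of the connected components in the mask.
_, _, _, centroids = cv2.connectedComponentsWithStats(mask)
dots = centroids[1:]                                      # drop the background label
dots = dots[np.argsort(dots[:, 0])]                       # sort left to right

# Step 2: fit a spline y(x) through the dots and inspect its derivatives.
spline = UnivariateSpline(dots[:, 0], dots[:, 1], k=3, s=len(dots))
xs = np.linspace(dots[0, 0], dots[-1, 0], 200)
slope = spline.derivative(1)(xs)
curvature = spline.derivative(2)(xs)
print('slope range:', slope.max() - slope.min(), 'max curvature:', np.abs(curvature).max())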
There are several different ways of measuring the straightness of a line. Since your question is rather vague, it's impossible to say what will work best for you.
But here's my suggestion:
Use linear regression to calculate the best-fit straight line through your points, then calculate the mean-squared distance of each point from this line (straighter lines will give smaller results).
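A small NumPy sketch of that measurement (note that it assumes the points are not on a near-vertical line; swap the axes or use total least squares in that case):

import numpy as np

def straightness_error(xs, ys):
    # Least-squares line y = m*x + c, then mean squared perpendicular distance to it.
    m, c = np.polyfit(xs, ys, 1)
    dist = np.abs(m * xs - ys + c) / np.sqrt(m * m + 1)
    return float(np.mean(dist ** 2))          # smaller means straighter

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.1, 0.9, 2.1, 2.9])
print(straightness_error(xs, ys))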
You may want to read this paper; it is an interesting one for solving your problem.
As @urzeit suggested, you should first find the points as accurately as possible. There's really no way to give good advice on that without seeing real pictures, except maybe: try to make the task as easy as possible for yourself. For example, if you can set the camera to a very short shutter time (microseconds, if possible) and concentrate the laser energy into that same short interval, the "background" will contribute less energy to the image brightness, and the laser spots will simply be bright spots on a dark background.
Measuring the linearity should be straightforward, though: "linearity" is just a different word for "linear correlation". So you can simply calculate the correlation between the X and Y values. As the pictures on the linked Wikipedia page show, a correlation of ±1 means all points are on a line.
If you want the actual line, you can simply use Total Least Squares.
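For example, a small NumPy sketch; the total-least-squares line is taken from the principal component of the centred points:

import numpy as np

pts = np.array([[0.0, 0.1], [1.0, 1.1], [2.0, 1.9], [3.0, 3.0]])
x, y = pts[:, 0], pts[:, 1]

# Linearity as linear correlation: |r| close to 1 means the points lie on a line.
r = np.corrcoef(x, y)[0, 1]

# Total least squares: the line direction is the first principal component.
centred = pts - pts.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
direction = vt[0]                 # unit vector along the fitted line
residuals = centred @ vt[1]       # signed perpendicular distances to that line
print(r, residuals)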
What method is suitable to capture (detect) the MRZ from a photo of a document? I'm thinking about a cascade classifier (e.g. Viola-Jones), but it seems a bit weird to use it for this problem.
If you know that you will look for text in a passport, why not try to find the passport model points on it first: match a template of a passport to it using ASM/AAM (Active Shape Model, Active Appearance Model) techniques. Once you have the passport position information, you can cut out the regions that you are interested in. This will take some time to implement, though.
Consider this approach as a great starting point:
Black top-hat followed by a horizontal derivative highlights long rows of characters.
Morphological closing operation(s) merge the nearby characters and character rows together into a single large blob.
Optional erosion operation(s) remove the small blobs.
Otsu thresholding followed by contour detection, then filtering away the contours which are apparently too small, too round, or located in the wrong place, will get you a small number of possible locations for the MRZ.
Finally, compute bounding boxes for the locations you found and see whether you can OCR them successfully.
It may not be the most efficient way to solve the problem, but it is surprisingly robust.
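A condensed OpenCV sketch of those steps (kernel sizes, the erosion count and the contour filter are rough guesses to tune; the file name is a placeholder):

import cv2
import numpy as np

gray = cv2.imread('passport.jpg', cv2.IMREAD_GRAYSCALE)

# Black top-hat highlights dark characters on the lighter background.
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rect_kernel)

# Horizontal derivative emphasizes long rows of characters.
grad = np.absolute(cv2.Sobel(blackhat, cv2.CV_32F, 1, 0))
grad = (255 * (grad - grad.min()) / (grad.max() - grad.min() + 1e-8)).astype('uint8')

# Closing merges characters, Otsu binarizes, a larger closing merges the rows.
grad = cv2.morphologyEx(grad, cv2.MORPH_CLOSE, rect_kernel)
thresh = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE,
                          cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21)))
thresh = cv2.erode(thresh, None, iterations=4)              # optional small-blob removal

# Keep wide, flat contours as MRZ candidates and hand them to the OCR step.
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
for c in contours:                                          # [-2] works in OpenCV 3 and 4
    x, y, w, h = cv2.boundingRect(c)
    if w > 5 * h and w > 0.6 * gray.shape[1]:
        mrz = gray[y:y + h, x:x + w]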
A better approach would be the use of projection profile methods. A projection profile method is based on the following idea:
Create an array A with an entry for every row in your b/w input document. Now set A[i] to the number of black pixels in the i-th row of your original image.
(You can also create a vertical projection profile by considering columns in the original image instead of rows.)
Now the array A is the projected row/column histogram of your document and the problem of detecting MRZs can be approached by examining the valleys in the A histogram.
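In NumPy this is essentially a one-liner once the document is binarized (here black pixels are assumed to have value 0; the file name is a placeholder):

import cv2
import numpy as np

gray = cv2.imread('document.png', cv2.IMREAD_GRAYSCALE)
binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

A = np.sum(binary == 0, axis=1)   # A[i] = number of black pixels in row i
# Dense character rows (such as the MRZ lines) show up as strong peaks in A,
# and the gaps around them show up as valleys.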
This problem, however, is not completely solved, so there are many variations and improvements. Here's some additional documentation:
Projection profiles in Google Scholar: http://scholar.google.com/scholar?q=projection+profile+method
Tesseract-ocr, a great open source OCR library: https://code.google.com/p/tesseract-ocr/
Viola & Jones' Haar-like features generate many (many (many)) features to try to describe an object and are a bit more robust to scale and the like. Theirs was a unique approach to a difficult problem.
Here, however, you have plenty of constraint on the problem and anything like that seems a bit overkill. Rather than 'optimizing early', I'd say evaluate the standard OCR tools off the shelf and see where they get you. I believe you'll be pleasantly surprised.
PS:
You'll want to preprocess the image to isolate the characters on a white background. This can be done quite easily and will help the OCR algorithms significantly.
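For example, a minimal binarization step (assuming dark characters on a lighter page; the file name is a placeholder):

import cv2

gray = cv2.imread('mrz_crop.png', cv2.IMREAD_GRAYSCALE)
denoised = cv2.GaussianBlur(gray, (3, 3), 0)
clean = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# 'clean' now has dark characters on a white background, ready for the OCR engine.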
You might want to consider using stroke width transform.
You can follow these tips to implement it.