Structure from Motion (SfM) in a tunnel-like structure? - opencv

I have a very specific application in which I would like to try structure from motion to get a 3D representation. For now, all the software/code samples I have found for structure from motion are like this: "A fixed object that is photographed from all angle to create the 3D". This is not my case.
In my case, the camera is moving in the middle of a corridor and looking forward. Sometimes, the camera can look on other direction (Left, right, top, down). The camera will never go back or look back, it always move forward. Since the corridor is small, almost everything is visible (no hidden spot). The corridor can be very long sometimes.
I have tried this software and it doesn't work in my particular case (but it's fantastic with normal use). Does anybody can suggest me a library/software/tools/paper that could target my specific needs? Or did you ever needed to implement something like that? Any help is welcome!
Thanks!

What kind of corridors are you talking about and what kind of precision are you aiming for?
A priori, I don't see why your corridor would not be a fixed object photographed from different angles. The quality of your reconstruction might suffer if you only look forward and you can't get many different views of the scene, but standard methods should still work. Are you sure that the programs you used aren't failing because of your picture quality, arrangement or other reasons?
If you have to do the reconstruction yourself, I would start by
1) Calibrating your camera
2) Undistorting your images
3) Matching feature points in subsequent image pairs
4) Extracting a 3D point cloud for each image pair
You can then orient the point clouds with respect to one another, for example via ICP between two subsequent clouds. More sophisticated methods might not yield much difference if you don't have any closed loops in your dataset (as your camera is only moving forward).
OpenCV and the Point Cloud Library should be everything you need for these steps. Visualization might be more of a hassle, but the pretty pictures are what you pay for in commercial software after all.
Edit (2017/8): I haven't worked on this in the meantime, but I feel like this answer is missing some pieces. If I had to answer it today, I would definitely suggest looking into the keyword monocular SLAM, which has recently seen a lot of activity, not least because of drones with cameras. Notably, LSD-SLAM is open source and may not be as vulnerable to feature-deprived views, as it operates directly on the intensity. There even seem to be approaches combining inertial/odometry sensors with the image matching algorithms.
Good luck!

FvD is right in the sense that your corridor is a static object. Your scenario is the same and moving around and object and taking images from multiple views. Your views are just not arranged to provide a 360 degree view of the object.
I see you mentioned in your previous comment that the data is coming from a video? In that case, the problem could very well be the camera calibration. A camera calibration tells the SfM algorithm about the internal parameters of the camera (focal length, principal point, lens distortion etc.) In the absence of knowledge about these, the bundler in VSfM uses information from the EXIF data of the image. However, I don't think video stores any EXIF information (not a 100% sure). As a result, I think the entire algorithm is running with bad focal length information and cannot solve for the orientation.
Can you extract a few frames from the video and see if there is any EXIF information?

Related

Image registration algorithms for images with varying distances

I have two cameras side by side. I'd like to register two images taken from each camera. I will assume there won't be any rotational differences between cameras, that is, there will be only translation factor for the images.
I think global transformation will not work for this issue since changes of distances between two images for the closer objects are significantly higher. What should I do in this case? I tried to read some papers but I am not sure which one is the perfect match for me. Do you have any suggestions such as "read this paper", "apply this algorithm", "know this and that" etc.
The project that I'm working on is real-time and image registration will be implemented in the GPU.
First of all, do not assume such constraints like just translational differences. You will have inaccuracies and you will not spare any effort.
Second, a global transformation (I assume you mean a linear transformation i.e. a homography applied to your images) will only work if you have pictures of completely planar surfaces, so you're right, it won't work. You will need a non-rigid image registration. Further, I hope the distance between these two cameras is not too far. Due to parallax, you may have artifacts.
I would recommend to google for terms like "non-rigit image registration for hdr", "stack-based hdr" or "image registration for hdr". You'll find a lot.
However I found this nice overview paper. I think it is a good start.

PARABOLIC (not panoramic) video stitching?

I want to do something like this but in reverse-- so that the cameras are outside and pointing inward. Let's start with the abstract and get specific:
1) Are there any TOOLS that will do this for me? How close can I get using existing software?
2) Say the nearest tool is a graphics library like OpenCV. I've taken linear algebra and have an undergraduate degree in CS but without any special training in graphics. Where should I go from there?
3) If I really am undergoing a decade-long spiritual quest of a self-teaching+programming exercise to make this happen, are there any papers or other resources that you aware of that might aid me?
I think the demo you linked uses a 360° camera (see the black circle on the bottom) and does not involve stitching in any way.
About your question, are you aware of this work? They don't do stitching either, just blending between different views.
If you use inward views, then the objects you will observe will probably be quite close to the cameras, while standard stitching assumes that objects are far away. Close 3D objects mean high distortion when you change the viewpoint (i.e. parallax & occlusions), which makes it difficult to interpolate between two views. Hence, if you want stitching, then your main problem is to correctly handle parallax effects & occlusions between the views.
In my opinion, the most promising approach would be to do live stereo matching (i.e. dense 3D reconstruction) between the two camera images closest to your current viewpoint, and then interpolate the estimated disparities to generate an expected image. However, it's not likely to run in real-time, as demonstrated in the demo you linked, and the result could be quite ugly...
EDIT
You can also have a look at this paper, which uses a different but interesting approach, however maybe not directly useful in your case since it requires the new viewpoint to be visible in the available images.

Position Estimation From Multiple Images

First off, I'd like to state that I'm very new to this field and apologize if the question is a little too repetitive. I've looked around but in vain. I'm working on reading Hartley and Zisserman's book but it's taking me a while.
My problem is That I've got 3 Video Sources of an area and I need to find the camera position at each frame of the video. I do not have any information about the cameras that took the videos (i.e no Intrinsics).
Looking for a solution I came across SfM and tried existing software that exists namely Bundler & Vsfm and they both seem to have worked quite well. However I've got a couple of questions about it.
1) Is SfM really required in my case? Since SfM does a sparse reconstruction and the common points between images are also an output, is it fully necessary? or are there more suitable methods that can do it without since positions are all I really need? Or are there less complex methods I may use instead?
2) From what I've read, I need to calibrate the camera and find it's Intrinsics and Extrinsics. How can I do this without knowing either? I've looked at the 5-pt problem and others but most of them require you to know the intrinsic properties of the camera which I don't have and I cannot use a pattern such as a chessboard to calibrate them since they come from a source outside my control.
Thanks for your time!
Based on my experience, the short answer is:
1) You cannot reliably estimate the 3D pose of the cameras independently from the 3D of the scene. Moreover, since your cameras are moving independently, I think SfM is the right way to approach your problem.
2) You need to estimate the cameras' intrinsics in order to estimate useful (i.e. Euclidian) poses and scene reconstruction. If you cannot use the standard calibration procedure, with chessboard and co, you can have a look at the autocalibration techniques (see also chapter 19 in Hartley's & Zisserman's book). This calibration procedure is done independently for each camera and only require several image samples at different positions, which seems appropriate in your case.
You can actually accomplish your task in a massive bundle adjacent procedure up to a scaling parameter. But is is a very complicated thing even if you aren't novice. You dont need 3d reconstruction, just an essential matrix that can be obtained from 2d projections and decomposed i to rotation and translation but this does require Iintrinsic Paramus. To get them you have to have at least three frames.
Finally, Drop Zimmerman book it will drive you crazy. Read Simon Princes "Computer Vision"instead.

Using the coca cola logo as the calibration pattern for camera

I am looking for camera calibration techniques with OpenCV and saw the chessboard and circles methods, but I wanted to calibrate the camera with something that is in the real world and you don't have to print (printers are also not very accurate in what they print).
Is it possible to do calibration with complex shapes like the Coca Cola logo on the cans? Is it a problem that the surface is curved?
Thanks
Depending on what you want to achieve this is not at all necessarily a bad idea, and you are not the first one who had it. There was a technology that uses a CD, which is a strongly standardised object which at least used to exist on most households, for a simple camera calibration task. (There is little technical to be found online about this, as the technology was proprietary. This is business document, where the use of the CD is mentioned. Algorithmically, however, it is not difficult if you know camera calibration.)
The question is whether the precision you get is sufficient for your application. Don't expect any miracles here. Generally you can use almost any object you like to learn something about a camera, as long as you can detect it reliably and you know its geometry. Almost certainly you will have to take several pictures of the object. Curved surfaces are no problem per see. I regularly used a cylinder (larger than a beverage can, though, with a simple to detect pattern) to calibrate a complete camera rig of 12 SLRs.
Don't expect to find out of the box solutions and don't expect implementation to be trivial. You will have to work your way through the math. I recommend the book by Hartley and Zisserman, Multiple View Geometry for Computer vision. This paper describes an analysis-by-synthesis approach to calibration, which is the way to go for here (it does not describe exactly what you want, but the approach should generalise to arbitrary objects as long as you can detect them).
i can understand your wish, but it's a bad idea.
the calibration algorithm works by comparing real world points from the cam with a synthetical model ( yes, you have to supply that , too! ). so, while it's easy to calculate a 2d chessboard grid on the fly and use that, it will be very hard to do for your tin can, or any arbitrary household item you grab.
just give in, and print a rectangular chessbord grid to a piece of paper
(opencv comes with a pdf for that already).
don't use a real-life chessboard, a quadratic one is ambiguous to 90° rotation.
interesting idea.
What about displaying a checkerboard pattern (or sth else) on an lcd screen display and use that display as calibration pattern?? You would have to know the displaying size of the pattern though.
Googling I found this paper:
CAMERA CALIBRATION BASED ON LIQUID CRYSTAL DISPLAY (LCD)
ZHAN Zongqian
http://www.isprs.org/proceedings/XXXVII/congress/3b_pdf/04.pdf
comment: this doesn't answer the question about the coca-cola can but gives and idea for a solution to the grounding problem: camera calibration with a common object.

Map image model to data

I'm working on a software that checks if some laser-cut parts were cut correctly, using AutoCAD data as reference. I have parsed the dxf-files, converted them to a bmp (and to an xml File that gives me all the information), and now I want to compare this to the real, acquired data.
I have applied enough preprocessing to get a reasonably thresholded, binary picture. This is, however, distorted (unfortunately, telecentric lenses are expensive and the user places the object into a device, causing some translation, some scalation and a tiny amount of rotation, as in 1-2degs).
I have considered Hough transform, but memory is an issue. I have played around with bounding box transformation, but the unknown shape makes this hard. I've read about TILT (no symmetry) and registration algorithms, but I'd like to get another opinion.
I'm looking for some papers, some ideas, some pointers on how to go on.
Thanks.
First step is to undistort the image ( see camera calibration - ignore the 3d part).
Then think about the shape matching. Depending on how small the error you are trying to find, this could be very easy or very very difficult, but those links should get you started
You may want to look at features that can discriminate the two. Are there simple features that can accurately distinguish a properly cut piece vs. an incorrectly cut piece? If so, you can use the same idea as the Hough transform/template matching, but reducing the template to certain distinguishing features (edges, corners, etc.) to reduce the memory required.
You may want to look at the SIFT/SURF features that aim to match images by a certain set of features while being invariant to the rotation and scale of the objects within the image. There are libraries out there that implement these features (shown on the SURF page).
This however, wont help with the distortion. If you're using the same camera for all images, then you should be able to de-skew them accordingly.

Resources