Pipeline for generating a 3D Pointcloud out of feature matches - opencv

I am currently stuck at my project of reconstructing an object from different images of the same object.
So far I have calculated the feature matches of each image using AKAZE features.
Now I need to derive the camera parameters and the 3D Point coordinates.
However, I am a little bit confused. In my opinion I need the camera parameters to determine the 3D Points and vice versa.
My question is how can I get the 3D points and the Camera parameters in one step?
I have also looked into the bundle adjustment approach provided by http://scipy-cookbook.readthedocs.io/items/bundle_adjustment.html but there you need an initial guess for camera parameters and 3D coordinates.
Can somebody refer to a pseudocode? or has a pipeline for me?
Thanks in advance

As you seem to be new at this, I strongly recommend you first play with an interactive tool to gain an intuition for the issues involved in solving for camera motion and structure. Try Blender: it's free, and you can find plenty of video tutorials on YouTube on how to use it for matchmoving: examples 1, 2

Take a look at VisualSFM (http://ccwu.me/vsfm/). It is an interactive tool for such tasks. It will give you an idea which algorithms to use.
The Computer Vison book of Richard Szeliski (http://szeliski.org/Book/) will give the theoretical background.

Related

Add 2D or 3D Face Filters like MSQRD/SnapChat Using Google Vision API for iOS

Here's some research I have done so far:
- I have used Google Vision API to detect various face landmarks.
Here's the reference: https://developers.google.com/vision/introduction
Here's the link to Sample Code to get the facial landmarks. It uses the same Google Vision API. Here's the reference link: https://github.com/googlesamples/ios-vision
I have gone through the various blogs on internet which says MSQRD based on the Google's cloud vision. Here's the link to it: https://medium.com/#AlexioCassani/how-to-create-a-msqrd-like-app-with-google-cloud-vision-802b578b30a0
For Android here's the reference:
https://www.raywenderlich.com/158580/augmented-reality-android-googles-face-api
There are multiple paid SDK's which full fills the purpose. But they are highly priced. So cant able to afford it.
For instance:
1) https://deepar.ai/contact/
2) https://www.luxand.com/
There is possibility might have some see this question as duplicate of this:
Face filter implementation like MSQRD/SnapChat
But the thread is almost 1.6 years old with no right answers to it.
I have gone through this article:
https://dzone.com/articles/mimic-snapchat-filters-programmatically-1
It describes all the essential steps to achieve the desired results. But they advice to use their own made SDK.
As per my research no good enough material is around which helps to full fill the desired results like MSQRD face filters.
One more Github repository around which has same implementation but it doesn't gives much information about same.
https://github.com/rootkit/LiveFaceMask
Now my question is:
If we have the facial landmarks using Google Vision API (or even using
DiLib), how I can add 2d or 3d models over it. In which format this
needs to be done like this require some X,Y coordinates with vertices
calculation.
NOTE: I have gone through the Googles "GooglyEyesDemo" which adds the
preview layer over eyes. It basically adds a view over the face. So I
dont want to add UIView one dimensional preview layers over it. Image
attached for reference :
https://developers.google.com/vision/ios/face-tracker-tutorial
Creating Models: I also want to know how to create models for live
filters like MSQRD. I welcome any software or format recommendations.
Hope the research I have done will help others and someone else
experience helps me to achieve the desired results. Let me know if any
more details are required.**
Image attached for more reference:
Thanks
Harry
Canvas class is used in android for drawing such 3D / 2D models or core graphics for IOS can be used.
What you can do is detect the face components, take their location points and draw images on top of them. Consider going through this
You need to either predict x,y,z coordinates(check out this demo), either use x,y predictions but then find parameters of universal 3d-model & camera that will give the closest projection of current x,y.

3D Object tracking detection using Kinect

I am working on identifying an object by using Kinect sensor so to get x,y,z coordinates of the object.
I am trying to find the related information for this but could not able to find much. I have seen the videos as well but nobody is sharing the information or any sample code?
This is what I want to achieve https://www.youtube.com/watch?v=nw3yix3XomY
Proabably, few people may asked same question but as I am new to the Kinect and these libraries due to which I need little more guidance.
I read somewhere that object detection is not possible using Kinect v1. We need to use 3rd party libraries like open CV or point-clouds (pcl).
Can somebody help me that even by using third party libraries how exactly can I identify object via a Kinect sensor?
It will be really helpful.
Thank you.
As the author of the video you linked stated in the comment, following this PCL tutorial will help you. As you found out already, realizing this may not be possible using the standalone SDK. Relying on PCL will help you not reinvent the wheel.
The idea there is to:
Downsample the cloud to have less data to deal with in the next steps (this also reduces noise a bit).
Identify keypoints/features (i.e. points, areas, textures that remain somehow invariant to some transformations).
Compute the keypoint descriptors, mathematical representations of these features.
For each scene keypoint descriptor, find nearest neighbor into the model keypoints descriptor cloud and add it to the correspondences vector.
Perform clustering on the keypoints and detect the model in the scene.
The software in the tutorial needs the user to manually feed in the model and scene files. It doesn't do that on live feed, as the video you linked.
The process should be pretty similar though. I'm not sure how cpu-intensive the detection is, so it might require additional performance tweaking.
Once you have frame-by-frame detection in place, you could start thinking about actually tracking an object across the frames. But that's another topic.

Generic web camera calibration

I am building a website that does cool things using computer vision techniques, with videos live recorded and uploaded by users using their webcam. For this, I need camera intrinsic and distortion parameters. I am trying to figure out what would be the best way to compute these given the user uploaded videos. We can make no assumptions about what videos user might upload - but a reasonable assumption is that a human might be present in the video. I am still in the initial stages of this, but I am interested in knowing how others have solved this problem.
To be specific, below are the questions that I would appreciate someone experienced in the group might comment upon:
What algorithms, libraries and techniques are available to extract intrinsic and distortion parameters of any generic webcam available in the market? [I say "extract" and not "calibrate" to include cases where intrinsic parameters are just a method call away with no calibration necessary].
In general, how much variance have you observed in the intrinsic and distortion parameters in the webcams available in the market? Did you approximate them with a single intrinsic and distortion parameters or what approach did you follow?
What camera self-calibration methods, if any, could be employed in these scenarios? Are there any opensource or commercial libraries available which might be of some help?
If we aim to calibrate the webcams using the videos user record and upload, what assumptions in the parameters [like fx==fy or no distortion params] makes sense and sounds reasonable to you?
Would a reasonable approximation of intrinsic and distortion params for all the cameras make sense? What would be a reasonable approach to validate how good particular intrinsic and distortion parameters are for a specific webcam?
Are there any other issues that need to be considered?
Sometimes I am the one who comes with the bad news :) So do I now.
For almost all your points there the clear answer is No, None, Not, and so on. Only for the last point, with the other issues, the answer is not a no, but a long list :).
Actually, camera calibration without a chessboard and some specific constraints is almost impossible.
The closest implementation to a no-assumptions calibration is found in the stitching module in OpenCV. Hovewer, it is not perfect, and it's not working on random videos. Give it a try.
There is the famous Camera Calibration Toolbox, a good Matlab implementation of extracting intrinsic and extrinsic parameters.
There is a variance not only amongst webcams, but also of:
Different modules
Different zoom levels (Affects the optics)
I think that this is a really hard problem, if you restrict yourself to making no assumptions regarding the video. Both the calibration and the evaluation is hard if you don't use something known - such as checker board in Camera Calibration Toolbox.
Many algorithms, including the currently used in opencv requires that known points can be detected (e.g corners in a chess board). You would have to require that your users took pictures of this known patterns, which ruin the concept of random videos. I dont have a solution to this but you might want to consider requiring users to record videos of structures scenes(no specific patterns or objects) and use the algorithm described in:
"Camera calibration with lens distortion from low-rank textures"
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5995548&tag=1
Haven't tried it myself though.

Using Augmented Reality libraries for Academic Project

I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit, osgART for such project since they come with all the functions for tracking, detection, calibration etc? Will there be much work left from the programmers point of view?
What do you think if I use OpenCV and do the marker detection, recognition, calibration and other steps from scratch? Will that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if want it to return reliable results. The effort also depends on which kind of markers you plan to use. Artoolkit e.g. compares the marker's content detected from the video stream to images you earlier defined as markers. Hence it tries to match images and returns a value of probability that a certain part of the video stream is a predefined marker. Depending on the threshold you are going to use and the lighting situation, markers are not always recognized correctly. Then there are other markers like datamatrix, qrcode, framemarkers (used by QCAR) that encode an id optically. So there is no image matching required, all necessary data can be retrieved from the video stream. Then there are more complex approaches like natural feature tracking, where you can use predefined images, given that they offer enough contrast and points of interest so they can be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
I suggest you to use OpenCV, you will find high quality algorithms and it is fast. They are continuously developing new methods so soon it will be possible to run it real-time in mobiles.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.

GUI version of OpenCV for feature-detection (SIFT etc.) prototyping before actual project development?

I had an idea for which I need to be able to recognize certain objects or models from a rendered three dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (eg. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT?, with openCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is eg. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn c++ or python to develop this application. This is not a problem if my movie and screenshots are applicable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV that I can input my test data and then execute it's feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.
There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
EDIT
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.
There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.
Well, using OpenCV you would of taking a frame of a video file and do any computations on it.
You can do several different methods of detecting a character on that image, but it's not so easy to have it as flexible so you can even get that person if it's lying on the floor for example, if you only entered reference images of that character standing.
Basically you could try extracting all important features from your set of reference pictures and have a (in your case supervised) learning algorithm that gets a good feature-vector of that character for classification.
You then need to write your code that plays the video and which takes a video frame let's say each 500ms (or other as you desire), gets a segmentation of the object you thing would be that character and compare it with the reference values you get from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
But all this depends on how flexible you want this to be. You could also try a template match or cross-correlation which basically shifts the reference image(s) over the frame and checks how equal both parts are. But this unfortunately is very sensitive for rotation, deformations or other noise... so you wouldn't get that person if its i.e. laying down. And I doubt you can get all those calculations done in realtime...
Basically: Yes OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods and ways and you'd need to find a way that works for your images... it's not a trivial task though...
Hope that helps...
Have you tried looking at some of the work of the Oxford visual geometry group?
Their Video Google system describes to a large extent what you want, instance detection.
Their work into Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?
Have you tried computer vision frameworks like Cassandra? There you can exactly do that just by some mouse clicks.

Resources