Augmented Reality with large and complex markers - OpenCV

Does anyone have any experience with using large and complex images as markers (e.g. a magazine layout, a photo, a text layout) for AR?
I am not sure which way to go:
Flash, Papervision3D and FLARToolKit would be nice for distribution, but I suspect their performance is too poor for a marker more complex than the usual 9x9 or 12x12 blocks. I had difficulties achieving both good 3D performance and smooth, solid detection.
I can also do Java or Objective-C with OpenGL/OpenCV, and that is definitely an option for this project as well.
I would just like to know beforehand whether anyone has experience in this field and could give me a few hints or warnings. I know it has been done already, so there must be a way to do it smoothly.
Thanks,
Anton

It sounds like you might want to start investigating natural feature tracking libraries. In general the tracking is smoother and more robust than with markers, and any feature-rich natural image can be used as the marker. The downside is that I'm not aware of any non-proprietary solutions.
Metaio Unifeye works in a web browser via Flash if I recall correctly; something like that might be what you're looking for.

You should look at MOPED.
MOPED is a real-time Object Recognition and Pose Estimation system. It recognizes objects from point-based features (e.g. SIFT, SURF) and their geometric relationships extracted from rigid 3D models of objects.
See this video for a demonstration.
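In case it helps to see the general shape of the natural-feature approach these answers describe, here is a minimal OpenCV sketch in Python (using ORB in place of the patented SIFT/SURF): match local features between the planar "marker" image and a camera frame, then estimate a homography with RANSAC. The file names and parameter values are placeholders, not taken from any of the posts above.

```python
import cv2
import numpy as np

# Load the natural-image "marker" (e.g. a magazine page) and a camera frame.
# File names are placeholders for illustration.
marker = cv2.imread("magazine_page.jpg", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and descriptors; ORB is a free SIFT/SURF alternative.
orb = cv2.ORB_create(nfeatures=2000)
kp_m, des_m = orb.detectAndCompute(marker, None)
kp_f, des_f = orb.detectAndCompute(frame, None)

# Match descriptors and keep the strongest matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_m, des_f), key=lambda m: m.distance)[:200]

# Estimate the marker-to-frame homography with RANSAC.
src = np.float32([kp_m[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Project the marker outline into the frame; combined with the camera
# intrinsics, H can then be decomposed into a pose for rendering 3D content.
h, w = marker.shape
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
print(cv2.perspectiveTransform(corners, H))
```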


What is NeRF (Neural Radiance Fields) used for?

Recently I have been studying the paper NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (https://www.matthewtancik.com/nerf), and I am wondering: what is it used for? Will there be any applications of NeRF?
The results of this technique are very impressive, but what is it actually used for? I keep coming back to this question. It is very realistic and the quality is excellent, but we don't want to watch the camera swinging around all the time, right?
As I see it, the technique has some limitations:
It cannot generate views that were never seen in the input images; it interpolates between existing views.
Long training and rendering times: according to the authors, it takes 12 hours to train a scene and 30s to render one frame.
The rendered scene is static and not interactive.
I don't know if it is appropriate to compare NeRF with panoramas and 360° images/videos. Essentially they are different: NeRF uses deep learning to generate new views, while the others just capture scenes with a smartphone or camera plus some computer vision techniques. Still, the long training time makes NeRF less competitive in this application area. Am I correct?
Another use I can think of is product rendering; however, NeRF doesn't show clear advantages compared to rendering with 3D software. Commercial advertising usually requires animation and special effects, and 3D software can definitely do those better.
A potential use of NeRF might be 3D reconstruction, but that seems out of scope, even though it is capable of it. Why would we use NeRF for 3D reconstruction rather than other reconstruction techniques? The unique feature of NeRF is its ability to create photo-realistic views; if we use it only for 3D reconstruction, that feature becomes pointless.
Does anyone have new ideas? I would like to know.
Why do we need to use NeRF for 3D reconstruction?
The alternative would be multi-view stereo, which produces point clouds of finite resolution and is susceptible to illumination changes. If you then render such a point cloud without non-trivial post-processing, it will not look photorealistic.
I don't know if it is appropriate to compare NeRF with Panorama and 360° image/video,
Well, if you are dealing with an exactly flat scene with simple lighting (i.e. ambient light and Lambertian objects), then you can use panorama techniques for new-view synthesis. In general, though, that won't produce the result you expect: you have to know the depth to interpolate correctly.
As for the practical limitations (slow; does not model deformations), NeRF should be considered a milestone that provided a proof of concept that representing a surface as a level set of an MLP-modelled function can result in sharp renderings. There is already good progress in addressing those limitations, and multiple works apply this idea to practical tasks.
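To make the "MLP-modelled function" part a little more concrete, below is a small numpy sketch of the discrete volume-rendering step from the original NeRF paper: the MLP predicts a volume density and a colour for each sample along a camera ray, and the pixel colour is an alpha-composite of those samples. The function and its toy inputs are purely illustrative, not code from any of the works mentioned here.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Alpha-composite one camera ray from N samples along it.

    sigmas: (N,)   volume densities predicted by the MLP
    colors: (N, 3) RGB values predicted by the MLP
    deltas: (N,)   distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # opacity of each segment
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))   # transmittance T_i
    weights = trans * alphas                                         # per-sample contribution
    return (weights[:, None] * colors).sum(axis=0)                   # final pixel colour

# Toy example: random "MLP outputs" for a single ray of 64 samples.
rng = np.random.default_rng(0)
print(render_ray(rng.random(64), rng.random((64, 3)), np.full(64, 0.03)))
```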

Blur people from 1000+ images

I need to automatically anonymize people (and maybe later license plates) in thousands of images.
I searched the internet and tried to build a solution on my own with OpenCV/Emgu CV, but so far the detection rate is rather bad.
Then I came across Amazon Rekognition, which also looks good but has a steep learning curve for me.
I am somewhat confused that there is no software out there that anonymizes pictures without user input; I thought that in the age of Street View this would be easier.
Am I missing something here?
One of the simplest face localisation APIs I'm aware of is this one (Python, but based on dlib, which is a C++ library).
It's well documented and almost ridiculously easy to use from Python.
It will give you the coordinates of a bounding box which you can blur.
Note that there are two different detectors you can use. The "classic" one is quite fast but misses some faces, especially faces not seen full frontal. The one based on a deep learning model is much better, but it is quite slow without a GPU.
If you want to be a bit more sophisticated, it can give you facial feature locations (but only with the "classic" detector) and you could place the center of a blurring circle on the nose or so, but for a large number of images, I would just go for the bounding box.
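The answer above doesn't name the library, but assuming it is the face_recognition package (a Python wrapper around dlib that offers exactly these "classic" HOG and deep-learning CNN detectors), a batch blurring script could look roughly like this; the file names are placeholders.

```python
import cv2
import face_recognition  # assumed to be the dlib-based library linked in the answer

def blur_faces(path_in, path_out, model="hog"):
    """Blur every detected face; model="cnn" is the slower, more accurate detector."""
    image = face_recognition.load_image_file(path_in)            # RGB numpy array
    boxes = face_recognition.face_locations(image, model=model)  # (top, right, bottom, left)
    bgr = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    for top, right, bottom, left in boxes:
        roi = bgr[top:bottom, left:right]
        bgr[top:bottom, left:right] = cv2.GaussianBlur(roi, (51, 51), 0)
    cv2.imwrite(path_out, bgr)

# Example: loop this over your thousands of files.
blur_faces("input.jpg", "anonymized.jpg")
```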

PARABOLIC (not panoramic) video stitching?

I want to do something like this, but in reverse, so that the cameras are outside and pointing inward. Let's start with the abstract and get specific:
1) Are there any TOOLS that will do this for me? How close can I get using existing software?
2) Say the nearest tool is a graphics library like OpenCV. I've taken linear algebra and have an undergraduate degree in CS but without any special training in graphics. Where should I go from there?
3) If I really am undergoing a decade-long spiritual quest of a self-teaching and programming exercise to make this happen, are there any papers or other resources that you are aware of that might aid me?
I think the demo you linked uses a 360° camera (see the black circle on the bottom) and does not involve stitching in any way.
About your question, are you aware of this work? They don't do stitching either, just blending between different views.
If you use inward views, then the objects you will observe will probably be quite close to the cameras, while standard stitching assumes that objects are far away. Close 3D objects mean high distortion when you change the viewpoint (i.e. parallax & occlusions), which makes it difficult to interpolate between two views. Hence, if you want stitching, then your main problem is to correctly handle parallax effects & occlusions between the views.
In my opinion, the most promising approach would be to do live stereo matching (i.e. dense 3D reconstruction) between the two camera images closest to your current viewpoint, and then interpolate the estimated disparities to generate the expected image. However, unlike the demo you linked, it's not likely to run in real time, and the result could be quite ugly...
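As a rough illustration of the dense stereo-matching step suggested above, here is a short OpenCV (Python) sketch using semi-global block matching on two neighbouring views. It assumes the pair has already been rectified (e.g. via calibration and cv2.stereoRectify); file names and parameters are placeholders.

```python
import cv2

# Assumes the two neighbouring camera views have already been rectified
# (e.g. with cv2.stereoRectify after calibration); file names are placeholders.
left = cv2.imread("view_left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("view_right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching produces a dense disparity map.
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # must be a multiple of 16
    blockSize=5,
    P1=8 * 5 * 5,
    P2=32 * 5 * 5,
    uniquenessRatio=10,
)
disparity = stereo.compute(left, right).astype("float32") / 16.0  # fixed-point -> pixels

# Given the calibration's Q matrix, cv2.reprojectImageTo3D(disparity, Q) would
# turn this into the dense 3D points needed to interpolate a new viewpoint.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", vis)
```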
EDIT
You can also have a look at this paper, which uses a different but interesting approach, however maybe not directly useful in your case since it requires the new viewpoint to be visible in the available images.

Real time tracking of hand

I am trying to detect and track a hand in real time using OpenCV. I thought Haar cascade classifiers would yield a fair result. After training with 10k positive and 20k negative images, I obtained a classifier XML file. Unfortunately, it detects the hand only in certain positions, suggesting that it works best only for rigid objects. So I am now thinking of adopting another algorithm to track the hand once it has been detected by the Haar classifier.
My question is: if I make sure that the Haar classifier detects the hand in a certain frame and position, what method would yield robust tracking of the hand afterwards?
I searched the web a bit and understood that I can go for optical flow of the detected hand, or a Kalman filter or particle filter, but I have also come across their respective disadvantages.
Also, would it help if I incorporate stereo vision, as I could possibly reconstruct the hand in 3D?
You concluded rightly about Haar features - they aren't that useful when it comes to non-rigid objects.
Take a look at the following papers which use skin colour to detect hands.
Interaction between hands and wearable cameras
Markerless inspection of augmented reality objects
and this paper that uses KLT features to track the hand after the first detection:
Fast 2D hand tracking with flocks of features and multi-cue integration
I would say that a stereo camera will not help your cause much, as 3D reconstruction of non-rigid objects isn't straightforward and would require a whole lot of innovation and development. However, you can take a look at the papers in the hand pose estimation section of this page if you wish to pursue 3D tracking.
EDIT: Also take a look at this recent paper, which seems to get good results.
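For reference, here is a minimal OpenCV (Python) sketch of the KLT-style "track after detection" idea from the papers above: seed good features inside the detector's bounding box and follow them with pyramidal Lucas-Kanade optical flow. The bounding box values are placeholders for whatever your detector returns.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Hypothetical bounding box (x, y, w, h) from your Haar (or other) detector.
x, y, w, h = 200, 150, 120, 160
mask = np.zeros_like(prev_gray)
mask[y:y + h, x:x + w] = 255

# Seed the tracker with corners found inside the detected hand region.
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.01,
                                 minDistance=5, mask=mask)

while True:
    ok, frame = cap.read()
    if not ok or points is None or len(points) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade optical flow moves each point into the new frame.
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
    points = new_points[status.flatten() == 1].reshape(-1, 1, 2)
    for px, py in points.reshape(-1, 2):
        cv2.circle(frame, (int(px), int(py)), 2, (0, 255, 0), -1)
    cv2.imshow("KLT hand tracking", frame)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == 27:   # Esc quits
        break
```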
Zhang et al.'s Real-time Compressive Tracking does a reasonable job of tracking an object, once it has been detected by some other method, provided that the motion is not too fast. They have an OpenCV implementation (but it would need a bit of work to reuse).
This research paper describes a method to track hands, without using gloves by using a stereo camera setup.
There have been similar questions on Stack Overflow...
Have a look at my answer and those of others: https://stackoverflow.com/a/17375647/1463143
You can certainly get better results by avoiding Haar training and detection for deformable entities.
The CamShift algorithm is generally fast and accurate if you want to track the hand as a single entity. The OpenCV documentation contains a good, easy-to-understand demo program that you can easily modify.
If you need to track fingers etc., however, further modeling will be needed.
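And a rough sketch of the CamShift route, loosely following the demo in the OpenCV documentation: build a hue histogram of the initially detected hand region, then let CamShift follow its back-projection from frame to frame. The initial window values are placeholders for your detector's output.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()

# Hypothetical initial hand window (x, y, w, h) from your detector.
x, y, w, h = 200, 150, 120, 160
roi = frame[y:y + h, x:x + w]

# Build a hue histogram of the hand region as a simple skin-colour model.
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

track_window = (x, y, w, h)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # CamShift shifts and resizes the window toward the back-projection peak.
    rot_rect, track_window = cv2.CamShift(back_proj, track_window, term_crit)
    pts = cv2.boxPoints(rot_rect).astype(np.int32)
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("CamShift hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:   # Esc quits
        break
```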

Visual Odometry (aka. Egomotion estimation) with OpenCV

I'm planning to implement an application with augmented reality features. For one of the features I need an egomotion estimation. Only the camera is moving, in a space with fixed objects (nothing or only small parts will be moving, so that they might be ignored).
So I searched and read a lot and stumbled upon OpenCV. Wikipedia explicitly states that it could be used for egomotion. But I cannot find any documentation about it.
Do I need to implement the egomotion algorithm myself using OpenCV's object detection methods? (I think it would be very complex, because objects will move at different speeds depending on their distance to the camera, and I also need to account for rotations.)
If so, where should I start? Is there a good code example for a Kanade–Lucas–Tomasi feature tracker with support for scaling and rotation?
P.S.: I also know about marker-based frameworks like Vuforia, but using a marker is something I would like to avoid, as it restricts the possible viewpoints.
Update 2013-01-08: I learned that Egomotion Estimation is better known as Visual Odometry. So I updated the title.
You can find a good implementation of monocular visual odometry based on optical flow here.
It's coded using Emgu CV (a C# OpenCV wrapper), but you will have no trouble converting it back to pure OpenCV.
Egomotion (or visual odometry) is usually based on optical flow, and OpenCV has some motion analysis and object tracking functions for computing optical flow (in conjunction with a feature detector like cvGoodFeaturesToTrack()).
This example might be of use.
Not a complete solution, but might at least get you going in the right direction.
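Not the implementation linked above, but to give an idea of how those OpenCV building blocks fit together, here is a hedged sketch of a minimal monocular pipeline in Python: track features with KLT optical flow, estimate the essential matrix, and recover the relative rotation and (scale-free) translation between frames. The intrinsics and the video file name are placeholders.

```python
import cv2
import numpy as np

# Illustrative camera intrinsics; use your own calibration in practice.
K = np.array([[718.856, 0.0, 607.193],
              [0.0, 718.856, 185.216],
              [0.0, 0.0, 1.0]])

cap = cv2.VideoCapture("sequence.mp4")   # placeholder video from a moving camera
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
pose_R, pose_t = np.eye(3), np.zeros((3, 1))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Track corners from the previous frame into the current one (KLT).
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=2000, qualityLevel=0.01, minDistance=7)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    good0, good1 = p0[status.flatten() == 1], p1[status.flatten() == 1]

    # Relative camera motion from the essential matrix (translation is up to scale).
    E, inliers = cv2.findEssentialMat(good1, good0, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good1, good0, K, mask=inliers)

    # Accumulate the pose; a real system would also estimate and propagate scale.
    pose_t = pose_t + pose_R @ t
    pose_R = pose_R @ R
    prev_gray = gray
    print("camera position (arbitrary scale):", pose_t.ravel())
```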
