How to recognize a slightly 3D object - augmented-reality

I am making an AR application.
I would like to recognize a slightly 3D object like this
(the buttons beside the handle).
It looks like both a flat 2D image and a 3D object.
However, sometimes light or shadow disturbs the recognition.
So I am thinking about possible solutions.
Do you think a 3D scanner and 3D object recognition would be helpful here? Or
should I stick with normal 2D recognition? If so, is there a good way to make it robust (such as using many reference pictures taken under different lighting)?

Standard image recognition algorithms work best with textured images that have good contrast. The scenario you are describing would require algorithms tailored to this special situation. OpenCV has a couple of algorithms that might help in your use case (feature detection and feature descriptors).
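For reference, a minimal sketch of that feature-based route in OpenCV's Python bindings might look like the following. ORB is used here as a freely available alternative to SURF/SIFT; the file names and the match-distance threshold are illustrative assumptions, not values from the question.

```python
import cv2

# Minimal sketch: match a reference photo of the object against a camera
# frame using ORB features (a free alternative to SURF/SIFT in OpenCV).
# File names and thresholds are placeholders for illustration.
ref = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp_ref, des_ref = orb.detectAndCompute(ref, None)
kp_frm, des_frm = orb.detectAndCompute(frame, None)

# Hamming distance is the appropriate metric for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_ref, des_frm), key=lambda m: m.distance)

# A crude decision rule: enough good matches -> object probably present.
good = [m for m in matches if m.distance < 50]
print(f"{len(good)} good matches")
```

Using several reference pictures taken under different lighting, as you suggest, would simply mean repeating the matching step against each reference and keeping the best result.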

Related

Easy to detect shapes/patterns to put on corners of a form

I am trying to create a form which will be filled in and photographed later on. An issue I am facing is alignment. I came across some deep learning solutions which detect the corners of a form, but they are often inaccurate in my use case, where the sheet of paper is folded and reopened or crumpled. I also don't have a lot of flexibility/hard-coding options in the deep learning process.
Are there any patterns which OpenCV can detect with ~100% accuracy no matter the orientation of the pattern? I will be putting different patterns on the 4 corners of the sheet. I am thinking of using the built-in template matching function or other pattern recognition algorithms. There are some common patterns, like a big '+' sign or a star, that I am trying to avoid. I also tried putting barcodes on the corners because they are also detected fairly easily (I am not concerned with the contents of the barcodes, only their relative positioning), but depending on image quality the barcodes aren't always detected.
ArUco markers sound like the best option for you, they can easily be implemented in OpenCV.
ArUco example and documentation: https://docs.opencv.org/4.x/d5/dae/tutorial_aruco_detection.html
Python example: https://pyimagesearch.com/2020/12/21/detecting-aruco-markers-with-opencv-and-python/
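A minimal detection sketch in Python might look like this. It assumes the ArucoDetector API introduced in OpenCV 4.7 (older versions expose cv2.aruco.detectMarkers as a module-level function instead), and the file name and dictionary choice are placeholders.

```python
import cv2

# Minimal sketch of ArUco corner-marker detection (OpenCV >= 4.7 API).
# The image path and dictionary choice are assumptions for illustration.
img = cv2.imread("scanned_form.jpg")
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

corners, ids, _rejected = detector.detectMarkers(img)
if ids is not None:
    # Each marker id can be mapped to one corner of the form, so the four
    # marker centers give the points needed for a perspective transform
    # that deskews the photographed sheet.
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        center = marker_corners[0].mean(axis=0)
        print(f"marker {marker_id} at {center}")
```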

PARABOLIC (not panoramic) video stitching?

I want to do something like this but in reverse-- so that the cameras are outside and pointing inward. Let's start with the abstract and get specific:
1) Are there any TOOLS that will do this for me? How close can I get using existing software?
2) Say the nearest tool is a graphics library like OpenCV. I've taken linear algebra and have an undergraduate degree in CS but without any special training in graphics. Where should I go from there?
3) If I really am undergoing a decade-long spiritual quest of a self-teaching and programming exercise to make this happen, are there any papers or other resources that you are aware of that might aid me?
I think the demo you linked uses a 360° camera (see the black circle on the bottom) and does not involve stitching in any way.
About your question, are you aware of this work? They don't do stitching either, just blending between different views.
If you use inward views, then the objects you will observe will probably be quite close to the cameras, while standard stitching assumes that objects are far away. Close 3D objects mean high distortion when you change the viewpoint (i.e. parallax & occlusions), which makes it difficult to interpolate between two views. Hence, if you want stitching, then your main problem is to correctly handle parallax effects & occlusions between the views.
In my opinion, the most promising approach would be to do live stereo matching (i.e. dense 3D reconstruction) between the two camera images closest to your current viewpoint, and then interpolate the estimated disparities to generate an expected image. However, it's not likely to run in real-time, as demonstrated in the demo you linked, and the result could be quite ugly...
EDIT
You can also have a look at this paper, which uses a different but interesting approach, however maybe not directly useful in your case since it requires the new viewpoint to be visible in the available images.
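To make the stereo-matching suggestion above concrete, here is a rough sketch using OpenCV's semi-global block matcher. It assumes the two neighbouring views are already rectified; the file names and SGBM parameters are illustrative and would need tuning for real footage.

```python
import cv2

# Rough sketch of the "live stereo matching" idea: compute a dense
# disparity map between the two rectified camera views closest to the
# desired viewpoint. Rectification and the SGBM parameters are assumed.
left = cv2.imread("view_left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("view_right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # must be a multiple of 16
    blockSize=5,
    P1=8 * 5 * 5,
    P2=32 * 5 * 5,
    uniquenessRatio=10,
)
# OpenCV returns fixed-point disparities scaled by 16.
disparity = sgbm.compute(left, right).astype("float32") / 16.0

# The disparity map could then drive view interpolation / warping toward
# the virtual viewpoint; that (much harder) step is not shown here.
```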

Object Recognition by Outlines vs Features

Context:
I have the RGB-D video from a Kinect, which is aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection from the RGB image, preprocessing by downscaling to 320x240, grayscale, stretching the contrast and balancing the histogram before applying SURF. I built a lasso tool to choose among detected keypoints in a still of the video image. Then those keypoints are used to build object descriptors which are used to identify objects in the live video feed.
Problem:
SURF examples show successful identification of objects with a decent amount of text-like feature detail, e.g. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent but mostly unimportant surface features. For instance, say I have a wooden cube. SURF detects a few bits of grain on one face, then fails on other faces. I need to detect (something like) that there are four corners at equal distances and right angles. None of my objects has much of a pattern, but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin.
My thought was that I could build object descriptors for each significantly different-looking orientation of the object, e.g. two descriptors for a bowling pin: one standing up and one lying down. For a cellphone, one lying on the front and one on the back.
My recognizer needs rotational invariance and some degree of scale invariance in case objects are stacked. The ability to deal with some occlusion is preferable (SURF behaves well enough) but not the most important characteristic. Skew invariance would also be preferable, and SURF does well with paper printouts of my objects held by hand at a skew.
Questions:
Am I using the wrong SURF parameters to find features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there something as readily usable as SURF that uses the depth data from the Kinect along with or instead of the RGB data?
I was doing something similar for a project and ended up using a super simple method for object recognition: OpenCV blob detection, recognizing objects based on their areas. Obviously, there needs to be enough variance in area between objects for this method to work.
You can see my results here: http://portfolio.jackkalish.com/Secondhand-Stories
I know there are other methods out there, one possible solution for you could be approxPolyDP, which is described here:
How to detect simple geometric shapes using OpenCV
Would love to hear about your progress on this!
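A minimal sketch combining both suggestions (blob/contour areas plus approxPolyDP for geometry) might look like this in Python. The thresholds and file name are assumptions for illustration, not values from my project.

```python
import cv2

# Minimal sketch: segment the objects from the table, then use contour
# area and the number of approximated polygon vertices as crude shape
# descriptors. Thresholds and the image path are illustrative.
img = cv2.imread("table_view.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    area = cv2.contourArea(c)
    if area < 500:                      # ignore small noise blobs
        continue
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    # e.g. a cube seen from above yields ~4 vertices, a lollipop many more
    print(f"area={area:.0f}, vertices={len(approx)}")
```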

Detect custom image marker in real time using OpenCV on iOS

I would like some hints, maybe more, on detecting a custom image marker in a real-time video feed. I'm using OpenCV on the iPhone with the camera feed.
By custom image marker I'm referring to a predefined image, but it can be any kind of image (not a specific designed marker). For example, it can be a picture of some skyscrapers.
I've already worked with ARTags and understand how they are detected, but how would I detect this custom image and especially find out its position & orientation?
What makes a good custom image to be detected successfully?
Thanks
The most popular markers used in AR are:
AR markers (a simple form of QR codes) - those detected by ARToolKit & others.
QR codes. There are plenty of examples on how to create/detect/read QR codes.
Dot grids. Similar to the chessboard grids used in calibration. It seems their detection can be more robust than the classical chessboard grid. OpenCV has code for dot grid detection in its calibration module, and the OpenCV codebase offers a good starting point for extracting 3D position and orientation.
Chessboard grids. Similar to dot grids. They were the standard calibration pattern, and some people used them for marker detection for a long time. But they have lost ground to dot grids recently, as it was found that dots can be detected with better accuracy.
Note:
Grids are symmetrical. I bet you already know that. But that means you will not be able to recover full orientation data from them. You will get the plane where the grid lies, but nothing more.
Final note:
Code and examples for the first two are easily found on the Internet; they are considered the best option by many people. If you decide to use the grid patterns, you have to enjoy some math and image processing work :) and it will take more effort.
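As a concrete illustration of the grid route, here is a minimal Python sketch that finds a chessboard grid in a frame and recovers the plane pose with solvePnP. The grid size, square size and camera intrinsics are placeholder assumptions; a real application would use calibrated intrinsics.

```python
import cv2
import numpy as np

# Minimal sketch of the chessboard-grid route: find the grid in a frame
# and recover the plane pose with solvePnP. Grid size, square size and
# camera matrix below are placeholder assumptions.
pattern_size = (9, 6)          # inner corners per row/column
square_size = 0.025            # metres per square

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
found, corners = cv2.findChessboardCorners(frame, pattern_size)
if found:
    # 3D coordinates of the grid corners on the marker plane (z = 0).
    objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
    objp *= square_size

    K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)
    dist = np.zeros(5)         # assume no lens distortion for the sketch
    ok, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
    print("rotation:", rvec.ravel(), "translation:", tvec.ravel())
```

Note that, because of the symmetry mentioned above, the pose returned for a symmetric grid is only defined up to that symmetry.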
This answer is no longer valid since Vuforia is now a paid engine.
I think you should give Vuforia a try. It's an AR engine that can use any image you want as a marker. What makes a good marker for Vuforia is a high-frequency image.
http://www.qualcomm.com/solutions/augmented-reality
Vuforia is a free-to-use engine.

Finding subpattern position in an image/pattern

Let's say I have an image or two-dimensional pattern similar to a QR code, and call it a template. Now I have a set of subimages that I want to match with my template and, what's important, find their precise location in the template. I think a similar problem is solved in 'smart papers' (http://en.wikipedia.org/wiki/Anoto) and in the Kinect's grid of infrared dots.
Does anyone have some clues how something similar can be implemented (even just keywords to look up)?
I had a few ideas:
OpenCV's template matching method - poor results when the subimage is rotated, scaled or skewed.
SURF feature detection and matching - it's pretty good, but the results get worse when the subimage is a really small chunk of the template. Besides, I think a specifically designed pattern would improve localization compared to an arbitrary image. I also think SURF is overkill, and I need something efficient that can handle real-time mobile camera streams.
Creating an image consisting of many QR codes that only store coordinates as data - the drawback is that the QR codes would have to be pretty small to allow fine-grained positioning, but then it's difficult to recognize them. Pros: they use only black and have a lot of white space (ink conservation).
A 2-dimensional colorful gradient image (similar to a color model map) - I think this will be sensitive to lighting.
QR codes are square. Using feature detection to find the grid, you can unproject it. Then OpenCV's template matching will work fine.
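A sketch of that pipeline in Python could look like the following: estimate the homography from the camera view to the flat template from feature matches (ORB is used here in place of SURF, which is non-free), then unproject the view into template coordinates to locate it. The file names are illustrative.

```python
import cv2
import numpy as np

# Sketch: use feature matches to estimate the homography between the
# camera view and the flat template, then map the view's footprint into
# template coordinates to find its precise location. File names are
# placeholders for illustration.
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
view = cv2.imread("camera_view.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
kp_t, des_t = orb.detectAndCompute(template, None)
kp_v, des_v = orb.detectAndCompute(view, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_v, des_t)

src = np.float32([kp_v[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_t[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # view -> template

# Project the view's corners into the template to get its location.
h, w = view.shape
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
print(cv2.perspectiveTransform(corners, H).reshape(-1, 2))

# Alternatively, once the view is unprojected (warped) with H so that
# rotation/scale/skew are removed, plain cv2.matchTemplate against the
# template will also work, as the answer suggests.
```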
