Detect custom image marker in real time using OpenCV on iOS - ios

I would like some hints, maybe more, on detecting a custom image marker in a real-time video feed. I'm using OpenCV, iPhone and the camera feed.
By custom image marker I'm referring to a predefined image, but it can be any kind of image (not a specific designed marker). For example, it can be a picture of some skyscrapers.
I've already worked with ARTags and understand how they are detected, but how would I detect this custom image and especially find out its position & orientation?
What makes a good custom image to be detected successfully?
Thanks

The most popular markers used in AR are
AR markers (a simple form of QR codes) - those detected by ARToolKit and others
QR codes. There are plenty of examples on how to create/detect/read QR codes.
Dot grids. Similar to the chess grids used in calibration. Their detection seems to be more robust than that of the classical chess grid. OpenCV has code for dot grid detection in its calibration module. The OpenCV codebase also offers a good starting point for extracting 3D position and orientation.
Chess grids. Similar to dot grids. They were the standard calibration pattern, and some people used them for marker detection for a long time. They have recently lost ground to dot grids, since dots can be detected with better accuracy.
Note:
Grids are symmetrical. I bet you already know that. But that means you will not be able to recover full orientation data from them. You will get the plane where the grid lies, but nothing more.
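A toy illustration of the symmetry problem, as a sketch only: it treats a chess grid as an array of square colors (an assumption made for illustration; real detectors work on corners or blob centers) and checks whether the board looks identical after being turned upside down. The function names are made up for this example.

```python
def chessboard(rows, cols):
    """Return a rows x cols grid of 0/1 square colors."""
    return [[(r + c) % 2 for c in range(cols)] for r in range(rows)]


def rotate_180(grid):
    """Rotate a grid by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in grid[::-1]]


def looks_same_upside_down(rows, cols):
    """True if the board cannot be told apart from its 180 degree rotation."""
    board = chessboard(rows, cols)
    return rotate_180(board) == board
```

For a 4 x 4 or 5 x 5 board the check returns True, so a detector has no way to tell "up" from "down". In this toy model a board with one odd and one even side (4 x 5, for example) returns False.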
Final note:
Code and examples for the first two are easily found on the Internet. They are considered the best by many people. If you decide to use the grid patterns, you have to enjoy some math and image processing work :) And it will take longer.

This answer is no longer valid, since Vuforia is now a paid engine.
I think you should give Vuforia a try. It's an AR engine that can use any image you want as a marker. What makes a good marker for Vuforia is a high-frequency image.
http://www.qualcomm.com/solutions/augmented-reality
Vuforia is a free to use engine.

Related

Easy to detect shapes/patterns to put on corners of a form

I am trying to create a form which will be filled in and photographed later on. An issue I am facing is alignment. I came across some deep learning solutions which detect the corners of a form, but they are often inaccurate in my use case, where the sheet of paper has been folded and reopened or crumpled. I also don't have much flexibility or room for hard-coding in the deep learning process.
Are there any patterns which OpenCV can detect with ~100% accuracy no matter the orientation of the pattern? I will be putting different patterns on the 4 corners of the sheet. I am thinking of using the built-in template matching function or other pattern recognition algorithms. There are some common patterns, like a big '+' sign or a star, that I am trying to avoid. I also tried putting barcodes on the corners because they are detected fairly easily (I am not concerned with the contents of the barcodes, only their relative positioning). But depending on the quality of the image, the barcode isn't always detected.
ArUco markers sound like the best option for you; they can easily be implemented in OpenCV.
ArUco example and documentation: https://docs.opencv.org/4.x/d5/dae/tutorial_aruco_detection.html
Python example: https://pyimagesearch.com/2020/12/21/detecting-aruco-markers-with-opencv-and-python/

iOS: Which Augmented Reality SDK should be used for a virtual try room?

I am working on an iOS augmented reality project where I need to integrate a virtual dressing concept.
I tried OpenCV. It worked as desired for face detection, but when I tried upper body detection it didn't work as desired.
I used UPPER_BODY_HAAR_CASCADE, but it didn't work as desired.
The result came out something like
but my desired output is something like this
If someone has achieved this functionality on iOS, please reply.
Not exactly the answer you are looking for. Your app ends up depending on the SDK you choose. Most of them are quite expensive to use, and their usage policies may change. Additionally, you drag all the extensive functionality you don't need into your app, so at the end of the day your app is 60-100MB in size.
If I were you (and I was in a similar situation), I would develop my own little SDK with just the functionality you need. If you know how to do it, it takes a couple of days to get the basic things working. Add OpenCV and you are in good shape.
PS. @Tommy asked an interesting question: how can one approach implementing something like in this video: youtube.com/watch?v=IBE11ROpxHE
Adding some info which is too long for a comment.
@Tommy Nice video. It seems to have all we need to proceed. First of all, for any AR application you need the calibration info of your camera (the mobile phone camera). In the simple case, it contains two matrices: the camera matrix and the distortion matrix. The camera matrix is then used for creating the OpenGL projection matrix (how the 3D model is projected onto the flat 2D screen, field of view, planes, etc.). The distortion matrix is used, for example, for warping parts of your input frame when detecting something. In the example with the watch, we need to detect the belt and the watch body in order to place the 3D model in that position. Since the paper watch is not seen in ideal perspective, at a 90 degree angle to the eye, it needs to be transformed to this view.
In other words, your paper watch looks like this:
  /---/
 /   /
/---/
And for the analysis and for detecting the model name you need it to look like this:
 ---
|   |
|   |
 ---
This is where the distortion matrix is used, in order to get a precise transformation. Different cameras have their own distortions.
Most applications use so-called offline calibration: a chessboard is fed into OpenCV functions that detect its cells in a series of frames taken from different perspectives, and build the matrices based on how the cells are shaped.
In your case, the belt of your watch may be designed so that it contains everything needed for online calibration. In your video it has a special pattern; I'm pretty sure it's done exactly for this purpose. You may do the same and use a chessboard pattern for simplicity.
Then you could use, let's say, the first 25 frames for online calibration. Once you have all the matrices, you go on to detect the paper watch, build the projection matrix and replace the watch with your 3D model. If all is done right, your paper watch will have coordinates 0 0 0 in 3D space and you could easily place something else in that position.
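To make the role of the camera matrix a bit more concrete, here is a minimal sketch of the standard pinhole model in plain Python. It assumes the camera matrix holds the focal lengths fx, fy and the principal point cx, cy (in pixels), and it ignores lens distortion entirely. The function names and any numbers you feed in are illustrative only; real values come out of calibration.

```python
import math


def project_point(point_3d, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates.

    Pinhole model without lens distortion: u = fx * x / z + cx,
    v = fy * y / z + cy.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def horizontal_fov_degrees(image_width, fx):
    """Horizontal field of view implied by the focal length in pixels."""
    return math.degrees(2.0 * math.atan(image_width / (2.0 * fx)))
```

A point on the optical axis lands on the principal point, and the field of view obtained this way is the kind of value that goes into an OpenGL projection matrix so that the rendered 3D model lines up with the camera image.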

Difference Between Marker based and Markerless Augmented Reality

I am totally new to AR. I searched the internet about marker-based and markerless AR, but I am still confused about the difference.
Let's assume an AR app triggers an AR action when it scans specific images. Is this marker-based AR or markerless AR?
Isn't the image a marker?
Also, to position the AR content, does marker-based AR use the device's accelerometer and compass as markerless AR does?
In a marker-based AR application the images (or the corresponding image descriptors) to be recognized are provided beforehand. In this case you know exactly what the application will search for while acquiring camera data (camera frames). Most of today's AR apps dealing with image recognition are marker-based. Why? Because it's much simpler to detect things that are hard-coded in your app.
On the other hand, a marker-less AR application recognizes things that were not directly provided to the application beforehand. This scenario is much more difficult to implement because the recognition algorithm running in your AR application has to identify patterns, colors or some other features that may exist in camera frames. For example if your algorithm is able to identify dogs, it means that the AR application will be able to trigger AR actions whenever a dog is detected in a camera frame, without you having to provide images with all the dogs in the world (this is exaggerated of course - training a database for example) when developing the application.
Long story short: in a marker-based AR application where image recognition is involved, the marker can be an image, or the corresponding descriptors (features + key points). Usually an AR marker is a black & white (square) image, a QR code for example. These markers are easily recognized and tracked, so not a lot of processing power is needed on the end-user device to perform the recognition (and optionally tracking).
There is no need for an accelerometer or a compass in a marker-based app. The recognition library may be able to compute the pose matrix (rotation & translation) of the detected image relative to the camera of your device. If you know that, you know how far away the recognized image is and how it is rotated relative to your device's camera. And from there on, AR begins... :)
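As a rough sketch of what "how far and how rotated" means in practice, assume the library hands back a 3x3 rotation matrix and a 3-element translation vector (the exact format depends on the library, so treat the input layout and the function names here as assumptions). The distance is the length of the translation vector, and the overall rotation angle follows from the trace of the rotation matrix.

```python
import math


def distance_to_marker(translation):
    """Distance from the camera to the marker origin, in the units of the pose."""
    return math.sqrt(sum(component * component for component in translation))


def rotation_angle_degrees(rotation):
    """Total rotation angle encoded in a 3x3 rotation matrix (axis-angle form)."""
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    # Clamp to [-1, 1] to guard against small numerical errors.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))
```

With the identity matrix as rotation, the angle is 0, meaning the marker faces the camera with no relative rotation.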
Well. Since I got downvoted without explanation. Here is a little more detail on markerless tracking:
Actually, there are several possibilities for augmented reality without "visual" markers, but none of them is called markerless tracking.
The display of virtual information can be triggered by GPS, speech, or simply turning on your phone.
Also, people tend to confuse NFT (natural feature tracking) with markerless tracking. With NFT you can take a real-life picture as a marker. But it is still a "marker".
This site has a nice overview and some examples for each marker type:
Marker-Types
It's mostly in German, though, so beware.
What is called markerless tracking today is a technique best observed with the HoloLens (and its own programming language) or the AR framework Kudan. Markerless tracking doesn't find anything on its own. Instead, you can place an object at runtime somewhere in your field of view.
Markerless tracking is then used to keep this object in place. It most likely uses a combination of sensor input and solving the SLAM (simultaneous localization and mapping) problem at runtime.
EDIT: A little update. It seems the HoloLens creates its own internal geometric representation of the room. 3D objects are then put into that virtual room. After that, the room is kept in sync with the real world. The exact technique behind that seems to be unknown, but some speculate that it is based on the Xbox Kinect technology.
Let's make it simple:
Marker-based augmented reality is when the tracked object is a black-and-white square marker. A great example that is really easy to follow is shown here: https://www.youtube.com/watch?v=PbEDkDGB-9w (you can try it out yourself)
Markerless augmented reality is when the tracked object can be anything else: picture, human body, head, eyes, hand or fingers etc. and on top of that you add virtual objects.
To sum it up, position and orientation information is the essential thing for augmented reality, and it can be provided by various sensors and methods. If that information is accurate, you can create some really good AR applications.
It looks like there may be some confusion between marker tracking and Natural Feature Tracking (NFT). A lot of AR SDKs tout their tracking as markerless (NFT). This is still marker tracking, in that a pre-defined image or set of features is used. It's just not necessarily a black and white ARToolKit type of marker. Vuforia, for example, uses NFT, which still requires a marker in the literal sense. Also, in the most literal sense, hand/face/body tracking is marker tracking too, in that the marker is a shape. Markerless tracking, as the name implies, requires no prior knowledge of the world, nor any particular shape or object to be present to track.
You can read more about how Markerless tracking is achieved here, and see multiple examples of both marker-based and Markerless tracking here.
Marker-based AR uses a camera and a visual marker to determine the center, orientation and range of its spherical coordinate system. ARToolKit is the first full-featured toolkit for marker-based tracking.
Markerless tracking is currently one of the best methods for tracking. It performs active tracking and recognition of the real environment on any type of support, without using specially placed markers. It allows more complex applications of the augmented reality concept.

Motion Sensing by Camera in iOS

I am working on an iOS app that will trigger an event if the camera detects some change in the image, or in other words, motion in the image. Here I am not asking about face recognition or a particular colored image motion, and I got all results for OpenCV when I searched. I also found that this can be achieved by using both the gyroscope and the accelerometer, but how?
I am a beginner in iOS. So my question is: is there any framework or any easy way to detect motion with the camera, and how do I achieve it?
For example, if I move my hand in front of the camera, it should show some message or alert.
Please also give me some useful and easy-to-understand links about this.
Thanks
If all you want is some kind of crude motion detection, my open source GPUImage framework has a GPUImageMotionDetector within it.
This admittedly simple motion detector does frame-to-frame comparisons, based on a low-pass filter, and can identify the number of pixels that have changed between frames and the centroid of the changed area. It operates on live video and I know some people who've used it for motion activation of functions in their iOS applications.
Because it relies on pixel differences and not optical flow or feature matching, it can be prone to false positives and can't track discrete objects as they move in a frame. However, if all you need is basic motion sensing, this is pretty easy to drop into your application. Look at the FilterShowcase example to see how it works in practice.
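To illustrate the general idea (this is not GPUImage's actual implementation, just a plain Python sketch of the same technique), the detector below keeps a low-pass filtered copy of the past frames, compares each new frame against it, and reports the number of changed pixels and their centroid. Frames are assumed to be 2D lists of grayscale values in the range 0.0 to 1.0. The class name, the strength and threshold parameters and their defaults are assumptions for this example.

```python
class SimpleMotionDetector:
    """Frame differencing against a low-pass filtered background."""

    def __init__(self, strength=0.5, threshold=0.2):
        self.strength = strength    # how fast the background follows new frames
        self.threshold = threshold  # minimum difference to count a pixel as changed
        self.background = None

    def process(self, frame):
        """Return (changed_pixel_count, centroid) for one grayscale frame."""
        if self.background is None:
            # The first frame only initializes the background.
            self.background = [list(row) for row in frame]
            return 0, None
        changed = 0
        sum_x = 0
        sum_y = 0
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                difference = value - self.background[y][x]
                if abs(difference) > self.threshold:
                    changed += 1
                    sum_x += x
                    sum_y += y
                # Low-pass filter: move the background a bit towards the new frame.
                self.background[y][x] += self.strength * difference
        centroid = (sum_x / changed, sum_y / changed) if changed else None
        return changed, centroid
```

An app could fire its alert whenever the changed pixel count exceeds some fraction of the frame. As noted above, this kind of detector reacts to any pixel change, including lighting changes and camera shake.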
I don't exactly understand what you mean here:
Here I am not asking about face recognition or a particular colored
image motion, because I got all result for OpenCV when I searched
But I would suggest going for OpenCV, as you can use OpenCV on iOS. Here is a good link which helps you set up OpenCV on iOS.
There are lots of OpenCV motion detection code samples online, and here is one of them which you can make use of.
You need to convert the UIImage (the image type on iOS) to cv::Mat or IplImage and pass it to the OpenCV algorithms. You can convert using this link or this.

Finding subpattern position in an image/pattern

Let's say I have an image or two-dimensional pattern similar to a QR code, and call it a template. Now I have a set of subimages that I want to match against my template and, importantly, find their precise location in the template. I think a similar problem is solved in 'smart papers' http://en.wikipedia.org/wiki/Anoto and in the Kinect's grid of infrared dots.
Does anyone have some clues how something similar can be implemented (even just keywords to look up)?
I had a few ideas:
OpenCV template matching method - poor results when rotated, scaled, skewed
SURF feature detection and matching - it's pretty good, but the result is worse when the subimage is a really small chunk of the template. Besides, I think that a specifically chosen pattern would improve location finding compared to an arbitrary image. Also I think SURF is overkill, and I need something efficient that can handle real-time mobile camera streams.
creating an image consisting of many QR codes that only store coordinates as data - the drawback is that the QR codes would have to be pretty small to allow fine-grained positioning, but then it's difficult to recognise them. Pros - they use only black color and have many white spaces (ink conservation)
2-dimensional colorful gradient image (similar to a color model map) - I think this will be sensitive to lightness
QR codes are square. Using feature detection to find the grid, you can unproject it. Then OpenCV's template matching will work fine.
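Once the subimage has been unprojected so that it is axis-aligned and at the same scale as the template, the matching step itself is simple. Below is a minimal pure Python sketch of sum-of-squared-differences matching, the same idea as one of the scores OpenCV's template matching offers, written out by hand. Both images are assumed to be 2D lists of numbers; find_subimage is a made-up name, and a brute-force search like this is only meant to show the principle, not to run on real-time camera streams.

```python
def find_subimage(template, patch):
    """Locate patch inside template by minimal sum of squared differences.

    Returns (left, top, score), where score 0 means an exact match.
    """
    template_h, template_w = len(template), len(template[0])
    patch_h, patch_w = len(patch), len(patch[0])
    if patch_h > template_h or patch_w > template_w:
        raise ValueError("patch must not be larger than the template")
    best = None
    for top in range(template_h - patch_h + 1):
        for left in range(template_w - patch_w + 1):
            score = sum(
                (template[top + dy][left + dx] - patch[dy][dx]) ** 2
                for dy in range(patch_h)
                for dx in range(patch_w)
            )
            if best is None or score < best[2]:
                best = (left, top, score)
    return best
```

If the pattern is designed so that every window of the patch size is unique, the position with the lowest score gives the location of the subimage in the template.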

Resources