I would like to build a very simple AR app that can detect a white sheet of A4 paper in its surroundings. I thought it would be enough to use Apple's image recognition sample project together with a white sample image in the aspect ratio of an A4 sheet, but the ARSession fails with:
One or more reference images have insufficient texture: white_a4,
NSLocalizedRecoverySuggestion=One or more images lack sufficient
texture and contrast for accurate detection. Image detection works
best when an image contains multiple high-contrast regions distributed
across its extent.
Is there a simple way to detect sheets of paper using ARKit? Thanks!
I don't think even ARKit 3.0 is able to detect a plain white sheet at the moment.
If you have a white sheet with markers at its corners, or with some text on it, or even a white sheet placed inside a well-defined environment (that is, detection based on the surroundings rather than on the sheet itself), then it makes sense.
But plain white paper has no distinct marks on it, so ARKit cannot tell what it is, what its color is (outdoors it has a cold tint, for instance, while indoors it has a warm tint), what its contrast is (contrast is an important property in image detection), or how it is oriented (this mainly depends on your point of view).
The basic idea of image detection is that ARKit detects an image, not the absence of one.
So, for successful detection you'll need to give ARKit not only the sheet but its surroundings as well.
Also, take a look at Apple's recommendations for working with image detection:
Enter the physical size of the image in Xcode as accurately as possible. ARKit relies on this information to determine the distance of the image from the camera. Entering an incorrect physical size will result in an ARImageAnchor that’s the wrong distance from the camera.
When you add reference images to your asset catalog in Xcode, pay attention to the quality estimation warnings Xcode provides. Images with high contrast work best for image detection.
Use only images on flat surfaces for detection. If an image to be detected is on a nonplanar surface, like a label on a wine bottle, ARKit might not detect it at all, or might create an image anchor at the wrong location.
Consider how your image appears under different lighting conditions. If an image is printed on glossy paper or displayed on a device screen, reflections on those surfaces can interfere with detection.
I must add that you need a unique texture pattern, not a repetitive one.
What you could do is run a simple ARWorldTrackingConfiguration where you periodically analyze the camera image for rectangles using the Vision framework.
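As a rough illustration of what could happen after the rectangle detection step, here is a minimal sketch in Python (not Swift, and not part of Vision or ARKit) that checks whether a detected quadrilateral has roughly the A4 aspect ratio. The function names, the 15% tolerance, and the assumption that the corners arrive ordered and in pixel coordinates are all mine. The check is only meaningful when the sheet is seen more or less head-on, because perspective distorts the apparent ratio.

```python
import math

A4_RATIO = 297.0 / 210.0  # long side / short side, about 1.414


def side_lengths(corners):
    """Return the four edge lengths of a quadrilateral given as ordered (x, y) corners."""
    return [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]


def looks_like_a4(corners, tolerance=0.15):
    """Check whether an ordered quadrilateral has roughly the A4 aspect ratio.

    Opposite edges are averaged, which partly compensates for mild perspective.
    The tolerance is a relative error and is an arbitrary starting value.
    """
    a, b, c, d = side_lengths(corners)
    width = (a + c) / 2.0
    height = (b + d) / 2.0
    short, long_ = sorted((width, height))
    if short == 0:
        return False
    return abs(long_ / short - A4_RATIO) / A4_RATIO <= tolerance
```

A rectangle that passes this check is only a candidate; a white table mat with the same proportions would pass too, so you would probably combine it with a brightness or colour test.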
This post (https://medium.com/s23nyc-tech/using-machine-learning-and-coreml-to-control-arkit-24241c894e3b) describes how to use ARKit in combination with Core ML.
Related
I need to make an app that detects images and their position, and displays AR content on them. These images will change during the lifetime of the app, and there can be many of them. I'm wondering how to design this kind of app. ARKit can provide this functionality: detect an image and its orientation, and display AR content on it. But the problem is that ARKit can detect only a limited number of images at a time. If I have, for example, 300 images, that could be a problem. Maybe I could prepare some ML dataset to pre-detect the image, and then assign it as an ARKit trackable on the fly? Is this the right approach? What else could I do to make such an app with a large, dynamic set of images to detect?
Regarding an ML approach, you can use just about any state-of-the-art object detection network to pull the approximate coordinates of your desired target and extract that section of the frame, passing positives to ARKit or similar. The downside is that training will probably be resource-intensive. It could work, but I can't speak to its efficiency relative to other approaches.
In looking to extend this explanation, I see that ARKit 2.0 handles what seems to be what you're trying to do; is this insufficient?
To answer your question in the comments, Core ML seems to offer models for object recognition but not localization, so I suspect it'd be necessary to use their converter after training a model such as these. The input to this network would be frames from the camera, and the output would be the detected classes with their detection probabilities and approximate coordinates, that is, whether your targets are present and roughly where they are.
Again, though, if you're looking for 2D images rather than 3D+ objects, and especially if it's an ARKit app anyway, it really looks like ARKit's built-in tracking will be much more effective at substantially lower development cost.
At WWDC '19 ARKit 3 was touted to support up to 100 images for image detection. Image tracking supports a lower number of images, which I believe is still under 10. You have to recognize images yourself if you want more than that, currently.
As an idea, you can identify rectangles in the camera feed and then apply a CIPerspectiveCorrection filter to extract a fully 2D image based on the detected rectangle. See Tracking and Altering Images sample code which does something similar.
You then compare the rectangle's image data against your set of 300 source images. ARKit stopped at 100 likely due to performance concerns, but it's possible you can surmount those numbers with a performance metric that's acceptable to your own criteria.
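The comparison step could look something like the sketch below. It uses an average hash, which is my own choice of technique and not something ARKit or the sample code does. It assumes the perspective-corrected rectangle and every source image have already been converted to grayscale and shrunk to the same small grid (8x8 is common). The names and the default distance threshold are hypothetical.

```python
def average_hash(pixels):
    """Build a bit tuple from a small grayscale grid: 1 where a pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Count the positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(candidate, references, max_distance=10):
    """Return the name of the closest reference hash, or None if nothing is close enough.

    `references` maps a name to a precomputed hash. The default threshold
    assumes 8x8 grids (64 bits) and is only a starting point.
    """
    candidate_hash = average_hash(candidate)
    name, distance = min(
        ((n, hamming(candidate_hash, h)) for n, h in references.items()),
        key=lambda item: item[1],
    )
    return name if distance <= max_distance else None
```

Hashing the 300 source images once up front keeps the per-frame cost to 300 cheap Hamming comparisons. Whether the hash is discriminative enough for your images is something you would have to measure.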
Overview
I am attempting to build a prototype of a vision system that would apply pattern matching to figure out the orientation of boxes (eg. soap boxes).
Image sample
Below are real-time captured images of soap boxes in actual environment having two of four possible orientations. (Front_Straight and Back_Inverted orientations).
The real-time images will be very similar to these (300x200 pixels per image approx.)
The template images will be fed to the system in advance, and it has to determine the orientation of boxes moving on a conveyor. The boxes on the conveyor are guided so that they can take only one of 4 possible orientations: Front_Straight, Front_Inverted, Back_Straight and Back_Inverted, i.e. boxes cannot sit at an angle. The camera and the conveyor are fixed, so the image size of the real-time boxes is a constant 300px by 200px. (I have used a monochrome camera; a colour camera can be used too if needed.)
Some properties of the vision system prototype:
Fixed constant lighting.
The real-time image of the box will be quite low-res, as attached (300x200 per box).
Minimal motion blur or imaging artefacts.
OpenCV C++ based coding environment.
An Intel Core i5 CPU based PC will be used.
Problem Statement
I am looking for a lightweight yet robust algorithm that can reliably match the template images against real-time images of boxes on the conveyor to extract the face and orientation. I am new to feature matching, so please guide me as to which feature detector and matcher would be most suitable for this particular case. Also, please let me know whether it is possible to attain 97%+ accuracy using low-res real-time images like the ones attached.
You have a very fortunate case, having the images with very little variation. Any feature detector should perform very well in this scenario. Since, in OpenCV, the interface is common, they are very easy to compare against each other. From my experience, ORB tends to be quite fast and with good results, but I expect SIFT/SURF to work in your case too.
I wouldn't expect the resolution to be a problem.
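To show what the matching stage does once binary descriptors (such as ORB's) have been extracted, here is a small standalone sketch in Python rather than OpenCV C++. Descriptors are represented as plain integers, matching is brute-force Hamming distance with a ratio test, and the face is chosen by the number of good matches. The function names, the labels and the 0.75 ratio are my assumptions, not OpenCV's API.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def count_good_matches(query, train, ratio=0.75):
    """Count query descriptors whose best match in `train` passes the ratio test.

    A match is kept only if the nearest descriptor is clearly closer than the
    second nearest, which discards ambiguous matches.
    """
    good = 0
    for q in query:
        distances = sorted(hamming(q, t) for t in train)
        if len(distances) >= 2 and distances[0] < ratio * distances[1]:
            good += 1
    return good


def classify_face(query, templates):
    """Return the template label that collects the most good matches."""
    return max(templates, key=lambda label: count_good_matches(query, templates[label]))
```

One caveat: ORB descriptors are designed to be rotation-invariant, so match counts alone may tell Front from Back but not Straight from Inverted. The rotation would have to be read from the geometry of the matched keypoints instead.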
Using iOS 11 and iOS 12 and ARKit, we are currently able to detect planes on horizontal surfaces, and we may also visualize that plane on the surface.
I am wondering if we can declare, through some sort of image file, specific surfaces in which we want to detect planes? (possibly ignoring all other planes that ARKit detects from other surfaces)
If that is not possible, could we then perhaps capture the plane detected (via an image), to which we could then process through a CoreML model which identifies that specific surface?
ARKit has no support for such a thing at the moment. You can indeed capture the detected plane as an image, and if you're able to match it through Core ML in real time, I'm sure a lot of people would be interested!
You should:
get the 3D position of the corners of the plane
find their 2D position in the frame, using sceneView.projectPoint
extract the frame from the currentFrame.capturedImage
do a perspective transform on the image (an affine transform is not enough for a plane seen at an angle) to be left with your plane, reprojected to a rectangle
do some ML / image processing to detect a match
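The projection in the second step is the part sceneView.projectPoint does for you. To illustrate the idea, here is a minimal pinhole-camera sketch in Python. It is not ARKit code: it assumes points already expressed in camera space with +Z pointing forward and +Y pointing down (the usual computer-vision convention, which differs from ARKit's), and the intrinsics fx, fy, cx, cy are values you would have to supply yourself.

```python
def project_point(point, fx, fy, cx, cy):
    """Project a camera-space 3D point to pixel coordinates with a pinhole model.

    Assumes +Z is in front of the camera; fx, fy are focal lengths in pixels
    and (cx, cy) is the principal point.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def bounding_box(points_2d):
    """Axis-aligned box (min_x, min_y, max_x, max_y) around projected corners."""
    xs = [p[0] for p in points_2d]
    ys = [p[1] for p in points_2d]
    return (min(xs), min(ys), max(xs), max(ys))
```

The four projected corners are what you would feed into the perspective correction in the fourth step. The bounding box is only useful for a cheap crop beforehand.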
Keep in mind that the ARKit rectangle detection is often not well aligned, and can have only part of the full plane.
Finally, unfortunately, the feature points that ARKit exposes are not useful, since they don't carry any of the characteristics used for matching feature points across frames, and Apple has not said what algorithm they use to compute their feature points.
Here is a small demo for finding a horizontal surface, written in Swift 5, on GitHub.
My goal is to find known logos in static images and videos. I want to achieve that by using feature detection with KAZE or AKAZE and RANSAC.
I am aiming for a similar result to: https://www.youtube.com/watch?v=nzrqH...
While experimenting with the detection example from the docs (which is great, by the way), I ran into several issues:
Object resolution: Differences between the size of the known object and the resolution of the scene where the object should be located sometimes break the detection algorithm. The object won't be recognized in low-resolution images even though the image quality is still all right for the human eye.
Color contrast with the background: It seems that the detection is easily thrown off by different background contrasts (e.g. the known object is a black logo on a white background, while the logo in the scene is white on a black background). How can I make the detection more robust against different lighting conditions and background contrasts?
Preprocessing: Should any kind of preprocessing be applied to the object / scene? For example, enlarging the scene to a specific size? Is there any guideline on how to approach feature detection in several steps to get the best results?
I think your issue is more complicated than the feature-descriptor-matching-homography process.
It is closer to a pattern recognition or classification problem.
You can check this extensive review paper on shape matching:
http://www.staff.science.uu.nl/~kreve101/asci/vir2001.pdf
Firstly, the resolution of the images is very important, because a matching operation usually computes a pixel-intensity cross-correlation between your sample image (the logo) and the image being processed, and returns the area with the best correlation.
In the same way, the background colour intensity is very important, because background illumination can severely affect your final result.
Feature-based methods are widely researched:
http://docs.opencv.org/2.4/modules/features2d/doc/feature_detection_and_description.html
http://docs.opencv.org/2.4/modules/features2d/doc/common_interfaces_of_descriptor_extractors.html
So, for example, you can try alternative methods such as:
HOG descriptors (histogram of oriented gradients):
https://en.wikipedia.org/wiki/Histogram_of_oriented_gradients
Pattern matching or template matching
http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
I think the last one (pattern matching) is the easiest way to check your algorithm.
Hope these references help.
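To make the cross-correlation idea concrete, here is a minimal pure-Python sketch of template matching with a zero-mean normalized cross-correlation score. It is a toy version of the idea, not OpenCV's implementation. It assumes grayscale images stored as lists of rows, and the function names are hypothetical.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation between two equal-sized grayscale grids.

    Returns a value in [-1, 1]; 0.0 is returned for a flat patch or template.
    """
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0


def match_template(image, template):
    """Slide the template over the image; return (score, row, col) of the best position."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best
```

Because the mean is subtracted and the score is normalized, a uniform change in brightness or contrast does not change the result. A logo with inverted contrast (white on black instead of black on white) scores close to -1, so comparing on the absolute value of the score is one way to catch that case. The template still has to be at the same scale as the logo in the scene, which is why resolution matters so much here.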
Cheers.
Unai.
I am using OpenCV to process some videos where a user is placing their hands on different parts of a wall. I've selected some regions of interest and I'm currently just using cv2.absdiff on the original image of the wall with no user and the current frame to detect whether the user has their hand in a region of interest by looking at the average pixel difference. If it's above some threshold, I consider that region "activated".
The problem I'm having is that some of the video clips contain lighting and positions that result in the user casting a shadow over certain ROIs, such that they are above the threshold. Is there a good way to filter out shadows when diffing images?
OpenCV has a Mixture of Gaussians (MOG) based background subtractor, which also has an option to account for shadows. You can use this instead of absdiff. MOG can be a bit slow compared to absdiff, though.
Alternatively, you can convert to HSV, and check that the Hue doesn't change.
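Here is a rough per-pixel sketch of the HSV idea using only Python's standard library (colorsys) instead of cv2. The heuristic is that a shadowed pixel is darker than the background while its hue and saturation stay about the same. All thresholds and the function name are my own guesses and would need tuning on your clips.

```python
import colorsys


def is_shadow(background_rgb, current_rgb, min_ratio=0.4, max_ratio=0.95,
              max_hue_diff=0.05, max_sat_diff=0.15):
    """Heuristic: a shadowed pixel is darker than the background but keeps its hue.

    Both pixels are (R, G, B) tuples in 0..255. Thresholds are illustrative only.
    """
    bh, bs, bv = colorsys.rgb_to_hsv(*(c / 255.0 for c in background_rgb))
    ch, cs, cv = colorsys.rgb_to_hsv(*(c / 255.0 for c in current_rgb))
    if bv == 0:
        return False  # a black background pixel cannot get darker
    hue_diff = abs(bh - ch)
    hue_diff = min(hue_diff, 1.0 - hue_diff)  # hue is circular
    ratio = cv / bv  # how much darker the current pixel is
    return (min_ratio <= ratio <= max_ratio
            and hue_diff <= max_hue_diff
            and abs(bs - cs) <= max_sat_diff)
```

Within an ROI you would then average the difference only over pixels that are not classified as shadow. Note that hue is unreliable for nearly gray pixels, so a wall that is close to white or gray may need the saturation and brightness tests to do most of the work.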
You could first detect shadow regions in the original images, and exclude them from the difference imaging part. This paper provides a simple but effective method to detect shadows in images. They explore a colour space that is invariant to shadows.