Is ARCore object recognition possible? - augmented-reality

My goal is to overlay material/texture on a physical object (it would be an architectural model) that I would have an identical 3d model of. The model would be static (on a table if that helps), but I obviously want to look at the object from any side. The footprint area of my physical models would tend to be no smaller than 15x15cm and could be as large as 2-3m^2, but I would be willing to change the size of the model to work with ARCore's capability.
I know ARCore is mainly designed to anchor digital objects to flat horizontal planes. My main question is: in its current state, is it capable of accomplishing my end goal? If I have this right, it would record physical point-cloud data and attempt to match it to the point-cloud data of my digital model, then overlay the two on the phone screen?
If that really isn't what ARCore is for, is there an alternative that I should be focusing on? In my head this sounded fairly straightforward, but I'm sure I'll get way out of my depth if I go about it an inefficient way. Speaking of depth, I would prefer not to use a depth sensor, since my target devices are phones.

I most definitely hope that it will be possible in the future; after all, an AR toolkit without computer vision is not that helpful.
Unfortunately, according to Ian from the ARCore team, this is currently not directly supported, but you could try to access the pixels via glReadPixels and then process those image bytes with OpenCV (a sketch of that approach follows the quote).
Quote from Ian:
I can't speak to future plans, but I agree that it's a desirable capability. Unfortunately, my understanding is that current Android platform limitations prevent providing a single buffer that can be used as both a GPU texture and CPU-accessible image, so care must be taken in providing that capability.
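For illustration, here is a minimal C++ sketch of the glReadPixels route Ian describes: read the rendered camera frame back from the GPU and wrap it in an OpenCV Mat. It assumes a GLES context is current, the camera background has already been drawn into the bound framebuffer, and the width/height match the viewport; it is a starting point, not part of the ARCore API.

    #include <GLES2/gl2.h>
    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    // Copy the rendered camera frame back to the CPU and hand it to OpenCV.
    cv::Mat readCameraFrame(int width, int height) {
        std::vector<uint8_t> rgba(static_cast<size_t>(width) * height * 4);

        // Synchronous GPU -> CPU copy; expect a noticeable per-frame cost.
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, rgba.data());

        // Wrap the buffer, convert to BGR, and flip: OpenGL's origin is bottom-left.
        cv::Mat frame(height, width, CV_8UC4, rgba.data());
        cv::Mat bgr;
        cv::cvtColor(frame, bgr, cv::COLOR_RGBA2BGR);
        cv::flip(bgr, bgr, 0);
        return bgr;
    }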

Updated: 25 September, 2022.
At the moment there's still no 3D Object Recognition API in ARCore 1.33.
But you can use the ML Kit framework and the Augmented Images API (ARCore 1.2+) for some tasks.
According to Google's documentation, you can use ARCore camera frames as input for machine-learning models.
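ML Kit itself is a Java/Kotlin API, so as a hedged, platform-neutral sketch of the "camera frame in, ML model out" idea, here is the same thing with OpenCV's DNN module in C++. The ONNX file name and the 224x224 input size are placeholders, not anything shipped with ARCore or ML Kit; the frame is assumed to have been obtained already (for example via the glReadPixels sketch above).

    #include <opencv2/core.hpp>
    #include <opencv2/dnn.hpp>

    // Run an arbitrary ONNX classifier on a camera frame already converted to BGR.
    cv::Mat classifyFrame(const cv::Mat& bgrFrame) {
        static cv::dnn::Net net = cv::dnn::readNetFromONNX("model.onnx");  // placeholder model

        // Resize, scale to [0, 1], and convert to the NCHW blob layout the DNN module expects.
        cv::Mat blob = cv::dnn::blobFromImage(bgrFrame, 1.0 / 255.0, cv::Size(224, 224),
                                              cv::Scalar(), /*swapRB=*/true, /*crop=*/false);
        net.setInput(blob);
        return net.forward();  // class scores; interpretation depends on the model
    }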

Related

Using OpenCV with no image processing background to detect objects on a pavement to avoid them

I'm a software engineering student in my last year of a four-year bachelor's degree program, and I'm required to work on a graduation project of my own choice.
We are trying to find a way to notify the user of anything that gets in his/her way while walking. This will be implemented as an Android application, so we have the ability to use the camera. We thought of image processing and computer vision, but neither I nor any of my group members has any image-processing background; we searched a little and found out about OpenCV.
So my question is: do I need any special background to deal with OpenCV? And is computer vision a good choice for the objective of my project? If not, what alternatives do you advise me to use?
I appreciate your help. Thanks in advance!
At first glance, I would use two standard cameras to compute a depth image via stereo vision (similar to the MS Kinect depth sensor);
from that it would be easy to set a distance threshold (a sketch of the idea follows below).
Those algorithms are very CPU-hungry, so I am not sure it will work on Android (although I have zero experience there).
If you must use Android, I would look for a depth sensor (to avoid extracting depth data from two images).
For prototyping I would use MATLAB (or Octave), then switch to OpenCV (pointers, memory allocations, and so on).
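A rough C++ sketch of that stereo idea, assuming the left/right images are already grayscale and rectified; the disparity numbers are arbitrary placeholders you would tune:

    #include <opencv2/calib3d.hpp>
    #include <opencv2/core.hpp>

    // Compute a disparity map from a rectified stereo pair and flag pixels that
    // appear closer than a chosen disparity threshold (larger disparity = closer).
    cv::Mat obstacleMask(const cv::Mat& leftGray, const cv::Mat& rightGray) {
        auto matcher = cv::StereoBM::create(/*numDisparities=*/64, /*blockSize=*/15);

        cv::Mat disparity16;  // fixed-point disparity: 16 * actual value
        matcher->compute(leftGray, rightGray, disparity16);

        cv::Mat disparity;
        disparity16.convertTo(disparity, CV_32F, 1.0 / 16.0);

        return disparity > 20.0f;  // 20 px is an arbitrary "too close" cutoff
    }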

iOS: Real Time OCR on top of live camera feed (similar to iTunes Redeem Gift Card)

Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed?
I know that in iOS 7 there is now the AVMetadataMachineReadableCodeObject class which, AFAIK, only represents barcodes. I'm more interested in detecting and reading the contents of a short string. Is this possible using publicly available API methods, or some other third party SDK that you might know of?
There is also a video of the process in action:
https://www.youtube.com/watch?v=c7swRRLlYEo
Best,
I'm working on a project that does something similar to the App Store gift-card redemption with the camera, as you mentioned.
A great starting place for processing live video is a project I found on GitHub. It uses the AVFoundation framework, and you implement the AVCaptureVideoDataOutputSampleBufferDelegate methods.
Once you have the image stream (video), you can use OpenCV to process it. You need to determine the area of the image you want to OCR before you run it through Tesseract. You have to play with the filtering, but the broad steps you take with OpenCV are listed below (a condensed sketch follows the list):
Convert the images to grayscale using cv::cvtColor(inputMat, outputMat, CV_RGBA2GRAY);
Threshold the images to eliminate unnecessary elements. You specify the threshold value to eliminate, and then set everything else to black (or white).
Determine the lines that form the boundary of the box (or whatever you are processing). You can either create a "bounding box" if you have eliminated everything but the desired area, or use the HoughLines algorithm (or the probabilistic version, HoughLinesP). Using this, you can determine line intersection to find corners, and use the corners to warp the desired area to straighten it into a proper rectangle (if this step is necessary in your application) prior to OCR.
Process the portion of the image with Tesseract OCR library to get the resulting text. It is possible to create training files for letters in OpenCV so you can read the text without Tesseract. This could be faster but also could be a lot more work. In the App Store case, they are doing something similar to display the text that was read overlaid on top of the original image. This adds to the cool factor, so it just depends on what you need.
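A condensed C++ sketch of those steps, assuming the region of interest has already been located (the boundary-finding step with HoughLines is omitted for brevity); the threshold value and language are placeholders:

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <tesseract/baseapi.h>
    #include <string>

    // Grayscale -> binarize -> OCR the selected region with Tesseract.
    std::string ocrRegion(const cv::Mat& rgbaFrame, const cv::Rect& region) {
        cv::Mat gray, binary;
        cv::cvtColor(rgbaFrame(region), gray, cv::COLOR_RGBA2GRAY);

        // Simple global threshold; see the adaptiveThreshold / Otsu hint below.
        cv::threshold(gray, binary, 127, 255, cv::THRESH_BINARY);

        tesseract::TessBaseAPI tess;
        tess.Init(nullptr, "eng");  // expects tessdata on the default path
        tess.SetImage(binary.data, binary.cols, binary.rows,
                      /*bytes_per_pixel=*/1, /*bytes_per_line=*/static_cast<int>(binary.step));

        char* raw = tess.GetUTF8Text();
        std::string text = raw ? raw : "";
        delete[] raw;
        tess.End();
        return text;
    }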
Some other hints:
I used the book "Instant OpenCV" to get started quickly with this. It was pretty helpful.
Download OpenCV for iOS from OpenCV.org/downloads.html
I have found adaptive thresholding to be very useful; you can read all about it by searching for "OpenCV adaptiveThreshold". Also, if you have an image with very little in between the light and dark elements, you can use Otsu's binarization, which automatically determines the threshold value from the histogram of the grayscale image (both are sketched below).
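For reference, the two options look roughly like this in C++ (the block size and constant C are typical starting values, not prescribed ones):

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    void binarize(const cv::Mat& gray, cv::Mat& adaptive, cv::Mat& otsu) {
        // Adaptive: the threshold is computed per pixel from its local neighborhood,
        // which copes well with uneven lighting.
        cv::adaptiveThreshold(gray, adaptive, 255, cv::ADAPTIVE_THRESH_GAUSSIAN_C,
                              cv::THRESH_BINARY, /*blockSize=*/11, /*C=*/2);

        // Otsu: a single global threshold picked automatically from the histogram.
        cv::threshold(gray, otsu, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    }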
This Q&A thread seems to consistently be one of the top search hits for OCR on iOS, but it is fairly out of date, so I thought I'd post some additional resources I've found useful as of the time of writing:
Vision Framework
https://developer.apple.com/documentation/vision
As of iOS 11, you can now use the included CoreML-based Vision framework for things like rectangle or text detection. I've found that I no longer need to use OpenCV with these capabilities included in the OS. However, note that text detection is not the same as text recognition or OCR, so you will still need another library like Tesseract (or possibly your own CoreML model) to translate the detected parts of the image into actual text.
SwiftOCR
https://github.com/garnele007/SwiftOCR
If you're just interested in recognizing alphanumeric codes, this OCR library claims significant speed, memory consumption, and accuracy improvements over Tesseract (I have not tried it myself).
ML Kit
https://firebase.google.com/products/ml-kit/
Google has released ML Kit as part of its Firebase suite of developer tools, in beta at the time of writing. Similar to Apple's Core ML, it is a machine learning framework that can use your own trained models, but it also ships pre-trained models for common image processing tasks, as the Vision framework does. Unlike the Vision framework, it also includes a model for on-device text recognition of Latin characters. Currently, use of this library is free for on-device functionality, with charges for using Google's cloud/SaaS API offerings. I have opted to use this in my project, as the speed and accuracy of recognition seem quite good, and I will also be creating an Android app with the same functionality, so having a single cross-platform solution is ideal for me.
ABBYY Real-Time Recognition SDK
https://rtrsdk.com/
This commercial SDK for iOS and Android is free to download for evaluation and limited commercial use (up to 5000 units as of time of writing this post). Further commercial use requires an Extended License. I did not evaluate this offering due to its opaque pricing.
'Real time' is just a set of images. You don't even need to think about processing all of them, just enough to broadly represent the motion of the device (or the change in the camera position). There is nothing built into the iOS SDK to do what you want, but you can use a 3rd party OCR library (like Tesseract) to process the images you grab from the camera.
I would look into Tesseract. It's an open-source OCR library that takes image data and processes it. You can also supply regular expressions and restrict recognition to specific characters. It isn't perfect, but in my experience it works pretty well. It can also be installed as a CocoaPod, if you're into that sort of thing.
If you want to capture text in real time, you might be able to use GPUImage to grab images from the live feed and preprocess the incoming frames, speeding up Tesseract by applying filters or reducing the size or quality of the images.
There's a project similar to that on github: https://github.com/Devxhkl/RealtimeOCR
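GPUImage is an Objective-C library, so as a platform-neutral sketch of the same preprocessing idea (shrink and clean each frame before handing it to Tesseract), here it is with plain OpenCV in C++; the scale factor and kernel size are guesses to tune:

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    // Downscale and denoise a grayscale frame so OCR has less work to do.
    cv::Mat prepareForOcr(const cv::Mat& gray) {
        cv::Mat small, denoised;
        cv::resize(gray, small, cv::Size(), 0.5, 0.5, cv::INTER_AREA);  // halve the resolution
        cv::medianBlur(small, denoised, 3);                             // remove speckle noise
        return denoised;
    }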

Professional Object Tracking Solution

I want to build video-based tracking software. I can manage the control and display quite easily, but the actual object tracking in a video stream is very difficult (color tracking is not an option).
Solutions like OpenCV would probably require a very long learning curve, which I can't afford at the moment.
Are there professional packages which expose a simple API for object tracking? C# and C++ are the preferred languages, but others would be fine as well. Price is also less of an issue.
Computer Vision System Toolbox for MATLAB provides tracking functionality. Please check out the following examples:
Tracking a face
Tracking multiple objects
Generally, a lot depends on the specific problem you are trying to solve. Is the camera moving or stationary? Do you need to track a single object or multiple objects? Does your object have a distinctive color or texture? Does your object move in some predictable way?
Use OpenTLD. It tracks almost anything, but only one object at a time. The code is in MATLAB.
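Not one of the packages suggested above, but for a sense of how small a tracking API can be, OpenCV's contrib tracking module needs only an init() with a bounding box and an update() per frame. A C++ sketch, with the video path and the initial box as placeholders:

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/tracking.hpp>   // opencv_contrib tracking module
    #include <opencv2/videoio.hpp>

    int main() {
        cv::VideoCapture cap("video.mp4");   // placeholder input
        cv::Mat frame;
        cap >> frame;

        cv::Rect box(100, 100, 80, 80);      // object selected in the first frame
        auto tracker = cv::TrackerCSRT::create();
        tracker->init(frame, box);

        while (cap.read(frame)) {
            if (tracker->update(frame, box)) {
                cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2);  // draw the tracked box
            }
            // display or record `frame` here
        }
        return 0;
    }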

Using Augmented Reality libraries for Academic Project

I'm planning on doing my final-year project of my degree on augmented reality. It will use markers, and there will also be interaction between virtual objects (sort of a simulation).
Do you recommend using libraries like ARToolKit, NyARToolkit, or osgART for such a project, since they come with all the functions for tracking, detection, calibration, etc.? Will there be much work left from the programmer's point of view?
What do you think if I use OpenCV and do the marker detection, recognition, calibration, and other steps from scratch? Would that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if you want it to return reliable results. The effort also depends on which kind of markers you plan to use. ARToolKit, for example, compares the marker content detected in the video stream to images you earlier defined as markers. It tries to match images and returns a probability that a certain part of the video stream is a predefined marker; depending on the threshold you use and the lighting situation, markers are not always recognized correctly. Then there are other markers, like Data Matrix codes, QR codes, and frame markers (used by QCAR), that encode an id optically, so no image matching is required and all necessary data can be retrieved from the video stream. Finally, there are more complex approaches like natural feature tracking, where you can use predefined images, provided they offer enough contrast and points of interest so they can be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
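As an illustration of the "markers that encode an id optically" category (ArUco is not mentioned in the answer above, it is just a readily available example), detecting such markers with OpenCV's contrib aruco module looks roughly like this in C++:

    #include <opencv2/aruco.hpp>   // opencv_contrib module
    #include <opencv2/core.hpp>
    #include <vector>

    // Detect ArUco markers; the ids are decoded from the marker pattern itself,
    // so no matching against reference images is needed, unlike template markers.
    void detectIdMarkers(const cv::Mat& frame) {
        auto dictionary = cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50);

        std::vector<int> ids;                              // decoded marker ids
        std::vector<std::vector<cv::Point2f>> corners;     // four corners per marker
        cv::aruco::detectMarkers(frame, dictionary, corners, ids);
    }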
I suggest you use OpenCV; you will find high-quality algorithms, and it is fast. They are continuously developing new methods, so soon it will be possible to run them in real time on mobile devices.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.

What is an augmented reality mobile application?

I've heard the term "augmented reality" used before, but what does it mean?
In particular, what is an augmented reality iPhone application?
From: http://en.wikipedia.org/wiki/Augmented_reality
Augmented reality (AR) is a term for a live direct or indirect view of a physical, real-world environment whose elements are augmented by virtual computer-generated sensory input, such as sound or graphics. It is related to a more general concept called mediated reality, in which a view of reality is modified (possibly even diminished rather than augmented) by a computer. As a result, the technology functions by enhancing one's current perception of reality.
In the case of Augmented Reality, the augmentation is conventionally in real-time and in semantic context with environmental elements, such as sports scores on TV during a match. With the help of advanced AR technology (e.g. adding computer vision and object recognition) the information about the surrounding real world of the user becomes interactive and digitally usable. Artificial information about the environment and the objects in it can be stored and retrieved as an information layer on top of the real world view. The term augmented reality is believed to have been coined in 1990 by Thomas Caudell, an employee of Boeing at the time.
Incidentally, there are some images at the above URL that should make what's being discussed above fairly evident.
An augmented reality application is software that adds (augments) data or visuals on top of your device's camera view.
Popular examples include Snapchat filters, Yelp Monocle, and various map applications.
"Augmented reality (AR) is a live direct or indirect view of a physical, real-world environment whose elements are "augmented" by computer-generated or extracted real-world sensory input such as sound, video, graphics or GPS data. It is related to a more general concept called computer-mediated reality, in which a view of reality is modified (possibly even diminished rather than augmented) by a computer. Augmented reality enhances one’s current perception of reality, whereas in contrast, virtual reality replaces the real world with a simulated one.1 Augmentation techniques are typically performed in real time and in semantic context with environmental elements, such as overlaying supplemental information like scores over a live video feed of a sporting event." source: wikipedia.org
