I am currently working with Google Tango and Microsoft Hololens. I got the idea of scanning a room or an object using google Tango and then converting and showing it as hologram with the Hololens.
For that I need to get the ADF file on my computer.
Does someone know of a way to import adf-files onto a computer?
Do you know if it is possible to convert adf-files into usable 3d files?
An ADF is not a 3D scan of the room, it's a collection of feature descriptors from the computer vision algorithms with associated positional data, but the format is not documented.
You will want to use the point cloud from the depth sensor, convert it to a mesh (there are existing apps to do this) and import the mesh into a render engine on Hololens.
Related
I am trying to get the raw LiDAR data from Helios2 time of flight camera. How do I remove the built in features which sharpens the LiDAR data output of pointcloud.
I am trying to access code of the SDK where I can make some changes but could not find that in Windows version of the software.
Here's some research I have done so far:
- I have used Google Vision API to detect various face landmarks.
Here's the reference: https://developers.google.com/vision/introduction
Here's the link to Sample Code to get the facial landmarks. It uses the same Google Vision API. Here's the reference link: https://github.com/googlesamples/ios-vision
I have gone through the various blogs on internet which says MSQRD based on the Google's cloud vision. Here's the link to it: https://medium.com/#AlexioCassani/how-to-create-a-msqrd-like-app-with-google-cloud-vision-802b578b30a0
For Android here's the reference:
https://www.raywenderlich.com/158580/augmented-reality-android-googles-face-api
There are multiple paid SDK's which full fills the purpose. But they are highly priced. So cant able to afford it.
For instance:
1) https://deepar.ai/contact/
2) https://www.luxand.com/
There is possibility might have some see this question as duplicate of this:
Face filter implementation like MSQRD/SnapChat
But the thread is almost 1.6 years old with no right answers to it.
I have gone through this article:
https://dzone.com/articles/mimic-snapchat-filters-programmatically-1
It describes all the essential steps to achieve the desired results. But they advice to use their own made SDK.
As per my research no good enough material is around which helps to full fill the desired results like MSQRD face filters.
One more Github repository around which has same implementation but it doesn't gives much information about same.
https://github.com/rootkit/LiveFaceMask
Now my question is:
If we have the facial landmarks using Google Vision API (or even using
DiLib), how I can add 2d or 3d models over it. In which format this
needs to be done like this require some X,Y coordinates with vertices
calculation.
NOTE: I have gone through the Googles "GooglyEyesDemo" which adds the
preview layer over eyes. It basically adds a view over the face. So I
dont want to add UIView one dimensional preview layers over it. Image
attached for reference :
https://developers.google.com/vision/ios/face-tracker-tutorial
Creating Models: I also want to know how to create models for live
filters like MSQRD. I welcome any software or format recommendations.
Hope the research I have done will help others and someone else
experience helps me to achieve the desired results. Let me know if any
more details are required.**
Image attached for more reference:
Thanks
Harry
Canvas class is used in android for drawing such 3D / 2D models or core graphics for IOS can be used.
What you can do is detect the face components, take their location points and draw images on top of them. Consider going through this
You need to either predict x,y,z coordinates(check out this demo), either use x,y predictions but then find parameters of universal 3d-model & camera that will give the closest projection of current x,y.
the 3D point is generated by my laser-scaner, I want to save it as the format ADF,so Google Tango could use it
The short answer is... you probably can't.
There is no information on the ADF format but in any case it uses more than the 3D points from the depth camera. If you watch the Google IO videos it shows how it uses the angular camera to obtain some image features and recognize the environment. I guess using only 3D data would be too expensive and could not use information from distant points.
Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed?
I know that in iOS 7 there is now the AVMetadataMachineReadableCodeObject class which, AFAIK, only represents barcodes. I'm more interested in detecting and reading the contents of a short string. Is this possible using publicly available API methods, or some other third party SDK that you might know of?
There is also a video of the process in action:
https://www.youtube.com/watch?v=c7swRRLlYEo
Best,
I'm working on a project that does something similar to the Apple app store redeem with camera as you mentioned.
A great starting place on processing live video is a project I found on GitHub. This is using the AVFoundation framework and you implement the AVCaptureVideoDataOutputSampleBufferDelegate methods.
Once you have the image stream (video), you can use OpenCV to process the video. You need to determine the area in the image you want to OCR before you run it through Tesseract. You have to play with the filtering, but the broad steps you take with OpenCV are:
Convert the images to B&W using cv::cvtColor(inputMat, outputMat, CV_RGBA2GRAY);
Threshold the images to eliminate unnecessary elements. You specify the threshold value to eliminate, and then set everything else to black (or white).
Determine the lines that form the boundary of the box (or whatever you are processing). You can either create a "bounding box" if you have eliminated everything but the desired area, or use the HoughLines algorithm (or the probabilistic version, HoughLinesP). Using this, you can determine line intersection to find corners, and use the corners to warp the desired area to straighten it into a proper rectangle (if this step is necessary in your application) prior to OCR.
Process the portion of the image with Tesseract OCR library to get the resulting text. It is possible to create training files for letters in OpenCV so you can read the text without Tesseract. This could be faster but also could be a lot more work. In the App Store case, they are doing something similar to display the text that was read overlaid on top of the original image. This adds to the cool factor, so it just depends on what you need.
Some other hints:
I used the book "Instant OpenCV" to get started quickly with this. It was pretty helpful.
Download OpenCV for iOS from OpenCV.org/downloads.html
I have found adaptive thresholding to be very useful, you can read all about it by searching for "OpenCV adaptiveThreshold". Also, if you have an image with very little in between light and dark elements, you can use Otsu's Binarization. This automatically determines the threshold values based on the histogram of the grayscale image.
This Q&A thread seems to consistently be one of the top search hits for the topic of OCR on iOS, but is fairly out of date, so I thought I'd post some additional resources that might be useful that I've found as of the time of writing this post:
Vision Framework
https://developer.apple.com/documentation/vision
As of iOS 11, you can now use the included CoreML-based Vision framework for things like rectangle or text detection. I've found that I no longer need to use OpenCV with these capabilities included in the OS. However, note that text detection is not the same as text recognition or OCR so you will still need another library like Tesseract (or possibly your own CoreML model) to translate the detected parts of the image into actual text.
SwiftOCR
https://github.com/garnele007/SwiftOCR
If you're just interested in recognizing alphanumeric codes, this OCR library claims significant speed, memory consumption, and accuracy improvements over Tesseract (I have not tried it myself).
ML Kit
https://firebase.google.com/products/ml-kit/
Google has released ML Kit as part of its Firebase suite of developer tools, in beta at the time of writing this post. Similar to Apple's CoreML, it is a machine learning framework that can use your own trained models, but also has pre-trained models for common image processing tasks like Vision Framework. Unlike Vision Framework, this also includes a model for on-device text recognition of Latin characters. Currently, use of this library is free for on-device functionality, with charges for using cloud/SAAS API offerings from Google. I have opted to use this in my project, as the speed and accuracy of recognition seems quite good, and I also will be creating an Android app with the same functionality, so having a single cross platform solution is ideal for me.
ABBYY Real-Time Recognition SDK
https://rtrsdk.com/
This commercial SDK for iOS and Android is free to download for evaluation and limited commercial use (up to 5000 units as of time of writing this post). Further commercial use requires an Extended License. I did not evaluate this offering due to its opaque pricing.
'Real time' is just a set of images. You don't even need to think about processing all of them, just enough to broadly represent the motion of the device (or the change in the camera position). There is nothing built into the iOS SDK to do what you want, but you can use a 3rd party OCR library (like Tesseract) to process the images you grab from the camera.
I would look into Tesseract. It's an open source OCR library that takes image data and processes it. You can add different regular expressions and only look for specific characters as well. It isn't perfect, but from my experience it works pretty well. Also it can be installed as a CocoaPod if you're into that sort of thing.
If you wanted to capture that in real time you might be able to use GPUImage to catch images in the live feed and do processing on the incoming images to speed up Tesseract by using different filters or reducing the size or quality of the incoming images.
There's a project similar to that on github: https://github.com/Devxhkl/RealtimeOCR
I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit, osgART for such project since they come with all the functions for tracking, detection, calibration etc? Will there be much work left from the programmers point of view?
What do you think if I use OpenCV and do the marker detection, recognition, calibration and other steps from scratch? Will that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if want it to return reliable results. The effort also depends on which kind of markers you plan to use. Artoolkit e.g. compares the marker's content detected from the video stream to images you earlier defined as markers. Hence it tries to match images and returns a value of probability that a certain part of the video stream is a predefined marker. Depending on the threshold you are going to use and the lighting situation, markers are not always recognized correctly. Then there are other markers like datamatrix, qrcode, framemarkers (used by QCAR) that encode an id optically. So there is no image matching required, all necessary data can be retrieved from the video stream. Then there are more complex approaches like natural feature tracking, where you can use predefined images, given that they offer enough contrast and points of interest so they can be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
I suggest you to use OpenCV, you will find high quality algorithms and it is fast. They are continuously developing new methods so soon it will be possible to run it real-time in mobiles.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.