Can ARCore recognise documents as well? - augmented-reality

ARCore can detect images and perform some action when an image is recognized.
But if I want to recognize a formal document, can ARCore be useful? So there are two parts to the question:
1.) Can ARCore recognize a document (say, a telephone bill)?
2.) Can we add some actions on different parts of the document, like showing a 'Pay Bill' button near the bill amount or a graph near the data usage?

Today, no. ARCore was created with a focus on rendering 3D objects in the real world.
What you can do is use another tool like OpenCV to read and recognize the bill, and with ARCore you could set an anchor close to where you read the bill, with a chart containing some information and maybe a button.
But even hit-testing a 3D object in the air is not trivial.
Maybe you should ask yourself why this would be useful. Why not just read the bill and show the information in the app without ARCore?
ARCore has heavy memory use and restricts the supported devices, and in this case the anchor/chart would be tied to the place, so if you move the bill the chart would not follow (today).
Here you can find more information about the uses for ARCore.

Related

I want to build an AR tool to place and store text files in a virtual space

It's called a memory palace (read: 'Moonwalking with Einstein'). It's an ancient tool used to memorize things, in my case coding concepts and Spanish and Indonesian phrases.
I'm learning Python now, but I'm not really sure what direction to move in and what stack should be used to build a project like this. It wouldn't be too complex; I just want to store and save "text files" in a virtual space like my bedroom or on my favorite hikes.
If anyone has insights or suggestions it'd be much appreciated.
Probably the two most common AR frameworks, on mobile devices anyway, at the moment are ARKit for iOS devices and ARCore for Android devices.
I am sure you can find comparisons of the strengths and weaknesses of each one but it is likely your choice will be determined by the type of device you have.
In either case, it sounds like you want to have 'places' you can return to over time and see your stored content. For this you could build on some common techniques:
Link the AR object to some sort of image in the real world, and when this image is recognised by the AR app, launch your AR object, in your case a text file (a minimal sketch of this approach follows below).
Use 'Cloud Anchors' - these are essentially anchors for AR objects that can persist over time, when you close the app and come back to it later, and even be shared between users on different devices.
You can find more information on Cloud Anchors at the link below, including information on using them with iOS and on Android:
https://developers.google.com/ar/develop/java/cloud-anchors/overview-android
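If you went the ARKit route, the image-recognition technique is only a few lines of work. Here is a minimal sketch, assuming a reference photo has been added to an "AR Resources" asset catalog group and the note text is hard-coded; the class and asset names are hypothetical placeholders, not tested code:

    import ARKit
    import SceneKit

    class MemoryPalaceViewController: UIViewController, ARSCNViewDelegate {
        @IBOutlet var sceneView: ARSCNView!

        override func viewWillAppear(_ animated: Bool) {
            super.viewWillAppear(animated)
            // Load the reference images bundled in the "AR Resources" asset group.
            let configuration = ARWorldTrackingConfiguration()
            configuration.detectionImages =
                ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) ?? []
            sceneView.delegate = self
            sceneView.session.run(configuration)
        }

        // Called when ARKit recognizes one of the reference images.
        func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
            guard anchor is ARImageAnchor else { return }
            // Attach the stored note as floating 3D text at the recognized image.
            let text = SCNText(string: "Hola = Hello", extrusionDepth: 0.5)
            let textNode = SCNNode(geometry: text)
            textNode.scale = SCNVector3(0.002, 0.002, 0.002) // SCNText units are large
            node.addChildNode(textNode)
        }
    }

For content that should survive between sessions or across devices without a physical image to anchor to, the Cloud Anchors approach above is the better fit.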

ARKit with multiplayer experience to share same planes [duplicate]

What is the best way, if any, to use Apple's new ARKit with multiple users/devices?
It seems that each device gets its own scene understanding individually. My best guess so far is to use raw feature points' positions and try to match them across devices to glue together the different points of view, since ARKit doesn't offer any absolute frame of reference.
===Edit1, Things I've tried===
1) Feature points
I've played around with the exposed raw feature points and I'm now convinced that in their current state they are a dead end:
they are not truly raw feature points: they only expose positions, but none of the attributes typically found in tracked feature points
their instantiation doesn't carry over from frame to frame, nor are the positions exactly the same
the reported feature points often change a lot even when the camera input is barely changing, with many appearing or disappearing
So overall I think it's unreasonable to try to use them in any meaningful way, since I can't make any kind of good point matching within one device, let alone across several.
An alternative would be to implement my own feature point detection and matching, but that would be more replacing ARKit than leveraging it.
2) QR code
As @Rickster suggested, I've also tried identifying an easily identifiable object like a QR code and getting the relative reference change from that fixed point (see this question). It's a bit difficult and required me to use some OpenCV to estimate the camera pose. But more importantly, it's very limiting.
As some newer answers have added, multiuser AR is a headline feature of ARKit 2 (aka ARKit on iOS 12). The WWDC18 talk on ARKit 2 has a nice overview, and Apple has two developer sample code projects to help you get started: a basic example that just gets 2+ devices into a shared experience, and SwiftShot, a real multiplayer game built for AR.
The major points:
ARWorldMap wraps up everything ARKit knows about the local environment into a serializable object, so you can save it for later or send it to another device. In the latter case, "relocalizing" to a world map saved by another device in the same local environment gives both devices the same frame of reference (world coordinate system).
Use the networking technology of your choice to send the ARWorldMap between devices: AirDrop, cloud shares, carrier pigeon, etc. all work, but Apple's Multipeer Connectivity framework is one good, easy, and secure option, so it's what Apple uses in their example projects.
All of this gives you only the basis for creating a shared experience — multiple copies of your app on multiple devices all using a world coordinate system that lines up with the same real-world environment. That's all you need to get multiple users experiencing the same static AR content, but if you want them to interact in AR, you'll need to use your favorite networking technology some more.
Apple's basic multiuser AR demo shows encoding an ARAnchor and sending it to peers, so that one user can tap to place a 3D model in the world and all others can see it. The SwiftShot game example builds a whole networking protocol so that all users get the same gameplay actions (like firing slingshots at each other) and synchronized physics results (like blocks falling down after being struck). Both use Multipeer Connectivity.
(BTW, the second and third points above are where you get the "2 to 6" figure from @andy's answer — there's no limit on the ARKit side, because ARKit has no idea how many people may have received the world map you saved. However, Multipeer Connectivity has an 8 peer limit. And whatever game / app / experience you build on top of this may have latency / performance scaling issues as you add more peers, but that depends on your technology and design.)
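A minimal sketch of that world-map and anchor sharing, assuming you already have a connected MCSession (called mcSession below) and a running ARSession (arSession); both names are placeholders, and error handling is trimmed:

    import ARKit
    import MultipeerConnectivity

    // Sender: capture the current world map and push it to all connected peers.
    func shareWorldMap(arSession: ARSession, mcSession: MCSession) {
        arSession.getCurrentWorldMap { worldMap, error in
            guard let map = worldMap,
                  let data = try? NSKeyedArchiver.archivedData(withRootObject: map,
                                                               requiringSecureCoding: true)
            else { return }
            try? mcSession.send(data, toPeers: mcSession.connectedPeers, with: .reliable)
        }
    }

    // Receiver: relocalize into the shared map, after which both devices use
    // the same world coordinate system.
    func didReceive(worldMapData: Data, arSession: ARSession) {
        guard let map = try? NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                                from: worldMapData) else { return }
        let configuration = ARWorldTrackingConfiguration()
        configuration.initialWorldMap = map
        arSession.run(configuration, options: [.resetTracking, .removeExistingAnchors])
    }

    // Interaction: ARAnchor is also NSSecureCoding, so a tap-placed anchor can be
    // archived and sent the same way; peers call arSession.add(anchor:) on receipt.
    func share(anchor: ARAnchor, via mcSession: MCSession) {
        guard let data = try? NSKeyedArchiver.archivedData(withRootObject: anchor,
                                                           requiringSecureCoding: true) else { return }
        try? mcSession.send(data, toPeers: mcSession.connectedPeers, with: .reliable)
    }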
Original answer below for historical interest...
This seems to be an area of active research in the iOS developer community — I met several teams trying to figure it out at WWDC last week, and nobody had even begun to crack it yet. So I'm not sure there's a "best way" yet, if even a feasible way at all.
Feature points are positioned relative to the session, and aren't individually identified, so I'd imagine correlating them between multiple users would be tricky.
The session alignment mode gravityAndHeading might prove helpful: that fixes all the directions to a (presumed/estimated to be) absolute reference frame, but positions are still relative to where the device was when the session started. If you could find a way to relate that position to something absolute — a lat/long, or an iBeacon maybe — and do so reliably, with enough precision... Well, then you'd not only have a reference frame that could be shared by multiple users, you'd also have the main ingredients for location based AR. (You know, like a floating virtual arrow that says turn right there to get to Gate A113 at the airport, or whatever.)
Another avenue I've heard discussed is image analysis. If you could place some real markers — easily machine recognizable things like QR codes — in view of multiple users, you could maybe use some form of object recognition or tracking (a ML model, perhaps?) to precisely identify the markers' positions and orientations relative to each user, and work back from there to calculate a shared frame of reference. Dunno how feasible that might be. (But if you go that route, or similar, note that ARKit exposes a pixel buffer for each captured camera frame.)
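One plausible sketch of that marker idea: use the Vision framework to find a QR code in ARKit's captured pixel buffer, then hit-test the code's center to drop an anchor both devices could agree on. This is an assumption-laden illustration, not a tested recipe; the coordinate handling is simplified and device orientation is ignored:

    import ARKit
    import Vision

    // Scan one captured frame for a QR code and anchor the scene at its center.
    // NOTE: Vision's normalized coordinates have a bottom-left origin while
    // ARFrame.hitTest expects top-left, hence the y flip below.
    func anchorOnQRCode(in frame: ARFrame, session: ARSession) {
        let request = VNDetectBarcodesRequest { request, _ in
            guard let qr = (request.results as? [VNBarcodeObservation])?
                .first(where: { $0.symbology == .QR }) else { return }
            let box = qr.boundingBox
            let center = CGPoint(x: box.midX, y: 1 - box.midY)
            // Project the 2D detection back into 3D via a feature-point hit test.
            if let hit = frame.hitTest(center, types: .featurePoint).first {
                session.add(anchor: ARAnchor(transform: hit.worldTransform))
            }
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
        try? handler.perform([request])
    }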
Good luck!
Now, after the release of ARKit 2.0 at WWDC 2018, it's possible to make games for 2...6 users.
For this, you need to use ARWorldMap class. By saving world maps and using them to start new sessions, your iOS application can now add new Augmented Reality capabilities: multiuser and persistent AR experiences.
AR Multiuser experiences. Now you may create a shared frame of reference by sending archived ARWorldMap objects to a nearby iPhone or iPad. With several devices simultaneously tracking the same world map, you may build an experience where all users (up to 6) can share and see the same virtual 3D content (use Pixar's USDZ file format for 3D in Xcode 10+ and iOS 12+).
    session.getCurrentWorldMap { worldMap, error in
        guard let worldMap = worldMap else {
            showAlert(error)
            return
        }
        // Use the captured map: archive it and send it to a peer,
        // or start a new session from it as shown here.
        let configuration = ARWorldTrackingConfiguration()
        configuration.initialWorldMap = worldMap
        session.run(configuration)
    }
AR Persistent experiences. If you save a world map and then your iOS application becomes inactive, you can easily restore it on the next launch of the app, in the same physical environment. You can use ARAnchors from the resumed world map to place the same virtual 3D content (in USDZ or DAE format) at the same positions as in the previously saved session.
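A minimal sketch of that save-and-restore flow, assuming a file location of your choosing (the mapURL below is a placeholder):

    import ARKit

    let mapURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("worldmap")

    // Save: serialize the current ARWorldMap to disk before the app goes inactive.
    func saveWorldMap(session: ARSession) {
        session.getCurrentWorldMap { worldMap, _ in
            guard let map = worldMap,
                  let data = try? NSKeyedArchiver.archivedData(withRootObject: map,
                                                               requiringSecureCoding: true)
            else { return }
            try? data.write(to: mapURL)
        }
    }

    // Restore: on the next launch, relocalize in the same physical environment.
    // The old ARAnchors come back with the map, so content reappears in place.
    func restoreWorldMap(session: ARSession) {
        guard let data = try? Data(contentsOf: mapURL),
              let map = try? NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self, from: data)
        else { return }
        let configuration = ARWorldTrackingConfiguration()
        configuration.initialWorldMap = map
        session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
    }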
Not bulletproof answers, more like workarounds, but maybe you'll find these helpful.
All assume the players are in the same place.
DIY: ARKit sets up its world coordinate system quickly after the AR session has been started. So if you can have all players, one after another, put and align their devices to the same physical location and let them start the session there, there you go. Imagine the inside edges of an L square ruler fixed to whatever is available. Or any flat surface with a hole: hold the phone against the surface looking through the hole with the camera, then (re)init the session.
Medium: Spare the player the manual phone alignment and instead detect a real-world marker with image analysis, just like @Rickster described.
Involved: Train a Core ML model to recognize iPhones and iPads and their camera location, like it's done with human faces and eyes. Aggregate data on a server, then turn off ML to save power. Note: make sure your model is cover-proof. :)
I'm in the process of updating my game controller framework (https://github.com/robreuss/VirtualGameController) to support a shared controller capability, so all devices would receive input from the control elements on the screens of all devices. The purpose of this enhancement is to support ARKit-based multiplayer functionality. I'm assuming developers will use the first approach mentioned by diviaki, where the general positioning of the virtual space is defined by starting the session on each device from a common point in physical space (a shared reference); specifically, I have in mind devices on opposite sides of a table. All the devices would launch the game at the same time and utilize a common coordinate space relative to physical size, and using the inputs from all the controllers, the game would in theory remain in sync on all devices. Still testing. The obvious potential problem is that latency or disruption in the network causes the sync to fall apart, and it would be difficult to recover except by restarting the game. The approach and framework may work fairly well for some types of games - for example, straightforward arcade-style games - but certainly not for many others - for example, any game with significant randomness that cannot be coordinated across devices.
This is a hugely difficult problem - the most prominent startup that is working on it is 6D.ai.
"Multiplayer AR" is the same problem as persistent SLAM, where you need to position yourself in a map that you may not have built yourself. It is the problem that most self driving car companies are actively working on.

AR Google Tango Project

I'd like to know how to create a target for architectural, large-scale AR on a real site. In other words, I need Google Tango to superimpose my 3D model on a specific place.
I have tried the Google Tango Area Learning tutorials (https://developers.google.com/tango/apis/unity/unity-codelab-area-learning), but after showing the message WALK AROUND TO RELOCALIZE the tablet does nothing, although I walk around to detect the real space; then after a few minutes the message 'Unity project has stopped' appears on the Google Tango tablet screen.
Could an ADF file be used instead of relocalizing the environment?
I've captured some interior scenes with the Tango Explorer and saved them, but I'm not able to use them for environment recognition purposes.
I work on Unity and Google Tango tablet.
Thank you in advance for your response.
For anyone else facing this problem - the likely cause is not having a recent ADF file already on the device.
You need to first create an Area Description File (ADF) by scanning, and then you can separately localise to that ADF - so you cannot "use an ADF instead of relocalising."
The tutorial you link above needs you to have separately created an ADF for your location - it simply chooses the most recent one you have.
You can use the Area Learning example to create your ADFs, and try localising to them. It also shows superimposing 3D models.
Also, look at the augmented reality one to see how to have objects load already in a specific place.

What is the way to parse a string of a well known format from an image on iOS (some library created specifically for this purpose)?

Local travel cards in Saint Petersburg, Russia, have huge ID numbers that aren't easy to read and type into a web page when topping up the card online. So I want to build a small app that would take a photo of a travel card and parse the number out.
The task is a bit easier than free-form recognition:
the card is of a very well known size
the ID numbers are of a known size, are located in a very well known location on the card, and they are numbers only, no letters (okay, there are two variations I think and maybe they will add 1-2 more in the future)
even the font is known in advance
even the first several digits are the same for most of the cards (so far there are only two prefixes used)
How would you do it? Are there any libraries tuned not for the general OCR, but for a "hinted" OCR like I need?
Best regards,
Artem.
P.S.
Actually a free/cheap web service for this task would also be good enough
Yes, Google has a library called Tesseract, and there is an iOS SDK on GitHub you can import into your application. You can use this SDK, and it has some documentation that explains how to set it up in your app. It has methods that will return a string with the text of the card. BUT it will be ALL of the text from the card. So the best thing to do would be to:
1. "Clip" the original image to extract a sub-image that displays only the portion of the card you wish to get the numbers from.
2. Process this sub-image through Tesseract to retrieve the string you are looking for.
3. Then parse through the string and pick out the data that you need.
But just be warned, it can be a bit quirky. This SDK tends to recognize words best from images that are scanned, not "taken a picture of", because although it is an advanced piece of technology, it isn't perfect. So to get it to work as well as possible for you, try to get scanned copies of the originals.
Best of luck.
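A rough sketch of those three steps, assuming the TesseractOCRiOS wrapper and its G8Tesseract class; the crop rectangle and the digit-count check are placeholders to be measured against the real card layout, not tested values:

    import TesseractOCR  // TesseractOCRiOS pod; G8Tesseract is its main class
    import UIKit

    func readCardNumber(from photo: UIImage) -> String? {
        // 1. Clip the photo down to the region where the ID number is printed.
        //    This rectangle is a placeholder; measure it for the real card layout.
        let idRegion = CGRect(x: 100, y: 600, width: 600, height: 120)
        guard let cropped = photo.cgImage?.cropping(to: idRegion) else { return nil }

        // 2. Run Tesseract on the cropped sub-image, restricted to digits only.
        guard let tesseract = G8Tesseract(language: "eng") else { return nil }
        tesseract.charWhitelist = "0123456789"
        tesseract.image = UIImage(cgImage: cropped)
        guard tesseract.recognize(), let text = tesseract.recognizedText else { return nil }

        // 3. Parse: keep only the digits (the card IDs have a known length and prefix).
        let digits = text.filter { $0.isNumber }
        return digits.count >= 10 ? digits : nil
    }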
The ideal solution for you would have three components:
1) Detection of the card. This is useful because if you have detection, the end users have a much easier time actually using the scanner, because they can place the phone above the card in an arbitrary direction.
2) Accurate OCR component. Ideally, customizable for this exact font you have on the card, for the exact position on the card.
3) Parsing mechanism. This would enable you to obtain the exact string written on the card without writing a huge amount of OCR parsing code.
BlinkID SDK has all of this. It has a preset for detecting cards in the ID-1 format. It has an integrated OCR engine. And it provides a RegexParser, where you can define the exact format of the text which you're trying to extract from the document.
BlinkID was initially built for scanning ID documents which have very similar properties as the problem you're trying to solve.
Note: I'm one of the developers working on BlinkID.

Can augmented reality be realized in a website?

Nowadays, I want to do some research on augmented reality technology. In particular, I would like to match a 2D image to a 3D model, and then see the 3D model when scanning the 2D image. What's more, I know that there are a lot of SDKs (like Metaio and Wikitude) and software that can realize this in a mobile app. However, what I want to do is realize this in a website. I hope the people who use this don't need to download a particular mobile app, but can just open a website and then scan a picture.
So, I'd like to know, as the title asks: can AR be realized in a website? If yes, how can I do it, or is there any software like Metaio Creator to do this? If not, why?
Thanks to anyone who would like to answer my naive question.
May I recommend our completely web-based AR & VR tool holobuilder.com by bitstars.com?
It supports 360-degree photospheres that can be enhanced with custom 3D models and then embedded directly into your website as an iframe; it has native support for stereoscopic view mode and much more.
For your use case you could have a look at the lower part of this blog post where you find information and an embedded example presentation with photosphere imagery containing 3D elements:
http://heyholo.com/google-pushes-vr-great-for-tools-like-holobuilder/
If you want to start creating I recommend the beginners guide:
https://medium.com/@maxspeicher/the-definite-guide-to-holobuilder-3b62a54d303e
The CV feature tracking you requested cannot yet be realized without an app or special browser support. But what you can do is render perspectively correct 3D elements into the camera image and move them with the sensors. It should be as performant as within the player app.
We hope that it can somehow help you in pushing your research and we would love to read your feedback. In case of any questions please do not hesitate to ask, here or on any other contact channel!
