How to align SCNScene to a physical table using ARKit? - ios

I'm trying to find the best strategy to align a SCNScene to a physical table. Just like the ARKit app WWWFreeRivers.
Currently I'm just testing out to map a simple plane model, with the same dimensions as the table. If I draw out the plane that ARKit detects, I can see that the plane is not very accurate with the edges. They always go outside of the edges (image below).
So I can't really rely on that plane, to just place the model in the center of this. The model is not rotated correctly either (image below).
I had another idea to use the ARReferenceImage technique, to take a picture of the table top texture, and let ARKit find and match this "image" of the table. But even with wood grain texture, it wasn't enough data for ARKit to recognize it. And ARKit just fails if you have these errors. It doesn't even try to do a bad match.
How can I go about doing this?
Ideas I've had so far:
Take image of table and use ARImageReference feature to match it. This didn't work. Maybe if I add some more interesting feature points to the table, like some sort of QR codes in the corners.
Detect the plane, and then tap the four corners on the table to map out a square, and use this.
Do as the WWW app, just place the object randomly on the plane, and then let the user scale, move and rotate the model to give it correct placement.
Any more ideas? What do you think will be the best approach to this?

Two options I can think of you could use.
You could create an ARWorldMap (iOS12+ only) and use it instead of the ARImageReference, walk around the area while creating a map that subsequent ARKit Sessions will remember. You can experiment slightly as to how to fit your models within the four corners of the table (this is slightly tedious w/o much help from the SceneView editor). However, when you load the saved ARWorldMap and localized against it (just like the ARImageReference), your model should fit within the four corners of the table every time.
If you use something like Unity (and its ARKit plugin), it has much more powerful Editor tools (3D viewer/designer). There are some tools that can help you save the map just like ARWorldMap but then bring in details of the map into the editor so you can line things up right really easily. Placenote's Spatial Capture toolkit can help here. Placenote (iOS11+) creates its own "World Map" but it exposes the visual details in the Unity editor, making it easier to line things up and then localize against (Example). The map is also stored on a managed cloud from the get-go to make sharing across phones much easier.
P.S: Both these options require you to keep the environment generally static (not large lighting changes etc.), though this was a similar constraint to when using ARIMageReference.

Related

Detecting a real world object using ARKit with iOS

I am currently playing a bit with ARKit. My goal is to detect a shelf and draw stuff onto it.
I did already find the ARReferenceImage and that basically works for a very, very simple prototype, but the image needs to be quite complex it seems? Xcode always complains if I try to use something a lot simpler (like a QR-Code like image). With that marker I would know the position of an edge and then I'd know the physical size of my shelf and know how to place stuff into it. So that would be ok, but I think small and simple markers will not work, right?
But ideally I would not need a marker at all.
I know that I can detect e.g. planes, but I want to detect the shelf itself. But as my shelf is open, it's not really a plane. Are there other possibilities to find an object using ARKit?
I know that my question is very vague, but maybe somebody could point me in the right direction. Or tell me if that's even possible with ARKit or if I need other tools? Like Unity?
There are several different possibilities for positioning content in augmented reality. They are called content anchors, and they are all subclasses of the ARAnchor class.
Image anchor
Using an image anchor, you would stick your reference image on a pre-determined spot on the shelf and position your 3D content relative to it.
the image needs to be quite complex it seems? Xcode always complains if I try to use something a lot simpler (like a QR-Code like image)
That's correct. The image needs to have enough visual detail for ARKit to track it. Something like a simple black and white checkerboard pattern doesn't work very well. A complex image does.
Object anchor
Using object anchors, you scan the shape of a 3D object ahead of time and bundle this data file with your app. When a user uses the app, ARKit will try to recognise this object and if it does, you can position your 3D content relative to it. Apple has some sample code for this if you want to try it out quickly.
Manually creating an anchor
Another option would be to enable ARKit plane detection, and have the user tap a point on the horizontal shelf. Then you perform a raycast to get the 3D coordinate of this point.
You can create an ARAnchor object using this coordinate, and add it to the ARSession.
Then you can again position your content relative to the anchor.
You could also implement a drag gesture to let the user fine-tune the position along the shelf's plane.
Conclusion
Which one of these placement options is best for you depends on the use case of your app. I hope this answer was useful :)
References
There are a lot of informative WWDC videos about ARKit. You could start off by watching this one: https://developer.apple.com/videos/play/wwdc2018/610
It is absolutely possible. If you do this in swift or Unity depends entirely on what you are comfortable working in.
Arkit calls them https://developer.apple.com/documentation/arkit/arobjectanchor. In other implementations they are often called mesh or model targets.
This Youtube video shows what you want to do in swift.
But objects like a shelf might be hard to recognize since their content often changes.

Is there an easy way of removing unwanted features from a meshed 3D scanned part with well defined edges to prepare it for 3D printing?

I am working with a part that was taken from a *.acs point cloud map to a meshed part through Autodesk ReCap. This object had hidden features that I was unable to erase (or see) when converting from a *.acs (point cloud), to a *.rcp (ReCap file) to a *.obj (mesh), and I have not found an intuitive way to remove the "flash" from the sides without creating holes that I have not been able to fill. I tried manually removing these edges in MeshMixer and Meshlab, but feel like there must be an easier way than what I have tried so far. Image for reference
Thanks!

Placing objects below the ground in AR Quick Look on iOS

I am working on a project that will display objects below the ground using AR Quick Look. However, the AR mode seems to bring everything above the ground based on the bounding box of the objects in the scene.
I have tried using the USDZ directly and composing a simple scene in Reality Composer with the object or with a simple cube with the exact same result. AR preview mode in Reality Composer is showing the object below the ground or below an image anchor correctly. However, if I export the scene as a .reality file and open it in using AR Quick Look, it brings the object above the ground as well.
Is there a way to achieve showing an object below the detected horizontal plane or image (horizontal) using AR Quick Look?
This is still an issue a year later. I have submitted feedback to Apple. I suggest you do too. I have suggested adding a checkbox to keep Y axis persistent. My assumption is this behaves this way to prevent the object from colliding with the ground, but I don't think it's necessary. It's just a limitation right now.

Placing Virtual Object Behind the Real World Object

In ARKit for iOS. If you display a virtual item then it always comes before any real item. This means if I stand in front of the virtual item then I would still see the virtual item. How can I fix this scenario?
The bottle should be visible but it is cutting off.
You cannot achieve this with ARkit only. It offers no off the shelve solution for solving occlusion, which is a hard problem.
Ideally you'd know the depth of each pixel projected on the camera, and would use that to determine those that are in front and those that are behind. I would not try something with the feature points ARKit is exposing since 1) their position is innacurate 2) there's no way to know between two frames which feature point of frame A is which feature point in frame B. It's way to noisy data to do anything good.
You might be able to achieve something with third party options that'd process the captured image and understand depth or different depth levels in the scene, but I don't know any good solution. There's some SLAM technique that yields dense depth map like DTAM (https://www.kudan.eu/kudan-news/different-types-visual-slam-systems/) but that'd be redoing most of what arkit is doing. There might be other approaches that I'm not aware off. Apps like snap do this in their own way so it is possible!
So basically your question is to mapping the coordinate of the virtual item on real world coordinate system, in short, you want to see the virtual item blocked by the real item, and you can only see the virtual item once you pass the real item.
If so, you need to know the physical relations of each object in this environment, and then you need to know exactly where you are to decide if the virtual item is blocked.
It's not an intuitive way to fix this, however, it's the only way I can think of.
Cheers.
What you are trying to achieve is not easy.
You need to detect the parts of the real world that "should be visible" using some kind of image processing. Or maybe the ARKit feature points that have the depth information, then based on this you have to add "an invisible virtual object" that cuts the drawing of things behind it. This will represent your "real object" inside the "virtual world" so that the background (camera feed) remains visible in places where this invisible virtual object is present.

How to detect text in a photo

I am researching into the best way to detect test in a photo using open source libraries.
I think the standard way is as follows (note: steps 1 - 4 all use OpenCV):
1) detect outline of document
2) transform document so it's flat and cropped, using said outline
3) Make the background of document white, using a filter
4) Feed resulting image to Tesseract
Is this the optimum process, or is there a better way, or better tools?
Also, what happens for case if the photo doesn't have a document outline (It's possible that step 1 & 2 are redundant)?
Is there anyway to automatically detect document orientation (i.e. portrait / landscape)?
I think your process is fine. I've used a similar process for an Android project.
I think that the only way you can discover if a document is portrait/landscape is to reason with the length of the sides of the bounding box of your outline.
I don't think there's an automatic way to do this, maybe you can find the most external contour approximable with a 4 segment polyline (all doable in opencv). In order to get this you'll have to work with contour hierarchy and contous approximation (see cv2.approxPolyDP).
This is how I would go for automatic outline detection. As I said, the rest of your algorithm seems just fine to me.
PS. I'll leave my Android project GitHub link. I don't know if it can be useful to you, but here I specify the outline by dragging some handles, then transform the image and feed it to Tesseract, using Java and OpenCV. Yeah It's a very bad idea to do that in the main thread of an Android app and yeah, the app is not finished. I just wanted to experiment with OCR, so I didn't care much of performance and usability, since this was not intended to use, but just for studying.
Look up the uniform width transform.
What this does is detect edges which have more or less the same width with respect to their opposite edge. So things like drainpipes (which can be eliminated at a later pass) but also the majority of text. Whilst conceptually it's similar to a distance transform, the published method uses rather ad hoc normal projection methods and Canny edge detection.

Resources