In ARKit for iOS, a virtual item is always rendered in front of any real item. This means that if I stand in front of the virtual item, I still see the virtual item drawn over me. How can I fix this?
The bottle should be visible, but it is being cut off.
You cannot achieve this with ARKit alone. It offers no off-the-shelf solution for occlusion, which is a hard problem.
Ideally you'd know the depth of each pixel projected onto the camera image, and you'd use that to determine which parts of the scene are in front and which are behind. I would not try anything with the feature points ARKit exposes, since 1) their positions are inaccurate and 2) there's no way to know which feature point in frame A corresponds to which feature point in frame B. The data is far too noisy to do anything useful with.
You might be able to achieve something with third-party options that process the captured image and estimate depth, or distinct depth levels, in the scene, but I don't know of any good solution. There are SLAM techniques that yield a dense depth map, like DTAM (https://www.kudan.eu/kudan-news/different-types-visual-slam-systems/), but that would mean redoing most of what ARKit is doing. There might be other approaches I'm not aware of. Apps like Snapchat do this in their own way, so it is possible!
So basically your question is about mapping the virtual item's coordinates onto the real-world coordinate system; in short, you want the virtual item to be blocked by the real item, so that you only see the virtual item once you have walked past the real item.
If so, you need to know the physical relations between the objects in this environment, and then you need to know exactly where you are to decide whether the virtual item is blocked.
It's not an intuitive fix, but it's the only way I can think of.
Cheers.
What you are trying to achieve is not easy.
You need to detect the parts of the real world that "should be visible" using some kind of image processing, or perhaps the ARKit feature points, which carry depth information. Based on this, you add an "invisible virtual object" that cuts off the drawing of anything behind it. This object represents your "real object" inside the "virtual world", so that the background (camera feed) remains visible wherever this invisible virtual object is present.
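As a rough illustration of that idea, here is a minimal SceneKit sketch of such an "invisible occluder". It assumes you can roughly model and position the real object yourself; the box dimensions here are placeholders.

```swift
import SceneKit

// Sketch of an "invisible occluder": a node roughly matching a real-world object.
// Its material writes only to the depth buffer, not the color buffer, so virtual
// content behind it is hidden while the camera feed stays visible in its place.
func makeOccluderNode() -> SCNNode {
    // Placeholder dimensions; in practice, match the real object as closely as you can.
    let geometry = SCNBox(width: 0.3, height: 0.5, length: 0.3, chamferRadius: 0)

    let material = SCNMaterial()
    material.colorBufferWriteMask = []   // depth-only: draw nothing visible
    material.isDoubleSided = true
    geometry.materials = [material]

    let node = SCNNode(geometry: geometry)
    node.renderingOrder = -1             // render before the virtual content it should occlude
    return node
}
```

You would position this node where the real object sits (which is the hard part, as discussed above) and add it to the scene alongside your virtual content.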
Related
I am currently playing a bit with ARKit. My goal is to detect a shelf and draw stuff onto it.
I have already found ARReferenceImage, and that basically works for a very, very simple prototype, but the image needs to be quite complex, it seems? Xcode always complains if I try to use something a lot simpler (like a QR-code-like image). With that marker I would know the position of an edge, and then I'd know the physical size of my shelf and how to place stuff into it. So that would be OK, but I think small and simple markers will not work, right?
But ideally I would not need a marker at all.
I know that I can detect e.g. planes, but I want to detect the shelf itself. But as my shelf is open, it's not really a plane. Are there other possibilities to find an object using ARKit?
I know that my question is very vague, but maybe somebody could point me in the right direction. Or tell me if that's even possible with ARKit or if I need other tools? Like Unity?
There are several different possibilities for positioning content in augmented reality. They are called content anchors, and they are all subclasses of the ARAnchor class.
Image anchor
Using an image anchor, you would stick your reference image on a pre-determined spot on the shelf and position your 3D content relative to it.
"the image needs to be quite complex it seems? Xcode always complains if I try to use something a lot simpler (like a QR-Code like image)"
That's correct. The image needs to have enough visual detail for ARKit to track it. Something like a simple black and white checkerboard pattern doesn't work very well. A complex image does.
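For reference, a minimal sketch of the image-anchor approach might look like this. The resource group name "ShelfMarkers" and the content offset are placeholders; the reference image's physical size must be set in the asset catalog.

```swift
import ARKit
import SceneKit
import UIKit

class ShelfImageViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self

        let configuration = ARWorldTrackingConfiguration()
        // "ShelfMarkers" is a placeholder AR resource group containing the printed marker.
        if let images = ARReferenceImage.referenceImages(inGroupNamed: "ShelfMarkers",
                                                         bundle: nil) {
            configuration.detectionImages = images
        }
        sceneView.session.run(configuration)
    }

    // Called when ARKit recognises one of the reference images.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard anchor is ARImageAnchor else { return }

        // Position 3D content relative to the detected image,
        // e.g. 20 cm along the shelf from the marker.
        let content = SCNNode(geometry: SCNSphere(radius: 0.02))
        content.position = SCNVector3(0.2, 0, 0)
        node.addChildNode(content)
    }
}
```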
Object anchor
Using object anchors, you scan the shape of a 3D object ahead of time and bundle this data file with your app. When a user uses the app, ARKit will try to recognise this object and if it does, you can position your 3D content relative to it. Apple has some sample code for this if you want to try it out quickly.
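A hedged sketch of what that might look like once you have a scanned .arobject file bundled in an AR resource group (hypothetically named "ScannedShelf" here):

```swift
import ARKit
import SceneKit
import UIKit

// Sketch: detect a previously scanned object and place content relative to it.
// "ScannedShelf" is a placeholder resource group containing the .arobject file
// produced with Apple's object-scanning sample code.
class ScannedObjectViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self

        let configuration = ARWorldTrackingConfiguration()
        configuration.detectionObjects =
            ARReferenceObject.referenceObjects(inGroupNamed: "ScannedShelf", bundle: nil) ?? []
        sceneView.session.run(configuration)
    }

    // Called when ARKit recognises the scanned object.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard anchor is ARObjectAnchor else { return }
        // Attach content relative to the recognised object's origin.
        let content = SCNNode(geometry: SCNBox(width: 0.1, height: 0.1,
                                               length: 0.1, chamferRadius: 0))
        node.addChildNode(content)
    }
}
```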
Manually creating an anchor
Another option would be to enable ARKit plane detection, and have the user tap a point on the horizontal shelf. Then you perform a raycast to get the 3D coordinate of this point.
You can create an ARAnchor object using this coordinate, and add it to the ARSession.
Then you can again position your content relative to the anchor.
You could also implement a drag gesture to let the user fine-tune the position along the shelf's plane.
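A sketch of that tap-to-place flow (using the raycasting API available on iOS 13+; assumes plane detection is enabled on the running configuration, and the anchor name is arbitrary):

```swift
import ARKit
import UIKit

class ShelfPlacementViewController: UIViewController {
    @IBOutlet var sceneView: ARSCNView!

    // Wired to a UITapGestureRecognizer on the scene view.
    @objc func handleTap(_ gesture: UITapGestureRecognizer) {
        let point = gesture.location(in: sceneView)

        // Raycast from the tapped screen point onto a detected horizontal plane.
        guard let query = sceneView.raycastQuery(from: point,
                                                 allowing: .existingPlaneGeometry,
                                                 alignment: .horizontal),
              let result = sceneView.session.raycast(query).first else {
            return
        }

        // Add an anchor at the hit location; content can then be positioned
        // relative to it in renderer(_:didAdd:for:).
        let anchor = ARAnchor(name: "shelfAnchor", transform: result.worldTransform)
        sceneView.session.add(anchor: anchor)
    }
}
```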
Conclusion
Which one of these placement options is best for you depends on the use case of your app. I hope this answer was useful :)
References
There are a lot of informative WWDC videos about ARKit. You could start off by watching this one: https://developer.apple.com/videos/play/wwdc2018/610
It is absolutely possible. Whether you do this in Swift or Unity depends entirely on what you are comfortable working in.
ARKit calls these object anchors (https://developer.apple.com/documentation/arkit/arobjectanchor). In other implementations they are often called mesh or model targets.
This YouTube video shows what you want to do in Swift.
But objects like a shelf might be hard to recognize since their content often changes.
I'm trying to find the best strategy to align an SCNScene to a physical table, just like the ARKit app WWF Free Rivers.
Currently I'm just testing out mapping a simple plane model with the same dimensions as the table. If I draw out the plane that ARKit detects, I can see that the plane's edges are not very accurate; they always extend beyond the table's edges (image below).
So I can't really rely on that plane to just place the model at its center. The model is not rotated correctly either (image below).
I had another idea: use the ARReferenceImage technique, take a picture of the table-top texture, and let ARKit find and match this "image" of the table. But even with the wood-grain texture, there wasn't enough data for ARKit to recognize it, and ARKit just fails in that case. It doesn't even try to do a bad match.
How can I go about doing this?
Ideas I've had so far:
Take an image of the table and use the ARReferenceImage feature to match it. This didn't work. Maybe it would if I added some more interesting feature points to the table, like QR codes in the corners.
Detect the plane, and then tap the four corners on the table to map out a square, and use this.
Do as the WWF app does: just place the object randomly on the plane, and then let the user scale, move, and rotate the model into the correct placement.
Any more ideas? What do you think will be the best approach to this?
There are two options I can think of that you could use.
You could create an ARWorldMap (iOS 12+ only) and use it instead of the ARReferenceImage: walk around the area while creating a map that subsequent ARKit sessions will remember. You can experiment a little with how to fit your models within the four corners of the table (this is slightly tedious without much help from the SceneKit editor). However, when you load the saved ARWorldMap and localize against it (just like with the ARReferenceImage), your model should fit within the four corners of the table every time.
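A rough sketch of the save/restore part of that flow (iOS 12+; the file URL handling and error handling are simplified):

```swift
import ARKit
import Foundation

// Save the current session's world map so later sessions can relocalise against it.
func saveWorldMap(from session: ARSession, to url: URL) {
    session.getCurrentWorldMap { worldMap, error in
        guard let map = worldMap else {
            print("World map unavailable: \(error?.localizedDescription ?? "unknown")")
            return
        }
        if let data = try? NSKeyedArchiver.archivedData(withRootObject: map,
                                                        requiringSecureCoding: true) {
            try? data.write(to: url)
        }
    }
}

// Later, load the map and hand it to a new session before running it.
func restoreSession(on sceneView: ARSCNView, from url: URL) {
    let configuration = ARWorldTrackingConfiguration()
    if let data = try? Data(contentsOf: url),
       let map = try? NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self, from: data) {
        configuration.initialWorldMap = map
    }
    sceneView.session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}
```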
If you use something like Unity (and its ARKit plugin), it has much more powerful editor tools (a 3D viewer/designer). There are some tools that can help you save the map, just like ARWorldMap, but then bring the details of the map into the editor so you can line things up really easily. Placenote's Spatial Capture toolkit can help here. Placenote (iOS 11+) creates its own "world map", but it exposes the visual details in the Unity editor, making it easier to line things up and then localize against (Example). The map is also stored on a managed cloud from the get-go, which makes sharing across phones much easier.
P.S.: Both of these options require you to keep the environment generally static (no large lighting changes, etc.), though a similar constraint applies when using ARReferenceImage.
I can use scikit-learn to train a model and recognize objects, but I also need to be able to tell where in my test images the object resides. Is there some way I could get the coordinates of the part of the test image that contains the object I'm trying to recognize?
If not, please refer me to some other library that'll help me achieve this task.
Thank you.
I assume that you are talking about a computer vision application. Usually, the way a box is drawn around an identified object is by using a sliding window and running your classifier on each window as it steps across the image. You keep track of which windows come back with positive results and use those windows as your bounds. You may wish to use windows of various sizes if the object's scale changes from image to image. In that case, you would likely want to prefer the smaller of two overlapping windows.
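The question mentions scikit-learn, but the sliding-window loop itself is framework-agnostic. Here is a rough sketch of the stepping and bookkeeping, written in Swift with a hypothetical `classify` closure standing in for your trained model; the window sizes and step are placeholders.

```swift
import CoreGraphics
import Foundation

// Slide windows of several sizes across the image, keeping every window the
// (hypothetical) classifier marks as containing the object.
func detectObject(imageSize: CGSize,
                  windowSizes: [CGSize],
                  step: CGFloat,
                  classify: (CGRect) -> Bool) -> [CGRect] {
    var hits: [CGRect] = []
    for windowSize in windowSizes {                       // try several scales
        var y: CGFloat = 0
        while y + windowSize.height <= imageSize.height {
            var x: CGFloat = 0
            while x + windowSize.width <= imageSize.width {
                let window = CGRect(origin: CGPoint(x: x, y: y), size: windowSize)
                if classify(window) {                     // positive result: keep the window
                    hits.append(window)
                }
                x += step
            }
            y += step
        }
    }
    // `hits` are candidate bounding boxes; overlapping boxes can then be merged,
    // preferring the smaller window as suggested above.
    return hits
}
```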
OK, I've done some reading around the subject and have an idea of how I'd tackle my problem, but I want to find out if this is the most efficient way, or if I'm missing something simple.
I have a line diagram of a section of railway that I'd like to plot the users location onto (the user being someone on a train moving up/down the railway).
Now, I initially went down the route of geo-referencing, but quickly realised this probably wasn't the way to go, as my image is not a true reflection of the area, and I want the line diagram to be what the user sees.
OK, my thought process for how I will tackle it:
I know the physical area so I could extract the coordinates along the railway, every x meters (my line diagram has a resolution of around 5m). Stick this into an array. Can anyone suggest a tool to do this?!
Allocate my line diagram a start and end, then match the image coordinates with the physical coordinates for the entire line.
Read in the user's position and update where to draw it based on the closest match in the array?
Does this sound doable, and would it give me decent results?
If you have more sophisticated answers, please do share.
It sounds reasonable in general. As the user is supposed to be on a train, a simpler option may work where you just keep track of the physical distance moved and use that as a percentage of the distance along the line. This is a lot simpler to manage, and it could be backed up with some coordinate checkpoints to ensure you don't accumulate a drifting error. I'd aim for the simpler implementation if you can.
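A small sketch of that idea, assuming the line diagram is described by an ordered list of polyline points in image coordinates and you know the total route length (all names here are made up):

```swift
import CoreGraphics
import Foundation

// Map "distance travelled along the route" to a pixel position on the line diagram.
// `diagramPoints` are the diagram's polyline vertices in image coordinates,
// ordered from the start of the route to the end.
func diagramPosition(diagramPoints: [CGPoint],
                     routeLengthMetres: CGFloat,
                     distanceTravelledMetres: CGFloat) -> CGPoint? {
    guard diagramPoints.count > 1, routeLengthMetres > 0 else { return nil }

    // Fraction of the route covered, clamped to [0, 1].
    let fraction = min(max(distanceTravelledMetres / routeLengthMetres, 0), 1)

    // Pixel length of each segment of the polyline.
    var segmentLengths: [CGFloat] = []
    for i in 0 ..< diagramPoints.count - 1 {
        let a = diagramPoints[i], b = diagramPoints[i + 1]
        segmentLengths.append(hypot(b.x - a.x, b.y - a.y))
    }

    // Walk along the polyline until the covered fraction is reached.
    var remaining = fraction * segmentLengths.reduce(0, +)
    for (i, segmentLength) in segmentLengths.enumerated() {
        if remaining <= segmentLength {
            let a = diagramPoints[i], b = diagramPoints[i + 1]
            let t = segmentLength == 0 ? 0 : remaining / segmentLength
            return CGPoint(x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t)
        }
        remaining -= segmentLength
    }
    return diagramPoints.last
}
```

The coordinate checkpoints mentioned above could then periodically reset the travelled distance against known positions so drift doesn't accumulate.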
I have a broad but interesting OpenCV question and I'm wondering where to start.
I am looking for any strategies or white papers that might help.
I need to get the position of people sitting at a conference table from a fixed overhead view. Ideally, I will assign a persistent ID to each person, and maintain a list of people with ID and coordinates. This problem could be easy in a specific case - for example, if designed for a single conference room table - but it gets harder in the general case, especially with people entering and leaving the scene.
My first question: is it a detection or a motion tracking problem? Or some combination of the two?
Well, it seems like both to me. I would think you would need to take a long-running average of the visible area, which becomes the background. Then, based on this background model, you can track the movement of other objects.
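As a toy illustration of that running-average idea (in practice OpenCV's built-in background subtractors, such as BackgroundSubtractorMOG2, do this for you; the flat grayscale arrays here are just for clarity):

```swift
import Foundation

// Conceptual sketch of a running-average background model operating on grayscale
// frames stored as flat arrays of pixel intensities (0...255).
struct BackgroundModel {
    private var background: [Double]
    private let learningRate: Double   // how quickly the background adapts

    init(firstFrame: [Double], learningRate: Double = 0.01) {
        self.background = firstFrame
        self.learningRate = learningRate
    }

    // Update the long-running average and return a foreground mask:
    // true where the current frame differs enough from the background.
    mutating func foregroundMask(for frame: [Double], threshold: Double = 25) -> [Bool] {
        precondition(frame.count == background.count, "Frame size must match the model")
        var mask = [Bool](repeating: false, count: frame.count)
        for i in frame.indices {
            mask[i] = abs(frame[i] - background[i]) > threshold
            // Blend the new frame into the background (the "long average").
            background[i] = (1 - learningRate) * background[i] + learningRate * frame[i]
        }
        return mask
    }
}
```

The foreground pixels can then be grouped into blobs, and those blobs tracked from frame to frame to follow each person.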
Assigning an ID may become difficult if objects merge together (at least as far as the camera is concerned) and then separate again, say someone removing a hat, placing it down, and putting it back on.
But with all that in mind, it is possible, even if it presents a challenge. I once saw a similar project tracking people in a train station using a similar approach (it was in a lecture, so I can't provide a link, sorry).