My question is about object detection and get a particular point from it.
For example, if I want to implement this, how would you tackle this?
You put a white paper in front of your camera and let it detect
it. As soon as that paper is detected, some action (like any action such as a node is produced somewhere in the screen) is triggered.
One solution that came through my head is to do some hit test, thinking it would give me a node position of a first detected by hit test. But it seems like it is not working as desired. I want to see a new node being created up above that paper, based on the position of the detected real object.
Is marker based detection is better for achieving this program, or is there any better solution to this?
Related
I am currently playing a bit with ARKit. My goal is to detect a shelf and draw stuff onto it.
I did already find the ARReferenceImage and that basically works for a very, very simple prototype, but the image needs to be quite complex it seems? Xcode always complains if I try to use something a lot simpler (like a QR-Code like image). With that marker I would know the position of an edge and then I'd know the physical size of my shelf and know how to place stuff into it. So that would be ok, but I think small and simple markers will not work, right?
But ideally I would not need a marker at all.
I know that I can detect e.g. planes, but I want to detect the shelf itself. But as my shelf is open, it's not really a plane. Are there other possibilities to find an object using ARKit?
I know that my question is very vague, but maybe somebody could point me in the right direction. Or tell me if that's even possible with ARKit or if I need other tools? Like Unity?
There are several different possibilities for positioning content in augmented reality. They are called content anchors, and they are all subclasses of the ARAnchor class.
Image anchor
Using an image anchor, you would stick your reference image on a pre-determined spot on the shelf and position your 3D content relative to it.
the image needs to be quite complex it seems? Xcode always complains if I try to use something a lot simpler (like a QR-Code like image)
That's correct. The image needs to have enough visual detail for ARKit to track it. Something like a simple black and white checkerboard pattern doesn't work very well. A complex image does.
Object anchor
Using object anchors, you scan the shape of a 3D object ahead of time and bundle this data file with your app. When a user uses the app, ARKit will try to recognise this object and if it does, you can position your 3D content relative to it. Apple has some sample code for this if you want to try it out quickly.
Manually creating an anchor
Another option would be to enable ARKit plane detection, and have the user tap a point on the horizontal shelf. Then you perform a raycast to get the 3D coordinate of this point.
You can create an ARAnchor object using this coordinate, and add it to the ARSession.
Then you can again position your content relative to the anchor.
You could also implement a drag gesture to let the user fine-tune the position along the shelf's plane.
Conclusion
Which one of these placement options is best for you depends on the use case of your app. I hope this answer was useful :)
References
There are a lot of informative WWDC videos about ARKit. You could start off by watching this one: https://developer.apple.com/videos/play/wwdc2018/610
It is absolutely possible. If you do this in swift or Unity depends entirely on what you are comfortable working in.
Arkit calls them https://developer.apple.com/documentation/arkit/arobjectanchor. In other implementations they are often called mesh or model targets.
This Youtube video shows what you want to do in swift.
But objects like a shelf might be hard to recognize since their content often changes.
This is a very common question, and I am not asking for technical details. What I am looking for is an approach or best practice guidance for the following situation:
Imagine a jump and run game entirely made with SceneKit. The game is played by controlling a character running to the right or left side, climbing up walls and jump over obstacles. (it is a 2½-D style game where the character always runs in one direction or it's opposite. Mainly on the x-Axis)
At certain "locations" (on the scene x-axis) I need to implement some specific actions, that should occur as soon as the main character walks, jumps or runs over that specific point.
What I have done so far, is adding simple invisible SCNPlanes with static physics bodies, with the bitmask configured to detect only the contact (no collision). In the physics handler (physics delegate) I can now fetch contact of the objects the character trespasses. Currently I have a large switch-statement to fetch the names of the "Action-Walls" as I call the planes through which the character walks. So, I can trigger whatever specific action that I want to happen at that location.
It is working this way, even quite good, but I wonder if there is a way to add some better "action points" using i.e. Gameplaykit, but I only find information about agents and behaviors or pathfinding.
What I am looking for is some static point, when trespassing it, an associated action happens.
I have no clue about the possibilities of Gameplaykit. Can anyone tell me an approach using Gameplaykit or whatever Apple has in its magic box of useful tools to make this better than using action walls of SCNPlanes and the physics handler?
In ARKit for iOS. If you display a virtual item then it always comes before any real item. This means if I stand in front of the virtual item then I would still see the virtual item. How can I fix this scenario?
The bottle should be visible but it is cutting off.
You cannot achieve this with ARkit only. It offers no off the shelve solution for solving occlusion, which is a hard problem.
Ideally you'd know the depth of each pixel projected on the camera, and would use that to determine those that are in front and those that are behind. I would not try something with the feature points ARKit is exposing since 1) their position is innacurate 2) there's no way to know between two frames which feature point of frame A is which feature point in frame B. It's way to noisy data to do anything good.
You might be able to achieve something with third party options that'd process the captured image and understand depth or different depth levels in the scene, but I don't know any good solution. There's some SLAM technique that yields dense depth map like DTAM (https://www.kudan.eu/kudan-news/different-types-visual-slam-systems/) but that'd be redoing most of what arkit is doing. There might be other approaches that I'm not aware off. Apps like snap do this in their own way so it is possible!
So basically your question is to mapping the coordinate of the virtual item on real world coordinate system, in short, you want to see the virtual item blocked by the real item, and you can only see the virtual item once you pass the real item.
If so, you need to know the physical relations of each object in this environment, and then you need to know exactly where you are to decide if the virtual item is blocked.
It's not an intuitive way to fix this, however, it's the only way I can think of.
Cheers.
What you are trying to achieve is not easy.
You need to detect the parts of the real world that "should be visible" using some kind of image processing. Or maybe the ARKit feature points that have the depth information, then based on this you have to add "an invisible virtual object" that cuts the drawing of things behind it. This will represent your "real object" inside the "virtual world" so that the background (camera feed) remains visible in places where this invisible virtual object is present.
I want to build an iPad app that detect an alphabet physical shape placed on the iPad screen and print the alphabet to the screen after processing the object detection. Is this doable?
I am trying to find a way to implement this, but could not find any article or online resource that guide me to that.
Thanks,
I would imagine you could start by looking at the various pens and stylus's that are available for iPads. Look at how they work. Then you would need to see if you cna make an object that will activate the touch mechanism over a defined area in the same way, for example - a line, and see if you can detech the touch points along the line. Sorting all that out will effectively get you started.
I wanna achieve something that looks like the wizard's ability in the game Trine.
I want to create a game where the player uses the mouse to create certain objects, so i will need to compare the shape the player drew to a predefined shape of my own and check if its close.
I have no idea how to achieve this and where to look for, I assume it has something to do with shape recognition like in image processing and computer vision but it should be much simpler and work in real time.
does anyone have a clue how this can be done or where can i look for something like that?
Is this what you're going for? http://www.youtube.com/watch?v=7Zh79q_xvZw
I would start by researching gesture recognition. I think that's the phrase you need to get good info. http://en.wikipedia.org/wiki/Gesture_recognition
Also, sketch recognition: http://en.wikipedia.org/wiki/Sketch_recognition
Have a look at this question. What you are looking for in particular is on-line handwriting recognition, meaning that you follow every move of the user from beginning to end.
Now, you might want to simplify it a whole lot, so one way is defining 9 areas, like a 3x3 grid. Then convert the user's movement into a list of how the user moved through these grids (use thresholds to make sure it was in that area for a while). Now you will have an array like this: 1-1, 1-2, 2-2, 2-3 (meaning the user went from upper-left corner, the upper-middle, etc.)
This information is now fairly easy to match to a set of gestures. If it performs poorly, you can either make it more difficult and introduce a Hidden Markov Model, that will allow some mistakes in the gesture (but still matching the most likely one you have in your gesture set), or you could simply display the grid to the user, so that the user will learn the gestures like number codes.