I am developing a game that uses a character model. Using Kinect, I am now able to apply all of the motions of the player in front of the Kinect camera to the model.
Now I want to apply only the hand motions to the model, while the rest of the body stays fixed as it is.
Can anyone help me achieve this?
It sounds to me like you want a fixed Skeleton whose joint info you update from the Skeleton input, and then use that to update your UI, as opposed to drawing the input skeleton to the UI directly.
There is an issue with creating your own Skeleton objects: the Microsoft Kinect SDK developers made the JointType of the Joint class internal, which means the JointType cannot be updated and therefore the joint positions cannot be updated directly.
So to get around this you need to:
1. Have joint positions saved in your code that describe a stationary user.
2. Create a new Skeleton object.
3. Copy the incoming Skeleton object from the sensor to your newly instantiated Skeleton.
4. Update this Skeleton with the stationary points, except for the joints whose positions you want taken from the sensor.
The key point here is step 3. This allows you to create your own Skeleton objects with your custom position data, which is what you will need to do to solve your problem.
I'm developing a prototype that should be able to scan a real-world object and create a 3D representation of it. It then needs to get spatial data about that object (like the object's width, height and depth).
Currently I've been able to create a 3D scan of the object using the following guide: https://medium.com/zeitraumgruppe/what-arkit-3-5-and-the-new-ipad-pro-bring-to-the-table-d4bf25e5dd87
I've then been able to view that 3D model using SceneKit and export it to a computer; however, I still need to figure out how to get the spatial dimensions of the object on the phone.
I'm thinking I also need to "crop" away some of the unwanted data captured by the LiDAR sensor (like the floor beneath the object). Another solution could be to only "scan" what is inside a predefined bounding box, as Apple does in "Scanning and Detecting 3D Objects", shown here: https://developer.apple.com/documentation/arkit/content_anchors/scanning_and_detecting_3d_objects
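For context, this is roughly the kind of measurement I'm after: an axis-aligned bounding box over the scanned mesh, assuming the session delivers ARMeshAnchor objects (the function below is just a sketch of my goal, not code from the guide above):

    import ARKit
    import simd

    // Sketch: measure an axis-aligned bounding box (width, height, depth in metres)
    // over all vertices of one ARMeshAnchor produced by the LiDAR scan.
    func boundingBoxSize(of meshAnchor: ARMeshAnchor) -> simd_float3 {
        let vertices = meshAnchor.geometry.vertices
        var minPoint = simd_float3(repeating: .greatestFiniteMagnitude)
        var maxPoint = simd_float3(repeating: -.greatestFiniteMagnitude)

        for i in 0..<vertices.count {
            // Each vertex is three Float32 values inside the geometry's MTLBuffer.
            let pointer = vertices.buffer.contents()
                .advanced(by: vertices.offset + vertices.stride * i)
                .assumingMemoryBound(to: Float32.self)
            let local = simd_float3(pointer[0], pointer[1], pointer[2])

            // Move the vertex into world space before measuring.
            let world = meshAnchor.transform * simd_float4(local, 1)
            minPoint = simd_min(minPoint, simd_float3(world.x, world.y, world.z))
            maxPoint = simd_max(maxPoint, simd_float3(world.x, world.y, world.z))
        }
        return maxPoint - minPoint
    }

Cropping could presumably work the other way around: discard vertices whose world position falls outside a predefined box before measuring.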
I am, however, at a loss as to where I should look for more precise info.
Any tips?
I'm interested in processing data from the TrueDepth camera. I need to obtain the data of a person's face, build a 3D model of the face, and save this model to an .obj file.
Since the 3D model needs to include the person's eyes and teeth, ARKit / SceneKit is not suitable, because ARKit / SceneKit does not fill these areas with data.
But with the help of the SceneKit.ModelIO library, I managed to export ARSCNView.scene (type SCNScene) in the .obj format.
I tried to take this project as a basis:
https://developer.apple.com/documentation/avfoundation/cameras_and_media_capture/streaming_depth_data_from_the_truedepth_camera
In this project, working with the TrueDepth camera is done using Metal, but if I'm not mistaken, an MTKView rendered using Metal is not a 3D model and cannot be exported as .obj.
Please tell me: is there a way to export an MTKView to an SCNScene, or directly to .obj?
If there is no such method, how can I make a 3D model from AVDepthData?
Thanks.
It's possible to make a 3D model from AVDepthData, but that probably isn't what you want. One depth buffer is just that — a 2D array of pixel distance-from-camera values. So the only "model" you're getting from that isn't very 3D; it's just a height map. That means you can't look at it from the side and see contours that you couldn't have seen from the front. (The "Using Depth Data" sample code attached to the WWDC 2017 talk on depth photography shows an example of this.)
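For what it's worth, here is a minimal sketch of the kind of "model" a single depth frame gives you: one 3D point per depth pixel, unprojected with the camera intrinsics. This is only an illustration of the height-map point, not code from the sample project; it assumes the capture was configured so cameraCalibrationData is delivered, and note the intrinsics are specified for intrinsicMatrixReferenceDimensions, so they may need rescaling to the depth map's resolution.

    import AVFoundation
    import simd

    // Sketch: unproject one AVDepthData frame into a rough point cloud / height map.
    func pointCloud(from depthData: AVDepthData) -> [simd_float3] {
        let converted = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
        guard let intrinsics = converted.cameraCalibrationData?.intrinsicMatrix else { return [] }
        let map = converted.depthDataMap

        CVPixelBufferLockBaseAddress(map, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(map, .readOnly) }

        let width = CVPixelBufferGetWidth(map)
        let height = CVPixelBufferGetHeight(map)
        let rowBytes = CVPixelBufferGetBytesPerRow(map)
        let base = CVPixelBufferGetBaseAddress(map)!

        var points: [simd_float3] = []
        for y in 0..<height {
            let row = base.advanced(by: y * rowBytes).assumingMemoryBound(to: Float32.self)
            for x in 0..<width {
                let z = row[x]
                guard z.isFinite else { continue }
                // Pinhole unprojection: fx = intrinsics[0][0], fy = intrinsics[1][1],
                // principal point = (intrinsics[2][0], intrinsics[2][1]).
                let px = (Float(x) - intrinsics[2][0]) * z / intrinsics[0][0]
                let py = (Float(y) - intrinsics[2][1]) * z / intrinsics[1][1]
                points.append(simd_float3(px, py, z))
            }
        }
        return points
    }

Every point still comes from the same viewpoint, which is exactly why you can't recover the sides of the face from it.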
If you want more of a truly-3D "model", akin to what ARKit offers, you need to be doing the work that ARKit does — using multiple color and depth frames over time, along with a machine learning system trained to understand human faces (and hardware optimized for running that system quickly). You might not find doing that yourself to be a viable option...
It is possible to get an exportable model out of ARKit using Model I/O. The outline of the code you'd need goes something like this (a short sketch putting the steps together follows the list):
1. Get ARFaceGeometry from a face tracking session.
2. Create MDLMeshBuffers from the face geometry's vertices, textureCoordinates, and triangleIndices arrays. (Apple notes the texture coordinate and triangle index arrays never change, so you only need to create those once — vertices you have to update every time you get a new frame.)
3. Create a MDLSubmesh from the index buffer, and a MDLMesh from the submesh plus vertex and texture coordinate buffers. (Optionally, use MDLMesh functions to generate a vertex normals buffer after creating the mesh.)
4. Create an empty MDLAsset and add the mesh to it.
5. Export the MDLAsset to a URL (providing a URL with the .obj file extension so that it infers the format you want to export).
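Putting those steps together, a minimal sketch might look like the following. It exports positions only (texture coordinates would be added the same way per step 2), and the face-tracking session that supplies the ARFaceGeometry is assumed to already exist:

    import ARKit
    import ModelIO
    import simd

    // Sketch: export one ARFaceGeometry snapshot to .obj via Model I/O (no SceneKit, no Metal).
    func exportFace(_ faceGeometry: ARFaceGeometry, to url: URL) throws {
        let allocator = MDLMeshBufferDataAllocator()

        // Vertex positions (one simd_float3 per vertex).
        let vertexData = Data(bytes: faceGeometry.vertices,
                              count: faceGeometry.vertices.count * MemoryLayout<simd_float3>.stride)
        let vertexBuffer = allocator.newBuffer(with: vertexData, type: .vertex)

        // Triangle indices (Int16 per index).
        let indexData = Data(bytes: faceGeometry.triangleIndices,
                             count: faceGeometry.triangleIndices.count * MemoryLayout<Int16>.stride)
        let indexBuffer = allocator.newBuffer(with: indexData, type: .index)

        let submesh = MDLSubmesh(indexBuffer: indexBuffer,
                                 indexCount: faceGeometry.triangleIndices.count,
                                 indexType: .uInt16,
                                 geometryType: .triangles,
                                 material: nil)

        // Describe the vertex layout so the exporter knows what's in the buffer.
        let descriptor = MDLVertexDescriptor()
        descriptor.attributes[0] = MDLVertexAttribute(name: MDLVertexAttributePosition,
                                                      format: .float3,
                                                      offset: 0,
                                                      bufferIndex: 0)
        descriptor.layouts[0] = MDLVertexBufferLayout(stride: MemoryLayout<simd_float3>.stride)

        let mesh = MDLMesh(vertexBuffer: vertexBuffer,
                           vertexCount: faceGeometry.vertices.count,
                           descriptor: descriptor,
                           submeshes: [submesh])

        let asset = MDLAsset()
        asset.add(mesh)
        try asset.export(to: url)   // a URL ending in .obj picks the OBJ exporter
    }

Call it with a URL in your documents or temporary directory whenever you have a geometry snapshot you want to keep.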
That sequence doesn't require SceneKit (or Metal, or any ability to display the mesh) at all, which might prove useful depending on your needs. If you do want to involve SceneKit and Metal, you can probably skip a few steps (a shorter sketch follows these two steps):
1. Create ARSCNFaceGeometry on your Metal device and pass it an ARFaceGeometry from a face tracking session.
2. Use MDLMesh(scnGeometry:) to get a Model I/O representation of that geometry, then follow steps 4-5 above to export it to an .obj file.
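As a rough illustration of that shortcut (assuming faceAnchor comes from an ARSCNViewDelegate callback and exportURL ends in .obj):

    import ARKit
    import SceneKit.ModelIO

    // Sketch: ARFaceGeometry -> ARSCNFaceGeometry -> MDLMesh -> .obj file.
    func exportFace(from faceAnchor: ARFaceAnchor, to exportURL: URL) throws {
        let device = MTLCreateSystemDefaultDevice()!
        let scnFaceGeometry = ARSCNFaceGeometry(device: device)!
        scnFaceGeometry.update(from: faceAnchor.geometry)

        let mesh = MDLMesh(scnGeometry: scnFaceGeometry)  // Model I/O view of the SceneKit geometry
        let asset = MDLAsset()
        asset.add(mesh)
        try asset.export(to: exportURL)
    }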
Any way you slice it, though... if it's a strong requirement to model eyes and teeth, none of the Apple-provided options will help you, because none of them do that. So, some food for thought (a rough sketch of the last two ideas follows the list):
Consider whether that's a strong requirement?
Replicate all of Apple's work to do your own face-model inference from color + depth image sequences?
Cheat on eye modeling using spheres centered according to the leftEyeTransform/rightEyeTransform reported by ARKit?
Cheat on teeth modeling using a pre-made model of teeth, composed with the ARKit-provided face geometry for display? (Articulate your inner-jaw model with a single open-shut joint and use ARKit's blendShapes[.jawOpen] to animate it alongside the face.)
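To make those last two "cheats" concrete, here is a sketch of the idea. The node names and the opening angle are purely illustrative, and the stand-in nodes are parented to the face node so the anchor-space transforms apply directly:

    import ARKit
    import SceneKit

    // Sketch: drive stand-in eye spheres and a one-joint jaw from ARKit face tracking.
    class FaceCheats: NSObject, ARSCNViewDelegate {
        let leftEyeNode = SCNNode(geometry: SCNSphere(radius: 0.012))
        let rightEyeNode = SCNNode(geometry: SCNSphere(radius: 0.012))
        let jawNode = SCNNode()   // attach a pre-made teeth/inner-jaw model as a child here

        func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
            guard anchor is ARFaceAnchor else { return }
            // Parent the stand-ins to the face node so face-anchor-space transforms apply directly.
            [leftEyeNode, rightEyeNode, jawNode].forEach { node.addChildNode($0) }
        }

        func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
            guard let faceAnchor = anchor as? ARFaceAnchor else { return }

            // Spheres placed where ARKit says the eyes are.
            leftEyeNode.simdTransform = faceAnchor.leftEyeTransform
            rightEyeNode.simdTransform = faceAnchor.rightEyeTransform

            // One open-shut joint driven by the jawOpen blend shape (0...1); the angle is arbitrary.
            let jawOpen = faceAnchor.blendShapes[.jawOpen]?.floatValue ?? 0
            jawNode.eulerAngles.x = -jawOpen * Float.pi / 8
        }
    }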
What are the ways to identify a particular object in a room, and the position of the user, for indoor navigation with AR? I understand that we can use beacons and markers to identify an object or the user's location in a room.
Without using them, what are the other alternatives for finding the user's location and identifying an object for an AR experience? I am exploring AR for indoor navigation with iOS devices (currently focusing on ARKit). If we use Core Location for user positioning, the accuracy is low. In a small shop, if we use Core Location or any map-related services, we will face user/product mispositioning, leading to a poor experience for users. Are there any other ways/solutions to solve this?
The obvious alternative way to detect objects visually in a scene would be to use the CoreML framework with ARKit. A basic app is already available on GitHub.
CoreML-in-ARKit
You can also obtain the worldPosition of those objects relative to a starting origin and plot an x,z coordinate system (an indoor map) based on the SCNNode label positions. It's not going to be that accurate... but it's a basic object identification and positioning system.
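The core of that idea looks roughly like this (a sketch, not code from the CoreML-in-ARKit repo; the model and node setup are placeholders):

    import ARKit
    import Vision

    // Sketch: classify the current camera frame, then pin a named node at a hit-test point
    // so its worldPosition can feed a simple top-down (x, z) indoor map.
    func classifyAndPlace(in sceneView: ARSCNView, using model: VNCoreMLModel) {
        guard let pixelBuffer = sceneView.session.currentFrame?.capturedImage else { return }

        let request = VNCoreMLRequest(model: model) { request, _ in
            guard let best = (request.results as? [VNClassificationObservation])?.first else { return }

            DispatchQueue.main.async {
                // Hit-test the screen centre against detected feature points.
                let centre = CGPoint(x: sceneView.bounds.midX, y: sceneView.bounds.midY)
                guard let hit = sceneView.hitTest(centre, types: .featurePoint).first else { return }

                let node = SCNNode()   // attach a text/label geometry here if you want it visible
                node.name = best.identifier
                node.simdTransform = hit.worldTransform
                sceneView.scene.rootNode.addChildNode(node)

                // x and z of worldPosition are the coordinates for the indoor map.
                print(best.identifier, node.worldPosition.x, node.worldPosition.z)
            }
        }
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right).perform([request])
    }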
Edit:
One limitation of using an out-of-the-box CoreML image classifier like Inceptionv3.mlmodel is that it only detects the dominant generic object from a set of generic categories such as trees, animals, food, vehicles, people, and more.
You mention doing object recognition (image classification) inside a retail shop. This will need a custom image classifier that can, for example, discriminate between different iPhone models (iPhone 7, iPhone 8 or iPhone X) rather than merely determining that it's a smartphone.
To create your own object recognizer (image classifier) for ARKit, follow this tutorial written by Hunter Ward.
https://medium.com/@hunter.ley.ward/create-your-own-object-recognizer-ml-on-ios-7f8c09b461a1
Code is available on GitHub:
https://github.com/hanleyweng/Gesture-Recognition-101-CoreML-ARKit
Note: if you need to create a custom classifier for hundreds of items in a retail shop... Ward recommends around 60 images per class... which for 100 classes would total around 60 x 100 = 6000 images. To generate the Core ML model, Ward uses a Microsoft Cognitive Services offering called "Custom Vision"... which currently has a limit of 1000 images. So if you need to use more than 1000 images, you will have to find another way to create the model.
I'm looking at integrating ARKit into our iOS app so we can use the camera to scan a room or field of view for objects. As an example of what I mean: if you were to bring up the camera, it would highlight the separate objects in the room and allow them to be tapped and "added" into the system.
Does anyone know if this is achievable with the current AR kits or anything else out there? It all seems to require that the objects you are looking for be pre-defined and loaded into a database so the app can find them. I'm hoping it would be able to pick out the objects in real time. It doesn't need to know any details about the actual object, just that it can be pulled out of the background scenery.
Any ideas?
The OpenCV library (available on iOS) contains many algorithms for comparing different image blobs. If you want to match a simple template to find objects, try the Viola-Jones algorithm and so-called Haar cascades. OpenCV ships a trained collection of templates in XML files, for example for detecting faces. OpenCV also contains a training utility, so you can generate cascades for other kinds of objects.
Some example projects:
https://github.com/alexmac/alcexamples/blob/master/OpenCV-2.4.2/doc/user_guide/ug_traincascade.rst Cascade Classifier Training
https://github.com/lukagabric/iOS-OpenCV Example code for detecting Colors and Circle shapes
https://github.com/BloodAxe/OpenCV-Tutorial Feature Detection (SURF, ORB, FREAK)
https://github.com/foundry/OpenCVSquaresSL Square Detection using Pyramid scaling, Canny, contours, contour simplification
We are doing a virtual dressing room project, and we have been able to map .png image files onto body joints. But we need to map 3D clothes to the body, and the final output should be a person (a real person, not an avatar) wearing 3D clothing in a live video output. We don't know how to do this. Any help is much appreciated.
Thanks in advance.
This is an answer for user3588017. It's a bit lengthy, but I'll try to explain how to get your project done. First, play with XNA and its basic tutorials if you are totally new to 3D gaming. In my project I only focused on bending the hands down. For this you need to create a 3D model in Blender and export it to .xna. I used this link as a start, from the very beginning. This is actually the hardest part; after it you will know how to animate your model using maths. Then you need to map the Kinect data as (x, y, z) onto the parts of your model.
For example: map the model's rotation to the rotation of the Kinect skeleton. For this I used simple calculations, like measuring the depth of the shoulders, calculating the rotation angle, and applying it to my cloth model.
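To illustrate that last calculation (this is just the idea, not the exact code from that project): if the Kinect reports the left and right shoulder joints at (xL, yL, zL) and (xR, yR, zR), the body's rotation about the vertical axis can be estimated as

    theta = atan2(zL - zR, xL - xR)

i.e. the angle of the shoulder line in the top-down (x, z) plane relative to its resting direction, which you then apply as a yaw rotation to the cloth model.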