I currently have a WebCamera that captures video and recognizes the surface of a book and estimates the homography with OpenCV.
(I followed the tutorial at https://bitesofcode.wordpress.com/2017/09/12/augmented-reality-with-python-and-opencv-part-1/
and I rewrote the first part for Unity).
Now I'm wondering how I would go about projecting a 3D model in Unity onto the raw image?
I am a total newbie in Unity.
You need a Camera (pointed at your object) that renders into a RenderTexture.
Then add a RawImage to your UI and set its texture reference to your RenderTexture.
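Roughly like this (a hedged sketch; the field names modelCamera and output are mine, not anything built in):

using UnityEngine;
using UnityEngine.UI;

public class ModelOverlay : MonoBehaviour
{
    public Camera modelCamera;   // camera that looks at the 3D model
    public RawImage output;      // UI element that displays what the camera renders

    void Start()
    {
        var rt = new RenderTexture(Screen.width, Screen.height, 24);
        modelCamera.targetTexture = rt;              // the camera now draws into rt
        modelCamera.clearFlags = CameraClearFlags.SolidColor;
        modelCamera.backgroundColor = Color.clear;   // transparent background so the webcam image stays visible behind it
        output.texture = rt;                         // the RawImage shows the camera's output
    }
}

Layer this RawImage above the one showing your webcam frame, and pose modelCamera from the homography/pose you estimated with OpenCV so the model lines up with the book.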
I'm interested in processing data from the TrueDepth camera. I need to obtain the data of a person's face, build a 3D model of the face, and save this model to an .obj file.
Since the 3D model needs to include the person's eyes and teeth, ARKit / SceneKit is not suitable, because ARKit / SceneKit does not fill those areas with data.
But with the help of the SceneKit.ModelIO library, I managed to export ARSCNView.scene (of type SCNScene) in .obj format.
I tried to take this project as a basis:
https://developer.apple.com/documentation/avfoundation/cameras_and_media_capture/streaming_depth_data_from_the_truedepth_camera
In this project, the TrueDepth camera is handled with Metal, but if I'm not mistaken, an MTKView rendered with Metal is not a 3D model and cannot be exported as .obj.
Please tell me if there is a way to export MTKView to SCNScene or directly to .obj?
If there is no such method, how can I make a 3D model from AVDepthData?
Thanks.
It's possible to make a 3D model from AVDepthData, but that probably isn't what you want. One depth buffer is just that — a 2D array of pixel distance-from-camera values. So the only "model" you're getting from that isn't very 3D; it's just a height map. That means you can't look at it from the side and see contours that you couldn't have seen from the front. (The "Using Depth Data" sample code attached to the WWDC 2017 talk on depth photography shows an example of this.)
If you want more of a truly-3D "model", akin to what ARKit offers, you need to be doing the work that ARKit does — using multiple color and depth frames over time, along with a machine learning system trained to understand human faces (and hardware optimized for running that system quickly). You might not find doing that yourself to be a viable option...
It is possible to get an exportable model out of ARKit using Model I/O. The outline of the code you'd need goes something like this (a rough sketch follows the list):
Get ARFaceGeometry from a face tracking session.
Create MDLMeshBuffers from the face geometry's vertices, textureCoordinates, and triangleIndices arrays. (Apple notes the texture coordinate and triangle index arrays never change, so you only need to create those once — vertices you have to update every time you get a new frame.)
Create a MDLSubmesh from the index buffer, and a MDLMesh from the submesh plus vertex and texture coordinate buffers. (Optionally, use MDLMesh functions to generate a vertex normals buffer after creating the mesh.)
Create an empty MDLAsset and add the mesh to it.
Export the MDLAsset to a URL (providing a URL with the .obj file extension so that it infers the format you want to export).
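Something along these lines, assuming a face geometry obtained from an ARFaceAnchor (a sketch with my own naming; error handling and the per-frame vertex updates mentioned above are omitted):

import ARKit
import ModelIO

func writeOBJ(from face: ARFaceGeometry, to url: URL) throws {
    let allocator = MDLMeshBufferDataAllocator()

    // Step 2: wrap the face geometry's arrays in MDLMeshBuffers.
    let vertexData = Data(bytes: face.vertices,
                          count: face.vertices.count * MemoryLayout<simd_float3>.stride)
    let texData    = Data(bytes: face.textureCoordinates,
                          count: face.textureCoordinates.count * MemoryLayout<simd_float2>.stride)
    let indexData  = Data(bytes: face.triangleIndices,
                          count: face.triangleIndices.count * MemoryLayout<Int16>.stride)
    let vertexBuffer = allocator.newBuffer(with: vertexData, type: .vertex)
    let texBuffer    = allocator.newBuffer(with: texData, type: .vertex)
    let indexBuffer  = allocator.newBuffer(with: indexData, type: .index)

    // Step 3: describe the two vertex buffers, then build the submesh and mesh.
    let descriptor = MDLVertexDescriptor()
    descriptor.attributes = NSMutableArray(array: [
        MDLVertexAttribute(name: MDLVertexAttributePosition,
                           format: .float3, offset: 0, bufferIndex: 0),
        MDLVertexAttribute(name: MDLVertexAttributeTextureCoordinate,
                           format: .float2, offset: 0, bufferIndex: 1)])
    descriptor.layouts = NSMutableArray(array: [
        MDLVertexBufferLayout(stride: MemoryLayout<simd_float3>.stride),
        MDLVertexBufferLayout(stride: MemoryLayout<simd_float2>.stride)])

    let submesh = MDLSubmesh(indexBuffer: indexBuffer,
                             indexCount: face.triangleIndices.count,
                             indexType: .uInt16,
                             geometryType: .triangles,
                             material: nil)
    let mesh = MDLMesh(vertexBuffers: [vertexBuffer, texBuffer],
                       vertexCount: face.vertices.count,
                       descriptor: descriptor,
                       submeshes: [submesh])
    mesh.addNormals(withAttributeNamed: MDLVertexAttributeNormal, creaseThreshold: 0)  // optional, per step 3

    // Steps 4-5: wrap it in an asset and export; the ".obj" extension picks the format.
    let asset = MDLAsset()
    asset.add(mesh)
    try asset.export(to: url)
}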
That sequence doesn't require SceneKit (or Metal, or any ability to display the mesh) at all, which might prove useful depending on your needs. If you do want to involve SceneKit and Metal, you can probably skip a few steps (sketched below):
Create ARSCNFaceGeometry on your Metal device and pass it an ARFaceGeometry from a face tracking session.
Use MDLMesh(scnGeometry:) to get a Model I/O representation of that geometry, then follow steps 4-5 above to export it to an .obj file.
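A minimal sketch of that shortcut (again my own naming, same caveats about per-frame updates and error handling):

import ARKit
import Metal
import ModelIO
import SceneKit.ModelIO

func writeOBJUsingSceneKit(from face: ARFaceGeometry, to url: URL) throws {
    guard let device = MTLCreateSystemDefaultDevice(),
          let scnGeometry = ARSCNFaceGeometry(device: device) else { return }
    scnGeometry.update(from: face)                 // push the latest tracked vertices

    let asset = MDLAsset()
    asset.add(MDLMesh(scnGeometry: scnGeometry))   // bridge SceneKit -> Model I/O
    try asset.export(to: url)                      // url should end in ".obj"
}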
Any way you slice it, though... if it's a strong requirement to model eyes and teeth, none of the Apple-provided options will help you because none of them do that. So, some food for thought:
Consider whether that's a strong requirement?
Replicate all of Apple's work to do your own face-model inference from color + depth image sequences?
Cheat on eye modeling using spheres centered according to the leftEyeTransform/rightEyeTransform reported by ARKit?
Cheat on teeth modeling using a pre-made model of teeth, composed with the ARKit-provided face geometry for display? (Articulate your inner-jaw model with a single open-shut joint and use ARKit's blendShapes[.jawOpen] to animate it alongside the face.)
I'm trying to mod the Oculus World Demo to show a video stream from a camera rather than a pre-set graphic; however, I'm finding it difficult to find the proper way to render a cv::IplImage or cv::Mat onto the Oculus screen. If anyone knows how to display an image on the Oculus I would be very grateful. This is for the DK2.
Pure OpenCV isn't really well suited to rendering to the Rift, because you would need to manually implement the distortion mechanisms that are normally provided by the Oculus Rift SDK.
The best way to render an image from OpenCV onto the screen is to load the image into an OpenGL or Direct3D texture and use the 3D rendering API (GL or D3D) to place it into a rendered scene. There is an example of this in the GitHub repository for my book on Rift development.
In summary, it sets up the video capture using the OpenCV API and then launches a thread which is responsible for capturing images from the camera device. In the main thread, the draw call renders a simple 3D scene which includes the captured image. Most of the interesting Rift related code is in the parent class, RiftApp.
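As a rough illustration (not the book's code), uploading a captured cv::Mat into an OpenGL texture might look like this; the scene you render through the Rift SDK then just draws a quad sampling that texture:

#include <opencv2/opencv.hpp>
#include <GL/gl.h>   // or your platform's / loader's GL header (GLEW, etc.)

// Create the texture once, at startup, with a GL context current.
GLuint createCameraTexture() {
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}

// Call once per captured frame, e.g. from the draw call after the capture
// thread hands over the latest image.
void uploadFrame(GLuint tex, const cv::Mat& frameBGR) {
    cv::Mat rgb;
    cv::cvtColor(frameBGR, rgb, cv::COLOR_BGR2RGB);  // OpenCV captures BGR; we upload RGB
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);           // rows of the converted Mat are tightly packed
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, rgb.cols, rgb.rows, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, rgb.data);
}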
I have a set of images of a scene at different angles and the camera intrinsic parameters. I was able to generate the 3D points using point correspondences and triangulation. Is there a built-in method, or some other way, to produce 3D images in MATLAB from the given set of images? By a 3D image, I mean a 3D view of the scene based on the colors, depth, etc.
There was a recent MathWorks blog post on 3D surface rendering, which showed methods using built-in functions and using contributed code.
The built-in method uses isosurface and patch. To display multiple surfaces, the trick is to set the 'FaceAlpha' patch property to get transparency.
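For example, a minimal sketch of that built-in approach using the mri demo volume that ships with MATLAB (my own example, not taken from the post):

load mri                              % provides D, a 128x128x1x27 volume
D = double(squeeze(D));
fv = isosurface(D, 40);               % extract the surface at intensity 40
patch(fv, 'FaceColor','red', 'EdgeColor','none', 'FaceAlpha',0.4);  % FaceAlpha gives the transparency
daspect([1 1 0.4]);                   % roughly correct voxel aspect ratio
view(3); camlight; lighting gouraud;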
The post also highlighted the "vol_3d v2" File Exchange submission, which provides the vol3d function to render the surface. It allows explicit definition of voxel colors and alpha values.
Some File Exchange submissions from MathWorks: 3D CT/MRI images interactive sliding viewer, 3D imshow, image3, and viewer3D.
If your image matrix I has dimensions x*y*z, you can try surface as well:
sliceInterval = 1;   % assumed spacing between slices along Z; set to suit your data
[X,Y] = meshgrid(1:size(I,2), 1:size(I,1));
Z = ones(size(I,1), size(I,2));
for z = 1:size(I,3)
    k = z * sliceInterval;   % height of this slice
    surface('XData',X, 'YData',Y, 'ZData',Z*k, 'CData',I(:,:,z), ...
            'CDataMapping','direct', 'EdgeColor','none', 'FaceColor','texturemap');
end
view(3);   % switch to a 3-D viewpoint
The Computer Vision System Toolbox for MATLAB includes a function called estimateUncalibratedRectification, which you can use to rectify a pair of stereo images. Check out this example of how to create a 3-D image, which can be viewed with a pair of red-cyan stereo glasses.
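A hedged sketch of that workflow (my own variable names; F, pts1 and pts2 are assumed to come from your own feature matching and fundamental matrix estimation, I1 and I2 are the two images):

[T1, T2] = estimateUncalibratedRectification(F, pts1, pts2, size(I2));
[J1, J2] = rectifyStereoImages(I1, I2, projective2d(T1), projective2d(T2));
figure; imshow(stereoAnaglyph(J1, J2));   % view with red-cyan glasses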
We are doing a virtual dressing room project and we could map .png image files onto body joints. But we need to map 3D clothes to the body, and the final output should be a person (a real person, not an avatar) wearing a 3D cloth in a live video output. But we don't know how to do this. Any help is much appreciated.
Thanks in advance.
Answer for user3588017: it's too lengthy, but I'll try to explain how to get your project done. First play with XNA and its basic tutorials if you are totally new to 3D gaming. In my project I only focused on bending hands down. For this you need to create a 3D model in Blender and export it to .xna. I used this link as a start, from the very beginning. Actually this is the hardest part; after this you'll know how to animate your model using maths. Then you need to map Kinect data as (x, y, z) coordinates to the parts of your model.
Ex: map the model's rotation to the Kinect skeleton's rotation. For this I used simple calculations, like measuring the depth of the shoulders, calculating the rotation angle, and applying it to my cloth model.
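A rough sketch of that shoulder calculation in XNA (my own naming, not the poster's code): estimate how far the torso is turned about the vertical axis from the depth difference between the two Kinect shoulder joints, and build a world matrix for the cloth model.

using System;
using Microsoft.Xna.Framework;

static class ClothPose
{
    public static Matrix FromShoulders(Vector3 leftShoulder, Vector3 rightShoulder, Vector3 torso)
    {
        // Depth difference between the shoulders vs. their horizontal separation.
        float yaw = (float)Math.Atan2(leftShoulder.Z - rightShoulder.Z,
                                      rightShoulder.X - leftShoulder.X);
        return Matrix.CreateRotationY(yaw) * Matrix.CreateTranslation(torso);
    }
}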
How do Google Maps do their panoramas in Street View?
Yeah, I know it's Flash, but how do they skew bitmaps with correct texture mapping?
Are they doing it at the pixel level, like most Flash 3D engines, or just applying some tricky transformation to the Bitmaps in the MovieClips?
Flash Panorama Player can help achieve a similar result!
It uses six cube-face images stitched together seamlessly with some 'magic' ActionScript.
Also see flashpanos.com for plugins and tutorials, with (possibly) documentation.
A quick guide to shooting panoramas so you can view them with FPP (Flash Panorama Player).
Cubic projection cube faces are actually 90x90 degree rectilinear images, like the ones you get from a normal camera lens. ~ What is VR Photography?
Check out http://www.panoguide.com/. They have howtos, links to software etc.
Basically there are two components in the process: the stitching software, which creates a single panoramic photo from many separate image sources, and the panoramic viewer, which distorts the image as you change your POV to simulate what your eyes would see if you were actually there.
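As a rough illustration of what the viewer does (my own sketch, not code from any of the products mentioned), the core lookup for an equirectangular panorama maps a view direction to a pixel; the viewer performs this, or an approximation of it, for every screen pixel as your POV changes:

import flash.geom.Point;

// Given a view direction as yaw/pitch in radians, return the matching pixel
// in an equirectangular panorama of size panoWidth x panoHeight.
// Longitude maps linearly to x, latitude to y.
function equirectangularLookup(yaw:Number, pitch:Number,
                               panoWidth:int, panoHeight:int):Point {
    var u:Number = (yaw + Math.PI) / (2 * Math.PI);   // 0..1 across 360 degrees of longitude
    var v:Number = (Math.PI / 2 - pitch) / Math.PI;   // 0..1 from +90 (up) to -90 (down)
    return new Point(u * (panoWidth - 1), v * (panoHeight - 1));
}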
My company uses the Papervision3D Flash render engine and maps a panoramic image (still image or video) onto a 3D sphere. We found that using a spherical object with about 25 divisions along both axes gives a much better visual result than mapping the same image onto the six faces of a cube. Check it for yourself at http://www.panocast.com.
Actually, you could of course distort your image in advance, so that when it is mapped on the faces of a cube, its perspective is just right, but this requires the complete rerendering of your imagery.
With some additional "magic", we can also load still images incrementally, as needed, depending on where the user is looking and at what zoom level (not unlike Google Street View does).
In terms of what Google actually does, Bork had this right. I'm not sure of the exact details (and not sure I could release the details even if I did), but Google stores individual 360 degree streetview scenes in an equirectangular representation for serving. The flash player then uses a series of affine transformations to display the image in perspective. The affine transformations are approximate, but good enough to aggregate to a decent image overall.
The calculation of the served images is very involved, since there are many stages of image processing that have to be done, to remove faces, account for bloom, etc. etc. In terms of actually stitching the panoramas, there are many algorithms for this (wikipedia article). Just one interesting thing I'd like to point out though, as food for thought, in the 360 degree panoramas on street view, you can see the road at the bottom of the image, where there was no camera on the cars. Now that's stitching.
An expensive camera makes a 360 degree video.
It is pretty impressive to watch a video that allows panning in every direction... which is what Street View is, without the bandwidth to support full video.
For those wondering how the Google VR photographers and editors add the ground to their equirectangular panoramas, check out the feature called Viewpoint Correction, as seen in software like PTGui:
ptgui.com/excamples/vptutorial.html
(Note that this is NOT the software used by Google)
If you take a closer look at the ground in Street View, you see that the stitching seems stretched, and sometimes it even overlaps with information from the viewpoint next to the current one. (By that I mean you can see something in one place, and suddenly that same feature is shown as the ground in the next place, revealing the technique used for the ground stitching.)