I'm building a 360 video viewer for iOS in order to better understand the nuances of monoscopic and stereoscopic 360 video. Effectively, I'm trying to reverse engineer what the Google Cardboard SDK does.
My in-progress code is here: https://github.com/alfiehanssen/ThreeSixtyPlayer
I understand monoscopic video and the basics of stereoscopic video. I would like to better understand the client-side configuration/transformations required to display stereoscopic video.
Specifically, in two parts:
Cardboard can display monoscopic video files in stereoscopic mode. Although the same exact video is being presented to each eye, each eye's video view is clearly different. What transformations are being made to each eye to get this to work?
For stereoscopic videos (let's assume top/bottom layout), it also appears that transformations are applied to each eye's video view. What are the transformations being applied to each eye's video view?
It looks as though the video is being skewed. There are black masks applied to all sides of each eye's video, where are these coming from / are they the result of transformations?
A sample screenshot from a cardboard view:
Related
I am currently working on a service to update the content of the video bases on the markers present in the video. I was curious if we can use Vuforia to achieve the same by providing the pre-recorded video as an input to Vuforia instead of the live camera feed from the mobile phone.
TLDR; This is not possible because replacing the camera is not a function that neither Vuforia or ARKit expose.
Aside from not exposing the camera both frameworks use a combination of camera input and sensor data (gyro, accelerometer, compass, altitude, etc) to calculate the camera/phone's position (translation/rotation) relative to the marker image.
The effect you are looking for is image tracking and rendering within a video feed. You should consider OpenCV for the feature point tracking, or some computer vision library. With regard to rendering there's three options SceneKit, Metal, or OpenGL. Following Apple's lead you could use SceneKit for the rendering, similar to how ARKit handles the sensor inputs and uses SceneKit for rendering. If you are ambitious and want to control the rendering as well you could use Metal or OpenGL.
How can we access Front Facing Camera Images with ARCamera or ARSCNView and is it possible to record ARSCNView just like Camera Recording?
Regarding the front-facing camera: in short, no.
ARKit offers two basic kinds of AR experience:
World Tracking (ARWorldTrackingConfiguration), using the back-facing camera, where a user looks "through" the device at an augmented view of the world around them. (There's also AROrientationTrackingConfiguration, which is a reduced quality version of world tracking, so it still uses only the back-facing camera.)
Face Tracking (ARFaceTrackingConfiguration), supported only with the front-facing TrueDepth camera on iPhone X, where the user sees an augmented view of theirself in the front-facing camera view. (As #TawaNicolas notes, Apple has sample code here... which, until iPhone X actually becomes available, you can read but not run.)
In addition to the hardware requirement, face tracking and world tracking are mostly orthogonal feature sets. So even though there's a way to use the front facing camera (on iPhone X only), it doesn't give you an experience equivalent to what you get with the back facing camera in ARKit.
Regarding video recording in the AR experience: you can use ReplayKit in an ARKit app same as in any other app.
If you want to record just the camera feed, there isn't a high level API for that, but in theory you might have some success feeding the pixel buffers you get in each ARFrame to AVAssetWriter.
As far as I know, ARKit with Front Facing Camera is only supported for iPhone X.
Here's Apple's sample code regarding this topic.
If you want to access the UIKit or AVFoundation cameras, you still can, but separately from ARSCNView. E.g., I'm loading UIKit's UIImagePickerController from an IBAction and it is a little awkward to do so, but it works for my purposes (loading/creating image and video assets).
I'm trying to mod Oculus World Demo to show an video stream from a camera and not a pre-set graphic, however, I'm finding it difficult to find the proper way to render an cv::IplImage or cv::mat image type onto the Oculus screen. If anyone knows how to display an image to the oculus I would be very grateful. This is for the DK 2.
Pure OpenCV isn't really well suited to rendering to the Rift, because you would need to manually implement the distortion mechanisms that are normally provided by the Oculus Rift SDK.
The best way to render an image from OpenCV onto the screen is to load the image into an OpenGL or Direct3D texture and use the 3D rendering API (GL or D3D) to place it into a rendered scene. There is an example of this in Github repository for my book on Rift development.
In summary, it sets up the video capture using the OpenCV API and then launches a thread which is responsible for capturing images from the camera device. In the main thread, the draw call renders a simple 3D scene which includes the captured image. Most of the interesting Rift related code is in the parent class, RiftApp.
I saw that someone has made an app that tracks your feet using the camera, so that you can kick a virtual football on your iPhone screen.
How could you do something like this? Does anyone know of any code examples or other information about using the iPhone camera for detecting objects and tracking them?
I just gave a talk at SecondConf where I demonstrated the use of the iPhone's camera to track a colored object using OpenGL ES 2.0 shaders. The post accompanying that talk, including my slides and sample code for all demos can be found here.
The sample application I wrote, whose code can be downloaded from here, is based on an example produced by Apple for demonstrating Core Image at WWDC 2007. That example is described in Chapter 27 of the GPU Gems 3 book.
The basic idea is that you can use custom GLSL shaders to process images from the iPhone camera in realtime, determining which pixels match a target color within a given threshold. Those pixels then have their normalized X,Y coordinates embedded in their red and green color components, while all other pixels are marked as black. The color of the whole frame is then averaged to obtain the centroid of the colored object, which you can track as it moves across the view of the camera.
While this doesn't address the case of tracking a more complex object like a foot, shaders like this should be able to be written that could pick out such a moving object.
As an update to the above, in the two years since I wrote this I've now developed an open source framework that encapsulates OpenGL ES 2.0 shader processing of images and video. One of the recent additions to that is a GPUImageMotionDetector class that processes a scene and detects any kind of motion within it. It will give you back the centroid and intensity of the overall motion it detects as part of a simple callback block. Using this framework to do this should be a lot easier than rolling your own solution.
How do Google Maps do their panoramas in Street View?
Yeah, I know its Flash, but how do they skew bitmaps with Correct Texture Mapping?
Are they doing it on the pixel-level like most Flash 3D engines?, or just applying some tricky transformation to the Bitmaps in the Movieclips?
Flash Panorama Player can help achieve a similar result!
It uses 6 equirectangular images (cube faces) stitched together seamlessly with some 'magic' ActionScript.
Also see these parts of flashpanos.com for plugins, and tutorials with (possibly) documentation.
A quick guide to shooting panoramas so you can view them with FPP (Flash Panorama Player).
Cubic projection cube faces are actually 90x90 degrees rectilinear
images like the ones you get from a normal camera lens. ~ What is VR Photography?
Check out http://www.panoguide.com/. They have howtos, links to software etc.
Basically there are 2 components in the process: the stitching software which creates a single panoramic photo from many separate image sources, then there is the panoramic viewer, which distorts the image as you change your POV to simulate what your eyes would see if you were actually there.
My company uses the Papervision3D flash render engine, and maps a panoramic image (still image or video) onto a 3D sphere. We found that using a spherical object with about 25 divisions along both the axes gives a much better visual result than mapping the same image on the six faces of a cube. Check it for yourself at http://www.panocast.com.
Actually, you could of course distort your image in advance, so that when it is mapped on the faces of a cube, its perspective is just right, but this requires the complete rerendering of your imagery.
With some additional "magic", we can also load still images incrementally, as needed, depending on where the user is looking and at what zoom level (not unlike Google Street View does).
In terms of what Google actually does, Bork had this right. I'm not sure of the exact details (and not sure I could release the details even if I did), but Google stores individual 360 degree streetview scenes in an equirectangular representation for serving. The flash player then uses a series of affine transformations to display the image in perspective. The affine transformations are approximate, but good enough to aggregate to a decent image overall.
The calculation of the served images is very involved, since there are many stages of image processing that have to be done, to remove faces, account for bloom, etc. etc. In terms of actually stitching the panoramas, there are many algorithms for this (wikipedia article). Just one interesting thing I'd like to point out though, as food for thought, in the 360 degree panoramas on street view, you can see the road at the bottom of the image, where there was no camera on the cars. Now that's stitching.
An expensive camera. makes
A 360 degree video
It is pretty impressive to watch a video that allows panning in every direction... which is what street view is without the bandwidth to support the full video.
For those wondering how the Google VR Photographers and editers add the ground to their Equirectangular panoramas, check out the feature called Viewpoint Correction, as seen in software like PTGui:
ptgui.com/excamples/vptutorial.html
(Note that this is NOT the software used by Google)
If you take a closer look at the ground in street view, you see that the stitching seems streched, and sometimes it even overlaps with information from the viewpoint next to the current one. (With that I mean that you can see something in one place, and suddenly that same feature is shown as the ground in the next place, revealing the technique used for the ground stitching).