I need to simultaneously display a video that is playing in my application, full screen on a larger monitor. On some video cards this is called Theater mode and is configured with a tool that the card manufacturer supplies.
I would like to do this with only software. Can I do this with DirectX?
My idea is to take the video currently playing through DirectShow and present it on a second display (as configured by the user) in full-screen mode.
What technologies or methods would I use for this?
The straightforward way is to split the still-encoded video into two branches and use two video renderers set up to present video on different monitors. One renderer could be part of your application UI; the other could expand full screen on the large secondary monitor.
Splitting the encoded video gives you the option to still leverage hardware-assisted decoding (DXVA) where available. You might instead prefer to use a software-only decoder and split the already decoded video; that is also going to work.
You might additionally want to implement a filter that can temporarily disable one renderer or the other, for example by stopping the flow of media samples through its branch.
Another thing you can do is use bridging to control the renderers even more flexibly and to be able to detach them from the media source.
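For illustration, here is a minimal C++ sketch of such a graph, using the Infinite Pin Tee and two VMR-9 instances in windowed mode. FindUnconnectedPin and BuildDualRendererGraph are hypothetical names made up for this example, error handling and COM cleanup are omitted, and exactly where the split lands relative to the decoder depends on how Intelligent Connect builds the chain:

```cpp
// Sketch only: one graph feeding two VMR-9 renderers through an Infinite Pin Tee.
#include <dshow.h>
#pragma comment(lib, "strmiids.lib")

// Hypothetical helper: returns the first unconnected pin of the given direction.
IPin* FindUnconnectedPin(IBaseFilter* filter, PIN_DIRECTION dir);

HRESULT BuildDualRendererGraph(LPCWSTR file, HWND appWindow, RECT bigMonitor)
{
    IGraphBuilder* graph = nullptr;
    CoCreateInstance(CLSID_FilterGraph, nullptr, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&graph));

    // Source + tee + two renderers.
    IBaseFilter *source = nullptr, *tee = nullptr, *rendApp = nullptr, *rendBig = nullptr;
    graph->AddSourceFilter(file, L"Source", &source);

    CoCreateInstance(CLSID_InfTee, nullptr, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&tee));
    graph->AddFilter(tee, L"Tee");

    CoCreateInstance(CLSID_VideoMixingRenderer9, nullptr, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&rendApp));
    graph->AddFilter(rendApp, L"Renderer (app UI)");
    CoCreateInstance(CLSID_VideoMixingRenderer9, nullptr, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&rendBig));
    graph->AddFilter(rendBig, L"Renderer (second monitor)");

    // Intelligent Connect inserts the demultiplexer/decoder(s) as needed;
    // whether the split happens before or after the decoder depends on the chain it builds.
    graph->Connect(FindUnconnectedPin(source, PINDIR_OUTPUT), FindUnconnectedPin(tee, PINDIR_INPUT));
    graph->Connect(FindUnconnectedPin(tee, PINDIR_OUTPUT), FindUnconnectedPin(rendApp, PINDIR_INPUT));
    graph->Connect(FindUnconnectedPin(tee, PINDIR_OUTPUT), FindUnconnectedPin(rendBig, PINDIR_INPUT));

    // Embed one video window in the application UI...
    IVideoWindow* vwApp = nullptr;
    rendApp->QueryInterface(IID_PPV_ARGS(&vwApp));
    vwApp->put_Owner((OAHWND)appWindow);
    vwApp->put_WindowStyle(WS_CHILD | WS_CLIPSIBLINGS);

    // ...and stretch the other, borderless, across the secondary monitor.
    IVideoWindow* vwBig = nullptr;
    rendBig->QueryInterface(IID_PPV_ARGS(&vwBig));
    vwBig->put_WindowStyle(WS_POPUP);
    vwBig->SetWindowPosition(bigMonitor.left, bigMonitor.top,
                             bigMonitor.right - bigMonitor.left,
                             bigMonitor.bottom - bigMonitor.top);

    IMediaControl* control = nullptr;
    graph->QueryInterface(IID_PPV_ARGS(&control));
    return control->Run();
}
```

In a real application you would pick the renderer (VMR-9, EVR) and the target monitor rectangle based on the user's configuration rather than hard-coding them as here.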
Windows provides DRM functionality to applications that require it.
Some of them, however, have more protection than others.
As an example, take Edge (both Legacy and Chromium) or IE that use Protected Media Path. They get to display >720p Netflix content. Other browsers don't use PMP and are capped at 720p.
The difference in the protections is noticeable when you try to capture the screen: while you have no problems on Firefox/Chrome, in Edge/IE a fixed black image takes the place of the media you are playing, but you still see media control buttons (play/pause/etc) that are normally overlaid (alpha blended) on the media content.
The question here is mainly conceptual, and in fact could also apply to systems with identical behavior, like iOS, which also replaces the picture when you take a screenshot or capture the screen while playing Netflix.
How does it manage to display two different images on two different outputs (capture APIs with no DRM content, and the attached physical monitor screen with DRM content)?
I'll make a guess, and I'll start by excluding HW overlays. The reason is that the play/pause buttons are still visible in the captured output. Since they are overlaid (alpha blended) on the media on the screen, and alpha blending on HW overlays is not possible in DirectX 9 or later, nor with legacy DirectDraw, hardware overlays have to be discarded. And by the way, neither d3d9.dll nor ddraw.dll is loaded by mfpmp.exe or iexplore.exe (version 11). Plus, I think hardware overlays are now considered a legacy feature, while Media Foundation (which the Protected Media Path is part of) is very much alive and maintained.
So my guess is that DWM, which is in charge of screen composition, is actually doing two compositions: either forking the composition process at the point where it encounters a DRM area, feeding one output to the screen (with the DRM-protected content) and the other to the various screen-capturing methods and APIs, or doing two entirely different compositions in the first place.
Is my guess correct? And could you please provide evidence to support your answer?
My interest is understanding how composition software and DRM are implemented, primarily in Windows. But how many other ways could there be to do it in different OSes?
Thanks in advance.
According to this document, both options are available.
The modern PlayReady DRM that Netflix uses for its playback in IE, Edge, and the UWP app uses the DWM method, which can be noticed by the video area showing only a black screen when DWM is forcibly killed. It seems this is because modern PlayReady is only supported since Windows 8.1, which does not let users disable DWM easily.
I think both methods were used in Windows Vista through 7, but I have no samples to test with. As HW overlays don't look that good with window previews, animations, and transparency, they would have switched between the methods depending on the DWM status.
For iOS, it seems that a mechanism similar to the DWM method is implemented at the display server (SpringBoard?) level to present the protected content, which is processed in the Secure Enclave Processor.
I am asking this because I couldn't find the answer anywhere, at least with the keywords I could think of.
The most relevant question/answer I've found is (Create interactive videos in iPad - An app for product demo). The user Jano replied:
The easiest way to create interactive videos for iOS is to use Apple's HTTP Live Streaming technology. You have to create a video, embed metadata, play it using MPMoviePlayerController or AVPlayerItem, and then display clickable areas in response to metadata notifications.
Metadata should contain coordinates for the element you are tracking, e.g. a dress, and an identifier for the product. You overlay this info with a clickable subview that reveals more information about the product. There are several applications of this kind in iTunes; here is one.
Once you get a working product and weeks' worth of videos, the most difficult part is to perform the motion tracking with as little human interaction as possible. One approach is to use Adobe After Effects, another is to code your own solution based on OpenCV.
The example I found concerning this technology (http://vimeo.com/16455248) showed NSButtons being added automatically when the video reaches the embedded meta-tags. My client wants a human-body interactive video that pauses at a specific time (maybe using the meta-tags) and reacts to the user tapping an element in the video (e.g. imagine a pill inside a stomach; after tapping this pill, another pre-rendered video is triggered, in a way that is not transparent to the user). I have thought about animations using Cocos2D or OpenGL ES, but I lack people who master these technologies.
I didn't quite understand the "motion tracking" reference in the quote above. Jano mentions Adobe After Effects and OpenCV. Is this motion tracking something like a UIGestureRecognizer? Does it track parts of the video itself, or motions initiated by the user, such as taps?
I hope I've stated the question as clearly as possible. Thank you in advance.
This question is a year old, but I can give you insight into the After Effects question. AE has a feature where you can define an area in a video frame and the software will track that area across the timeline, logging the coordinates at specific intervals. For example, in a video of a person riding a mountain bike, you could select an area around their helmet and AE will log coordinates of the helmet throughout the timeline.
Since Flash was the most likely target for interactive video, the typical workflow would encode this coordinate data into a Flash video as cue point events (this is the only method I have personally experienced). According to some googling, the data is stored in key frames and can be extracted using scripts.
More info: http://helpx.adobe.com/after-effects/using/tracking-stabilizing-motion-cs5.html
Here's a manual method for extracting the data:
In the timeline panel, select the footage and press the U key; all track point keyframes will show up. Here's the magic: select the Feature Center property of each track point and copy it (Cmd+C for Mac or Ctrl+C for PC).
Now open any text editor such as TextMate or Notepad and paste the data (Cmd+V for Mac or Ctrl+V for PC).
Is it possible to build an application that displays itself (TopMost) even when a game is running (Quake, Far Cry, Black Ops, any DirectX-driven game)?
I would like to be able to record my key presses while I play a game for video recording.
It must be possible, because FRAPS displays the FPS on top of everything that uses DirectX, including video players.
Any thoughts?
First of all, it isn't that easy. FRAPS works with API injection: it inserts some code into the drawing steps of the target program, taking the many different versions of DirectX and OpenGL into account. I found a link where this is explained in a little more detail: Case Study Fraps.
A solution that works in windowed mode and captures the input with global hooks may be easier, but I have never tried anything in this direction. If you want to work with API hooks, maybe this link will be useful: Direct3DHook
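As a rough sketch of that windowed-mode idea (not FRAPS' injection approach): a click-through, always-on-top layered window showing the last key recorded by a global low-level keyboard hook. The class name and layout are made up for the example, and note this will not appear over games running in exclusive full-screen mode:

```cpp
#include <windows.h>
#include <string>

static std::wstring g_lastKey = L"(press a key)";
static HWND  g_overlay = nullptr;
static HHOOK g_hook = nullptr;

// WH_KEYBOARD_LL sees keystrokes system-wide without injecting a DLL,
// but the installing thread must pump messages.
static LRESULT CALLBACK KeyHook(int code, WPARAM wParam, LPARAM lParam)
{
    if (code == HC_ACTION && (wParam == WM_KEYDOWN || wParam == WM_SYSKEYDOWN))
    {
        const KBDLLHOOKSTRUCT* k = reinterpret_cast<const KBDLLHOOKSTRUCT*>(lParam);
        wchar_t buf[32];
        wsprintfW(buf, L"vk = 0x%02X", k->vkCode);
        g_lastKey = buf;                              // a real recorder would buffer + timestamp
        InvalidateRect(g_overlay, nullptr, TRUE);     // repaint the overlay
    }
    return CallNextHookEx(g_hook, code, wParam, lParam);
}

static LRESULT CALLBACK OverlayProc(HWND h, UINT m, WPARAM w, LPARAM l)
{
    if (m == WM_PAINT)
    {
        PAINTSTRUCT ps;
        HDC dc = BeginPaint(h, &ps);
        SetBkMode(dc, TRANSPARENT);
        SetTextColor(dc, RGB(255, 255, 0));
        TextOutW(dc, 10, 10, g_lastKey.c_str(), (int)g_lastKey.size());
        EndPaint(h, &ps);
        return 0;
    }
    return DefWindowProcW(h, m, w, l);
}

int WINAPI wWinMain(HINSTANCE inst, HINSTANCE, PWSTR, int)
{
    WNDCLASSW wc = {};
    wc.lpfnWndProc   = OverlayProc;
    wc.hInstance     = inst;
    wc.lpszClassName = L"KeyOverlay";
    wc.hbrBackground = CreateSolidBrush(RGB(0, 0, 0));   // black will be keyed out below
    RegisterClassW(&wc);

    // Topmost, layered, click-through window; black pixels become transparent.
    g_overlay = CreateWindowExW(WS_EX_TOPMOST | WS_EX_LAYERED | WS_EX_TRANSPARENT | WS_EX_TOOLWINDOW,
                                L"KeyOverlay", L"", WS_POPUP,
                                50, 50, 200, 40, nullptr, nullptr, inst, nullptr);
    SetLayeredWindowAttributes(g_overlay, RGB(0, 0, 0), 0, LWA_COLORKEY);
    ShowWindow(g_overlay, SW_SHOWNOACTIVATE);

    g_hook = SetWindowsHookExW(WH_KEYBOARD_LL, KeyHook, GetModuleHandleW(nullptr), 0);

    MSG msg;
    while (GetMessageW(&msg, nullptr, 0, 0) > 0)
    {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    UnhookWindowsHookEx(g_hook);
    return 0;
}
```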
Although I've searched SO and read the documentation on AVCaptureConnection, AVCaptureSession, AVCaptureVideoPreviewLayer, AVCaptureDevice, AVCaptureInput/Output multiple times, I'm still confused about all this AV stuff. When it comes to this, it's one big pile of abstract words to me that doesn't make much sense. I'm asking you to shed some light on the subject for me here.
So, can anyone explain coherently in plain english the logic of proper setup and use of the media devices? What is AVCaptureVideoPreviewLayer? What is AVCaptureConnection? Input/Output?
I want to grasp the basic idea that the people who made this stuff had in mind while making it.
Thanks
I wish I had more time to write a more thorough reply. Here are some simplified basics:
In order to work with audio and video coming from the hardware, destined for the screen or for files, you need to set up an AVCaptureSession, which helps coordinate the sources and the destinations using AVCaptureConnections. You use the session instance to start and stop the process, along with setting some output properties like bitrate and quality. You use the AVCaptureConnection instance(s) to control the connection between AVCaptureInputPorts and an AVCaptureOutput (or AVCaptureVideoPreviewLayer), for example to monitor audio input levels or to set the orientation of the video.
AVCaptureInputPorts are the individual inputs from an AVCaptureDevice, which is where your video or audio comes from, such as the camera or the microphone. You will normally look through all available devices and choose those that have the properties you are looking for, such as whether they capture audio, or whether they are the front-facing camera.
AVCaptureOutput is where the AV is sent - it might be a file or a routine that allows you to process the data in real-time, etc.
AVCaptureVideoPreviewLayer is an OpenGL layer that is optimized for very fast rendering of the output of the selected video input device (front or back camera). You typically use this to show your user what input you are working with - sort of like a camera viewfinder.
If you are going to use this stuff, then you must read Apple's AV Foundation Programming Guide
The above-mentioned guide also contains an overview diagram of the capture architecture, plus a more detailed view, that may help you some more.
My application uses VMR9 Renderless mode to play a WMV file. I build a filter graph with IGraphBuilder::RenderFile and control playback with IMediaControl. Everything plays okay, but I can't figure out how to determine the source video size. Any ideas?
Note: This question was asked before in How can I adjust the video to a specified size in VMR9 renderless mode?. But the solution was to use Windowless mode instead of Renderless mode, which would require rewriting my code.
Firstly, you want the video renderer. You can find it by using EnumFilters on the IGraphBuilder interface. Then call EnumPins on that filter to find its input pin, and call ConnectionMediaType on the pin to get the media type being fed into the renderer. Depending on what formattype is set to, you can cast the pbFormat pointer to the relevant structure (VIDEOINFOHEADER or VIDEOINFOHEADER2) and read the video size from there. If you want the size further upstream (to see whether some scaling is going on), you can work your way back across the connection using ConnectedTo to get to the previous filter, find its input pins, and repeat the ConnectionMediaType call. Repeat until you reach the pin of the filter you are interested in.
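As a hedged illustration of that pin walk (assuming you have already located the renderer filter, for example with IGraphBuilder::EnumFilters or IFilterGraph::FindFilterByName, and with error handling trimmed):

```cpp
#include <dshow.h>
#include <dvdmedia.h>   // VIDEOINFOHEADER2
#include <cstdlib>      // labs
#pragma comment(lib, "strmiids.lib")

// Reads the video size negotiated on the renderer's connected input pin.
HRESULT GetSourceVideoSize(IBaseFilter* videoRenderer, LONG* width, LONG* height)
{
    IEnumPins* pins = nullptr;
    HRESULT hr = videoRenderer->EnumPins(&pins);
    if (FAILED(hr)) return hr;

    hr = E_FAIL;
    IPin* pin = nullptr;
    while (hr != S_OK && pins->Next(1, &pin, nullptr) == S_OK)
    {
        PIN_DIRECTION dir;
        pin->QueryDirection(&dir);

        AM_MEDIA_TYPE mt = {};
        if (dir == PINDIR_INPUT && SUCCEEDED(pin->ConnectionMediaType(&mt)))
        {
            if (mt.formattype == FORMAT_VideoInfo && mt.cbFormat >= sizeof(VIDEOINFOHEADER))
            {
                const VIDEOINFOHEADER* vih = reinterpret_cast<const VIDEOINFOHEADER*>(mt.pbFormat);
                *width  = vih->bmiHeader.biWidth;
                *height = labs(vih->bmiHeader.biHeight);   // biHeight can be negative (top-down)
                hr = S_OK;
            }
            else if (mt.formattype == FORMAT_VideoInfo2 && mt.cbFormat >= sizeof(VIDEOINFOHEADER2))
            {
                const VIDEOINFOHEADER2* vih2 = reinterpret_cast<const VIDEOINFOHEADER2*>(mt.pbFormat);
                *width  = vih2->bmiHeader.biWidth;
                *height = labs(vih2->bmiHeader.biHeight);
                hr = S_OK;
            }
            // Manual equivalent of FreeMediaType(), to avoid pulling in the base-class library.
            if (mt.cbFormat) CoTaskMemFree(mt.pbFormat);
            if (mt.pUnk) mt.pUnk->Release();
        }
        pin->Release();
    }
    pins->Release();
    return hr;
}
```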
You could use the MediaInfo project at http://mediainfo.sourceforge.net/hr/Download/Windows and, through the C# wrapper included in the VCS2010 or VCS2008 folders, get all the information about a video that you need.
EDIT: Sorry, I thought you were using managed code. But MediaInfo can be used in either case, so maybe it helps.