I have two videos of the Super Smash Brothers video game. In one video, the characters exist. In the other video, the characters do not exist. Everything else about the videos is exactly the same; the only difference is that the characters are invisible in one of them.
When I output the two videos, I have to manually align them in a video editor. Once they are aligned, they stay in sync! However, each video starts with a random amount of lead-in time, which is the problem.
What's a good way to automatically align these two different but extremely similar videos? Here are example frames.
Current ideas:
Take a frame from halfway through one video and compare it to the other video at the same location, using the mean squared error (MSE) between the pixels. Search forward and back 5 seconds, take the frame with the smallest MSE as the matching frame, and then trim that offset from the beginning of the longer video. This seems extremely brittle and slow.
Your current idea is good, but it doesn't need to be slow at all. Since the only difference between the two videos is the fighters, and we can assume the fighters are always in the middle of the image, you only need to match a small part of each frame, like the rectangle I drew:
Besides that, you can also use other fast matching methods, such as ORB features.
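For example, here is a minimal Python/OpenCV sketch of that idea: it crops a small rectangle out of a reference frame in one video and compares it against every frame within a ±5-second window in the other video, keeping the offset with the lowest MSE. The file names, the rectangle coordinates, and the reference frame index are placeholders you would adjust for your own footage.

```python
import cv2
import numpy as np

def crop(frame, rect):
    """Cut an (x, y, w, h) rectangle out of a frame."""
    x, y, w, h = rect
    return frame[y:y + h, x:x + w]

def frame_at(cap, index):
    """Grab the frame at a given index as grayscale, or None past the end."""
    cap.set(cv2.CAP_PROP_POS_FRAMES, index)
    ok, frame = cap.read()
    return cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) if ok else None

def find_offset(path_a, path_b, rect, ref_index, search_seconds=5):
    """Return the frame offset of video B relative to video A that
    minimises the MSE of the cropped region around ref_index."""
    cap_a, cap_b = cv2.VideoCapture(path_a), cv2.VideoCapture(path_b)
    fps = cap_a.get(cv2.CAP_PROP_FPS)
    ref = crop(frame_at(cap_a, ref_index), rect).astype(np.float32)

    window = int(search_seconds * fps)
    best_offset, best_mse = 0, float("inf")
    for offset in range(-window, window + 1):
        if ref_index + offset < 0:
            continue
        candidate = frame_at(cap_b, ref_index + offset)
        if candidate is None:
            continue
        mse = np.mean((crop(candidate, rect).astype(np.float32) - ref) ** 2)
        if mse < best_mse:
            best_offset, best_mse = offset, mse
    return best_offset

# Hypothetical usage: match a 200x100 region near the top-left HUD,
# starting from frame 5000 of the video that has the characters.
# offset = find_offset("with_chars.mp4", "no_chars.mp4", (50, 40, 200, 100), 5000)
```

Once you have the offset in frames, dividing by the frame rate gives the number of seconds to trim from the start of the longer video.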
This SO answer addresses how to do a screen capture of a UIView. We need something similar, but instead of a single image, the goal is to produce a video of everything appearing within a UIView over 60 seconds -- conceptually like recording only the layers of that UIView, ignoring other layers.
Our video app superimposes layers on whatever the user is recording, and the ultimate goal is to produce a master video merging those layers with the original video. However, using AVVideoCompositionCoreAnimationTool to merge layers with the original video is very, very, very slow: exporting a 60-second video takes 10-20 seconds.
What we found is combining two videos (i.e., only using AVMutableComposition without AVVideoCompositionCoreAnimationTool) is very fast: ~ 1 second. The hope is to create an independent video of the layers and then combine that with the original video only using AVMutableComposition.
An answer in Swift is ideal but not required.
It sounds like your "fast" merge doesn't involve (re-)encoding frames, i.e. it's trivial and basically a glorified file concatenation, which is why it's getting 60x realtime. I asked about that because your "very slow" export runs at 3-6x realtime, which actually isn't that terrible (at least it wasn't on older hardware).
Encoding frames with an AVAssetWriter should give you an idea of the fastest possible non-trivial export and this may reveal that on modern hardware you could halve or quarter your export times.
This is a long way of saying that there might not be that much more performance to be had. If you think about the typical iOS video encoding use case, which would probably be recording 1920p @ 120 fps or 240 fps, your encoding at ~6x realtime @ 30fps is in the ballpark of what your typical iOS device "needs" to be able to do.
There are optimisations available to you (like lower/variable framerates), but these may lose you the convenience of being able to capture CALayers.
I have a short video of 10 mins. This video is actually an online lecture. When you watch it, you will only see a slide show (some slides are annotated).
I have the original slides (pdf or image or ppt or whatever). Is it possible to match each slide with a specific time in video when it appears?
My idea is to take each slide image and compare it with every frame of the video, trying to match the slide image in the video.
What do you think of my idea? Is it possible and doable with some algorithm? Can I just subtract the image from the video frame (calculate the difference) to see which difference is close to zero? Thanks
If the images are perfectly aligned, then you can use any of simple differencing, sum of squared differences or normalised cross-correlation. However, if they are not aligned, you will need to register the two images first, followed by any of the three mentioned matching methods. Do a google search for image registration. Affine registration might be sufficient for your problem.
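If the slides and the video frames are already aligned (same framing, so no registration step is needed), a rough Python/OpenCV sketch of the normalised cross-correlation approach could look like the following. The 320x180 working size, the every-30th-frame sampling, and the 0.8 score threshold are arbitrary choices you would tune for your own lecture video.

```python
import cv2
import numpy as np

def match_slides(video_path, slide_paths, sample_every=30, threshold=0.8):
    """For each slide image, return the first timestamp (in seconds)
    at which a sampled video frame matches it well enough."""
    slides = [cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), (320, 180))
              for p in slide_paths]
    first_seen = {i: None for i in range(len(slides))}

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every == 0:
            small = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (320, 180))
            for i, slide in enumerate(slides):
                if first_seen[i] is None:
                    # Equal-sized inputs give a single normalised correlation score.
                    score = cv2.matchTemplate(small, slide, cv2.TM_CCOEFF_NORMED)[0][0]
                    if score > threshold:
                        first_seen[i] = index / fps
        index += 1
    return first_seen

# Hypothetical usage:
# times = match_slides("lecture.mp4", ["slide01.png", "slide02.png"])
```

Because some slides are annotated in the video, an exact subtraction will not reach zero, which is why a correlation score with a threshold is usually more forgiving than plain differencing.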
I am creating an app that represents the pages of a book with animation and interactive areas. There is one character who is constant throughout, but each page represents them in a different look, so I cannot re-use the frames very easily. This character has wings, legs and eyes, which all need to move differently. What I am wondering is: what is the best way to take them from the PSD into the app? The two approaches I can think of are either:
Create a separate png for each frame of the animation and then cycle through them (this would be combined into a single sprite atlas)
Split the character into their parts and then position, rotate, scale and move them in the app manually.
The main reason I am considering point 2 is that if I do point 1 then I will need to create a lot of frames of animation for each page and also create them all twice to cater for normal and retina displays.
Please let me know what the correct approach for this may be and if there is anything I should keep in mind.
Thanks
Option 1 sounds much more feasible. 300 frames is a bit too much, but you don't have to load all of them in memory at the same time. Divide your frames into multiple 1024x1024 spritesheets and make sure all the frames of the same animation are on a single spritesheet. That way, at any given moment, only a single texture would be loaded in memory, which I guess is the minimum anyway.
You can also do a bit more optimization maybe, by creating separate animations for things that behave the same in different poses. For example, if the eyes are blinking exactly the same in different poses, you can stop creating separate frames for each pose just for blinking. Just take out the eyes (ouch!), create a separate animation for them, and place it over your character's animation node.
Going with option 2 would create unnecessary complications, both for you and the poor device.
My ultimate goal is to get meaningful snapshots from MP4 videos that are either 30 min or 1 hour long. "Meaningful" is a bit ambitious, so I have simplified my requirements.
The image should be crisp - non-overlapping, and ideally not blurry. Initially, I thought getting a keyframe would work, but I had no idea that keyframes could have overlapping images embedded in them like this:
Of course, some keyframe images look like this and those are much better:
I was wondering if someone might have source code to:
Take a sequence of say 10-15 continuous keyframes (jpg or png) and identify the best keyframe from all of them.
This must happen entirely programmatically. I found this paper: http://research.microsoft.com/pubs/68802/blur_determination_compressed.pdf
and felt that I could "rank" a few images based on the above paper, but then I was dissuaded by this link: Extracting DCT coefficients from encoded images and video, given that my source video is an MP4. Of course, this confuses me because the input into the system is just a sequence of jpg images.
Another link that is interesting is:
Detection of Blur in Images/Video sequences
However, I am not sure if this will work for "overlapping" images.
The first pic is from an interlaced video at a scene change. The two fields belong to different scenes. De-interlacing the video will help; try the ffmpeg filter -filter:v yadif. I am not sure exactly how yadif works, but if it extracts the two fields and scales them up to the original size, it should work. Another approach is to detect whether the two fields (extract alternate lines, form two images with half the height, and diff them) are very different from each other, and ignore those images.
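As a Python/OpenCV sketch of that second approach: split each keyframe image into its two fields (even and odd scanlines), measure how much they disagree, and drop the images where the disagreement is large. The threshold of 20 grey levels is an arbitrary starting point you would tune on your own keyframes.

```python
import cv2
import numpy as np

def fields_differ(frame, threshold=20.0):
    """Split a frame into its two fields (even and odd scanlines) and
    report whether they look like they belong to different scenes."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    even, odd = gray[0::2, :], gray[1::2, :]
    h = min(even.shape[0], odd.shape[0])       # fields may differ by one row
    mad = np.mean(np.abs(even[:h] - odd[:h]))  # mean absolute difference
    return mad > threshold

def keep_clean_keyframes(paths):
    """Filter out keyframe images whose fields disagree (scene-change tearing)."""
    return [p for p in paths if not fields_differ(cv2.imread(p))]

# Hypothetical usage:
# clean = keep_clean_keyframes(["kf_001.jpg", "kf_002.jpg", "kf_003.jpg"])
```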
I have a device which streams h264 video in the following format: the top half of the picture contains the even lines of the video, and the bottom half contains the odd lines. So the question is: how can I play this video back so that it displays normally, using standard players, ffplay for example?
I know about the "tinterlace:merge" filter in ffmpeg, but it combines two consecutive pictures into one. My task is to make a correct picture out of each single frame.
Regards,
Alexey.
I recently had to deal with the exact same problem.
There are many different methods, and the optimum solution completely depends on your situation.
The simplest and fastest method is weaving the two fields together (see the sketch below), which is perfect for the immobile parts of the picture but creates a comb effect on moving objects.
More complicated methods use motion detection.
What I did was merge the two fields and then apply Edge-Line Averaging (ELA) to the moving segments to reduce the comb effect.
Check this link for a detailed explanation of the problem.
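For the stacked-field layout described in the question (even lines in the top half, odd lines in the bottom half), the simple weave could be sketched like this in Python/OpenCV. The input file name is a placeholder, and the code assumes the frame height is effectively even.

```python
import cv2
import numpy as np

def weave(frame):
    """Re-interleave a frame whose top half holds the even scanlines
    and whose bottom half holds the odd scanlines."""
    h = frame.shape[0] - (frame.shape[0] % 2)       # work with an even row count
    even, odd = frame[: h // 2], frame[h // 2 : h]
    out = np.empty((h,) + frame.shape[1:], dtype=frame.dtype)
    out[0::2] = even   # even lines go back to rows 0, 2, 4, ...
    out[1::2] = odd    # odd lines go back to rows 1, 3, 5, ...
    return out

cap = cv2.VideoCapture("stream.h264")   # placeholder input
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("woven", weave(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```

A plain weave like this only looks right for mostly static content; moving objects will show the comb artifacts described above, which is where the ELA-style averaging mentioned in the previous answer comes in.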
It would be good if you could provide a sample video file. You describe very well what the picture looks like, but the file may contain other information that is helpful for playback.
Furthermore, the format you describe doesn't sound like a standard format, so it's unlikely you will get a regular player to play it the way you want, out-of-the-box. If you're using ffplay, it's likely that you will have to write your own plugin to re-order the scanlines prior to displaying them.
Alternatively, you could re-encode the video into a standard format (interlaced or deinterlaced) using ffmpeg. You could then play it back in any regular player, like ffplay or VLC.
Finally, I recommend asking your question on the ffmpeg mailing list.