My ultimate goal is to get meaningful snapshots from MP4 videos that are either 30 min or 1 hour long. "Meaningful" is a bit ambitious, so I have simplified my requirements.
The image should be crisp - non-overlapping, and ideally not blurry. Initially, I thought getting a keyframe would work, but I had no idea that keyframes could have overlapping images embedded in them like this:
Of course, some keyframe images look like this and those are much better:
I was wondering if someone might have source code to:
Take a sequence of say 10-15 continuous keyframes (jpg or png) and identify the best keyframe from all of them.
This must happen entirely programmatically. I found this paper: http://research.microsoft.com/pubs/68802/blur_determination_compressed.pdf
and felt that I could "rank" a few images based on the above paper, but then I was dissuaded by this link: Extracting DCT coefficients from encoded images and video given that my source video is an MP4. Of course, this confuses me because the input into the system is just a sequence of jpg images.
Another link that is interesting is:
Detection of Blur in Images/Video sequences
However, I am not sure if this will work for "overlapping" images.
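For ranking a handful of already-decoded frames by sharpness, one commonly used heuristic is the variance of the Laplacian. This is not the DCT-based method from the paper above; it is a swapped-in technique that works on plain pixel arrays. The sketch below assumes each frame is already loaded as a 2-D grayscale NumPy array, and the function names are made up for illustration. Note that a frame mixing two scenes has strong line-to-line edges and may well score as "sharp", so such frames would need to be rejected separately.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian; higher usually means sharper."""
    g = gray.astype(np.float64)
    # Discrete Laplacian evaluated on the interior pixels only.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def sharpest_index(frames) -> int:
    """Index of the frame with the highest Laplacian variance."""
    return max(range(len(frames)), key=lambda i: laplacian_variance(frames[i]))
```

The scores are only comparable between frames of the same size and similar content, which should hold for 10-15 consecutive keyframes.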
The first picture is from an interlaced video at a scene change: the two fields belong to different scenes. De-interlacing the video will help; try the ffmpeg filter -filter:v yadif. I am not sure how yadif works, but if it extracts the two fields and scales them to the original size, it would work. Another approach is to check whether the two fields are very different from each other (extract alternate lines to form two half-height images and diff them) and ignore those images.
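The second approach (diffing the two fields) could be sketched roughly as follows. It assumes the frame is already decoded into a NumPy array; the function names and the threshold of 30 are arbitrary placeholders that would need tuning on real footage.

```python
import numpy as np

def field_difference(frame: np.ndarray) -> float:
    """Mean absolute difference between the even-line and odd-line fields."""
    gray = frame.astype(np.float64)
    if gray.ndim == 3:
        # Collapse colour channels to a rough brightness value.
        gray = gray.mean(axis=2)
    even = gray[0::2]
    odd = gray[1::2]
    # With an odd frame height the even field has one extra row; drop it.
    rows = min(even.shape[0], odd.shape[0])
    return float(np.abs(even[:rows] - odd[:rows]).mean())

def looks_like_mixed_fields(frame: np.ndarray, threshold: float = 30.0) -> bool:
    """True if the two fields differ enough to suggest two different scenes.

    The threshold is an assumption, not a calibrated value.
    """
    return field_difference(frame) > threshold
```

Frames from a single scene normally have fields that differ only slightly, so the score stays low; frames straddling a scene change score much higher and can be skipped.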
Related
I have two videos of the Super Smash Brothers video game. In one video, the characters exist. In the other video, the characters do not exist. Everything else about the videos is exactly the same, except that the characters are invisible in one of them.
When I output the two videos, I have to manually align them in a video editor. Once they are aligned, they stay in sync! However, each video starts with a random amount of lead-in time, which is the problem.
What's a good way to automatically align these two different but extremely similar videos? Here are example frames.
Current ideas:
Take a random frame halfway through one video and compare it to frames of the other video around the same location, using the mean squared error (MSE) between the pixels. Search 5 seconds forward and 5 seconds back. Take the frame with the smallest MSE as the matching frame, then trim the resulting offset from the beginning of the longer video. This seems extremely brittle and slow.
Your current idea is good, and it doesn't need to be slow at all. The only parts of the images that differ are the fighters, and we can assume the fighters are always in the middle of the image, so you only need to match a small part of the image, like the rectangle I drew:
Besides, you can also use other fast matching methods, such as ORB features.
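A minimal sketch of the region-restricted MSE search, assuming both videos have already been decoded into lists of equally sized NumPy frames at the same frame rate. The function names, the `region` tuple layout, and the search radius are illustrative assumptions; the rectangle itself has to be chosen by hand.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally sized images."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff * diff))

def find_offset(frames_a, frames_b, index_a, search_radius, region):
    """Find k such that frames_b[index_a + k] best matches frames_a[index_a].

    region is (top, bottom, left, right) in pixels; only that rectangle
    is compared, which keeps each comparison cheap.
    """
    top, bottom, left, right = region
    ref = frames_a[index_a][top:bottom, left:right]
    best_offset, best_err = None, float("inf")
    for k in range(-search_radius, search_radius + 1):
        j = index_a + k
        if 0 <= j < len(frames_b):
            err = mse(ref, frames_b[j][top:bottom, left:right])
            if err < best_err:
                best_offset, best_err = k, err
    return best_offset, best_err
```

Repeating the search for a few different values of `index_a` and checking that they agree on the same offset would make the result less brittle than relying on a single frame.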
This SO answer addresses how to do a screen capture of a UIView. We need something similar, but instead of a single image, the goal is to produce a video of everything appearing within a UIView over 60 seconds -- conceptually like recording only the layers of that UIView, ignoring other layers.
Our video app superimposes layers on whatever the user is recording, and the ultimate goal is to produce a master video merging those layers with the original video. However, using AVVideoCompositionCoreAnimationTool to merge layers with the original video is very, very, very slow: exporting a 60-second video takes 10-20 seconds.
What we found is combining two videos (i.e., only using AVMutableComposition without AVVideoCompositionCoreAnimationTool) is very fast: ~ 1 second. The hope is to create an independent video of the layers and then combine that with the original video only using AVMutableComposition.
An answer in Swift is ideal but not required.
It sounds like your "fast" merge doesn't involve (re)-encoding frames, i.e. it's trivial and basically a glorified file concatenation, which is why it's getting 60x realtime. I asked about that because your "very slow" export is from 3-6 times realtime, which actually isn't that terrible (at least it wasn't on older hardware).
Encoding frames with an AVAssetWriter should give you an idea of the fastest possible non-trivial export and this may reveal that on modern hardware you could halve or quarter your export times.
This is a long way of saying that there might not be that much more performance to be had. If you think about the typical iOS video encoding use case, which would probably be recording 1920p @ 120 fps or 240 fps, your encoding at ~6x realtime @ 30fps is in the ballpark of what your typical iOS device "needs" to be able to do.
There are optimisations available to you (like lower/variable framerates), but these may lose you the convenience of being able to capture CALayers.
I'm trying to do image comparison to detect changes in a video processing application. These are two images that look identical to me, but are different according to both
http://pdiff.sourceforge.net/
and http://www.itec.uni-klu.ac.at/lire/nightly/api/net/semanticmetadata/lire/imageanalysis/LireFeature.html
Can anyone explain the difference? Eventually I need to find a library that can detect differences that doesn't have any false positives.
The two images are different.
I used GIMP (open source) to stack the two images one on top of the other and applied a difference mode to the top layer. It showed a very faint, nearly black image, i.e. very little difference. I then used Curves to raise the tones, and it revealed what seem to be JPEG artifacts, even though the files given are PNG. I recommend GIMP and sometimes I use it instead of Photoshop.
Using GIMP to do a blink comparison between layers at 400% view, I would guess that the first image is closer to the original. The second may be a saved copy of the first, or it may come from the original but saved at a lower quality setting.
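The same difference-and-amplify check can be done programmatically. This is a rough sketch, assuming both images are already loaded as equally sized 8-bit NumPy arrays; the function names, the gain, and the tolerance are illustrative and not taken from any of the tools mentioned.

```python
import numpy as np

def difference_report(img_a: np.ndarray, img_b: np.ndarray, gain: int = 16):
    """Absolute per-pixel difference, plus a brightened copy for viewing."""
    # Widen to int32 so the subtraction and the gain cannot overflow.
    diff = np.abs(img_a.astype(np.int32) - img_b.astype(np.int32))
    amplified = np.clip(diff * gain, 0, 255).astype(np.uint8)
    return {
        "max": int(diff.max()),
        "changed_fraction": float((diff > 0).mean()),
        "amplified": amplified,
    }

def images_match(img_a: np.ndarray, img_b: np.ndarray, tolerance: int = 2) -> bool:
    """Treat images as equal if no pixel differs by more than tolerance."""
    return difference_report(img_a, img_b)["max"] <= tolerance
```

A small non-zero tolerance is what lets faint compression artifacts pass as "no change"; what value is safe depends on how the frames were encoded.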
It seems that the metadata has been stripped off both images (haven't done a definitive look), so no clues there.
There was a program called Unique Filer that I used for years. It is tunable and rather good. But any comparator is likely to generate a number of false positives if you tune it well enough to make sure it doesn't miss duplicates. If you only want to catch images that are very similar like this pair, then you can tune it very tightly. It is old and may not work on Windows 7 or later.
I would like to find good image checkers / comparators too. I've considered writing my own program.
I have a short video of 10 mins. This video is actually an online lecture. When you watch it, you will only see a slide show (some slides are annotated).
I have the original slides (pdf or image or ppt or whatever). Is it possible to match each slide with a specific time in video when it appears?
My idea is to take every slide image, compare it with every frame of the video, and try to find where the slide appears.
What do you think of my idea? Is it possible and doable with some algorithm? Can I just subtract the image from the video frame (calculate the difference) and see which difference is closest to zero? Thanks
If the images are perfectly aligned, then you can use any of simple differencing, sum of squared differences or normalised cross-correlation. However, if they are not aligned, you will need to register the two images first, followed by any of the three mentioned matching methods. Do a google search for image registration. Affine registration might be sufficient for your problem.
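For the aligned case, sum of squared differences and normalised cross-correlation might look like the sketch below. It assumes the slides and the video frame are grayscale NumPy arrays already resized to the same shape; registration is not covered, and the function names are made up for illustration.

```python
import numpy as np

def ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences; 0 means identical images."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalised cross-correlation in [-1, 1]; 1 means a perfect match."""
    a0 = a.astype(np.float64) - a.mean()
    b0 = b.astype(np.float64) - b.mean()
    denom = np.sqrt(np.sum(a0 * a0) * np.sum(b0 * b0))
    if denom == 0.0:
        # At least one image is completely flat; correlation is undefined.
        return 0.0
    return float(np.sum(a0 * b0) / denom)

def best_slide(frame: np.ndarray, slides) -> int:
    """Index of the slide with the highest NCC against the frame."""
    return max(range(len(slides)), key=lambda i: ncc(frame, slides[i]))
```

NCC ignores uniform brightness and contrast changes, which makes it more forgiving than plain differencing when the video encoding has shifted the tones of the slides.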
I have a device which streams h264 video in the following format: the top half of the picture holds the even lines of the video, and the bottom half holds the odd lines. So the question is: how can I play this video so that it looks normal, using standard players, ffplay for example?
I know about the "tinterlace:merge" filter in ffmpeg, but it combines two consecutive pictures. My task is to build a correct picture from a single frame.
Regards,
Alexey.
I recently had to deal with the exact same problem.
There are many different methods, and the optimum solution completely depends on your situation.
The simplest and fastest method is weaving the two fields together, which is perfect for static parts of the picture but creates a comb effect on moving objects.
More complicated methods use motion detection.
What I did was merge the two fields and then apply Edge-Line Averaging (ELA) to the moving segments to reduce the comb effect.
Check this link for a detailed explanation of the problem
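For the stacked layout described in the question, plain weaving of a single decoded frame could be sketched as follows. It assumes the frame is a NumPy array with an even height and that "even lines" means rows 0, 2, 4, ... of the output; the function name is made up, and ELA or motion handling is not included.

```python
import numpy as np

def weave_stacked_fields(frame: np.ndarray) -> np.ndarray:
    """Interleave a frame whose top half holds even lines and bottom half odd lines."""
    height = frame.shape[0]
    if height % 2 != 0:
        raise ValueError("frame height must be even to split into two fields")
    half = height // 2
    woven = np.empty_like(frame)
    woven[0::2] = frame[:half]   # top half becomes rows 0, 2, 4, ...
    woven[1::2] = frame[half:]   # bottom half becomes rows 1, 3, 5, ...
    return woven
```

If the picture comes out with visibly jagged edges even on static content, the field order is probably the other way round and the two assignments should be swapped.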
It would be good if you could provide a sample video file. You describe very well what the picture looks like, but the file may contain other information that is helpful for playback.
Furthermore, the format you describe doesn't sound like a standard format, so it's unlikely you will get a regular player to play it the way you want, out-of-the-box. If you're using ffplay, it's likely that you will have to write your own plugin to re-order the scanlines prior to displaying them.
Alternatively, you could re-encode the video into a standard format (interlaced or deinterlaced) using ffmpeg. You could then play it back in any regular player, like ffplay or VLC.
Finally, I recommend asking your question on the ffmpeg mailing list.