Do different video formats make a difference to Gist descriptors and classifiers? - image-processing

I’m working on a dataset made of AVI videos, and I want to apply Gist to its frames and use the Gist features of each frame to train a classifier to recognize actions. If I convert these videos to MP4 format and then compute Gist, what will the result be?

MP4 is just a container; it says nearly nothing about how the actual data is compressed. In short: if you use lossy compression the Gist descriptors will change, and if you use lossless compression they will stay the same. Since the most common default video codecs are lossy, your Gist features will most probably change.
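A quick way to see this for yourself is to compare the same frame before and after a lossy re-encode. Here is a minimal Python sketch assuming OpenCV and NumPy; the file names are placeholders for your original clip and its re-encoded copy:

```python
# Sketch: measure how much a lossy AVI -> MP4 re-encode perturbs the pixels
# that any Gist descriptor would be computed from.
import cv2
import numpy as np

cap_avi = cv2.VideoCapture("action_clip.avi")   # original video (hypothetical path)
cap_mp4 = cv2.VideoCapture("action_clip.mp4")   # lossy re-encode of the same clip

ok1, frame_avi = cap_avi.read()
ok2, frame_mp4 = cap_mp4.read()
if ok1 and ok2:
    diff = np.abs(frame_avi.astype(np.int16) - frame_mp4.astype(np.int16))
    print("mean abs pixel difference:", diff.mean())
    # Any nonzero difference here propagates into the Gist features,
    # since Gist is computed directly from the frame pixels.
cap_avi.release()
cap_mp4.release()
```

If the mean difference is nonzero, the descriptors computed from the two versions will differ as well; whether that matters for your classifier depends on how robust it is to small perturbations.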

Related

Alternatives for Error Level Analysis (ELA)

I am working on image processing with deep learning and I came across the topic of Error Level Analysis, which shows differences in compression level (I'm trying to show whether an image has gone through multiple compressions or not) in JPEG (lossy compression).
Are there any other techniques similar to ELA for JPEG, and techniques (similar or different) that can be used on PNG as well to show multiple compressions?
There cannot be, IMHO.
Since PNG compression is lossless, every decompression must reproduce the identical original image. Therefore, every recompression starts from the same place, so no history can be retained.
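You can verify this argument empirically. The sketch below, assuming Pillow and NumPy with a hypothetical input file, round-trips an image through both formats: the PNG round-trip is bit-identical, while the JPEG round-trip generally is not, which is exactly the generational signal ELA relies on:

```python
# Sketch: PNG round-trips are bit-identical, JPEG round-trips are not.
import io
import numpy as np
from PIL import Image

img = Image.open("photo.jpg").convert("RGB")   # hypothetical input

def roundtrip(image, fmt):
    """Save and reload the image through an in-memory buffer."""
    buf = io.BytesIO()
    image.save(buf, format=fmt)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

png_once = roundtrip(img, "PNG")
print("PNG identical:", np.array_equal(np.array(img), np.array(png_once)))   # True

jpg_once = roundtrip(img, "JPEG")
print("JPEG identical:", np.array_equal(np.array(img), np.array(jpg_once)))  # usually False
```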

Using Huffman coding to compress images taken by the iPhone camera

I'm thinking of using Huffman coding to make an app that takes pictures right from the iPhone camera and compresses them. Would the hardware be able to handle the complex computation and the building of the tree? In other words, is it doable?
Thank you
If you mean the image files (like JPG, PNG, etc.), then you should know that they are already compressed with algorithms specific to images. The resulting files would not Huffman-compress much, if at all.
If you mean that you are going to take the UIImage raw pixel data and compress it, you could do that. I am sure the iPhone could handle it.
If this is for a fun project, then go for it. If you want this to be a useful, widely used app, you will have some challenges:
- It is very unlikely that Huffman will beat the standard image compression used in JPG, PNG, etc.
- Apple has already seen the need for better compression and implemented HEIF in iOS 11 (see the WWDC video about HEIF).
- They did a lot of work in the OS and the Photos app to use HEIF locally, but if you share a photo it is converted into something anyone can use (e.g. JPG).
- All of the compression they implement uses hardware acceleration. You could do this too, but the code is a lot harder than Huffman.
So, for learning and fun, it's a good project (it might be easier to do as a Mac app instead), but for something meant to be real, it would be extremely hard to overcome the above issues.
There are two parts, encoding and decoding. The encoding process involves constructing a tree, or a table-based representation of a tree. The decoding process covers reading the Huffman-encoded bytes and undoing a delta. It would likely be difficult to get much of a speed advantage in encoding compared to PNG, but for decoding a very effective speedup can be achieved by moving the decoding logic to the GPU with Metal. You can have a look at the full source code of an example that does just that for grayscale images on GitHub: Metal Huffman.
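To make the encoding half concrete, here is a minimal illustrative sketch of Huffman tree and code-table construction in Python; the toy pixel data is made up, and a real iOS app would of course run equivalent logic over raw pixel buffers rather than this example:

```python
# Minimal Huffman-coding sketch over raw pixel bytes: count symbol
# frequencies, merge the two rarest subtrees until one tree remains,
# then walk the tree to assign bit codes.
import heapq
from collections import Counter

def build_codes(data: bytes) -> dict:
    """Build a Huffman code table (symbol -> bit string) for the input bytes."""
    freq = Counter(data)
    # Heap entries: (frequency, tie_breaker, tree), where a tree is either
    # a symbol (leaf) or a (left, right) tuple.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (t1, t2)))
        tie += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # degenerate single-symbol case
    walk(heap[0][2])
    return codes

pixels = bytes([10, 10, 10, 12, 12, 200])   # toy "image" data
codes = build_codes(pixels)
encoded = "".join(codes[b] for b in pixels)
print(codes, "->", encoded)
```

Nothing in this is computationally heavy for a phone; the challenge, as noted above, is that the result will rarely beat purpose-built image codecs.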

Does the type of image matter when training an object detector?

I was wondering whether the type of photo used to train an object detector makes a difference; I can't seem to find anything about this online. I am using OpenCV and dlib, if that makes a difference, but I am interested in a more general answer if possible.
Am I correct in assuming that lossless file formats would be better than lossy formats? And when training for objects, would JPG be better than PNG, since PNGs are optimized for text and graphics?
As long as the compression doesn't introduce noticeable artifacts, it generally won't matter. Also, many real-world computer vision systems need to deal with video or images acquired from less-than-ideal sources, so you usually shouldn't assume you will get super-high-quality images anyway.
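One practical consequence of this: if your deployed detector will see heavily compressed inputs, a common trick is to augment the training data with random JPEG re-compression so it learns to tolerate the artifacts. A small sketch, assuming OpenCV and a hypothetical training image:

```python
# Sketch: simulate lossy real-world sources by randomly re-encoding
# training frames as JPEG at varying quality levels.
import random
import cv2

def jpeg_augment(frame):
    """Re-encode a frame at a random JPEG quality to simulate lossy sources."""
    quality = random.randint(30, 90)
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR) if ok else frame

img = cv2.imread("training_sample.png")   # hypothetical training image
degraded = jpeg_augment(img)
```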

Which format is best for uploading video & audio?

I am working on "upload video & audio to server",I want to know which format is best for upload (consider the quality & file-size)
Video formats are just containers; if you want to consider quality and file size, you should look into the video's encoding. For iOS-based devices, the H.264 encoder with High profile, level 4 provides good compression, so you will get good quality at a smaller file size.
If you want to learn about converting video data from one format to another, look into FFmpeg.
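For reference, a transcode like the one described could be driven from Python as a sketch like this; the file names are placeholders, and the flags (-c:v libx264, -profile:v, -level, -crf, -c:a aac) are standard FFmpeg options:

```python
# Sketch: re-encode an upload to H.264 High profile before sending it
# to the server, trading quality against file size via CRF.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "capture.mov",      # source recorded on the device (placeholder)
    "-c:v", "libx264",                  # H.264 video encoder
    "-profile:v", "high", "-level", "4.0",
    "-crf", "23",                       # quality/size trade-off (lower = better quality)
    "-c:a", "aac", "-b:a", "128k",      # compressed audio track
    "upload.mp4",
], check=True)
```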

Can I use ffmpeg to create multi-bitrate (MBR) MPEG-4 videos?

I am currently working on a webcam streaming server project that requires dynamically adjusting the stream's bitrate according to the client's settings (screen size, processing power, ...) or the network bandwidth. The encoder is ffmpeg, since it's free and open source, and the codec is MPEG-4 Part 2. We use live555 for the server part.
How can I encode MBR MPEG-4 videos using ffmpeg to achieve this?
The multi-bitrate video you are describing is called "Scalable Video Coding" (SVC). See this wiki link for a basic understanding.
Basically, in a scalable video codec, a base-layer stream is completely decodable by itself, while additional information is carried in the form of one or more enhancement streams. There are a couple of techniques for achieving this, including scaling the resolution, the frame rate, or the quantization. The following papers explain the details
of scalable video coding for MPEG-4 and H.264, respectively. Here is another good paper that explains what you intend to do.
Unfortunately, this is broadly a research topic, and to date no open-source encoder (ffmpeg, Xvid) supports such multi-layer encoding. I suspect even commercial encoders don't support it; it is significantly complex. You could check whether the reference encoder for H.264 supports it.
The alternative (but CPU-expensive) way is to transcode in real time while transmitting the packets. In that case, you should start with reasonably good source quality. If you are using FFmpeg as an API, this should not be a problem. Handling multiple resolutions can still be messy, but you can keep changing the target encoding rate.
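A simpler workaround than either SVC or live transcoding is to pre-encode several renditions and let the server pick one per client. A sketch of that ladder, with illustrative file names and bitrate/resolution values, using the MPEG-4 Part 2 codec the question mentions:

```python
# Sketch: run FFmpeg once per target bitrate/resolution to produce a
# small bitrate ladder the streaming server can choose from.
import subprocess

ladder = [
    ("1200k", "1280x720"),
    ("600k",  "854x480"),
    ("300k",  "640x360"),
]
for bitrate, size in ladder:
    subprocess.run([
        "ffmpeg", "-i", "webcam.avi",           # source clip (placeholder)
        "-c:v", "mpeg4",                        # MPEG-4 Part 2, as in the question
        "-b:v", bitrate, "-s", size,            # target bitrate and frame size
        f"stream_{bitrate}.mp4",
    ], check=True)
```

This is not true multi-bitrate encoding in the SVC sense (each rendition is an independent stream), but it is what most practical streaming setups do.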
