Supporting dependency of P frame on I frame only - encoder

I have a question about x264 encoding.
Suppose the x264 encoded output has a GOP like I P_1 P_2 P_3 I P P P. In general, P_3 will then depend on I, P_1 and P_2, and P_2 will depend on I and P_1 (cumulative prediction).
Can anyone give me ideas on how to encode directly, so that P_1, P_2 and P_3 each depend only on the I frame?

You can modify the x264 source code to add reference frame invalidation to all non-IDR frames. It’s already a supported, existing function in x264; it just needs to be called on the P/B frames.
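To make the two dependency structures from the question concrete, here is a toy model in C++ (hypothetical names, not x264 code or its API): it lists which frames a decoder must already have before it can decode frame k of a GOP. B frames and multiple GOPs are ignored.

```cpp
#include <cassert>
#include <vector>

// Toy model, not x264 code: frame 0 is the I frame, frames 1..k are P frames.
// Returns the frames a decoder must have before it can decode frame k.
//   chained = true : P frames also predict from earlier P frames, so
//                    frame k transitively needs every frame 0..k.
//   chained = false: every P frame predicts from the I frame only, so
//                    frame k needs just frames 0 and k.
std::vector<int> frames_needed(int k, bool chained) {
    assert(k >= 0);
    std::vector<int> needed;
    if (chained) {
        for (int i = 0; i <= k; ++i) needed.push_back(i);
    } else {
        needed.push_back(0);
        if (k > 0) needed.push_back(k);
    }
    return needed;
}
```

The trade-off, in general terms: with I-only references a lost or corrupted P frame damages only itself, but each P frame is predicted from an increasingly distant picture, which usually costs more bits.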

Related

FDK AAC encoder/decoder : Access Huffman encoded and decoded data

For FDK AAC, I want to access the spectral data before and after Huffman encoding/decoding, in both the encoder and the decoder.
For accessing the spectral data before Huffman encoding, I am using the pSpectralCoefficient pointer and dumping 1024 samples (on the decoder side), and using qcOutChannel[ch]->quantSpec and dumping 1024 samples (on the encoder side). Is this correct?
Secondly, how do I access the Huffman-encoded signal in the encoder and decoder? If someone can tell me the location in the code, the name of the pointer to use and the length of this data, I will be extremely thankful.
Thirdly, I wanted to know what the frame size is in the frequency domain (before Huffman encoding). I am dumping 1024 samples of *pSpectralCoefficient. Is that correct?
Is it possible that some frames are 1024 in length and others are a set of 8 frames with 128 frequency bins each? If so, is there a flag that gives me this information?
Thank you for your time. I would appreciate help with this as soon as possible.
Regards,
Akshay
To pull that specific data out of the bitstream, you will need to step through the decoder and find the desired pieces of the stream. To do that, you need the AAC bitstream specification. The current AAC specification is:
ISO/IEC 14496-3:2009 "Information technology -- Coding of audio-visual objects -- Part 3: Audio"
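On the third question: in the specification, each channel's ics_info() carries a window_sequence field, and EIGHT_SHORT_SEQUENCE is the case where a frame's coefficients are eight short windows of 128 bins each. The C++ sketch below only encodes that mapping. The enum values follow the spec's window_sequence table, but the type and function names are made up, and where FDK keeps this field internally is not shown here and would have to be looked up in its sources.

```cpp
#include <cassert>

// window_sequence values from the ics_info() syntax of ISO/IEC 14496-3.
// The names mirror the spec; the enum type itself is hypothetical.
enum WindowSequence {
    ONLY_LONG_SEQUENCE   = 0,
    LONG_START_SEQUENCE  = 1,
    EIGHT_SHORT_SEQUENCE = 2,
    LONG_STOP_SEQUENCE   = 3
};

struct SpectralLayout {
    int num_windows;      // transform windows in this frame
    int bins_per_window;  // spectral coefficients per window
};

// frame_length is 1024 for the usual AAC configurations (960 also exists).
SpectralLayout spectral_layout(WindowSequence ws, int frame_length = 1024) {
    assert(frame_length % 8 == 0);
    if (ws == EIGHT_SHORT_SEQUENCE)
        return {8, frame_length / 8};  // e.g. 8 windows x 128 bins
    return {1, frame_length};          // one long window, e.g. 1 x 1024
}
```

In both cases a frame holds 1024 coefficients, so dumping 1024 values gives the right count. In the short case, though, they belong to eight separate windows and may be grouped and ordered as the spec defines, so they should not be read as one 1024-bin spectrum.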

Input of a fixed point DSP

I'm new to working with DSPs and fixed point, and I really need to know:
1. Is it the fixed-point DSP that converts the float number to Q format, or does a device do that before feeding the DSP?
2. Who specifies the Q format to be used? Does each DSP come with a specified Q format, or does the programmer choose it in the code?
3. Can I get an idea of how to perform a simple fixed-point matrix multiplication, say 4 by 4, in C++?
Thanks in advance.
The format is usually fixed for a given DSP, e.g. Motorola DSP 56k family uses a 24 bit signed fractional format (Q23).
Fixed point is really just the same as an ordinary integer but there's an implicit scale factor. For most operations this makes no difference, e.g. load/store/add/subtract all work the same way regardless of whether the data is integer or fixed point.
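To make the implicit scale factor concrete, here is a minimal C++ sketch assuming Q15 (16-bit words with 15 fractional bits, a common choice on 16-bit DSPs; the DSP56k mentioned above would use Q23 in 24-bit words instead). The float-to-Q15 conversion is done in software here, and addition is just integer addition.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Q15: a plain int16_t whose implicit scale factor is 2^-15, so the
// integer q stands for q / 32768, covering [-1, 1 - 2^-15].
using q15 = int16_t;

// float -> Q15 with rounding and saturation, done in software here.
q15 to_q15(double x) {
    assert(!std::isnan(x));
    x = std::fmax(-1.0, std::fmin(x, 1.0));  // keep lround's result in range
    long v = std::lround(x * 32768.0);
    if (v > 32767) v = 32767;                // +1.0 itself is not representable
    return static_cast<q15>(v);
}

double from_q15(q15 q) { return q / 32768.0; }

// Addition needs no rescaling: both operands share the same implicit
// scale, so this is an ordinary integer add, saturated to the Q15 range.
q15 add_q15(q15 a, q15 b) {
    int32_t s = int32_t(a) + int32_t(b);
    if (s > 32767) s = 32767;
    if (s < -32768) s = -32768;
    return static_cast<q15>(s);
}
```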
When it comes to multiplication or division however the implicit scaling factor needs to be taken into account - typically there will be a shift after the operation to correct for this. DSP instructions take care of this automatically, whereas normal CPUs have to do this explicitly.
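On a normal CPU the correction looks like this, again as a sketch assuming Q15: the product of two Q15 values is Q30, so it is rounded and shifted right by 15 to get back to Q15. A DSP's fractional multiply instruction does the equivalent for you.

```cpp
#include <cassert>
#include <cstdint>

// Q15 x Q15 on a CPU without fractional multiply support.
int16_t mul_q15(int16_t a, int16_t b) {
    int32_t p = int32_t(a) * int32_t(b);  // raw product is Q30
    p += 1 << 14;                          // add half an LSB to round to nearest
    p >>= 15;                              // Q30 -> Q15 (arithmetic shift; guaranteed
                                           // since C++20, universal in practice before)
    // Only (-1) * (-1) overflows, because +1.0 is not representable in Q15.
    if (p > 32767) p = 32767;
    return static_cast<int16_t>(p);
}
```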
When you're doing e.g. a 4x4 matrix multiply you just use the DSP's native fixed point arithmetic instructions and the scaling is all taken care of automatically.
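For the third question, if you are writing plain C++ rather than using DSP instructions, a 4x4 multiply might look like the sketch below (Q15 assumed). The products are summed in a wide accumulator, similar to a DSP's accumulator register, and the rounding shift and saturation happen once per output element.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mat4q15 = std::array<std::array<int16_t, 4>, 4>;

// C = A * B, every entry in Q15.
Mat4q15 matmul_q15(const Mat4q15& A, const Mat4q15& B) {
    Mat4q15 C{};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            int64_t acc = 0;  // sum of Q30 products, no intermediate overflow
            for (int k = 0; k < 4; ++k)
                acc += int64_t(A[i][k]) * int64_t(B[k][j]);
            acc = (acc + (int64_t(1) << 14)) >> 15;  // round, Q30 -> Q15
            if (acc > INT16_MAX) acc = INT16_MAX;    // saturate
            if (acc < INT16_MIN) acc = INT16_MIN;
            C[i][j] = static_cast<int16_t>(acc);
        }
    }
    return C;
}
```

Shifting once after accumulating, rather than after every product, keeps more precision; the cost is a wider accumulator.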

What is an ideal domain for FFT?

I am using the JTransforms library, which seems to be wicked fast for my purpose.
At this point I think I have a pretty good handle on how the FFT works, so now I am wondering whether there is any standard domain used for audio visualizations like spectrograms.
Thanks to Android's native FFT in 2.3, I had been using bytes as the range, although I am still unclear whether they are signed or not. (I know Java doesn't have unsigned bytes, but Google implemented these functions natively and the waveform is 8-bit unsigned PCM.)
However, I am adapting my app to work with mic audio and 2.1 phones. At this point, having the input domain in the range of bytes, whether [-128, 127] or [0, 255], no longer seems quite optimal.
I would like the range of my FFT function to be [0,1] so that I can scale it easily.
So should I use a domain of [-1, 1] or [0, 1]?
Essentially, the input domain does not matter. At most, it causes an offset and a change in scaling on your original data, which will be turned into an offset on bin #0 and an overall change in scaling on your frequency-domain results, respectively.
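A quick way to check this claim: a naive DFT in C++ (for illustration only; in the app JTransforms does the real work) plus a mapping of unsigned 8-bit PCM to [-1, 1). The mapping assumes silence sits at 128, as in 8-bit unsigned PCM. Subtracting 128 changes only bin 0, and dividing by 128 scales every bin by the same factor.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N).
std::vector<cd> dft(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> X(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            X[k] += x[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
    return X;
}

// Unsigned 8-bit PCM (0..255, silence at 128) -> [-1, 1).
std::vector<double> u8_to_unit(const std::vector<unsigned char>& pcm) {
    std::vector<double> out;
    out.reserve(pcm.size());
    for (unsigned char s : pcm) out.push_back((int(s) - 128) / 128.0);
    return out;
}
```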
As for limiting your FFT output to [0,1], that's essentially impossible. In general the FFT output is complex, and there's no way to manipulate your input data so that the output is restricted to positive real numbers.
If you use a DCT instead of an FFT, your output will be real. (Read about the difference and decide whether the DCT is suitable for your application.)
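For comparison, a naive, unnormalized DCT-II (the variant usually meant by "the DCT") in C++. Real input gives real output, with no imaginary part to handle. Note that the values can still be negative, so a [0,1] range still needs a separate scaling step. Libraries differ in normalization, so the scaling used here is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unnormalized DCT-II: X[k] = sum_t x[t] * cos(pi / N * (t + 0.5) * k).
// Naive O(N^2), for illustration only.
std::vector<double> dct2(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> X(n, 0.0);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            X[k] += x[t] * std::cos(pi / double(n) * (double(t) + 0.5) * double(k));
    return X;
}
```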
FFT implementations for real numbers (as the input domain) use only half the bins for the output range, because the spectrum of a real input is conjugate-symmetric (X[n-k] = conj(X[k])), so the other half is redundant. Therefore the fact that you have both real and imaginary parts for each bin doesn't affect the size of the result (vs. the size of the source) much (output size is ceil(n/2)*2).
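The symmetry behind the half-size output, as a naive C++ sketch: bins 0..n/2 of a real signal's DFT determine all the others. This only illustrates the math; it does not reproduce the exact packed layout a library such as JTransforms uses for its real-input transforms.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) DFT, used here only to check the symmetry.
std::vector<cd> dft(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> X(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            X[k] += x[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
    return X;
}

// For real input, X[n-k] == conj(X[k]), so the n/2 + 1 bins 0..n/2 are
// enough to rebuild the full n-bin spectrum (works for odd n too).
std::vector<cd> expand_half_spectrum(const std::vector<cd>& half, std::size_t n) {
    assert(half.size() == n / 2 + 1);
    std::vector<cd> full(n);
    for (std::size_t k = 0; k < n; ++k)
        full[k] = (k <= n / 2) ? half[k] : std::conj(half[n - k]);
    return full;
}
```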

How to normalize OpenCV feature descriptors to an integer scale?

OpenCV's SURF implementation returns a sequence of 64 or 128 32-bit float values (the descriptor) for each feature point found in the image. Is there a way to normalize these float values and take them to an integer scale (for example, [0, 255])? That would save important space (1 or 2 bytes per value instead of 4). Besides, the conversion should ensure that the descriptors remain meaningful for other uses, such as clustering.
Thanks!
There are other feature extractors than SURF. The BRIEF extractor uses only 32 bytes per descriptor: 32 unsigned bytes [0-255] as its elements. You can create one like this: Ptr<DescriptorExtractor> ptrExtractor = DescriptorExtractor::create("BRIEF");
Be aware that a lot of image processing routines in OpenCV need or assume that the data is stored as floating-point numbers.
You can treat the float features as an ordinary image (Mat or CvMat) and then use cv::normalize(). Another option is to use cv::minMaxLoc() to find the range of the descriptor values and then Mat::convertTo() to convert to CV_8U. Look up the OpenCV documentation for these functions.
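The min/max approach written out in plain C++ (no OpenCV, so it is easy to test), roughly what cv::normalize with NORM_MINMAX and an 8-bit output type does for a single Mat. The function names are hypothetical. One lo/hi pair is computed over the whole descriptor set and kept, so the mapping can be approximately inverted later.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Min-max scaling of all descriptor values into [0, 255].
// lo and hi are returned so the mapping can be inverted.
std::vector<uint8_t> to_uint8_minmax(const std::vector<float>& v, float& lo, float& hi) {
    assert(!v.empty());
    auto mm = std::minmax_element(v.begin(), v.end());
    lo = *mm.first;
    hi = *mm.second;
    const float range = (hi > lo) ? (hi - lo) : 1.0f;  // avoid dividing by zero
    std::vector<uint8_t> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = static_cast<uint8_t>(std::lround((v[i] - lo) / range * 255.0f));
    return out;
}

// Approximate inverse, given the lo/hi saved at encode time.
float from_uint8(uint8_t q, float lo, float hi) { return lo + q / 255.0f * (hi - lo); }
```

Using one lo/hi for the whole set, rather than one per descriptor, keeps the mapping the same linear function everywhere, so Euclidean distances are only scaled (plus rounding error). That is what keeps clustering meaningful.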
The descriptor returned by cv::SurfFeatureDetector is already normalized (to unit L2 norm). You can verify this by taking the L2 norm of the returned cv::Mat, or refer to the SURF paper.
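If the descriptors are unit-length, every component lies in [-1, 1], so a fixed mapping to 8 bits works without storing any min/max. The C++ sketch below (hypothetical names) assumes exactly that and maps [-1, 1] linearly onto [0, 255].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumes the descriptor has unit L2 norm, so each component is in [-1, 1].
// The same fixed scale is used for every descriptor, so distances between
// quantized descriptors stay proportional to the originals (up to rounding).
std::vector<uint8_t> quantize_unit_descriptor(const std::vector<float>& d) {
    std::vector<uint8_t> q(d.size());
    for (std::size_t i = 0; i < d.size(); ++i) {
        float c = std::max(-1.0f, std::min(1.0f, d[i]));  // guard against drift
        q[i] = static_cast<uint8_t>(std::lround((c + 1.0f) * 127.5f));
    }
    return q;
}

float dequantize(uint8_t v) { return v / 127.5f - 1.0f; }
```

Since components of a 64- or 128-dimensional unit vector rarely get close to ±1, a tighter fixed bound would give finer steps at the cost of clipping outliers; the bound used here is just the safe one.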

On-the-fly lossless image compression

I have an embedded application where an image scanner sends out a stream of 16-bit pixels that are later assembled to a grayscale image. As I need to both save this data locally and forward it to a network interface, I'd like to compress the data stream to reduce the required storage space and network bandwidth.
Is there a simple algorithm that I can use to losslessly compress the pixel data?
I first thought of computing the difference between two consecutive pixels and then encoding this difference with a Huffman code. Unfortunately, the pixels are unsigned 16-bit quantities, so the difference can be anywhere in the range -65535 .. +65535, which leads to potentially huge codeword lengths. If a few really long codewords occur in a row, I'll run into buffer overflow problems.
Update: my platform is an FPGA
PNG provides free, open-source, lossless image compression in a standard format using standard tools. PNG uses zlib as part of its compression, and there is also libpng. Unless your platform is very unusual, it should not be hard to port this code to it.
How many resources do you have available on your embedded platform?
Could you port zlib and do gzip compression? Even with limited resources, you should be able to port something like LZ77 or LZ78.
There are a wide variety of image compression libraries available. For example, this page lists nothing but libraries/toolkits for PNG images. Which format/library works best for you will most likely depend on the particular resource constraints you are working under (in particular, whether or not your embedded system can do floating-point arithmetic).
The goal with lossless compression is to predict the next pixel from the previous pixels and then encode the difference between your prediction and the pixel's real value. This is what your initial idea did, but it used only the one previous pixel and predicted that the next pixel would be the same.
Keep in mind that if you have all of the previous pixels, you have more relevant information than just the preceding pixel. That is, if you are trying to predict the value of X, you should use the O pixels:
..OOO...
..OX
Also, you would not want to use the previous pixel, B, in the stream to predict X in the following situation:
OO...B <-- End of row
X <- Start of next row
Instead, you would base your prediction on the Os.
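One concrete predictor of this kind is the median edge detector (MED) from LOCO-I/JPEG-LS. It is swapped in here as an example and uses only the left, upper and upper-left neighbors rather than all of the O pixels. The sketch also takes residuals modulo 2^16, which addresses the range concern in the question: every residual fits in 16 bits (-32768..32767), yet decoding stays exact because the decoder repeats the same prediction. An entropy coder for the residuals is not shown, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// MED predictor: a = left (W), b = above (N), c = above-left (NW).
uint16_t med_predict(int a, int b, int c) {
    if (c >= std::max(a, b)) return uint16_t(std::min(a, b));
    if (c <= std::min(a, b)) return uint16_t(std::max(a, b));
    return uint16_t(a + b - c);  // lies between a and b here
}

// Predict pixel (x, y) from already-coded pixels. The first row uses the
// left pixel, and the first column uses the pixel above (never the end of
// the previous row); the very first pixel is predicted as 0.
uint16_t predict(const std::vector<uint16_t>& img, int w, int x, int y) {
    if (y == 0) return x == 0 ? 0 : img[x - 1];
    if (x == 0) return img[(y - 1) * w];
    return med_predict(img[y * w + x - 1], img[(y - 1) * w + x], img[(y - 1) * w + x - 1]);
}

// Residuals modulo 2^16, stored as int16_t (two's-complement wrap,
// guaranteed since C++20 and universal in practice before).
std::vector<int16_t> encode(const std::vector<uint16_t>& img, int w, int h) {
    assert(int(img.size()) == w * h);
    std::vector<int16_t> res(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int i = y * w + x;
            res[i] = int16_t(uint16_t(img[i] - predict(img, w, x, y)));
        }
    return res;
}

// Rebuild the image pixel by pixel; neighbors are decoded before they are used.
std::vector<uint16_t> decode(const std::vector<int16_t>& res, int w, int h) {
    std::vector<uint16_t> img(res.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int i = y * w + x;
            img[i] = uint16_t(predict(img, w, x, y) + uint16_t(res[i]));
        }
    return img;
}
```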
How 'lossless' do you need it to be?
If this is a real scanner, there is a limit to its bandwidth/resolution, so even if it can send +/-64K values, it may be unphysical for adjacent pixels to differ by more than, say, 8 bits.
In that case, you can store a start pixel value for each row and then the differences between successive pixels.
This will smear out peaks, but it may be that any peaks of more than N bits are noise anyway.
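The row scheme described above, as a C++ sketch assuming 8-bit differences (hypothetical names). Differences outside [-128, 127] are clamped, which is where the smearing comes from; if they never occur, the round trip is exact. The encoder tracks the value the decoder will reconstruct, so clamping errors do not accumulate along the row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct EncodedRow {
    uint16_t start;              // first pixel of the row, stored in full
    std::vector<int8_t> deltas;  // one 8-bit difference per remaining pixel
};

EncodedRow encode_row(const std::vector<uint16_t>& row) {
    assert(!row.empty());
    EncodedRow out{row[0], {}};
    int recon = row[0];  // value the decoder will have reconstructed so far
    for (std::size_t i = 1; i < row.size(); ++i) {
        int d = std::max(-128, std::min(127, int(row[i]) - recon));  // clamp
        out.deltas.push_back(static_cast<int8_t>(d));
        recon += d;
    }
    return out;
}

std::vector<uint16_t> decode_row(const EncodedRow& e) {
    std::vector<uint16_t> row{e.start};
    int v = e.start;
    for (int8_t d : e.deltas) {
        v += d;
        row.push_back(static_cast<uint16_t>(v));
    }
    return row;
}
```

A steep edge is spread over several pixels instead of being lost entirely: a jump from 0 to 1000 decodes as 0, 127, 254, and so on until the row catches up.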
A good LZ77/RLE hybrid with bells and whistles can get wonderful compression and is fairly quick to decompress. Such compressors will also be bigger, badder compressors on smaller files due to the lack of library overhead. For a good, but GPL'd, implementation of this, check out PUCrunch.
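The RLE half of such a hybrid is simple enough to sketch in C++. This is a plain run-length coder with hypothetical names, not PUCrunch's algorithm: runs of identical 16-bit values, such as flat background or runs of zero residuals after prediction, become (value, count) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Run = std::pair<uint16_t, uint16_t>;  // (value, count)

// Counts are capped at 65535 so each run fits in two 16-bit words.
std::vector<Run> rle_encode(const std::vector<uint16_t>& in) {
    std::vector<Run> runs;
    for (uint16_t v : in) {
        if (!runs.empty() && runs.back().first == v && runs.back().second < 65535)
            ++runs.back().second;
        else
            runs.push_back(Run(v, 1));
    }
    return runs;
}

std::vector<uint16_t> rle_decode(const std::vector<Run>& runs) {
    std::vector<uint16_t> out;
    for (const Run& r : runs) out.insert(out.end(), std::size_t(r.second), r.first);
    return out;
}
```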
