How are quarter-precision motion vectors encoded - video-encoding

I need to understand exactly how motion vectors are encoded at non-integer precision (whether quarter-pel, 1/16-pel or anything else).
In the code, the motion vector components are always integers, but I don't understand how non-integer precision is handled.
For example, if the "actual values" of my motion vector are (3.5, 2.75), how do I get the "int" values that appear in the code? Or, if the x and y components in the code are (114, 82) at quarter-pel precision, what are the actual values?
Thank you for helping

They are basically scaled to integers and then coded. For instance, at quarter-pel precision MV=2.75 is scaled to scaledMV=2.75x4=11. Note that integer MVs must be scaled as well, so that the decoder can interpret every value the same way. For instance, MV=1.0 becomes scaledMV=4x1.0=4.
FYI, the MV coding of HEVC is way too complicated to be explained here. So, I would suggest that you take a look at this paper.
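To make the scaling concrete for the two examples in the question, here is a minimal Python sketch. The function names are made up for illustration and this is not taken from any codec's source; it only shows the scale-by-precision idea, and the integer/fraction split uses floor division, which I believe matches the usual shift-and-mask convention but should be treated as an assumption.

```python
def mv_to_units(mv, precision=4):
    """Scale a motion vector in pixels to integer units of 1/precision pel."""
    return tuple(round(c * precision) for c in mv)


def units_to_mv(units, precision=4):
    """Convert integer units of 1/precision pel back to pixels."""
    return tuple(c / precision for c in units)


def split_units(component, precision=4):
    """Split one stored component into (integer pel, fractional index).

    divmod floors toward minus infinity, so the fractional index is
    always in the range 0 .. precision-1, also for negative components.
    """
    return divmod(component, precision)


print(mv_to_units((3.5, 2.75)))   # quarter-pel integers for (3.5, 2.75)
print(units_to_mv((114, 82)))     # pixel values for stored (114, 82)
print(split_units(114))           # integer pel and quarter-pel index
```

So (3.5, 2.75) is stored as (14, 11), and a stored (114, 82) at quarter-pel precision means (28.5, 20.5) pixels. For 1/16-pel precision the same code applies with precision=16.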

Related

Scilab Error: Mean, Variance not executing

I1 is an rgb image. 'Out' variable basically stores one colour channel of the whole image.
The built-in functions mean, variance and standard deviation, when called on 'out', give an error asking for a real vector or matrix as input.
This can be seen in the image given below.
But when min or max is used, no error is reported. According to the Scilab documentation, these built-in functions take the same kind of parameter, a vector or matrix of integers.
On further examination, it seems that the variable 'out' is of type matrix of graphic handles when it should be a matrix of integers.
I can't understand why the error occurs if the same variable works with the min and max functions.
How can I solve this problem?
The output of imread() is a hypermatrix of integers, not of floating point numbers.
This is shown by the fact that min(out) is displayed as "4" (without decimal point), not as "4."
Now, mean() and stdev() do not work with integers, only with real or complex numbers.
The solution is to convert integers into decimal numbers:
mean(double(out))
https://help.scilab.org/docs/6.1.1/en_US/double.html

How much storage to represent sparse matrix

I don't know how to solve this problem from Fundamentals of Data Structures in C, 2nd ed., ch. 2.5:
On a computer with w bits per word, how much storage is needed to represent a sparse matrix, A, with t nonzero terms?
I think the answer is 3*w*t bits, because in a sparse matrix we just store the row, column and value of each term,
so 3 times w*t. But someone says the answer is w^2 + t... I don't understand what they mean.
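In the most common “general purpose” sparse matrix formats (CSR and CSC), for a matrix with t nonzeros, there are two integer arrays and one array of floating-point numbers of length t. In CSR the integer arrays are the column indices, of length t, and the row pointers, of length m+1 where m is the number of rows (CSC swaps the roles of rows and columns). In practice, the size in bytes will depend on the sizes of the integer and floating-point representations. In a theoretical machine with one uniform word size for everything, the size would be 2t+m+1 words for CSR, or 3t words (3*w*t bits) for the plain row/column/value triplet format you describe.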
I fail to see how w^2+t could be correct or even related.
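As a rough check of the word counts, here is a small Python sketch that builds both the triplet (coordinate) representation and a CSR representation from a dense matrix and counts one word per stored number. The helper names are hypothetical, and the "one word per number" model is the idealized machine from the question, not a real memory layout.

```python
def to_triplets(dense):
    """Coordinate format: one (row, col, value) triple per nonzero term."""
    return [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != 0]


def to_csr(dense):
    """CSR format: values, column indices, and one row pointer per row plus one."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr


A = [[0, 0, 5, 0],
     [0, 0, 0, 0],
     [7, 0, 0, 2],
     [0, 0, 0, 0]]

triplets = to_triplets(A)
values, col_idx, row_ptr = to_csr(A)

t = len(triplets)
triplet_words = 3 * t
csr_words = len(values) + len(col_idx) + len(row_ptr)
print(t, triplet_words, csr_words)
```

With w bits per word, multiply either word count by w to get bits; the triplet count reproduces the 3*w*t estimate from the question.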

Input of a fixed point DSP

I'm new to working with DSPs and fixed point, and I really need to know:
1. Is it the fixed-point DSP that converts the float number to Q format, or does a device do that before feeding the DSP?
2. Who specifies the Q format to be used? Does each DSP come with a specified Q format, or does the programmer choose it in their code?
3. Can I get an idea of how to perform a simple, say 4 by 4, fixed-point matrix multiplication in C++?
Thanks in anticipation.
The format is usually fixed for a given DSP, e.g. Motorola DSP 56k family uses a 24 bit signed fractional format (Q23).
Fixed point is really just the same as an ordinary integer but there's an implicit scale factor. For most operations this makes no difference, e.g. load/store/add/subtract all work the same way regardless of whether the data is integer or fixed point.
When it comes to multiplication or division however the implicit scaling factor needs to be taken into account - typically there will be a shift after the operation to correct for this. DSP instructions take care of this automatically, whereas normal CPUs have to do this explicitly.
When you're doing e.g. a 4x4 matrix multiply you just use the DSP's native fixed point arithmetic instructions and the scaling is all taken care of automatically.
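For a CPU without such instructions, the explicit shift can be sketched as follows. This is a Python illustration rather than C++ or DSP code, and it assumes a Q15 format (16-bit signed, 15 fractional bits) purely as an example; the function names are made up. The products are summed in a wide accumulator and shifted once at the end, which is my understanding of how a multiply-accumulate unit is typically used. Saturation of the result is left out.

```python
Q = 15            # assumed number of fractional bits (Q15)
ONE = 1 << Q      # the integer that represents 1.0


def to_q(x):
    """Convert a float in [-1, 1) to a Q15 integer, clamping at the limits."""
    v = round(x * ONE)
    return max(-ONE, min(ONE - 1, v))


def from_q(v):
    """Convert a Q15 integer back to a float."""
    return v / ONE


def q_mul(a, b):
    """Multiply two Q15 numbers: the raw product is Q30, so shift back by 15."""
    return (a * b) >> Q


def q_matmul(A, B):
    """Fixed-point matrix product with one corrective shift per output element."""
    n = len(A)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0                       # wide accumulator holding Q30 sums
            for k in range(n):
                acc += A[i][k] * B[k][j]
            out[i][j] = acc >> Q          # back to Q15 (no saturation here)
    return out


half_identity = [[to_q(0.5) if i == j else 0 for j in range(4)] for i in range(4)]
quarters = [[to_q(0.25)] * 4 for _ in range(4)]
product = q_matmul(half_identity, quarters)
print(from_q(product[0][0]))
```

Addition and subtraction need no correction, which is why only the multiply path contains a shift.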

Packing a 16-bit floating point variable into two 8-bit variables (aka half into two bytes)

I code on XNA and only have access to shader model 3, hence no bitshift operators. I need to pack two arbitrary 16-bit floating-point variables (meaning NOT in the range [0,1] but ANY FLOAT VALUE) into two 8-bit variables. There is no way to normalize them.
I thought about doing bitshifting manually but I can't find a good article on how to convert a random decimal float (not [0,1]) into binary and back.
Thanks
This is not really a good idea - a 16-bit float already has very limited range and precision. Remember that 8-bits leaves you with just 256 possible values!
Getting an 8-bit value into a shader is trivial. As a colour is one method. You can use each channel as a normalised range, from 0 to 1.
Of course, you say you don't want to normalise your values. So I assume you want to maintain the nice floating-point property of a wide range with better precision closer to zero.
(Now would be a good time to read some background info on floating-point. Especially about half-precision floating-point and minifloats and microfloats.)
One way to do that would be to transform your values using a logarithm to encode and an exponential to decode. This is basically exactly what the floating-point format itself does. The exact maths will depend on the precision and the range that you desire (which 256 values will you represent?), so I will leave it as an exercise.
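One possible shape for that exercise, sketched in Python rather than shader code: one sign bit plus 7 bits holding a linearly quantized log2 of the magnitude. The range of 2^-8 to 2^8 and the function names are arbitrary assumptions chosen for illustration; a shader version would use the log2 and exp2 intrinsics and scale the code to [0,1] instead of using bit operations.

```python
import math

LOG_LO, LOG_HI = -8.0, 8.0   # assumed magnitude range: 2**-8 .. 2**8


def encode8(x):
    """Map a float to a code in 0..255: sign in the top bit, log2 magnitude in 7 bits."""
    sign = 0x80 if x < 0 else 0
    # Clamp the magnitude into the representable range (zero becomes 2**LOG_LO).
    mag = min(max(abs(x), 2.0 ** LOG_LO), 2.0 ** LOG_HI)
    t = (math.log2(mag) - LOG_LO) / (LOG_HI - LOG_LO)   # 0.0 .. 1.0
    return sign | round(t * 127)


def decode8(code):
    """Invert encode8 up to quantization error."""
    t = (code & 0x7F) / 127
    mag = 2.0 ** (LOG_LO + t * (LOG_HI - LOG_LO))
    return -mag if code & 0x80 else mag


for x in (1.0, 3.7, -0.01, 100.0):
    print(x, encode8(x), decode8(encode8(x)))
```

With these assumed settings the relative error stays below about 5% across the whole range, but exact zero cannot be represented; reserving one code for zero would be an easy extension.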

How are matrices stored in memory?

Note - may be more related to computer organization than software, not sure.
I'm trying to understand something related to data compression, say for jpeg photos. Essentially a very dense matrix is converted (via discrete cosine transforms) into a much more sparse matrix. Supposedly it is this sparse matrix that is stored. Take a look at this link:
http://en.wikipedia.org/wiki/JPEG
Compare the original 8x8 sub-block image example to matrix "B", which has been transformed to have overall lower-magnitude values and many more zeros throughout. How is matrix B stored such that it saves so much memory over the original matrix?
The original matrix clearly needs 8x8 (number of entries) x 8 bits/entry since values can range randomly from 0 to 255. OK, so I think it's pretty clear we need 64 bytes of memory for this. Matrix B on the other hand, hmmm. Best case scenario I can think of is that values range from -26 to +5, so at most an entry (like -26) needs 6 bits (5 bits to form 26, 1 bit for sign I guess). So then you could store 8x8x6 bits = 48 bytes.
The other possibility I see is that the matrix is stored in a "zig zag" order from the top left. Then we can specify a start and an end address and just keep storing along the diagonals until we're only left with zeros. Let's say it's a 32-bit machine; then 2 addresses (start + end) will constitute 8 bytes; for the other non-zero entries at 6 bits each, say, we have to go along almost all the top diagonals to store a sum of 28 elements. In total this scheme would take 29 bytes.
To summarize my question: if JPEG and other image encoders are claiming to save space by using algorithms to make the image matrix less dense, how is this extra space being realized in my hard disk?
Cheers
The DCT needs to be accompanied by other compression schemes that take advantage of the zeros and other frequently occurring values. A simple example is run-length encoding.
JPEG uses a variant of Huffman coding.
As it says in "Entropy coding", a zig-zag pattern is used, together with RLE, which already reduces the size in many cases. However, as far as I know the DCT doesn't give a sparse matrix per se. It concentrates the energy of the block in a few low-frequency coefficients, which lowers the entropy once the values are quantized. Quantization is the point where the compression becomes lossy: the input matrix is transformed with the DCT, then the values are quantized, and then Huffman encoding is applied.
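The zig-zag scan can be sketched in a few lines of Python. This is only an illustration of the traversal order (function names are made up), applied to a small 4x4 block with invented coefficient values so it is easy to follow; JPEG itself works on 8x8 blocks and marks the trailing zeros with an end-of-block symbol rather than simply dropping them.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    def key(pos):
        i, j = pos
        s = i + j                       # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (s, i if s % 2 else -i)
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


def zigzag_scan(block):
    """Read a square block in zig-zag order and drop the trailing zeros."""
    seq = [block[i][j] for i, j in zigzag_order(len(block))]
    while seq and seq[-1] == 0:
        seq.pop()
    return seq


block = [[-26, -3, 0, 0],
         [0,    0, 0, 0],
         [-3,   0, 0, 0],
         [0,    0, 0, 0]]

print(zigzag_order(8)[:10])
print(zigzag_scan(block))
```

Because the nonzero values cluster in the top-left corner after quantization, the scan puts them first and leaves one long run of zeros at the end, which is what makes the later run-length and Huffman stages effective.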
The most simple compression would take advantage of repeated sequences of symbols (zeros). A matrix in memory may look like this (suppose in dec system)
0000000000000100000000000210000000000004301000300000000004
After compression it may look like this
(0,13)1(0,11)21(0,12)43010003(0,11)4
(Symbol,Count)...
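A minimal Python sketch of this kind of zero-run encoding, using the same (Symbol,Count) notation. The threshold below which short runs are left as literal zeros is my own assumption, chosen because the example above also leaves its short runs uncompressed; the function name and the input string are made up for illustration.

```python
from itertools import groupby


def rle_zeros(digits, min_run=4):
    """Replace runs of at least min_run zeros with "(0,count)"; copy the rest."""
    out = []
    for symbol, run in groupby(digits):
        count = len(list(run))
        if symbol == "0" and count >= min_run:
            out.append(f"(0,{count})")
        else:
            out.append(symbol * count)   # literals and short zero runs
    return "".join(out)


print(rle_zeros("0000012000300000"))
```

For this input the result is `(0,5)120003(0,5)`: the two runs of five zeros are collapsed, while the run of three is cheaper to keep as it is.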
As I understand it, JPEG does not only compress, it also drops data. After the 8x8 block is transformed to the frequency domain, it drops the insignificant (high-frequency) data, which means it only has to save the significant 6x6 or even 4x4 data. That is why it can have a higher compression rate than lossless methods (like GIF).
