YUV422 Packed format scaling - opencv

I am writing a scaling algorithm for YUV422 packed format images (without any intermediate conversions to RGB or grayscale or what have you). As can be seen in the below image from MSDN, the 4:2:2 format has 2 Luma bytes for each chroma byte. My test bench involves procuring images from the iSight camera using OpenCV APIs, converting them to YUV (CV_BGR2YUV) and then resizing them. The questions I have are:
I am posting sample data below for reference (taken straight from the memory dump, via OpenCV's Mat raw data pointer). How do I identify, just by looking at the data, which bytes are the Y components and which are the U and V components?
15 8B 7A 17 8A 7A 18 8A 7B 17 89 7A 19 89 79 19
Is this bilinear interpolation algorithm correct? Let's say my box is:
TOP ROW: Y00, U00, Y01, V00, Y02, U01, Y03, V01,
BOTTOM ROW: Y10, U10, Y11, V10, Y12, U11, Y13, V11,
Result is the interpolation of: (Y00, Y01, Y10, Y11), (U00, U01, U10, U11), (Y02, Y03, Y12, Y13), (V00, V01, V10, V11).
That forms my first two YUYV pixels of 32 bits.
Any references to principles of performing bilinear interpolation on YUYV images would be very helpful! Thanks in advance.
[Image: YUY2 (YUYV) 4:2:2 memory layout, from MSDN]
[EDIT]: Please note that the post here is somewhat different, in that it does not discuss the effects of additive operations on the YUV images. It just discards pixels to downsize. Resize (downsize) YUV420sp image
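
For concreteness, the 2x2 box averaging described above (the simplest bilinear case, at exactly half size) could be sketched as below for a packed YUYV (Y0 U Y1 V) buffer. This is a rough, untested illustration with made-up function and parameter names, not a reference implementation:

#include <cstddef>
#include <cstdint>

// Halve a packed YUYV (Y0 U Y1 V) image in both dimensions by averaging each
// 2x2 block of source pixels. Strides are in bytes; assumes srcHeight is even
// and srcWidth (in pixels) is a multiple of 4.
void halveYUYV(const uint8_t* src, int srcWidth, int srcHeight, int srcStride,
               uint8_t* dst, int dstStride)
{
    for (int y = 0; y < srcHeight; y += 2) {
        const uint8_t* top = src + (size_t)y * srcStride;
        const uint8_t* bot = top + srcStride;
        uint8_t* out = dst + (size_t)(y / 2) * dstStride;
        // 4 source pixels (8 bytes per row) -> 2 destination pixels (4 bytes).
        for (int x = 0; x < srcWidth; x += 4) {
            const uint8_t* t = top + 2 * x;   // 2 bytes per pixel
            const uint8_t* b = bot + 2 * x;
            // First output pixel: Y from (Y00,Y01,Y10,Y11), U from (U00,U01,U10,U11)
            out[0] = (t[0] + t[2] + b[0] + b[2] + 2) / 4;  // Y
            out[1] = (t[1] + t[5] + b[1] + b[5] + 2) / 4;  // U
            // Second output pixel: Y from (Y02,Y03,Y12,Y13), V from (V00,V01,V10,V11)
            out[2] = (t[4] + t[6] + b[4] + b[6] + 2) / 4;  // Y
            out[3] = (t[3] + t[7] + b[3] + b[7] + 2) / 4;  // V
            out += 4;
        }
    }
}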

Related

High- vs. low bit encoding for 10/12 bit images

I had a discussion with a colleague recently on the topic of high-bit encodings vs. low-bit encodings of 12-bit images in 16 bits of memory (e.g. PNG files). He argued that high-bit encodings were easier to use, since many image viewers (e.g. Windows image preview, Explorer thumbnails, ...) can display them more easily in a human-readable way using trivial 16-to-8 conversions, while low-bit encoded images appear mostly black.
I'm thinking more from an image processing perspective and thought that surely, low-bit encodings are better, so I sat down, did some experimentation and wrote out my findings. I'm curious if the community here has additional or better insights that I am missing.
1) When using some image processing backend (e.g. Intel IPP), 10- and 12-bit encodings are often assumed to physically occupy 16 bits (unsigned short). The memory is read as "a number", so that a gray value of 1 (12 bit) encoded in the low bits yields a "1", while encoded in the high bits it yields a 16 (effectively, a left shift by four).
2) Taking a low-bit image and the corresponding left-shifted high-bit image and performing some operations (possibly including interpolation) will yield identical results after right-shifting the result of the high-bit input (for comparison's sake); see the sketch at the end of this post.
3) The main differences come when looking at the histograms of the images: the low-bit histogram is "dense" and 4k entries long, while the high-bit histogram contains 15 zeros for every non-zero entry and is 65k entries long.
3 a) Generating lookup tables (LUTs) for operations takes 16x as long; applying them should take identical time.
3 b) Operations scaling with histogram^2 (e.g. gray scale co-occurrence matrices) become much more costly: 256x memory and time if done naively (but then again, in order to correctly treat any potential interpolated pixels with values not allowed in the original 12-bit encoding, you kind of have to...)
3 c) Debugging histogram vectors is a PITA when looking at mostly-zero interspersed high-bit histograms.
To my great surprise, that was all I could come up with. Anything obvious that I am missing?
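
To make point 2) concrete, here is a tiny sketch of my own (not tied to any particular library): a midpoint interpolation of two 12-bit gray values, once stored in the low bits of a 16-bit word and once pre-shifted into the high bits. The example values are arbitrary.

#include <cstdint>
#include <iostream>

int main()
{
    uint16_t aLow = 1234, bLow = 2345;             // 12-bit values, low-bit encoding
    uint16_t aHigh = aLow << 4, bHigh = bLow << 4; // same values, high-bit encoding

    uint16_t midLow  = (aLow + bLow) / 2;          // interpolate in the low-bit domain
    uint16_t midHigh = (aHigh + bHigh) / 2;        // interpolate in the high-bit domain

    // The high-bit result, shifted back down, matches the low-bit result.
    std::cout << midLow << " == " << (midHigh >> 4) << std::endl;
    return 0;
}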

ImageMagick, What does Q8 vs Q16 actually mean?

Under Windows, I need to choose between Q8 and Q16. I know that Q8 means 8 bits per pixel component (e.g. 8-bit red, 8-bit green, etc.), whereas Q16 means 16 bits per pixel component. I also know that Q16 uses twice as much memory as Q8. Therefore, I must choose carefully.
What is a 16 bits-per-pixel component? Does a JPEG image support 16 bits-per-pixel components? Does a picture taken with a digital camera in a smartphone have 8 bits-per-pixel components or 16 bits-per-pixel components?
I just need to load jpg images, crop/resize them and save. I also need to save the pictures in 2 different variants: one with the icc color profile management included and another without any icc profile (sRGB)
What is a 16 bits-per-pixel component?
Each "channel" (e.g. Red, Green, Blue) can have a value between 0x0000 (no color), and 0xFFFF (full color). This allows greater depth of color, and more precision calculations.
For example, a "RED" pixel written with a QuantumDepth of 8...
$ convert -size 1x1 xc:red -depth 8 rgb:- | hexdump
0000000 ff 00 00
0000003
The same for a QuantumDepth of 16...
$ convert -size 1x1 xc:red -depth 16 rgb:- | hexdump
0000000 ff ff 00 00 00 00
0000006
And for Q32..? You guessed it.
$ convert -size 1x1 xc:red -depth 32 rgb:- | hexdump
0000000 ff ff ff ff 00 00 00 00 00 00 00 00
000000c
All in all, more memory is allocated to represent a color value. It gets a little more complex with HDRI imaging.
Does a JPEG image support 16 bits-per-pixel components? Are the pictures we take with a smartphone camera 8 bits-per-pixel components or 16 bits-per-pixel components?
I believe JPEGs are 8-bit, but I could be wrong here. I do know that most photographers KEEP all RAW files from the device because JPEG doesn't preserve all the detail captured by the camera sensor. Here's a great write-up with examples.
I just need to load jpg images, crop/resize them and save. I also need to save the pictures in 2 different variants: one with the icc color profile management included and another without any icc profile (sRGB)
ImageMagick was designed to be the "Swiss Army knife" of encoders & decoders (plus a large number of features). When reading a file, it decodes the format into something called "authentic pixels" to be managed internally. The size of the internal storage can be configured at compile time, and for convenience the pre-built binaries are offered as Q8, Q16, and Q32, plus additional HDRI support.
If you're focused on quality, Q16 is a safe option. Q8 will be way faster, but limiting at times.
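
For the concrete workflow in the question (load a JPEG, crop/resize, save one copy with the embedded ICC profile and one stripped copy treated as sRGB), a rough Magick++ sketch might look like this; the file names and geometries are placeholders, and the corresponding Magick.NET calls are analogous:

#include <Magick++.h>

int main(int /*argc*/, char** argv)
{
    Magick::InitializeMagick(argv[0]);

    Magick::Image image("input.jpg");                 // placeholder file name
    image.crop(Magick::Geometry(800, 600, 10, 10));   // width x height + x/y offset
    image.resize(Magick::Geometry("400x300"));

    // Variant 1: keep whatever ICC profile was embedded in the source JPEG.
    image.write("output_with_icc.jpg");

    // Variant 2: strip profiles and other metadata so the file is plain sRGB.
    Magick::Image plain = image;
    plain.strip();
    plain.write("output_srgb.jpg");

    return 0;
}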
Also, you can find an answer here (it's for the .NET package, but the same applies): https://github.com/dlemstra/Magick.NET/tree/main/docs#q8-q16-or-q16-hdri
Q8, Q16 or Q16-HDRI?
Versions with Q8 in the name are 8 bits-per-pixel component (e.g. 8-bit red, 8-bit green, etc.), whereas Q16 are 16 bits-per-pixel component. A Q16 version permits you to read or write 16-bit images without losing precision but requires twice as much resources as the Q8 version. The Q16-HDRI version uses twice the amount of memory as the Q16. It is more precise because it uses a floating point (32 bits-per-pixel component) and it allows out-of-bound pixels (less than 0 and more than 65535). The Q8 version is the recommended version. If you need to read/write images with a better quality you should use the Q16 version instead.

Bag of Visual Words in Opencv

I am using BOW in OpenCV for clustering features of variable size. However, one thing is not clear from the OpenCV documentation, and I am also unable to find the reason for the following behaviour:
assume: dictionary size = 100.
I use SURF to compute the features, and each image has descriptors of variable size, e.g. 128 x 34, 128 x 63, etc. Now in BOW each of them is clustered and I get a fixed descriptor size of 128 x 100 for an image. I know 100 is the number of cluster centers created using k-means clustering.
But I am confused by this: if an image has 128 x 63 descriptors, then how come it clusters into 100 clusters, which is impossible using k-means UNLESS I convert the descriptor matrix to 1D? Won't converting to 1D lose the valid 128-dimensional information of a single keypoint?
I need to know how the descriptor matrix is manipulated to get 100 cluster centers from only 63 features.
Think of it like this.
You have 10 cluster means in total and 6 features for the current image. The first 3 of those features are closest to the 5th mean, and the remaining 3 are closest to the 7th, 8th, and 9th mean respectively. Then your feature vector will look like [0, 0, 0, 0, 3, 0, 1, 1, 1, 0], or a normalized version of it, which is 10-dimensional, equal to the number of cluster means. So you can create a 100000-dimensional vector from 63 features if you want.
But I still think there is something wrong, because after you apply BOW your features should be 1x100, not 128x100. Your cluster means are 128x1 and you are assigning your 128x1 sized features (you have 34 128x1 features for the first image, 63 128x1 features for the second image, etc.) to those means. So basically you are assigning 34 or 63 features to 100 means, and your result should be 1x100.
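
To illustrate the assignment step described above, here is a rough sketch (my own code, not what OpenCV's BOWImgDescriptorExtractor does internally) that turns an N x 128 descriptor matrix and a 100 x 128 vocabulary into a single 1 x 100 histogram:

#include <limits>
#include <opencv2/core.hpp>

// descriptors: N x 128 (CV_32F), one SURF descriptor per row (e.g. N = 63).
// vocabulary:  K x 128 (CV_32F), one k-means center per row (e.g. K = 100).
// Returns a 1 x K histogram counting how many descriptors fell into each cluster.
cv::Mat bowHistogram(const cv::Mat& descriptors, const cv::Mat& vocabulary,
                     bool normalize = true)
{
    const int K = vocabulary.rows;
    cv::Mat hist = cv::Mat::zeros(1, K, CV_32F);

    for (int i = 0; i < descriptors.rows; ++i) {
        int best = 0;
        double bestDist = std::numeric_limits<double>::max();
        for (int k = 0; k < K; ++k) {
            double d = cv::norm(descriptors.row(i), vocabulary.row(k), cv::NORM_L2);
            if (d < bestDist) { bestDist = d; best = k; }
        }
        hist.at<float>(0, best) += 1.0f;   // vote for the nearest cluster center
    }
    if (normalize && descriptors.rows > 0)
        hist = hist / static_cast<double>(descriptors.rows);
    return hist;   // always 1 x K, no matter how many descriptors the image had
}

Each SURF descriptor keeps its full 128 dimensions when it is compared against the centers; only the final histogram is 1 x 100.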

How to make normalized cross correlation robust to small changes in uniform regions

The problem is described below:
Given 2 sets of data:
A = {91, 87, 85, 85, 84, 90, 85, 83, 86, 86, 90, 86, 84, 89, 93, 87, 89, 91, 95, 97, 91, 92, 97, 101, 101},
B = {133, 130, 129, 131, 133, 136, 131, 131, 135, 135, 133, 133, 133, 131, 135, 131, 129, 131, 132, 132, 130, 127, 129, 137, 134},
A represents a set of pixels from a background image around location (x,y), and B represents another set of pixels around (x,y) from a different image where the illumination has changed.
The calculated normalised cross-correlation (NCC) = 0.184138251
(from http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation)
The calculated NCC tells us that set A is different from set B. But in fact, A and B are the same group of pixels under different illumination conditions.
This shows that NCC is very sensitive to small changes in data sets whose relative variation is quite small. For example, if the ratio between standard deviation and mean represents the relative variation in each data set, then the relative variation in set A is 0.057684745 and in set B is 0.018484007.
Could anyone help me figure out how to incorporate the relative variation factor into the NCC formula, so that the modified NCC is robust to small changes in data sets where the variation within each set is very small?
Also, the output of the modified NCC still needs to be in the range -1 to 1.
Thanks a lot.
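
For reference, here is a small sketch of the zero-mean NCC from the linked Wikipedia formula, written as a standalone helper over two equally sized patches flattened into vectors (illustrative code, not from any library):

#include <cmath>
#include <vector>

// Zero-mean normalized cross-correlation of two equally sized patches:
// subtract each patch's mean, divide by each patch's standard deviation,
// and average the products. The result lies in [-1, 1].
double ncc(const std::vector<double>& a, const std::vector<double>& b)
{
    const size_t n = a.size();                 // assumes a.size() == b.size() > 0
    double meanA = 0.0, meanB = 0.0;
    for (size_t i = 0; i < n; ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= n; meanB /= n;

    double num = 0.0, varA = 0.0, varB = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double da = a[i] - meanA, db = b[i] - meanB;
        num  += da * db;
        varA += da * da;
        varB += db * db;
    }
    return num / std::sqrt(varA * varB);       // undefined if either patch is constant
}

When both patches are nearly uniform, the numerator and the standard deviations are dominated by noise, which is why the value collapses even though the underlying content matches.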
There are two issues here:
being robust to noise
being robust to illumination changes.
For noise robustness, I would suggest you apply some denoising algorithm. Depending on your application, computational constraints, knowledge... you can try a simple median filtering, or more complicated bilateral filtering or non-local means. Each of these algorithms will preserve most of the fine structures of your images (which are important for the NCC).
Then, to be robust against illumination changes, you can start by applying a simple histogram matching procedure. If it does not work well enough, you should give the Midway algorithm (pdf) a try; it was developed by Julie Delon specifically for this case of stereo matching. It is relatively easy to implement (I did it in a few hours using OpenCV/C++).
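
A minimal sketch of the simple histogram matching step mentioned above, for 8-bit single-channel images, via a CDF lookup table (my own illustration of histogram specification, not the Midway algorithm itself):

#include <opencv2/core.hpp>

// Remap 'src' so its gray-level histogram approximately matches 'ref'.
// Both images are expected to be non-empty CV_8UC1.
cv::Mat matchHistogram(const cv::Mat& src, const cv::Mat& ref)
{
    // Cumulative histograms (CDFs) of both images.
    double cdfSrc[256] = {0}, cdfRef[256] = {0};
    for (int y = 0; y < src.rows; ++y)
        for (int x = 0; x < src.cols; ++x) cdfSrc[src.at<uchar>(y, x)] += 1.0;
    for (int y = 0; y < ref.rows; ++y)
        for (int x = 0; x < ref.cols; ++x) cdfRef[ref.at<uchar>(y, x)] += 1.0;
    for (int i = 1; i < 256; ++i) { cdfSrc[i] += cdfSrc[i - 1]; cdfRef[i] += cdfRef[i - 1]; }
    for (int i = 0; i < 256; ++i) { cdfSrc[i] /= cdfSrc[255]; cdfRef[i] /= cdfRef[255]; }

    // For every source level, pick the reference level with the closest CDF value.
    uchar lut[256];
    for (int i = 0; i < 256; ++i) {
        int j = 0;
        while (j < 255 && cdfRef[j] < cdfSrc[i]) ++j;
        lut[i] = static_cast<uchar>(j);
    }

    cv::Mat dst(src.size(), CV_8UC1);
    for (int y = 0; y < src.rows; ++y)
        for (int x = 0; x < src.cols; ++x) dst.at<uchar>(y, x) = lut[src.at<uchar>(y, x)];
    return dst;
}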
Normalizing images before the correlation may help!
If you have a choice, I suggest you test phase correlation.
Here you can find a very interesting paper.
I hope that helps.

How are matrices stored in memory?

Note - may be more related to computer organization than software, not sure.
I'm trying to understand something related to data compression, say for jpeg photos. Essentially a very dense matrix is converted (via discrete cosine transforms) into a much more sparse matrix. Supposedly it is this sparse matrix that is stored. Take a look at this link:
http://en.wikipedia.org/wiki/JPEG
Compare the original 8x8 sub-block image example to matrix "B", which has been transformed to have overall lower-magnitude values and many more zeros throughout. How is matrix B stored such that it saves much more memory than the original matrix?
The original matrix clearly needs 8x8 (number of entries) x 8 bits/entry since values can range randomly from 0 to 255. OK, so I think it's pretty clear we need 64 bytes of memory for this. Matrix B on the other hand, hmmm. Best case scenario I can think of is that values range from -26 to +5, so at most an entry (like -26) needs 6 bits (5 bits to form 26, 1 bit for sign I guess). So then you could store 8x8x6 bits = 48 bytes.
The other possibility I see is that the matrix is stored in a "zig zag" order from the top left. Then we can specify a start and an end address and just keep storing along the diagonals until we're only left with zeros. Let's say it's a 32-bit machine; then 2 addresses (start + end) will constitute 8 bytes; for the other non-zero entries at 6 bits each, say, we have to go along almost all the top diagonals to store a sum of 28 elements. In total this scheme would take 29 bytes.
To summarize my question: if JPEG and other image encoders are claiming to save space by using algorithms to make the image matrix less dense, how is this extra space being realized in my hard disk?
Cheers
The DCT needs to be accompanied by other compression schemes that take advantage of the zeros / high-frequency occurrences. A simple example is run-length encoding.
JPEG uses a variant of Huffman coding.
As it says under "Entropy coding", a zig-zag pattern is used, together with RLE, which will already reduce the size in many cases. However, as far as I know the DCT isn't giving a sparse matrix per se, but it usually improves the entropy of the matrix. This is the point where the compression becomes lossy: the input matrix is transformed with the DCT, then the values are quantized, and then Huffman encoding is used.
The simplest compression would take advantage of repeated sequences of symbols (zeros). A matrix in memory may look like this (assume decimal):
0000000000000100000000000210000000000004301000300000000004
After compression it may look like this
(0,13)1(0,11)21(0,12)43010003(0,11)4
(Symbol,Count)...
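
A toy sketch of that zero-run idea (purely illustrative; the real JPEG entropy coder combines zero-run lengths with Huffman codes rather than emitting text):

#include <iostream>
#include <string>
#include <vector>

// Encode a digit sequence by replacing runs of zeros with "(0,count)" markers,
// mirroring the (Symbol,Count) example above. Non-zero digits pass through.
std::string zeroRunEncode(const std::vector<int>& data)
{
    std::string out;
    for (size_t i = 0; i < data.size(); ) {
        if (data[i] == 0) {
            size_t run = 0;
            while (i < data.size() && data[i] == 0) { ++run; ++i; }
            out += "(0," + std::to_string(run) + ")";
        } else {
            out += std::to_string(data[i]);
            ++i;
        }
    }
    return out;
}

int main()
{
    // e.g. the tail of a zig-zag scanned, quantized block: mostly zeros.
    std::vector<int> block = {5, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0};
    std::cout << zeroRunEncode(block) << std::endl;   // prints 5(0,6)21(0,7)
}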
As I understand it, JPEG does not only compress, it also drops data. After the 8x8 block is transformed to the frequency domain, it drops the insignificant (high-frequency) data, which means it only has to store the significant 6x6 or even 4x4 data. That is why it can achieve a higher compression rate than lossless methods (like GIF).
