Reading GLTF vertex color encoding - gltf

I'm trying to parse the BoxVertexColors GLTF 2.0 file from the official test sample set and store the information in my own data structures. I've got the position, normals, etc to work, but the result parsing the vertex color mesh attribute is weird.
The .gltf states the format is 5126 (=float) and VEC4, the starting offset is 0x288. When looking into the buffer.bin file, the content at 0x288 starts with
1D E9 06 30 7A 7A 25 3C FE FF 7F 3F 51 D2 27 30
which doesn't look like floats to me, let alone an encoded color. I would have expected four floats representing RGBA. What am I missing ...?

What makes you think the data is incorrect?
(0x3006E91D, 0X3C257A7A, 0x3F7FFFFE, 0x3727D251) =>
(4.90802e-10, 0.0101, 1.0, 6.10531e-10) => (simplified)
(0, 0.0101, 1.0, 0)
Apart from the possibly nonsensical alpha value, this seems like a perfectly reasonable start position for what's clearly a computed range of colours spread across a cube.
The alpha value is ultimately irrelevant, because the associated material does not specify a blendMode setting, which in turn means it defaults to OPAQUE, and thus transparency is ignored.
Loading the model in https://gltf-viewer.donmccurdy.com/ and breakpointing confirms that the data is decoded as you'd expect:

Related

How are floating-point pixel values converted to integer values?

How does image library (such as PIL, OpenCV, etc) convert floating-point values to integer pixel values?
For example
import numpy as np
from PIL import Image
# Creates a random image and saves in a file
def get_random_img(m=0, s=1, fname='temp.png'):
im = m + s * np.random.randn(60, 60, 3) # For eg. min: -3.8947058634971179, max: 3.6822041760496904
print(im[0, 0]) # for eg. array([ 0.36234732, 0.96987366, 0.08343])
imp = Image.fromarray(im, 'RGB') # (*)
print(np.array(imp)[0, 0]) # [140 , 74, 217]
imp.save(fname)
return im, imp
For the above method, an example is provided in the comment (which is randomly produced). My question is: how does (*) convert ndarray (which can range from - infinity to plus infinity) to pixel values between 0 and 255?
I tried to investigate the Pil.Image.fromarray method and eventually ended by at line #798 d.decode(data) within Pil.Image.Image().frombytes method. I could find the implementation of decode method, thus unable to know what computation goes behind the conversion.
My initial thought was that maybe the method use minimum (to 0) and maximum (to 255) value from the array and then map all the other values accordingly between 0 and 255. But upon investigations, I found out that's not what is happening. Moreover, how does it handle when the values of the array range between 0 and 1 or any other range of values?
Some libraries assume that floating-point pixel values are between 0 and 1, and will linearly map that range to 0 and 255 when casting to 8-bit unsigned integer. Some others will find the minimum and maximum values and map those to 0 and 255. You should always explicitly do this conversion if you want to be sure of what happened to your data.
In general, a pixel does not need to be 8-bit unsigned integer. A pixel can have any numerical type. Usually a pixel intensity represents an amount of light, or a density of some sort, but this is not always the case. Any physical quantity can be sampled in 2 or more dimensions. The range of meaningful values thus depends on what is imaged. Negative values are often also meaningful.
Many cameras have 8-bit precision when converting light intensity to a digital number. Likewise, displays typically have an b-bit intensity range. This is the reason many image file formats store only 8-bit unsigned integer data. However, some cameras have 12 bits or more, and some processes derive pixel data with a higher precision that one does not want to quantize. Therefore formats such as TIFF and ICS will allow you to save images in just about any numeric format you can think of.
I'm afraid it has done nothing anywhere near as clever as you hoped! It has merely interpreted the first byte of the first float as a uint8, then the second byte as another uint8...
from random import random, seed
import numpy as np
from PIL import Image
# Generate repeatable random data, so other folks get the same results
np.random.seed(42)
# Make a single RGB pixel
im = np.random.randn(1, 1, 3)
# Print the floating point values - not that we are interested in them
print(im)
# OUTPUT: [[[ 0.49671415 -0.1382643 0.64768854]]]
# Save that pixel to a file so we can dump it
im.tofile('array.bin')
# Now make a PIL Image from it and print the uint8 RGB values
imp = Image.fromarray(im, 'RGB')
print(imp.getpixel((0,0)))
# OUTPUT: (124, 48, 169)
So, PIL has interpreted our data as RGB=124/48/169
Now look at the hex we dumped. It is 24 bytes long, i.e. 3 float64 (8-byte) values, one for red, one for green and one for blue for the 1 pixel in our image:
xxd array.bin
Output
00000000: 7c30 a928 2aca df3f 2a05 de05 a5b2 c1bf |0.(*..?*.......
00000010: 685e 2450 ddb9 e43f h^$P...?
And the first byte (7c) has become 124, the second byte (30) has become 48 and the third byte (a9) has become 169.
TLDR; PIL has merely taken the first byte of the first float as the Red uint8 channel of the first pixel, then the second byte of the first float as the Green uint8 channel of the first pixel and the third byte of the first float as the Blue uint8 channel of the first pixel.

How is Spark reading my image using the image format?

It might be a silly question but I can't figure out how Spark read my image using the spark.read.format("image").load(....) argument.
After importing my image which gives me the following:
>>> image_df.select("image.height","image.width","image.nChannels", "image.mode", "image.data").show()
+------+-----+---------+----+--------------------+
|height|width|nChannels|mode| data|
+------+-----+---------+----+--------------------+
| 430| 470| 3| 16|[4D 55 4E 4C 54 4...|
+------+-----+---------+----+--------------------+
I arrive to the conclusion that:
my image is 430x470 pixels,
my image is colored (RGB due to nChannels = 3) which is an openCV compatible-type,
my image mode is 16 which corresponds to a particular openCV byte-order.
Does someone knows which website/documentation I could browse to know more about it?
the data in the data column is of type Binary but:
when I run image_df.select("image.data").take(1) I got an output which seems to be only one array (see below).
>>> image_df.select("image.data").take(1)
# **1/** Here are the last elements of the result
....<<One Eternity Later>>....x92\x89\x8a\x8d\x84\x86\x89\x80\x84\x87~'))]
# 2/ I got also several part of the result which looks like:
.....\x89\x80\x80\x83z|\x7fvz}tpsjqtkrulsvmsvmsvmrulrulrulqtkpsjnqhnqhmpgmpgmpgnqhnqhn
qhnqhnqhnqhnqhnqhmpgmpgmpgmpgmpgmpgmpgmpgnqhnqhnqhnqhnqhnqhnqhnqhknejmdilcilchkbh
kbilcilckneloflofmpgnqhorioripsjsvmsvmtwnvypx{ry|sz}t{~ux{ry|sy|sy|sy|sz}tz}tz}tz}
ty|sy|sy|sy|sz}t{~u|\x7fv|\x7fv}.....
What come next are linked to the results displayed above. Those might be due to my lack of knowledge concerning openCV (or else). Nonetheless:
1/ I don't understand the fact that if I got an RGB image, I should have 3 matrix but the output finishes by .......\x84\x87~'))]. I was more thinking on obtaining something like [(...),(...),(...\x87~')].
2/ Is this part has a special meaning? Like those are the separator between each matrix or something?
To be more clear about what I'm trying to achieve, I want to process images to do pixel comparison between each images. Therefore, I want to know the pixel values for a given position in my image (I assume that if I have an RGB image, I shall have 3 pixel values for a given position).
Example: let's say that I have a webcam pointing to the sky only during the day and I want to know the values of a pixel at a position corresponding to the top left sky part, I found out that the concatenation of those values gives the colour Light Blue which says that the photo was taken on a sunny day. Let's say that the only possibility is that a sunny day takes the colour Light Blue.
Next I want to compare the previous concatenation with another concat of pixel values at the exact same position but from a picture taken the next day. If I found out that they are not equal then I conclude that the given picture was taken on a cloudy/rainy day. If equal then sunny day.
Any help on that would be highly appreciated. I have vulgarized my example for a better understanding but my goal is pretty much the same. I know that ML model can exist to achieve those stuff but I would be happy to try this first. My first goal is to split this column into 3 columns corresponding to each color code: a red matrix, a green matrix, a blue matrix
I think I have the logic. I used the keras.preprocessing.image.img_to_array() function to understand how the values are classified (since I have an RGB image, I must have 3 matrix: one for each color R G B). Posting that if someone wonder how it works, I might be wrong but I think I have something :
from keras.preprocessing import image
import numpy as np
from PIL import Image
# Using spark built-in data source
first_img = spark.read.format("image").schema(imageSchema).load(".....")
raw = first_img.select("image.data").take(1)[0][0]
np.shape(raw)
(606300,) # which is 470*430*3
# Using keras function
img = image.load_img(".../path/to/img")
yy = image.img_to_array(img)
>>> np.shape(yy)
(430, 470, 3) # the form is good but I have a problem of order since:
>>> raw[0], raw[1], raw[2]
(77, 85, 78)
>>> yy[0][0]
array([78., 85., 77.], dtype=float32)
# Therefore I used the numpy reshape function directly on raw
# to have 470 matrix of 3 lines and 470 columns:
array = np.reshape(raw, (430,470,3))
xx = image.img_to_array(array) # OPTIONAL and not used here
>>> array[0][0] == (raw[0],raw[1],raw[2])
array([ True, True, True])
>>> array[0][1] == (raw[3],raw[4],raw[5])
array([ True, True, True])
>>> array[0][2] == (raw[6],raw[7],raw[8])
array([ True, True, True])
>>> array[0][3] == (raw[9],raw[10],raw[11])
array([ True, True, True])
So if I understood well, spark will read the image as a big array - (606300,) here - where in fact each element are ordered and corresponds to their respective color shade (R G B).
After doing my little transformations, I obtain 430 matrix of 3 columns x 470 lines. Since my image is (470x430) for (WidthxHeight), each matrix corresponds to a pixel heigth position and inside each: 3 columns for each color and 470 lines for each width position.
Hope that helps someone :)!

Why does Metal constant with h suffix produce bad byte range results?

I have run into a very strange problem in my Metal shader that has to do with a byte value in the range (0, 255). This byte value is represented as a ushort that is converted to a half precision by code like (x / 255.0h). What is strange is that this literal constant divide seems to be optimized incorrectly when run on a A7 device (A10 does not do this). Has anyone else run into this? Is there some way I can write Metal code that is used only on this GPU family 1 device?
I found a workaround, by leaving the h suffix off of the inline constant:
// This method accepts 4 byte range input values and encodes them as a half4 vector
// that works properly on A7 class hardware. The issue with A7 devices is that
// there seems to be a compler bug or range issue with an operation like (x / 255.0h).
// What should be the same operation (x / 255.0) does not show the range problem on A7.
half4
encodeBytesAsHalf4(const ushort4 b4) {
return half4(b4.x/255.0, b4.y/255.0, b4.z/255.0, b4.a/255.0);
}

Converting RGB to grayscale/intensity

When converting from RGB to grayscale, it is said that specific weights to channels R, G, and B ought to be applied. These weights are: 0.2989, 0.5870, 0.1140.
It is said that the reason for this is different human perception/sensibility towards these three colors. Sometimes it is also said these are the values used to compute NTSC signal.
However, I didn't find a good reference for this on the web. What is the source of these values?
See also these previous questions: here and here.
The specific numbers in the question are from CCIR 601 (see Wikipedia article).
If you convert RGB -> grayscale with slightly different numbers / different methods,
you won't see much difference at all on a normal computer screen
under normal lighting conditions -- try it.
Here are some more links on color in general:
Wikipedia Luma
Bruce Lindbloom 's outstanding web site
chapter 4 on Color in the book by Colin Ware, "Information Visualization", isbn 1-55860-819-2;
this long link to Ware in books.google.com
may or may not work
cambridgeincolor :
excellent, well-written
"tutorials on how to acquire, interpret and process digital photographs
using a visually-oriented approach that emphasizes concept over procedure"
Should you run into "linear" vs "nonlinear" RGB,
here's part of an old note to myself on this.
Repeat, in practice you won't see much difference.
### RGB -> ^gamma -> Y -> L*
In color science, the common RGB values, as in html rgb( 10%, 20%, 30% ),
are called "nonlinear" or
Gamma corrected.
"Linear" values are defined as
Rlin = R^gamma, Glin = G^gamma, Blin = B^gamma
where gamma is 2.2 for many PCs.
The usual R G B are sometimes written as R' G' B' (R' = Rlin ^ (1/gamma))
(purists tongue-click) but here I'll drop the '.
Brightness on a CRT display is proportional to RGBlin = RGB ^ gamma,
so 50% gray on a CRT is quite dark: .5 ^ 2.2 = 22% of maximum brightness.
(LCD displays are more complex;
furthermore, some graphics cards compensate for gamma.)
To get the measure of lightness called L* from RGB,
first divide R G B by 255, and compute
Y = .2126 * R^gamma + .7152 * G^gamma + .0722 * B^gamma
This is Y in XYZ color space; it is a measure of color "luminance".
(The real formulas are not exactly x^gamma, but close;
stick with x^gamma for a first pass.)
Finally,
L* = 116 * Y ^ 1/3 - 16
"... aspires to perceptual uniformity [and] closely matches human perception of lightness." --
Wikipedia Lab color space
I found this publication referenced in an answer to a previous similar question. It is very helpful, and the page has several sample images:
Perceptual Evaluation of Color-to-Grayscale Image Conversions by Martin Čadík, Computer Graphics Forum, Vol 27, 2008
The publication explores several other methods to generate grayscale images with different outcomes:
CIE Y
Color2Gray
Decolorize
Smith08
Rasche05
Bala04
Neumann07
Interestingly, it concludes that there is no universally best conversion method, as each performed better or worse than others depending on input.
Heres some code in c to convert rgb to grayscale.
The real weighting used for rgb to grayscale conversion is 0.3R+0.6G+0.11B.
these weights arent absolutely critical so you can play with them.
I have made them 0.25R+ 0.5G+0.25B. It produces a slightly darker image.
NOTE: The following code assumes xRGB 32bit pixel format
unsigned int *pntrBWImage=(unsigned int*)..data pointer..; //assumes 4*width*height bytes with 32 bits i.e. 4 bytes per pixel
unsigned int fourBytes;
unsigned char r,g,b;
for (int index=0;index<width*height;index++)
{
fourBytes=pntrBWImage[index];//caches 4 bytes at a time
r=(fourBytes>>16);
g=(fourBytes>>8);
b=fourBytes;
I_Out[index] = (r >>2)+ (g>>1) + (b>>2); //This runs in 0.00065s on my pc and produces slightly darker results
//I_Out[index]=((unsigned int)(r+g+b))/3; //This runs in 0.0011s on my pc and produces a pure average
}
Check out the Color FAQ for information on this. These values come from the standardization of RGB values that we use in our displays. Actually, according to the Color FAQ, the values you are using are outdated, as they are the values used for the original NTSC standard and not modern monitors.
What is the source of these values?
The "source" of the coefficients posted are the NTSC specifications which can be seen in Rec601 and Characteristics of Television.
The "ultimate source" are the CIE circa 1931 experiments on human color perception. The spectral response of human vision is not uniform. Experiments led to weighting of tristimulus values based on perception. Our L, M, and S cones1 are sensitive to the light wavelengths we identify as "Red", "Green", and "Blue" (respectively), which is where the tristimulus primary colors are derived.2
The linear light3 spectral weightings for sRGB (and Rec709) are:
Rlin * 0.2126 + Glin * 0.7152 + Blin * 0.0722 = Y
These are specific to the sRGB and Rec709 colorspaces, which are intended to represent computer monitors (sRGB) or HDTV monitors (Rec709), and are detailed in the ITU documents for Rec709 and also BT.2380-2 (10/2018)
FOOTNOTES
(1) Cones are the color detecting cells of the eye's retina.
(2) However, the chosen tristimulus wavelengths are NOT at the "peak" of each cone type - instead tristimulus values are chosen such that they stimulate on particular cone type substantially more than another, i.e. separation of stimulus.
(3) You need to linearize your sRGB values before applying the coefficients. I discuss this in another answer here.
Starting a list to enumerate how different software packages do it. Here is a good CVPR paper to read as well.
FreeImage
#define LUMA_REC709(r, g, b) (0.2126F * r + 0.7152F * g + 0.0722F * b)
#define GREY(r, g, b) (BYTE)(LUMA_REC709(r, g, b) + 0.5F)
OpenCV
nVidia Performance Primitives
Intel Performance Primitives
Matlab
nGray = 0.299F * R + 0.587F * G + 0.114F * B;
These values vary from person to person, especially for people who are colorblind.
is all this really necessary, human perception and CRT vs LCD will vary, but the R G B intensity does not, Why not L = (R + G + B)/3 and set the new RGB to L, L, L?

linear transformation function

I need to write a function that takes 4 bytes as input, performs a reversible linear transformation on this, and returns it as 4 bytes.
But wait, there is more: it also has to be distributive, so changing one byte on the input should affect all 4 output bytes.
The issues:
if I use multiplication it won't be reversible after it is modded 255 via the storage as a byte (and its needs to stay as a byte)
if I use addition it can't be reversible and distributive
One solution:
I could create an array of bytes 256^4 long and fill it in, in a one to one mapping, this would work, but there are issues: this means I have to search a graph of size 256^8 due to having to search for free numbers for every value (should note distributivity should be sudo random based on a 64*64 array of byte). This solution also has the MINOR (lol) issue of needing 8GB of RAM, making this solution nonsense.
The domain of the input is the same as the domain of the output, every input has a unique output, in other words: a one to one mapping. As I noted on "one solution" this is very possible and I have used that method when a smaller domain (just 256) was in question. The fact is, as numbers get big that method becomes extraordinarily inefficient, the delta flaw was O(n^5) and omega was O(n^8) with similar crappiness in memory usage.
I was wondering if there was a clever way to do it. In a nutshell, it's a one to one mapping of domain (4 bytes or 256^4). Oh, and such simple things as N+1 can't be used, it has to be keyed off a 64*64 array of byte values that are sudo random but recreatable for reverse transformations.
Balanced Block Mixers are exactly what you're looking for.
Who knew?
Edit! It is not possible, if you indeed want a linear transformation. Here's the mathy solution:
You've got four bytes, a_1, a_2, a_3, a_4, which we'll think of as a vector a with 4 components, each of which is a number mod 256. A linear transformation is just a 4x4 matrix M whose elements are also numbers mod 256. You have two conditions:
From Ma, we can deduce a (this means that M is an invertible matrix).
If a and a' differ in a single coordinate, then Ma and Ma' must differ in every coordinate.
Condition (2) is a little trickier, but here's what it means. Since M is a linear transformation, we know that
M(a - a) = Ma - Ma'
On the left, since a and a' differ in a single coordinate, a - a has exactly one nonzero coordinate. On the right, since Ma and Ma' must differ in every coordinate, Ma - Ma' must have every coordinate nonzero.
So the matrix M must take a vector with a single nonzero coordinate to one with all nonzero coordinates. So we just need every entry of M to be a non-zero-divisor mod 256, i.e., to be odd.
Going back to condition (1), what does it mean for M to be invertible? Since we're considering it mod 256, we just need its determinant to be invertible mod 256; that is, its determinant must be odd.
So you need a 4x4 matrix with odd entries mod 256 whose determinant is odd. But this is impossible! Why? The determinant is computed by summing various products of entries. For a 4x4 matrix, there are 4! = 24 different summands, and each one, being a product of odd entries, is odd. But the sum of 24 odd numbers is even, so the determinant of such a matrix must be even!
Here are your requirements as I understand them:
Let B be the space of bytes. You want a one-to-one (and thus onto) function f: B^4 -> B^4.
If you change any single input byte, then all output bytes change.
Here's the simplest solution I have thusfar. I have avoided posting for a while because I kept trying to come up with a better solution, but I haven't thought of anything.
Okay, first of all, we need a function g: B -> B which takes a single byte and returns a single byte. This function must have two properties: g(x) is reversible, and x^g(x) is reversible. [Note: ^ is the XOR operator.] Any such g will do, but I will define a specific one later.
Given such a g, we define f by f(a,b,c,d) = (a^b^c^d, g(a)^b^c^d, a^g(b)^c^d, a^b^g(c)^d). Let's check your requirements:
Reversible: yes. If we XOR the first two output bytes, we get a^g(a), but by the second property of g, we can recover a. Similarly for the b and c. We can recover d after getting a,b, and c by XORing the first byte with (a^b^c).
Distributive: yes. Suppose b,c, and d are fixed. Then the function takes the form f(a,b,c,d) = (a^const, g(a)^const, a^const, a^const). If a changes, then so will a^const; similarly, if a changes, so will g(a), and thus so will g(a)^const. (The fact that g(a) changes if a does is by the first property of g; if it didn't then g(x) wouldn't be reversible.) The same holds for b and c. For d, it's even easier because then f(a,b,c,d) = (d^const, d^const, d^const, d^const) so if d changes, every byte changes.
Finally, we construct such a function g. Let T be the space of two-bit values, and h : T -> T the function such that h(0) = 0, h(1) = 2, h(2) = 3, and h(3) = 1. This function has the two desired properties of g, namely h(x) is reversible and so is x^h(x). (For the latter, check that 0^h(0) = 0, 1^h(1) = 3, 2^h(2) = 1, and 3^h(3) = 2.) So, finally, to compute g(x), split x into four groups of two bits, and take h of each quarter separately. Because h satisfies the two desired properties, and there's no interaction between the quarters, so does g.
I'm not sure I understand your question, but I think I get what you're trying to do.
Bitwise Exclusive Or is your friend.
If R = A XOR B, R XOR A gives B and R XOR B gives A back. So it's a reversible transformation, assuming you know the result and one of the inputs.
Assuming I understood what you're trying to do, I think any block cipher will do the job.
A block cipher takes a block of bits (say 128) and maps them reversibly to a different block with the same size.
Moreover, if you're using OFB mode you can use a block cipher to generate an infinite stream of pseudo-random bits. XORing these bits with your stream of bits will give you a transformation for any length of data.
I'm going to throw out an idea that may or may not work.
Use a set of linear functions mod 256, with odd prime coefficients.
For example:
b0 = 3 * a0 + 5 * a1 + 7 * a2 + 11 * a3;
b1 = 13 * a0 + 17 * a1 + 19 * a2 + 23 * a3;
If I remember the Chinese Remainder Theorem correctly, and I haven't looked at it in years, the ax are recoverable from the bx. There may even be a quick way to do it.
This is, I believe, a reversible transformation. It's linear, in that af(x) mod 256 = f(ax) and f(x) + f(y) mod 256 = f(x + y). Clearly, changing one input byte will change all the output bytes.
So, go look up the Chinese Remainder Theorem and see if this works.
What you mean by "linear" transformation?
O(n), or a function f with f(c * (a+b)) = c * f(a) + c * f(b)?
An easy approach would be a rotating bitshift (not sure if this fullfils the above math definition). Its reversible and every byte can be changed. But with this it does not enforce that every byte is changed.
EDIT: My solution would be this:
b0 = (a0 ^ a1 ^ a2 ^ a3)
b1 = a1 + b0 ( mod 256)
b2 = a2 + b0 ( mod 256)
b3 = a3 + b0 ( mod 256)
It would be reversible (just subtract the first byte from the other, and then XOR the 3 resulting bytes on the first), and a change in one bit would change every byte (as b0 is the result of all bytes and impacts all others).
Stick all of the bytes into 32-bit number and then do a shl or shr (shift left or shift right) by one, two or three. Then split it back into bytes (could use a variant record). This will move bits from each byte into the adjacent byte.
There are a number of good suggestions here (XOR, etc.) I would suggest combining them.
You could remap the bits. Let's use ii for input and oo for output:
oo[0] = (ii[0] & 0xC0) | (ii[1] & 0x30) | (ii[2] & 0x0C) | (ii[3] | 0x03)
oo[1] = (ii[0] & 0x30) | (ii[1] & 0x0C) | (ii[2] & 0x03) | (ii[3] | 0xC0)
oo[2] = (ii[0] & 0x0C) | (ii[1] & 0x03) | (ii[2] & 0xC0) | (ii[3] | 0x30)
oo[3] = (ii[0] & 0x03) | (ii[1] & 0xC0) | (ii[2] & 0x30) | (ii[3] | 0x0C)
It's not linear, but significantly changing one byte in the input will affect all the bytes in the output. I don't think you can have a reversible transformation such as changing one bit in the input will affect all four bytes of the output, but I don't have a proof.

Resources