Machine learning non-linear hypothesis

Can anyone tell me how a 50 × 50 pixel image gives 7500 features if it's in RGB? This is from the non-linear hypothesis example in Andrew Ng's machine learning course.

50x50 pixels, 3 values for each pixel
So 50*50*3 = 7500
3 being the three colors in RGB (red green blue)
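As a quick sanity check, here is a minimal NumPy sketch (the array is just a placeholder) showing how a 50x50 RGB image unrolls into a 7500-dimensional feature vector:

import numpy as np

img = np.zeros((50, 50, 3), dtype=np.uint8)  # placeholder 50x50 image with 3 RGB channels
features = img.reshape(-1)                   # unroll into one feature vector
print(features.shape)                        # (7500,) == 50 * 50 * 3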

Related

iOS Metal. Why does simply changing colorPixelFormat result in brighter imagery?

In Metal on iOS the default colorPixelFormat is bgra8Unorm. When I change format to rgba16Float all imagery brightens. Why?
An example:
Artwork
MTKView with format bgra8Unorm.
Texture-mapped quad. Texture created with SRGB=false.
MTKView with format rgba16Float.
Texture-mapped quad. Texture created with SRGB=false.
Why is everything brighter with rgba16Float? My understanding is that SRGB=false implies that no gamma correction is done when importing artwork; the assumption is that the artwork has no gamma applied.
What is going on here?
If your artwork has a gamma (it does per the first image you uploaded), you have to convert it to a linear gamma if you want to use it in a linear space.
What is happening here is you are displaying gamma encoded values of the image in a linear workspace, without using color management or transform to convert those values.
BUT: Reading some of your comments, is the texture not an image but an .svg?? Did you convert your color values to linear space?
Here's the thing: RGB values are meaningless numbers unless you define how those RGB values relate to a given space.
#00FF00 in sRGB is a different color than #00FF00 in Adobe98 for instance. In your case you are going linear, but what primaries? Still using sRGB primaries? P3 Primaries? I'm not seeing a real hue shift, so I assume you are using sRGB primaries and a linear transfer curve for the second example.
THAT SAID, an RGB value of the top middle kid's green shirt is #8DB54F; normalized to 0-1, that's 0.553 0.710 0.310. These numbers by themselves don't tell you whether they are gamma encoded or not.
THE RELATIONSHIP BETWEEN sRGB, Y, and Light:
For the purposes of this discussion, we will assume the SIMPLE sRGB gamma of 1/2.2 and not the piecewise version. The same goes for L*.
In sRGB, when #8DB54F is displayed on an sRGB monitor with the sRGB gamma curve, the luminance (Y) is about 39 on a 0-100 scale.
This can be found by
(0.553^2.2)*0.2126 + (0.710^2.2)*0.7152 + (0.310^2.2)*0.0722
or 0.057 + 0.33 + 0.0061 = 0.39 and 0.39 * 100 = 39 (Y)
But if color management is told the values are linear, then the gamma correction is discarded, and (more or less):
0.553*0.2126 + 0.710*0.7152 + 0.310*0.0722
or 0.1175 + 0.5078 + 0.0223 = 0.65 and 0.65 * 100 = 65 (Y)
(Assuming the same coefficients are used.)
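Here is a minimal Python sketch of the arithmetic above, using the simplified 2.2 gamma and the same luminance coefficients (the exact decimals differ slightly from the rounded figures in the text):

# normalized sRGB values of the green shirt (#8DB54F)
r, g, b = 0.553, 0.710, 0.310
kr, kg, kb = 0.2126, 0.7152, 0.0722   # sRGB/Rec.709 luminance coefficients

# interpreted as gamma-encoded sRGB: decode with the simple 2.2 gamma first
y_decoded = (r ** 2.2) * kr + (g ** 2.2) * kg + (b ** 2.2) * kb
print(y_decoded)    # ~0.40, i.e. a Y of roughly 39-40

# interpreted (wrongly) as already-linear values: no decode step
y_as_linear = r * kr + g * kg + b * kb
print(y_as_linear)  # ~0.65, noticeably brighter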
Luminance (Y) is linear, like light. But human perception is not, and neither are sRGB values.
Y is the linear luminance from CIE XYZ; while it is spectrally weighted based on the eye's response to different wavelengths, it is NOT perceptually uniform in terms of lightness. On a scale of 0-100, Y 18.4 is perceived as the middle.
L* is the perceptual lightness from CIELAB (L* a* b*). A simplified form of its curve is:
L* ≈ 100 * (Y/100)^0.42
On a scale of 0-100, L* 50 is the "perceived middle" value. So that green shirt at Y 39 is about L* 69 when interpreted and displayed as sRGB, and at Y 65 it is about L* 84 (those numbers are based on the math; the color picker on my MacBook reports essentially the same values).
sRGB is a gamma-encoded signal, designed to make the best use of the limited bit depth of 8 bits per channel. The effective gamma curve is similar to human perception, so more bits are used to define darker areas, since human perception is more sensitive to luminance changes in dark regions. As noted above, it is a simplified curve of:
sRGB_Video = Linear_Video^0.455 (And to be noted, the MONITOR adds an exponent of about 1.1)
So if 0% is black and 100% is white, then middle grey, the point most humans will say is halfway between 0% and 100%, is:
Y 18.4% = L* 50% = sRGB 46.7%
That is, an sRGB hex value of #777777 will display a luminance of 18.4 Y, and is equivalent to a perceived lightness of 50 L*. Middle Grey.
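A quick numeric check of that middle-grey relationship in Python, again with the simplified curves, so the figures are approximate:

srgb_mid = 0x77 / 255             # #777777 -> ~0.467 normalized
y_mid = srgb_mid ** 2.2           # ~0.187 with the simple 2.2 gamma
                                  # (the piecewise sRGB curve gives the 18.4% quoted above)
lstar_mid = 100 * y_mid ** 0.42   # ~49-50, i.e. perceptual middle grey
print(srgb_mid, y_mid, lstar_mid)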
BUT WAIT, THERE'S MORE
So what is happening is that you are telling MTKView you are sending it image data that contains linear values, but you are actually sending it sRGB values, which are lighter due to the applied gamma encoding. Color management then takes what it thinks are linear values and transforms them into the values needed for the output display.
Color management needs to know what the values mean and what colorspace they relate to. When you set SRGB=false, you are telling it that you are sending linear values, not gamma-encoded values.
BUT you are clearly sending gamma-encoded values into a linear space without transforming/decoding the values to linear. Linearization won't happen unless you explicitly do it.
SOLUTION
Linearize the image data OR set the flag SRGB=true
Please let me know if you have further questions. But also, you may wish to see the Poynton Gamma FAQ or also the Color FAQ for clarification.
Also, for your grey: A linear value of 0.216 is equivalent to an sRGB (0-1) value of 0.500
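If you take the linearization route, here is a minimal Python/NumPy sketch of the sRGB-to-linear decode using the standard piecewise sRGB transfer function (how you apply it to your texture data depends on your pipeline):

import numpy as np

def srgb_to_linear(srgb):
    # decode gamma-encoded sRGB values (0-1) to linear light
    srgb = np.asarray(srgb, dtype=np.float64)
    return np.where(srgb <= 0.04045,
                    srgb / 12.92,
                    ((srgb + 0.055) / 1.055) ** 2.4)

print(srgb_to_linear(0.5))   # ~0.214, in line with the ~0.216 figure above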

Relationship of standard deviation for Gaussian filter between pixel domain and the real world

I constructed an experiment with Gaussian blur on real-world and MR images. I printed some blurred test images and compared them with digitally blurred (augmented) images.
What is the best way to express how much blurring I applied in real-world coordinates?
The image is 2560x1440 pixels, corresponding to 533x300 cm in the real world. If this image is blurred with a Gaussian with standard deviation n (filter size is ceil(3 * n) * 2 + 1), how can this be expressed in centimeters? Is it reasonable to express it as the real size of the filter in centimeters?
In short, yes, it is perfectly reasonable to express the size of the kernel in real-world coordinates.
In your case, you have 533 cm == 2560 pixels horizontally, which is 0.2082 cm per pixel. (Please edit if the question has a mistake and this should be mm instead of cm.) Vertically you have approximately the same, so we can assume isotropic sampling and leave it at 0.208 cm/px.
Given that pixel size, a standard deviation of the Gaussian of n is equivalent to a standard deviation of 0.208*n cm in the real world.
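A small Python sketch of that conversion, using the pixel counts and physical size from the question and the kernel-size formula quoted there (the example sigma is arbitrary):

import math

width_px, width_cm = 2560, 533.0
cm_per_px = width_cm / width_px              # ~0.208 cm per pixel

sigma_px = 4.0                               # example standard deviation in pixels
sigma_cm = sigma_px * cm_per_px              # the same blur expressed in centimetres

kernel_px = math.ceil(3 * sigma_px) * 2 + 1  # filter size, as defined in the question
kernel_cm = kernel_px * cm_per_px
print(sigma_cm, kernel_px, kernel_cm)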

Cluster centers in k-means?

I have 96x96 pixel grayscale facial images. I am trying to find the eye centers and lip corners. I applied one Gabor filter (theta=pi/2, lambda=1.50) to the facial image, and after convolving I get the filter output like this.
As you can see from the Gabor output, the eyes and mouth corners are clearly distinguishable. I apply scikit-learn KMeans clustering to group the pixels into 4 clusters (2 eyes and 2 lip corners):
import numpy as np
from sklearn.cluster import KMeans

# each row is one vectorized 96x96 filter output (9216 values per image)
data = output.reshape(-1, 96 * 96)
estimator = KMeans(n_clusters=4)
estimator.fit(data)
centroids = np.asarray(estimator.cluster_centers_)
print('Cluster centers', centroids.shape)
print('Labels', estimator.labels_, estimator.labels_.shape)
Output
Input X,y: (100, 96, 96) (1783, 1)
Gabor Filters (1, 9, 9)
Final output X,y (100, 96, 96) (0,)
Shape estimator.cluster_centers_: (4, 9216)
Now comes the question: how do I plot the x,y coordinates of the 4 cluster centroids? Will I see the eye centers and mouth corners?
Further information: I plotted estimator.cluster_centers_ and the output looks like a code book; I see no coordinates of the cluster centroids.
I am using the steps as described in this paper: http://jyxy.tju.edu.cn/Precision/MOEMS/doc/p36.pdf
I think there's some confusion here about the space in which you're doing your K-means clustering. In the code snippet that you included in your question, you're training up a KMeans model using the vectorized face images as data points. K-means clusters live in the same space as the data you give it, so (as you noticed) your cluster centroids will also be vectorized face images. Importantly, these face images have dimension 9216, not dimension 2 (i.e., x-y coordinates)!
To get 2-dimensional (x, y) coordinates as your K-means centroids, you need to run the algorithm using 2-dimensional input data. Just off the top of my head, it seems like you could apply a darkness threshold to your face images and assemble a clustering dataset of only the dark pixel locations. Then after you run K-means on this dataset, the centroids will hopefully be close to the pixel locations in your face images where there are the most dark pixels. These locations (assuming the face images in your training data were already somewhat registered) should be somewhat close to the eyes and mouth corners that you're hoping for.
This can be really confusing so I'll try to add an example. Let's say just for an example you have "face images" that are 3 pixels wide by 4 pixels tall. After thresholding the pixels in one of your images it might look like:
0 1 2 <-- x coordinates
0 0 0 0 ^ y coordinates
0 1 0 1 |
1 0 0 2 |
0 0 1 3 v
If you use this "image" directly in K-means, you're really running your K-means algorithm in a 12-dimensional space, and the image above would be vectorized as:
0 0 0 0 1 0 1 0 0 0 0 1
Then your K-means cluster centroids will also live in this same 12-dimensional space.
What I'm trying to suggest is that you could extract the (x, y) coordinates of the 1s in each image, and use those as the data for your K-means algorithm. So for the example image above, you'd get the following data points:
1 1
0 2
2 3
In this example, we've extracted 3 2-dimensional points from this one "image"; with more images you'd get more 2-dimensional points. After you run K-means with these 2-dimensional data points, you'll get cluster centroids that can also be interpreted as pixel locations in the original images. You could plot those centroid locations on top of your images and see where they correspond in the images.
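Here is a minimal Python sketch of that approach (the variable name output for the stacked Gabor responses and the percentile threshold are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

# output: Gabor filter responses, assumed shape (n_images, 96, 96)
response = output[0]                             # one image's filter response
mask = response > np.percentile(response, 95)    # keep only the strongest responses

# collect the (x, y) coordinates of the surviving pixels
ys, xs = np.nonzero(mask)
points = np.column_stack([xs, ys])               # shape (n_points, 2)

# cluster in 2-D coordinate space, not in 9216-D image space
estimator = KMeans(n_clusters=4, n_init=10)
estimator.fit(points)
print(estimator.cluster_centers_)                # four (x, y) locations to plot

You can then overlay those four (x, y) centroids on the original face image to check whether they land near the eyes and lip corners.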

Feature dimension of RGB color histogram?

I am unsure about this, but I want to compute features around interest points detected by SURF using an RGB color histogram. I guess the final feature will be 256-dimensional, but I am not sure this is correct.
The dimension of the RGB color histogram is determined by how many bins you use for each channel. If you use 8 bins per channel and concatenate the three per-channel histograms, the dimension will be 24 (8+8+8).
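For example, a minimal NumPy sketch of that 24-dimensional descriptor over a hypothetical patch around an interest point:

import numpy as np

def rgb_histogram(patch, bins=8):
    # one histogram of `bins` bins per channel, concatenated -> 3 * bins features
    feats = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    return np.concatenate(feats).astype(np.float64)

patch = np.random.randint(0, 256, size=(16, 16, 3))  # stand-in for a patch around a keypoint
print(rgb_histogram(patch).shape)                    # (24,)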

Viola Jones Face Detection framework

I have just started studying the Viola-Jones face detection algorithm to design a face recognition system. Of all the things I understood, I am confused about the phrase "sum of pixels". Does it mean the sum of the colors at the given pixels or the sum of the distances of the given pixels?
Generally if you see something like that they're talking about the value of a pixel (its intensity). According to OpenCV, the value of a pixel is calculated as 0.299 R + 0.587 G + 0.114 B. This is how OpenCV converts to grayscale, btw. So when they talk about the sum of pixels, they're likely talking about the sum of the pixel values in a given region (i.e. for a 3x3 region, take the value of each pixel and sum it up).
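As a concrete illustration of both points, the grayscale conversion and the sum of pixel values over a small region, here is a NumPy sketch (the image is random stand-in data):

import numpy as np

rgb = np.random.randint(0, 256, size=(10, 10, 3)).astype(np.float64)  # stand-in image

# pixel value (intensity) using the weights quoted above
gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

# "sum of pixels" for a 3x3 region: add up the intensities in that window
print(gray[0:3, 0:3].sum())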
The sum of pixels is the cumulative sum of the pixel values along the two dimensions of the image (the integral image used by Viola-Jones).
Please check the cumsum function in Matlab.
For example:
I = cumsum(cumsum(double(image)), 2)  % integral image: cumulative sum along dimension 1, then along dimension 2
Check out this link for some good info on the Viola Jones Face Detection technique
Good Luck!
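And in Python/NumPy terms, a small sketch of the same idea, the integral image, and how it yields the sum of pixels over any rectangle with four lookups (the window coordinates are just an example):

import numpy as np

img = np.random.randint(0, 256, size=(8, 8)).astype(np.float64)  # stand-in grayscale image

# integral image: cumulative sum along both dimensions
ii = img.cumsum(axis=0).cumsum(axis=1)

# sum of pixels in rows 2..4 and columns 3..6 (inclusive), via inclusion-exclusion
r0, r1, c0, c1 = 2, 4, 3, 6
total = ii[r1, c1] - ii[r0 - 1, c1] - ii[r1, c0 - 1] + ii[r0 - 1, c0 - 1]
print(total, img[r0:r1 + 1, c0:c1 + 1].sum())  # both values match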
