Drawing rgb5a1 texture format in Metal

I have some textures that I drew in OpenGL with the UNSIGNED_SHORT_5_5_5_1 type and RGBA format, and I need to draw them in Metal. The issue is that Metal only supports the bgr5a1 pixel format, so I tried to convert the texture data like this:
for (int i = count; --i; rgba++, bgra++)
{
    uint16_t nr = *rgba;
    nr = (((nr >> 1) & 31) << 11) + (nr & 1984) + ((nr >> 11) << 1) + (nr & 1);
    *bgra = nr;
}
In the shader I have this code:
const half4 colorSample = colorTexture.sample(textureSampler, in.texCoords.xy);
return float4(colorSample);
However, the colors are not right and I cannot figure out why.
(Screenshots: Metal example, OpenGL example)
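For reference, here is a minimal sketch of how the repacking could look if the only problem is the component order. It assumes the source pixels use the GL_UNSIGNED_SHORT_5_5_5_1 / RGBA layout (red in bits 11-15, green in 6-10, blue in 1-5, alpha in bit 0) and that the target bgr5a1 format stores blue in bits 0-4, green in 5-9, red in 10-14 and alpha in bit 15; both layouts should be double-checked against the OpenGL and Metal documentation. Note also that the loop above starts at count and pre-decrements, so it converts only count - 1 pixels.
// Hedged sketch: repack RGBA5551 (alpha in the low bit) into B5 G5 R5 A1
// (alpha in the high bit). The destination bit layout is an assumption.
for (int i = 0; i < count; i++, rgba++, bgra++)
{
    uint16_t v = *rgba;
    uint16_t r = (v >> 11) & 0x1F;
    uint16_t g = (v >> 6)  & 0x1F;
    uint16_t b = (v >> 1)  & 0x1F;
    uint16_t a =  v        & 0x01;
    *bgra = (uint16_t)((a << 15) | (r << 10) | (g << 5) | b);
}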

Is there a direct way to get a unique value representing RGB color in opencv C++

My image is an RGB image. I want to get a unique value (such as a unicode value) to represent the RGB color value of a certain pixel. For example, if a pixel's red channel = 23, green channel = 200, and blue channel = 45, this RGB color could be represented by 232765. I wish there were a direct OpenCV C++ function to get such a value from a pixel. And note that this value should be unique for that RGB value.
I want something like this and I know this is not correct.
uniqueColorForPixel_i_j=(matImage.at<Vec3b>(i,j)).getUniqueColor();
I hope something could be done if we can get the Scalar value of a pixel. And just as RNG can generate a random Scalar RGB value from a number, can we get the inverse...
Here is a small code sample showing how to pass a Vec3b directly to the function, plus an alternative to the shift-and approach.
The code is based on this answer.
UPDATE
I also added a simple struct BGR that handles the conversion between Vec3b and unsigned more easily.
UPDATE 2
The code in your question:
uniqueColorForPixel_i_j=(matImage.at<Vec3b>(i,j)).getUniqueColor();
doesn't work because you're trying to call the method getUniqueColor() on a Vec3b, which doesn't have this method. You should instead pass the Vec3b as the argument of unsigned getUniqueColor(const Vec3b& v);.
The code should clarify this:
#include <opencv2/opencv.hpp>
using namespace cv;

unsigned getUniqueColor_v1(const Vec3b& v)
{
    return ((v[2] & 0xff) << 16) + ((v[1] & 0xff) << 8) + (v[0] & 0xff);
}

unsigned getUniqueColor_v2(const Vec3b& v)
{
    // Note: this reads 4 bytes starting at the 3-byte Vec3b, so it relies on
    // the byte right after the vector being readable; v1 is the safer choice.
    return 0x00ffffff & *((unsigned*)(v.val));
}

struct BGR
{
    Vec3b v;
    unsigned u;

    BGR(const Vec3b& v_) : v(v_) {
        u = ((v[2] & 0xff) << 16) + ((v[1] & 0xff) << 8) + (v[0] & 0xff);
    }

    BGR(unsigned u_) : u(u_) {
        v[0] = uchar(u & 0xff);
        v[1] = uchar((u >> 8) & 0xff);
        v[2] = uchar((u >> 16) & 0xff);
    }
};
int main()
{
    Vec3b v(45, 200, 23);

    unsigned col1 = getUniqueColor_v1(v);
    unsigned col2 = getUniqueColor_v2(v);
    unsigned col3 = BGR(v).u;
    // col1 == col2 == col3
    //
    // hex: 0x0017c82d
    // dec: 1558573

    Vec3b v2 = BGR(col3).v;
    // v2 == v

    //////////////////////////////
    // Taking values from a mat
    //////////////////////////////

    // Just 2 10x10 green mats
    Mat mat1(10, 10, CV_8UC3);
    mat1.setTo(Vec3b(0, 255, 0));
    Mat3b mat2(10, 10, Vec3b(0, 255, 0));

    int row = 2;
    int col = 3;

    unsigned u1 = getUniqueColor_v1(mat1.at<Vec3b>(row, col));
    unsigned u2 = BGR(mat1.at<Vec3b>(row, col)).u;
    unsigned u3 = getUniqueColor_v1(mat2(row, col));
    unsigned u4 = BGR(mat2(row, col)).u;
    // u1 == u2 == u3 == u4

    return 0;
}

ios metal: multiple kernel calls in one command buffer

I'm having a problem with the implementation of multiple kernel functions in Metal in combination with Swift.
My target is to implement a block-wise DCT transformation over an image. The DCT is implemented with two matrix multiplications.
J = H * I * H^-1
The following code shows the kernel functions themselves and the calls used in the Swift code. If I run each kernel function alone it works, but I can't manage to hand over the write buffer of the first kernel function to the second one. The second function therefore always returns a buffer filled with just zeros.
All the image input and output buffers are 400x400 RGB (a 16-bit integer per component). The matrices are 8x8 16-bit integers.
Is there a special command needed to synchronize the buffer read and write accesses of the different kernel functions? Or am I doing something else wrong?
Thanks for your help
shaders.metal
struct Image3D16 {
    short data[400][400][3];
};

struct Matrix {
    short data[8 * 8];
};

kernel void dct1(device Image3D16 *inputImage [[buffer(0)]],
                 device Image3D16 *outputImage [[buffer(1)]],
                 device Matrix *mult [[buffer(2)]],
                 uint2 gid [[thread_position_in_grid]],
                 uint2 tid [[thread_position_in_threadgroup]]) {
    int red = 0, green = 0, blue = 0;
    for (int x = 0; x < 8; x++) {
        short r = inputImage->data[gid.x-tid.x + x][gid.y][0];
        short g = inputImage->data[gid.x-tid.x + x][gid.y][1];
        short b = inputImage->data[gid.x-tid.x + x][gid.y][2];
        red   += r * mult->data[tid.x*8 + x];
        green += g * mult->data[tid.x*8 + x];
        blue  += b * mult->data[tid.x*8 + x];
    }
    outputImage->data[gid.x][gid.y][0] = red;
    outputImage->data[gid.x][gid.y][1] = green;
    outputImage->data[gid.x][gid.y][2] = blue;
}

kernel void dct2(device Image3D16 *inputImage [[buffer(0)]],
                 device Image3D16 *outputImage [[buffer(1)]],
                 device Matrix *mult [[buffer(2)]],
                 uint2 gid [[thread_position_in_grid]],
                 uint2 tid [[thread_position_in_threadgroup]]) {
    int red = 0, green = 0, blue = 0;
    for (int y = 0; y < 8; y++) {
        short r = inputImage->data[gid.x][gid.y-tid.y + y][0];
        short g = inputImage->data[gid.x][gid.y-tid.y + y][1];
        short b = inputImage->data[gid.x][gid.y-tid.y + y][2];
        red   += r * mult->data[tid.y*8 + y];
        green += g * mult->data[tid.y*8 + y];
        blue  += b * mult->data[tid.y*8 + y];
    }
    outputImage->data[gid.x][gid.y][0] = red;
    outputImage->data[gid.x][gid.y][1] = green;
    outputImage->data[gid.x][gid.y][2] = blue;
}
ViewController.swift
...
let commandBuffer = commandQueue.commandBuffer()
let computeEncoder1 = commandBuffer.computeCommandEncoder()
computeEncoder1.setComputePipelineState(computeDCT1)
computeEncoder1.setBuffer(input, offset: 0, atIndex: 0)
computeEncoder1.setBuffer(tmpBuffer3D1, offset: 0, atIndex: 1)
computeEncoder1.setBuffer(dctMatrix1, offset: 0, atIndex: 2)
computeEncoder1.dispatchThreadgroups(blocks, threadsPerThreadgroup: dctSize)
computeEncoder1.endEncoding()
let computeEncoder2 = commandBuffer.computeCommandEncoder()
computeEncoder2.setComputePipelineState(computeDCT2)
computeEncoder2.setBuffer(tmpBuffer3D1, offset: 0, atIndex: 0)
computeEncoder2.setBuffer(output, offset: 0, atIndex: 1)
computeEncoder2.setBuffer(dctMatrix2, offset: 0, atIndex: 2)
computeEncoder2.dispatchThreadgroups(blocks, threadsPerThreadgroup: dctSize)
computeEncoder2.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
I found the error. My kernel function tried to read outside of its allocated memory. Metal's reaction is then to stop executing all following commands in the command buffer. That is why the output was always zero: the computation was never done. A drop in the application's GPU usage can be used to detect this kind of error.
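For illustration only, one common way to rule out this kind of out-of-bounds access is to early-out on threads that fall outside the image, in case the dispatched grid is ever larger than 400x400. This is a sketch reusing the Image3D16 and Matrix structs from above; whether this guard matches the actual cause here is an assumption:
kernel void dct1_guarded(device Image3D16 *inputImage [[buffer(0)]],
                         device Image3D16 *outputImage [[buffer(1)]],
                         device Matrix *mult [[buffer(2)]],
                         uint2 gid [[thread_position_in_grid]],
                         uint2 tid [[thread_position_in_threadgroup]]) {
    // Skip threads outside the 400x400 image so no read or write
    // can go past the end of the buffers.
    if (gid.x >= 400 || gid.y >= 400) {
        return;
    }
    int red = 0, green = 0, blue = 0;
    for (int x = 0; x < 8; x++) {
        short r = inputImage->data[gid.x-tid.x + x][gid.y][0];
        short g = inputImage->data[gid.x-tid.x + x][gid.y][1];
        short b = inputImage->data[gid.x-tid.x + x][gid.y][2];
        red   += r * mult->data[tid.x*8 + x];
        green += g * mult->data[tid.x*8 + x];
        blue  += b * mult->data[tid.x*8 + x];
    }
    outputImage->data[gid.x][gid.y][0] = red;
    outputImage->data[gid.x][gid.y][1] = green;
    outputImage->data[gid.x][gid.y][2] = blue;
}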

How to efficiently merge two overlapping contours into one big contour?

I have a huge image (about 63000 x 63000 pixels = 3969 megapixels).
What I have done so far is split it into tiles of 1024 x 1024 and do my calculations based on these tiles, resulting in a 62 x 62 tile grid.
(This works out very well and has the advantage of making the image viewable with zoom in and zoom out; only the visible tiles are downsized, for example.)
But what I need now are the contours of the huge image!
I use the OpenCV function findContours to detect contours on each of the tiles.
I have added some overlap to the tiles so I get overlapping contours (1 pixel overlap).
I used the offset parameter of findContours to shift the contours to the right position in the "virtual total image".
Here are some screenshots I made from a demo application. What I want is this: (screenshots)
Now my questions:
Is it possible to stitch the contours? My worst case is a contour which covers the total image... Is there some library that can do this?
Is there a library which works on a compressed version of the total image (like RLE, for example)?
Is there a way to make OpenCV findContours work on 1-bit binary images? (A sketch for this is added at the end of this question.)
Here's the code that calls findContours:
// Surf2DTiledData ...a gobject based class used for 2d tile management and viewing..
Surf2DTiledData* td = (Surf2DTiledData*)in_td;
int nr_hor_tiles = surf2_d_tiled_data_get_nr_hor_tiles(td);
int nr_ver_tiles = surf2_d_tiled_data_get_nr_ver_tiles(td);
int tile_size_x = surf2_d_tiled_data_get_tile_width(td);
int tile_size_y = surf2_d_tiled_data_get_tile_height(td);
contouring_data_obj = surf2_d_tiled_data_get_ContouringData(td);
p_contours = contouring_data_obj->p_contours;
p_border_contours = contouring_data_obj->p_border_contours;
g_return_if_fail(p_border_contours != NULL);
g_return_if_fail(p_contours != NULL);
for (y = 0; y < nr_ver_tiles; y++){
    int x;
    for (x = 0; x < nr_hor_tiles; x++){
        int idx = x + y*nr_hor_tiles;
        CvMemStorage *mem = contouring_data_obj->contour_storage[idx];
        CvMat _src;
        CvSeq *contours = NULL;
        uchar* dataBuffer = (uchar*)p_data[x][y];
        // the idea is to have some extra space available for the overlap
        // detection of contours!
        // the extra space is needed for the algorithm to check for
        // overlaps of contours later on!
#define VIRT_BORDER_EXTEND 2
        int virtual_x = x * tile_size_x - VIRT_BORDER_EXTEND;
        int virtual_y = y * tile_size_y - VIRT_BORDER_EXTEND;
        int virtual_width = tile_size_x + VIRT_BORDER_EXTEND * 2;
        int virtual_height = tile_size_y + VIRT_BORDER_EXTEND * 2;
        int x_off = -VIRT_BORDER_EXTEND;
        int y_off = -VIRT_BORDER_EXTEND;
        if (virtual_x < 0) {
            virtual_width += virtual_x;
            virtual_x = 0;
            x_off = 0;
        }
        if (virtual_y < 0) {
            virtual_height += virtual_y;
            virtual_y = 0;
            y_off = 0;
        }
        if ((virtual_x + virtual_width) > (nr_hor_tiles*tile_size_x)) {
            virtual_width = nr_hor_tiles*tile_size_x - virtual_x;
        }
        if ((virtual_y + virtual_height) > (nr_ver_tiles*tile_size_y)) {
            virtual_height = nr_ver_tiles*tile_size_y - virtual_y;
        }
        CvMat* _roi_mat = get_roi_mat(td,
                                      virtual_x, virtual_y,
                                      virtual_width, virtual_height);
        // Use either this:
        //mem = cvCreateMemStorage(0);
        if (_roi_mat){
            // CV_LINK_RUNS => different algorithm!!!!
            int tile_off_x = tile_size_x * x;
            int tile_off_y = tile_size_y * y;
            CvPoint contour_shift = cvPoint(x_off + tile_off_x, y_off + tile_off_y);
            int n = cvFindContours(_roi_mat, mem, &contours, sizeof(CvContour), CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE, contour_shift);
            cvReleaseMat(&_roi_mat);
            p_contours[x][y] = contours;
        }
        //cvReleaseMemStorage(&mem);
    }
}
Later I used OpenGL to make textures out of the tiles, and for every tile there is a quad.
The OpenCV contours are not drawn, as this could be too slow for now, but I draw their bounding boxes, which are drawn in OpenGL too.
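Regarding the 1-bit question above: cvFindContours expects an 8-bit single-channel input, so one workaround is to unpack the bits into a temporary CV_8UC1 tile right before the call. A minimal sketch, assuming the packed data is MSB-first with a given row stride in bytes; the helper name and packing order are assumptions:
// Hypothetical helper: expand a packed 1-bit-per-pixel tile into an
// 8-bit 0/255 matrix that cvFindContours accepts.
CvMat* unpack_1bit_to_8bit(const uchar* packed, int width, int height, int row_stride)
{
    CvMat* out = cvCreateMat(height, width, CV_8UC1);
    for (int y = 0; y < height; y++) {
        const uchar* src_row = packed + y * row_stride;
        uchar* dst_row = out->data.ptr + y * out->step;
        for (int x = 0; x < width; x++)
            dst_row[x] = (src_row[x >> 3] & (0x80 >> (x & 7))) ? 255 : 0;
    }
    return out;
}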

accelerate rgb planar to rgba interleaved conversion using sse or mmx

I have to pass medical image data retrieved from one proprietary device SDK to an image processing function in another - also proprietary - device SDK from a second vendor.
The first function gives me the image in a planar rgb format:
int mrcpgk_retrieve_frame(uint16_t *r, uint16_t *g, uint16_t *b, int w, int h);
The reason for uint16_t is that the device can be switched to output each color value encoded as 16-bit floating point values. However, I'm operating in "byte mode" and thus the upper 8 bits of each color value are always zero.
The second function from another device SDK is defined like this:
BOOL process_cpgk_image(const PBYTE rgba, DWORD width, DWORD height);
So we get three buffers filled with the following bits (16-bit planar RGB):
R: 00000000 rrrrrrrr 00000000 rrrrrrrr ...
G: 00000000 gggggggg 00000000 gggggggg ...
B: 00000000 bbbbbbbb 00000000 bbbbbbbb ...
And the desired output illustrated in bits is:
RGBA: rrrrrrrrggggggggbbbbbbbb00000000 rrrrrrrrggggggggbbbbbbbb00000000 ....
We don't have access to the source code of these functions and cannot change the environment. Currently we have implemented the following basic "bridge" to connect the two devices:
void process_frames(int width, int height)
{
    uint16_t *r = (uint16_t*)malloc(width*height*sizeof(uint16_t));
    uint16_t *g = (uint16_t*)malloc(width*height*sizeof(uint16_t));
    uint16_t *b = (uint16_t*)malloc(width*height*sizeof(uint16_t));
    uint8_t *rgba = (uint8_t*)malloc(width*height*4);
    int i;

    memset(rgba, 0, width*height*4);

    while ( mrcpgk_retrieve_frame(r, g, b, width, height) != 0 )
    {
        for (i=0; i<width*height; i++)
        {
            rgba[4*i+0] = (uint8_t)r[i];
            rgba[4*i+1] = (uint8_t)g[i];
            rgba[4*i+2] = (uint8_t)b[i];
        }
        process_cpgk_image(rgba, width, height);
    }

    free(r);
    free(g);
    free(b);
    free(rgba);
}
This code works perfectly fine but processing takes very long for many thousands of high resolution images. The two functions for processing and retrieving are very fast and our bridge is currently the bottleneck.
I know how to do basic arithmetic, logical and shifting operations with SSE2 intrinsics, but I wonder if and how this 16-bit planar RGB to packed RGBA conversion can be accelerated with MMX, SSE2 or [S]SSE3?
(SSE2 would be preferable because there are still some pre-2005 appliances in use).
Here is a simple SSE2 implementation:
#include <emmintrin.h> // SSE2 intrinsics

assert((width*height)%8 == 0); // NB: total pixels must be a multiple of 8

for (i=0; i<width*height; i+=8)
{
    __m128i vr = _mm_load_si128((__m128i *)&r[i]);          // load 8 pixels from r[i]
    __m128i vg = _mm_load_si128((__m128i *)&g[i]);          // load 8 pixels from g[i]
    __m128i vb = _mm_load_si128((__m128i *)&b[i]);          // load 8 pixels from b[i]
    __m128i vrg = _mm_or_si128(vr, _mm_slli_epi16(vg, 8));  // merge r/g
    __m128i vrgba = _mm_unpacklo_epi16(vrg, vb);            // permute first 4 pixels
    _mm_store_si128((__m128i *)&rgba[4*i], vrgba);          // store first 4 pixels to rgba[4*i]
    vrgba = _mm_unpackhi_epi16(vrg, vb);                    // permute second 4 pixels
    _mm_store_si128((__m128i *)&rgba[4*i+16], vrgba);       // store second 4 pixels to rgba[4*i+16]
}
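One caveat with the version above: _mm_load_si128 and _mm_store_si128 require 16-byte aligned pointers, and plain malloc as used in the bridge does not guarantee that on every platform. A sketch of the same loop with unaligned loads and stores, which is safe for arbitrary pointers at a small cost on older CPUs:
#include <emmintrin.h> // SSE2 intrinsics

for (i = 0; i < width*height; i += 8) // still assumes width*height is a multiple of 8
{
    __m128i vr = _mm_loadu_si128((__m128i *)&r[i]);          // load 8 red samples
    __m128i vg = _mm_loadu_si128((__m128i *)&g[i]);          // load 8 green samples
    __m128i vb = _mm_loadu_si128((__m128i *)&b[i]);          // load 8 blue samples
    __m128i vrg = _mm_or_si128(vr, _mm_slli_epi16(vg, 8));   // r | g<<8 in each 16-bit lane
    _mm_storeu_si128((__m128i *)&rgba[4*i],      _mm_unpacklo_epi16(vrg, vb)); // first 4 pixels
    _mm_storeu_si128((__m128i *)&rgba[4*i + 16], _mm_unpackhi_epi16(vrg, vb)); // second 4 pixels
}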
Reference implementation using AVX2 instructions:
#include <immintrin.h> // AVX2 intrinsics

assert((width*height)%16 == 0); // total pixel count must be a multiple of 16
assert(((uintptr_t)r % 32) == 0 && ((uintptr_t)g % 32) == 0 &&
       ((uintptr_t)b % 32) == 0 && ((uintptr_t)rgba % 32) == 0); // all pointers must have 32-byte alignment

for (i=0; i<width*height; i+=16)
{
    __m256i vr = _mm256_permute4x64_epi64(_mm256_load_si256((__m256i *)(r + i)), 0xD8); // load 16 pixels from r[i]
    __m256i vg = _mm256_permute4x64_epi64(_mm256_load_si256((__m256i *)(g + i)), 0xD8); // load 16 pixels from g[i]
    __m256i vb = _mm256_permute4x64_epi64(_mm256_load_si256((__m256i *)(b + i)), 0xD8); // load 16 pixels from b[i]
    __m256i vrg = _mm256_or_si256(vr, _mm256_slli_si256(vg, 1));                        // merge r/g
    __m256i vrgba = _mm256_unpacklo_epi16(vrg, vb);                                     // permute first 8 pixels
    _mm256_store_si256((__m256i *)(rgba + 4*i), vrgba);                                 // store first 8 pixels to rgba[4*i]
    vrgba = _mm256_unpackhi_epi16(vrg, vb);                                             // permute second 8 pixels
    _mm256_store_si256((__m256i *)(rgba + 4*i + 32), vrgba);                            // store second 8 pixels to rgba[4*i + 32]
}

equalize/normalize Hue Saturation Brightness in color images with OpenCV

I want to equalize two half-face color images of the same subject and then merge them. Each of them has different values of hue, saturation and brightness... Using OpenCV, how can I normalize/equalize each half image?
I tried performing cvEqualizeHist(v, v); on the V channel of the converted HSV image, but the two images still differ significantly, and after the merge there is still a line between the colors of the two halves... thanks.
Have you tried reading this link? http://answers.opencv.org/question/75510/how-to-make-auto-adjustmentsbrightness-and-contrast-for-image-android-opencv-image-correction/
void Utils::BrightnessAndContrastAuto(const cv::Mat &src, cv::Mat &dst, float clipHistPercent)
{
    CV_Assert(clipHistPercent >= 0);
    CV_Assert((src.type() == CV_8UC1) || (src.type() == CV_8UC3) || (src.type() == CV_8UC4));

    int histSize = 256;
    float alpha, beta;
    double minGray = 0, maxGray = 0;

    // to calculate the grayscale histogram
    cv::Mat gray;
    if (src.type() == CV_8UC1) gray = src;
    else if (src.type() == CV_8UC3) cvtColor(src, gray, CV_BGR2GRAY);
    else if (src.type() == CV_8UC4) cvtColor(src, gray, CV_BGRA2GRAY);

    if (clipHistPercent == 0)
    {
        // keep full available range
        cv::minMaxLoc(gray, &minGray, &maxGray);
    }
    else
    {
        cv::Mat hist; // the grayscale histogram
        float range[] = { 0, 256 };
        const float* histRange = { range };
        bool uniform = true;
        bool accumulate = false;
        calcHist(&gray, 1, 0, cv::Mat(), hist, 1, &histSize, &histRange, uniform, accumulate);

        // calculate the cumulative distribution from the histogram
        std::vector<float> accumulator(histSize);
        accumulator[0] = hist.at<float>(0);
        for (int i = 1; i < histSize; i++)
        {
            accumulator[i] = accumulator[i - 1] + hist.at<float>(i);
        }

        // locate the points that cut at the required value
        float max = accumulator.back();
        clipHistPercent *= (max / 100.0); // turn the percentage into an absolute count
        clipHistPercent /= 2.0;           // left and right wings

        // locate left cut
        minGray = 0;
        while (accumulator[minGray] < clipHistPercent)
            minGray++;

        // locate right cut
        maxGray = histSize - 1;
        while (accumulator[maxGray] >= (max - clipHistPercent))
            maxGray--;
    }

    // current range
    float inputRange = maxGray - minGray;

    alpha = (histSize - 1) / inputRange; // alpha expands the current range to the histSize range
    beta = -minGray * alpha;             // beta shifts the current range so that minGray goes to 0

    // Apply brightness and contrast normalization
    // convertTo operates with saturate_cast
    src.convertTo(dst, -1, alpha, beta);

    // restore the alpha channel from the source
    if (dst.type() == CV_8UC4)
    {
        int from_to[] = { 3, 3 };
        cv::mixChannels(&src, 1, &dst, 1, from_to, 1); // one source mat, one destination mat
    }
    return;
}
I'm not sure, as I'm now facing the same problem, but maybe try to equalize the H & S values instead of the V?
Also try manually adjusting it using Photoshop to see what works best, and then try to replicate it in code.
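If you go down that road, here is a minimal sketch of splitting the HSV image and equalizing selected channels (the helper name is made up; whether equalizing S, V, or both gives the best match is something to test on your images, and equalizing H itself usually shifts the colors noticeably):
#include <opencv2/opencv.hpp>

// Hypothetical helper: equalize the chosen HSV channels of an 8-bit BGR image.
cv::Mat equalizeHsvChannels(const cv::Mat& bgr, bool equalizeS, bool equalizeV)
{
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, CV_BGR2HSV);

    std::vector<cv::Mat> channels;
    cv::split(hsv, channels);                  // channels[0]=H, [1]=S, [2]=V
    if (equalizeS) cv::equalizeHist(channels[1], channels[1]);
    if (equalizeV) cv::equalizeHist(channels[2], channels[2]);
    cv::merge(channels, hsv);

    cv::Mat bgrOut;
    cv::cvtColor(hsv, bgrOut, CV_HSV2BGR);
    return bgrOut;
}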
