Metal causes segmentation fault: 11 - iOS

A segmentation fault occurs occasionally. I have no idea whether it is caused by the [MTLComputeCommandEncoder setComputePipelineState:] method. The related code is listed below:
void metal_func( void *metal_context, int16_t *dst, uint8_t *_src, int _srcstride, int height, int mx, int my, int width)
{
int x, y;
pixel *src = (pixel*)_src;
int srcstride = _srcstride / sizeof(pixel);
int16_t dst_buf[4900];
int16_t out_buf[4900];
int16_t *pdst = dst_buf;
int16_t *pout = out_buf;
uint8_t local_src[4900];
MetalContext *mc = metal_context;
memset( out_buf, 0, sizeof(int16_t)*4900 );
int dst_size = sizeof(int16_t)*4900;
int src_size = sizeof(uint8_t)*4900;
id<MTLDevice> device = mc->metal_device;
id<MTLCommandQueue> commandQueue = mc->metal_commandqueue;
id<MTLComputePipelineState> cpipeline = mc->metal_cps_v;
// Buffer for storing encoded commands that are sent to GPU
id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
id<MTLBuffer> dst_buffer;
id<MTLBuffer> src_buffer;
id<MTLBuffer> stride_buffer;
id<MTLBuffer> mx_buffer;
id<MTLBuffer> my_buffer;
id<MTLBuffer> depth_buffer;
id <MTLComputeCommandEncoder> computeCommandEncoder;
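// launch configuration: threadgroups of a single thread each, 70*height groups in total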
MTLSize ts= {1, 1, 1};
MTLSize numThreadgroups = {70*height, 1, 1};
int m_x = mx;
int m_y = my;
int s = _srcstride / sizeof(pixel);
int i_size = sizeof(int);
int dpt = BIT_DEPTH;
memset( pdst, 0, 4900*sizeof(int16_t));
//copy data to the local_src_buffer
uint8_t *pcsrc = _src - 3*s;
uint8_t *pcdst = local_src;
memset( local_src, 0, sizeof(uint8_t)*4900 );
for( int i = 0; i < height+7; i++ )
{
memcpy( pcdst, pcsrc, sizeof(uint8_t)*width);
pcsrc += s;
pcdst += 70;
}
int local_src_stride = 70;
computeCommandEncoder = [commandBuffer computeCommandEncoder];
//set kernel function parameters
dst_buffer = [device newBufferWithBytes: pdst length: dst_size options: MTLResourceOptionCPUCacheModeDefault ];
[computeCommandEncoder setBuffer: dst_buffer offset: 0 atIndex: 0 ];
src_buffer = [device newBufferWithBytes: local_src length: src_size options: MTLResourceOptionCPUCacheModeDefault ];
[computeCommandEncoder setBuffer: src_buffer offset: 0 atIndex: 1 ];
stride_buffer = [device newBufferWithBytes: &local_src_stride length: i_size options: MTLResourceOptionCPUCacheModeDefault ];
[computeCommandEncoder setBuffer: stride_buffer offset: 0 atIndex: 2 ];
mx_buffer = [device newBufferWithBytes: &m_x length: i_size options: MTLResourceOptionCPUCacheModeDefault ];
[computeCommandEncoder setBuffer: mx_buffer offset: 0 atIndex: 3 ];
my_buffer = [device newBufferWithBytes: &m_y length: i_size options: MTLResourceOptionCPUCacheModeDefault ];
[computeCommandEncoder setBuffer: my_buffer offset: 0 atIndex: 4 ];
depth_buffer = [device newBufferWithBytes: &dpt length: i_size options: MTLResourceOptionCPUCacheModeDefault ];
[computeCommandEncoder setBuffer: depth_buffer offset: 0 atIndex: 5 ];
[computeCommandEncoder setComputePipelineState:cpipeline ];
//occasionally, a segmentation fault was reported just after the above commands
[computeCommandEncoder dispatchThreadgroups:numThreadgroups threadsPerThreadgroup:ts];
[computeCommandEncoder endEncoding ];
[ commandBuffer commit];
[ commandBuffer waitUntilCompleted];
//get the data computed by GPU
NSData* outdata = [NSData dataWithBytesNoCopy:[dst_buffer contents] length: dst_size freeWhenDone:false ];
[outdata getBytes:pout length:dst_size];
[dst_buffer release];
[src_buffer release];
[stride_buffer release];
[mx_buffer release];
[my_buffer release];
[depth_buffer release];
pout = out_buf;
pdst = dst;
for( int j = 0; j < height; j++ )
{
memcpy( pdst, pout, sizeof(int16_t)*width );
pdst += MAX_PB_SIZE;
pout += 70;
}
}
MetalContext is defined as follows. Its three members are initialized once, outside this function, when the program starts.
typedef struct {
void * metal_device;
void * metal_commandqueue;
void * metal_cps_v;
}MetalContext;
The code does not run successfully every time. Sometimes "segmentation fault: 11" is reported after the call [computeCommandEncoder setComputePipelineState:cpipeline]. Is there anything wrong?
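One thing worth ruling out (an assumption on my part, not something the question confirms) is object lifetime. MetalContext stores the Metal objects in raw void * fields, so nothing retains them through the struct; if no other strong reference keeps them alive, metal_cps_v can become a dangling pointer and crash exactly inside setComputePipelineState:. A minimal sketch that makes the ownership explicit, assuming ARC (under manual reference counting, an explicit retain at initialization serves the same purpose):

void metal_context_init( MetalContext *mc, id<MTLDevice> device,
                         id<MTLCommandQueue> queue,
                         id<MTLComputePipelineState> cps )
{
    // __bridge_retained transfers ownership to the struct, so ARC will not
    // deallocate the objects while the raw pointers are still in use
    mc->metal_device       = (__bridge_retained void *)device;
    mc->metal_commandqueue = (__bridge_retained void *)queue;
    mc->metal_cps_v        = (__bridge_retained void *)cps;
}

// inside metal_func, bridge back without transferring ownership and fail fast
id<MTLComputePipelineState> cpipeline =
    (__bridge id<MTLComputePipelineState>)mc->metal_cps_v;
NSCAssert( cpipeline != nil, @"compute pipeline state must not be nil" );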

Related

AES CBC encrypt/decrypt results are different (Objective-C, C#)

I have C# code for AES decryption, and I want to produce a matching encryption result in Objective-C, but I have failed. I can change the Objective-C code; what should I do?
The C# code for decryption:
private static readonly string AES_KEY = "asdfasdfasdfasdf";
private static readonly int BUFFER_SIZE = 1024 * 4;
private static readonly int KEY_SIZE = 128;
private static readonly int BLOCK_SIZE = 128;
static public string Composite(string value)
{
using (AesManaged aes = new AesManaged())
using (MemoryStream ims = new MemoryStream(Convert.FromBase64String(value), false))
{
aes.KeySize = KEY_SIZE;
aes.BlockSize = BLOCK_SIZE;
aes.Mode = CipherMode.CBC;
aes.Key = Encoding.UTF8.GetBytes(AES_KEY);
byte[] iv = new byte[aes.IV.Length];
ims.Read(iv, 0, iv.Length);
aes.IV = iv;
using (ICryptoTransform ce = aes.CreateDecryptor(aes.Key, aes.IV))
using (CryptoStream cs = new CryptoStream(ims, ce, CryptoStreamMode.Read))
using (DeflateStream ds = new DeflateStream(cs, CompressionMode.Decompress))
using (MemoryStream oms = new MemoryStream())
{
byte[] buf = new byte[BUFFER_SIZE];
for (int size = ds.Read(buf, 0, buf.Length); size > 0; size = ds.Read(buf, 0, buf.Length))
{
oms.Write(buf, 0, size);
}
return Encoding.UTF8.GetString(oms.ToArray());
}
}
}
The Objective-C code for encryption:
- (NSString *)AES128EncryptWithKey:(NSString *)key
{
NSData *plainData = [self dataUsingEncoding:NSUTF8StringEncoding];
NSData *encryptedData = [plainData AES128EncryptWithKey:key];
NSString *encryptedString = [encryptedData stringUsingEncodingBase64];
return encryptedString;
}
#import "NSData+AESCrypt.h"
#import <CommonCrypto/CommonCryptor.h>
static char encodingTable[64] =
{
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P',
'Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f',
'g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v',
'w','x','y','z','0','1','2','3','4','5','6','7','8','9','+','/'
};
@implementation NSData (AESCrypt)
- (NSData *)AES128EncryptWithKey:(NSString *)key
{
// 'key' should be 16 bytes for AES128
char keyPtr[kCCKeySizeAES128 + 1]; // room for terminator (unused)
bzero( keyPtr, sizeof( keyPtr ) ); // fill with zeroes (for padding)
// fetch key data
[key getCString:keyPtr maxLength:sizeof( keyPtr ) encoding:NSUTF8StringEncoding];
NSUInteger dataLength = [self length];
//See the doc: For block ciphers, the output size will always be less than or
//equal to the input size plus the size of one block.
//That's why we need to add the size of one block here
size_t bufferSize = dataLength + kCCBlockSizeAES128;
void *buffer = malloc( bufferSize );
size_t numBytesEncrypted = 0;
CCCryptorStatus cryptStatus = CCCrypt( kCCEncrypt, kCCAlgorithmAES128, kCCModeCBC | kCCOptionPKCS7Padding,
keyPtr, kCCKeySizeAES128,
NULL /* initialization vector (optional) */,
[self bytes], dataLength, /* input */
buffer, bufferSize, /* output */
&numBytesEncrypted );
if( cryptStatus == kCCSuccess )
{
//the returned NSData takes ownership of the buffer and will free it on deallocation
return [NSData dataWithBytesNoCopy:buffer length:numBytesEncrypted];
}
free( buffer ); //free the buffer
return nil;
}
- (NSString *)base64Encoding
{
const unsigned char *bytes = [self bytes];
NSMutableString *result = [NSMutableString stringWithCapacity:self.length];
unsigned long ixtext = 0;
unsigned long lentext = self.length;
long ctremaining = 0;
unsigned char inbuf[3], outbuf[4];
unsigned short i = 0;
unsigned short charsonline = 0, ctcopy = 0;
unsigned long ix = 0;
while( YES )
{
ctremaining = lentext - ixtext;
if( ctremaining <= 0 ) break;
for( i = 0; i < 3; i++ )
{
ix = ixtext + i;
if( ix < lentext ) inbuf[i] = bytes[ix];
else inbuf [i] = 0;
}
outbuf [0] = (inbuf [0] & 0xFC) >> 2;
outbuf [1] = ((inbuf [0] & 0x03) << 4) | ((inbuf [1] & 0xF0) >> 4);
outbuf [2] = ((inbuf [1] & 0x0F) << 2) | ((inbuf [2] & 0xC0) >> 6);
outbuf [3] = inbuf [2] & 0x3F;
ctcopy = 4;
switch( ctremaining )
{
case 1:
ctcopy = 2;
break;
case 2:
ctcopy = 3;
break;
}
for( i = 0; i < ctcopy; i++ )
[result appendFormat:@"%c", encodingTable[outbuf[i]]];
for( i = ctcopy; i < 4; i++ )
[result appendString:@"="];
ixtext += 3;
charsonline += 4;
}
return [NSString stringWithString:result];
}
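Note the mismatch between the two sides: Composite() base64-decodes its input, reads the IV from the first 16 bytes, AES-CBC-decrypts the rest, and then deflate-decompresses the plaintext, while the Objective-C encryptor passes NULL as the IV, never prepends one, and does not compress. Also, kCCModeCBC is not a valid CCOptions flag; its value collides with kCCOptionECBMode, so the CCCrypt call above actually runs in ECB mode (CBC is CCCrypt's default when only kCCOptionPKCS7Padding is passed). A minimal sketch of the IV half of the fix, using CommonCrypto and SecRandomCopyBytes (the deflate step would still need zlib and is not shown):

#import <CommonCrypto/CommonCryptor.h>
#import <Security/SecRandom.h>

// sketch: encrypt with a random IV and prepend the IV to the output,
// matching what the C# side reads back with ims.Read(iv, 0, iv.Length)
NSData *EncryptWithPrependedIV( NSData *plain, NSString *key )
{
    char keyPtr[kCCKeySizeAES128 + 1];
    bzero( keyPtr, sizeof( keyPtr ) );
    [key getCString:keyPtr maxLength:sizeof( keyPtr ) encoding:NSUTF8StringEncoding];

    uint8_t iv[kCCBlockSizeAES128];
    if( SecRandomCopyBytes( kSecRandomDefault, sizeof( iv ), iv ) != errSecSuccess )
        return nil;

    size_t bufferSize = [plain length] + kCCBlockSizeAES128;
    NSMutableData *out = [NSMutableData dataWithLength:sizeof( iv ) + bufferSize];
    memcpy( [out mutableBytes], iv, sizeof( iv ) ); // the IV goes first, unencrypted

    size_t numBytesEncrypted = 0;
    CCCryptorStatus status = CCCrypt( kCCEncrypt, kCCAlgorithmAES128,
                                      kCCOptionPKCS7Padding, // CBC is the default mode
                                      keyPtr, kCCKeySizeAES128,
                                      iv,
                                      [plain bytes], [plain length],
                                      (uint8_t *)[out mutableBytes] + sizeof( iv ), bufferSize,
                                      &numBytesEncrypted );
    if( status != kCCSuccess ) return nil;
    [out setLength:sizeof( iv ) + numBytesEncrypted];
    return out;
}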

Reading GPU resource data from the CPU

I am learning DirectX 11 these days and have been stuck on the compute shader section.
I made four resources and three corresponding views:
immutable input buffer = {1,1,1,1,1} / SRV
immutable input buffer = {2,2,2,2,2} / SRV
output buffer / UAV
staging buffer for reading / no view
I succeeded in creating everything, dispatching the CS function, and copying data from the output buffer to the staging buffer, and then I read and checked the data.
// INPUT BUFFER1--------------------------------------------------
const int dataSize = 5;
D3D11_BUFFER_DESC vb_dest;
vb_dest.ByteWidth = sizeof(float) * dataSize;
vb_dest.StructureByteStride = sizeof(float);
vb_dest.BindFlags = D3D11_BIND_SHADER_RESOURCE;
vb_dest.Usage = D3D11_USAGE_IMMUTABLE;
vb_dest.CPUAccessFlags = 0;
vb_dest.MiscFlags = 0;
float v1_float[dataSize] = { 1,1,1,1,1 };
D3D11_SUBRESOURCE_DATA v1_data;
v1_data.pSysMem = static_cast<void*>(v1_float);
device->CreateBuffer(
&vb_dest,
&v1_data,
valueBuffer1.GetAddressOf());
D3D11_SHADER_RESOURCE_VIEW_DESC srv_desc;
srv_desc.Format = DXGI_FORMAT_R32_FLOAT;
srv_desc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
srv_desc.Buffer.FirstElement = 0;
srv_desc.Buffer.NumElements = dataSize;
srv_desc.Buffer.ElementWidth = sizeof(float);
device->CreateShaderResourceView(
valueBuffer1.Get(),
&srv_desc,
inputSRV1.GetAddressOf());
// INPUT BUFFER2-----------------------------------------------------------
float v2_float[dataSize] = { 2,2,2,2,2 };
D3D11_SUBRESOURCE_DATA v2_data;
v2_data.pSysMem = static_cast<void*>(v2_float);
device->CreateBuffer(
&vb_dest,
&v2_data,
valueBuffer2.GetAddressOf());
device->CreateShaderResourceView(
valueBuffer2.Get(),
&srv_desc,
inputSRV2.GetAddressOf());
// OUTPUT BUFFER-----------------------------------------------------------
D3D11_BUFFER_DESC ov_desc;
ov_desc.ByteWidth = sizeof(float) * dataSize;
ov_desc.StructureByteStride = sizeof(float);
ov_desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
ov_desc.Usage = D3D11_USAGE_DEFAULT;
ov_desc.CPUAccessFlags = 0;
ov_desc.MiscFlags = 0;
device->CreateBuffer(
&ov_desc,
nullptr,
outputResource.GetAddressOf());
D3D11_UNORDERED_ACCESS_VIEW_DESC outputUAV_desc;
outputUAV_desc.Format = DXGI_FORMAT_R32_FLOAT;
outputUAV_desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
outputUAV_desc.Buffer.FirstElement = 0;
outputUAV_desc.Buffer.NumElements = dataSize;
outputUAV_desc.Buffer.Flags = 0;
device->CreateUnorderedAccessView(
outputResource.Get(),
&outputUAV_desc,
outputUAV.GetAddressOf());
// BUFFER FOR COPY-----------------------------------------------------------
D3D11_BUFFER_DESC rb_desc;
rb_desc.ByteWidth = sizeof(float) * dataSize;
rb_desc.StructureByteStride = sizeof(float);
rb_desc.Usage = D3D11_USAGE_STAGING;
rb_desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
rb_desc.BindFlags = 0;
rb_desc.MiscFlags = 0;
device->CreateBuffer(
&rb_desc,
nullptr,
readResource.GetAddressOf());
// DISPATCH and COPY and GET DATA
dContext->CSSetShaderResources(0, 1, inputSRV1.GetAddressOf());
dContext->CSSetShaderResources(1, 1, inputSRV2.GetAddressOf());
dContext->CSSetUnorderedAccessViews(0, 1, outputUAV.GetAddressOf(), nullptr);
dContext->CSSetShader(cs.Get(), nullptr, 0);
dContext->Dispatch(1, 1, 1);
dContext->CopyResource(readResource.Get(), outputResource.Get());
D3D11_MAPPED_SUBRESOURCE mappedResource2;
ZeroMemory(&mappedResource2, sizeof(D3D11_MAPPED_SUBRESOURCE));
R_CHECK(dContext->Map(readResource.Get(), 0, D3D11_MAP_READ, 0, &mappedResource2));
float* data = static_cast<float*>(mappedResource2.pData);
for (int i = 0; i < 5; ++i)
{
int a = data[i];
}
And this is the compute shader code:
StructuredBuffer<float> inputA : register(t0);
StructuredBuffer<float> inputB : register(t1);
RWStructuredBuffer<float> output : register(u0);
[numthreads(5, 1, 1)]
void main(int3 id : SV_DispatchThreadID)
{
output[id.x] = inputA[id.x] + inputB[id.x];
}
In the CS, it adds the two input buffers' data and stores the result into the output buffer, so the expected answer would be {3,3,3,3,3}. But the result is {3,0,0,0,0}; only the first index has the proper answer.
Any advice would be amazing.
dContext->CopyResource(readResource.Get(), outputResource.Get());
D3D11_MAPPED_SUBRESOURCE mappedResource2;
ZeroMemory(&mappedResource2, sizeof(D3D11_MAPPED_SUBRESOURCE));
R_CHECK(dContext->Map(readResource.Get(), 0, D3D11_MAP_READ, 0, &mappedResource2));
float* data = static_cast<float*>(mappedResource2.pData);
for (int i = 0; i < 5; ++i)
{
int a = data[i];
}
This code should instead look like this:
dContext->CopyResource(readResource.Get(), outputResource.Get());
D3D11_MAPPED_SUBRESOURCE mapped;
ZeroMemory(&mapped, sizeof(mapped));
R_CHECK(dContext->Map(readResource.Get(), 0, D3D11_MAP_READ, 0, &mapped));
float data[dataSize];
ZeroMemory(data, sizeof(data));
memcpy(data, mapped.pData, sizeof(data));
dContext->Unmap(readResource.Get(), 0);
For some reason, I have to use memcpy instead of reading the resource directly through the pointer I get from mapping.
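A further observation (my diagnosis, not something confirmed in the thread): the HLSL declares StructuredBuffer/RWStructuredBuffer, but the buffers are created without D3D11_RESOURCE_MISC_BUFFER_STRUCTURED, and the views use a typed format (DXGI_FORMAT_R32_FLOAT) instead of DXGI_FORMAT_UNKNOWN. The debug layer reports that mismatch, and reads through such a view are undefined, which could also explain the {3,0,0,0,0} result. A sketch of a matching structured-buffer creation:

D3D11_BUFFER_DESC vb_desc = {};
vb_desc.ByteWidth           = sizeof(float) * dataSize;
vb_desc.StructureByteStride = sizeof(float);
vb_desc.BindFlags           = D3D11_BIND_SHADER_RESOURCE;
vb_desc.Usage               = D3D11_USAGE_IMMUTABLE;
vb_desc.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED; // required for StructuredBuffer<T>

D3D11_SHADER_RESOURCE_VIEW_DESC srv_desc = {};
srv_desc.Format              = DXGI_FORMAT_UNKNOWN; // structured views must use UNKNOWN
srv_desc.ViewDimension       = D3D11_SRV_DIMENSION_BUFFER;
srv_desc.Buffer.FirstElement = 0;
srv_desc.Buffer.NumElements  = dataSize;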

AudioConverterFillComplexBuffer returns 1852797029 (kAudioCodecIllegalOperationError)

I'm trying to decode AAC data with AudioToolbox in an iOS environment. I consulted this thread.
The AudioConverterNew function call succeeds, but AudioConverterFillComplexBuffer returns error code 1852797029 (kAudioCodecIllegalOperationError).
I'm trying to find my mistake. Thank you for reading.
- (void)initAudioToolBox {
HCAudioAsset* asset = [self.provider getAudioAsset];
AudioStreamBasicDescription outFormat;
memset(&outFormat, 0, sizeof(outFormat));
outFormat.mSampleRate = 44100;
outFormat.mFormatID = kAudioFormatLinearPCM;
outFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger;
outFormat.mBytesPerPacket = 2;
outFormat.mFramesPerPacket = 1;
outFormat.mBytesPerFrame = 2;
outFormat.mChannelsPerFrame = 1;
outFormat.mBitsPerChannel = 16;
outFormat.mReserved = 0;
AudioStreamBasicDescription inFormat;
memset(&inFormat, 0, sizeof(inFormat));
inFormat.mSampleRate = [asset sampleRate];
inFormat.mFormatID = kAudioFormatMPEG4AAC;
inFormat.mFormatFlags = kMPEG4Object_AAC_LC;
inFormat.mBytesPerPacket = 0;
inFormat.mFramesPerPacket = (UInt32)[asset framePerPacket];
inFormat.mBytesPerFrame = 0;
inFormat.mChannelsPerFrame = (UInt32)[asset channelCount];
inFormat.mBitsPerChannel = 0;
inFormat.mReserved = 0;
OSStatus status = AudioConverterNew(&inFormat, &outFormat, &audioConverter);
if (status != noErr) {
NSLog(#"setup converter error, status: %i\n", (int)status);
} else {
NSLog(#"Audio Converter is initialized successfully.");
}
}
typedef struct _PassthroughUserData PassthroughUserData;
struct _PassthroughUserData {
UInt32 mChannels;
UInt32 mDataSize;
const void* mData;
AudioStreamPacketDescription mPacket;
};
int inInputDataProc(AudioConverterRef aAudioConverter,
UInt32* aNumDataPackets,
AudioBufferList* aData,
AudioStreamPacketDescription** aPacketDesc,
void* aUserData)
{
PassthroughUserData* userData = (PassthroughUserData*)aUserData;
if (!userData->mDataSize) {
*aNumDataPackets = 0;
NSLog(#"inInputDataProc returns -1");
return -1;
}
if (aPacketDesc) {
userData->mPacket.mStartOffset = 0;
userData->mPacket.mVariableFramesInPacket = 0;
userData->mPacket.mDataByteSize = userData->mDataSize;
NSLog(#"mDataSize:%d", userData->mDataSize);
*aPacketDesc = &userData->mPacket;
}
aData->mBuffers[0].mNumberChannels = userData->mChannels;
aData->mBuffers[0].mDataByteSize = userData->mDataSize;
aData->mBuffers[0].mData = (void*)(userData->mData);
NSLog(#"buffer[0] - channel:%d, byte size:%u, data:%p",
aData->mBuffers[0].mNumberChannels,
(unsigned int)aData->mBuffers[0].mDataByteSize,
aData->mBuffers[0].mData);
// No more data to provide following this run.
userData->mDataSize = 0;
NSLog(#"inInputDataProc returns 0");
return 0;
}
- (void)decodeAudioFrame:(NSData *)frame withPts:(NSInteger)pts{
if(!audioConverter){
[self initAudioToolBox];
}
HCAudioAsset* asset = [self.provider getAudioAsset];
PassthroughUserData userData = { (UInt32)[asset channelCount], (UInt32)frame.length, [frame bytes]};
NSMutableData *decodedData = [NSMutableData new];
const uint32_t MAX_AUDIO_FRAMES = 128;
const uint32_t maxDecodedSamples = MAX_AUDIO_FRAMES * 1;
do {
uint8_t *buffer = (uint8_t *)malloc(maxDecodedSamples * sizeof(short int));
AudioBufferList decBuffer;
memset(&decBuffer, 0, sizeof(AudioBufferList));
decBuffer.mNumberBuffers = 1;
decBuffer.mBuffers[0].mNumberChannels = 2;
decBuffer.mBuffers[0].mDataByteSize = maxDecodedSamples * sizeof(short int);
decBuffer.mBuffers[0].mData = buffer;
UInt32 numFrames = MAX_AUDIO_FRAMES;
AudioStreamPacketDescription outPacketDescription;
memset(&outPacketDescription, 0, sizeof(AudioStreamPacketDescription));
outPacketDescription.mDataByteSize = MAX_AUDIO_FRAMES;
outPacketDescription.mStartOffset = 0;
outPacketDescription.mVariableFramesInPacket = 0;
NSLog(#"frame - size:%lu, buffer:%p", [frame length], [frame bytes]);
OSStatus rv = AudioConverterFillComplexBuffer(audioConverter,
inInputDataProc,
&userData,
&numFrames,
&decBuffer,
&outPacketDescription);
NSLog(#"num frames:%d, dec buffer [0] channels:%d, dec buffer [0] data byte size:%d, rv:%d",
numFrames, decBuffer.mBuffers[0].mNumberChannels,
decBuffer.mBuffers[0].mDataByteSize, (int)rv);
if (rv && rv != noErr) {
NSLog(@"Error decoding audio stream: %d\n", (int)rv);
free(buffer);
break;
}
if (numFrames) {
[decodedData appendBytes:decBuffer.mBuffers[0].mData length:decBuffer.mBuffers[0].mDataByteSize];
}
free(buffer); // without this, the buffer malloc'd at the top of the loop leaks every iteration
} while (true);
//void *pData = (void *)[decodedData bytes];
//audioRenderer->Render(&pData, decodedData.length, pts);
}
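One common cause of errors like this when decoding kAudioFormatMPEG4AAC (a guess on my part, not something the post confirms) is that the converter is never given the codec's magic cookie (the AudioSpecificConfig from the container). A minimal sketch; the [asset magicCookie] accessor is hypothetical:

// after AudioConverterNew succeeds
NSData *cookie = [asset magicCookie]; // hypothetical accessor, see note above
if( cookie.length > 0 ) {
    OSStatus st = AudioConverterSetProperty( audioConverter,
                                             kAudioConverterDecompressionMagicCookie,
                                             (UInt32)cookie.length,
                                             cookie.bytes );
    if( st != noErr ) NSLog( @"failed to set magic cookie: %d", (int)st );
}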

How to deep copy a CMSampleBufferRef from the camera capture callback?

Obviously I must copy the CMSampleBufferRef, but CMSampleBufferCreateCopy() only creates a shallow copy.
The method below is feasible, but its CPU consumption is too high!
- (CVPixelBufferRef) copyPixelbuffer : (CVPixelBufferRef)pixel {
NSAssert(CFGetTypeID(pixel) == CVPixelBufferGetTypeID(), @"typeid !=");
CVPixelBufferRef _copy = NULL;
CVPixelBufferCreate(nil, CVPixelBufferGetWidth(pixel), CVPixelBufferGetHeight(pixel), CVPixelBufferGetPixelFormatType(pixel), CVBufferGetAttachments(pixel, kCVAttachmentMode_ShouldPropagate), &_copy);
if (_copy != NULL) {
CVPixelBufferLockBaseAddress(pixel, kCVPixelBufferLock_ReadOnly);
CVPixelBufferLockBaseAddress(_copy, 0);
size_t count = CVPixelBufferGetPlaneCount(pixel);
size_t img_widstp = CVPixelBufferGetBytesPerRowOfPlane(pixel, 0);
size_t img_heistp = CVPixelBufferGetBytesPerRowOfPlane(pixel, 1);
NSLog(#"img_widstp = %d, img_heistp = %d", img_widstp, img_heistp);
for (size_t plane = 0; plane < count; plane++) {
void *dest = CVPixelBufferGetBaseAddressOfPlane(_copy, plane);
void *source = CVPixelBufferGetBaseAddressOfPlane(pixel, plane);
size_t height = CVPixelBufferGetHeightOfPlane(pixel, plane);
size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixel, plane);
memcpy(dest, source, height * bytesPerRow);
}
CVPixelBufferUnlockBaseAddress(_copy, 0);
CVPixelBufferUnlockBaseAddress(pixel, kCVPixelBufferLock_ReadOnly);
}
return _copy;
}
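Since the per-frame CVPixelBufferCreate allocation is a large part of the cost above, one common mitigation (my suggestion, not from the original question) is to allocate from a CVPixelBufferPool created once and reuse its buffers. A minimal sketch, assuming width, height, and pixelFormat are known up front:

// create the pool once, e.g. when the first frame arrives
CVPixelBufferPoolRef pool = NULL;
NSDictionary *attrs = @{
    (__bridge id)kCVPixelBufferWidthKey           : @(width),
    (__bridge id)kCVPixelBufferHeightKey          : @(height),
    (__bridge id)kCVPixelBufferPixelFormatTypeKey : @(pixelFormat),
    (__bridge id)kCVPixelBufferIOSurfacePropertiesKey : @{}
};
CVPixelBufferPoolCreate( kCFAllocatorDefault, NULL,
                         (__bridge CFDictionaryRef)attrs, &pool );

// per frame: take a buffer from the pool instead of calling CVPixelBufferCreate
CVPixelBufferRef copy = NULL;
CVPixelBufferPoolCreatePixelBuffer( kCFAllocatorDefault, pool, &copy );
// then lock both buffers and memcpy the planes exactly as in copyPixelbuffer: above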

Loading Texture2D data in DirectX 11 Compute Shader

I am trying to read some data from a Texture2D in a DirectX 11 compute shader; however, the Load function of the Texture2D object keeps returning 0 even though the texture is filled with the same float number everywhere.
It is a 160 * 120 Texture2D with DXGI_FORMAT_R32G32B32A32_FLOAT. The following code is how I created this resource:
HRESULT TestResources(ID3D11Device* pd3dDevice, ID3D11DeviceContext* pImmediateContext) {
float *test = new float[4 * 80 * 60 * 4]; // 80 * 60, 4 channels; one big texture contains four 80 * 60 subimages
for (int i = 0; i < 4 * 80 * 60 * 4; i++) test[i] = 0.7f;
HRESULT hr = S_OK;
D3D11_TEXTURE2D_DESC RTtextureDesc;
ZeroMemory(&RTtextureDesc, sizeof(D3D11_TEXTURE2D_DESC));
RTtextureDesc.Width = 160;
RTtextureDesc.Height = 120;
RTtextureDesc.MipLevels = 1;
RTtextureDesc.ArraySize = 1;
RTtextureDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
RTtextureDesc.SampleDesc.Count = 1;
RTtextureDesc.SampleDesc.Quality = 0;
RTtextureDesc.Usage = D3D11_USAGE_DYNAMIC;
RTtextureDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
RTtextureDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
RTtextureDesc.MiscFlags = 0;
D3D11_SUBRESOURCE_DATA InitData;
InitData.pSysMem = test;
InitData.SysMemPitch = sizeof(float) * 4;
V_RETURN(pd3dDevice->CreateTexture2D(&RTtextureDesc, &InitData, &m_pInputTex2Ds));
//V_RETURN(pd3dDevice->CreateTexture2D(&RTtextureDesc, NULL, &m_pInputTex2Ds));
D3D11_SHADER_RESOURCE_VIEW_DESC SRViewDesc;
ZeroMemory(&SRViewDesc, sizeof(SRViewDesc));
SRViewDesc.Format = RTtextureDesc.Format;
SRViewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
SRViewDesc.Texture2D.MostDetailedMip = 0;
SRViewDesc.Texture2D.MipLevels = 1;
V_RETURN(pd3dDevice->CreateShaderResourceView(m_pInputTex2Ds, &SRViewDesc, &m_pInputTexSRV));
delete[] test;
return hr;
}
And then I try to dispatch with X = Y = 2 and Z = 1, like the following:
void ComputeShaderReduction::ExecuteComputeShader(ID3D11DeviceContext* pd3dImmediateContext, UINT uInputNum, ID3D11UnorderedAccessView** ppUAVInputs, UINT X, UINT Y, UINT Z) {
pd3dImmediateContext->CSSetShader(m_pComputeShader, nullptr, 0);
pd3dImmediateContext->CSSetShaderResources(0, 1, &m_pInputTexSRV); // test code
pd3dImmediateContext->CSSetUnorderedAccessViews(0, uInputNum, ppUAVInputs, nullptr);
//pd3dImmediateContext->CSSetUnorderedAccessViews(0, 1, &m_pGPUOutUAVs, nullptr);
pd3dImmediateContext->UpdateSubresource(m_pConstBuf, 0, nullptr, &m_ConstBuf, 0, 0);
pd3dImmediateContext->CSSetConstantBuffers(0, 1, &m_pConstBuf);
pd3dImmediateContext->Dispatch(X, Y, Z);
pd3dImmediateContext->CSSetShader(nullptr, nullptr, 0);
ID3D11UnorderedAccessView* ppUAViewnullptr[1] = { nullptr };
pd3dImmediateContext->CSSetUnorderedAccessViews(0, 1, ppUAViewnullptr, nullptr);
ID3D11ShaderResourceView* ppSRVnullptr[1] = { nullptr };
pd3dImmediateContext->CSSetShaderResources(0, 1, ppSRVnullptr);
ID3D11Buffer* ppCBnullptr[1] = { nullptr };
pd3dImmediateContext->CSSetConstantBuffers(0, 1, ppCBnullptr);
}
And I wrote a very simple compute shader to read the data from the Texture2D and write it out, so the compute shader looks like this:
#define subimg_dim_x 80
#define subimg_dim_y 60
Texture2D<float4> BufferIn : register(t0);
StructuredBuffer<float> Test: register(t1);
RWStructuredBuffer<float> BufferOut : register(u0);
groupshared float sdata[subimg_dim_x];
[numthreads(subimg_dim_x, 1, 1)]
void CSMain(uint3 DTid : SV_DispatchThreadID,
uint3 threadIdx : SV_GroupThreadID,
uint3 groupIdx : SV_GroupID) {
sdata[threadIdx.x] = 0.0;
GroupMemoryBarrierWithGroupSync();
if (threadIdx.x == 0) {
float4 num = BufferIn.Load(uint3(groupIdx.x, groupIdx.y, 1));
//BufferOut[groupIdx.y * 2 + groupIdx.x] = 2.0; //This one gives me 2.0 as output in the console
BufferOut[groupIdx.y * 2 + groupIdx.x] = num.x; //This one keeps giving me 0.0, and in the texture r = g = b = a = 0.7 (x = y = z = w = 0.7), so it is supposed to print 0.7 in the console.
}
GroupMemoryBarrierWithGroupSync();
}
I think the way I print the CS shader result on the CPU end is correct:
void ComputeShaderReduction::CopyToCPUBuffer(ID3D11Device* pdevice, ID3D11DeviceContext* pd3dImmediateContext, ID3D11Buffer* pGPUOutBufs) {
D3D11_BUFFER_DESC desc;
ZeroMemory(&desc, sizeof(desc));
pGPUOutBufs->GetDesc(&desc);
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
desc.Usage = D3D11_USAGE_STAGING;
desc.BindFlags = 0;
desc.MiscFlags = 0;
if (!m_pCPUOutBufs && SUCCEEDED(pdevice->CreateBuffer(&desc, nullptr, &m_pCPUOutBufs))) {
pd3dImmediateContext->CopyResource(m_pCPUOutBufs, pGPUOutBufs);
}
else pd3dImmediateContext->CopyResource(m_pCPUOutBufs, pGPUOutBufs);
D3D11_MAPPED_SUBRESOURCE MappedResource;
float *p;
pd3dImmediateContext->Map(m_pCPUOutBufs, 0, D3D11_MAP_READ, 0, &MappedResource);
p = (float*)MappedResource.pData;
for (int i = 0; i < 4; i++) printf("%d %f\n", i, p[i]);
pd3dImmediateContext->Unmap(m_pCPUOutBufs, 0);
printf("\n");
}
The buffer bound to the UAV has only 4 elements, so if all the float numbers in my Texture2D are 0.7, I should get four 0.7s printed in the CopyToCPUBuffer function instead of 0.0s.
Does anyone know what could be wrong in my code, or can someone provide a complete example or tutorial that shows how to read a DirectX 11 Texture2D's data in a compute shader correctly?
Thanks in advance.
The following is wrong for a start. The Pitch of your input data is the number of bytes per row of the texture, not per pixel.
InitData.SysMemPitch = sizeof(float) * 4;
Secondly:
float4 num = BufferIn.Load(uint3(groupIdx.x, groupIdx.y, 1));
You're trying to load data from the second mip of the texture, but it only has one mip level.
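Putting the answer's two fixes together (my restatement, using the question's own variables):

// pitch is bytes per row of the texture: 160 texels, 4 floats per texel
InitData.SysMemPitch = 160 * 4 * sizeof(float);

// mip indices are zero-based; the only level here is 0
float4 num = BufferIn.Load(uint3(groupIdx.x, groupIdx.y, 0));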
