Dynamic buffers behaviour - buffer

I have a question regarding dynamic vertex and index buffers. Can I change their contents and layout completely? For example, having one set of vertices in one frame, can I throw them away and recreate vertices with their own properties and a count not equal to the previous count? I would also like to know the same about index buffers: can I change the number of indices in a dynamic index buffer?
So far in my application I have a warning when trying to update index buffer with larger size:
D3D11 WARNING: ID3D11DeviceContext::DrawIndexed: Index buffer has not enough space! [ EXECUTION WARNING #359: DEVICE_DRAW_INDEX_BUFFER_TOO_SMALL]

Changing the size of a buffer after creation is not possible.
A dynamic buffer allows you to update its contents: you can write new data to it as long as the data does not exceed the buffer's size.
Buffers don't care about data layout. A buffer with a size of 200 bytes can hold 100 shorts, 50 floats, or mixed data; anything less than or equal to 200 bytes.

Related

How to set bytesPerRow parameter in replaceRegion when texture format is compressed in metal?

According to the replaceRegion doc: For a compressed pixel format, the stride is the number of bytes from the beginning of one row of blocks to the beginning of the next.
I still don't know how to set bytesPerRow. Can I compute bytesPerRow from the current mip level's width and the texture format? Or is there a general way to calculate it?
Any help would be greatly appreciated. Thank you.
You need to know:
Texture Width
Block size of your format (e.g. DXT1 is 4x4)
Bytes per block for your format (e.g. DXT1 is 8 bytes per block).
Then the formula is something like:
int blocksPerRow = (textureWidth + (blockWidth - 1)) / blockWidth;
bytesPerRow = blocksPerRow * bytesPerBlock;
Edit: For PVRTC, don't miss this important note from replaceRegion
This method is supported if you're copying to an entire texture with a PowerVR Texture Compression (PVRTC) pixel format; in which case, bytesPerRow and bytesPerImage must both be set to 0. This method isn't supported for copying to a subregion of a texture that has a PVRTC pixel format.

number of elements and size of memory in Dynamic memory allocation

I have a question about dynamic memory allocation in C or C++.
When we want to figure out the size of an array, we use the sizeof operator.
Furthermore, if we want to figure out the number of elements an array has, we do this:
int a[20];
cout << sizeof(a) / sizeof(a[0]) << endl;
I was wondering if we can figure out the number of elements and the real size of memory that is allocated dynamically.
I would really appreciate it if you tell me how, or introduce me to a reference.
In your code, a[20] is statically allocated (on the stack). The memory used is always 20 * sizeof(int) and is freed at the end of the function.
When you allocate an array dynamically (on the heap), like int* a = malloc(20 * sizeof(int)), it is your choice how much memory to take and how to fill it. So it is your duty to track how many elements your data structure includes; the memory stays allocated until you free it (free(a)).
Either way, the real size of the memory occupied stays the same (in our case 20 * sizeof(int)); there is no portable way to query it from the pointer alone, so you must store the count yourself.
See there for info about stack/heap, static/dynamic: Stack, Static, and Heap in C++

Cuda: Operating on images (linearized 2d arrays) with a single column of constant values

I am processing images which are long, usually a few hundred thousand pixels in length. The height is usually in the 500-1000 pixel range. The process involves modifying the images on a column-by-column basis. So, for example, I have a column of constant values that needs to be subtracted from each column in the image.
Currently I split the image into smaller blocks, put them into linearized 2d arrays. Then I make a linearized 2d array from the column of constant values that is the same size as the smaller block. Then a (image array - constant array) operation is done until the full image is processed.
Should I copy the constant column to the device, and then just operate column by column? Or should I try to make as large of a "constant array" as possible, and then perform the subtraction. I am not looking for 100% optimization or even close to that, but an idea about what the right approach to take is.
How can I optimize this process? Any resources to learn more about this type of processing would be appreciated.
Constant memory is up to 64KB, so assuming your pixels are 4 bytes or less, you should be able to handle an image height up to about 16K pixels and still fit the entire "constant column" in constant memory.
After that, you don't need to process things "column by column". Constant memory is optimized for access when every thread is requesting the same value from constant memory, which perfectly describes your case.
Therefore, your thread code can be trivially simple:
#define MAX_COL_SIZE 1024
__constant__ float const_column[MAX_COL_SIZE];

__global__ void img_col_kernel(float *in, float *out, int num_cols, int col_size)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;  // one thread per column
    if (idx < num_cols)
        for (int i = 0; i < col_size; i++)
            out[idx + i * num_cols] = in[idx + i * num_cols] - const_column[i];
}
(coded in browser, not tested)
Set up const_column in your host code using cudaMemcpyToSymbol prior to calling img_col_kernel. Call the kernel with a 1D grid including a total number of threads equal to or greater than your image width (num_cols). Pass the "linearized 2D" pointers to your input and output images to the kernel (in and out). The above kernel should run pretty fast, and essentially be bound by memory bandwidth for images of width 1000 or more. For small images, you may want to increase the number of threads by dividing your image vertically into say, 4 pieces, and operate with 4 times as many threads (and 4 regions of constant memory).

How to use instancing offsets

Suppose I have a single buffer of instance data for 2 different groups of instances (i.e., I want to draw them in separate draw calls because they use different textures). How do I set up the offsets to accomplish this? IASetVertexBuffers and DrawIndexedInstanced both have offset parameters and it's not clear to me which ones I need to use. Also, the DrawIndexedInstanced documentation isn't exactly clear on whether the offset value is in bytes or not.
Offsets
These offsets work independently. You can offset either in ID3D11DeviceContext::IASetVertexBuffers or in ID3D11DeviceContext::DrawIndexedInstanced, or in both (in which case they combine).
ID3D11DeviceContext::IASetVertexBuffers accepts offset in bytes:
boundData = (uint8_t*)data + offset   // offset is in bytes
ID3D11DeviceContext::DrawIndexedInstanced accepts all offsets in elements (indices, vertices, instances). StartIndexLocation offsets where reading from the index buffer begins, BaseVertexLocation is added to each fetched index, and StartInstanceLocation is added to the instance index; the vertex and instance offsets work independently:
index = indexBuffer[StartIndexLocation + i]
vertexIndex = index + BaseVertexLocation
instanceIndex = instanceID + StartInstanceLocation
I prefer offsetting in draw call:
no byte (pointer) arithmetic needed, so fewer chances to make a mistake
no need to re-bind the buffer if you're just changing the offset, so fewer visible state changes (and, hopefully, invisible ones too)
Alternative solution
Instead of splitting rendering to two draw calls, you can merge your texture resources and draw all in one draw call:
both textures bound in the same draw call, branching in the shader (if/else) depending on an integer passed via constant buffer (simplest solution)
texture array (if target hardware supports it)
texture atlas (will need some coding, but always useful)
Hope it helps!

My preallocation of a matrix gives out of memory error in MATLAB

I use zeros to initialize my matrix like this:
height = 352
width = 288
nFrames = 120
imgYuv=zeros([height,width,3,nFrames]);
However, when I set the value of nFrames larger than 120, MATLAB gives me an error message saying out of memory.
The original function is
[imgYuv, S, A]= changeYuv(fileName, width, height, idxFrame, nFrames)
my command is
[imgYuv,S,A]=changeYuv('tilt.yuv',352,288,1:120,120);
Can anyone please tell me what's going on here?
PS: one of the purposes of the function is to load a YUV video which consists of more than 2000 frames. Is there any possibility to implement that?
There are three ways to avoid the error:
1. Process a limited number of frames at any given time.
2. Work with integer arrays. Most movies are in 8-bit format, while Matlab normally works with doubles. uint8 takes 1 byte per element, while double takes 8 bytes. Thus, if you create your array as B = zeros(height,width,3,nFrames,'uint8'), it only uses 1/8th of the memory. This might work for 120 frames, though for 2000 frames you'll run into trouble again. Note that not all Matlab functions work on integer arrays; you may have to reimplement those that require double.
3. Buy more RAM.
Yes, you (or rather, your Matlab session) are running out of memory.
Get out your calculator and find the product height x width x 3 x nFrames x 8, which tells you how many bytes you tried to allocate in your call to zeros. That number will be close to or in excess of the memory available to Matlab on your computer.
Your command is:
[imgYuv,S,A]=changeYuv('tilt.yuv',352,288,1:120,120);
That is:
352 * 288 * 3 * 120 = 36,495,360
elements. At 8 bytes per double, that is roughly 292 MB for this one array; a few arrays of that size, or a 32-bit MATLAB with limited contiguous address space, can easily exhaust the available memory.
Referencing the code I've seen in your withdrawn post, you're calculating the difference between adjacent frame histograms. One option to avoid massive memory allocation is to hold just two frames in memory at a time, instead of reading all the frames at once.
The function B = zeros([d1 d2 d3 ...]) creates a multi-dimensional array with dimensions d1*d2*d3*...
Depending on width and height, the 3rd dimension of 3 and the 4th dimension of 120 (which effectively results in width*height*360 elements) can result in a very large array. There are certain memory limits on every machine; maybe you reached these... ;)
