Memory allocated with Marshal.AllocHGlobal is getting corrupted? - memory

I need to create Bitmap objects with direct access to their pixel data.
LockBits is too slow for my needs - its no good for rapidly recreating (sometimes large) bitmaps.
So I have a custom FastBitmap object. It has a reference to a Bitmap object and an IntPtr that points to the bits in the bitmap.
The constructor looks like this:
public FastBitmap(int width, int height)
{
unsafe
{
int pixelSize = Image.GetPixelFormatSize(PixelFormat.Format32bppArgb) / 8;
_stride = width * pixelSize;
int byteCount = _stride * height;
_bits = Marshal.AllocHGlobal(byteCount);
// Fill image with red for testing
for (int i = 0; i < byteCount; i += 4)
{
byte* pixel = ((byte *)_bits) + i;
pixel[0] = 0;
pixel[1] = 0;
pixel[2] = 255;
pixel[3] = 255;
}
_bitmapObject = new Bitmap(width, height, _stride, PixelFormat.Format32bppArgb, _bits); // All bits in this bitmap are now directly modifiable without LockBits.
}
}
The allocated memory is freed in a cleanup function which is called by the deconstructor.
This works, but not for long. Somehow, without any further modification of the bits, the allocated memory gets corrupted which corrupts the bitmap. Sometimes, big parts of the bitmap are replaced by random pixels, other times the whole program crashes when I try to display it with Graphics.DrawImage - either one or the other, completely at random.

The reason the memory was being corrupted was because I was using Bitmap.Clone to copy the _bitmapObject after I was done with a FastBitmap.
Bitmap.Clone does not make a new copy of the pixel data when called, or at least such is the case when you create a Bitmap with your own allocated data.
Instead, cloning appears to use the exact same pixel data, which was problematic for me because I was freeing the pixel data memory after a clone operation, causing the cloned bitmap to become corrupt when the memory is used for other things.
The first and current solution I have found as an alternative to Bitmap.Clone is to use:
Bitmap clone = new Bitmap(bitmapToClone);
which does copy the pixel data elsewhere, making it okay to free the old memory.
There may be even better/faster ways to make a fully copied clone but this is a simple solution for now.

Related

how can I update dynamic vertex buffer fastly?

I'm trying to make a simple 3D modeling tool.
there is some work to move a vertex( or vertices ) for transform the model.
I used dynamic vertex buffer because thought it needs much update.
but performance is too low in high polygon model even though I change just one vertex.
is there other methods? or did I wrong way?
here is my D3D11_BUFFER_DESC
Usage = D3D11_USAGE_DYNAMIC;
CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
BindFlags = D3D11_BIND_VERTEX_BUFFER;
ByteWidth = sizeof(ST_Vertex) * _nVertexCount
D3D11_SUBRESOURCE_DATA d3dBufferData;
d3dBufferData.pSysMem = pVerticesInfo;
hr = pd3dDevice->CreateBuffer(&descBuffer, &d3dBufferData, &_pVertexBuffer);
and my update funtion
D3D11_MAPPED_SUBRESOURCE d3dMappedResource;
pImmediateContext->Map(_pVertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &d3dMappedResource);
ST_Vertex* pBuffer = (ST_Vertex*)d3dMappedResource.pData;
for (int i = 0; i < vIndice.size(); ++i)
{
pBuffer[vIndice[i]].xfPosition.x = pVerticesInfo[vIndice[i]].xfPosition.x;
pBuffer[vIndice[i]].xfPosition.y = pVerticesInfo[vIndice[i]].xfPosition.y;
pBuffer[vIndice[i]].xfPosition.z = pVerticesInfo[vIndice[i]].xfPosition.z;
}
pImmediateContext->Unmap(_pVertexBuffer, 0);
As mentioned in the previous answer, you are updating your whole buffer every time, which will be slow depending on model size.
The solution is indeed to implement partial updates, there are two possibilities for it, you want to update a single vertex, or you want to update
arbitrary indices (for example, you want to move N vertices in one go, in different locations, like vertex 1,20,23 for example.
The first solution is rather simple, first create your buffer with the following description :
Usage = D3D11_USAGE_DEFAULT;
CPUAccessFlags = 0;
BindFlags = D3D11_BIND_VERTEX_BUFFER;
ByteWidth = sizeof(ST_Vertex) * _nVertexCount
D3D11_SUBRESOURCE_DATA d3dBufferData;
d3dBufferData.pSysMem = pVerticesInfo;
hr = pd3dDevice->CreateBuffer(&descBuffer, &d3dBufferData, &_pVertexBuffer);
This makes sure your vertex buffer is gpu visible only.
Next create a second dynamic buffer which has the size of a single vertex (you do not need any bind flags in that case, as it will be used only for copies)
_pCopyVertexBuffer
Usage = D3D11_USAGE_DYNAMIC; //Staging works as well
CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
BindFlags = 0;
ByteWidth = sizeof(ST_Vertex);
D3D11_SUBRESOURCE_DATA d3dBufferData;
d3dBufferData.pSysMem = NULL;
hr = pd3dDevice->CreateBuffer(&descBuffer, &d3dBufferData, &_pCopyVertexBuffer);
when you move a vertex, copy the changed vertex in the copy buffer :
ST_Vertex changedVertex;
D3D11_MAPPED_SUBRESOURCE d3dMappedResource;
pImmediateContext->Map(_pVertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &d3dMappedResource);
ST_Vertex* pBuffer = (ST_Vertex*)d3dMappedResource.pData;
pBuffer->xfPosition.x = changedVertex.xfPosition.x;
pBuffer->.xfPosition.y = changedVertex.xfPosition.y;
pBuffer->.xfPosition.z = changedVertex.xfPosition.z;
pImmediateContext->Unmap(_pVertexBuffer, 0);
Since you use D3D11_MAP_WRITE_DISCARD, make sure to write all attributes there (not only position).
Now once you done, you can use ID3D11DeviceContext::CopySubresourceRegion to only copy the modified vertex in the current location :
I assume that vertexID is the index of the modified vertex :
pd3DeviceContext->CopySubresourceRegion(_pVertexBuffer,
0, //must be 0
vertexID * sizeof(ST_Vertex), //location of the vertex in you gpu vertex buffer
0, //must be 0
0, //must be 0
_pCopyVertexBuffer,
0, //must be 0
NULL //in this case we copy the full content of _pCopyVertexBuffer, so we can set to null
);
Now if you want to update a list of vertices, things get more complicated and you have several options :
-First you apply this single vertex technique in a loop, this will work quite well if your changeset is small.
-If your changeset is very big (close to almost full vertex size, you can probably rewrite the whole buffer instead).
-An intermediate technique is to use compute shader to perform the updates (thats the one I normally use as its the most flexible version).
Posting all c++ binding code would be way too long, but here is the concept :
your vertex buffer must have BindFlags = D3D11_BIND_VERTEX_BUFFER | D3D11_BIND_UNORDERED_ACCESS; //this allows to write wioth compute
you need to create an ID3D11UnorderedAccessView for this buffer (so shader can write to it)
you need the following misc flags : D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS //this allows to write as RWByteAddressBuffer
you then create two dynamic structured buffers (I prefer those over byteaddress, but vertex buffer and structured is not allowed in dx11, so for the write one you need raw instead)
first structured buffer has a stride of ST_Vertex (this is your changeset)
second structured buffer has a stride of 4 (uint, these are the indices)
both structured buffers get an arbitrary element count (normally i use 1024 or 2048), so that will be the maximum amount of vertices you can update in a single pass.
both structured buffers you need an ID3D11ShaderResourceView (shader visible, read only)
Then update process is the following :
write modified vertices and locations in structured buffers (using map discard, if you have to copy less its ok)
attach both structured buffers for read
attach ID3D11UnorderedAccessView for write
set your compute shader
call dispatch
detach ID3D11UnorderedAccessView for write (this is VERY important)
This is a sample compute shader code (I assume you vertex is position only, for simplicity)
cbuffer cbUpdateCount : register(b0)
{
uint updateCount;
};
RWByteAddressBuffer RWVertexPositionBuffer : register(u0);
StructuredBuffer<float3> ModifiedVertexBuffer : register(t0);
StructuredBuffer<uint> ModifiedVertexIndicesBuffer : register(t0);
//this is the stride of your vertex buffer, since here we use float3 it is 12 bytes
#define WRITE_STRIDE 12
[numthreads(64, 1, 1)]
void CS( uint3 tid : SV_DispatchThreadID )
{
//make sure you do not go part element count, as here we runs 64 threads at a time
if (tid.x >= updateCount) { return; }
uint readIndex = tid.x;
uint writeIndex = ModifiedVertexIndicesBuffer[readIndex];
float3 vertex = ModifiedVertexBuffer[readIndex];
//byte address buffers do not understand float, asuint is a binary cast.
RWVertexPositionBuffer.Store3(writeIndex * WRITE_STRIDE, asuint(vertex));
}
For the purposes of this question I'm going to assume you already have a mechanism for selecting a vertex from a list of vertices based upon ray casting or some other picking method and a mechanism for creating a displacement vector detailing how the vertex was moved in model space.
The method you have for updating the buffer is sufficient for anything less than a few hundred vertices, but on large scale models it becomes extremely slow. This is because you're updating everything, rather than the individual vertices you modified.
To fix this, you should only update the vertices you have changed, and to do that you need to create a change set.
In concept, a change set is nothing more than a set of changes made to the data - a list of the vertices that need to be updated. Since we already know which vertices were modified (otherwise we couldn't have manipulated them), we can map in the GPU buffer, go to that vertex specifically, and copy just those vertices into the GPU buffer.
In your vertex modification method, record the index of the vertex that was modified by the user:
//Modify the vertex coordinates based on mouse displacement
pVerticesInfo[SelectedVertexIndex].xfPosition.x += DisplacementVector.x;
pVerticesInfo[SelectedVertexIndex].xfPosition.y += DisplacementVector.y;
pVerticesInfo[SelectedVertexIndex].xfPosition.z += DisplacementVector.z;
//Add the changed vertex to the list of changes.
changedVertices.add(SelectedVertexIndex);
//And update the GPU buffer
UpdateD3DBuffer();
In UpdateD3DBuffer(), do the following:
D3D11_MAPPED_SUBRESOURCE d3dMappedResource;
pImmediateContext->Map(_pVertexBuffer, 0, D3D11_MAP_WRITE, 0, &d3dMappedResource);
ST_Vertex* pBuffer = (ST_Vertex*)d3dMappedResource.pData;
for (int i = 0; i < changedVertices.size(); ++i)
{
pBuffer[changedVertices[i]].xfPosition.x = pVerticesInfo[changedVertices[i]].xfPosition.x;
pBuffer[changedVertices[i]].xfPosition.y = pVerticesInfo[changedVertices[i]].xfPosition.y;
pBuffer[changedVertices[i]].xfPosition.z = pVerticesInfo[changedVertices[i]].xfPosition.z;
}
pImmediateContext->Unmap(_pVertexBuffer, 0);
changedVertices.clear();
This has the effect of only updating the vertices that have changed, rather than all vertices in the model.
This also allows for some more complex manipulations. You can select multiple vertices and move them all as a group, select a whole face and move all the connected vertices, or move entire regions of the model relatively easily, assuming your picking method is capable of handling this.
In addition, if you record the change sets with enough information (the affected vertices and the displacement index), you can fairly easily implement an undo function by simply reversing the displacement vector and reapplying the selected change set.

Avoiding memory leaks while using vector<Mat>

I am trying to write a code that uses opencv Mat objects it goes something like this
Mat img;
vector<Mat> images;
for (i = 1; i < 5; i++)
{
img.create(h,w,type) // h,w and type are given correctly
// input an image from somewhere to img correctly.
images.push_back(img);
img.release()
}
for (i = 1; i < 5; i++)
images[i].release();
I however still seem to have memory leakage can anyone tell me why it is so?
I thought that if the refcount of a mat object = 0 then the memory should be automatically deallocated
You rarely need to call release explicitly, since OpenCV Mat objects take automatically care of internal memory.
Also take care that Mat copy just copies creates a new header pointing to the same data. If the original Mat goes out of scope you are left with an invalid matrix. So when you push the image into the vector, use a deep copy (clone()) to avoid that it the image into the vector becomes invalid.
Since you mentioned:
I have a large 3D image stored in a Mat object. I am running over it using for loops. creating a 2D mat called "image" putting the slices into image, pushing back image to vector images. releasing the image. And later doing a for loop on the images vector releasing all the matrices one by one.
You can store all slices into the vector with the following code. To release the images in the vector, just clear the vector.
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;
int main()
{
// Init the multidimensional image
int sizes[] = { 10, 7, 5 };
Mat data(3, sizes, CV_32F);
randu(data, Scalar(0, 0, 0), Scalar(1,1,1));
// Put slices into images
vector<Mat> images;
for (int z = 0; z < data.size[2]; ++z)
{
// Create the slice
Range ranges[] = { Range::all(), Range::all(), Range(z, z + 1) };
Mat slice(data(ranges).clone()); // with clone slice is continuous, but still 3d
Mat slice2d(2, &data.size[0], data.type(), slice.data); // make the slice a 2d image
// Clone the slice into the vector, or it becomes invalid when slice goes of of scope.
images.push_back(slice2d.clone());
}
// You can deallocate the multidimensional matrix now, if needed
data.release();
// Work with slices....
// Release the vector of slices
images.clear();
return 0;
}
Please try this code, which is basically what you do:
void testFunction()
{
// image width/height => 80MB images
int size = 5000;
cv::Mat img = cv::Mat(size, size, CV_8UC3);
std::vector<cv::Mat> images;
for (int i = 0; i < 5; i++)
{
// since image size is the same for i==0 as the initial image, no new data will be allocated in the first iteration.
img.create(size+i,size+i,img.type()); // h,w and type are given correctly
// input an image from somewhere to img correctly.
images.push_back(img);
// release the created image.
img.release();
}
// instead of manual releasing, a images.clear() would have been enough here.
for(int i = 0; i < images.size(); i++)
images[i].release();
images.clear();
}
int main()
{
cv::namedWindow("bla");
cv::waitKey(0);
for(unsigned int i=0; i<100; ++i)
{
testFunction();
std::cout << "another iteration finished" << std::endl;
cv::waitKey(0);
}
std::cout << "end of main" << std::endl;
cv::waitKey(0);
return 0;
}
After the first call of testFunction, memory will be "leaked" so that the application consumes 4 KB more memory on my device. But not more "leaks" after additional calls for me...
So this looks like your code is ok and the "memory leak" isn't related to that matrix creation and releasing, but maybe to some "global" things happening within the openCV library or C++ to optimize future function calls or memory allocations.
I encountered same problems when iterate openCV mat. The memory consumption can be 1.1G, then it stopped by warning that no memory. In my program, there are macro #define new new(FILE, LINE), crashed with some std lib. So I deleted all Overloading Operator about new/delete. When debugging, it has no error. But when it runs, I got "Debug Assertion Failed! Expression: _pFirstBlock == pHead". Following the instruction
Debug Assertion Error in OpenCV
I changed setting from MT (Release)/MTd (Debug)to
Project Properties >> Configuration Properties >> C/C++ >> Code Generation and changing the Runtime Library to:
Multi-threaded Debug DLL (/MDd), if you are building the Debug version of your code.
Multi-threaded DLL(/MD), if you are building the Release version of your code.
The memory leak is gone. The memory consumption is 38M.

Update directX texture

How can I solve following task: some app need to
use dozens dx9 terxtures (render them with dx3d)
and
update some of them (whole or in part).
I.e. sometimes (once per frame/second/minute) i need to write bytes (void *) in different formats (argb, bgra, rgb, 888, 565) to some sub-rect of existing texture.
In openGL solution is very simple - glTexImage2D. But here unfamiliar platform features completely confused me.
Interested in solution for both dx9 and dx11.
To update a texture, make sure the texture is created in D3DPOOL_MANAGED memory pool.
D3DXCreateTexture( device, size.x, size.y, numMipMaps,usage, textureFormat, D3DPOOL_MANAGED, &texture );
Then call LockRect to update the data
RECT rect = {x,y,z,w}; // the dimensions you want to lock
D3DLOCKED_RECT lockedRect = {0}; // "out" parameter from LockRect function below
texture->LockRect(0, &lockedRect, &rect, 0);
// copy the memory into lockedRect.pBits
// make sure you increment each row by "Pitch"
unsigned char* bits = ( unsigned char* )lockedRect.pBits;
for( int row = 0; row < numRows; row++ )
{
// copy one row of data into "bits", e.g. memcpy( bits, srcData, size )
...
// move to the next row
bits += lockedRect.Pitch;
}
// unlock when done
texture->UnlockRect(0);

Memory Leak while using CvSeq in cvFindContours

I am new to OpenCV and i faced some problem while using it.
Currently i am work on Binary Partitioning Tree (BPT) algorithm. Basically I need split the image into many regions, and based on some parameter. 2 regions will merged and form 1 new region, which consists of these 2 regions.
I managed to get initial regions by using cvWatershed. I also created a vector to store these regions, each in 1 vector block. However, I get memory leak when I tried to move the contour information into vector. It says, memory leak.
for (int h = 0; h <compCount; h++) // compCount - Amount of regions found through cvWaterShed
{
cvZero(WSRegion); // clears out an image, used for painting
Region.push_back(EmptyNode); // create an empty vector slot
CvScalar RegionColor = colorTab[h]; // the color of the region in watershed
for (int i = 0; i <WSOut->height; i++)
{
for (int j = 0; j <WSOut->width; j++)
{
CvScalar s = cvGet2D(WSOut, i, j); // get pixel color in watershed image
if (s.val[0] == RegionColor.val[0] && s.val[1] == RegionColor.val[1] && s.val[2] == RegionColor.val[2])
{
cvSet2D(WSRegion, i, j, cvScalarAll(255)); // paint the pixel to white if it has the same color with the region[h]
}
}
}
MemStorage = cvCreateMemStorage(); // create memory storage
cvFindContours(WSRegion, MemStorage, &contours, sizeof(CvContour), CV_RETR_LIST);
Region[h].RegionContour = cvCloneSeq(contours); // clone and store in vector Region[h]
Region[h].RegionContour->h_next = NULL;
}
Is it any ways I can solve this problem? Or is there any alternative that I do not need to create a new memory storage for every region vector? Thank You in advance
You should create the memory storage only once before the loop, cvFindContours can use that, and after the loop you should release the storage with:
void cvReleaseMemStorage(CvMemStorage** storage)
You can also take a look here for the CvMemStorage specification :
http://opencv.itseez.com/modules/core/doc/dynamic_structures.html?highlight=cvreleasememstorage#CvMemStorage
EDIT:
Your next problem is with cvCloneSeq(). Here are some specifications for it:
CvSeq* cvCloneSeq(const CvSeq* seq, CvMemStorage* storage=NULL )
Parameters:
seq – Sequence
storage – The destination storage block to hold the new sequence header and the copied data, if any. If it is NULL, the function uses the storage block containing the input sequence.
As you can see if you don't specify a different memory storage, it will clone the sequence in the same memory block as the input. When you are releasing the memory storage after the loop you are also releasing the last contour and it's clone that you pushed in the list.

View GPU Memory / View Texture2D memory space for debugging

I've got a question about a PixelShader I am trying to implement, and what I currently do (this is just for debugging, and trying to figure stuff out):
int3 loc;
loc.x = (int)(In.TextureUV.x * resolution_XY.x);
loc.y = (int)(In.TextureUV.x * resolution_XY.x);
loc.z = 0;
float4 r = g_txDiffuse.Load(loc);
return float4(r.x, r.y, r.z, 1);
The point is, this is always 0,0,0,1
The texture buffer is created:
D3D11_TEXTURE2D_DESC tDesc;
tDesc.Height = 480;
tDesc.Width = 640;
tDesc.Usage = D3D11_USAGE_DYNAMIC;
tDesc.MipLevels = 1;
tDesc.ArraySize = 1;
tDesc.SampleDesc.Count = 1;
tDesc.SampleDesc.Quality = 0;
tDesc.Format = DXGI_FORMAT_R8_UINT;
tDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
tDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
tDesc.MiscFlags = 0;
V_RETURN(pd3dDevice->CreateTexture2D(&tDesc, NULL, &g_pCurrentImage));
I upload the texture (which should be a live display at the end) via:
D3D11_MAPPED_SUBRESOURCE resource;
pd3dImmediateContext->Map(g_pCurrentImage, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
memcpy( resource.pData, g_Images.GetData(), g_Images.GetDataSize() );
pd3dImmediateContext->Unmap( g_pCurrentImage, 0 );
I've checked the resource.pData, the data in there is a valid 8bit monochrome image. I made sure the data coming from the camera is 8bit monochrome 640x480.
There's a few things I don't fully understand:
if I run the Map / memcpy / Unmap routine in every frame, the driver will ultimately crash, the system will be unresponsive. Is there a different way to update a complete texture every frame which should be done?
the texture I uploaded is 8bit, why is the Texture2D.load() a float4 return? Do I have to use a different method to access the texture data? I tried to .sample it, but that didn't work either. Would I have to use a int buffer or something instead?
is there a way to debug the GPU memory, to check if the memcpy worked in the first place?
The Map, memcpy, Unmap really ought not to crash unless2 you are trying to copy too much data into the texture. It would be interesting to know what "GetDataSize()" returns. Does it equal 307,200? If its more than that then there lies your problem.
Texture2D returns a float4 because thats what you've asked for. If you write float r = g_txDiffuse.Load( ... ). The 8-bits get extended to a normalised float as part of the load process. Are you sure, btw, that your calculation of "loc" is correct because as you have it now loc.x and loc.y will always be the same.
You can debug whats going on with DirectX using PIX. Its a great tool and I highly recommend you familiarise yourself with it.

Resources