I have a pointer to a location in memory that contains my pixel information.
I want to display this on screen using textures or any other form. Is it possible to create a texture from memory in DirectX 9?
Here is the code sample:
DWORD c0=D3DCOLOR_ARGB(255,0,0,0), c1=D3DCOLOR_ARGB(255,255,255,255);
DWORD *pData;
D3DLOCKED_RECT r;
if (m_pDevice->CreateTexture(8,8,1,D3DUSAGE_DYNAMIC, D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &m_pPointTexture,NULL)!=D3D_OK)
    return false;
if (m_pPointTexture->LockRect(0,&r,NULL, D3DLOCK_DISCARD |D3DLOCK_NOOVERWRITE)!=D3D_OK)
    return false;
for (int y=0; y<8; y++)
{
    // start of row y: rows are r.Pitch bytes apart
    pData=(DWORD*)((BYTE*)r.pBits + y*r.Pitch);
    for (int x=0; x<8; x++)
    {
        // ...
    }
}
Then I render it by drawing a rectangle primitive.
Here, if I save my texture to a BMP file with D3DXSaveTextureToFile() before SetTexture and then create the texture with D3DXCreateTextureFromFile(), I get the expected output.
Use D3DXCreateTextureFromFileInMemory (doc) if it's an image file in memory.
However, if it's only the color data, just create a texture with D3DXCreateTexture (doc), lock it, and write the data to the texture manually.
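A minimal sketch of the "write the data manually" step, in plain C++ so it compiles without the DirectX SDK. It assumes 32-bit pixels and a tightly packed source image. The names copyRows, dst, and dstPitch are illustrative; dst and dstPitch stand in for the pBits and Pitch members that LockRect fills in.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a tightly packed width x height image of 32-bit pixels into a
// destination whose rows are dstPitch bytes apart (dstPitch >= width * 4).
// dst/dstPitch stand in for D3DLOCKED_RECT::pBits and ::Pitch.
void copyRows(const std::uint32_t* src, int width, int height,
              void* dst, std::size_t dstPitch)
{
    auto* dstBase = static_cast<std::uint8_t*>(dst);
    const std::size_t rowBytes =
        static_cast<std::size_t>(width) * sizeof(std::uint32_t);
    for (int y = 0; y < height; ++y)
    {
        // Row y of the destination starts y * dstPitch bytes from the base;
        // row y of the source starts y * width pixels from the base.
        std::memcpy(dstBase + static_cast<std::size_t>(y) * dstPitch,
                    src + static_cast<std::size_t>(y) * width,
                    rowBytes);
    }
}
```

With a real texture, the call would sit between LockRect and UnlockRect and receive r.pBits and r.Pitch. The pitch can be larger than width * 4, which is why the rows are copied one at a time.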
I am trying to write code that uses OpenCV Mat objects. It goes something like this:
Mat img;
vector<Mat> images;
for (i = 1; i < 5; i++)
{
    img.create(h,w,type); // h,w and type are given correctly
    // input an image from somewhere to img correctly.
    images.push_back(img);
    img.release();
}
for (i = 1; i < 5; i++)
    images[i].release();
However, I still seem to have a memory leak. Can anyone tell me why?
I thought that if the refcount of a Mat object is 0, the memory should be deallocated automatically.
You rarely need to call release explicitly, since OpenCV Mat objects automatically take care of their internal memory.
Also note that copying a Mat only creates a new header pointing to the same data. If that header wraps memory it does not own, it becomes invalid once the owning Mat goes out of scope. So when you push the image into the vector, use a deep copy (clone()) so that the image in the vector does not become invalid.
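The difference between a header copy and a deep copy can be sketched without OpenCV. Image below is a hypothetical stand-in for cv::Mat built on std::shared_ptr; it is not how OpenCV implements Mat, only an illustration of shared versus independent storage.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for cv::Mat: a small header that shares
// reference-counted pixel storage with its copies.
struct Image
{
    std::shared_ptr<std::vector<float>> pixels;

    explicit Image(std::size_t count)
        : pixels(std::make_shared<std::vector<float>>(count, 0.0f)) {}

    // Deep copy: allocates new storage, in the spirit of cv::Mat::clone().
    Image clone() const
    {
        Image copy(0);
        copy.pixels = std::make_shared<std::vector<float>>(*pixels);
        return copy;
    }
};
```

Copying an Image with the default copy constructor shares the storage, so a write through one header is visible through the other. A clone() keeps its own pixels and is unaffected by later writes to the original.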
Since you mentioned:
I have a large 3D image stored in a Mat object. I am running over it using for loops. creating a 2D mat called "image" putting the slices into image, pushing back image to vector images. releasing the image. And later doing a for loop on the images vector releasing all the matrices one by one.
You can store all slices into the vector with the following code. To release the images in the vector, just clear the vector.
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;
int main()
{
    // Init the multidimensional image
    int sizes[] = { 10, 7, 5 };
    Mat data(3, sizes, CV_32F);
    randu(data, Scalar(0, 0, 0), Scalar(1,1,1));
    // Put slices into images
    vector<Mat> images;
    for (int z = 0; z < data.size[2]; ++z)
    {
        // Create the slice
        Range ranges[] = { Range::all(), Range::all(), Range(z, z + 1) };
        Mat slice(data(ranges).clone()); // with clone slice is continuous, but still 3d
        Mat slice2d(2, &data.size[0], data.type(), slice.data); // make the slice a 2d image
        // Clone the slice into the vector, or it becomes invalid when slice goes out of scope.
        images.push_back(slice2d.clone());
    }
    // You can deallocate the multidimensional matrix now, if needed
    data.release();
    // Work with slices....
    // Release the vector of slices
    images.clear();
    return 0;
}
Please try this code, which is basically what you do:
#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>
void testFunction()
{
    // image width/height => 80MB images
    int size = 5000;
    cv::Mat img = cv::Mat(size, size, CV_8UC3);
    std::vector<cv::Mat> images;
    for (int i = 0; i < 5; i++)
    {
        // since image size is the same for i==0 as the initial image, no new data will be allocated in the first iteration.
        img.create(size+i,size+i,img.type()); // h,w and type are given correctly
        // input an image from somewhere to img correctly.
        images.push_back(img);
        // release the created image.
        img.release();
    }
    // instead of manual releasing, a images.clear() would have been enough here.
    for(size_t i = 0; i < images.size(); i++)
        images[i].release();
}
int main()
{
    for(unsigned int i=0; i<100; ++i)
    {
        testFunction();
        std::cout << "another iteration finished" << std::endl;
    }
    std::cout << "end of main" << std::endl;
    return 0;
}
After the first call of testFunction, memory is "leaked" in the sense that the application consumes 4 KB more memory on my device. But I see no further "leaks" after additional calls.
So it looks like your code is OK and the "memory leak" isn't related to creating and releasing the matrices, but maybe to some "global" things happening within the OpenCV library or C++ to optimize future function calls or memory allocations.
I encountered the same problem when iterating over OpenCV Mat objects. Memory consumption grew to 1.1 GB, and then the program stopped with a warning that it was out of memory. My program had a macro, #define new new(__FILE__, __LINE__), which clashed with parts of the standard library, so I deleted all the overloaded new/delete operators. When debugging there was no error, but when running I got "Debug Assertion Failed! Expression: _pFirstBlock == pHead". Following the instructions in
Debug Assertion Error in OpenCV
I changed the Runtime Library setting, which was /MT (Release) and /MTd (Debug), under Project Properties >> Configuration Properties >> C/C++ >> Code Generation to:
Multi-threaded Debug DLL (/MDd), if you are building the Debug version of your code.
Multi-threaded DLL (/MD), if you are building the Release version of your code.
The memory leak is gone. Memory consumption is now 38 MB.
In my iOS app, written in Swift, I generate a Metal buffer with:
vertexBuffer = device.newBufferWithBytes(vertices, length: vertices.count * sizeofValue(vertices[0]), options: nil)
And bind it to my shader program with:
renderCommandEncoder.setVertexBuffer(vertexBuffer, offset: 0, atIndex: 1)
In my shader program, written in Metal shading language, can I access the size of the buffer? I would like to access the next vertex in my buffer to do some differential calculation. Something like:
vertex float4 my_vertex(const device packed_float3* vertices [[buffer(1)]],
                        unsigned int vid[[vertex_id]]) {
    float4 vertex = vertices[vid];
    // Need to clamp this to not go beyond buffer,
    // but how do I know the max value of vid?
    float4 nextVertex = vertices[vid + 1];
    float4 tangent = nextVertex - vertex;
    // ...
}
Is my only option to pass the number of vertices as a uniform?
As far as I know, no, you can't, because vertices is just a pointer to an address. Just as in C++, you need two things to know the count or size of an array:
1) the data type of the array elements (float or some struct), and
2a) the element count for that data type, OR
2b) the total size of the array in bytes.
So yes, you would need to pass the array count as a uniform.
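Once the count is available, the clamp from the question is a one-liner. The sketch below is a CPU-side C++ version of that logic, not Metal shader code; Float3, tangentAt, and vertexCount are illustrative names, and vertexCount plays the role of the uniform. It assumes at least one vertex.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using Float3 = std::array<float, 3>;

// CPU-side sketch of the tangent computation from the question.
// vertexCount is the value that would be passed to the shader as a uniform.
Float3 tangentAt(const Float3* vertices, std::uint32_t vertexCount,
                 std::uint32_t vid)
{
    // Clamp so the last vertex reuses itself instead of reading past the end.
    const std::uint32_t next =
        std::min<std::uint32_t>(vid + 1, vertexCount - 1);
    Float3 tangent{};
    for (int i = 0; i < 3; ++i)
        tangent[i] = vertices[next][i] - vertices[vid][i];
    return tangent;
}
```

For the last vertex the clamped index equals vid, so the tangent comes out as zero; a real shader might prefer to use the previous vertex in that case.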
For texture buffers you can.
You can get the size of a texture buffer from within the shader code.
Texture buffers have a get_width() and get_height() function, which return a uint.
uint get_width() const;
uint get_height() const;
But that probably does not answer OP's question about vertex buffers.
Actually you can. You can use the resulting value in loops or conditionals. You can't use it to initialise objects (so dynamic arrays fail).
uint tempUint = 0; // some random type
uint uintSize = sizeof(tempUint); // get the size for the type
uint aVectorSize = sizeof(aVector) / uintSize; // divide the buffer size by the type size
float dynamicArray[aVectorSize]; // this fails
for (uint counter = 0; counter < aVectorSize; ++counter) {
    // do stuff
}
if (aVectorSize > 10) {
    // do more stuff
}
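One caveat worth checking before relying on sizeof: in C++, on which the Metal shading language is based, the division trick only yields an element count when the operand is a fixed-size array. The sketch below is plain C++, not shader code, and assumes the same rule carries over to a buffer argument declared as a pointer; elementCount and pointerSize are illustrative names.

```cpp
#include <cstddef>

// Element count of a fixed-size array: the size is part of the type,
// so the compiler can compute it.
template <typename T, std::size_t N>
constexpr std::size_t elementCount(const T (&)[N])
{
    return N;
}

// With only a pointer, sizeof reports the size of the pointer itself,
// not of the buffer it points to.
inline std::size_t pointerSize(const float* p)
{
    return sizeof(p);
}
```

Passing a ten-element float array to elementCount gives 10, while passing the same array to pointerSize gives the size of a pointer regardless of how many elements sit behind it.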
I use the Farseer Physics Engine for a pump simulation.
In their examples, they always use the Texture2D format.
But the pump shape is given only as an array of Point(x,y).
I want to make a polygon or a Texture2D from that point array.
The PolygonTools.CreatePolygon method needs an int[] and a width, not a Point[].
I don't know how to make a polygon from an int[] and a width.
Please help.
So you wish to create a Texture2D from an array. I will try to explain how I would approach this; it is not a working example, just a hint at how to do it.
First you need to find the width and height, so find the max X and max Y to create a blank texture.
Texture2D blankTexture = new Texture2D(GraphicsDevice, maxX, maxY, false, SurfaceFormat.Color);
Then loop over the texture and set the pixel color from your array:
for (int i = 0; i < blankTexture.Width; i++)
{
    for (int j = 0; j < blankTexture.Height; j++)
    {
        // pixel = texture.GetPixel(i, j);
        // loop over array, and if pointX in array = i and pointY in array = j then
        pixel.Color = Color.White;
    }
}
I think this is quite a CPU-expensive way, but it could work.
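A cheaper variant walks the point array once instead of scanning it for every pixel. It is sketched here in C++ rather than C#, with hypothetical names (Point, rasterize), and assumes non-negative integer coordinates. The result is a row-major pixel array plus a width, which is the shape of input the question says PolygonTools.CreatePolygon expects; whether Farseer accepts an outline-only bitmap is not something this sketch verifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Point { int x; int y; };

// Build a row-major pixel array (one 32-bit value per pixel) with every
// listed point set to `on`. Width and height are max coordinate + 1 so the
// largest point still fits.
std::vector<std::uint32_t> rasterize(const std::vector<Point>& points,
                                     int& width, int& height,
                                     std::uint32_t on = 0xFFFFFFFFu)
{
    width = 0;
    height = 0;
    for (const Point& p : points)
    {
        if (p.x + 1 > width)  width = p.x + 1;
        if (p.y + 1 > height) height = p.y + 1;
    }
    std::vector<std::uint32_t> pixels(
        static_cast<std::size_t>(width) * height, 0u);
    for (const Point& p : points)
        pixels[static_cast<std::size_t>(p.y) * width + p.x] = on;
    return pixels;
}
```

Note the + 1 on the dimensions: a texture that is exactly max X wide has no column for the point at max X.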
How can I solve the following task? An app needs to:
use dozens of DX9 textures (render them with Direct3D)
update some of them (whole or in part).
That is, sometimes (once per frame/second/minute) I need to write bytes (void *) in different formats (argb, bgra, rgb, 888, 565) to some sub-rect of an existing texture.
In OpenGL the solution is very simple: glTexImage2D. But here the unfamiliar platform features completely confused me.
I am interested in a solution for both DX9 and DX11.
To update a texture, make sure the texture is created in the D3DPOOL_MANAGED memory pool.
D3DXCreateTexture( device, size.x, size.y, numMipMaps, usage, textureFormat, D3DPOOL_MANAGED, &texture );
Then call LockRect to update the data:
RECT rect = {x,y,z,w}; // the dimensions you want to lock (left, top, right, bottom)
D3DLOCKED_RECT lockedRect = {0}; // "out" parameter from LockRect function below
texture->LockRect(0, &lockedRect, &rect, 0);
// copy the memory into lockedRect.pBits
// make sure you increment each row by "Pitch"
unsigned char* bits = ( unsigned char* )lockedRect.pBits;
for( int row = 0; row < numRows; row++ )
{
    // copy one row of data into "bits", e.g. memcpy( bits, srcData, size )
    // move to the next row
    bits += lockedRect.Pitch;
}
// unlock when done
texture->UnlockRect(0);
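LockRect exposes the texture's memory in the texture's own format, so source data in another layout has to be converted while it is copied. As one example of the formats mentioned in the question, here is a sketch that expands a 565 pixel to 32-bit ARGB. It assumes the texture is D3DFMT_A8R8G8B8 and the source is R5G6B5; rgb565ToArgb8888 is an illustrative name, not a Direct3D function.

```cpp
#include <cstdint>

// Expand one 16-bit R5G6B5 pixel to 32-bit A8R8G8B8 with full alpha.
// Replicating the high bits into the low bits maps the maximum 5-bit and
// 6-bit values to exactly 0xFF.
std::uint32_t rgb565ToArgb8888(std::uint16_t p)
{
    const std::uint32_t r5 = (p >> 11) & 0x1F;
    const std::uint32_t g6 = (p >> 5) & 0x3F;
    const std::uint32_t b5 = p & 0x1F;
    const std::uint32_t r = (r5 << 3) | (r5 >> 2);
    const std::uint32_t g = (g6 << 2) | (g6 >> 4);
    const std::uint32_t b = (b5 << 3) | (b5 >> 2);
    return 0xFF000000u | (r << 16) | (g << 8) | b;
}
```

Inside the row loop, each source pixel would be converted and written to the current row instead of using a plain memcpy.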
What's the efficient way to render a bunch of layered textures? I have some semitransparent textured rectangles that I position randomly in 3D space and render them from back to front.
Currently I call d3dContext->PSSetShaderResources() to feed the pixel shader with a new texture before each call to d3dContext->DrawIndexed(). I have a feeling that I am copying the texture to the GPU memory before each draw. I might have 10-30 ARGB textures roughly 1024x1024 pixels each and they are associated across 100-200 rectangles that I render on screen. My FPS is OK at 100, but goes pretty bad around 200. I possibly have some inefficiencies elsewhere since this is my first semi-serious D3D code, but I strongly suspect this has to do with copying the textures back and forth. 30*1024*1024*4 is 120MB, which is a bit high for a Metro Style App that should target any Windows 8 device. So putting them all in there might be a stretch, but maybe I could at least cache a few somehow? Any ideas?
*EDIT - Some code snippets added
Constant Buffer
struct ModelViewProjectionConstantBuffer
{
    DirectX::XMMATRIX model;
    DirectX::XMMATRIX view;
    DirectX::XMMATRIX projection;
    float opacity;
    float3 highlight;
    float3 shadow;
    float textureTransitionAmount;
};
The Render Method
void RectangleRenderer::Render()
{
    // Clear background and depth stencil
    const float backgroundColorRGBA[] = { 0.35f, 0.35f, 0.85f, 1.000f };
    // Don't draw anything else until all textures are loaded
    if (!m_loadingComplete)
        return;
    UINT stride = sizeof(BasicVertex);
    UINT offset = 0;
    // The vertex buffer only has 4 vertices of a rectangle
    // The index buffer only has 4 vertices
    FLOAT blendFactors[4] = { 0, };
    m_d3dContext->OMSetBlendState(m_blendState.Get(), blendFactors, 0xffffffff);
    0, // starting at the first sampler slot
    1, // set one sampler binding
    // number of rectangles is in the 100-200 range
    for (int i = 0; i < m_rectangles.size(); i++)
    {
        // start rendering from the farthest rectangle
        int j = (i + m_farthestRectangle) % m_rectangles.size();
        m_vsConstantBufferData.model = m_rectangles[j].transform;
        m_vsConstantBufferData.opacity = m_rectangles[j].Opacity;
        m_vsConstantBufferData.highlight = m_rectangles[j].Highlight;
        m_vsConstantBufferData.shadow = m_rectangles[j].Shadow;
        m_vsConstantBufferData.textureTransitionAmount = m_rectangles[j].textureTransitionAmount;
        auto a = m_rectangles[j].textureId;
        auto b = m_rectangles[j].targetTextureId;
        auto srv1 = m_textures[m_rectangles[j].textureId].textureSRV.GetAddressOf();
        auto srv2 = m_textures[m_rectangles[j].targetTextureId].textureSRV.GetAddressOf();
        ID3D11ShaderResourceView* srvs[2];
        srvs[0] = *srv1;
        srvs[1] = *srv2;
        0, // starting at the first shader resource slot
        2, // set two shader resource bindings
    }
}
Pixel Shader
cbuffer ModelViewProjectionConstantBuffer : register(b0)
{
    matrix model;
    matrix view;
    matrix projection;
    float opacity;
    float3 highlight;
    float3 shadow;
    float textureTransitionAmount;
};
Texture2D baseTexture : register(t0);
Texture2D targetTexture : register(t1);
SamplerState simpleSampler : register(s0);
struct PixelShaderInput
{
    float4 pos : SV_POSITION;
    float3 norm : NORMAL;
    float2 tex : TEXCOORD0;
};
float4 main(PixelShaderInput input) : SV_TARGET
{
    float3 lightDirection = normalize(float3(0, 0, -1));
    float4 baseTexelColor = baseTexture.Sample(simpleSampler, input.tex);
    float4 targetTexelColor = targetTexture.Sample(simpleSampler, input.tex);
    float4 texelColor = lerp(baseTexelColor, targetTexelColor, textureTransitionAmount);
    float4 shadedColor;
    shadedColor.rgb = lerp(shadow.rgb, highlight.rgb, texelColor.r);
    shadedColor.a = texelColor.a * opacity;
    return shadedColor;
}
As Jeremiah has suggested, you are probably not moving textures from the CPU to the GPU each frame; for that you would have to create a new texture each frame or use the "UpdateSubresource" or "Map/Unmap" methods.
I don't think that instancing is going to help in this specific case, as the number of polygons is extremely low (I would start to worry at several million polygons). It is more likely that your application is bandwidth/fillrate limited, as you are performing lots of texture sampling/blending (it depends on the texture fillrate, the pixel fillrate, and the number of ROPs on your GPU).
To achieve better performance, it is highly recommended to:
Make sure that all your textures have all mipmaps generated. If they don't have any mipmaps, it will hurt the GPU's cache a lot. (I also assume that you are using the texture.Sample method in HLSL, and not texture.SampleLevel or variants.)
Use Direct3D 11 block-compressed textures on the GPU, by using a tool like texconv.exe or preferably the sample from "Windows DirectX 11 Texture Converter".
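To see how these two recommendations affect the memory estimate from the question, here is a rough sketch that adds up the bytes of a full mip chain. It assumes square power-of-two textures; mipChainBytes is an illustrative helper, and the block sizes in the comment are the standard ones for the named formats.

```cpp
#include <cstdint>

// Bytes for a full mip chain of a square power-of-two texture.
// blockDim/bytesPerBlock describe the format: 4 bytes per 1x1 block for
// 32-bit ARGB, 16 bytes per 4x4 block for BC3, 8 bytes per 4x4 block for BC1.
std::uint64_t mipChainBytes(std::uint32_t size, std::uint32_t blockDim,
                            std::uint32_t bytesPerBlock)
{
    std::uint64_t total = 0;
    for (std::uint32_t s = size; s >= 1; s /= 2)
    {
        // Round up to whole blocks; mips smaller than a block still use one.
        const std::uint64_t blocks = (s + blockDim - 1) / blockDim;
        total += blocks * blocks * bytesPerBlock;
    }
    return total;
}
```

For a 1024x1024 texture, the full chain adds about one third on top of the base level, while BC3 at 16 bytes per 4x4 block needs a quarter of the space of 32-bit ARGB.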
On a side note, you will probably get more attention for this kind of question on https://gamedev.stackexchange.com/.
I don't think you are doing any copying back and forth between the GPU and system memory. You usually have to do that explicitly with a call to Map(...), or by blitting to a texture you created in system memory.
One issue is that you are making a DrawIndexed(...) call for each texture. GPUs work most efficiently if you send them a bunch of work at once by batching. One way to accomplish this is to set n textures with PSSetShaderResources(i, ...) and do a DrawIndexedInstanced(...). Your shader code would then read each of the shader resources and draw them. I do this in my C++ DirectCanvas code here (SpriteInstanced.cpp). This can make for a lot of code, but the result is very efficient (I even do the matrix ops in the shader for more speed).
One other, maybe a lot easier way, is to give the DirectXTK spritebatch a shot.
I used it here in this project...only for a simple blit but it may be a good start to see the minor amount of setup needed to use the spritebatch.
Also, if possible, try to "atlas" your texture. For instance, try to fit as many "images" in a texture as possible and blit from them vs having a single texture for each.
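With an atlas, each rectangle needs the texture coordinates of its image inside the shared texture. A minimal sketch, assuming the atlas is a regular grid of equally sized tiles numbered row by row; UvRect and atlasTileUv are hypothetical names.

```cpp
struct UvRect { float u0, v0, u1, v1; };

// Texture coordinates of tile `index` in an atlas laid out as a regular
// grid of columns x rows equally sized tiles, numbered row by row.
UvRect atlasTileUv(int index, int columns, int rows)
{
    const int col = index % columns;
    const int row = index / columns;
    const float du = 1.0f / static_cast<float>(columns);
    const float dv = 1.0f / static_cast<float>(rows);
    return UvRect{ col * du, row * dv, (col + 1) * du, (row + 1) * dv };
}
```

The four values would replace the 0..1 texture coordinates of a rectangle's vertices, so many rectangles can share one bound texture. In practice a small inset or padding between tiles is often needed to avoid bleeding when mipmaps are used.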