I'm working on a volume rendering program using DirectX 11.
I render both to a window (HWND) and to a texture (ID3D11Texture2D).
While the rendering to the HWND always looks correct, my ID3D11Texture2D looks corrupt for render sizes smaller than 64x64.
I wonder whether there is a minimum size limit for textures in DirectX 11.
Unfortunately, I was only able to find information about the maximum texture size limit.
There is no minimum texture size; 1x1x1 is valid.
It looks to me like you've mapped the texture and are extracting the data while ignoring the "RowPitch" returned. On textures that are sufficiently small (or of unusual dimensions), the rows of texels are not necessarily contiguous in memory: each row begins "RowPitch" bytes after the start of the previous one, which can be more than the row's width in bytes.
See D3D11_MAPPED_SUBRESOURCE
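For illustration, a readback that honors the pitch might look something like this (a minimal C++ sketch; the staging-texture setup, names, and the 4-bytes-per-texel format are assumptions, not taken from the question):

#include <cstdint>
#include <cstring>
#include <d3d11.h>
#include <vector>

// Hypothetical helper: copies a mapped, CPU-readable (staging) copy of the
// render texture into a tightly packed buffer, honoring RowPitch.
std::vector<uint8_t> ReadBackTexture(ID3D11DeviceContext* context,
                                     ID3D11Texture2D* staging,
                                     UINT width, UINT height)
{
    std::vector<uint8_t> pixels(width * height * 4); // assumes 4 bytes per texel

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped)))
    {
        const uint8_t* src = static_cast<const uint8_t*>(mapped.pData);
        for (UINT row = 0; row < height; ++row)
        {
            // The next row starts RowPitch bytes after the start of the previous
            // one; RowPitch is often larger than width*4 for small textures.
            std::memcpy(&pixels[row * width * 4], src + row * mapped.RowPitch, width * 4);
        }
        context->Unmap(staging, 0);
    }
    return pixels;
}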
I am trying to work with 3D textures in WebGL2 and came across
gl.texImage3D();
I have experience with 2D textures, which I found very convenient, but there is another approach that people are using on the internet:
gl.texStorage3D()
and then,
gl.texSubImage3D() // with the x, y and z offsets all 0.
I just want to know what the difference between the two approaches is. I came to know that an equivalent of the second option is available for 2D textures as well, but I have never used it to provide data to the target. I thought subimage was for presenting a sub-image of a texture to the fragment shader, so I don't understand how the two approaches differ.
The short answer is that texStorage2D and texStorage3D allocate all of the texture memory up front, whereas texImage2D and texImage3D allocate one mip level at a time.
texSubImage2D and texSubImage3D do not allocate anything. They just copy data into a texture mip level that was previously allocated with one of the functions above.
As for why one or the other: texStorage2D and texStorage3D can immediately allocate memory on the GPU. texImage2D and texImage3D cannot, since they don't know the complete texture (all the mips) until you actually try to draw something with it. To put it another way, texStorage2D/3D might be more efficient, whereas texImage2D/3D is more flexible.
In order for a texture to actually be renderable, all the mip levels you are going to use need to be the same internal format and the correct sizes.
When you call texStorage2D/3D you tell it the size of mip level 0 (the largest level) and how many mip levels in total to allocate. So let's say you tell it an internal format of gl.RGBA8, a width and height of 8, and 4 mip levels.
gl.texStorage2D(gl.TEXTURE_2D,
                4,         // 4 levels
                gl.RGBA8,  // internal format
                8,         // width
                8);        // height
It will allocate all 4 mip levels: 8x8, 4x4, 2x2, and 1x1, at 4 bytes per texel. It knows they are all RGBA8. It knows they are all the correct size. Textures allocated with texStorage2D can't be changed in size or internal format. If you try to call texImage2D on a texture created with texStorage2D you'll get an error.
If you instead use texImage2D, you would probably first specify the first mip:
gl.texImage2D(gl.TEXTURE_2D,
              0,                // mip level
              gl.RGBA8,         // internal format
              8,                // width
              8,                // height
              0,                // border
              gl.RGBA,          // data format
              gl.UNSIGNED_BYTE, // data type
              data);
so now you have just 1 mip level, level #0. Will you add the other 3 mips? Will they be the correct size? Will those other 3 mips have the same internal format? Will you change mip level #0 to something else, a different size, or different internal format? WebGL doesn't have any idea what your next command will be, so it has to wait until you actually try to draw with the texture before it can check. With texStorage you decide the sizes and formats of all the mips up front, so it only has to check one time. With texImage you don't tell it everything up front, so it has to check again at draw time whenever things change.
I'm using SharpDX and I want to do antialiasing in the Depth buffer. I need to store the Depth Buffer as a texture to use it later. So is it a good idea if this texture is a Texture2DMS? Or should I take another approach?
What I really want to achieve is:
1) Depth buffer scaling
2) Depth test supersampling
(terms I found in section 3.2 of this paper: http://gfx.cs.princeton.edu/pubs/Cole_2010_TFM/cole_tfm_preprint.pdf)
The paper calls for a depth pre-pass. Since this pass requires no color, you should leave the render target unbound, and use an "empty" pixel shader. For depth, you should create a Texture2D (not MS) at 2x or 4x (or some other 2Nx) the width and height of the final render target that you're going to use. This isn't really "supersampling" (since the pre-pass is an independent phase with no actual pixel output) but it's similar.
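As a sketch of the sizing and binding described above (plain C++ D3D11 here; the SharpDX calls map one-to-one, and the 2x factor, the names, and the R32 format choice are assumptions, not from the paper):

#include <d3d11.h>

// Create a depth-only target at 2N times the final resolution (here 2x),
// readable later as a shader resource.
void CreateDepthPrepassTarget(ID3D11Device* device, UINT finalWidth, UINT finalHeight,
                              ID3D11Texture2D** tex, ID3D11DepthStencilView** dsv,
                              ID3D11ShaderResourceView** srv)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = finalWidth * 2;
    desc.Height = finalHeight * 2;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R32_TYPELESS;    // typeless so it can be both DSV and SRV
    desc.SampleDesc.Count = 1;                 // plain Texture2D, not Texture2DMS
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_DEPTH_STENCIL | D3D11_BIND_SHADER_RESOURCE;
    device->CreateTexture2D(&desc, nullptr, tex);

    D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc = {};
    dsvDesc.Format = DXGI_FORMAT_D32_FLOAT;
    dsvDesc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2D;
    device->CreateDepthStencilView(*tex, &dsvDesc, dsv);

    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format = DXGI_FORMAT_R32_FLOAT;
    srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
    srvDesc.Texture2D.MipLevels = 1;
    device->CreateShaderResourceView(*tex, &srvDesc, srv);
}

// During the pre-pass itself: bind no render target and no pixel shader, e.g.
//   context->OMSetRenderTargets(0, nullptr, depthView);
//   context->PSSetShader(nullptr, nullptr, 0);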
For the second phase, the paper calls for doing multiple samples of the high-resolution depth buffer from the pre-pass. If you followed the sizing above, every pixel will correspond to some (2N)^2 depth values. You'll need to read these values and average them. Fortunately, there's a hardware-accelerated way to do this (called PCF) using SampleCmp with a COMPARISON sampler type. This samples a 2x2 stamp, compares each value to a specified value (pass in the second-phase calculated depth here, and don't forget to add some epsilon value (e.g. 1e-5)), and returns the averaged result. Do 2x2 stamps to cover the entire area of the first-phase depth buffer associated with this pixel, and average the results. The final result represents how much of the current line's spine corresponds to the foremost depth of the pre-pass. Because of the PCF's smooth filtering behavior, as lines become visible, they will slowly fade in, as opposed to the aliased "dotted" line effect described in the paper.
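The comparison sampler that SampleCmp requires could be created like this (again a C++ D3D11 sketch with assumed names; choose the comparison function to match your depth convention):

#include <d3d11.h>

// Creates a PCF-style comparison sampler: sampling a depth SRV with SampleCmp
// and this state fetches a 2x2 footprint, compares each texel against the
// reference value you pass in, and returns the filtered 0..1 result.
ID3D11SamplerState* CreatePcfSampler(ID3D11Device* device)
{
    D3D11_SAMPLER_DESC desc = {};
    desc.Filter = D3D11_FILTER_COMPARISON_MIN_MAG_MIP_LINEAR;
    desc.AddressU = D3D11_TEXTURE_ADDRESS_CLAMP;
    desc.AddressV = D3D11_TEXTURE_ADDRESS_CLAMP;
    desc.AddressW = D3D11_TEXTURE_ADDRESS_CLAMP;
    desc.ComparisonFunc = D3D11_COMPARISON_LESS_EQUAL; // assumption; depends on your depth convention
    ID3D11SamplerState* sampler = nullptr;
    device->CreateSamplerState(&desc, &sampler);
    return sampler;
}

// In HLSL, declare a SamplerComparisonState and call
// depthTexture.SampleCmp(samp, uv, referenceDepth + epsilon).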
How to interpret texture memory information output by deviceQuery sample to know texture memory size?
Here is output of my texture memory.
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535),3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
It is a common misconception, but there is no such thing as "texture memory" in CUDA GPUs. There are only textures, which are global memory allocations accessed through dedicated hardware that provides built-in caching, filtering, and addressing, and whose limitations lead to the size limits you see reported in the documentation and in the deviceQuery output. So the limit is either roughly the free amount of global memory (allowing for padding and alignment in CUDA arrays) or the dimensional limits you already quoted.
The output shows that the maximum texture dimensions are:
For 1D textures 65536
For 2D textures 65536*65535
For 3D textures 2048*2048*2048
If you want an upper bound in bytes, multiply those dimensions by the maximum number of channels (4) and the maximum per-channel size (4 bytes).
(For layered textures, multiply the relevant numbers you got for the dimensions by the number of maximum layers you got.)
However, this is the maximum size for a single texture, not the available memory for all textures. (A maximal 65536x65535 2D texture with 4 channels of 4 bytes each would already come to roughly 64 GiB, far more than any GPU's global memory, so in practice free global memory is the real limit.)
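If you want the concrete numbers for your own card rather than reading them off the deviceQuery output, you can query both the dimensional limits and the free global memory at runtime; a small C++ sketch against the CUDA runtime API:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);        // device 0

    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);  // global memory, shared by all textures

    std::printf("Max 1D texture width:       %d\n", prop.maxTexture1D);
    std::printf("Max 2D texture size:        %d x %d\n",
                prop.maxTexture2D[0], prop.maxTexture2D[1]);
    std::printf("Max 3D texture size:        %d x %d x %d\n",
                prop.maxTexture3D[0], prop.maxTexture3D[1], prop.maxTexture3D[2]);
    std::printf("Free / total global memory: %zu / %zu bytes\n", freeBytes, totalBytes);
    return 0;
}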
Suppose I have a texture which is naturally not square (for example, a photographic texture of something with a 4:1 aspect ratio). And suppose that I want to use PVRTC compression to display this texture on an iOS device, which requires that the texture be square. If I scale up the texture so that it is square during compression, the result is a very blurry image when the texture is viewed from a distance.
I believe that this is caused by mipmapping. Since the mipmap filter sees the new larger stretched dimension, it uses that to choose a low mip level, which is actually not correct, since those pixels were just stretched to that size. If it looked at the other dimension, it would choose a higher resolution mip level.
This theory is confirmed (somewhat) by the observation that if I leave the texture in a format that doesn't have to be square, the mipmap versions look just dandy.
There is a LOD Bias parameter, but the docs say that is applied to both dimensions. It seems like what is called for is a way to bias the LOD but only in one dimension (that is, to bias it toward more resolution in the dimension of the texture which was scaled up).
Other than chopping up the geometry to allow the use of square subsets of the original texture (which is infeasible, given our production pipeline), does anyone have any clever hacks they've used to deal with this issue?
It seems to me that you have a few options, depending on what you can do with, say, the vertex UVs.
[Hmm Just realised that in the following I'm assuming that the V coordinates run from the top to the bottom... you'll need to allow for me being old school :-) ]
The first thing that comes to mind is to take your 4N*N (X*Y) source texture and repeat it 4x vertically to give a 4N*4N texture, and then adjust the V coordinates on the model to be 1/4 of their current values. This won't save you much in terms of memory (since it effectively means a 4bpp PVRTC becomes 4x larger) but it will still save bandwidth and cache space, since the other parts of the texture won't be accessed. MIP mapping will also work all the way down to 1x1 textures.
Alternatively, if you want to save a bit of space and you have a pair of 4N*N textures, you could try packing them together into a "sort of" 4N*4N atlas. Put the first texture in the top N rows, then follow it with its own top N/2 rows. Then pack the bottom N/2 rows of the second texture, followed by the second texture itself, and then its top N/2 rows. Finally, add the bottom N/2 rows of the first texture. For the UVs that access the first texture, do the same divide by 4 for the V parameter. For the second texture, you'll need to divide by 4 and add 0.5.
This should work fine until the MIP map level is so small that the two textures are being blended together... but I doubt that will really be an issue.
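To make the row layout concrete, here is a sketch of the packing, operating on the uncompressed RGBA8 source images before PVRTC compression (the buffer layout, names, and the row-0-at-the-top convention are my assumptions):

#include <cstdint>
#include <cstring>
#include <vector>

// Copies 'count' rows starting at row 'srcRow' of a 4N x N source image into
// the 4N x 4N atlas, starting at atlas row 'dstRow'.
static void copyRows(std::vector<uint8_t>& atlas, const std::vector<uint8_t>& src,
                     size_t n, size_t dstRow, size_t srcRow, size_t count)
{
    const size_t rowBytes = 4 * n * 4; // 4N texels per row, 4 bytes each (RGBA8)
    std::memcpy(&atlas[dstRow * rowBytes], &src[srcRow * rowBytes], count * rowBytes);
}

std::vector<uint8_t> packAtlas(const std::vector<uint8_t>& tex1,
                               const std::vector<uint8_t>& tex2, size_t n)
{
    const size_t rowBytes = 4 * n * 4;
    std::vector<uint8_t> atlas(4 * n * rowBytes);
    copyRows(atlas, tex1, n, 0,             0,     n);     // texture 1 (V = 0 .. 0.25)
    copyRows(atlas, tex1, n, n,             0,     n / 2); // top half of texture 1
    copyRows(atlas, tex2, n, n + n / 2,     n / 2, n / 2); // bottom half of texture 2
    copyRows(atlas, tex2, n, 2 * n,         0,     n);     // texture 2 (V = 0.5 .. 0.75)
    copyRows(atlas, tex2, n, 3 * n,         0,     n / 2); // top half of texture 2
    copyRows(atlas, tex1, n, 3 * n + n / 2, n / 2, n / 2); // bottom half of texture 1
    return atlas;
}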
I want to record images, rendered with OpenGL, into a movie file with the help of AVAssetWriter. The problem is that the only way to access pixels from an OpenGL framebuffer is glReadPixels, which only supports the RGBA pixel format on iOS, but AVAssetWriter doesn't support this format; it accepts either ARGB or BGRA. As the alpha values can be ignored, I came to the conclusion that the fastest way to convert RGBA to ARGB would be to give glReadPixels the buffer shifted by one byte:
UInt8 *buffer = malloc(width*height*4+1);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, buffer+1);
The problem is that the glReadPixels call leads to an EXC_BAD_ACCESS crash. If I don't shift the buffer by one byte, it works perfectly (but obviously with wrong colors in the video file). What's the problem here?
I came to the conclusion, that the fastest way to convert RGBA to ARGB would be to give glReadPixels the buffer shifted by one byte
This will however shift your alpha values by 1 pixel as well. Here's another suggestion:
Render the picture to a texture (using a FBO with that texture as color attachment). Next render that texture to another framebuffer, with a swizzling fragment shader:
#version ...
uniform sampler2D image;
uniform vec2 image_dim;
void main()
{
    // We want to address texel centers by absolute fragment coordinates; this
    // requires a bit of work (OpenGL ES SL doesn't provide a texelFetch function).
    gl_FragColor.rgba =
        texture2D(image, vec2( (2.0*gl_FragCoord.x + 1.0)/(2.0*image_dim.x),
                               (2.0*gl_FragCoord.y + 1.0)/(2.0*image_dim.y) )
                 ).argb; // this swizzles RGBA into ARGB order if read into an RGBA buffer
}
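For completeness, the render-to-texture plumbing for the first step could look like this (OpenGL ES 2.0 called from C++; the helper and variable names are mine, not from the answer):

#include <OpenGLES/ES2/gl.h>

// Hypothetical helper: creates an RGBA texture plus an FBO that renders into it.
// Draw the scene into this FBO, then bind 'colorTex' as the 'image' uniform and
// draw a full-screen quad with the swizzling shader above into a second target,
// and finally glReadPixels from that second target.
void createRenderTarget(GLsizei width, GLsizei height, GLuint* colorTex, GLuint* fbo)
{
    glGenTextures(1, colorTex);
    glBindTexture(GL_TEXTURE_2D, *colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    glGenFramebuffers(1, fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, *fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, *colorTex, 0);
}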
What happens if you put an extra 128 bytes of slack on the end of your buffer? It might be that OpenGL is trying to fill 4/8/16/etc bytes at a time for performance, and has a bug when the buffer is non-aligned or something. It wouldn't be the first time a performance optimization in OpenGL had issues on an edge case :)
Try calling
glPixelStorei(GL_PACK_ALIGNMENT,1)
before glReadPixels.
From the docs:
GL_PACK_ALIGNMENT
Specifies the alignment requirements for the start of each pixel row in memory.
The allowable values are
1 (byte-alignment),
2 (rows aligned to even-numbered bytes),
4 (word-alignment), and
8 (rows start on double-word boundaries).
The default value is 4 (see glGet). This often gets mentioned as a troublemaker in various "OpenGL pitfalls" type lists, although this is generally more to do with its row padding effects than buffer alignment.
As an alternative approach, what happens if you malloc 4 extra bytes, do the glReadPixels as 4-byte aligned starting at buffer+4, and then pass your AVAssetWriter buffer+3 (although I've no idea whether AVAssetWriter is more tolerant of alignment issues) ?
You will need to shift bytes by doing a memcpy or other copy operation. Modifying the pointers will leave them unaligned, which may or may not be within the capabilities of any underlying hardware (DMA bus widths, tile granularity, etc.)
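A sketch of what that copy could look like, reading into a normally aligned RGBA buffer and then rotating each pixel's bytes into ARGB order (C++; the function and buffer names are hypothetical):

#include <cstddef>
#include <cstdint>

// Converts tightly packed RGBA8 pixels (as returned by glReadPixels) into a
// separate ARGB8 buffer. Both buffers must hold width*height*4 bytes.
void rgbaToArgb(const uint8_t* rgba, uint8_t* argb, size_t width, size_t height)
{
    const size_t pixelCount = width * height;
    for (size_t i = 0; i < pixelCount; ++i) {
        const uint8_t r = rgba[4 * i + 0];
        const uint8_t g = rgba[4 * i + 1];
        const uint8_t b = rgba[4 * i + 2];
        const uint8_t a = rgba[4 * i + 3];
        argb[4 * i + 0] = a; // alpha moves to the front
        argb[4 * i + 1] = r;
        argb[4 * i + 2] = g;
        argb[4 * i + 3] = b;
    }
}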
Using buffer+1 will mean the data is not written at the start of your malloc'd memory, but rather one byte in, so it will be writing over the end of your malloc'd memory, causing the crash.
If iOS's glReadPixels will only accept GL_RGBA then you'll have to go through and re-arrange them yourself I think.
UPDATE: Sorry, I missed the +1 in your malloc; StilesCrisis is probably right about the cause of the crash.