What format for #greyscale' render targets in directX? - directx

I have a directx9 game engine that creates its normal adaptor with this format:
D3DFMT_X8R8G8B8
I have a system where I render some objects to an offscreen render target, as lightmaps. I then use that lightmap data to composite back to the back buffer where they act as a full screen 'mask' and let me get the effect of torches or other light sources on a dark scene.
Everything works just great.
The problem is, I'm aware that my big offscreen lightmap render targets are 16MB each, at a large res, and I only really need 8 bits of data (greyscale) from them, so 75% of the 32 bit render target memory is a waste. (I'm targeting low spec cards).
I tried creating the render targets as
D3DFMT_A8
But directx silently fails on that (if I add CheckDeviceFormat() I see it happen) and creates 32 bit anyway. I use the D3DXCreateTexture function
My question is, what format is best for creating these offscreen buffers?
Thankyou for your help, I'm not good at render target related stuff :)

D3DFMT_L8 is 8 bit luminance. I believe it's supported on GeForce 3 (i.e. the first consumer card with shader 1.1!), so must be available everywhere. I think the colour is read as L, L, L, 1, i.e. rgb = luminance value, alpha = 1.
Edit: this tool is useful for finding caps:
http://zp.lo3.wroc.pl/cdragan/wizard.php

Ontopic: If you are targeting lower spec cards, you are very likely to be running on systems where 8-bit single channel render targets are not supported at all.
If you are using shaders to do the rendering and compositing, it should be possible to use the rgba channels for 4 alternating pixels of your lightmap, packing your information. Perhaps you can tell us a little bit more about your current rendering setup?
Offtopic: AWESOME to have you here on StackOverflow, big fan of your work!

Related

Webgl Upload Texture Data to the gpu without a draw call

I'm using webgl to do YUV to RGB conversions on a custom video codec.
The video has to play at 30 fps. In order to make this happen I'm doing all my math every other requestAnimationFrame.
This works great, but I noticed when profiling that uploading the textures to the gpu takes the longest amount of time.
So I uploaded the "Y" texture and the "UV" texture separately.
Now the first "requestAnimationFrame" will upload the "Y" texture like this:
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, yTextureRef);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.LUMINANCE, textureWidth, textureHeight, 0, gl.LUMINANCE, gl.UNSIGNED_BYTE, yData);
The second "requestAnimationFrame" will upload the "UV" texture in the same way, and make a draw call to the fragment shader doing the math between them.
But this doesn't change anything in the profiler. I still show nearly 0 gpu time on the frame that uploads the "Y" texture, and the same amount of time as before on the frame that uploads the "UV" texture.
However if I add a draw call to my "Y" texture upload function, then the profiler shows the expected results. Every frame has nearly half the gpu time.
From this I'm guessing the Y texture isn't really uploaded to the gpu using the texImage2d function.
However I don't really want to draw the Y texture on the screen as it doesn't have the correct UV texture to do anything with until a frame later. So is there any way to force the gpu to upload this texture without performing a draw call?
Update
I mis-understood the question
It really depends on the driver. The problem is OpenGL/OpenGL ES/WebGL's texture API really sucks. Sucks is a technical term for 'has unintended consequences'.
The issue is the driver can't really fully upload the data until you draw because it doesn't know what things you're going to change. You could change all the mip levels in any order and any size and then fix them all in between and so until you draw it has no idea which other functions you're going to call to manipulate the texture.
Consider you create a 4x4 level 0 mip
gl.texImage2D(
gl.TEXTURE_2D,
0, // mip level
gl.RGBA,
4, // width
4, // height
...);
What memory should it allocate? 4(width) * 4(height) * 4(rgba)? But what if you call gl.generateMipmap? Now it needs 4*4*4+2*2*4+1*1*4. Ok but now you allocate an 8x8 mip on level 3. You intend to then replace levels 0 to 2 with 64x64, 32x32, 16x16 respectively but you did level 3 first. What should it do when you replace level 3 before replacing the levels above those? You then add in levels 4 8x8, 5 as 4x4, 6 as 2x2, and 7 as 1x1.
As you can see the API lets you change mips in any order. In fact I could allocate level 7 as 723x234 and then fix it later. The API is designed to not care until draw time when all the mips must be the correct size at which point they can finally allocate memory on the GPU and copy the mips in.
You can see a demonstration and test of this issue here. The test uploads mips out of order to verify that WebGL implementations correctly fail with they are not all the correct size and correctly start working once they are the correct sizes.
You can see this was arguably a bad API design.
They added gl.texStorage2D to fix it but gl.texStorage2D is not available in WebGL1 only WebGL2. gl.texStorage2D has new issues though :(
TLDR; textures get uploaded to the driver when you call gl.texImage2D but the driver can't upload to the GPU until draw time.
Possible solution: use gl.texSubImage2D since it does not allocate memory it's possible the driver could upload sooner. I suspect most drivers don't because you can use gl.texSubImage2D before drawing. Still it's worth a try
Let me also add that gl.LUMIANCE might be a bottleneck as well. IIRC DirectX doesn't have a corresponding format and neither does OpenGL Core Profile. Both support a RED only format but WebGL1 does not. So LUMIANCE has to be emulated by expanding the data on upload.
Old Answer
Unfortunately there is no way to upload video to WebGL except via texImage2D and texSubImage2D
Some browsers try to make that happen faster. I notice you're using gl.LUMINANCE. You might try using gl.RGB or gl.RGBA and see if things speed up. It's possible browsers only optimize for the more common case. On the other hand it's possible they don't optimize at all.
Two extensions what would allow using video without a copy have been proposed but AFAIK no browser as ever implemented them.
WEBGL_video_texture
WEBGL_texture_source_iframe
It's actually a much harder problem than it sounds like.
Video data can be in various formats. You mentioned YUV but there are others. Should the browser tell the app the format or should the browser convert to a standard format?
The problem with telling is lots of devs will get it wrong then a user will provide a video that is in a format they don't support
The WEBGL_video_texture extensions converts to a standard format by re-writing your shaders. You tell it uniform samplerVideoWEBGL video and then it knows it can re-write your color = texture2D(video, uv) to color = convertFromVideoFormatToRGB(texture(video, uv)). It also means they'd have to re-write shaders on the fly if you play different format videos.
Synchronization
It sounds great to get the video data to WebGL but now you have the issue that by the time you get the data and render it to the screen you've added a few frames of latency so the audio is no longer in sync.
How to deal with that is out of the scope of WebGL as WebGL doesn't have anything to do with audio but it does point out that it's not as simple as just giving WebGL the data. Once you make the data available then people will ask for more APIs to get the audio and more info so they can delay one or both and keep them in sync.
TLDR; there is no way to upload video to WebGL except via texImage2D and texSubImage2D

Strange rendering behavior with transparent texture in WebGL

I've been writing a little planet generator using Haxe + Away3D, and deploying to HTML5/WebGL. But I'm having a strange issue when rendering my clouds. I have the planet mesh, and then the clouds mesh slightly bigger in the same position.
I'm using a perlin noise function to generate the planetary features and the cloud formations, writing them to a bitmap and applying the bitmap as the texture. Now, strangely, when I deploy this to iOS or C++/OSX, it renders exactly how I wanted it to:
Now, when I deploy to WebGL, it generates an identical diffuse map, but renders as:
(The above was at a much lower resolution, due to how often I was reloading the page. The problem persisted at higher resolutions.)
The clouds are there, and the edges look alright, wispy and translucent. But the inside is opaque and seemingly being rendered differently (each pixel is the same color, only the alpha channel is changed)
I realize this is likely something to do with how the code is ultimately compiled/generated in haxe, but I'm hoping it's something simple like a render setting or blending mode I'm not setting. But since I'm not even sure exactly what is happening, I wouldn't know where to look.
Here's the diffuse map being produced. I overlaid it on red so the clouds would be viewable.
Bitmapdata.perlinNoise does not work on html5.
You should implement it by yourself, or you could use pre-rendered image.
public function perlinNoise (baseX:Float, baseY:Float, numOctaves:UInt, randomSeed:Int, stitch:Bool, fractalNoise:Bool, channelOptions:UInt = 7, grayScale:Bool = false, offsets:Array = null):Void {
openfl.Lib.notImplemented ("BitmapData.perlinNoise");
}
https://github.com/openfl/openfl/blob/c072a98a3c6699f4d334dacd783be947db9cf63a/openfl/display/BitmapData.hx
Also, WebGL-Inspector is very useful for debugging WebGL apps. Have you used it?
http://benvanik.github.io/WebGL-Inspector/
Well, then, did you upload that image from ByteArray?
Lime once allowed access ByteArray with array index operator, even though it shouldn't on js. This is fixed in the lastest version of Lime to avoid mistakes.
I used __get and __set method instead of [] to access a byte array.
Away3d itself might be the cause of this issue too, because the code of backend is generated from different source files depending on the target you use.
For example, byteArrayOffset parameter of Texture.uploadFromByteArray is supported on html5, but not on native.
If away3d is the cause of the problem, which part of the code is causing the problem? I'm not sure for now.
EDIT: I've also experienced a problem with OpenFL's latest WebGL backend. I think legacy OpenFL doesn't have this problem. OpenFL's sprite renderer was changing colorMask (and possibly other OpenGL render states) without my knowledge! This problem occured because my code and OpenFL's sprite renderer was actually using the same OpenGL context. I got rid of this problem by manually disabling OpenFL's sprite renderer.

Large images with Direct2D

Currently I am developing application for the Windows Store which does real time-image processing using Direct2D. It must support various sizes of images. The first problem I have faced is how to handle the situations when the image is larger than the maximum supported texture size. After some research and documentation reading I found the VirtualSurfaceImageSource as a solution. The idea was to load the image as IWICBitmap then to create render target with CreateWICBitmapRenderTarget (which as far as I know is not hardware accelerated). After some drawing operations I wanted to display the result to the screen by invalidating the corresponding region in the VirtualSurfaceImage source or when the NeedUpdate callback fires. I supposed that it is possible to do it by creating ID2D1Bitmap (hardware accelerated) and to call CopyFromRenderTarget with the render target created with CreateWICBitmapRenderTarget and the invalidated region as bounds, but the method returns D2DERR_WRONG_RESOURCE_DOMAIN as a result. Another reason for using IWICBitmap is one of the algorithms involved in the application which must have access to update the pixels of the image.
The question is why this logic doesn't work? Is this the right way to achieve my goal using Direct2D? Also as far as the render target created with CreateWICBitmapRenderTarget is not hardware accelerated if I want to do my image processing on the GPU with images larger than the maximum allowed texture size which is the best solution?
Thank you in advance.
You are correct that images larger than the texture limit must be handled in software.
However, the question to ask is whether or not you need that entire image every time you render.
You can use the hardware accel to render a portion of the large image that is loaded in a software target.
For example,
Use ID2D1RenderTarget::CreateSharedBitmap to make a bitmap that can be used by different resources.
Then create a ID2D1BitmapRenderTarget and render the large bitmap into that. (making sure to do BeginDraw, Clear, DrawBitmap, EndDraw). Both the bitmap and the render target can be cached for use by successive calls.
Then copy from that render target into a regular ID2D1Bitmap with the portion that will fit into the texture memory using the ID2D1Bitmap::CopyFromRenderTarget method.
Finally draw that to the real render target, pRT->DrawBitmap

iOS: playing a frame-by-frame greyscale animation in a custom colour

I have a 32 frame greyscale animation of a diamond exploding into pieces (ie 32 PNG images # 1024x1024)
my game consists of 12 separate colours, so I need to perform the animation in any desired colour
this I believe rules out any Apple frameworks, also it rules out a lot of public code for animating frame by frame in iOS.
what are my potential solution paths?
these are the best SO links I have found:
Faster iPhone PNG Animations
frame by frame animation
Is it possible using video as texture for GL in iOS?
that last one just shows it is may be possible to load an image into a GL texture each frame ( he is doing it from the camera, so if I have everything stored in memory, that should be even faster )
I can see these options ( listed laziest first, most optimised last )
option A
each frame (courtesy of CADisplayLink), load the relevant image from file into a texture, and display that texture
I'm pretty sure this is stupid, so onto option B
option B
preload all images into memory
then as per above, only we load from memory rather than from file
I think this is going to be the ideal solution, can anyone give it the thumbs up or thumbs down?
option C
preload all of my PNGs into a single GL texture of the maximum size, creating a texture Atlas. each frame, set the texture coordinates to the rectangle in the Atlas for that frame.
while this is potentially a perfect balance between coding efficiency and performance efficiency, the main problem here is losing resolution; on older iOS devices maximum texture size is 1024x1024. if we are cramming 32 frames into this ( really this is the same as cramming 64 ) we would be at 128x128 for each frame. if the resulting animation is close to full screen on the iPad this isn't going to hack it
option D
instead of loading into a single GL texture, load into a bunch of textures
moreover, we can squeeze 4 images into a single texture using all four channels
I baulk at the sheer amount of fiddly coding required here. My RSI starts to tingle even thinking about this approach
I think I have answered my own question here, but if anyone has actually done this or can see the way through, please answer!
If something higher performance than (B) is needed, it looks like the key is glTexSubImage2D http://www.opengl.org/sdk/docs/man/xhtml/glTexSubImage2D.xml
Rather than pull across one frame at a time from memory, we could arrange say 16 512x512x8-bit greyscale frames contiguously in memory, send this across to GL as a single 1024x1024x32bit RGBA texture, and then split it within GL using the above function.
This would mean that we are performing one [RAM->VRAM] transfer per 16 frames rather than per one frame.
Of course, for more modern devices we could get 64 instead of 16, since more recent iOS devices can handle 2048x2048 textures.
I will first try technique (B) and leave it at that if it works ( I don't want to over code ), and look at this if needed.
I still can't find any way to query how many GL textures it is possible to hold on the graphics chip. I have been told that when you try to allocate memory for a texture, GL just returns 0 when it has run out of memory. however to implement this properly I would want to make sure that I am not sailing close to the wind re: resources... I don't want my animation to use up so much VRAM that the rest of my rendering fails...
You would be able to get this working just fine with CoreGraphics APIs, there is no reason to deep dive into OpenGL for a simple 2D problem like this. For the general approach you should take to creating colored frames from a grayscale frame, see colorizing-image-ignores-alpha-channel-why-and-how-to-fix. Basically, you need to use CGContextClipToMask() and then render a specific color so that what is left is the diamond colored in with the specific color you have selected. You could do this at runtime, or you could do it offline and create 1 video for each of the colors you want to support. It is be easier on your CPU if you do the operation N times and save the results into files, but modern iOS hardware is much faster than it used to be. Beware of memory usage issues when writing video processing code, see video-and-memory-usage-on-ios-devices for a primer that describes the problem space. You could code it all up with texture atlases and complex openGL stuff, but an approach that makes use of videos would be a lot easier to deal with and you would not need to worry so much about resource usage, see my library linked in the memory post for more info if you are interested in saving time on the implementation.

Antialiasing/Multisampling in D3D9

I'm writing a 3d modeling application in D3D9 that I'd like to make as broadly compatible as possible. This means using few hardware-dependent features, i.e. multisampling. However, while the realtime render doesn't need to be flawless, I do need to provide nice-looking screen captures, which without multisampling, look quite aliased and poor.
To produce my screen captures, I create a temporary surface in memory, render the scene to it once, then save it to a file. My first thought of how I could achieve an antialiased capture was to create my off-screen stencilsurface as multisampled, but of course DX wouldn't allow that since the device itself had been initialized with D3DMULTISAMPLE_NONE.
To start off, here's a sample of exactly how I create the screencapture. I know that it'd be simpler to just save the backbuffer of an already-rendered frame, however I need the ability to save images of dimension different than the actual render window - which is why I do it this way. Error checking, code for restoring state, and releasing resource are ommitted here for brevity. m_d3ddev is my LPDIRECT3DDEVICE9.
//Get the current pp
LPDIRECT3DSWAPCHAIN9 sc;
D3DPRESENT_PARAMETERS pp;
m_d3ddev->GetSwapChain(0, &sc);
sc->GetPresentParameters(&pp);
//Create a new surface to which we'll render
LPDIRECT3DSURFACE9 ScreenShotSurface= NULL;
LPDIRECT3DSURFACE9 newDepthStencil = NULL;
LPDIRECT3DTEXTURE9 pRenderTexture = NULL;
m_d3ddev->CreateDepthStencilSurface(_Width, _Height, pp.AutoDepthStencilFormat, pp.MultiSampleType, pp.MultiSampleQuality, FALSE, &newDepthStencil, NULL );
m_d3ddev->SetDepthStencilSurface( newDepthStencil );
m_d3ddev->CreateTexture(_Width, _Height, 1, D3DUSAGE_RENDERTARGET, pp.BackBufferFormat, D3DPOOL_DEFAULT, &pRenderTexture, NULL);
pRenderTexture->GetSurfaceLevel(0,&ScreenShotSurface);
//Render the scene to the new surface
m_d3ddev->SetRenderTarget(0, ScreenShotSurface);
RenderFrame();
//Save the surface to a file
D3DXSaveSurfaceToFile(_OutFile, D3DXIFF_JPG, ScreenShotSurface, NULL, NULL);
You can see the call to CreateDepthStencilSurface(), which is where I was hoping I could replace pp.MultiSampleType with i.e. D3DMULTISAMPLE_4_SAMPLES, but that didn't work.
My next thought was to create an entirely different LPDIRECT3DDEVICE9 as a D3DDEVTYPE_REF, which always supports D3DMULTISAMPLE_4_SAMPLES (regardless of the video card). However, all of my resources (meshes, textures) have been loaded into m_d3ddev, my HAL device, thus I couldn't use them for rendering the scene under the REF device. Note that resources can be shared between devices under Direct3d9ex (Vista), but I'm working on XP. Since there are quite a lot of resources, reloading everything to render this one frame, then unloading them, is too time-inefficient for my application.
I looked at other options for antialiasing the image post-capture (i.e. 3x3 blur filter), but they all generated pretty crappy results, so I'd really like to try and get an antialiased scene right out of D3D if possible....
Any wisdom or pointers would be GREATLY appreciated...
Thanks!
Supersampling by either rendering to a larger buffer and scaling down or combining jittered buffers is probably your best bet. Combining multiple jittered buffers should give you the best quality for a given number of samples (better than the regular grid from simply rendering an equivalent number of samples at a multiple of the resolution and scaling down) but has the extra overhead of multiple rendering passes. It has the advantage of not being limited by the maximum supported size of your render target though and allows you to choose pretty much an arbitrary level of AA (though you'll have to watch out for precision issues if combining many jittered buffers).
The article "Antialiasing with Accumulation Buffer" at opengl.org describes how to modify your projection matrix for jittered sampling (OpenGL but the math is basically the same). The paper "Interleaved Sampling" by Alexander Keller and Wolfgang Heidrich talks about an extension of the technique that gives you a better sampling pattern at the expense of even more rendering passes. Sorry about not providing links - as a new user I can only post one link per answer. Google should find them for you.
If you want to go the route of rendering to a larger buffer and down sampling but don't want to be limited by the maximum allowed render target size then you can generate a tiled image using off center projection matrices as described here.
You could always render to a texture that is twice the width and height (ie 4x the size) and then supersample it down.
Admittedly you'd still get problems if the card can't create a texture 4x the size of the back buffer ...
Edit: There is another way that comes to mind.
If you repeat the frame n-times with tiny jitters to the view matrix you will be able to generate as many images as you like which you can then add together afterwards to form a very highly anti-aliased image. The bonus is, it will work on any machine that can render the image. It is, obviously, slower though. Still 256xAA really does look good when you do this!
This article http://msdn.microsoft.com/en-us/library/bb172266(VS.85).aspx seems to imply that you can use the render state flag D3DRS_MULTISAMPLEANTIALIAS to control this. Can you create your device with antialiasing enabled but turn it off for screen rendering and on for your offscreen rendering using this render state flag?
I've not tried this myself though.

Resources