How to upload multiple images' pixel data into one texture in DX12 - directx

I have multiple images in the form of pixel arrays. I want to paste them into one big texture and then render the whole texture. For example, I have pixel data for a car and pixel data for a jet. I want to copy the jet's pixel data to the top of the big texture and the car's pixel data to the bottom of the big texture.
My idea is to create a big buffer and copy the pixel data into it manually. By manually I mean calculating the offset of the first pixel of each row and copying the rows in a loop, then submitting the combined pixel data to the GPU. However, I think this method is inefficient since the CPU is doing the loop, so I am wondering whether there is any way to improve it. For example, is there a D3D function that already does something like this efficiently? And more generally, what is the most formal/correct way to do such a thing?

How have you been uploading your texture data so far? Probably UpdateSubresources from d3dx12.h. Did you try to implement something similar to this function? You can peek in the source - it does roughly the following:
1. Map the intermediate resource (a D3D12_HEAP_TYPE_UPLOAD resource created by you).
2. A sequence of memcpy() calls to copy the pixel data into the mapped intermediate resource.
3. Unmap.
4. GPU copy from the intermediate resource to the final resource (a D3D12_HEAP_TYPE_DEFAULT resource created by you).
You should do something similar and add the required offset calculations to step 2 so that the subimages are properly laid out. There isn't a more "formal/correct" way to do this; sometimes you have to do some CPU work.
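For reference, here is a minimal sketch of what steps 2 and 4 might look like when several sub-images are placed into one big RGBA8 texture. The SubImage struct and the variable names are illustrative assumptions, not part of D3D12; the footprint would come from ID3D12Device::GetCopyableFootprints for the destination texture, and uploadBuf is the D3D12_HEAP_TYPE_UPLOAD resource you created.

#include <d3d12.h>
#include "d3dx12.h"
#include <cstdint>
#include <cstring>
#include <vector>

struct SubImage {              // hypothetical description of one source image
    const uint8_t* pixels;     // tightly packed RGBA8 rows
    UINT width, height;        // size in pixels
    UINT dstX, dstY;           // where to place it inside the big texture
};

void CopySubImagesToUpload(ID3D12Resource* uploadBuf,
                           const D3D12_PLACED_SUBRESOURCE_FOOTPRINT& footprint,
                           const std::vector<SubImage>& images)
{
    uint8_t* mapped = nullptr;
    CD3DX12_RANGE readRange(0, 0);   // we won't read this resource on the CPU
    uploadBuf->Map(0, &readRange, reinterpret_cast<void**>(&mapped));

    const UINT bpp = 4;              // bytes per pixel for RGBA8
    for (const SubImage& img : images) {
        for (UINT row = 0; row < img.height; ++row) {
            // Destination: the sub-image's position inside the big texture,
            // honoring the 256-byte-aligned RowPitch of the footprint.
            uint8_t* dst = mapped + footprint.Offset
                         + (img.dstY + row) * footprint.Footprint.RowPitch
                         + img.dstX * bpp;
            // Source: one tightly packed row of this sub-image.
            const uint8_t* src = img.pixels + row * img.width * bpp;
            std::memcpy(dst, src, img.width * bpp);
        }
    }
    uploadBuf->Unmap(0, nullptr);
}

// Step 4 on the command list: copy from the upload heap to the DEFAULT-heap texture.
// CD3DX12_TEXTURE_COPY_LOCATION dst(texture, 0);
// CD3DX12_TEXTURE_COPY_LOCATION src(uploadBuf, footprint);
// cmdList->CopyTextureRegion(&dst, 0, 0, 0, &src, nullptr);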

Related

Can I process an image row- (or column-)wise using Metal?

I'm trying to create an implementation of seam carving which will run on the GPU using Metal. The dynamic programming part of the algorithm requires the image to be processed row-by-row (or column-by-column), so it's not perfectly suited to GPU I know, but I figure with images potentially thousands of pixels wide/tall, it will still benefit from parallelisation.
My two ideas were either
write a shader that uses 2D textures, but ensure that Metal computes over the image in the correct order, finishing one row before starting the next
write a shader using 1D textures, then manually pass in each row of the image to compute; ideally creating a view into the 2D textures rather than having to copy the data into separate 1D textures
I am still new to Metal and Swift, so my naïve attempts at both of these did not work. For option 1, I tried to dispatch threads using a threadgroup of Nx1x1, but the resulting texture just comes back all zeros. Besides, I am not convinced this is even right in theory: even if I tell it to use a threadgroup of height one, I'm not sure I can guarantee it will start on the first row. For option 2, I simply couldn't find a nice way to create a 1D view into a 2D texture by row/column; the documentation seems to suggest that Metal does not like giving you access to the underlying data, but I wouldn't be surprised if there were a way to do this.
Thanks for any ideas!

Direct2D – Drawing rectangles and circles to large images and saving to disk

My task is to draw a lot of simple geometric figures like rectangles and circles to large black-and-white images (about 4000x6000 pixels in size) and save the result both to bitmap files and to a binary array representing each pixel as 1 if drawn or 0 otherwise. I was using GDI+ (= System.Drawing). Since this took too long, however, I started having a look at Direct2D. I quickly learned how to draw to a Win32 window and thought I could use this to draw to a bitmap instead.
I learned how to load an image and display it here: https://msdn.microsoft.com/de-de/library/windows/desktop/ee719658(v=vs.85).aspx
But I could not find information on how to create a large ID2D1Bitmap and render to it.
How can I create a render target (must that be an ID2D1HwndRenderTarget?) associated with such a newly created (how?) big bitmap, draw rectangles and circles to it, and save it to a file afterwards?
Thank you very much for pointing me in the right direction,
Jürgen
If I were to do it, I would roll my own code instead of using GDI or DirectX calls. The structure of a binary bitmap is very simple (a packed array of bits), and once you have implemented a function to set a single pixel and one to draw a single run (a horizontal line segment), drawing rectangles and circles comes easily.
If you don't feel comfortable with bit packing, you can work with a byte array instead (one pixel per byte), and convert the whole image in the end.
Writing the bitmap to a file is also not a big deal once you know about the binary file I/O operations (and you will find many ready-made functions on the Web).
Actually, when you know the specs of the layout of the bitmap file data, you don't need Windows at all.
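If you go the byte-per-pixel route described above, the drawing part really is only a few lines. A rough sketch (the type and function names here are just illustrative):

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Raster {                       // one byte per pixel, 0 = empty, 1 = drawn
    int width, height;
    std::vector<uint8_t> pixels;      // row-major, width * height entries

    Raster(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}

    void setPixel(int x, int y) {
        if (x >= 0 && x < width && y >= 0 && y < height)
            pixels[static_cast<std::size_t>(y) * width + x] = 1;
    }

    // A run is a horizontal segment from x0 to x1 (inclusive) on row y.
    void drawRun(int x0, int x1, int y) {
        for (int x = x0; x <= x1; ++x) setPixel(x, y);
    }

    // A filled rectangle is just one run per row.
    void fillRect(int x0, int y0, int x1, int y1) {
        for (int y = y0; y <= y1; ++y) drawRun(x0, x1, y);
    }

    // A filled circle: for each row, compute the half-width of the chord.
    void fillCircle(int cx, int cy, int r) {
        for (int dy = -r; dy <= r; ++dy) {
            int half = static_cast<int>(std::sqrt(static_cast<double>(r * r - dy * dy)));
            drawRun(cx - half, cx + half, cy + dy);
        }
    }
};

Packing this buffer into bits and prepending a BMP header when writing the file are then separate, equally small steps.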

Efficiently Slice Image on GPU

I need to slice an image into N tiles (the rects might be located anywhere and may overlap), where N could potentially be quite large. Since CGImage operates on the CPU and this is a performance-critical operation that happens several times per second, I was wondering if there is a faster way to do this on the GPU.
What's the fastest possible solution to slice an image (possibly using the GPU)?
PS: If it helps in any way, the image is only grayscale (an array of floats between 0 and 1). It doesn't have to be a CGImage/UIImage; a float array suffices.
Since slicing images is basically just copying chunks of the image to a new image, there is not really a way to speed up that process itself. Depending on what you are doing with the slices, you might be able to get away with not copying the data at all: if you keep only the coordinates of your slices, you can access the underlying storage of your original image.
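A sketch of that bookkeeping (written in C++ for brevity, but the same idea applies to a Swift float array; the names are made up): a slice is just a rectangle into the original row-major storage, and reading through it never copies a pixel.

#include <cstddef>

struct SliceView {
    const float* image;       // start of the full grayscale image
    std::size_t imageWidth;   // row stride of the full image, in pixels
    std::size_t x, y;         // top-left corner of the slice in the full image
    std::size_t w, h;         // slice size in pixels

    // Pixel (col, row) of the slice, read directly from the original storage.
    float at(std::size_t col, std::size_t row) const {
        return image[(y + row) * imageWidth + (x + col)];
    }
};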

How can I write a histogram-like kernel filter for CoreImage?

In the docs for Kernel Routine Rules, it says 'A kernel routine computes an output pixel by using an inverse mapping back to the corresponding pixels of the input images. Although you can express most pixel computations this way—some more naturally than others—there are some image processing operations for which this is difficult, if not impossible. For example, computing a histogram is difficult to describe as an inverse mapping to the source image.'
However, Apple is obviously doing it somehow, because they do have a CIAreaHistogram Core Image filter that does just that.
I can see one theoretical way to do it with the given limitations:
Let's say you wanted a 256-element red-channel histogram...
You have a 256x1 pixel output image. The kernel function gets called for each of those 256 pixels. The kernel function would have to read EVERY PIXEL IN THE ENTIRE IMAGE each time it's called, checking whether each pixel's red value falls into that bucket and incrementing a counter. Once it has processed every pixel in the entire image for that output pixel, it divides by the total number of pixels and sets the output pixel to that calculated value. The problem is that, assuming it actually works, this is horribly inefficient, since every input pixel is accessed 256 times even though every output pixel is written only once.
What would be optimal is a way for the kernel to iterate over every INPUT pixel and let us update any of the output pixels based on that value. Then each input pixel would be read only once, and the output pixels would be read and written a total of (input width) x (input height) times.
Does anyone know of a way to get this kind of filter working? Obviously there's a filter available from Apple for doing a histogram, but I need a more limited form of histogram. (For example, a blue histogram limited to samples that have a red value in a given range.)
The issue with this is that custom kernel code in Core Image works like a function that remaps the image pixel by pixel. You don't actually have much information to go on except for the pixel that you are currently computing. A custom Core Image filter goes roughly like this:
for i in 1 ... image.width
    for j in 1 ... image.height
        New_Image[i][j] = CustomKernel(Current_Image[i][j])
    end
end
So it's not really plausible to make your own histogram via custom kernels, because you literally do not have any control over the new image other than inside that CustomKernel function. This is actually one of the reasons CIImageProcessor was created for iOS 10; you would probably have an easier time making a histogram via that API (and producing other cool effects via image processing), and I suggest checking out the WWDC 2016 video on it (the raw images and live images session).
IIRC, if you really want to make a histogram, it is still possible, but you will have to work with the UIImage version and then convert the resulting image to an RGB image on which you can do the counting, storing the values in bins. I would recommend Simon Gladman's book on this, as he has a chapter devoted to histograms, but there is a lot more that goes into the default Core Image version, because they have MUCH more control over the image than we do using the framework.
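The CPU-side counting itself is trivial once you have raw RGBA bytes. A generic sketch of the limited histogram from the question (a blue histogram restricted to pixels whose red value lies in a given range); the function name and layout assumptions are mine:

#include <array>
#include <cstddef>
#include <cstdint>

// rgba points at tightly packed RGBA8 data, pixelCount pixels long.
std::array<uint32_t, 256> LimitedBlueHistogram(const uint8_t* rgba,
                                               std::size_t pixelCount,
                                               uint8_t redMin, uint8_t redMax)
{
    std::array<uint32_t, 256> bins{};            // zero-initialized buckets
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const uint8_t r = rgba[i * 4 + 0];
        const uint8_t b = rgba[i * 4 + 2];
        if (r >= redMin && r <= redMax)
            ++bins[b];                           // each input pixel is read once
    }
    return bins;
}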

How should I optimize drawing a large, dynamic number of collections of vertices?

...or am I insane to even try?
As a novice to using bare vertices for 3d graphics, I haven't ever worked with vertex buffers and the like before. I am guessing that I should use a dynamic buffer because my game deals with manipulating, adding and deleting primitives. But how would I go about doing that?
So far I have stored my indices in a Triangle.cs class. Triangles are stored in Quads (which contain the vertices that correspond to their indices), and quads are stored in blocks. In my draw method, I iterate through each block, each quad in each block, and finally each triangle, apply the appropriate texture to my effect, then call DrawUserIndexedPrimitives to draw the vertices stored in the triangle.
I'd like to use a vertex buffer because this method cannot support the scale I am going for. I am assuming it to be dynamic. Since my vertices and indices are stored in a collection of separate classes, though, can I still effectively use a buffer? Is using separate buffers for each quad silly (I'm guessing it is)? Is it feasible and effective for me to dump vertices into the buffer the first time a quad is drawn and then store where those vertices were so that I can apply that offset to that triangle's indices for successive draws? Is there a feasible way to handle removing vertices from the buffer in this scenario (perhaps event-based shifting of index offsets in triangles)?
I apologize that these questions may be either far too novice or too confusing/vague. I'd be happy to provide clarification. But as I've said, I'm new to this and I may not even know what I'm talking about...
I can't tell exactly what you're trying to do, but using a separate buffer for every quad is very silly.
The golden rule in graphics programming is batch, batch, batch. This means packing as much stuff into a single DrawUserIndexedPrimitives call as possible; your graphics card will love you for it.
In your case, put all of your vertices and indices into one vertex buffer and one index buffer (you might need to use more; I have no idea how many vertices we're talking about). Whenever the user changes one of the primitives, regenerate the entire buffer. If you really have a lot of primitives, split them up into multiple buffers and only regenerate the ones you need when the user changes something.
The most important thing is to minimize the number of DrawUserIndexedPrimitives calls; those calls have a lot of overhead, and you could easily make your game on the order of 20x faster.
Graphics cards are pipelines: they like being given a big chunk of data to eat away at. Giving them one triangle at a time is like forcing a large-scale car factory to make only one car at a time, where they can't start building the next car before the last one is finished.
Anyway, good luck, and feel free to ask any questions.
