Calculating Irradiance Using Spherical Harmonics from Cubemap into a Cubemap? - directx

I'm trying to figure out how to make an irradiance map using spherical harmonics, but I want the output to be a cubemap rather than just rendering an object. There is the (deprecated) D3DX11SHProjectCubeMap, but even then I don't know how it would be used. It seems the output is just an array of colors, but the documentation doesn't explain how the function works or what the output represents. Does it just produce some sort of image that I'd then need to convert to a cubemap?
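For context: as far as I can tell from the D3DX docs, D3DX11SHProjectCubeMap does not output an image at all. For projection order n it fills an array of n² SH coefficients per color channel (9 floats each for R, G and B at order 3). To turn those into an irradiance cubemap you convolve each band with the cosine lobe and evaluate the result in the direction of every texel of the destination cube faces. Below is a minimal CPU-side NumPy sketch of that evaluation step (not the D3DX API itself): the band weights follow the usual Ramamoorthi-Hanrahan irradiance formulation, the face-to-direction helper assumes one common cubemap convention and may need its axes flipped for your API, and any division by pi for a Lambertian surface is left out.

    import numpy as np

    # Cosine-lobe convolution weight per SH band (Ramamoorthi & Hanrahan 2001).
    A = np.array([np.pi, 2.0 * np.pi / 3.0, np.pi / 4.0])
    BAND = np.array([0, 1, 1, 1, 2, 2, 2, 2, 2])   # band index of each of the 9 coefficients

    def sh_basis(d):
        # Real SH basis for bands 0..2 at unit direction d = (x, y, z).
        x, y, z = d
        return np.array([
            0.282095,
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3.0 * z * z - 1.0),
            1.092548 * x * z, 0.546274 * (x * x - y * y),
        ])

    def irradiance(sh, d):
        # sh: (9, 3) array of RGB radiance coefficients, e.g. what an order-3
        # SH projection of the environment cubemap gives you.
        return sh_basis(d) @ (sh * A[BAND][:, None])   # -> RGB irradiance for direction d

    def face_direction(face, u, v):
        # Map a face index and texel coords in [-1, 1] to a direction.
        # Axis conventions differ between APIs; flip u/v if faces come out rotated.
        dirs = {0: (1.0, -v, -u), 1: (-1.0, -v, u),    # +X, -X
                2: (u, 1.0, v),   3: (u, -1.0, -v),    # +Y, -Y
                4: (u, -v, 1.0),  5: (-u, -v, -1.0)}   # +Z, -Z
        d = np.array(dirs[face])
        return d / np.linalg.norm(d)

    def irradiance_face(sh, face, size=16):
        out = np.zeros((size, size, 3))
        for j in range(size):
            for i in range(size):
                u = 2.0 * (i + 0.5) / size - 1.0
                v = 2.0 * (j + 0.5) / size - 1.0
                out[j, i] = irradiance(sh, face_direction(face, u, v))
        return out   # one face of the low-resolution irradiance cubemap

In a renderer you would normally do this evaluation in a pixel shader while writing each face of a small render-target cubemap, but the math is the same.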

Related

Using CoreML to infer on Metal Texture subregions

I am building an iOS app that renders frames from the camera to Metal textures in real time. I want to use CoreML to perform style transfer on subregions of the Metal texture (imagine the camera output as a 2x2 grid, where each of the 4 squares is used as input to a style-transfer network and the output is pasted back into the displayed texture). I am trying to figure out how best to use CoreML inside a Metal pipeline to fill non-overlapping subregions of the texture with the output of the mlmodel (hopefully without decomposing the mlmodel into an MPSNNGraph). Is it possible to feed an MTLTexture or MTLBuffer to a CoreML model directly? I'd like to avoid format conversions as much as possible (for speed).
My mlmodel takes CVPixelBuffers at its inputs and outputs. Can it be made to take MTLTextures instead?
The first thing I tried was cutting the given sample buffer into subregions (by copying the pixel data, ugh), inferring on each subregion, and then pasting them together into a new sample buffer, which was then turned into an MTLTexture and displayed. This approach did not take advantage of Metal at all, since the textures were not created until after inference. It also involved a lot of circuitous conversion/copy/paste operations that slow everything down.
The second thing I tried was sending the camera data to the MTLTexture directly, inferring on subregions of the sample buffer, and pasting into the currently displayed texture with MTLTexture.replace(region:...withBytes:) for each subregion. However, MTLTexture.replace() uses the CPU and is not fast enough for live video.
The idea I am about to try is to convert my mlmodel to an MPSNNGraph, get frames as textures, use the MPSNNGraph for inference on subregions, and display the output. I figured I'd check here before going through all of the effort of converting the mlmodel first, though. Sorry if this is too broad; I mainly work in TensorFlow and am a bit out of my depth here.

opencv: undistort part of the image

I am trying to understand how to apply the cv2.undistort function to only a subset of the image.
Camera calibration was done through cv2.findChessboardCorners and seems to be working fine. I find that undistortion, however, is very slow, averaging around 9 fps on a 1080x1920 image. For the purposes of the project, I am interested only in a fixed subset of the image, usually something like img[100:400].
What is a good way to approach this problem? It seems wasteful to undistort the entire image when only a stripe of it is needed.
From the docs:
The function is simply a combination of cv::initUndistortRectifyMap (with unity R ) and cv::remap (with bilinear interpolation). See the former function for details of the transformation being performed.
So by calling undistort() in a loop you are recomputing the undistortion maps over and over: there is no caching, and their computation is expensive, since it involves solving a polynomial equation for every pixel. If I understand correctly, your calibration is fixed, so you should compute the maps only once using initUndistortRectifyMap() and then pass them to remap() in your main loop.
The kind of "cropping" you describe is doable, but it may take a little experimentation, because the undistortion maps used by OpenCV are in 1:1 correspondence with the un-distorted image: for each pixel of the un-distorted image they store the coordinates of the source pixel to sample in the distorted image. This means you can slice out the portion of the maps corresponding to the rectangle you care about and pass just that portion to remap(), together with the full distorted frame as the source; the destination will then be an image the size of the cropped rectangle.
All in all, I'd start with the first recommendation (don't call undistort(); separate map generation from remapping) and only add the cropping step if you still can't keep up with the frame rate.
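A minimal sketch of both steps, assuming a fixed calibration (K, dist) from cv2.calibrateCamera, a 1920x1080 stream, and the img[100:400] strip from the question; the intrinsics below are placeholders.

    import cv2
    import numpy as np

    # Placeholder calibration results; use your own K and dist from calibrateCamera.
    K = np.array([[1000.0, 0.0, 960.0],
                  [0.0, 1000.0, 540.0],
                  [0.0, 0.0, 1.0]])
    dist = np.zeros(5)
    w, h = 1920, 1080

    # 1) Compute the undistortion maps once, outside the capture loop.
    map_x, map_y = cv2.initUndistortRectifyMap(K, dist, np.eye(3), K, (w, h), cv2.CV_32FC1)

    # 2) Keep only the map rows covering the strip of interest. Each map entry holds
    #    the source coordinates in the distorted frame, so slicing is all that is needed.
    roi_map_x = map_x[100:400]
    roi_map_y = map_y[100:400]

    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        strip = cv2.remap(frame, roi_map_x, roi_map_y, cv2.INTER_LINEAR)
        # ... process the 300-row undistorted strip ...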

Is it possible to get the actual value of a vertex?

I was trying to recover some vertex data from the vertex shader, but I haven't found any relevant information about this on the internet.
I'm using the vertex shader to calculate my vertex positions on the GPU, but I need the results for the logic of my application in JavaScript. Is there a way to do this without also calculating them in JavaScript?
In WebGL2 you can use transform feedback (as Pauli suggests) and read the data back with getBufferSubData, although ideally, if you're just going to use the data in another draw call, you should not read it back, as readbacks are slow.
Transform feedback simply means your vertex shader can write its output to a buffer.
In WebGL1 you could do it by rendering your vertices to a floating-point texture attached to a framebuffer. You'd include a vertex id attribute with each vertex and use that attribute to set gl_Position. You'd draw with gl.POINTS. That lets you render to each individual pixel of the output texture, effectively giving you transform feedback, the difference being that your result ends up in a texture instead of a buffer. You can kind of see a related example of that here.
If you don't need the values back in JavaScript, then you can just use the texture you wrote to as input to future draw calls. If you do need the values back in JavaScript, you'll have to first convert them from floating point into a readable format (using a shader) and then read them out with gl.readPixels.
Transform feedback is OpenGL's way of returning vertex processing results to application code, but it is only available with WebGL 2. Transform feedback also outputs primitives rather than vertices, making it unlikely to be a perfect match.
A newer alternative is image load/store and shader storage buffer objects, but I think those are missing from WebGL 2 as well.
In short, you either need to calculate the same data in JavaScript or move your application logic into shaders. If you need the transformed vertex data for collision detection, you could use bounding-box testing and do vertex-level transformation only when the bounding boxes hit.
You could use multi-level bounding boxes, where you have one big box around the whole object and then a next level of boxes that splits the object into small parts, like a separate box for each disjoint body part (for instance, the knee and the ankle in the legs). That way JavaScript mostly transforms only a single bounding box/sphere per object per frame, transforms the second-level boxes only when objects are near, and does per-vertex transformation only when objects are very close to touching.
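A sketch of that coarse-to-fine idea, shown in Python for brevity (in the app it would live in JavaScript next to the draw code); the object fields box_min, box_max and model_matrix are assumed names, not part of any WebGL API.

    import numpy as np

    def transform_aabb(box_min, box_max, matrix):
        # Transform the 8 corners of an axis-aligned box and re-fit an axis-aligned bound.
        corners = np.array([[x, y, z, 1.0]
                            for x in (box_min[0], box_max[0])
                            for y in (box_min[1], box_max[1])
                            for z in (box_min[2], box_max[2])])
        world = corners @ matrix.T
        return world[:, :3].min(axis=0), world[:, :3].max(axis=0)

    def aabb_overlap(a_min, a_max, b_min, b_max):
        return bool(np.all(a_min <= b_max) and np.all(b_min <= a_max))

    def objects_touch(obj_a, obj_b):
        # Level 1: one box per object, transformed once per frame on the CPU.
        a_min, a_max = transform_aabb(obj_a.box_min, obj_a.box_max, obj_a.model_matrix)
        b_min, b_max = transform_aabb(obj_b.box_min, obj_b.box_max, obj_b.model_matrix)
        if not aabb_overlap(a_min, a_max, b_min, b_max):
            return False
        # Level 2: per-part boxes (knee, ankle, ...), then per-vertex tests only for
        # the parts whose boxes actually intersect.
        ...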

Image Registration by Manual marking of corresponding points using OpenCV

I have a processed binary image of dimension 300x300. This processed image contains a few objects (person or vehicle).
I also have another RGB image of the same scene, of dimension 640x480, taken from a different position.
Note: the two cameras are not the same.
I can detect objects to some extent in the first image using background subtraction. I want to detect the corresponding objects in the second image. I went through the OpenCV functions
getAffineTransform
getPerspectiveTransform
findHomography
estimateRigidTransform
All these functions require corresponding points (coordinates) in the two images.
In the first (binary) image I only have the information that an object is present; it does not have features exactly similar to the second (RGB) image.
I thought conventional feature matching to determine corresponding control points, which could then be used to estimate the transformation parameters, is not feasible, because I think I cannot detect and match features between a binary image and an RGB image (am I right?).
If I am wrong, what features could I use, how should I proceed with feature matching, find corresponding points, and estimate the transformation parameters?
The solution I tried is more of a manual marking to estimate the transformation parameters (please correct me if I am wrong).
Note: there is no movement of either camera.
Manually marked rectangles around objects in processed image(binary)
Noted down the coordinates of the rectangles
Manually marked rectangles around objects in 2nd RGB image
Noted down the coordinates of the rectangles
Repeated the above steps for different samples of the binary and RGB images
Now that I have some 20 corresponding points, I used them in the function as :
findHomography(src_pts, dst_pts, 0) ;
So once I detect an object in the first image,
I draw a bounding box around it,
transform the coordinates of the vertices using the transformation found above,
and finally draw a box in the second RGB image with the transformed coordinates as vertices.
But this doesn't place the box in the second RGB image exactly over the person/object; instead it is drawn somewhere else. Even though I take several sample images of the binary and RGB scenes and use many corresponding points to estimate the transformation parameters, they don't seem to be accurate enough.
What do the CV_RANSAC and CV_LMEDS options and ransacReprojThreshold mean, and how do I use them?
Is my approach good? What should I modify or do to make the registration accurate?
Is there an alternative approach I could use?
I'm fairly new to OpenCV myself, but my suggestions would be:
Seeing as you have the objects identified in the first image, I shouldn't think it would be hard to get keypoints and extract features (or maybe you have this already?).
Identify features in the 2nd image
Match the features using OpenCV FlannBasedMatcher or similar
Highlight matching features in 2nd image or whatever you want to do.
I'd hope that because all your features in the first image should be positives (you know they are the features you want), it'll be relatively straightforward to get accurate matches.
Like I said, I'm new to this, so the ideas may need some elaboration.
It might be a little late to answer this and the asker might not see it, but if the first image is originally grayscale then this could be done:
1.) 2nd image ----> grayscale ------> gray2ndimg
2.) Find point-to-point correspondences between gray1stimg and gray2ndimg by matching features.
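A hedged sketch of that route in Python, which also shows the cv2.RANSAC / cv2.LMEDS flags the question asks about: ransacReprojThreshold is the maximum reprojection error, in pixels, for a correspondence to count as an inlier, and RANSAC's whole point is to discard outlier point pairs instead of fitting all of them, which is often what fixes a homography that lands "somewhere else". The file names, the ORB detector, and the brute-force matcher are illustrative choices, not something from the question.

    import cv2
    import numpy as np

    img1 = cv2.imread("processed_300x300.png", cv2.IMREAD_GRAYSCALE)           # placeholder path
    img2 = cv2.cvtColor(cv2.imread("scene_640x480.png"), cv2.COLOR_BGR2GRAY)   # placeholder path

    # Detect and describe features in both grayscale images.
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Match descriptors and keep the best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]

    src_pts = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects correspondences whose reprojection error exceeds 3 pixels.
    H, inlier_mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 3.0)

    # Map a bounding box from the first image into the second with the homography.
    box = np.float32([[10, 10], [110, 10], [110, 210], [10, 210]]).reshape(-1, 1, 2)
    projected = cv2.perspectiveTransform(box, H)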

WebGL - Building objects with block

I'm trying to build some text using blocks, which I intend to customize later on. The attached image is a mockup of what I intend to do.
I was thinking of using WebGL, since I want to do it in 3D and I can't do any Flash, but I'm not sure how to construct the structure of cubes from the letters. Can anyone suggest a technique to map text to a series of points so that, seen from a distance, they draw that same text?
First, you need a font — a table of shapes for the characters — in a format you can read from your code. Do you already have one? If it's just a few letters, you could manually create polygons for each character.
Then, use a rasterization algorithm to convert the character shape into an array of present-or-absent points/cubes. If you have perfectly flat text, then use a 2D array; if your “customizations” will create depth effects then you will want a 3D array instead (“extruding” the shape by writing it identically into multiple planes of the array).
An alternative to the previous two steps, which is appropriate if your text does not vary at runtime, is to first create an image with your desired text on it, then use the pixels of the image as the abovementioned 2D array. In the browser, you can do this by using the 2D Canvas feature to draw an image onto a canvas and then reading the pixels out from it.
Then to produce a 3D shape from this voxel array, construct a polygon face for every place in the array where a “present” point meets an “absent” point. If you do this based on pairs of neighbors, you get a chunky pixel look (like Minecraft). If you want smooth slopes (like your example image), then you need a more complex technique; the traditional way to produce a smooth surface is marching cubes (but just doing marching cubes will round off all your corners).
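A small sketch of the middle steps (rasterize once, read the pixels into a 2D present/absent grid, emit a face wherever present meets absent), shown with Pillow/NumPy instead of the 2D canvas; the text, the grid size, and the quads list are placeholders, and a real build would also emit the front/back faces and turn the quads into WebGL vertex data.

    import numpy as np
    from PIL import Image, ImageDraw, ImageFont

    # Rasterize the text into a small binary grid (one cell per cube).
    img = Image.new("L", (64, 16), 0)
    ImageDraw.Draw(img).text((1, 1), "HI", fill=255, font=ImageFont.load_default())
    grid = np.array(img) > 0            # True = a cube is present at this cell

    # Emit one side quad wherever a present cell borders an absent cell (or the edge).
    quads = []
    h, w = grid.shape
    for y in range(h):
        for x in range(w):
            if not grid[y, x]:
                continue
            for dx, dy, side in ((1, 0, "right"), (-1, 0, "left"),
                                 (0, 1, "bottom"), (0, -1, "top")):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not grid[ny, nx]:
                    quads.append((x, y, side))

    print(len(quads), "side faces for the chunky-pixel look")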
