IDirectDrawSurface7::Blt stretch method - directx

Which stretching method is used when the source and destination surfaces of the IDirectDrawSurface7::Blt method have different sizes? Obviously, when the destination rectangle is smaller than the source we can calculate the destination pixel color by various methods: randomly choosing one of the neighbouring pixels, or taking the arithmetic mean or a weighted arithmetic mean of the neighbouring pixels. When the destination rectangle is larger than the source we can use linear interpolation, bicubic interpolation, or the Lanczos method. How can I influence the quality of the output when calling the Blt method? It's ideal for source and destination to have the same sizes, but sometimes that's not possible. When doing 2D graphics, for example, I have a PNG image on disk for only one supported screen resolution.

Whether Blt does any filtering when the source and destination rectangles aren't the same size, and which algorithm it uses if so, is hardware dependent. There's very little control over what happens, although you can request that the output be filtered in the y direction.
From the July 2000 Platform SDK DirectX documentation, "Blitting with Blt" topic:
Hardware acceleration for scaling depends on the DDFXCAPS_BLT* flags in the dwFXCaps member of the DDCAPS structure for the device. If, for example, a device has the DDFXCAPS_BLTSTRETCHXN capability but not DDFXCAPS_BLTSTRETCHX, it can assist when the x-axis of the source rectangle is being multiplied by a whole number but not when non-integral (arbitrary) scaling is being done.
Devices might also support arithmetic scaling, which is scaling by interpolation rather than simple multiplication or deletion of pixels. For instance, if an axis was being increased by one-third, the pixels would be recolored to provide a closer approximation to the original image than would be produced by the doubling of every third pixel on that axis.
Applications cannot control the type of scaling done by the driver, except by setting the DDBLTFX_ARITHSTRETCHY flag in the dwDDFX member of the DDBLTFX structure passed to Blt. This flag requests that arithmetic stretching be done on the y-axis. Arithmetic stretching on the x-axis and arithmetic shrinking are not currently supported in the DirectDraw API, but a driver may perform them by default.
If you want more control over how the scaling is performed, you're either going to have to scale it yourself or use Direct3D.
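For illustration, here is a minimal sketch of passing the DDBLTFX_ARITHSTRETCHY request to Blt; the surfaces and rectangles are assumed to be set up elsewhere, and the driver is still free to ignore the hint.

```cpp
// Minimal sketch: request arithmetic y-stretching for a stretching blit.
// Assumes valid IDirectDrawSurface7 pointers and RECTs prepared elsewhere;
// whether any filtering actually happens is still up to the driver/hardware.
#include <windows.h>
#include <ddraw.h>

HRESULT StretchBltWithArithY(IDirectDrawSurface7* dest, RECT* destRect,
                             IDirectDrawSurface7* src,  RECT* srcRect)
{
    DDBLTFX fx;
    ZeroMemory(&fx, sizeof(fx));
    fx.dwSize = sizeof(fx);
    fx.dwDDFX = DDBLTFX_ARITHSTRETCHY;   // ask for arithmetic stretching on the y-axis

    // DDBLT_DDFX tells Blt to look at the DDBLTFX structure.
    return dest->Blt(destRect, src, srcRect, DDBLT_WAIT | DDBLT_DDFX, &fx);
}
```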

Related

Removing sun light reflection from images of IR camera in realtime OpenCV application

I am developing a speed estimation and vehicle counting application with OpenCV, and I use an IR camera.
I am facing a problem with sunlight reflection, which causes vertical white regions or lines in the images and has a bad effect on my vehicle detections.
I want an approach with very high speed, because it is a real-time application.
The vertical streak defect in those images is called "blooming"; it happens when one or a few wells in a CCD saturate to the point that they spill charge over neighbouring wells in the same column. In addition, you have "regular" saturation with no blooming around the area of the reflection.
If you can, the best solution is to control the exposure (faster shutter time, or close lens iris if you have one). This will reduce but not eliminate blooming occurrence.
Blooming will always occur in a constant direction (vertical or horizontal, depending on your image orientation), and will normally fill one or a few contiguous columns entirely. So you can cheaply detect it by heavily subsampling in the opposite dimension and looking for maxima that repeat in the same column. E.g., in your images, you could look for saturated maxima in the same column over 10 rows or so spread across the image height.
Once you detect the blooming columns, you can search a small band around them to locate the saturated area. Note that saturation does not necessarily imply values at the end of the dynamic range (e.g. 255 for an 8-bit image). Your sensor could be completely saturated at values that the A/D conversion assigns to, say, 252. Saturation simply means that the image response becomes constant with respect to the input luminance.
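A rough sketch of that column check, using OpenCV (which the question already uses), might look like this; the row step, the saturation threshold and the hit count are illustrative values you would tune for your own sensor.

```cpp
// Detect candidate blooming columns by sampling ~10 rows spread over the image
// height and counting near-saturated pixels per column. Assumes an 8-bit
// grayscale frame; 250 and "8 of 10 rows" are illustrative tuning values.
#include <opencv2/core.hpp>
#include <algorithm>
#include <vector>

std::vector<int> findBloomingColumns(const cv::Mat& gray)
{
    const int rowStep = std::max(1, gray.rows / 10);  // sample ~10 rows
    const int satThreshold = 250;                     // "near saturation" for this sensor
    std::vector<int> hits(gray.cols, 0);

    for (int y = 0; y < gray.rows; y += rowStep)
    {
        const uchar* row = gray.ptr<uchar>(y);
        for (int x = 0; x < gray.cols; ++x)
            if (row[x] >= satThreshold)
                ++hits[x];
    }

    // Columns saturated in most of the sampled rows are blooming candidates.
    std::vector<int> columns;
    for (int x = 0; x < gray.cols; ++x)
        if (hits[x] >= 8)
            columns.push_back(x);
    return columns;
}
```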
The easiest solution (to me) is a hardware solution. If you can modify the physical camera setup, add a polarizing filter to the lens of the camera. You don't even need an expensive camera-specific filter; adding a simple sheet of polarizing film is good enough (search for "polarizing film" to find suppliers). You will have to play with the orientation, but with a mounted camera most surfaces are at the same angle and the glare will be polarized near horizontal, so you should be able to find a position that works well in most situations.
I've used this method before, and the best part is that it adds no extra algorithmic complexity or lag, especially for mounted cameras where all surfaces are at nearly the same angle. This won't help you process the images you currently have, but it will help in acquiring and processing future images.

Why supersampling is not widely used for image scaling?

I am looking for an appropriate image scaling algorithm and wondered why supersampling is not as popular as bicubic, bilinear or even Lanczos.
By supersampling I mean a method that divides the source image into equal rectangles, each rectangle corresponding to a pixel in the destination image. In my opinion, this is the most natural and accurate method. It takes into account all pixels of the source image, while bilinear might skip some pixels. As far as I can see, the quality is also very high, comparable with Lanczos.
Why do popular image libraries (such as GraphicsMagick, GD or PIL) not implement this algorithm? I found implementations only in the Intel IPP and AMD Framewave projects. I know at least one disadvantage: it can only be used for downscaling, but am I missing something else?
For comparison, this is an image scaled down by a factor of 4.26. From left to right: GraphicsMagick Sinc filter (910 ms), Framewave Super method (350 ms), GraphicsMagick Triangle filter (320 ms).
Now I know the answer: because a pixel is not a little square. That is why supersampled resizing gives an aliased result; this can be seen in the thin water jets in the sample image. This is not fatal, and supersampling can still be used for scaling down by 2x, 3x and so on, to dramatically reduce the picture size before resizing to the exact dimensions with another method. This technique is used in jpeglib to open images at a smaller size.
Of course we can still think about pixels as squares, and the GD library actually does: its imagecopyresampled function is true supersampling.
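For reference, here is a minimal sketch of the kind of supersampling described in the question (area/box averaging over equal source blocks) for an 8-bit grayscale image, assuming the scale factor divides the image size evenly. OpenCV's cv::resize with INTER_AREA performs essentially the same averaging.

```cpp
// "Supersampling" downscale: each destination pixel is the mean of the
// corresponding factor x factor block of source pixels.
#include <opencv2/core.hpp>

cv::Mat boxDownscale(const cv::Mat& src, int factor)
{
    cv::Mat dst(src.rows / factor, src.cols / factor, CV_8UC1);
    for (int dy = 0; dy < dst.rows; ++dy)
        for (int dx = 0; dx < dst.cols; ++dx)
        {
            int sum = 0;
            for (int sy = 0; sy < factor; ++sy)
                for (int sx = 0; sx < factor; ++sx)
                    sum += src.at<uchar>(dy * factor + sy, dx * factor + sx);
            dst.at<uchar>(dy, dx) = static_cast<uchar>(sum / (factor * factor));
        }
    return dst;
}
```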
You are a bit mistaken when saying that linear rescaling misses pixels. Assuming you are rescaling the image by at most a factor of 2, bilinear interpolation takes into account all the pixels of the source image. If you smooth the image a bit and then use bilinear interpolation, this gives you high-quality results; for most practical cases even bicubic interpolation is not needed.
Since bilinear interpolation is extremely fast (it can easily be executed in fixed-point calculations), it is by far the best image rescaling algorithm when dealing with real-time processing.
If you intend to shrink the image by more than a factor of 2, then bilinear interpolation is mathematically wrong, and with larger factors even bicubic starts to make mistakes. That is why image processing software (like Photoshop) uses better, though much more CPU-demanding, algorithms.
The answer to your question is speed considerations.
Given the speed of your CPU/GPU, the image size and the desired frame rate, you can easily compute how many operations you can afford per pixel. For example, with a 2 GHz CPU and a 1-gigapixel image, you can only do a few calculations per pixel each second.
Given the allowed amount of calculation, you select the best algorithm you can afford. So the decision is usually driven not by image quality but by speed considerations.
Another point about supersampling: sometimes it works much better if you do it in the frequency domain. This is called frequency interpolation. But you would not want to calculate an FFT just for rescaling an image.
Moreover, I don't know if you are familiar with back projection. This is a way to interpolate the image from destination to source instead of from source to destination. Using back projection you can enlarge the image by a factor of 10, use bilinear interpolation and still be mathematically correct.
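As a rough illustration of that backward-mapping idea (interpolating from destination to source), here is a hypothetical bilinear resize written that way for an 8-bit grayscale image; real libraries do the same thing with more care and speed.

```cpp
// Backward mapping: every destination pixel looks up where it comes from in
// the source and interpolates there (bilinear here). Works for enlarging or
// shrinking, although for strong shrinking you would prefilter/average first.
// Assumes a source image of at least 2x2 pixels.
#include <opencv2/core.hpp>
#include <algorithm>

cv::Mat backwardResize(const cv::Mat& src, int dstRows, int dstCols)
{
    cv::Mat dst(dstRows, dstCols, CV_8UC1);
    const float sy = static_cast<float>(src.rows) / dstRows;
    const float sx = static_cast<float>(src.cols) / dstCols;

    for (int y = 0; y < dstRows; ++y)
        for (int x = 0; x < dstCols; ++x)
        {
            // Source position this destination pixel maps back to.
            float fy = std::clamp((y + 0.5f) * sy - 0.5f, 0.0f, src.rows - 1.0f);
            float fx = std::clamp((x + 0.5f) * sx - 0.5f, 0.0f, src.cols - 1.0f);
            int y0 = std::min(static_cast<int>(fy), src.rows - 2);
            int x0 = std::min(static_cast<int>(fx), src.cols - 2);
            float wy = fy - y0, wx = fx - x0;

            float top = src.at<uchar>(y0, x0) * (1 - wx) + src.at<uchar>(y0, x0 + 1) * wx;
            float bot = src.at<uchar>(y0 + 1, x0) * (1 - wx) + src.at<uchar>(y0 + 1, x0 + 1) * wx;
            dst.at<uchar>(y, x) = static_cast<uchar>(top * (1 - wy) + bot * wy + 0.5f);
        }
    return dst;
}
```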
Computational burden and increased memory demand are most likely the answer you are looking for. That is why adaptive supersampling was introduced; it compromises between burden/memory demand and effectiveness.
I guess supersampling is still too heavy even for today's hardware.
Short answer: They are super-sampling. I think the problem is terminology.
In your example, you are scaling down. This means decimating, not interpolating. Decimation will produce aliasing if no super-sampling is used. I don't see aliasing in the images you posted.
A sinc filter involves super-sampling. It is especially good for decimation because it specifically cuts off frequencies above those that can be seen in the final image. Judging from the name, I suspect the triangle filter also is a form of super-sampling. The second image you show is blurry, but I see no aliasing. So my guess is that it also uses some form of super-sampling.
Personally, I have always been confused by Adobe Photoshop, which asks me whether I want "bicubic" or "bilinear" when I am scaling down. But bilinear, bicubic, and Lanczos are interpolation methods, not decimation methods.
Modern video games also use super-sampling. Mipmapping is a commonly used shortcut for real-time decimation: individual images are pre-decimated by powers of two.

Image Segmentation/Background Subtraction

My current project is to calculate the surface area covered by paste on a cylinder.
Refer to the images below; they are cropped from the original images taken with a phone camera.
I am thinking of techniques like segmentation, but due to light reflections and shadows a simple segmentation won't work.
Can anyone tell me how to find the surface area covered by paste on the cylinder?
First I'd simplify the problem by rectifying the perspective effect (you may need to upscale the image so as not to lose precision here).
Then I'd scan vertical lines across the image.
You can simplify the problem further by segmenting the pixels into two classes, base and painted. Do some statistical analysis to find the range of the larger region, consisting of the base pixels; you will probably make use of the median of all pixels.
Then expand the color space around this representative pixel until you find the largest color-distance gap. Repeat the procedure to retrieve the painted pixels. There are other image processing routines you may have to apply, such as smoothing out noise and removing outliers and the background.
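As a hypothetical starting point for the rectification step, something along these lines could be used with OpenCV; the corner coordinates below are placeholders you would pick on your own image.

```cpp
// Rectify the perspective with a homography computed from four manually
// chosen corner points (placeholder values here) of the region of interest.
#include <opencv2/imgproc.hpp>
#include <vector>

cv::Mat rectify(const cv::Mat& input)
{
    // Corners of the region of interest in the photo (hypothetical values).
    std::vector<cv::Point2f> srcCorners = {
        {120.f, 80.f}, {520.f, 95.f}, {540.f, 400.f}, {100.f, 390.f}};
    // Where those corners should land in the rectified image.
    std::vector<cv::Point2f> dstCorners = {
        {0.f, 0.f}, {400.f, 0.f}, {400.f, 300.f}, {0.f, 300.f}};

    cv::Mat H = cv::getPerspectiveTransform(srcCorners, dstCorners);
    cv::Mat rectified;
    cv::warpPerspective(input, rectified, H, cv::Size(400, 300));
    return rectified;
}
```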

OpenCV - calibrate camera using static images in water

I have a camera mounted vertically under water in a tank, looking downwards.
There is a flat grid on the bottom of the tank (approx 2m away from the camera).
I want to be able to place markers on the bottom, and use computer vision to know their real life exact position.
So, I need to map from pixels to mm.
If I am not mistaken, cv::calibrateCamera(...) does just this, but is dependent on moving a pattern in front of the camera.
I have just static pictures of the scene, and the camera never moves in relation to the grid. Thus, I have only a "single" image to find the parameters.
How can I do this using the grid?
Thank you.
Interesting problem! The "cute" part is the effect on the intrinsic parameters of the refraction at the water-glass interface, namely to increase the focal length (or, conversely, to reduce the field of view) compared to the same lens in air. In theory, you could calibrate in air and then correct for the difference in refraction index, but calibrating directly in water is likely to give you more accurate results.
Do you know your accuracy requirements? And have you verified that your lens/sensor combination is adequate to meet them (with an adequate margin)? To answer that question you need to estimate (either by calculation from the lens and sensor specifications, or experimentally using a resolution chart) whether you can resolve in an image the minimal distances required by your application.
From the wording of your question I think that you are interested only in measurements on a single plane. So you only need to (a) remove the nonlinear (barrel or pincushion) lens distortion and (b) estimate the homography between the plane of interest and the image. Once you have the latter, you can directly convert from undistorted image coordinates to world ones by matrix multiplication. Additionally if (as I imagine) the plane of interest is roughly parallel to the image plane, you should not have any problem keeping the entire field-of-view in focus.
Of course, for all of this to work as expected, you should make sure that the tank bottom is really flat, within the measurement tolerances of your application. Otherwise you are really dealing with a 3D problem, and need to modify your procedures accordingly.
The actual procedure depends a lot on the size of the tank, which you don't indicate clearly. If it's small enough that it is practical to manufacture a chessboard-like movable calibration target, by all means go for it. You may want to take a look at this other answer for suggestions. In the following I'll discuss the more interesting case in which your tank is large, e.g. the size of a swimming pool.
I'd proceed by sticking calibration markers in a regular grid at the pool bottom. I'd probably choose checker-like markers like these, maybe printing them myself with a good laser printer on plastic with an adhesive backing (assuming you can leave them in place forever). You should plan on having quite a few of them, say an 8x8 or 10x10 grid, covering as much as possible of the field of view of the camera in its operating position and pose. To help with lining up the grid nicely you might use a laser line projector of suitable fan angle, or a laser pointer attached to a rotating support.
Note carefully that it is not necessary that the markers be affixed in a precise X-Y grid (which may be complicated, depending on the size of your pool), only that their positions with respect to any arbitrarily chosen (but fixed) three of them be known. In other words, you can attach them to the bottom approximately in a grid, then measure the distances of three extreme corners from each other as accurately as you can, thus building a base triangle, then measure the distances of all the other corners from the vertices of the triangle, and finally reconstruct their true positions with a bit of trigonometry. It's basically a surveying problem and, depending on your accuracy requirements and budget, you may want to enlist a friendly local professional surveyor (and their tools) to get it done as precisely as necessary.
Once you have your grid, you can fill the pool, get your camera, and focus and f-stop the lens as needed for the application. From now on you may not touch the focus or f-stop ever again, under penalty of miscalibrating; exposure can only be controlled by the exposure time, so make sure to have enough light. Disable any and all auto-focus and auto-iris functions, if present. If the camera has a non-rigid lens mount (e.g. a DSLR), you'll need some kind of mechanical rig to ensure that the lens-body pair stays rigid. Stop the lens down as far as the available lighting and sensor allow, so as to have a fair bit of depth of field available. Then take several photos (~10) of the grid, moving and rotating the camera, and going a bit closer to and farther from the plane than your expected operating distance. You'll want to "see" significant perspective foreshortening of the grid in some images; this is needed to accurately calibrate the focal length. Avoid JPG and any other lossy compression format when storing the images; use lossless PNG or TIFF.
Once you have the images, you can manually mark and identify the checker markers in them. For a one-off project like this I would not bother with automatic identification; just do it manually (e.g. in Matlab, or even in Photoshop or Gimp). To help identify the markers, you could, for example, print a number next to each one. Once you have the manual marks, you can refine them automatically to subpixel accuracy, e.g. using cv::cornerSubPix.
You're almost done. Feed the measured "reference" positions of the real corners, and the observed ones in all images, to your favorite camera calibration routine, e.g. cv::calibrateCamera. Use the nominal focal length of the camera (converted to pixels) as the initial estimate, along with zero distortion. If all goes well, you will obtain the camera intrinsic parameters, which you will keep, and the camera poses for all images, which you can throw away.
Now you can mount the camera in your final setup, as needed by your application, and take one further image of the grid. Mark and refine the corner positions as before. Undistort their image positions using the distortion parameters returned by the calibration. Finally compute the homography between the reference positions of the real markers (in meters) and their undistorted positions, and you're done.
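A minimal sketch of that last step, assuming the intrinsics and distortion coefficients come from the calibration above and that you have matched marker positions (world coordinates in, say, metres, and image coordinates in pixels); the function name is just illustrative.

```cpp
// Convert a pixel coordinate to world-plane coordinates via a homography
// estimated from the undistorted marker observations. Assumes intrinsics and
// distortion from cv::calibrateCamera and matched marker point lists.
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

cv::Point2f pixelToWorld(const cv::Point2f& pixel,
                         const cv::Mat& cameraMatrix,
                         const cv::Mat& distCoeffs,
                         const std::vector<cv::Point2f>& imageMarkers,
                         const std::vector<cv::Point2f>& worldMarkers)
{
    // Undistort the marker observations and the query point (keep pixel units
    // by passing cameraMatrix as the new projection matrix).
    std::vector<cv::Point2f> undistMarkers, undistQuery;
    cv::undistortPoints(imageMarkers, undistMarkers, cameraMatrix, distCoeffs,
                        cv::noArray(), cameraMatrix);
    cv::undistortPoints(std::vector<cv::Point2f>{pixel}, undistQuery,
                        cameraMatrix, distCoeffs, cv::noArray(), cameraMatrix);

    // Homography from undistorted image coordinates to the world plane.
    cv::Mat H = cv::findHomography(undistMarkers, worldMarkers);

    std::vector<cv::Point2f> world;
    cv::perspectiveTransform(undistQuery, world, H);
    return world[0];
}
```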
HTH
To calibrate the camera you do need multiple images of the checkerboard (or one of the other patterns found here). What you can do is calibrate the camera outside of the water, or do a calibration sequence once.
Once you have that information (focal length, center of lens, distortion, etc.), you can use the solvePnP function to estimate the orientation of a single board. This estimation also gives you the distance from the camera to the board.
A completely different alternative could be to find out what kind of lens the camera uses and fill in the data manually. I've not tried this, so I'm uncertain how well it would work.
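For reference, a minimal sketch of that solvePnP step, assuming the intrinsics come from a prior calibration and that objectPoints/imagePoints are matched grid corners; the helper name is just illustrative.

```cpp
// Estimate the pose of a single board and return the camera-to-board distance.
// objectPoints are the board corners in world units (e.g. metres),
// imagePoints their detected pixel positions in one image.
#include <opencv2/calib3d.hpp>
#include <vector>

double distanceToBoard(const std::vector<cv::Point3f>& objectPoints,
                       const std::vector<cv::Point2f>& imagePoints,
                       const cv::Mat& cameraMatrix,
                       const cv::Mat& distCoeffs)
{
    cv::Mat rvec, tvec;
    cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);
    return cv::norm(tvec);   // distance from the camera centre to the board origin
}
```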

Do I need to resize the image to 2^a x 2^b before performing a fast fourier transform?

Do I need to resize the image to 2^a x 2^b before performing a fast fourier transform?
In general, no: most FFT implementations handle arbitrary sizes, although sizes that factor into small primes are fastest. For the cases where you do need to use a power of 2, rather than resizing you should pad with zeros.
In some cases, even if the algorithm you are using is not limited to powers of 2, it may be more efficient to pad to that size anyway (particularly if your image is just slightly smaller than the next power of 2). Also, if your image is not square, you can pad to a square image of side 2^a before taking the FFT. Some algorithms will do this behind the scenes anyway: pad your image to a square, take the FFT, then crop back to the original size.
Zero padding is also sometimes used to increase the number of points in the output, i.e. a higher frequency "resolution", although since you're not adding any more data it's more like an interpolation.
Whether you need to do this will depend on which FFT library you are using.
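As a concrete illustration with OpenCV's dft (one library where a power-of-2 size is not required), a minimal sketch of the zero-padding approach might look like this.

```cpp
// Pad an image to a DFT-friendly size (products of small primes, not
// necessarily powers of 2), then take the transform. Crop back to the
// original size after the inverse transform if you go that route.
#include <opencv2/core.hpp>

cv::Mat paddedDft(const cv::Mat& gray)   // expects a single-channel image
{
    int rows = cv::getOptimalDFTSize(gray.rows);
    int cols = cv::getOptimalDFTSize(gray.cols);

    cv::Mat padded;
    cv::copyMakeBorder(gray, padded, 0, rows - gray.rows, 0, cols - gray.cols,
                       cv::BORDER_CONSTANT, cv::Scalar::all(0));

    cv::Mat spectrum;
    padded.convertTo(padded, CV_32F);
    cv::dft(padded, spectrum, cv::DFT_COMPLEX_OUTPUT);
    return spectrum;
}
```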
