How can I transpose an image in Assembly?

I'm working on a project and I need to compute something based on the rows and columns of an image. It's easy to take the bits of the rows of the image. However, to take the bits of each column I need to transpose the image so the columns become rows.
I'm using a BMP picture as the input. How many rows and columns does a BMP picture have? I'd also like to see some pseudocode if possible.

It sounds like you are wanting to perform a matrix transpose which is a little different than rotation. In rotation, the rows may become columns, but either the rows or the columns will be in reverse order depending on the rotation direction. Transposition maintains the original ordering of the rows and columns.
I think using the right algorithm is much more important than whether you use assembly or just C. Rotation by 90 degrees or transposition really boils down to just moving memory. The biggest thing to consider is the effect of cache misses if you use a naive algorithm like this:
for (int x = 0; x < width; x++)
{
    for (int y = 0; y < height; y++)
        out[x][y] = in[y][x];
}
This will cause a lot of cache misses because you are jumping around in the memory a lot. It is more efficient to use a block based approach. Google for "cache efficient matrix transpose".
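A minimal sketch of the block-based idea, assuming 32-bit pixels stored row-major in flat arrays. The function name and the tile size of 32 are my own choices for illustration, not something from a particular library; the right tile size depends on the cache of the target machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Tile edge in pixels. 32 is an assumed starting point; tune it for the target cache. */
#define TILE 32

/* Transpose a height x width image (row-major in `in`) into a
   width x height image (row-major in `out`), one tile at a time so that
   reads and writes stay inside a small region of memory. */
void transpose_blocked(const uint32_t *in, uint32_t *out,
                       size_t width, size_t height)
{
    for (size_t by = 0; by < height; by += TILE) {
        for (size_t bx = 0; bx < width; bx += TILE) {
            /* Clamp the tile at the right and bottom image edges. */
            size_t y_end = (by + TILE < height) ? by + TILE : height;
            size_t x_end = (bx + TILE < width) ? bx + TILE : width;
            for (size_t y = by; y < y_end; y++)
                for (size_t x = bx; x < x_end; x++)
                    out[x * height + y] = in[y * width + x];
        }
    }
}
```

The result is identical to the naive loop; only the order in which pixels are visited changes.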
One place you may be able to make some gains is using SSE instructions to move more than one piece of data at a time. These are available in assembly and, through compiler intrinsics, in C. Also check out this link. About half way down they have a section on computing a fast matrix transpose.
edit:
I just saw your comment that you are doing this for a class in assembly so you can probably disregard most of what I said. I assumed you were looking to squeeze out the best performance since you were using assembly.

It varies. BMPs can have any size (up to a limit), and they can be in different formats too (32-bit RGB, 24-bit RGB, 16-bit RGB, 8-bit paletted, 1-bit monochrome, and so on).
As with most other problems, it's best to write a solution first in the high-level language of your choice and then convert parts or all of it to ASM as needed.
But yes, in its simplest form for this task, which would be the 32-bit RGB format, rotating by a multiple of 90 degrees is like rotating a 2-D array.
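Since the size varies, the row and column counts have to be read from the file header. Here is a rough sketch, assuming the common 40-byte BITMAPINFOHEADER layout (little-endian width at byte offset 18, height at 22, bits per pixel at 28). The struct and function names are made up for this example, and other header variants are not handled.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t  width;           /* columns */
    int32_t  height;          /* rows (always positive here) */
    uint16_t bits_per_pixel;
    uint32_t row_stride;      /* bytes per stored row, padded to a multiple of 4 */
    int      top_down;        /* nonzero if rows are stored top to bottom */
} BmpInfo;

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the first bytes of a BMP file. Returns 0 on success, -1 on failure.
   Assumes a BITMAPINFOHEADER-style header. */
int bmp_read_info(const uint8_t *hdr, size_t len, BmpInfo *info)
{
    if (len < 30 || hdr[0] != 'B' || hdr[1] != 'M')
        return -1;
    int32_t w = (int32_t)read_le32(hdr + 18);
    int32_t h = (int32_t)read_le32(hdr + 22);
    if (w <= 0 || h == 0 || h == INT32_MIN)
        return -1;
    info->width = w;
    info->top_down = (h < 0);   /* a negative height means top-down rows */
    info->height = (h < 0) ? -h : h;
    info->bits_per_pixel = read_le16(hdr + 28);
    /* Each row is padded so that its length is a multiple of 4 bytes. */
    info->row_stride = (((uint32_t)w * info->bits_per_pixel + 31u) / 32u) * 4u;
    return 0;
}
```

Note the row padding: a 24-bit image that is 5 pixels wide stores 16 bytes per row, not 15, so row `y` starts at `y * row_stride` within the pixel data.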

Related

Delphi - image colors detection [duplicate]

I'm writing a program that works with images and at some point I need to posterize the image. This means I need to bin the colors, but I'm having trouble deciding how to tell how close one color is to another.
Given a color in RGB, I can think of at least 2 ways to see how different they are:
|r1 - r2| + |g1 - g2| + |b1 - b2|
sqrt((r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2)
And if I move into HSV, I can think of other ways of doing it.
So I ask, ignoring speed, what is the best way to tell how similar two colors are? Best meaning most accurate to the human eye.
Well, if speed is not an issue, the most accurate way would be to take some sample images and apply the filter to them using various cutoff values for the distance. The distance would be determined by one of the equations on the Color_difference page that astander linked to, which means working in one of the color spaces listed there and then converting back to sRGB or similar (and converting the image into that color space first if it is not already in it). Then have a large number of people examine the results, and go with the cutoff value that the majority agrees looks best.
Basically, it's largely a matter of subjectivity; in fact, it also depends on how stylized you want the images, and you might even want to add some sort of control so that you can alter the cutoff distance on the fly.
If speed does become a bit of an issue and/or you want more simplicity, then just use your second choice for distance calculation (which is simply the CIE76 equation; just make sure to use the L*a*b* color space) with the cutoff being around 2 or 2.3.
What do you mean by "posterize the image"?
If you're trying to cluster the colors into bins, you should look at cluster analysis.
Just a comment if you are going to move to HSV (or similar spaces):
Diffing on H: the difference between 0° and 359° is numerically large but perceptually negligible.
When V or S is small, a difference in H is perceptually small as well.
For computer vision applications, what matters more than perceptual difference (which is used mostly by paint manufacturers) is whether the colors belong to the same object or segment. This means that we might partially ignore V, which can change with lighting conditions.

How to multiply two pixels together

I'm reading a paper that involves finding the mean squared error of blocks of pixels. It uses the formula below. I is one image, I' is another image, and x and y are the pixel coordinates in each image.
What is confusing me is exactly how to do this math. Right now I have my images in RGB values. But how do I do this image math properly?
What is the correct way to square my resulting difference image? Is it by squaring the individual RGB channels alone, or should I be converting this to an int representation first?
Ideally I want to be able to compare several MSEs of different images, so keeping all of this data in individual channels doesn't seem to make sense. Is my intuition correct that I should just convert everything to an int representation, then square and divide by N^2 and find the smallest resulting value?
Formula
From this answer to a related question.
It really depends on what you want to detect. For example do you just want a single metric about how different are images that are substantially the same? Do you want to compare discolorations for two images that are not substantially the same, spatially?
So you could use any of a variety of approaches to determine what a value of I actually is. For example, it could be the R value, or G, or B, or something like the sum R+G+B.
I would try a bunch of these and see how your results are turning out, in addition to doing more research on color image differentiation.
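As one concrete option, here is a sketch that squares the difference per channel and averages over all three channels, giving a single number per block. It assumes the usual definition of MSE (sum of squared differences divided by the number of samples) and interleaved 8-bit RGB data; the paper's exact formula is not reproduced here, so treat the choice of I as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean squared error between two n x n blocks of interleaved 8-bit RGB.
   `stride` is the number of bytes per image row. The squared differences
   of R, G and B are all accumulated, so the divisor is 3 * n * n. */
double block_mse_rgb(const uint8_t *a, const uint8_t *b,
                     size_t n, size_t stride)
{
    if (n == 0)
        return 0.0;
    uint64_t sum = 0;
    for (size_t y = 0; y < n; y++) {
        for (size_t x = 0; x < n; x++) {
            for (size_t c = 0; c < 3; c++) {
                size_t i = y * stride + 3 * x + c;
                /* Widen to int before subtracting so the difference can be negative. */
                int d = (int)a[i] - (int)b[i];
                sum += (uint64_t)(d * d);
            }
        }
    }
    return (double)sum / (double)(3 * n * n);
}
```

Squaring each channel separately and then combining avoids the problem of squaring a packed integer pixel, where a small change in the high byte would swamp everything else.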

Implementation of image dilation and erosion

I am trying to figure out an efficient way of implementing image dilation and erosion for binary images. As far as I understand it, the naive way would be:
loop through the image
    if pixel is 1
        loop through the neighborhood based on the structuring element's height and width
            (dilate) substitute each pixel of the image with the value in the corresponding location of the SE
            (erode) check if all neighborhood is equal to the SE, if so keep all the pixels, else delete the centre
So this means that for each pixel I have to loop through the SE as well, making this O(N*M*W*H).
Is there a more elegant way of doing this?
Yes, there is!
First you want to decompose (if possible) your structuring element into segments (a square being composed of a vertical and a horizontal segment). Then you perform erosion/dilation only on segments, which already decreases the complexity.
Now for the erosion/dilation parts, you have different solutions:
If you work only on 8-bit images and do not use C/C++, you can use an implementation with histograms to keep track of the minimum/maximum value. See this remarkable work here. He even adds "landmarks" in order to reduce the number of operations.
If you use C/C++ and work on different types of image encodings, then you can use fast comparisons (SSE2, SSE4 and auto-vectorization), as is the case in the SMIL library. In this case, you compare row with row instead of working pixel by pixel, using hardware acceleration. It seems to be the fastest library.
A last way, slower but working for all types of encoding, is to use the Lemonnier algorithm. It is implemented by the Fulguro library.
For structuring elements of type disk, there is nothing "fast"; you have to use the basic algorithm. For hexagonal structuring elements, you can work row by row, but it cannot be parallelized.
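To illustrate the decomposition into segments, here is a minimal sketch of a binary dilation by a centered rectangle, done as a horizontal pass followed by a vertical pass. It assumes one byte per pixel (0 or nonzero), odd structuring element sizes, and pixels outside the image treated as 0. The function names are made up, and this is the plain version without any of the histogram or SSE tricks mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Dilate every line of an image with a centered segment of odd length `len`.
   count     = number of lines
   n         = pixels per line
   step      = distance between consecutive pixels of a line
   line_step = distance between the starts of consecutive lines */
static void dilate_1d(const uint8_t *src, uint8_t *dst, size_t count, size_t n,
                      size_t step, size_t line_step, size_t len)
{
    size_t r = len / 2;
    for (size_t l = 0; l < count; l++) {
        const uint8_t *s = src + l * line_step;
        uint8_t *d = dst + l * line_step;
        for (size_t i = 0; i < n; i++) {
            size_t lo = (i > r) ? i - r : 0;
            size_t hi = (i + r < n) ? i + r : n - 1;
            uint8_t v = 0;
            /* Stop as soon as one set pixel is found under the segment. */
            for (size_t k = lo; k <= hi && !v; k++)
                v = (uint8_t)(s[k * step] != 0);
            d[i * step] = v;
        }
    }
}

/* Dilation by a se_w x se_h rectangle as two 1-D passes.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int dilate_rect(const uint8_t *in, uint8_t *out, size_t width, size_t height,
                size_t se_w, size_t se_h)
{
    uint8_t *tmp = malloc(width * height);
    if (!tmp)
        return -1;
    dilate_1d(in, tmp, height, width, 1, width, se_w);   /* rows */
    dilate_1d(tmp, out, width, height, width, 1, se_h);  /* columns */
    free(tmp);
    return 0;
}
```

The work per pixel drops from W*H to W+H. Erosion follows the same pattern with "all pixels set" in place of "any pixel set".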

GPUImage Taking sum of columns of image

I'm using GPUImage in my project and I need an efficient way of taking the column sums. The naive way would obviously be retrieving the raw data and adding the values of every column. Can anybody suggest a faster way to do that?
One way to do this would be to use the approach I take with the GPUImageAverageColor class (as described in this answer), only instead of reducing the total size of each frame at each step, only do this for one dimension of the image.
The average color filter determines the average color of the overall image by stepping down by a factor of four in both X and Y, averaging 16 pixels into one at each step. If operating in a single direction, you should be able to use hardware interpolation to get an 18X reduction in a single direction per step with good performance. Your final step might either require a quick CPU-based iteration on the much smaller image or a tweaked version of this shader that pulls the last few pixels in a column together into the final result pixel for that column.
You'll notice that I've been talking about averaging here, because the output values for any OpenGL ES operation will need to be in terms of colors, which only have a 0-255 range per channel. A sum will easily overflow this, but you could use an average as an approximation of your sum, with a more limited dynamic range.
If you only care about one color channel, you could possibly encode a larger value into the RGBA channels and maintain a 32-bit sum that way.
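To make the encoding idea concrete, here is a CPU-side sketch in plain C: a reference column sum over one channel, plus helpers that split a 32-bit sum across four 8-bit channels and put it back together. This is not GPUImage code and the function names are invented; in a real shader the same split would have to be done with the shader's own arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* CPU reference: sum one 8-bit channel (0 = R ... 3 = A) down each column
   of a tightly packed RGBA image. `sums` must hold `width` entries. */
void column_sums(const uint8_t *rgba, size_t width, size_t height,
                 size_t channel, uint32_t *sums)
{
    for (size_t x = 0; x < width; x++)
        sums[x] = 0;
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            sums[x] += rgba[(y * width + x) * 4 + channel];
}

/* Spread a 32-bit sum over the four 8-bit channels of one output pixel. */
void pack_sum_rgba(uint32_t sum, uint8_t out[4])
{
    out[0] = (uint8_t)(sum >> 24);
    out[1] = (uint8_t)(sum >> 16);
    out[2] = (uint8_t)(sum >> 8);
    out[3] = (uint8_t)sum;
}

/* Recover the 32-bit sum from an RGBA pixel written by pack_sum_rgba. */
uint32_t unpack_sum_rgba(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```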
Beyond what I describe above, you could look at performing this sum with the help of the Accelerate framework. While probably not quite as fast as doing a shader-based reduction, it might be good enough for your needs.

Sine Table Interpolation

I want to put together a SDR system that tunes initially AM, later FM etc.
The system I am planning to use to do this will have a sine lookup table for Direct Digital Synthesis (DDS).
In order to tune properly I expect to need to be able to precisely control the frequency of the sine wave fed to the Mixer (multiplier in this case). I expect that linear interpolation will be close, but think a non-linear method will provide better results.
What is a good and fast interpolation method to use for sine tables? Multiplication and addition are cheap on the target system; division is costly.
Edit:
I am planning on implementing constants with multiply/shift functions to normalize the constants to scaled integers. Intermediate values will use wide adds, and multiplies will use 18 or 17 bits. Floating point "pre-computation" can be used, but not on the target platform. When I say "division is costly" I mean that it has to be implemented using the multipliers and a lot of code. It's not unthinkable, but should be avoided. However, true floating point IEEE methods would take a significant amount of resources on this platform, as well as a custom implementation.
Any SDR experiences would be helpful.
If you don't get very good results with linear interpolation you can try the trigonometric relations.
Sum and Difference Formulas
sin(A+B)=sinA*cosB + cosA*sinB
sin(A-B)=sinA*cosB - cosA*sinB
cos(A+B)=cosA*cosB - sinA*sinB
cos(A-B)=cosA*cosB + sinA*sinB
and you can have precalculated sin and cos values for A, B ranges, ie
A range: 0, 10, 20, ... 90
B range: 0.01 ... 0.99
table interpolation for smooth functions = ick hurl bleah. IMHO I would only use table interpolation on some really weird function, or where you absolutely needed to ensure you avoid discontinuities (note that the derivatives for interpolated tables are discontinuous though). By the time you finish doing table lookups and the required interpolation code, you could have already evaluated a polynomial or two, at least if multiplication doesn't cause you too much heartburn.
IMHO you're much better off using Chebyshev approximation for each segment (e.g. -90 to +90 degrees, or -45 to +45 degrees, and then other segments of the same width) of the sine waveform, and picking the minimum degree polynomial that reduces your error to a desired value. If the segment is small enough you could get away with a quadratic or maybe even a linear polynomial; there are tradeoffs between accuracy, number of segments, and degree of polynomial.
See my post in this other question, it'll save you the trouble of calculating coefficients (at least if you believe my math).
(edit: in case this wasn't clear, you do the Chebyshev approximation at design-time on your favorite high-powered PC, so that at run-time you can use a dirtbag microcontroller or FPGA or whatever with a simple polynomial of degree 1-4. Don't go over degree 4 unless you know what you're doing, 3 or below would be better.)
Why a table? This very fast function has its worst noise peak at -90 dB when the signal is at -20 dB. That's crazy good.
For resampling of audio, I always use one of the interpolators from the Elephant paper. This was discussed in a previous SO question.
If you're on a processor that doesn't have fp, you can still do these things, but they are harder. I've been there. I feel your pain. Good luck! I used to do conversions for fp to integer for fun, but now you'd have to pay me to do it. :-)
Cool online references that apply to your problem:
http://www.audiomulch.com/~rossb/code/sinusoids/
http://www.dattalo.com/technical/theory/sinewave.html
Edit: additional thoughts based on your comments
Since you're working on a tricky processor, maybe you should look into how to make your sine table have more angles to look up, but still keep it small.
Suppose you break a quadrant into 90 pieces (in reality, you'd probably use 256 pieces, but let's keep it 90 for familiarity and clarity). Encode those as 16 bits. That's 180 bytes of table so far.
Now, for every one of those degrees, we're going to have 9 (in reality probably 8 or 16) in-between points.
Let's take the range between 3 degrees and 4 degrees as an example.
sin(3)=0.052335956   // this will be in your table as a 16-bit number
sin(4)=0.069756474   // this will be in your table as a 16-bit number
so we're going to look at sin(3.1)
sin(3.1)=0.054078813 // we're going to be tricky and store the result
                     // in 8 bits as a percentage of the distance between
                     // sin(3) and sin(4)
What you want to do is figure out how sin(3.1) fits in between sin(3) and sin(4). If it's half way between, code that as a byte of 128. If it's a quarter of the way between, code that as 64.
That's an additional 90*9 = 810 bytes, and you've encoded down to a tenth of a degree in 16-bit res in only 180+90*9 = 990 bytes. You can extend as needed (maybe going up to 32-bit angles and 16-bit tween angles) and linearly interpolate in between very quickly. To minimize storage space, you're taking advantage of the fact that consecutive values are close to each other.
Edit 2: better way to encode the in-between angles in a table
I just remembered that when I did this, I ended up very compactly expressing the difference between the expected value according to linear interpolation and the actual value. This error is always in the same direction.
I first calculated the maximum error in the range and then based the scale on that.
Worked great. I feel like I should do the code in a blog entry to illustrate. :-)
Interpolation in a sine table is effectively resampling. Obviously you can get perfect results by a single call to sin, so whatever your solution is it needs to outperform that. For fixed-filter resampling, you're still going to only have a fixed set of available points (a 3:1 upsampler means you'll have 2 new points available between each point in your table). How expensive is memory on the target system? My primary recommendation is simply improve the table resolution and use linear interpolation. You'll get the same results as a smaller table and simple upsample but with less computational overhead.
Have you considered using the Taylor series for the trig functions (found here)? This involves multiplication and division but depending on how your numbers are represented you may be able to turn the division into multiplication (or bit shifts if you're very lucky). You can compute as many terms of the series as you need and get your precision that way.
Alternately if this sine wave is going to be an analog signal at some point then you could just use a lookup table approach and use an analog filter to remove the sampling frequency from the resulting waveform. If your sampling frequency is 100 times the sine frequency it will be easy to remove. You'll need a variable filter to do this. I've never done such a thing but I know there's digital potentiometers that take a binary number and change their resistance. That could be the basis of a variable RC filter - probably with some op-amps for gain, etc.
Good luck!
People have written some amazingly clever code for quickly calculating sin() on systems with tiny amounts of memory that don't even have a hardware multiply instruction, much less a division instruction.
In order of increasing complexity:
Use a square wave. Many AM radios use square waves in their ring demodulator, and I fail to see why your AM demodulator requires anything more complicated.
Approximate sin() by looking up the "closest value" in a raw table of 256 values per quarter-cycle. Yes, you see horrible-looking stair-steps, but (with a little bit of analog filtering) this often works well. (In fact, this is often overkill, and a much shorter table is adequate).
Approximate sin() by looking up the 2 closest values in a raw table, and linearly interpolating between them.
Approximate sin() with 16 short, equally-spaced-in-x cubic splines per quarter-cycle, which "gives better than 16-bit precision" for sin(x).
Wikibooks: Fixed-Point Numbers links to some clever implementations of the last 3.
