Diagonal Direction Edge Detection - image-processing

Let's say a given image has an edge at some non-negligible angle, so we use a rotated edge detection mask, for example the Sobel operator rotated by 45 degrees:
0 1 2
-1 0 1
-2 -1 0
or
-2 -1 0
-1 0 1
0 1 2
In this case, how do we find the magnitude of the edge?
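One common approach (a sketch, not an authoritative answer) is to treat the two rotated kernels like the usual horizontal/vertical Sobel pair: convolve the image with both and combine the responses as magnitude = sqrt(g45² + g135²). The helper names below are illustrative:

// Minimal sketch: diagonal Sobel responses combined into an edge magnitude.
// Assumes a grayscale image as a 2D Double array; border pixels are skipped.
val k45  = Array(Array( 0.0,  1.0, 2.0), Array(-1.0, 0.0, 1.0), Array(-2.0, -1.0, 0.0))
val k135 = Array(Array(-2.0, -1.0, 0.0), Array(-1.0, 0.0, 1.0), Array( 0.0,  1.0, 2.0))

def response(img: Array[Array[Double]], k: Array[Array[Double]], r: Int, c: Int): Double =
  (for (i <- -1 to 1; j <- -1 to 1) yield img(r + i)(c + j) * k(i + 1)(j + 1)).sum

def magnitude(img: Array[Array[Double]], r: Int, c: Int): Double = {
  val g1 = response(img, k45, r, c)
  val g2 = response(img, k135, r, c)
  math.sqrt(g1 * g1 + g2 * g2) // same combination rule as for the axis-aligned Sobel pair
}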

Related

Confused about the Text Matrix and Transformation matrix of a pdf parser

I'm developing a PdfParser and I want to print the text content of the PDF on a coordinate plane. Below are the text objects and matrices that are used to render the text. How can I isolate the scaling, rotation, and translation and use them to print the text content at exact coordinates on a canvas?
//Decoded text stream containing text objects
S
Q
q
0.000 0.750 0.750 -0.000 15.000 301.890 cm
0.000 g
/F10 16.000 Tf
0 Tr
0.000 Tc
BT
1 0 0 -1 20.000 13.600 Tm
[<007a>]TJ
ET
Q
q
0.000 0.750 0.750 -0.000 15.000 301.890 cm
1.000 0.416 0.000 rg
/F10 6.667 Tf
0 Tr
0.000 Tc
BT
1 0 0 -1 136.667 13.600 Tm
[<0024>12<0046><0046><0058><0055>6<0048><0003><0032><0058><0057><0053><0058><0057><0003><0036>-4<0052><004f><0058><0057><004c><0052><0051><0003><0026>3<004f><0052><0058><0047><0003><0048><0051>18<0059><004c><0055>6<0052><0051><0050><0048><0051>3<0057>7<000f><0003><0027><0028><0030><0032><0003><0044><0046><0046><0058><0055>6<0048>]TJ
ET
Q
q
0.000 0.750 0.750 -0.000 15.000 301.890 cm
0.000 g
/F10 16.000 Tf
0 Tr
0.000 Tc
BT
1 0 0 -1 603.333 13.600 Tm
[<007a>]TJ
ET
Q
q
The initial S Q is leftover from a previous instruction block that ended with some path stroking and graphics state restoring. As we don't know anything to the contrary, let's assume that Q restores to the initial graphics state, in particular to an unmodified current transformation matrix (CTM).
As we are interested in coordinates in the default user space coordinate system, we can accordingly assume that the CTM is the identity matrix.
Let's take a look at the block
q
0.000 0.750 0.750 -0.000 15.000 301.890 cm
0.000 g
/F10 16.000 Tf
0 Tr
0.000 Tc
BT
1 0 0 -1 20.000 13.600 Tm
[<007a>]TJ
ET
Q
As you implied yourself in a comment, the only relevant instructions for the total transformation matrix at the time the text rendering instruction [<007a>]TJ begins to be executed are
0.000 0.750 0.750 -0.000 15.000 301.890 cm
and
1 0 0 -1 20.000 13.600 Tm
setting the current transformation matrix to
  0      0.75   0        1  0  0        0      0.75   0
  0.75   0      0    *   0  1  0   =    0.75   0      0
 15.00 301.89   1        0  0  1       15.00 301.89   1
and the text and text line matrices both to
  1     0     0
  0    -1     0
 20.0  13.6   1
Thus, the effects of text matrix and current transformation matrix combine to:
  1     0     0        0      0.75   0         0      0.75   0
  0    -1     0    *   0.75   0      0    =   -0.75   0      0
 20.0  13.6   1       15.00 301.89   1        25.2  316.89   1
You can split that combined matrix up into a scaling, a rotation, and a translation like this:
  0     0.75   0       0.75  0     0        0  1  0        1      0      0
 -0.75  0      0   =   0     0.75  0   *   -1  0  0   *    0      1      0
 25.2 316.89   1       0     0     1        0  0  1       25.2  316.89   1
We have a scaling by .75, a rotation by 90° counterclockwise, and a translation by (25.2, 316.89).
(Of course this can still be subject to a page rotation...)
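As a sketch of how you might extract those components in code (the names are illustrative, not from any PDF library; it assumes uniform scaling and no skew): for a matrix [a b c d e f], the scale is sqrt(a² + b²), the rotation angle is atan2(b, a), and the translation is (e, f).

// Sketch: decompose a PDF-style matrix [a b c d e f] (row-vector convention)
// into scale, rotation, and translation, assuming uniform scale and no skew.
case class Decomposed(scale: Double, rotationDeg: Double, tx: Double, ty: Double)

def decompose(a: Double, b: Double, c: Double, d: Double, e: Double, f: Double): Decomposed =
  Decomposed(
    scale = math.hypot(a, b),                       // length of the transformed x unit vector
    rotationDeg = math.toDegrees(math.atan2(b, a)), // angle of the transformed x unit vector
    tx = e, ty = f
  )

// The combined matrix from above: [0 0.75 -0.75 0 25.2 316.89]
println(decompose(0, 0.75, -0.75, 0, 25.2, 316.89))
// prints Decomposed(0.75,90.0,25.2,316.89): scale .75, 90° counterclockwise, translation (25.2, 316.89)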

Possibility of only dealing with specific region of binary image

I have recently been studying image processing.
The hole-filling problem confuses me (I assume that anyone able to answer this question is familiar with the steps involved, so I will skip straight to the problem):
Let's say I have a binary image like this:
0 0 0 0 0 0 0
0 0 1 1 0 0 0
0 1 0 0 1 0 0
0 1 0 0 1 0 0
0 0 1 0 1 0 0
0 0 1 0 1 0 0
0 1 0 0 0 1 0
0 1 0 0 0 1 0
0 1 1 1 1 0 0
0 0 0 0 0 0 0
The book says to start from a region inside the hole, perform the dilation operation, and bound the result so it does not fill the whole image.
I have no problem understanding the process, but if I try to code it, how can I restrict the work to a specific region (inside the hole, in this case)? Or would an actual implementation use a different method?
If you can assume that the object with holes does not touch the border of the image, you can create an intermediate image and flood fill it (with a value such as 2) starting from the top left pixel. Any remaining '0' pixels must then be inside the contour. Take the position of the first remaining '0' pixel you encounter and flood fill from it in the original image.
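A minimal sketch of that approach (the helper names and the 4-connected flood fill are assumptions, not part of the answer):

import scala.collection.mutable

// Sketch: fill holes in a binary image (0 = background, 1 = foreground),
// assuming the object does not touch the image border.
def floodFill(img: Array[Array[Int]], sr: Int, sc: Int, from: Int, to: Int): Unit = {
  val stack = mutable.Stack((sr, sc))
  while (stack.nonEmpty) {
    val (r, c) = stack.pop()
    if (r >= 0 && r < img.length && c >= 0 && c < img(0).length && img(r)(c) == from) {
      img(r)(c) = to
      stack.push((r + 1, c)); stack.push((r - 1, c))
      stack.push((r, c + 1)); stack.push((r, c - 1))
    }
  }
}

def fillHoles(img: Array[Array[Int]]): Unit = {
  val tmp = img.map(_.clone)
  floodFill(tmp, 0, 0, from = 0, to = 2) // mark the outside background in the intermediate image
  for (r <- tmp.indices; c <- tmp(0).indices)
    if (tmp(r)(c) == 0) floodFill(img, r, c, from = 0, to = 1) // any remaining 0 is inside a hole
}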

Cannot understand Keras ConvLSTM2D

I'm looking at the example: https://github.com/fchollet/keras/blob/master/examples/conv_lstm.py
This RNN actually predicts the next frame of the movie, so the output should be a movie too (according to the test data fed in). I wonder whether information is lost due to the conv layers with padding.
For example, the underlying TensorFlow pads at the bottom right; with a large padding it looks like this (n stands for actual numbers):
n n n n 0 0 0
n n n n 0 0 0
n n n n 0 0 0
n n n n 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
When we do the second convolution, the bottom right corner will always be 0, which means backpropagation will never be able to capture anything there. Since in this case the movie is a square moving across the whole screen, will information be lost when the validation label is in the bottom right corner?
After asking a Ph.D. who does AI research: the answer is yes.
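A toy illustration of the effect (plain Scala arrays, not Keras; the sizes are made up): with zero padding only at the bottom right, any output whose receptive field lies entirely inside the padded region is identically 0, so no gradient signal can ever flow through it.

// 4x4 of real data ("n"), padded with 3 rows/cols of zeros at the bottom right.
val n = 4; val pad = 3
val img = Array.tabulate(n + pad, n + pad)((r, c) => if (r < n && c < n) 1.0 else 0.0)

// 3x3 averaging kernel as a stand-in for a learned conv filter.
def conv3x3(a: Array[Array[Double]], r: Int, c: Int): Double =
  (for (i <- 0 to 2; j <- 0 to 2) yield a(r + i)(c + j)).sum / 9.0

val out = Array.tabulate(n + pad - 2, n + pad - 2)((r, c) => conv3x3(img, r, c))
println(out.last.last) // 0.0 regardless of the data: the corner only ever sees padding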

Parsing line-based structure (ray tracer) without using too many vars

I want to parse a file in Scala (probably using JavaTokenParsers?), possibly without using too many vars :-)
The file is the input for a ray tracer.
It has a line-based structure.
Three types of lines exist: empty lines, comment lines, and command lines.
A comment line starts with # (possibly with some whitespace before the #).
A command line starts with an identifier, optionally followed by a number of parameters (floats or a filename).
How would I go about this? I would want the parser to be called like this:
val scene = parseAll(sceneFile, file);
Sample file:
#Cornell Box
size 640 480
camera 0 0 1 0 0 -1 0 1 0 45
output scene6.png
maxdepth 5
maxverts 12
#planar face
vertex -1 +1 0
vertex -1 -1 0
vertex +1 -1 0
vertex +1 +1 0
#cube
vertex -1 +1 +1
vertex +1 +1 +1
vertex -1 -1 +1
vertex +1 -1 +1
vertex -1 +1 -1
vertex +1 +1 -1
vertex -1 -1 -1
vertex +1 -1 -1
ambient 0 0 0
specular 0 0 0
shininess 1
emission 0 0 0
diffuse 0 0 0
attenuation 1 0.1 0.05
point 0 0.44 -1.5 0.8 0.8 0.8
directional 0 1 -1 0.2 0.2 0.2
diffuse 0 0 1
#sphere 0 0.8 -1.5 0.1
pushTransform
#red
pushTransform
translate 0 0 -3
rotate 0 1 0 60
scale 10 10 1
diffuse 1 0 0
tri 0 1 2
tri 0 2 3
popTransform
#green
pushTransform
translate 0 0 -3
rotate 0 1 0 -60
scale 10 10 1
diffuse 0 1 0
tri 0 1 2
tri 0 2 3
popTransform
#back
pushTransform
scale 10 10 1
translate 0 0 -2
diffuse 1 1 1
tri 0 1 2
tri 0 2 3
popTransform
#sphere
diffuse 0.7 0.5 0.2
specular 0.2 0.2 0.2
pushTransform
translate 0 -0.7 -1.5
scale 0.1 0.1 0.1
sphere 0 0 0 1
popTransform
#cube
diffuse 0.5 0.7 0.2
specular 0.2 0.2 0.2
pushTransform
translate -0.25 -0.4 -1.8
rotate 0 1 0 15
scale 0.25 0.4 0.2
diffuse 1 1 1
tri 4 6 5
tri 6 7 5
tri 4 5 8
tri 5 9 8
tri 7 9 5
tri 7 11 9
tri 4 8 10
tri 4 10 6
tri 6 10 11
tri 6 11 7
tri 10 8 9
tri 10 9 11
popTransform
popTransform
popTransform
Maybe I've pushed too hard for the one-liner, but here's my take (idiomatic, though it might not be optimal):
First, CommandParams represents a command along with its arguments as a list. If there are no arguments, the args are None:
case class CommandParams(command: String, params: Option[List[String]])
Then here's the file parsing and construction one-liner, along with a line-by-line explanation:
import scala.io.Source

val fileToDataStructure = Source.fromFile("file.txt").getLines() //open file and get a lines iterator
  .map(_.trim)                     //strip leading/trailing whitespace (comments may be indented)
  .filter(_.nonEmpty)              //exclude empty lines
  .filter(!_.startsWith("#"))      //exclude comments
  .foldLeft(List[CommandParams]()) //iterate and store in a list of CommandParams
  { (listCmds: List[CommandParams], line: String) => //the list of objs so far and the current line
    val arr = line.split("\\s+")   //split the line on whitespace
    val command = arr.head         //the first element of the array is the command
    val args = if (arr.tail.isEmpty) None else Option(arr.tail.toList) //the rest are its params
    CommandParams(command, args) :: listCmds //construct the obj and cons it onto the list
  }
  .reverse //cons prepends, so reverse to restore file order
A demo of the output, iterating through the result:
fileToDataStructure.foreach(println)
yields:
CommandParams(size,Some(List(640, 480)))
CommandParams(camera,Some(List(0, 0, 1, 0, 0, -1, 0, 1, 0, 45)))
CommandParams(output,Some(List(scene6.png)))
CommandParams(maxdepth,Some(List(5)))
CommandParams(maxverts,Some(List(12)))
CommandParams(vertex,Some(List(-1, +1, 0)))
...
CommandParams(pushTransform,None)
CommandParams(pushTransform,None)
CommandParams(translate,Some(List(0, 0, -3)))
...
A demo of how to iterate through it to do actual work once loaded:
fileToDataStructure.foreach {
  case CommandParams(cmd, None)       => println(s"I'm a ${cmd} with no args")
  case CommandParams(cmd, Some(args)) => println(s"I'm a ${cmd} with args: ${args.mkString(",")}")
}
yields output:
I'm a size with args: 640,480
I'm a camera with args: 0,0,1,0,0,-1,0,1,0,45
I'm a output with args: scene6.png
I'm a maxdepth with args: 5
I'm a maxverts with args: 12
I'm a vertex with args: -1,+1,0
...
I'm a popTransform with no args
I'm a popTransform with no args
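If you do want the parser-combinator route hinted at in the question, here is a minimal sketch (the rule names are made up; it keeps newlines significant, since commands are line-based, and in recent Scala versions needs the separate scala-parser-combinators module):

import scala.util.parsing.combinator.RegexParsers

object SceneParser extends RegexParsers {
  // Only spaces and tabs are skipped automatically; newlines stay significant.
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[String] = """\r?\n""".r
  def comment: Parser[Unit] = """#[^\n]*""".r ^^^ (())
  def param: Parser[String] = """[^\s#]+""".r
  def command: Parser[(String, List[String])] =
    """[a-zA-Z]\w*""".r ~ rep(param) ^^ { case name ~ ps => (name, ps) }

  // A line carries a command, a comment, or nothing; keep only the commands.
  def line: Parser[Option[(String, List[String])]] =
    (command ^^ (c => Option(c))) | (comment ^^^ None) | success(None)

  def sceneFile: Parser[List[(String, List[String])]] =
    repsep(line, eol) ^^ (_.flatten)
}

// Usage, matching the call style the question asks for:
// val scene = SceneParser.parseAll(SceneParser.sceneFile, scala.io.Source.fromFile("file.txt").mkString)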

Masking in DCT Compression

I am trying to do image compression using the DCT (Discrete Cosine Transform). Can someone please help me understand how masking affects bits per pixel in DCT compression? How is the bit allocation done in the masking?
PS: By masking, I mean multiplying the DCT coefficients by a matrix like the one below (element-wise multiplication, not matrix multiplication).
mask = [1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0]
Background on "Masking"
Compression using the DCT computes the DCT of blocks of an image, in this case 8x8 pixels. High-frequency components of an image are less important to human perception of the image and can thus be discarded to save space.
The mask matrix selects which DCT coefficients are kept and which ones are discarded in order to save space. Coefficients towards the top left corner represent low frequencies.
For more information visit Discrete Cosine Transform.
This looks like a variation of a quantization matrix.
Low frequencies are in the top left, high frequencies in the bottom right. The eye is more sensitive to low frequencies, so removing high-frequency coefficients removes the less important details of the image.
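As a rough worked example of the bits-per-pixel effect (the 8-bits-per-coefficient figure is an assumption; real codecs use quantization plus entropy coding): the mask above keeps 4 + 3 + 2 + 1 = 10 of the 64 coefficients, so if every kept coefficient were stored with 8 bits, the block would cost 10 * 8 / 64 = 1.25 bits per pixel instead of 8. A minimal masking sketch:

// Element-wise masking of an 8x8 block of DCT coefficients (illustrative only).
// r + c <= 3 reproduces exactly the triangular mask shown in the question.
val mask: Array[Array[Int]] = Array.tabulate(8, 8)((r, c) => if (r + c <= 3) 1 else 0)

def applyMask(coeffs: Array[Array[Double]]): Array[Array[Double]] =
  Array.tabulate(8, 8)((r, c) => coeffs(r)(c) * mask(r)(c))

val kept = mask.flatten.count(_ == 1) // 10 coefficients survive
val bpp  = kept * 8.0 / 64            // 1.25 bits per pixel at 8 bits per kept coefficient
println(s"kept=$kept, bpp=$bpp")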
