I'm building a web app to help students learn Maths.
The app needs to display Maths content that comes from LaTeX files.
These LaTeX files render (beautifully) to PDF, which I can convert cleanly to SVG thanks to pdf2svg.
The image (SVG, PNG or whatever image format) looks something like this:
_______________________________________
| |
| 1. Word1 word2 word3 word4 |
| a. Word5 word6 word7 |
| |
| ///////////Graph1/////////// |
| |
| b. Word8 word9 word10 |
| |
| 2. Word11 word12 word13 word14 |
| |
|_______________________________________|
Real example:
The web app is intended to manipulate and add content to this, leading to something like this:
_______________________________________
| |
| 1. Word1 word2 | <-- New line break
|_______________________________________|
| |
| -> NewContent1 |
|_______________________________________|
| |
| word3 word4 |
|_______________________________________|
| |
| -> NewContent2 |
|_______________________________________|
| |
| a. Word5 word6 word7 |
|_______________________________________|
| |
| ///////////Graph1/////////// |
|_______________________________________|
| |
| -> NewContent3 |
|_______________________________________|
| |
| b. Word8 word9 word10 |
|_______________________________________|
| |
| 2. Word11 word12 word13 word14 |
|_______________________________________|
Example:
A single large image cannot give me the flexibility to do this kind of manipulation.
But if the image file were broken down into smaller files, each holding a single word or a single graph, I could do these manipulations.
What I think I need to do is detect whitespace in the image and slice the image into multiple sub-images, looking something like this:
_______________________________________
| | | | |
| 1. Word1 | word2 | word3 | word4 |
|__________|_______|_______|____________|
| | | |
| a. Word5 | word6 | word7 |
|_____________|_______|_________________|
| |
| ///////////Graph1/////////// |
|_______________________________________|
| | | |
| b. Word8 | word9 | word10 |
|_____________|_______|_________________|
| | | | |
| 2. Word11 | word12 | word13 | word14 |
|___________|________|________|_________|
I'm looking for a way to do this.
What do you think is the way to go?
Thank you for your help!
I would use horizontal and vertical projection to first segment the image into lines, and then each line into smaller slices (e.g. words).
Start by converting the image to grayscale, and then invert it, so that gaps contain zeros and any text/graphics are non-zero.
img = cv2.imread('article.png', cv2.IMREAD_COLOR)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray_inverted = 255 - img_gray
Calculate horizontal projection -- mean intensity per row, using cv2.reduce, and flatten it to a linear array.
row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
Now find the row ranges for all the contiguous gaps. You can use the zero_runs function from the answer linked in the full script below.
row_gaps = zero_runs(row_means)
Finally calculate the midpoints of the gaps, that we will use to cut the image up.
row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) // 2  # integer division, so the cutpoints can be used as indices
You end up with something like this situation (gaps are pink, cutpoints red):
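To see what the projection, gap and cutpoint steps compute, here is a dependency-free sketch on a tiny hand-made array. It is only an illustration: the toy image and the pure-Python zero_runs below are assumptions made for the example (the script itself uses OpenCV and NumPy), but the runs are half-open [start, end) ranges and the midpoints use the same formula as above.

```python
# Toy "image": 1 = ink, 0 = background (already inverted, as described above).
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],  # first text line
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],  # second text line
    [0, 0, 0, 0, 0, 0],
]

def zero_runs(values):
    """Return (start, end) pairs, end exclusive, for every run of zeros."""
    runs, start = [], None
    for i, v in enumerate(values):
        if v == 0 and start is None:
            start = i                      # a gap begins
        elif v != 0 and start is not None:
            runs.append((start, i))        # the gap ended just before i
            start = None
    if start is not None:
        runs.append((start, len(values)))  # gap reaching the last row
    return runs

# Horizontal projection: mean intensity per row.
row_means = [sum(row) / len(row) for row in image]
row_gaps = zero_runs(row_means)
# Midpoint of each gap, kept as an integer so it can be used for slicing.
row_cutpoints = [(start + end - 1) // 2 for start, end in row_gaps]

print(row_gaps)       # [(0, 1), (3, 5), (6, 7)]
print(row_cutpoints)  # [0, 3, 6]
```

Consecutive cutpoints then delimit the lines: rows 0 to 3 hold the first text line and rows 3 to 6 the second.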
Next step would be to process each identified line.
bounding_boxes = []
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])):
    line = img[start:end]
    line_gray_inverted = img_gray_inverted[start:end]
Calculate the vertical projection (average intensity per column), find the gaps and cutpoints. Additionally, calculate gap sizes, to allow filtering out the small gaps between individual letters.
    column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
    column_gaps = zero_runs(column_means)
    column_gap_sizes = column_gaps[:,1] - column_gaps[:,0]
    column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) // 2  # integer division, as for the rows
Filter the cutpoints.
    filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]
And create a list of bounding boxes for each segment.
    for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
        # plain Python ints, so the corners can be passed straight to cv2.rectangle
        bounding_boxes.append(((int(xstart), int(start)), (int(xend), int(end))))
Now you end up with something like this (again gaps are pink, cutpoints red):
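The gap-size filter is the part that decides what counts as a word, so here is a small sketch of just that step. The column profile, the threshold of 2 and the row range are made-up values for illustration (the answer uses a threshold of 5 on a real image), and zero_runs is rewritten with itertools.groupby so the example runs without NumPy.

```python
from itertools import groupby

def zero_runs(values):
    """Return (start, end) pairs, end exclusive, for every run of zeros."""
    runs, pos = [], 0
    for is_zero, group in groupby(values, key=lambda v: v == 0):
        length = len(list(group))
        if is_zero:
            runs.append((pos, pos + length))
        pos += length
    return runs

# Column projection of one text line: margin, two letters separated by a
# 1-pixel gap, a 4-pixel gap between words, a second word, margin.
column_means = [0, 0, 0, 5, 3, 0, 4, 0, 0, 0, 0, 2, 6, 0, 0, 0]
MIN_GAP = 2  # hypothetical threshold: narrower gaps are treated as letter spacing

gaps = zero_runs(column_means)
wide_gaps = [(s, e) for s, e in gaps if e - s > MIN_GAP]
cutpoints = [(s + e - 1) // 2 for s, e in wide_gaps]

# Hypothetical row range of this line, as found by the row projection.
start, end = 10, 20
boxes = [((x0, start), (x1, end)) for x0, x1 in zip(cutpoints, cutpoints[1:])]

print(gaps)       # [(0, 3), (5, 6), (7, 11), (13, 16)]
print(cutpoints)  # [1, 8, 14]
print(boxes)      # [((1, 10), (8, 20)), ((8, 10), (14, 20))]
```

The 1-pixel gap at column 5 is dropped, so the first two letters stay in one box. Note that the margins must also be wider than the threshold, otherwise the first or last word of the line loses its outer cutpoint.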
Now you can cut up the image. I'll just visualize the bounding boxes found:
The full script:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
def plot_horizontal_projection(file_name, img, projection):
    fig = plt.figure(1, figsize=(12,16))
    gs = gridspec.GridSpec(1, 2, width_ratios=[3,1])
    ax = plt.subplot(gs[0])
    im = ax.imshow(img, interpolation='nearest', aspect='auto')
    ax.grid(which='major', alpha=0.5)
    ax = plt.subplot(gs[1])
    ax.plot(projection, np.arange(img.shape[0]), 'm')
    ax.grid(which='major', alpha=0.5)
    plt.xlim([0.0, 255.0])
    plt.ylim([-0.5, img.shape[0] - 0.5])
    ax.invert_yaxis()
    fig.suptitle("FOO", fontsize=16)
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97])
    fig.set_dpi(200)
    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi)
    plt.clf()

def plot_vertical_projection(file_name, img, projection):
    fig = plt.figure(2, figsize=(12, 4))
    gs = gridspec.GridSpec(2, 1, height_ratios=[1,5])
    ax = plt.subplot(gs[0])
    im = ax.imshow(img, interpolation='nearest', aspect='auto')
    ax.grid(which='major', alpha=0.5)
    ax = plt.subplot(gs[1])
    ax.plot(np.arange(img.shape[1]), projection, 'm')
    ax.grid(which='major', alpha=0.5)
    plt.xlim([-0.5, img.shape[1] - 0.5])
    plt.ylim([0.0, 255.0])
    fig.suptitle("FOO", fontsize=16)
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97])
    fig.set_dpi(200)
    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi)
    plt.clf()

def visualize_hp(file_name, img, row_means, row_cutpoints):
    row_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    row_highlight[row_means == 0, :, :] = [255,191,191]
    row_highlight[row_cutpoints, :, :] = [255,0,0]
    plot_horizontal_projection(file_name, row_highlight, row_means)

def visualize_vp(file_name, img, column_means, column_cutpoints):
    col_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    col_highlight[:, column_means == 0, :] = [255,191,191]
    col_highlight[:, column_cutpoints, :] = [255,0,0]
    plot_vertical_projection(file_name, col_highlight, column_means)

# From https://stackoverflow.com/a/24892274/3962537
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

img = cv2.imread('article.png', cv2.IMREAD_COLOR)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray_inverted = 255 - img_gray
row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
row_gaps = zero_runs(row_means)
row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) // 2  # integer division, so the cutpoints can be used as indices
visualize_hp("article_hp.png", img, row_means, row_cutpoints)
bounding_boxes = []
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])):
    line = img[start:end]
    line_gray_inverted = img_gray_inverted[start:end]
    column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
    column_gaps = zero_runs(column_means)
    column_gap_sizes = column_gaps[:,1] - column_gaps[:,0]
    column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) // 2  # integer division, as for the rows
    filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]
    for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
        # plain Python ints, so the corners can be passed straight to cv2.rectangle
        bounding_boxes.append(((int(xstart), int(start)), (int(xend), int(end))))
    visualize_vp("article_vp_%02d.png" % n, line, column_means, filtered_cutpoints)

result = img.copy()
for bounding_box in bounding_boxes:
    cv2.rectangle(result, bounding_box[0], bounding_box[1], (255,0,0), 2)
cv2.imwrite("article_boxes.png", result)
The image is top quality: perfectly clean, not skewed, with well-separated characters. A dream!
First perform binarization and blob detection (standard in OpenCV).
Then cluster the characters by grouping those with an overlap in the ordinates (i.e. facing each other in a row). This will naturally isolate the individual lines.
Now in every row, sort the blobs left-to-right and cluster by proximity to isolate the words. This will be a delicate step, because the spacing of characters within a word is close to the spacing between distinct words. Don't expect perfect results. This should work better than a projection.
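As a rough sketch of the two clustering steps, assuming the blob detection has already produced one bounding box per character: the boxes, the (x, y, width, height) layout, the helper names and the max_gap value are all made up for illustration, and a real page would need a threshold tuned to the font size.

```python
# Each blob is a bounding box (x, y, width, height); the values are invented.
blobs = [
    (10, 10, 8, 12), (20, 12, 8, 10), (45, 10, 8, 12),  # first line, two words
    (10, 40, 8, 12), (19, 41, 8, 11),                   # second line, one word
]

def group_into_lines(boxes):
    """Group boxes whose vertical extents overlap, then sort each line by x."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):
        x, y, w, h = box
        for line in lines:
            if y < line["bottom"] and y + h > line["top"]:
                line["boxes"].append(box)
                line["top"] = min(line["top"], y)
                line["bottom"] = max(line["bottom"], y + h)
                break
        else:
            # No existing line overlaps this box vertically: start a new one.
            lines.append({"top": y, "bottom": y + h, "boxes": [box]})
    return [sorted(line["boxes"]) for line in lines]

def group_into_words(line, max_gap):
    """Split a left-to-right sorted line wherever the horizontal gap exceeds max_gap."""
    words = [[line[0]]]
    for prev, box in zip(line, line[1:]):
        gap = box[0] - (prev[0] + prev[2])
        if gap > max_gap:
            words.append([box])
        else:
            words[-1].append(box)
    return words

lines = group_into_lines(blobs)
words = [group_into_words(line, max_gap=5) for line in lines]

print(len(lines))               # 2
print([len(w) for w in words])  # [2, 1]
```

This simple version assumes blobs of one line never overlap vertically with the next line; tall symbols such as integrals or fractions would need extra care.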
The situation is worse with italics, as the horizontal spacing is even narrower. You may also have to look at the "slanted distance", i.e. find the lines tangent to the characters in the direction of the italics. This can be achieved by applying a reverse shear transform.
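To illustrate the reverse shear idea, here is a toy sketch with two slanted strokes given as (x, height above baseline) points. The slant value and the stroke coordinates are assumptions chosen for the example; on a real image the slant would have to be estimated first.

```python
SLANT = 0.5  # assumed horizontal shift per unit of height

# Two parallel slanted strokes, like the stems of two italic letters.
stroke_a = [(0 + SLANT * h, h) for h in range(9)]
stroke_b = [(3 + SLANT * h, h) for h in range(9)]

def x_extent(points):
    xs = [x for x, _ in points]
    return min(xs), max(xs)

def unshear(points, slant):
    """Reverse shear: shift every point back by slant * height."""
    return [(x - slant * h, h) for x, h in points]

def horizontal_gap(left, right):
    """Distance between the right edge of `left` and the left edge of `right`."""
    return x_extent(right)[0] - x_extent(left)[1]

before = horizontal_gap(stroke_a, stroke_b)
after = horizontal_gap(unshear(stroke_a, SLANT), unshear(stroke_b, SLANT))

print(before)  # -1.0: the strokes overlap in a plain vertical projection
print(after)   # 3.0: after the reverse shear a clean gap appears
```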
Thanks to the grid, the graphs will appear as big blobs.
I need to transform a view from origin (250, 250) to origin (352, 315), and change its width/height from (100.0, 100.0) to (68, 68).
I know I can combine several CGAffineTransform functions, such as scale, rotate and translate.
But I don't know how to work out the order of those transformations, or their exact parameters.
I have tried several times, but I can't move the view to the correct position.
Can anyone help?
A little understanding of what is happening behind the scenes is always nice with these matrix transformations.
Apple's docs have great documentation about transforms, so let's use it.
A translation matrix looks like :
| 1 0 0 |
| 0 1 0 |
| tx ty 1 |
where (tx, ty) is your translation vector.
A scaling matrix looks like :
| sx 0 0 |
| 0 sy 0 |
| 0 0 1 |
where sx and sy are the scale factors along the X and Y axes.
You want to concatenate these matrices using CGAffineTransformConcat, but according to its doc:
Note that matrix operations are not commutative—the order in which you
concatenate matrices is important. That is, the result of multiplying
matrix t1 by matrix t2 does not necessarily equal the result of
multiplying matrix t2 by matrix t1.
CGAffineTransformConcat(t1, t2) returns t1 * t2. With the row-vector layout shown above (a point is transformed as [x y 1] * M), that applies t1 first and t2 second. To keep your translation vector as it is, you have to scale first and translate second; if you translate first, the translation vector is scaled by the sx and sy coefficients too.
Let's show it:
let scaleMatrix = CGAffineTransformMakeScale(0.68, 0.68)
let translateMatrix = CGAffineTransformMakeTranslation(102, 65)
let scaleThenTranslateMatrix = CGAffineTransformConcat(scaleMatrix, translateMatrix)
NSLog("scaleThenTranslateMatrix : \(scaleThenTranslateMatrix)")
// outputs : CGAffineTransform(a: 0.68, b: 0.0, c: 0.0, d: 0.68, tx: 102.0, ty: 65.0)
// the translation is the same
let translateThenScaleMatrix = CGAffineTransformConcat(translateMatrix, scaleMatrix)
NSLog("translateThenScaleMatrix : \(translateThenScaleMatrix)")
// outputs : CGAffineTransform(a: 0.68, b: 0.0, c: 0.0, d: 0.68, tx: 69.36, ty: 44.2)
// the translation has been scaled too
And let's prove it mathematically. Please note that with this convention, when you perform an operation A then an operation B, the related matrix is computed as matA*matB: the first operation is on the left. Since matrix multiplication is not commutative, this is important.
// Scaling then translation :
| sx 0 0 | | 1 0 0 | | sx 0 0 |
| 0 sy 0 | . | 0 1 0 | = | 0 sy 0 |
| 0 0 1 | | tx ty 1 | | tx ty 1 |
// The resulting matrix has the same value for translation
// Translation then scaling :
| 1 0 0 | | sx 0 0 | | sx 0 0 |
| 0 1 0 | . | 0 sy 0 | = | 0 sy 0 |
| tx ty 1 | | 0 0 1 | | sx.tx sy.ty 1 |
// The translation values are affected by scaling coefficient
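The two matrix products above can also be checked numerically. This is a plain Python sketch, not Core Graphics code: the helper names are made up, and the matrices are 3x3 lists in the same layout as above (translation in the bottom row), so a point is mapped as the row vector [x y 1] times the matrix.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def translate(tx, ty):
    return [[1, 0, 0], [0, 1, 0], [tx, ty, 1]]

def apply(point, m):
    """Map (x, y) as the row vector [x y 1] times m."""
    x, y = point
    return (x * m[0][0] + y * m[1][0] + m[2][0],
            x * m[0][1] + y * m[1][1] + m[2][1])

S = scale(0.68, 0.68)
T = translate(102, 65)

s_times_t = matmul(S, T)  # first product in the proof
t_times_s = matmul(T, S)  # second product in the proof

# The bottom row holds the translation part of each product.
print(s_times_t[2][:2])  # [102.0, 65.0]
print(t_times_s[2][:2])  # approximately [69.36, 44.2]

# Where the origin (0, 0) ends up under each combined transform.
print(apply((0, 0), s_times_t))
print(apply((0, 0), t_times_s))
```

The values match the tx and ty fields printed by the two NSLog calls, up to floating-point rounding.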
struct CGAffineTransform {
CGFloat a, b, c, d;
CGFloat tx, ty;
};
You can read the parameters from this struct. Also, pay attention to this: assigning a transform always overrides the previous one; in other words, transforms do not accumulate.
I am not a specialist in C/C++.
I found this declaration today:
typedef NS_OPTIONS(NSUInteger, PKRevealControllerType)
{
PKRevealControllerTypeNone = 0,
PKRevealControllerTypeLeft = 1 << 0,
PKRevealControllerTypeRight = 1 << 1,
PKRevealControllerTypeBoth = (PKRevealControllerTypeLeft | PKRevealControllerTypeRight)
};
Can you tell me what value each of these constants will have?
The << operator is the bitwise left shift operator. It shifts all the bits to the left a specified number of times, filling the vacated low bits with zeros:
m << n
shifts all the bits of m to the left n times (notice that one shift == multiply by two).
1 << 0 means no shift, so it equals 1.
1 << 1 means one shift, so it equals 1*2 = 2.
I'll explain with one byte. The value one in one byte looks like this:
MSB
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 / 0
| / 1 << 1
| |
▼ ▼
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 0
Whereas 1 << 0 does nothing, so the result is the same as the first figure. (Note that a left shift fills the vacated low bits with 0; it does not copy the sign bit.)
OR operator: does a bitwise OR
MSB PKRevealControllerTypeLeft
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | == 1
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 0
| | | | | | | | OR
MSB PKRevealControllerTypeRight
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | == 2
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 0
=
MSB PKRevealControllerTypeBoth
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | == 3
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 0
| is the bitwise OR operator. In the code below it ORs the two flags: 1 | 2 == 3
PKRevealControllerTypeNone = 0, // is Zero
PKRevealControllerTypeLeft = 1 << 0, // one
PKRevealControllerTypeRight = 1 << 1, // two
PKRevealControllerTypeBoth = (PKRevealControllerTypeLeft |
PKRevealControllerTypeRight) // three
There is no deeper technical reason to initialize the values like this; defining them like that makes things line up nicely. Read this answer: define SOMETHING (1 << 0)
The compiler evaluates these constant expressions at compile time, so they end up in a simpler form like this (the third one too, since it is also a constant expression):
PKRevealControllerTypeNone = 0, // is Zero
PKRevealControllerTypeLeft = 1, // one
PKRevealControllerTypeRight = 2, // two
PKRevealControllerTypeBoth = 3, // Three
Edit: #thanks to Till.
Read this answer, App States with BOOL flags, which shows the usefulness of declarations like the one you have, using bitwise operators.
It's an enum of bit flags:
PKRevealControllerTypeNone = 0, // no flags set
PKRevealControllerTypeLeft = 1 << 0, // bit 0 set
PKRevealControllerTypeRight = 1 << 1, // bit 1 set
And then
PKRevealControllerTypeBoth =
(PKRevealControllerTypeLeft | PKRevealControllerTypeRight)
is just the result of bitwise OR-ing the other two flags. So, bit 0 and bit 1 set.
The << operator is the left shift operator. And the | operator is bitwise OR.
In summary the resulting values are:
PKRevealControllerTypeNone = 0
PKRevealControllerTypeLeft = 1
PKRevealControllerTypeRight = 2
PKRevealControllerTypeBoth = 3
But it makes a lot more sense to think about it in terms of flags of bits. Or as a set where the universal set is: { PKRevealControllerTypeLeft, PKRevealControllerTypeRight }
To learn more you need to read up about enums, shift operators and bitwise operators.
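For readers who want to experiment with the same idea outside Objective-C, here is a sketch using Python's enum.IntFlag. The class and member names are just a transliteration made up for this example; it is not the PKRevealController API.

```python
from enum import IntFlag

class RevealType(IntFlag):
    NONE = 0              # no flags set
    LEFT = 1 << 0         # bit 0 set
    RIGHT = 1 << 1        # bit 1 set
    BOTH = LEFT | RIGHT   # bits 0 and 1 set

print(RevealType.NONE.value)   # 0
print(RevealType.LEFT.value)   # 1
print(RevealType.RIGHT.value)  # 2
print(RevealType.BOTH.value)   # 3

# Thinking of it as a set: BOTH contains LEFT and RIGHT.
print(RevealType.LEFT in RevealType.BOTH)                       # True
print((RevealType.LEFT | RevealType.RIGHT) == RevealType.BOTH)  # True
```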
This looks like Objective-C and not C++, but regardless:
1 << 0
is just one bitshifted left (up) by 0 positions. Any integer "<<0" is just itself.
So
1 << 0 = 1
Similarly
1 << 1
is just one bitshifted left by 1 position. Which you could visualize a number of ways but the easiest is to multiply by 2.[Note 1]
So
x << 1 == x*2
or
1 << 1 == 2
Lastly the single pipe operator is a bitwise or.
So
1 | 2 = 3
tl;dr:
PKRevealControllerTypeNone = 0
PKRevealControllerTypeLeft = 1
PKRevealControllerTypeRight = 2
PKRevealControllerTypeBoth = 3
[1] There are some limitations on this generalization, for example when x is equal to or greater than 1/2 the largest value capable of being stored by the datatype.
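The limitation in the note is easy to see with a fixed-width value. Python integers never overflow, so this sketch simulates an unsigned 8-bit type by masking with 0xFF; the helper name shl8 is made up for the example.

```python
def shl8(x, n):
    """Left shift within an unsigned 8-bit value; bits pushed past bit 7 are lost."""
    return (x << n) & 0xFF

print(shl8(1, 0))    # 1
print(shl8(1, 1))    # 2
print(shl8(100, 1))  # 200, still the same as 100 * 2
print(shl8(128, 1))  # 0, the only set bit was shifted out
print(shl8(200, 1))  # 144, no longer equal to 200 * 2
```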
This all comes down to bitwise arithmetic.
PKRevealControllerTypeNone has a value of 0 (binary 0000)
PKRevealControllerTypeLeft has a value of 1 (binary 0001)
PKRevealControllerTypeRight has a value of 2 (binary 0010) since 0001 shifted left 1 bit is 0010
PKRevealControllerTypeBoth has a value of 3 (binary 0011) since 0010 | 0001 (or works like addition) = 0011
In context, this is most likely used to test a value. The & operator (bitwise AND) works similarly to multiplication: a bit ANDed with 1 is preserved, and a bit ANDed with 0 is cleared.
Thus, if you want to check whether a particular controller is of type Left and it has a value of 0010 (i.e. type Right), 0010 & 0001 = 0, which is false as we expect (thus, you have determined it is not of the correct type). However, if the controller is Both, 0011 & 0001 = 1, so the result is true, which is correct since Both includes Left.
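The same checks written out as a small sketch, with plain integers standing in for the enum values (the has_flag helper is a name invented for this example):

```python
LEFT = 0b0001
RIGHT = 0b0010
BOTH = LEFT | RIGHT  # 0b0011

def has_flag(value, flag):
    """True if every bit of `flag` is also set in `value`."""
    return (value & flag) == flag

print(RIGHT & LEFT)           # 0
print(BOTH & LEFT)            # 1
print(has_flag(RIGHT, LEFT))  # False
print(has_flag(BOTH, LEFT))   # True
print(has_flag(BOTH, RIGHT))  # True
```

Comparing against the flag itself, rather than just testing for non-zero, also gives the right answer when the flag being tested has more than one bit set, such as BOTH.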
For example, I have a grid.
//grid for answers_for_online
var answersGridForOnline5 = new Ext.grid.GridPanel({
id : 'grid_for_stats',
store : storez3,
columns : answers_columns5,
});
my column:
var answers_columns5 = [{
id: "idz",
header: 'idz',
dataIndex: "idz",
renderer: fun_f
}];
and renderer function
function fun(n, j, k, m, h, i) {
var count = store.snapshot ? store.snapshot.length : store.getCount()
var cez = k.get("scale")
var ce = ( 2 / count ) * 100
return ce + " % "
}
Question: In the database I have, for example, these scales (the values users gave on a scale question):
id | scale
1 | 4
2 | 4
3 | 1
4 | 2
How can I sum the scales (grouped, of course) and put the result in my grid?
For example, in my grid I should get:
scale | scale %
1 | 25%
2 | 25%
4 | 50%
I advise you not to attempt this inside the Grid/Store. Instead, process the data before loading it into the store; for example, do it in the database with a GROUP BY statement.
To get the sum of values in a store, you can use Store.sum()
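A minimal sketch of the GROUP BY approach, using an in-memory SQLite database so it can be run as is. The table name answers and its columns are assumptions based on the sample data in the question; the real schema and database engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (id INTEGER PRIMARY KEY, scale INTEGER)")
conn.executemany(
    "INSERT INTO answers (id, scale) VALUES (?, ?)",
    [(1, 4), (2, 4), (3, 1), (4, 2)],
)

# One row per distinct scale, with its share of all answers in percent.
rows = conn.execute(
    """
    SELECT scale,
           COUNT(*) * 100.0 / (SELECT COUNT(*) FROM answers) AS percent
    FROM answers
    GROUP BY scale
    ORDER BY scale
    """
).fetchall()

for scale, percent in rows:
    print(scale, f"{percent:.0f}%")
# 1 25%
# 2 25%
# 4 50%
```

The server can then return these rows directly, so the store only has to display them and the renderer no longer needs to count anything.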