How to get the accurate font size(height) in pdf - parsing

I have a sample PDF (attached) that includes a text object and a rectangle object of almost the same height. I checked the content of the PDF using iText RUPS, as below:
1 1 1 RG
1 1 1 rg
0.12 0 0 0.12 16 50 cm
q
0 0 m
2926 0 l
2926 5759 l
0 5759 l
0 0 l
W
n
Q
1 1 1 RG
1 1 1 rg
q
0 0 m
2926 0 l
2926 5759 l
0 5759 l
0 0 l
W
n
/F1 205.252 Tf
BT
0 0 0 RG
0 0 0 rg
/DeviceGray CS
/OC /oc1 BDC
0 -1 1 0 1648 5330 Tm
0 Tc
100 Tz
(Hello World) Tj
ET
Q
q
0 0 m
2926 0 l
2926 5759 l
0 5759 l
0 0 l
W
n
0 0 0 RG
0 0 0 rg
/DeviceGray CS
6 w
1 j
1 J
1649 5324 m
1649 4277 l
1800 4277 l
1800 5324 l
1649 5324 l
S
EMC
Q
Obviously the user space matrix is determined by [0.12 0 0 0.12 16 50], and the height for the rectangle is (1800-1649)*0.12*1=18.12, and for the font size I use 205.252*0.12=24.63024. Since the two values are not close, my problem is how to get the height/size of the font?
sample.pdf

OK - I took a look at your file and you're basically hosed. That's the scientific answer, now let me clarify :)
Bad PDF!
The PDF you have up there as a sample contains a font that is not embedded. That "/F1 Tf" command you have there points to the font "ArialMT" in the resources dict for that page. Because the font has not been embedded, you only have two options:
Try to find the actual font on the system and extract the necessary information from there.
Live with the information in the PDF. Let's start with that.
Font Descriptor
Here is an image from pdfToolbox examining the font in the PDF file (caution: I'm associated with this tool):
I've cut off some of the "Widths" table, but other than that this is all of the information you have in the PDF document for this font. And this means you can access the widths for each glyph, but you don't have access to the heights of each glyph. The only information you have regarding heights is the font bounding box which is the union of all glyph bounding boxes. In other words, the font bounding box is guaranteed to be big enough to contain any glyph from the font (both horizontally and vertically).
System Information
You don't say why you need this information, so it is a little harder to advise further. But if you can't get the information from the PDF, your only options are to live with the inaccurate information from the PDF or to turn to the system your code is running on to get more.
If you have the ArialMT font installed, you could basically try to find the font file and then parse the TrueType font file to find the bounding boxes for each glyph. I've done that, it's not funny.
Or you can see if your system can't provide you with the information in a better way. Many operating systems / languages have text calls that can get accurate measurements for you. If not, you can brute force it by rendering the text you want in black on a white image and then examining the pixels to see where you hit and thus how big the largest glyph in your text string was.
Wasteful though that last option sounds, it's probably the quickest and easiest to implement and it - depending on your needs - may actually be the best option all around.
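As a rough sketch of the pixel-examination step only: the snippet below assumes the text has already been rendered (by whatever means your system offers) into a grayscale bitmap, represented here as a list of rows of pixel values. The function name ink_bounds and the tiny sample bitmap are made up for illustration.

```python
def ink_bounds(pixels, white=255):
    """Return (top, bottom, left, right) of all non-white pixels, or None."""
    # Rows that contain at least one non-white pixel.
    rows = [y for y, row in enumerate(pixels) if any(p != white for p in row)]
    if not rows:
        return None
    # Columns that contain at least one non-white pixel.
    width = len(pixels[0])
    cols = [x for x in range(width) if any(row[x] != white for row in pixels)]
    return rows[0], rows[-1], cols[0], cols[-1]


# Hypothetical 5x6 bitmap: 255 is white background, 0 is black "ink".
bitmap = [
    [255, 255, 255, 255, 255],
    [255,   0, 255,   0, 255],
    [255,   0,   0,   0, 255],
    [255,   0, 255,   0, 255],
    [255, 255, 255, 255, 255],
    [255, 255, 255, 255, 255],
]

top, bottom, left, right = ink_bounds(bitmap)
glyph_height_px = bottom - top + 1
print(glyph_height_px)
```

The height in pixels then has to be scaled back to PDF units using the resolution at which the text was rendered.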

I have a sample pdf (attached), and it includes a text object and a rectangle object that have almost the same height.
Indeed, your PDF is displayed like this:
But looking at this one quickly realizes that the glyphs in your text "Hello World" do not extend beneath the base line like a 'g', 'j' or some other glyphs would:
(The base line is the line through the glyph origins)
Since the two values are not close, my problem is how to get the height/size of the font
Obviously the space required for such descenders beneath the base line must also be part of the font size.
Thus, it is completely correct and not a problem that the height of the box (18.12) is considerably smaller than the font size (24.63024).
BTW, this corresponds with the specification, which describes a font size of 1 as arranged so that the nominal height of tightly spaced lines of text is 1 unit, cf. section 9.2.2 "Basics of Showing Text" of ISO 32000-1. Tightly spaced lines obviously need to include not only glyph parts above the base line but also those below. The font size furthermore includes a small gap between such lines, as even tightly spaced lines are not expected to touch each other.
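The numbers from the question can be reproduced with a small sketch. It assumes the usual PDF convention of row vectors and six-element matrices [a b c d e f], multiplies the text matrix by the current transformation matrix, and takes the length of the transformed vertical unit vector as the scale applied to the font size. The helper names matmul and vertical_scale are made up for this example.

```python
import math


def matmul(m, n):
    """Multiply two PDF matrices given as [a, b, c, d, e, f] (m first, then n)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def vertical_scale(m):
    """Length of the image of the unit vector (0, 1) under matrix m."""
    return math.hypot(m[2], m[3])


ctm = [0.12, 0, 0, 0.12, 16, 50]   # from "0.12 0 0 0.12 16 50 cm"
tm = [0, -1, 1, 0, 1648, 5330]     # from "0 -1 1 0 1648 5330 Tm"
font_size = 205.252                # from "/F1 205.252 Tf"

effective_font_size = font_size * vertical_scale(matmul(tm, ctm))
box_height = (1800 - 1649) * 0.12  # rectangle extent across the text direction

print(effective_font_size)               # about 24.63
print(box_height)                        # about 18.12
print(box_height / effective_font_size)  # about 0.74
```

So the box covers roughly three quarters of the font size, which is plausible for text without descenders.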

Remove unnecessary margin in epslatex/gnuplot

I would like to remove unnecessary margins (gray part in the figure below) in a PDF figure generated by epslatex (gnuplot).
Following are scripts and commands to create the figure.
set term epslatex standalone
set output "figure.tex"
set xlabel "\\LARGE $x$"
set ylabel "\\LARGE $y$"
set format x "\\Large{%.1f}"
set format y "\\Large{%.1f}"
set key left top Left
set size square
set xrange [0.0:1.0]
set yrange [0.0:1.0]
plot x with lines dt 1 lw 5.0 lc rgb "red" title "\\Large $y = x$",\
x*x with lines dt 2 lw 5.0 lc rgb "green" title "\\Large $y = x^2$",\
x*x*x with lines dt 3 lw 5.0 lc rgb "blue" title "\\Large $y = x^3$"
and commands
$ gnuplot sample.gp
$ pdflatex figure.tex
Instead of pdflatex, xelatex would also work. I would like to convert directly to a PDF file.
It would be great if these margins could be removed without too much effort (i.e. without removing the margin manually, one figure at a time).
Thanks!
If you check help latex, it will tell you that default size is 5 x 3 inches.
Since you set size square, there will be for sure an "unwanted" left and right margin.
What you can do to at least minimize the margins is to set the terminal size to square size as well, e.g. to 3 x 3 inches.
However, keep in mind the graph size is square but x and y axes have tics and labels which require space depending on numbers and labels which might be different for x and y.
set term epslatex standalone size 3 in, 3 in
From help latex:
Syntax:
set terminal {latex | emtex} {default | {courier|roman} {<fontsize>}}
{size <XX>{unit}, <YY>{unit}} {rotate | norotate}
By default the plot will inherit font settings from the embedding
document. You have the option of forcing either Courier (cmtt) or
Roman (cmr) fonts instead. In this case you may also specify a
fontsize. Unless your driver is capable of building fonts at any size
(e.g. dvips), stick to the standard 10, 11 and 12 point sizes.
...
Maybe, there are LaTeX commands to crop the graph to its bounding box.
Thanks to @AlainMarigot's help, I switched to the lua tikz terminal with the option tightboundingbox.
It looks good, but is not exactly the same as epslatex.

mapping highlights/annotations to text in pdf

So I have this sample PDF file with three words on separate lines:
"
hello
there
world
"
I have highlighted the word "there" on the second line. Internally, within the PDF, I'm trying to map the highlight/annotation structure to the text (BT) area.
The section corresponding to the word "there" looks like so:
BT
/F0 14.6599998 Tf
1 0 0 -1 0 130 Tm
96 0 Td <0057> Tj
4.0719757 0 Td <004B> Tj
8.1511078 0 Td <0048> Tj
8.1511078 0 Td <0055> Tj
4.8806458 0 Td <0048> Tj
ET
I also have an annotation section where I have my highlight which has the following rect dimensions:
18 0 19 15 20 694 21 786 22 853 23 1058 24 1331 [19 0 R 20 0 R]<</AP<</N 10 0 R>>
...
(I left the top part of the annotation out on purpose because it is long. I extracted what I thought were the most important parts.)
Rect[68.0024 690.459 101.054 706.37]
I'm kind of confused about how my text is mapped to this one highlight that I have. The coordinates do not seem to match (130 y vs 690 y)? Am I looking in the right place and interpreting my text and/or highlight annotation coordinates correctly?
Update:
I want to add more info on how I created this test PDF.
It's pretty simple to recreate the PDF. I went to Google Docs and created an empty document. On three lines I wrote my text as described above. I downloaded that as a PDF and then opened it in Adobe Acrobat Reader DC (the newest one, I think). I then used Adobe Acrobat Reader to highlight the specified line and re-save it. After that I used some Python to unzip the PDF sections.
The Python code to decompress the PDF sections:
import re
import zlib

# Read the whole PDF as bytes.
with open("helloworld.pdf", "rb") as f:
    pdf = f.read()

# Capture the body of every stream whose dictionary mentions FlateDecode.
stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)

for s in stream.findall(pdf):
    s = s.strip(b'\r\n')
    try:
        print(zlib.decompress(s).decode("latin-1"))
        print("")
    except zlib.error:
        # Not a plain zlib stream; skip it.
        pass
Unfortunately the OP only explained how he created his document and did not share the document itself. I followed his instructions but the coordinates of the annotation differ. As I only have this document for explanation, though, the OP will have to mentally adapt the following to the precise numbers in his document.
The starting coordinate system
The starting (default) user coordinate system in the document is implied by the crop box. In the document at hand the crop box is defined as
/CropBox [0 0 596 843]
i.e. the visible page is 596 units wide and 843 units high (given the default user unit of 1/72" this is an A4 format) and the origin is in the lower left corner. x coordinates increase to the right, y coordinates increase upwards. Thus, this is the same kind of coordinate system one usually starts with in math.
The annotation rectangle
This also is the coordinate system of the annotation rectangle coordinates.
In the case at hand they are
/Rect [68.0595 741.373 101.138 757.298]
i.e. the rectangle with the lower left corner at (68.0595, 741.373) and the upper right at (101.138, 757.298).
Transformations of the coordinate system
In the page content stream up to the text object already identified by the OP the coordinate system gets transformed a number of times.
Mirroring, translation
In the very first line of the page content
1 0 0 -1 0 843 cm
This transformation moves the origin up by 843 units and mirrors (multiplies by -1) the y coordinate.
Thus, we now have a coordinate system with the origin in the upper left and y coordinates increasing downwards.
Scaling
A bit later in the content stream the coordinate system is scaled
.75062972 0 0 .75062972 0 0 cm
Thus, the coordinate units are compressed to about 3/4 of their original width and height, i.e. each unit along the x or y is only 1/96" wide/high.
The text "there"
Only after these transformations have been applied to the coordinate system, the text object identified by the OP is drawn. It starts by setting and changing the text matrix:
1 0 0 -1 0 130 Tm
This sets the text matrix to translate by 130 units in y direction and mirroring y coordinates once again. (Mirroring back again is necessary as otherwise the text would be drawn upside down.)
96 0 Td
This changes the text matrix by moving 96 units along the x axis.
And the starting point where the text is drawn is at the origin of the coordinate system first changed by the mirroring and translation, and then by scaling of the current transformation matrix, and then by mirroring and translation according to the text matrix.
Does it match?
Which coordinate would this point be in the default user coordinate system?
x = (0 + 96) * .75062972 = 72 (approximately)
y = (((0 * (-1)) + 130) * .75062972) * (-1) + 843 = 745.4 (approximately)
This matches with the annotation rectangle (see above) with x coordinates between 68.0595 and 101.138 and y coordinates between 741.373 and 757.298.
So
I'm kind of confused about how my text is mapped to this one highlight that I have. The coordinates do not seem to match (130 y vs 690 y)? Am I looking in the right place and interpreting my text and/or highlight annotation coordinates correctly?
The coordinates do match, you merely have to make sure you apply the transformations of the current transformation matrix and the text matrix.
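The same calculation can be sketched in code. This assumes the standard PDF conventions (row vectors, six-element matrices [a b c d e f], each cm operand premultiplied onto the current transformation matrix, Td premultiplied onto the text matrix) and uses the numbers from the document examined in this answer. The helper names matmul and apply are made up for this example.

```python
def matmul(m, n):
    """Multiply two PDF matrices given as [a, b, c, d, e, f] (m first, then n)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def apply(m, x, y):
    """Transform the point (x, y) with matrix m."""
    a, b, c, d, e, f = m
    return a * x + c * y + e, b * x + d * y + f


identity = [1, 0, 0, 1, 0, 0]

# Current transformation matrix: each "cm" is premultiplied.
ctm = identity
ctm = matmul([1, 0, 0, -1, 0, 843], ctm)                   # 1 0 0 -1 0 843 cm
ctm = matmul([0.75062972, 0, 0, 0.75062972, 0, 0], ctm)    # scaling cm

# Text matrix: "Tm" sets it, "Td" premultiplies a translation.
tm = [1, 0, 0, -1, 0, 130]                                 # 1 0 0 -1 0 130 Tm
tm = matmul([1, 0, 0, 1, 96, 0], tm)                       # 96 0 Td

# The text origin (0, 0) in default user space.
x, y = apply(matmul(tm, ctm), 0, 0)
print(round(x, 2), round(y, 2))

# Annotation rectangle from the document examined in this answer.
rect = (68.0595, 741.373, 101.138, 757.298)
print(rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3])
```

The glyph origin of the first letter lands inside the annotation rectangle, near its lower left corner.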

JPEG2000 : Can number of tiles in X direction be zero?

According to JPEG2000 specs, Number of tiles in X and Y directions is calculated by following formula:
numXtiles =  (Xsiz − XTOsiz)/ XTsiz
&
numYtiles =  (Ysiz − YTOsiz)/ YTsiz
But the range of numXtiles and numYtiles is not mentioned.
Can we have numXtiles=0 while numYtiles=250 (or any other value) ?
In short, no. You will always need at least one row and one column of tiles to place your image in the canvas.
In particular, the SIZ marker of the JPEG 2000 stream syntax does not directly define the number of tiles, but rather the size of each tile. Since the tile width and height are defined to be larger than 0 (see page 453 of "JPEG 2000 Image compression fundamentals, standards and practice", by David Taubman and Michael Marcellin), you will always have at least one tile.
That said, depending on the particular implementation that you are using, there may be a parameter numXtiles that you can set to 0 without crashing your program. In that case, the parameter is most likely being ignored or interpreted differently.
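A small sketch of the tile count calculation, assuming the ceiling form of the formula and assuming (as my reading of the SIZ constraints, not quoted from the source above) that the tile size is positive and the image extends beyond the tile offset. The function name num_tiles is made up for this example.

```python
def num_tiles(siz, tile_offset, tile_size):
    """Number of tiles along one axis: ceil((siz - tile_offset) / tile_size)."""
    if tile_size <= 0:
        raise ValueError("tile size must be larger than 0")
    if siz <= tile_offset:
        raise ValueError("image must extend beyond the tile offset")
    # Integer ceiling division without floating point.
    return -(-(siz - tile_offset) // tile_size)


print(num_tiles(1920, 0, 512))  # tiles across a 1920-wide canvas
print(num_tiles(100, 0, 512))   # a tile larger than the image still yields one tile
```

Under these assumptions the numerator is at least 1, so the ceiling can never be 0.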

How to concatenate a datafile with four columns

My data file has four columns because opencv saves elements in an n x 4 form in xml when exporting a cv::Mat to xml.
How can I concatenate these four columns into one?
reset
set terminal postscript eps enhanced
set output 'imp.eps'
set pm3d
set palette model HSV defined ( 0 0 1 1, 1 1 1 1 )
set style data histogram
set colorbox user
set origin 0.0,0.10
set size ratio 0 1,0.8
set colorbox horizontal origin graph 0.0, -0.15, 0 size graph 1, 0.05, 0 noborder
set xrange [0:4096]
plot 'var_imp_first_run.data' using 1
set output
I have 4094 elements, and this example creates a histogram in which only the first 1024 are plotted. I need to append column 2 at x=1024:2048. Please ignore the colorbox stuff; I'm just playing around to learn gnuplot.
I found that the following solves my problem above.
set xrange [0:4096]
plot newhistogram at 0, 'var_imp_first_run.data' every:: 7 using 1,\
newhistogram at 1024,'var_imp_first_run.data' every:: 7 using 2,\
newhistogram at 2048,'var_imp_first_run.data' every:: 7 using 3,\
newhistogram at 3072,'var_imp_first_run.data' every:: 7 using 4;
Then I found out that it was not what I wanted, because the data are arranged as entries,
x1,x2,x3,x4
x5,x6,x7,x8
and so on.
So what I really need is a histogram that plots the rows before columns. Is it possible?
If you want it to read
x1
x2
x3
instead of
x1,x2,x3
Then (assuming you are on *nix) use
plot '< cat var_imp_first_run.data | tr "," "\n"' using 1
The leading < tells gnuplot to run the rest of the string as a shell command and to read the command's output as the data:
cat var_imp_first_run.data | tr "," "\n"
Here, tr is just translating commas into newlines.
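Where tr is not available, the same flattening can be sketched in Python and the result written to a new file for gnuplot to read. This assumes the values are separated by commas as in the example rows above; the function name flatten_rows and the sample text are made up for illustration.

```python
def flatten_rows(text, sep=","):
    """Return all fields of a multi-column text in row-major order."""
    values = []
    for line in text.splitlines():
        for field in line.split(sep):
            field = field.strip()
            if field:  # skip empty fields, e.g. from a trailing separator
                values.append(field)
    return values


# Hypothetical sample in the x1,x2,x3,x4 layout described in the question.
sample = "1,2,3,4\n5,6,7,8\n"
column = "\n".join(flatten_rows(sample))
print(column)
```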

How to remove or edit Exif from mp4 video?

I recorded a Full HD video with Samsung Galaxy II, when I uploaded it to YouTube I found that it turned to 90 degrees like Portrait layout 1080x1920 NOT 1920x1080.
I found the cause of the problem:
YouTube reads the video metadata and rotates the video according to the Exif
orientation before encoding
This is ExifTool report (please see last tag "Rotation"):
ExifTool Version Number : 8.61
File Name : video.mp4
Directory : .
File Size : 217 MB
File Modification Date/Time : 2011:08:11 00:47:23+04:00
File Permissions : rw-rw-rw-
File Type : 3GP
MIME Type : video/3gpp
Major Brand : 3GPP Media (.3GP) Release 4
Minor Version : 0.3.0
Compatible Brands : 3gp4, 3gp6
Movie Data Size : 227471371
Movie Header Version : 0
Create Date : 1900:01:00 00:00:00
Modify Date : 1900:01:00 00:00:00
Time Scale : 1000
Duration : 0:01:46
Preferred Rate : 1
Preferred Volume : 100.00%
Preview Time : 0 s
Preview Duration : 0 s
Poster Time : 0 s
Selection Time : 0 s
Selection Duration : 0 s
Current Time : 0 s
Next Track ID : 3
Track Header Version : 0
Track Create Date : 1900:01:00 00:00:00
Track Modify Date : 1900:01:00 00:00:00
Track ID : 1
Track Duration : 0:01:46
Track Layer : 0
Track Volume : 0.00%
Image Width : 1920
Image Height : 1080
Graphics Mode : srcCopy
Op Color : 0 0 0
Compressor ID : avc1
Source Image Width : 1920
Source Image Height : 1080
X Resolution : 72
Y Resolution : 72
Bit Depth : 24
Video Frame Rate : 30.023
Matrix Structure : 1 0 0 0 1 0 0 0 1
Media Header Version : 0
Media Create Date : 1900:01:00 00:00:00
Media Modify Date : 1900:01:00 00:00:00
Media Time Scale : 16000
Media Duration : 0:01:46
Handler Type : Audio Track
Handler Description : SoundHandler
Balance : 0
Audio Format : mp4a
Audio Channels : 1
Audio Bits Per Sample : 16
Audio Sample Rate : 16000
Play Mode : SEQ_PLAY
Avg Bitrate : 17.1 Mbps
Image Size : 1920x1080
Rotation : 90
How do I remove the whole Exif data or just edit the Rotation property?
Mp4 files (and many others) use the MPEG-4 standard, which arranges the data inside it in little boxes called atoms. You can find a great description of atoms in this page. In short, atoms are organized in a tree-like structure, where an atom can be either the parent of other atoms or a container of data, but not both (although some people break this rule).
In particular the atom you are looking for is called "tkhd" (Track Header). You can find a list of atoms here.
Within this atom you will find metadata of the video. The structure of the "tkhd" atom is specified here
Finally the chunk of metadata you need (which is not an atom), is called "Matrix Structure". From developer.apple.com:
All values in the matrix are 32-bit fixed-point numbers divided as
16.16, except for the {u, v, w} column, which contains 32-bit fixed-point numbers divided as 2.30.
This is shown in the following image:
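The matrix consists of nine 4-byte values (36 bytes in total) and starts at byte 48 of the "tkhd" atom (counting from the start of the atom, for a version 0 track header). An example of a "matrix structure" for an orientation of 0° would be 1 0 0 0 1 0 0 0 1 (the identity matrix).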
SO!
After all that, what you need is to modify this matrix. The next paragraph is taken from developer.apple.com:
A transformation matrix defines how to map points from one coordinate
space into another coordinate space. By modifying the contents of a
transformation matrix, you can perform several standard graphics
display operations, including translation, rotation, and scaling. The
matrix used to accomplish two-dimensional transformations is described
mathematically by a 3-by-3 matrix.
This means that the transformation matrix defines a function, that maps each coordinate into a new one.
Since you only need to rotate the image, simply modify the two left-most columns of the matrix (a block of 3 rows by 2 columns), which is defined by the values 0, 1, 3, 4, 6 and 7.
Here are the blocks I use to represent each orientation (values 0, 1, 3, 4, 6 and 7 of the 3x3 matrix):
0°: (x', y') = (x, y)
1 0
0 1
0 0
90°: (x', y') = (height - y, x)
0 1
-1 0
height 0
180°: (x', y') = (width - x, height - y)
-1 0
0 -1
width height
270°: (x', y') = (y, width - x)
0 -1
1 0
0 width
If you don't have them, the width and height can be obtained just after the matrix structure. They are also fixed point numbers of 4 bytes (16.16).
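As a sketch of how these values could be packed and unpacked, the snippet below assumes the matrix has already been located as a 36-byte big-endian block (finding the "tkhd" atom in the file is not shown). It treats the third column as 2.30 fixed point and the rest as 16.16, as in the quote above. The function names are made up for this example.

```python
import struct


def decode_matrix(raw):
    """Decode a 36-byte matrix structure into nine floats in row order."""
    ints = struct.unpack(">9i", raw)  # nine big-endian signed 32-bit values
    out = []
    for i, v in enumerate(ints):
        # Values 2, 5 and 8 (the u, v, w column) are 2.30, the others 16.16.
        shift = 30 if i % 3 == 2 else 16
        out.append(v / (1 << shift))
    return out


def encode_matrix(values):
    """Encode nine numbers in row order into a 36-byte matrix structure."""
    ints = []
    for i, v in enumerate(values):
        shift = 30 if i % 3 == 2 else 16
        ints.append(int(round(v * (1 << shift))))
    return struct.pack(">9i", *ints)


def rotation_matrix(degrees, width, height):
    """Return the nine matrix values for the orientations listed above."""
    table = {
        0: [1, 0, 0, 0, 1, 0, 0, 0, 1],
        90: [0, 1, 0, -1, 0, 0, height, 0, 1],
        180: [-1, 0, 0, 0, -1, 0, width, height, 1],
        270: [0, -1, 0, 1, 0, 0, 0, width, 1],
    }
    return table[degrees]


raw = encode_matrix(rotation_matrix(90, 1920, 1080))
print(len(raw))
print(decode_matrix(raw))
```

Writing the 36 bytes for the 0° matrix back at the matrix position would then reset the orientation, provided nothing else in the file depends on it.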
It is quite probable your video metadata contains the 90° Matrix
(Thanks to Phil Harvey, creator of Exiftool for his help and a wonderful software)
In my case changing the exif data did not solve the problem because it is, in fact, correct. The problem is that most players ignore it (i.e. they assume it is 0).
If you do want to play with the Rotation exif tag, you can control it via MediaRecorder.setOrientationHint(). That is much easier than modifying it after the fact. If the YouTube uploader respects the tag, then that's all you need.
But the only solution I have found is to rotate the video itself, or use UI hints to guide users to record the video in the camera's natural 0 orientation.
There's no built-in mechanism for rotating videos in Android.
