I recorded a Full HD video with a Samsung Galaxy S II. When I uploaded it to YouTube, I found it had been rotated 90 degrees into portrait layout (1080x1920 instead of 1920x1080).
I found the cause of the problem:
YouTube reads the video metadata and rotates the video according to the Exif orientation before encoding.
This is the ExifTool report (please see the last tag, "Rotation"):
ExifTool Version Number : 8.61
File Name : video.mp4
Directory : .
File Size : 217 MB
File Modification Date/Time : 2011:08:11 00:47:23+04:00
File Permissions : rw-rw-rw-
File Type : 3GP
MIME Type : video/3gpp
Major Brand : 3GPP Media (.3GP) Release 4
Minor Version : 0.3.0
Compatible Brands : 3gp4, 3gp6
Movie Data Size : 227471371
Movie Header Version : 0
Create Date : 1900:01:00 00:00:00
Modify Date : 1900:01:00 00:00:00
Time Scale : 1000
Duration : 0:01:46
Preferred Rate : 1
Preferred Volume : 100.00%
Preview Time : 0 s
Preview Duration : 0 s
Poster Time : 0 s
Selection Time : 0 s
Selection Duration : 0 s
Current Time : 0 s
Next Track ID : 3
Track Header Version : 0
Track Create Date : 1900:01:00 00:00:00
Track Modify Date : 1900:01:00 00:00:00
Track ID : 1
Track Duration : 0:01:46
Track Layer : 0
Track Volume : 0.00%
Image Width : 1920
Image Height : 1080
Graphics Mode : srcCopy
Op Color : 0 0 0
Compressor ID : avc1
Source Image Width : 1920
Source Image Height : 1080
X Resolution : 72
Y Resolution : 72
Bit Depth : 24
Video Frame Rate : 30.023
Matrix Structure : 1 0 0 0 1 0 0 0 1
Media Header Version : 0
Media Create Date : 1900:01:00 00:00:00
Media Modify Date : 1900:01:00 00:00:00
Media Time Scale : 16000
Media Duration : 0:01:46
Handler Type : Audio Track
Handler Description : SoundHandler
Balance : 0
Audio Format : mp4a
Audio Channels : 1
Audio Bits Per Sample : 16
Audio Sample Rate : 16000
Play Mode : SEQ_PLAY
Avg Bitrate : 17.1 Mbps
Image Size : 1920x1080
Rotation : 90
How do I remove the whole Exif data block, or just edit the Rotation property?
MP4 files (and many others) use the MPEG-4 standard, which arranges the data inside them in little boxes called atoms. You can find a great description of atoms on this page. In short, atoms are organized in a tree-like structure, where an atom can be either the parent of other atoms or a container of data, but not both (although some people break this rule).
In particular the atom you are looking for is called "tkhd" (Track Header). You can find a list of atoms here.
Within this atom you will find the metadata of the video. The structure of the "tkhd" atom is specified here.
Finally, the chunk of metadata you need (which is not an atom) is called the "Matrix Structure". From developer.apple.com:
All values in the matrix are 32-bit fixed-point numbers divided as
16.16, except for the {u, v, w} column, which contains 32-bit fixed-point numbers divided as 2.30.
The nine-value (36-byte) matrix starts at byte 48 of the "tkhd" atom. An example of a "matrix structure" for an orientation of 0° would be 1 0 0 0 1 0 0 0 1 (the identity matrix).
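To make the fixed-point format concrete, here is a small sketch (plain Python, just for illustration) of how a single matrix value is encoded:

# 16.16 fixed point: multiply by 2**16 and store big-endian as a signed
# 32-bit integer; the {u, v, w} column uses 2.30 fixed point instead.
def fix16_16(value):
    return int(round(value * (1 << 16))).to_bytes(4, "big", signed=True)

def fix2_30(value):
    return int(round(value * (1 << 30))).to_bytes(4, "big", signed=True)

print(fix16_16(1.0).hex())  # 00010000
print(fix2_30(1.0).hex())   # 40000000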
SO!
After all that, what you need is to modify this matrix. The next paragraph is taken from developer.apple.com:
A transformation matrix defines how to map points from one coordinate
space into another coordinate space. By modifying the contents of a
transformation matrix, you can perform several standard graphics
display operations, including translation, rotation, and scaling. The
matrix used to accomplish two-dimensional transformations is described
mathematically by a 3-by-3 matrix.
This means that the transformation matrix defines a function, that maps each coordinate into a new one.
Since you only need to rotate the image, simply modify the left-most 2 x 3 submatrix, which consists of values 0, 1, 3, 4, 6 and 7 of the 3x3 matrix.
Here are the 2 x 3 matrices I use to represent each orientation (values 0, 1, 3, 4, 6 and 7 of the 3x3 matrix):
0°: (x', y') = (x, y)
1 0
0 1
0 0
90°: (x', y') = (height - y, x)
0 1
-1 0
height 0
180°: (x', y') = (width - x, height - y)
-1 0
0 -1
width height
270°: (x', y') = (y, width - x)
0 -1
1 0
0 width
If you don't have them, the width and height can be obtained just after the matrix structure. They are also 4-byte fixed-point numbers (16.16).
It is quite probable that your video metadata contains the 90° matrix.
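For illustration, here is a minimal Python sketch that resets the matrix to the 0° identity. It is a sketch, not a robust tool: it assumes a version-0 "tkhd" atom and naively treats the first occurrence of the fourcc as the atom (a real implementation should walk the moov/trak atom tree instead):

import struct

# Identity matrix in fixed point: 1.0 is 0x00010000 in 16.16,
# and the final w value is 0x40000000 in 2.30.
IDENTITY = [0x00010000, 0, 0,
            0, 0x00010000, 0,
            0, 0, 0x40000000]

with open("video.mp4", "r+b") as f:
    pos = f.read().find(b"tkhd")
    if pos == -1:
        raise SystemExit("no tkhd atom found")
    # The fourcc is preceded by the atom's 4-byte size field, and the
    # matrix starts at byte 48 of a version-0 tkhd atom.
    f.seek(pos - 4 + 48)
    f.write(struct.pack(">9i", *IDENTITY))

For the 90° case you would write the matrix given above instead, with the height translation encoded in 16.16 (i.e. height << 16).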
(Thanks to Phil Harvey, creator of ExifTool, for his help and his wonderful software.)
In my case, changing the Exif data did not solve the problem because it is, in fact, correct. The problem is that most players ignore it (i.e. they assume it is 0).
If you do want to play with the Rotation exif tag, you can control it via MediaRecorder.setOrientationHint(). That is much easier than modifying it after the fact. If the YouTube uploader respects the tag, then that's all you need.
But the only solution I have found is to rotate the video itself, or use UI hints to guide users to record the video in the camera's natural 0 orientation.
There's no built-in mechanism for rotating videos in Android.
Related
I have the following issue:
I'm creating a uniform gray video (for testing) using the OpenCV VideoWriter. The output video should reproduce a constant image where all the pixels have the same value x (25, 51, 76, and so on).
When I generate the video using the MJPG encoder:
vw = cv2.VideoWriter('./videos/input/gray1.mp4',
                     cv2.VideoWriter_fourcc(*'MJPG'),
                     fps, (resolution[1], resolution[0]))
and read the output using the VideoCapture class, everything just works fine: I get a frame array with all pixel values set to 25, 51, 76, and so on.
However, when I generate the video using HEV1 (H.265) or H264:
vw = cv2.VideoWriter('./videos/input/gray1.mp4',
                     cv2.VideoWriter_fourcc(*'HEV1'),
                     fps, (resolution[1], resolution[0]))
I run into the following issue. The frames I get back in BGR format follow this pattern:
The blue channel value is the expected value (x) minus 4 (25-4=21, 51-4=47, 76-4=72, and so on).
The green channel is the expected value (x) minus 1 (25-1=24, 51-1=50, 76-1=75).
The red channel is the expected value (x) minus 3 (25-3=22, 51-3=48, 76-3=73).
Notice that each channel is reduced by a constant offset (4, 1, 3) regardless of the pixel value, so the effect is constant.
I could understand an error that depends on the pixel value, but not a fixed offset.
What is worse, if I generate a video with frames consisting of each pure color (pixel values [255 0 0], [0 255 0] and [0 0 255]), I get the corresponding output values [251 0 0], [0 254 0] and [0 0 252].
I thought this might be related to the grayscale Y value, where:
Y = 76/256 * RED + 150/256 * GREEN + 29/256 * BLUE
But these coefficients are not related to the output I obtained. Maybe the problem is in the reading with VideoCapture?
EDIT:
If I want the output pixels to all have the same value (e.g. [10, 10, 10]), experimentally I have to create an image where the red and blue channels have the green channel value plus 2:
import numpy as np

value = 10
img = np.zeros((resolution[0], resolution[1], 3), dtype=np.uint8) + value
img[:, :, 2] = img[:, :, 2] + 2  # red channel (BGR order)
img[:, :, 1] = img[:, :, 1] + 0  # green channel
img[:, :, 0] = img[:, :, 0] + 2  # blue channel
Has anyone experienced this issue? Is it related to the encoding process, or does OpenCV treat the image differently before encoding depending on the fourcc parameter value?
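For reference, here is a minimal end-to-end version of my test (fps, resolution and value are just example numbers):

import cv2
import numpy as np

fps, resolution, value = 30, (480, 640), 51  # resolution is (rows, cols)

# Write one second of constant-gray frames (container/codec support varies
# by OpenCV build, so this fourcc/extension combination may need adjusting).
vw = cv2.VideoWriter('./gray_test.mp4', cv2.VideoWriter_fourcc(*'MJPG'),
                     fps, (resolution[1], resolution[0]))
frame = np.full((resolution[0], resolution[1], 3), value, dtype=np.uint8)
for _ in range(fps):
    vw.write(frame)
vw.release()

# Read the first frame back and print the mean of each BGR channel.
vc = cv2.VideoCapture('./gray_test.mp4')
ok, out = vc.read()
vc.release()
if ok:
    print('BGR means:', out[:, :, 0].mean(), out[:, :, 1].mean(),
          out[:, :, 2].mean())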
So I have this sample PDF file with three words on separate lines:
"
hello
there
world
"
I have highlighted the word "there" on the second line. Internally, within the PDF, I'm trying to map the highlight/annotation structure to the text (BT) area.
The section corresponding to the word "there" looks like so:
BT
/F0 14.6599998 Tf
1 0 0 -1 0 130 Tm
96 0 Td <0057> Tj
4.0719757 0 Td <004B> Tj
8.1511078 0 Td <0048> Tj
8.1511078 0 Td <0055> Tj
4.8806458 0 Td <0048> Tj
ET
I also have an annotation section where I have my highlight which has the following rect dimensions:
18 0 19 15 20 694 21 786 22 853 23 1058 24 1331 [19 0 R 20 0 R]<</AP<</N 10 0 R>>
...
(I left the top part of the annotation out on purpose because it is long; I extracted what I thought were the most important parts.)
Rect[68.0024 690.459 101.054 706.37]
I'm kind of confused about how my text is mapped to this one highlight that I have. The coordinates do not seem to match (130 y vs 690 y)? Am I looking in the right place and interpreting my text and/or highlight annotation coordinates correctly?
Update:
I want to add more info on how I created this test PDF.
It's pretty simple to recreate. I went to Google Docs and created an empty document, and on three lines I wrote the text described above. I downloaded it as a PDF and then opened it in Adobe Acrobat Reader DC (the newest one, I think). I then used Acrobat Reader to highlight the specified line and re-save the file. After that I used some Python to decompress the PDF sections.
The Python code to decompress the PDF sections:
import re
import zlib

pdf = open("helloworld.pdf", "rb").read()
# The file is read as bytes, so the pattern must be a bytes pattern too.
stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)
for s in stream.findall(pdf):
    s = s.strip(b'\r\n')
    try:
        print(zlib.decompress(s))
        print("")
    except zlib.error:
        pass
Unfortunately the OP only explained how he created his document and did not share the document itself. I followed his instructions, but the coordinates of the annotation differ. As I only have this document for explanation, the OP will have to mentally adapt the following to the precise numbers in his document.
The starting coordinate system
The starting (default) user coordinate system in the document is implied by the crop box. In the document at hand the crop box is defined as
/CropBox [0 0 596 843]
i.e. the visible page is 596 units wide and 843 units high (given the default user unit of 1/72", this is an A4 format) and the origin is in the lower left corner. x coordinates increase to the right, y coordinates increase upwards; thus, it is the kind of coordinate system one usually starts with in math, too.
The annotation rectangle
This also is the coordinate system of the annotation rectangle coordinates.
In the case at hand they are
/Rect [68.0595 741.373 101.138 757.298]
i.e. the rectangle with the lower left corner at (68.0595, 741.373) and the upper right at (101.138, 757.298).
Transformations of the coordinate system
In the page content stream up to the text object already identified by the OP the coordinate system gets transformed a number of times.
Mirroring, translation
In the very first line of the page content
1 0 0 -1 0 843 cm
This transformation moves the origin up by 843 units and mirrors (multiplies by -1) the y coordinate.
Thus, we now have a coordinate system with the origin in the upper left and y coordinates increasing downwards.
Scaling
A bit later in the content stream the coordinate system is scaled
.75062972 0 0 .75062972 0 0 cm
Thus, the coordinate units are compressed to about 3/4 of their original width and height, i.e. each unit along x or y is only 1/96" wide/high.
The text "there"
Only after these transformations have been applied to the coordinate system, the text object identified by the OP is drawn. It starts by setting and changing the text matrix:
1 0 0 -1 0 130 Tm
This sets the text matrix to translate by 130 units in y direction and mirroring y coordinates once again. (Mirroring back again is necessary as otherwise the text would be drawn upside down.)
96 0 Td
This changes the text matrix by moving 96 units along the x axis.
And the starting point where the text is drawn is the origin of the coordinate system as changed first by the mirroring and translation, then by the scaling of the current transformation matrix, and finally by the mirroring and translation according to the text matrix.
Does it match?
Which coordinate would this point be in the default user coordinate system?
x = (0 + 96) * .75062972 = 72 (approximately)
y = (((0 * (-1)) + 130) * .75062972) * (-1) + 843 = 745.4 (approximately)
This matches with the annotation rectangle (see above) with x coordinates between 68.0595 and 101.138 and y coordinates between 741.373 and 757.298.
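To double-check the arithmetic, the whole composition can be reproduced numerically. A small numpy sketch (PDF uses row vectors, so a point is multiplied from the left):

import numpy as np

# Each PDF matrix [a b c d e f] corresponds to a 3x3 matrix for row vectors.
def pdf_matrix(a, b, c, d, e, f):
    return np.array([[a, b, 0], [c, d, 0], [e, f, 1]], dtype=float)

page_flip = pdf_matrix(1, 0, 0, -1, 0, 843)              # 1 0 0 -1 0 843 cm
scale     = pdf_matrix(.75062972, 0, 0, .75062972, 0, 0) # .75062972 ... cm
tm        = pdf_matrix(1, 0, 0, -1, 0, 130)              # 1 0 0 -1 0 130 Tm
td        = pdf_matrix(1, 0, 0, 1, 96, 0)                # 96 0 Td

# Text space -> default user space: Td, then Tm, then the two cm operations.
total = td @ tm @ scale @ page_flip
print(np.array([0, 0, 1]) @ total)  # -> approximately [72.06 745.42 1.]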
So
I'm kind of confused about how my text is mapped to this one highlight that I have. The coordinates do not seem to match (130 y vs 690 y)? Am I looking in the right place and interpreting my text and/or highlight annotation coordinates correctly?
The coordinates do match, you merely have to make sure you apply the transformations of the current transformation matrix and the text matrix.
According to the JPEG 2000 specs, the number of tiles in the X and Y directions is calculated by the following formulas:
numXtiles = ⌈(Xsiz − XTOsiz) / XTsiz⌉
and
numYtiles = ⌈(Ysiz − YTOsiz) / YTsiz⌉
But nothing is mentioned about the range of numXtiles or numYtiles.
Can we have numXtiles = 0 while numYtiles = 250 (or any other value)?
In short, no. You will always need at least one row and one column of tiles to place your image in the canvas.
In particular, the SIZ marker of the JPEG 2000 stream syntax does not directly define the number of tiles, but rather the size of each tile. Since the tile width and height are defined to be larger than 0 (see page 453 of "JPEG 2000 Image compression fundamentals, standards and practice", by David Taubman and Michael Marcellin), you will always have at least one tile.
That said, depending on the particular implementation that you are using, there may be a parameter numXtiles that you can set to 0 without crashing your program. In that case, the parameter is most likely being ignored or interpreted differently.
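As a sanity check in code, here is the formula with the ceiling made explicit (a sketch; parameter names follow the SIZ marker):

import math

def num_tiles(Xsiz, Ysiz, XTsiz, YTsiz, XTOsiz=0, YTOsiz=0):
    # The spec requires XTsiz, YTsiz > 0 and the tile offsets to lie at or
    # below the image area, so each ceiling comes out to at least 1.
    return (math.ceil((Xsiz - XTOsiz) / XTsiz),
            math.ceil((Ysiz - YTOsiz) / YTsiz))

print(num_tiles(1920, 1080, 256, 256))  # (8, 5) -- never (0, n)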
I have a sample PDF (attached), and it includes a text object and a rectangle object that have almost the same height. I then checked the content of the PDF using iText RUPS, as below:
1 1 1 RG
1 1 1 rg
0.12 0 0 0.12 16 50 cm
q
0 0 m
2926 0 l
2926 5759 l
0 5759 l
0 0 l
W
n
Q
1 1 1 RG
1 1 1 rg
q
0 0 m
2926 0 l
2926 5759 l
0 5759 l
0 0 l
W
n
/F1 205.252 Tf
BT
0 0 0 RG
0 0 0 rg
/DeviceGray CS
/OC /oc1 BDC
0 -1 1 0 1648 5330 Tm
0 Tc
100 Tz
(Hello World) Tj
ET
Q
q
0 0 m
2926 0 l
2926 5759 l
0 5759 l
0 0 l
W
n
0 0 0 RG
0 0 0 rg
/DeviceGray CS
6 w
1 j
1 J
1649 5324 m
1649 4277 l
1800 4277 l
1800 5324 l
1649 5324 l
S
EMC
Q
Obviously the user space matrix is determined by [0.12 0 0 0.12 16 50], so the height of the rectangle is (1800-1649)*0.12*1 = 18.12, while for the font size I get 205.252*0.12 = 24.63024. Since the two values are not close, my problem is: how do I get the height/size of the font?
sample.pdf
OK - I took a look at your file and you're basically hosed. That's the scientific answer, now let me clarify :)
Bad PDF!
The PDF you have up there as a sample contains a font that is not embedded. That "/F1 205.252 Tf" command you have there points to the font "ArialMT" in the resources dict for that page. Because the font has not been embedded, you only have two options:
Try to find the actual font on the system and extract the necessary information from there.
Live with the information in the PDF. Let's start with that.
Font Descriptor
Here is an image from pdfToolbox examining the font in the PDF file (caution: I'm associated with this tool):
I've cut off some of the "Widths" table, but other than that this is all of the information you have in the PDF document for this font. And this means you can access the widths for each glyph, but you don't have access to the heights of each glyph. The only information you have regarding heights is the font bounding box which is the union of all glyph bounding boxes. In other words, the font bounding box is guaranteed to be big enough to contain any glyph from the font (both horizontally and vertically).
System Information
You don't say why you need this information, so it becomes a little harder to advise further. But if you can't get the information from the PDF, your only option is to live with the inaccurate information from the PDF or to turn to the system your code is running on to get you more.
If you have the ArialMT font installed, you could basically try to find the font file and then parse the TrueType font file to find the bounding boxes for each glyph. I've done that; it's no fun.
Or you can see if your system can't provide you with the information in a better way. Many operating systems / languages have text calls that can get accurate measurements for you. If not, you can brute force it by rendering the text you want in black on a white image and then examining the pixels to see where you hit and thus how big the largest glyph in your text string was.
Wasteful though that last option sounds, it's probably the quickest and easiest to implement and it - depending on your needs - may actually be the best option all around.
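As a sketch of that last option using Pillow (the font path, sizes and canvas dimensions here are assumptions, not values from your PDF):

from PIL import Image, ImageDraw, ImageFont

# Render black text on a white canvas and measure the inked bounding box.
font = ImageFont.truetype("arial.ttf", 100)  # assumed font file location
img = Image.new("L", (1200, 300), color=255)
ImageDraw.Draw(img).text((20, 20), "Hello World", font=font, fill=0)

# getbbox() finds the non-zero region, so invert first (text becomes 255).
print("glyph extents:", img.point(lambda p: 255 - p).getbbox())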
I have a sample pdf (attached), and it includes a text object and a rectangle object that have almost the same height.
Indeed, when your PDF is displayed, one quickly realizes that the glyphs in your text "Hello World" do not extend beneath the base line the way a 'g', 'j' or some other glyphs would. (The base line is the line through the glyph origins.)
Since the two values are not close, my problem is how to get the height/size of the font
Obviously the space required for such descenders beneath the base line must also be part of the font size.
Thus, it is completely correct and not a problem that the height of the box (18.12) is considerably smaller than the font size (24.63024).
BTW, this corresponds to the specification, which describes a font size of 1 as being arranged so that the nominal height of tightly spaced lines of text is 1 unit, cf. section 9.2.2 "Basics of Showing Text" of ISO 32000-1. Tightly spaced lines obviously need to include not only the glyph parts above the base line but also those below. Furthermore, the font size includes a small gap between such lines, as even tightly spaced lines are not expected to touch each other.
Would anyone know how to specify a ColorMatrix (specifically a System::Drawing::Imaging::ColorMatrix in C++/CLI) to apply an alpha threshold? For example, if I were to use 10 (10/255) as my threshold, then any pixel with an RGBA alpha of 10 or less would get 0.0f alpha, and every pixel above would get 1.0f.
I'm trying to implement color-ID picking in a 2D scene editor, as I'm sick of my current unwieldy method of reversing my drawing transformations to determine which pixel of a given bitmap the mouse is pointing at. What I want to do instead is a color-ID rendering pass like the OpenGL one described here: http://content.gpwiki.org/index.php/OpenGL_Selection_Using_Unique_Color_IDs. However, I can't just compare the on-screen locations of the bitmaps, as most of them include tons of whitespace that I don't want being picked up by mouse picking, which leaves me with color picking.
For now my ColorMatrix looks like
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
R G B 0 1
following Hans' answer to "GDI+: Set all pixels to given color while retaining existing alpha value", but I'd like it to also apply a threshold to the alpha component (provided that's even possible using a ColorMatrix).
Maybe you're looking for the ImageAttributes.SetThreshold method.