How to view the raw content of a file in a hex editor, and how to find the header offset and trailer offset of a document in raw format?
1. How to view the raw content of a file in a hex editor?
On Linux/macOS you can use xxd, which has many options for formatting its output. A simple example:
xxd file.pdf | less
00000000: 2550 4446 2d31 2e37 0d25 e2e3 cfd3 0d0a %PDF-1.7.%......
00000010: 3131 3837 3420 3020 6f62 6a0d 3c3c 2f4c 11874 0 obj.<</L
00000020: 696e 6561 7269 7a65 6420 312f 4c20 3330 inearized 1/L 30
00000030: 3934 3237 392f 4f20 3131 3837 372f 4520 94279/O 11877/E
00000040: 3133 3334 3538 2f4e 2037 362f 5420 3238 133458/N 76/T 28
00000050: 3536 3638 312f 4820 5b20 3136 3733 2034 56681/H [ 1673 4
00000060: 3331 315d 3e3e 0d65 6e64 6f62 6a0d 2020 311]>>.endobj.
...
...
002f36c0: 4134 3534 3437 3444 4434 3337 3e3c 3036 A454474DD437><06
002f36d0: 3839 3542 4133 4234 4341 3434 3044 4232 895BA3B4CA440DB2
002f36e0: 3435 3937 3645 3545 3331 3231 3738 3e5d 45976E5E312178>]
002f36f0: 3e3e 0d73 7461 7274 7872 6566 0d31 3136 >>.startxref.116
002f3700: 0d25 2545 4f46 0d .%%EOF.
On Windows, you can also open any file in the popular hex editor HxD (screenshot: https://mh-nexus.de/en/graphics/HxDShotLarge.png).
2. How to find the header offset and trailer offset
Let's take a look at file signatures (magic bytes). As you can see below, their length can differ:
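If neither xxd nor HxD is at hand, a minimal xxd-like dump can be sketched in plain Python. This only approximates xxd's default layout (16 bytes per row, bytes grouped in pairs, non-printable bytes shown as "."); the function name `hexdump` is hypothetical.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return an xxd-style dump: offset, hex bytes in 2-byte groups, printable ASCII."""
    lines = []
    pad = (width // 2) * 5 - 1  # width of a full row's hex column
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Group bytes in pairs (4 hex digits) like xxd's default output.
        hex_part = " ".join(chunk[i:i + 2].hex() for i in range(0, len(chunk), 2))
        # Non-printable bytes are shown as "." in the text column.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{pad}}  {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # First 16 bytes of the PDF dump above; use open("file.pdf", "rb").read() for a real file.
    sample = bytes.fromhex("255044462d312e370d25e2e3cfd30d0a")
    print(hexdump(sample))
```

Run on those 16 bytes, it prints the same first row as the xxd output above.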
Hex signature | ISO 8859-1 | Offset | Extension | Description
1F 9D | .. | 0 | z, tar.z | compressed file (often tar zip) using Lempel-Ziv-Welch algorithm
25 50 44 46 2D | %PDF- | 0 | pdf | PDF document [16]
ED AB EE DB | í«îÛ | 0 | rpm | RedHat Package Manager (RPM) package [3]
If you would rather identify file signatures programmatically than check them manually against the list above, there are libraries for different languages, such as pyfsig, which maintains a list of the file signatures it currently supports.
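For a PDF specifically, finding the header and trailer offsets amounts to searching for the "%PDF-" magic bytes and the "%%EOF" end-of-file marker. In the xxd dump above, the header starts at offset 0x00000000 and "%%EOF" starts at 0x002f3701 (the byte after the 0d at 0x002f3700). The sketch below assumes the file is small enough to read into memory; the helper name `find_pdf_offsets` is hypothetical. It searches for the trailer marker from the end, because incrementally updated PDFs can contain several "%%EOF" markers.

```python
PDF_HEADER = b"%PDF-"   # magic bytes from the signature table (offset 0)
PDF_TRAILER = b"%%EOF"  # end-of-file marker that closes a PDF trailer


def find_pdf_offsets(data: bytes) -> tuple[int, int]:
    """Return (header_offset, trailer_offset); -1 means the marker was not found.

    The trailer marker is searched from the end, because incrementally updated
    PDFs can contain several "%%EOF" markers and the last one ends the file.
    """
    return data.find(PDF_HEADER), data.rfind(PDF_TRAILER)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file; use open("file.pdf", "rb").read() instead.
    sample = b"%PDF-1.7\r...objects...\rstartxref\r116\r%%EOF\r"
    header, trailer = find_pdf_offsets(sample)
    print(f"header at 0x{header:08x}, trailer marker at 0x{trailer:08x}")
```

For other formats, swap in the signature bytes from the table; not every format has a fixed trailer marker.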
I'm trying to import a table where commas are used as the thousands separator.
For example, 32,100 means 32100, but the spreadsheet treats it as 32.1 instead.
This is a similar table (first one / top left):
https://en.wikipedia.org/wiki/Demographics_of_the_world
Screenshots on imgur:
https://imgur.com/a/hJR9tox
I want it to say:
Year million
1500 458
1600 580
1700 682
1750 791
1800 978
1850 1262
1900 1650
1950 2521
1999 5978
2008 6707
2011 7000
2015 7350
2018 7600
2020 7750
But it comes out as:
Year million
1500 458
1600 580
1700 682
1750 791
1800 978
1850 1,262
1900 1,65
1950 2,521
1999 5,978
2008 6,707
2011 7
2015 7,35
2018 7,6
2020 7,75
This is the function I'm using:
=IMPORTHTML("https://en.wikipedia.org/wiki/Demographics_of_the_world"; "table"; 1)
I have also tried using this function:
=IMPORTXML("https://en.wikipedia.org/wiki/Demographics_of_the_world"; "//*[@id='mw-content-text']/div/table[1]/tbody")
But the result is extremely hard to read, because it looks like this, and it still removes the trailing zeros:
World Population[1][2] Yearmillion 1500458 1600580 1700682 1750791 1800978 18501,262 19001,65 19502,521 19995,978 20086,707 20117 20157,35 20187,6 20207,75
Other things I have tried:
forcing it to always display three decimals. That won't work, because it adds extra digits to the end of every number.
The main and easiest solution is to change your spreadsheet's locale setting to one that uses , as the thousands separator.
Alternatively, if changing this setting is really not an option, you could write a script that uses UrlFetchApp to retrieve the page's contents and parse the values, taking into account that , is used as the thousands separator.
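To illustrate why the values come out wrong and what the parsing step of such a script has to do, here is a small sketch. It is written in Python rather than Apps Script, `parse_thousands` is a hypothetical helper, and it assumes the cells hold plain integers with , as the thousands separator, like the population column above. A real script would first fetch the page (with UrlFetchApp in Apps Script) and extract the cell text.

```python
def parse_thousands(text: str, sep: str = ",") -> int:
    """Read a string such as '1,262' as an integer, treating `sep` as a thousands separator."""
    cleaned = text.strip().replace(sep, "")
    if not cleaned.isdigit():
        raise ValueError(f"not an integer with thousands separators: {text!r}")
    return int(cleaned)


if __name__ == "__main__":
    raw = ["458", "1,262", "1,650", "7,000", "7,350"]
    # Reading the comma as a decimal separator reproduces the wrong values in the question.
    misread = [float(v.replace(",", ".")) for v in raw]
    print(misread)  # [458.0, 1.262, 1.65, 7.0, 7.35]
    # Treating it as a thousands separator gives the intended values.
    print([parse_thousands(v) for v in raw])  # [458, 1262, 1650, 7000, 7350]
```

The misread list shows exactly the symptom from the question: trailing zeros disappear (1,650 becomes 1.65, 7,000 becomes 7) because the digits after the comma are treated as a fraction.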
I know OpenCV uses BGR channel order, but in my experiment not only the order but also the values are completely messed up:
import cv2 as cv
import tifffile as tiff
import skimage.io
img_path = r"C:\test\pics\t100r50s16_1_19.tif"
c = cv.imread(img_path,cv.IMREAD_UNCHANGED)
t = tiff.imread(img_path)
s = skimage.io.imread(img_path)
print("c:", c.shape, "t:", t.shape, "s:", s.shape)
print("c:", c.dtype, "t:", t.dtype, "s:", s.dtype)
print(c[0, 0], c[1023, 0], c[0, 1023], c[1023, 1023])
print(t[0, 0], t[1023, 0], t[0, 1023], t[1023, 1023])
print(s[0, 0], s[1023, 0], s[0, 1023], s[1023, 1023])
print(c.sum())
print(t.sum())
print(s.sum())
And the output looks like this:
c: (1024, 1024, 4) t: (1024, 1024, 4) s: (1024, 1024, 4)
c: uint8 t: uint8 s: uint8
[ 50 63 56 182] [131 137 140 193] [29 28 27 94] [123 130 134 190]
[ 79 88 70 182] [185 181 173 193] [74 77 80 94] [180 174 165 190]
[ 79 88 70 182] [185 181 173 193] [74 77 80 94] [180 174 165 190]
# Here it seems that OpenCV only reads the alpha channel correctly;
# the values of the first three channels are very different from the other packages'
539623146
659997127
659997127
The image I use can be downloaded here. So my question is: how does OpenCV handle 4-channel TIFF files? When I test on a 3-channel image, everything looks fine.
I don't buy it for a minute that there is a rounding error or some error related to JPEG decoding like the linked article suggests.
Firstly, because your image is integer, specifically uint8, so there is no rounding of floats; and secondly, because your TIF image is not JPEG-compressed. In fact, it is not compressed at all. You can see that for yourself if you use ImageMagick and do:
identify -verbose a.tif
or if you use tiffinfo that ships with libtiff, like this:
tiffinfo -v a.tif
So, I did some experiments by generating sample images with ImageMagick like this:
# Make 8x8 pixel TIF full of RGBA(64,128,192) with full opacity
convert -depth 8 -size 8x8 xc:"rgba(64,128,192,1)" a.tif
# Make 8x8 pixel TIFF with 4 rows per strip
convert -depth 8 -define tiff:rows-per-strip=4 -size 8x8 xc:"rgba(64,128,192,1)" a.tif
OpenCV was able to read both of those correctly. However, when I did the following, it went wrong:
# Make 8x8 pixel TIFF with RGB(64,128,192) with 50% opacity
convert -depth 8 -define tiff:rows-per-strip=1 -size 8x8 xc:"rgba(64,128,192,0.5)" a.tif
The values came out in OpenCV as 32, 64, 96: yes, exactly HALF the correct values, as if OpenCV were premultiplying by the alpha. So I tried an opacity of 25%, and the values came out at 1/4 of the correct ones. So I suspect there is a bug in OpenCV that premultiplies the alpha.
If you look at your values, you will see that tifffile and skimage read the first pixel as:
[ 79 88 70 182 ]
The alpha of that pixel is 0.713725 (182/255). If you multiply each colour value by that and round, you get [ 56 63 50 ], which in OpenCV's BGR order gives:
[ 50 63 56 182 ]
which is exactly what OpenCV returned.
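As a quick check of the premultiplication theory, the numpy sketch below applies alpha premultiplication plus the RGB to BGR swap to all four pixels that tifffile/skimage printed in the question. By my arithmetic it reproduces all four of OpenCV's printed pixels, which supports (but does not prove) the premultiplication explanation. The helper name `premultiply_to_bgra` is hypothetical.

```python
import numpy as np


def premultiply_to_bgra(rgba: np.ndarray) -> np.ndarray:
    """Premultiply RGB by alpha/255, round, and reorder to BGRA (OpenCV's channel order)."""
    rgba = np.asarray(rgba, dtype=np.float64)
    rgb = np.rint(rgba[..., :3] * rgba[..., 3:4] / 255)
    # Reverse R, G, B to B, G, R and keep alpha unchanged as the last channel.
    bgra = np.concatenate([rgb[..., ::-1], rgba[..., 3:4]], axis=-1)
    return bgra.astype(np.uint8)


if __name__ == "__main__":
    # The four pixels printed by tifffile/skimage (RGBA order) in the question.
    tiff_pixels = np.array([[79, 88, 70, 182], [185, 181, 173, 193],
                            [74, 77, 80, 94], [180, 174, 165, 190]])
    print(premultiply_to_bgra(tiff_pixels))
```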
As a workaround, I guess you could divide the colour channels by alpha/255 to scale them back.
In case the argument is that OpenCV intentionally premultiplies the alpha, that raises the question of why it does so for TIFF files but NOT for PNG files:
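The divide-by-alpha workaround can be sketched with numpy as follows. It assumes the image was read with cv.IMREAD_UNCHANGED into a uint8 BGRA array and that OpenCV's colour values really are premultiplied, as the experiments above suggest; the function name `unpremultiply_bgra` is hypothetical. Because premultiplication already rounded the values, the result is only approximate: for the first pixel in the question it gives R=78 rather than tifffile's 79.

```python
import numpy as np


def unpremultiply_bgra(img: np.ndarray) -> np.ndarray:
    """Approximately undo alpha premultiplication on a uint8 BGRA image.

    Divides B, G and R by alpha/255. Precision lost during premultiplication
    cannot be recovered, so results can be off slightly (more for low alpha).
    """
    out = img.astype(np.float64)
    alpha = out[..., 3:4] / 255.0
    # Where alpha is 0 the colour is unrecoverable; leave those pixels as they are.
    safe_alpha = np.where(alpha == 0, 1.0, alpha)
    out[..., :3] = np.clip(np.rint(out[..., :3] / safe_alpha), 0, 255)
    return out.astype(np.uint8)


if __name__ == "__main__":
    # First pixel as returned by cv.imread(..., cv.IMREAD_UNCHANGED) in the question.
    bgra = np.array([[[50, 63, 56, 182]]], dtype=np.uint8)
    fixed = unpremultiply_bgra(bgra)
    rgba = fixed[..., [2, 1, 0, 3]]  # reorder BGRA -> RGBA to compare with tifffile
    print(rgba)
```

If exact values matter, reading the file with tifffile or skimage, as in the question, avoids this lossy round trip altogether.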
# Create 8x8 PNG image full of rgb(64,128,192) with alpha=0.5
convert -depth 8 -size 8x8 xc:"rgba(64,128,192,0.5)" a.png
Check with OpenCV:
import cv2
c = cv2.imread('a.png',cv2.IMREAD_UNCHANGED)
In [4]: c.shape
Out[4]: (8, 8, 4)
In [5]: c
Out[5]:
array([[[192, 128, 64, 128],
[192, 128, 64, 128],
...
...
In case anyone thinks the values in the TIF file really are what OpenCV reports, I can only say that I wrote rgb(64,128,192) at 50% opacity, and I tested each of the following; with the sole exception of OpenCV, they all agree that this is exactly what the file contains:
ImageMagick v7
libvips v8
Adobe Photoshop CC 2017
PIL/Pillow v5.2.0
GIMP v2.8
scikit-image v0.14
Is there a software tool that can generate a matrix of RGB values from a simple raw 8-bit RGB image?
Also, is there a software tool that can generate an image from a given matrix of RGB values?
Thank you.
PS:
i) I am aware that this can be done using MATLAB. I am looking for a tool other than MATLAB that can do it.
ii) I am aware of existing questions about doing similar things programmatically. I need a software tool, if there is one, that can do this task.
I would suggest you use the venerable NetPBM, which is available for Linux, macOS and Windows. Alternatively, you could use ImageMagick, but that is much more heavyweight (see later).
NetPBM Method - see Wikipedia NetPBM entry
So, let's start with a raw, 8-bit RGB file that contains a red, a green and a blue pixel:
-rw-r--r-- 1 mark staff 9 10 Oct 07:47 rgb888.bin
As you can see, it has 9 bytes. Let's look at them:
xxd -g3 rgb888.bin
00000000: ff0000 00ff00 0000ff
Now, if we want that image as a matrix of legible values:
rawtoppm -plain 3 1 rgb888.bin
Sample Output
P3
3 1
255
255 0 0 0 255 0 0 0 255
where:
-plain means to display in ASCII rather than binary
P3 tells us it is colour and ASCII
3 1 tells us its dimensions are 3 pixels wide by 1 pixel high
255 is the maximum sample value, which essentially tells us it is 8-bit (65535 would mean 16-bit)
the last row contains the pixel values
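To show what the plain (P3) output above consists of, here is a Python sketch that builds the same text from the raw bytes. It illustrates the format only and is not NetPBM's implementation; the function name `raw_rgb_to_plain_ppm` is hypothetical. It writes one image row per text line and ignores the PPM specification's recommendation that plain-format lines not exceed 70 characters.

```python
def raw_rgb_to_plain_ppm(data: bytes, width: int, height: int) -> str:
    """Build a plain (ASCII, "P3") PPM from raw 8-bit interleaved RGB bytes."""
    row_len = width * 3  # 3 bytes (R, G, B) per pixel
    if len(data) != row_len * height:
        raise ValueError("expected exactly width * height * 3 bytes")
    rows = [
        " ".join(str(b) for b in data[r * row_len:(r + 1) * row_len])
        for r in range(height)
    ]
    # Header: magic number, dimensions, maximum sample value (255 = 8-bit).
    return f"P3\n{width} {height}\n255\n" + "\n".join(rows) + "\n"


if __name__ == "__main__":
    rgb888 = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255])  # red, green, blue pixels
    print(raw_rgb_to_plain_ppm(rgb888, 3, 1), end="")
```

For the 9-byte rgb888.bin it prints the same four lines as the rawtoppm sample output.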
Converting back to binary is a little harder. Let's assume we start with a PPM file created like this:
rawtoppm -plain 3 1 rgb888.bin > image.ppm
So, we can get the binary version like this:
ppmtoppm < image.ppm | tail -c 9 > rgb888.bin
and look at it with:
xxd -g3 rgb888.bin
00000000: ff0000 00ff00 0000ff
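The reverse direction (plain PPM matrix back to raw bytes) can also be sketched in Python. This hypothetical `plain_ppm_to_raw_rgb` helper assumes a P3 file with maxval 255, strips "#" comments, and returns the pixel bytes without any header, i.e. the same 9 bytes that the ppmtoppm | tail -c 9 pipeline extracts.

```python
def plain_ppm_to_raw_rgb(text: str) -> bytes:
    """Parse a plain (P3) PPM with maxval 255 into raw interleaved RGB bytes."""
    # Drop '#' comments (they run to end of line), then split on whitespace.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P3":
        raise ValueError("not a plain PPM (missing P3 magic number)")
    width, height, maxval = (int(t) for t in tokens[1:4])
    if maxval != 255:
        raise ValueError("this sketch only handles 8-bit (maxval 255) images")
    n = width * height * 3
    samples = [int(t) for t in tokens[4:4 + n]]
    if len(samples) != n:
        raise ValueError("truncated pixel data")
    return bytes(samples)


if __name__ == "__main__":
    ppm = "P3\n3 1\n255\n255 0 0 0 255 0 0 0 255\n"
    print(plain_ppm_to_raw_rgb(ppm).hex())
```

Writing the result with open("rgb888.bin", "wb") would give a file you can inspect with xxd -g3 as above.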
ImageMagick Method
# Convert binary RGB888 to text
convert -depth 8 -size 3x1 RGB:rgb888.bin txt:
Sample Output
# ImageMagick pixel enumeration: 3,1,65535,srgb
0,0: (65535,0,0) #FF0000 red
1,0: (0,65535,0) #00FF00 lime
2,0: (0,0,65535) #0000FF blue
Or, with a slightly different appearance:
# Convert binary RGB888 to matrix
convert -depth 8 -size 3x1 RGB:rgb888.bin -compress none ppm:
Sample Output
P3
3 1
255
255 0 0 0 255 0 0 0 255
And now going the other way, from PPM to binary:
# Convert PPM image to binary
convert image.ppm rgb:image.bin
# Check how the binary looks
xxd -g 3 image.bin
00000000: ff0000 00ff00 0000ff .........
Plain dump method
Maybe you are happy with a plain dump from od:
od -An -t u1 rgb888.bin
Sample Output
255 0 0 0 255 0 0 0 255
I was wondering whether it is possible to draw multiple y-axis plot lines on specific dates using Highcharts.
I have three lines with different colors: the first from 16 Apr to 30 Apr, the second from 30 Apr to 28 May, and the third from 28 May to 25 Jun.
Something similar to this:
Thanks for your help.