How to save dpi info in py-opencv? - opencv

import cv2

def clear(img):
    back = cv2.imread("back.png", cv2.IMREAD_GRAYSCALE)
    img = cv2.bitwise_xor(img, back)
    ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
    return img

def threshold(img):
    ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    ret, img = cv2.threshold(img, 248, 255, cv2.THRESH_BINARY)
    return img

def fomatImage(img):
    img = threshold(img)
    img = clear(img)
    return img

img = fomatImage(cv2.imread("1566135246468.png", cv2.IMREAD_COLOR))
cv2.imwrite("aa.png", img)
This is my code. But when I tried to recognise the result with tesseract-ocr, I got a warning:
Warning: Invalid resolution 0 dpi. Using 70 instead.
How should I set the dpi?

AFAIK, OpenCV doesn't set the dpi of PNG files it writes, so you are looking at work-arounds. Here are some ideas...
Method 1 - Use PIL/Pillow instead of OpenCV
PIL/Pillow can write dpi information into PNG files. So you would:
Step 1 - Convert your BGR OpenCV image into RGB to match PIL's channel ordering
from PIL import Image
RGBimage = cv2.cvtColor(BGRimage, cv2.COLOR_BGR2RGB)
Step 2 - Convert the OpenCV Numpy array into a PIL Image
PILimage = Image.fromarray(RGBimage)
Step 3 - Write with PIL
PILimage.save('result.png', dpi=(72,72))
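Putting the three steps together, here is a minimal end-to-end sketch. It uses a synthetic Numpy array in place of a real OpenCV image so the example is self-contained, and result.png is just a placeholder name:

```python
import numpy as np
from PIL import Image

# Synthetic "OpenCV-style" BGR image: 10x10, pure blue in BGR order
BGRimage = np.zeros((10, 10, 3), dtype=np.uint8)
BGRimage[:, :, 0] = 255                      # blue channel comes first in BGR

# Step 1 - reverse the channel order (equivalent to cv2.cvtColor BGR2RGB)
RGBimage = BGRimage[:, :, ::-1].copy()       # copy() makes it contiguous for PIL

# Step 2 - wrap the Numpy array as a PIL Image
PILimage = Image.fromarray(RGBimage)

# Step 3 - save with dpi metadata embedded in the PNG pHYs chunk
PILimage.save('result.png', dpi=(72, 72))

# Reading it back shows the stored density; PNG stores pixels/metre
# internally, so the round-tripped value is approximately 72
print(Image.open('result.png').info['dpi'])
```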
As Fred mentions in the comments, you could equally use Python Wand in much the same way.
Method 2 - Write with OpenCV but modify afterwards with some tool
You could use Python's subprocess module to shell out to, say, ImageMagick and set the dpi like this:
magick OpenCVImage.png -set units pixelspercentimeter -density 28.3 result.png
All you need to know is that PNG uses metric (dots per centimetre) rather than imperial (dots per inch) and there are 2.54cm in an inch, so 72 dpi becomes 28.3 dots per cm.
If your ImageMagick version is older than v7, replace magick with convert.
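The shell-out approach could be sketched like this in Python. Note that set_png_density is a hypothetical helper name, and the sketch assumes magick or convert is on your PATH:

```python
import shutil
import subprocess

def set_png_density(infile, outfile, dpi=72):
    """Re-write a PNG with the given density using ImageMagick, if available."""
    # PNG stores density in pixels per centimetre: 72 dpi / 2.54 = 28.35
    per_cm = dpi / 2.54
    # ImageMagick v7 provides "magick"; older versions use "convert"
    tool = shutil.which('magick') or shutil.which('convert')
    if tool is None:
        raise RuntimeError('ImageMagick not found on PATH')
    cmd = [tool, infile,
           '-set', 'units', 'pixelspercentimeter',
           '-density', f'{per_cm:.2f}', outfile]
    subprocess.run(cmd, check=True)
```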
Method 3 - Write with OpenCV and insert dpi yourself
You could write your file to memory using OpenCV's imencode(). Then search the file for the IDAT (image data) chunk, which is the one containing the image pixels, and insert a pHYs chunk before it which sets the density. Then write to disk.
It's not that hard, actually: the payload is just 9 bytes, see here and also look at the pngcheck output at the end of the answer.
This code is not production-tested but seems to work pretty well for me:
#!/usr/bin/env python3
import struct
import zlib
import cv2

def writePNGwithdpi(im, filename, dpi=(72, 72)):
    """Save the image as PNG with embedded dpi"""

    # Encode as PNG into memory
    retval, buffer = cv2.imencode(".png", im)
    s = buffer.tobytes()

    # Find start of IDAT chunk (the 4-byte length field precedes the tag)
    IDAToffset = s.find(b'IDAT') - 4

    # Create our lovely new pHYs chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11pHYs
    # Density is stored in pixels per metre, unit byte 0x01 means "metre"
    pHYs = b'pHYs' + struct.pack('!IIc', int(dpi[0] / 0.0254), int(dpi[1] / 0.0254), b"\x01")
    pHYs = struct.pack('!I', 9) + pHYs + struct.pack('!I', zlib.crc32(pHYs))

    # Open output filename and write...
    # ... stuff preceding IDAT as created by OpenCV
    # ... new pHYs as created by us above
    # ... IDAT onwards as created by OpenCV
    with open(filename, "wb") as out:
        out.write(s[0:IDAToffset])
        out.write(pHYs)
        out.write(s[IDAToffset:])

################################################################################
# main
################################################################################

# Load sample image
im = cv2.imread('lena.png')

# Save at specific dpi
writePNGwithdpi(im, "result.png", (32, 300))
Whichever method you use, you can run pngcheck -v image.png to check what you have done:
pngcheck -vv a.png
Sample Output
File: a.png (306 bytes)
chunk IHDR at offset 0x0000c, length 13
100 x 100 image, 1-bit palette, non-interlaced
chunk gAMA at offset 0x00025, length 4: 0.45455
chunk cHRM at offset 0x00035, length 32
White x = 0.3127 y = 0.329, Red x = 0.64 y = 0.33
Green x = 0.3 y = 0.6, Blue x = 0.15 y = 0.06
chunk PLTE at offset 0x00061, length 6: 2 palette entries
chunk bKGD at offset 0x00073, length 1
index = 1
chunk pHYs at offset 0x00080, length 9: 255x255 pixels/unit (1:1). <-- THIS SETS THE DENSITY
chunk tIME at offset 0x00095, length 7: 19 Aug 2019 10:15:00 UTC
chunk IDAT at offset 0x000a8, length 20
zlib: deflated, 2K window, maximum compression
row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(100 out of 100)
chunk tEXt at offset 0x000c8, length 37, keyword: date:create
chunk tEXt at offset 0x000f9, length 37, keyword: date:modify
chunk IEND at offset 0x0012a, length 0
No errors detected in a.png (11 chunks, 76.5% compression).
While I was editing PNG chunks, I also managed to set a tIME chunk and a tEXt chunk with the author. They go like this:
# Create a new tIME chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11tIME
year, month, day, hour, min, sec = 2020, 12, 25, 12, 0, 0   # Midday, Christmas Day 2020
tIME = b'tIME' + struct.pack('!HBBBBB', year, month, day, hour, min, sec)
tIME = struct.pack('!I', 7) + tIME + struct.pack('!I', zlib.crc32(tIME))

# Create a new tEXt chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11tEXt
Author = "Author\x00Sir Mark The Great"
tEXt = b'tEXt' + bytes(Author.encode('ascii'))
tEXt = struct.pack('!I', len(Author)) + tEXt + struct.pack('!I', zlib.crc32(tEXt))

# Open output filename and write...
# ... stuff preceding IDAT as created by OpenCV
# ... new pHYs as created by us above
# ... new tIME as created by us above
# ... new tEXt as created by us above
# ... IDAT onwards as created by OpenCV
with open(filename, "wb") as out:
    out.write(s[0:IDAToffset])
    out.write(pHYs)
    out.write(tIME)
    out.write(tEXt)
    out.write(s[IDAToffset:])
Keywords: OpenCV, PIL, Pillow, dpi, density, imwrite, PNG, chunks, pHYs chunk, Python, image, image-processing, tEXt chunk, tIME chunk, author, comment

Related

Wrong annotation: x = 0, y = 0, < 0 or > 1 with YOLO

I'm getting this error when I train my dataset using YOLO. I understand it means that the labels are not within the bounds of the image, but the labels are sourced from Open Images data. I get this error for each and every file.
Wrong annotation: x = 0, y = 0, < 0 or > 1, file: data/obj/3149c61b7407b0c5.txt
When converting the labels, the class was initially 101 for everything, which gave me an error, so I converted them all to 0 using this:
import glob
for filepath in glob.iglob('./**/*.txt', recursive=True):
    with open(filepath) as file:
        s = file.read()
    s = s.replace('101 ', '0 ')
    with open(filepath, "w") as file:
        file.write(s)
Here's the content of one file, "3149c61b7407b0c5.txt", which also gave this error.
0 1085.6260855566406 1254.4786898033353 2.9297560546874593 102.1827418215693
0 1086.0899527441406 607.6991307711304 3.857441406250018 1176.3009893473438
0 1137.8566324316407 1307.114119795808 91.84562460937491 5.3656234452045295
0 1186.680802158203 607.1194858238297 3.8867578125000364 1175.290322583892
0 1187.6573546972656 1361.4528185098725 3.8672361230469505 101.16129032792523
0 1234.0440929785157 1195.3281309872493 92.75390625 3.2795265516661787
0 1234.0284875097657 604.9840441022702 90.90814453124995 1177.3440653795917
0 1235.4991806640626 1360.3872916281202 91.875 101.11822602580656
0 1236.9639953222656 17.328152487227214 92.75390625 4.290258067785163
0 1283.3262900390625 7.146366677539043 3.8965437597655637 14.01077505600043
0 1431.255040830078 1251.7474424893667 94.71674785156256 101.12901075597932
0 1430.7726089941407 1630.7969258238081 91.77738281250004 5.3978934344627785
0 1477.198360751953 1626.5442385087865 4.86328125 5.365623446172403
0 1528.974786533203 1789.3657435657651 90.85930644531254 103.20425816668744
0 1579.2491416015625 1898.0120122775286 3.828125 99.01077386239739
0 1625.6456455078126 1519.5280764679483 92.73437579101551 4.344172047279395
0 1627.1007041015625 1467.355033670012 91.875 100.0859898946889
0 1674.4591220703126 1358.7162488302274 4.853545517578141 102.13981699359162
0 1772.593927158203 1950.1948291506785 3.876963085937632 5.35483871502201
0 1871.709142001953 2003.349592810873 4.863222255859455 101.10751591723597
0 1871.2404211035157 1843.6894421635711 3.9062204980468778 5.33334387530102
YOLO labels are expressed as a proportion of the image width/height, so a label stays valid even if the image is resized. The label format is:
<class id> <x_center> <y_center> <width> <height>
So, for example, a label that says
0 150 50 450 100
on a 600x200 image would be
0 0.25 0.25 0.75 0.5
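A minimal sketch of that normalisation in Python (to_yolo is just an illustrative name):

```python
def to_yolo(cls, x, y, w, h, img_w, img_h):
    """Normalise an absolute-pixel box to YOLO's fractional label format."""
    return cls, x / img_w, y / img_h, w / img_w, h / img_h

# The example from above, on a 600x200 image
print(to_yolo(0, 150, 50, 450, 100, 600, 200))   # (0, 0.25, 0.25, 0.75, 0.5)
```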

Number of Channels-Matlab vs Opencv

I am using an image, the details of which I got using imfinfo in matlab are as follows:
Filename: 'dog.jpg'
FileModDate: '25-Mar-2011 15:54:00'
FileSize: 8491
Format: 'jpg'
FormatVersion: ''
Width: 194
Height: 206
BitDepth: 24
ColorType: 'truecolor'
FormatSignature: ''
NumberOfSamples: 3
CodingMethod: 'Huffman'
CodingProcess: 'Sequential'
Comment: {}
NewSubFileType: 0
BitsPerSample: [8 8 8]
PhotometricInterpretation: 'RGB'
ImageDescription: [1x13 char]
StripOffsets: 154
SamplesPerPixel: 3
RowsPerStrip: 206
StripByteCounts: 119892
It shows the number of channels is 3 (NumberOfSamples: 3), but when I find the number of channels in OpenCV using the following code, I get No. of channels = 1
Mat img = imread("dog.jpg", 0);
printf("No. of Channels = %d\n", img.channels());
Why is that? Please explain.
As @berak commented, by using 0 as the second parameter of imread(), you are loading the image as grayscale. Try loading it by passing a negative value (<0) to return the loaded image as-is (with alpha channel), or a positive value (>0) to return a 3-channel colour image.
Like:
Mat img = imread("dog.jpg", -1); // <0 Return the loaded image as is
^^

Lua: detect rising/falling edges in a bitfield

I'm calling a function that returns an integer representing a bitfield of 16 binary inputs; each of the colors can either be on or off.
I'm trying to create a function to get the changes between the old state and the new state,
e.g.
function getChanges(oldColors, newColors)
    sampleOutput = {white = "", orange = "added", magenta = "removed" .....}
    return sampleOutput
end
I've tried subtracting the oldColors from the newColors and the newColors from the oldColors, but this results in chaos when more than one value changes.
This is to detect rising/falling edges from multiple inputs.
Edit: there appears to be a subset of the Lua bit API available.
From the ComputerCraft wiki:
colors.white 1 0x1 0000000000000001
colors.orange 2 0x2 0000000000000010
colors.magenta 4 0x4 0000000000000100
colors.lightBlue 8 0x8 0000000000001000
colors.yellow 16 0x10 0000000000010000
colors.lime 32 0x20 0000000000100000
colors.pink 64 0x40 0000000001000000
colors.gray 128 0x80 0000000010000000
colors.lightGray 256 0x100 0000000100000000
colors.cyan 512 0x200 0000001000000000
colors.purple 1024 0x400 0000010000000000
colors.blue 2048 0x800 0000100000000000
colors.brown 4096 0x1000 0001000000000000
colors.green 8192 0x2000 0010000000000000
colors.red 16384 0x4000 0100000000000000
colors.black 32768 0x8000 1000000000000000
function getChanges(oldColors, newColors)
    local added = bit.band(newColors, bit.bnot(oldColors))
    local removed = bit.band(oldColors, bit.bnot(newColors))
    local color_names = {
        white = 1,
        orange = 2,
        magenta = 4,
        lightBlue = 8,
        yellow = 16,
        lime = 32,
        pink = 64,
        gray = 128,
        lightGray = 256,
        cyan = 512,
        purple = 1024,
        blue = 2048,
        brown = 4096,
        green = 8192,
        red = 16384,
        black = 32768
    }
    local diff = {}
    for cn, mask in pairs(color_names) do
        diff[cn] = bit.band(added, mask) ~= 0 and 'added'
            or bit.band(removed, mask) ~= 0 and 'removed' or ''
    end
    return diff
end
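For comparison, the same rising/falling-edge masks can be sketched in Python, with plain integers standing in for the 16-bit colour bitfield:

```python
def get_changes(old, new):
    """Return (rising, falling) bitmasks between two bitfield states."""
    rising = new & ~old      # bits that turned on (rising edge)
    falling = old & ~new     # bits that turned off (falling edge)
    return rising, falling

# white (1) turns on, magenta (4) turns off, orange (2) stays on
rising, falling = get_changes(0b0110, 0b0011)
print(bin(rising), bin(falling))   # 0b1 0b100
```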

Masking in DCT Compression

I am trying to do image compression using the DCT (Discrete Cosine Transform). Can someone please help me understand how masking affects bits per pixel in the DCT? How is the bit allocation done in the masking?
PS: By masking, I mean multiplying the DCT coefficients with a matrix like the one below (element wise multiplication, not matrix multiplication).
mask = [1 1 1 1 0 0 0 0
        1 1 1 0 0 0 0 0
        1 1 0 0 0 0 0 0
        1 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0]
Background on "Masking"
Compression using the DCT computes the DCT of blocks of an image, in this case 8x8 pixels. High-frequency components of an image are less important for human perception and can therefore be discarded to save space.
The mask matrix selects which DCT coefficients are kept and which are discarded. Coefficients towards the top-left corner represent low frequencies.
For more information visit Discrete Cosine Transform.
This looks like a variation of a quantization matrix.
Low frequencies are in the top left, high frequencies in the bottom right. The eye is more sensitive to low frequencies, so removing the high-frequency coefficients discards the less important details of the image.
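To make the effect concrete, here is a sketch of masking a single 8x8 block, with an orthonormal DCT-II built directly in Numpy so nothing beyond Numpy is assumed:

```python
import numpy as np

N = 8
# Orthonormal 1-D DCT-II basis matrix (row k = frequency k)
C = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
               np.cos(np.pi * (2 * n + 1) * k / (2 * N))
               for n in range(N)] for k in range(N)])

# The triangular mask from the question: keeps 10 of the 64 coefficients
mask = np.zeros((N, N))
for i in range(4):
    mask[i, :4 - i] = 1

block = np.arange(64, dtype=float).reshape(8, 8)   # dummy 8x8 pixel block

coeffs = C @ block @ C.T    # forward 2-D DCT
kept = coeffs * mask        # discard high-frequency coefficients
approx = C.T @ kept @ C     # inverse 2-D DCT gives the lossy reconstruction

# Only 10 of 64 coefficients survive, so a naive bit budget shrinks to 10/64
print(int(mask.sum()))      # 10
```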

OpenCV:Difference between a matrix with 1 column of 8UC3 type and 3 columns of 8UC1

Let's say I create a matrix M1 of 5 rows and 1 column of 8UC3 type to store the RGB components of an image. Then I create another matrix M2 of 5 rows and 3 columns of 8UC1 type to again store the RGB components of the image.
Is there a difference in the way these two types of matrices are stored in, or accessed from, memory? From what I understand from http://www.cs.iit.edu/~agam/cs512/lect-notes/opencv-intro/opencv-intro.html#SECTION00053000000000000000 (a commonly recommended OpenCV tutorial on Stack Overflow), the data pointer of the matrix points to the first index of the data array (the matrix is internally stored as an array), and the RGB components are stored in an interleaved fashion (in the case of 8UC3).
My logic says they should be the same: in the case of the 1-column 8UC3 (M1), each element stores the RGB components together, and in the case of the 3-column 8UC1 (M2), each column stores one component.
I hope I have been able to formulate my question well.
Thanks in advance!
Your understanding is correct. The memory layout will be exactly the same. So you can cheaply convert the representation back-and-forth via reshape method.
The thing that would be different is how OpenCV algorithms will handle those matrices.
Let's say the memory footprint is as follows:
255 0 0
255 0 0
255 0 0
255 0 0
255 0 0
And you want to call the resize function to double the number of columns. Then in the case of a 5x1 Mat of CV_8UC3, the result will be
255 0 0 255 0 0
255 0 0 255 0 0
255 0 0 255 0 0
255 0 0 255 0 0
255 0 0 255 0 0
And in case of a 5x3 Mat of CV_8UC1, the result will be
255 255 0 0 0 0
255 255 0 0 0 0
255 255 0 0 0 0
255 255 0 0 0 0
255 255 0 0 0 0
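The identical layout can be checked with plain Numpy (in Python, a CV_8UC3 Mat maps to a (rows, cols, 3) uint8 array):

```python
import numpy as np

# 5x1 "CV_8UC3": one 3-channel column, each pixel is (255, 0, 0)
M1 = np.zeros((5, 1, 3), dtype=np.uint8)
M1[:, :, 0] = 255

# 5x3 "CV_8UC1": three single-channel columns holding the same components
M2 = np.zeros((5, 3), dtype=np.uint8)
M2[:, 0] = 255

# Byte-for-byte identical in memory, so reshape converts between them for free
print(M1.tobytes() == M2.tobytes())           # True
print(np.array_equal(M1.reshape(5, 3), M2))   # True
```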
