I am using an image, the details of which I got using imfinfo in matlab are as follows:
Filename: 'dog.jpg'
FileModDate: '25-Mar-2011 15:54:00'
FileSize: 8491
Format: 'jpg'
FormatVersion: ''
Width: 194
Height: 206
BitDepth: 24
ColorType: 'truecolor'
FormatSignature: ''
NumberOfSamples: 3
CodingMethod: 'Huffman'
CodingProcess: 'Sequential'
Comment: {}
NewSubFileType: 0
BitsPerSample: [8 8 8]
PhotometricInterpretation: 'RGB'
ImageDescription: [1x13 char]
StripOffsets: 154
SamplesPerPixel: 3
RowsPerStrip: 206
StripByteCounts: 119892
It shows the number of channels as 3 (NumberOfSamples: 3), but when I query the number of channels in OpenCV using the following code, I get No. of Channels = 1:
Mat img = imread("dog.jpg", 0);
printf("No. of Channels = %d\n", img.channels());
Why is that? Please explain.
As @berak commented, by passing 0 as the second parameter of imread() you are loading the image as grayscale. Pass a negative value (<0) to return the loaded image as is (including any alpha channel), or a positive value (>0) to return a 3-channel color image.
Like:
Mat img = imread("dog.jpg", -1); // <0 Return the loaded image as is
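To double-check, you can print the channel count for each flag. Here is a quick sketch in Python (the cv2 flags map one-to-one to the C++ constants; "dog.jpg" is just the example image from the question):

import cv2

# <0 -> cv2.IMREAD_UNCHANGED: keep the file's own channels (incl. alpha)
#  0 -> cv2.IMREAD_GRAYSCALE: force 1 channel
# >0 -> cv2.IMREAD_COLOR:     force 3 channels (BGR)
for flag in (cv2.IMREAD_UNCHANGED, cv2.IMREAD_GRAYSCALE, cv2.IMREAD_COLOR):
    img = cv2.imread("dog.jpg", flag)
    # Grayscale images come back as 2-D arrays, hence 1 channel
    channels = 1 if img.ndim == 2 else img.shape[2]
    print("flag=%d: %d channel(s)" % (flag, channels))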
I'm trying to use a custom preprocessing function that uses OpenCV but there's a mismatch between the image loaded by the DataGenerator and the CV2 default type.
Is it possible to specify which function to use to load images?
Here is my code.
def read_and_process_image(im, im_size):
    # read image from file
    # im = cv2.imread(im)
    gray = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)  # convert to grayscale
    im_pil = Image.fromarray(gray)
    _, thresh = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)  # turn it into a binary image
    contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # find contours

    if len(contours) != 0:
        print("contour")
        # find the biggest area
        cnt = max(contours, key=cv2.contourArea)
        # find the bounding rect
        x, y, w, h = cv2.boundingRect(cnt)
        r = int(w * 0.12)
        crop = im[y + r:y + h - r, x + r:x + w - r]  # crop image
        crop = cv2.flip(crop, 40)
        # crop1 = cv2.resize(crop, (im_size, im_size))  # resize to im_size x im_size
        # crop1 = cv2.convertScaleAbs(crop, alpha=1, beta=0.0001)
        crop1 = normalize_histograms(crop)
        # clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        # crop1 = clahe.apply(crop1)
        return crop1
    else:
        return normalize_histograms(cv2.resize(im, (im_size, im_size)))
The preprocessing function passed to the generator:
IM_SIZE=256
def preprocessing_image(image):
    global IM_SIZE
    image = read_and_process_image(image, IM_SIZE)
    return image
and the DataGenerator:
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    featurewise_center=True,
    featurewise_std_normalization=True,
    preprocessing_function=preprocessing_image)

val_gen = train_datagen.flow_from_dataframe(
    dataframe=val_data,
    directory="D:/PROJECTS/MLPC2019/dataset/train/train",
    x_col="filename",
    y_col="label",
    class_mode="categorical",
    shuffle=False,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE)
plt.imshow(val_gen[0])
I get the following error:
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-130-c8fee3202272> in <module>
----> 1 plt.imshow(val_gen[0])
~\Anaconda3\lib\site-packages\keras_preprocessing\image\iterator.py in __getitem__(self, idx)
63 index_array = self.index_array[self.batch_size * idx:
64 self.batch_size * (idx + 1)]
---> 65 return self._get_batches_of_transformed_samples(index_array)
66
67 def __len__(self):
~\Anaconda3\lib\site-packages\keras_preprocessing\image\iterator.py in _get_batches_of_transformed_samples(self, index_array)
237 params = self.image_data_generator.get_random_transform(x.shape)
238 x = self.image_data_generator.apply_transform(x, params)
--> 239 x = self.image_data_generator.standardize(x)
240 batch_x[i] = x
241 # optionally save augmented images to disk for debugging purposes
~\Anaconda3\lib\site-packages\keras_preprocessing\image\image_data_generator.py in standardize(self, x)
702 """
703 if self.preprocessing_function:
--> 704 x = self.preprocessing_function(x)
705 if self.rescale:
706 x *= self.rescale
<ipython-input-101-3a910a8620ec> in preprocessing_image(image)
15 """
16 # TODO: augment more here
---> 17 image=read_and_process_image(image,IM_SIZE)
18 return image
<ipython-input-128-aa711687f072> in read_and_process_image(im, im_size)
8 im_pil = Image.fromarray(gray)
9 _,thresh = cv2.threshold(gray,10,255,cv2.THRESH_BINARY) # turn it into a binary image
---> 10 contours,hierarchy = cv2.findContours(thresh,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE) # find contours
11
12 if len(contours) != 0:
error: OpenCV(4.1.2) C:\projects\opencv-python\opencv\modules\imgproc\src\contours.cpp:197: error: (-210:Unsupported format or combination of formats) [Start]FindContours supports only CV_8UC1 images when mode != CV_RETR_FLOODFILL otherwise supports CV_32SC1 images only in function 'cvStartFindContours_Impl'
A cv2 image is nothing but a NumPy array.
You can easily turn a PIL image (which is what Keras uses) into a cv2 image by simply calling cv2_image = np.array(pil_image).
Since cv2 works with BGR instead of RGB, you may also need cv2_image = np.flip(cv2_image, axis=-1) (if there are 3 channels).
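In this case the array that the generator hands to preprocessing_function is float-valued, while cv2.findContours only accepts CV_8UC1 (single-channel, 8-bit) input, which is exactly what the traceback complains about. A minimal sketch of the cast (to_cv2_uint8 is a hypothetical helper name, and it assumes the generator delivers float pixel values in the 0-255 range):

import numpy as np

def to_cv2_uint8(image):
    # Hypothetical helper: Keras passes preprocessing_function a float
    # ndarray, but cv2.threshold/cv2.findContours want 8-bit data.
    return np.clip(image, 0, 255).astype(np.uint8)

# Inside read_and_process_image, convert before any cv2 call:
# im = to_cv2_uint8(im)
# gray = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)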
import cv2

def clear(img):
    back = cv2.imread("back.png", cv2.IMREAD_GRAYSCALE)
    img = cv2.bitwise_xor(img, back)
    ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
    return img

def threshold(img):
    ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    ret, img = cv2.threshold(img, 248, 255, cv2.THRESH_BINARY)
    return img

def fomatImage(img):
    img = threshold(img)
    img = clear(img)
    return img

img = fomatImage(cv2.imread("1566135246468.png", cv2.IMREAD_COLOR))
cv2.imwrite("aa.png", img)
This is my code. But when I tried to recognize the result with tesseract-ocr, I got a warning:
Warning: Invalid resolution 0 dpi. Using 70 instead.
How should I set up dpi?
AFAIK, OpenCV doesn't set the dpi of PNG files it writes, so you are looking at work-arounds. Here are some ideas...
Method 1 - Use PIL/Pillow instead of OpenCV
PIL/Pillow can write dpi information into PNG files. So you would:
Step 1 - Convert your BGR OpenCV image into RGB to match PIL's channel ordering
from PIL import Image
RGBimage = cv2.cvtColor(BGRimage, cv2.COLOR_BGR2RGB)
Step 2 - Convert the OpenCV Numpy array into a PIL Image
PILimage = Image.fromarray(RGBimage)
Step 3 - Write with PIL
PILimage.save('result.png', dpi=(72,72))
As Fred mentions in the comments, you could equally use Python Wand in much the same way.
Method 2 - Write with OpenCV but modify afterwards with some tool
You could use Python's subprocess module to shell out to, say, ImageMagick and set the dpi like this:
magick OpenCVImage.png -set units pixelspercentimeter -density 28.3 result.png
All you need to know is that PNG uses metric (dots per centimetre) rather than imperial (dots per inch) and there are 2.54cm in an inch, so 72 dpi becomes 28.3 dots per cm.
If your ImageMagick version is older than v7, replace magick with convert.
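If you want to stay in Python, here is a minimal sketch of that shell-out using the subprocess module (assuming magick is on your PATH; the file names are placeholders):

import subprocess

# Re-write the PNG with a density of 28.3 dots/cm (i.e. 72 dpi)
subprocess.run([
    "magick", "OpenCVImage.png",
    "-set", "units", "pixelspercentimeter",
    "-density", "28.3",
    "result.png",
], check=True)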
Method 3 - Write with OpenCV and insert dpi yourself
You could write your file to memory using OpenCV's imencode(), then search the file for the IDAT (image data) chunk, which is the one containing the image pixels, and insert a pHYs chunk before it to set the density. Then write to disk.
It's not that hard actually - it's just 9 bytes, see here and also look at pngcheck output at end of answer.
This code is not production tested but seems to work pretty well for me:
#!/usr/bin/env python3

import struct
import zlib
import numpy as np
import cv2

def writePNGwithdpi(im, filename, dpi=(72, 72)):
    """Save the image as PNG with embedded dpi"""

    # Encode as PNG into memory
    retval, buffer = cv2.imencode(".png", im)
    s = buffer.tostring()

    # Find start of IDAT chunk
    IDAToffset = s.find(b'IDAT') - 4

    # Create our lovely new pHYs chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11pHYs
    pHYs = b'pHYs' + struct.pack('!IIc', int(dpi[0] / 0.0254), int(dpi[1] / 0.0254), b"\x01")
    pHYs = struct.pack('!I', 9) + pHYs + struct.pack('!I', zlib.crc32(pHYs))

    # Open output filename and write...
    # ... stuff preceding IDAT as created by OpenCV
    # ... new pHYs as created by us above
    # ... IDAT onwards as created by OpenCV
    with open(filename, "wb") as out:
        out.write(buffer[0:IDAToffset])
        out.write(pHYs)
        out.write(buffer[IDAToffset:])

################################################################################
# main
################################################################################

# Load sample image
im = cv2.imread('lena.png')

# Save at specific dpi
writePNGwithdpi(im, "result.png", (32, 300))
Whichever method you use, you can run pngcheck -v image.png to check what you have done:
pngcheck -vv a.png
Sample Output
File: a.png (306 bytes)
  chunk IHDR at offset 0x0000c, length 13
    100 x 100 image, 1-bit palette, non-interlaced
  chunk gAMA at offset 0x00025, length 4: 0.45455
  chunk cHRM at offset 0x00035, length 32
    White x = 0.3127 y = 0.329,  Red x = 0.64 y = 0.33
    Green x = 0.3 y = 0.6,  Blue x = 0.15 y = 0.06
  chunk PLTE at offset 0x00061, length 6: 2 palette entries
  chunk bKGD at offset 0x00073, length 1
    index = 1
  chunk pHYs at offset 0x00080, length 9: 255x255 pixels/unit (1:1).  <-- THIS SETS THE DENSITY
  chunk tIME at offset 0x00095, length 7: 19 Aug 2019 10:15:00 UTC
  chunk IDAT at offset 0x000a8, length 20
    zlib: deflated, 2K window, maximum compression
    row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    (100 out of 100)
  chunk tEXt at offset 0x000c8, length 37, keyword: date:create
  chunk tEXt at offset 0x000f9, length 37, keyword: date:modify
  chunk IEND at offset 0x0012a, length 0
No errors detected in a.png (11 chunks, 76.5% compression).
While I was editing PNG chunks, I also managed to set a tIME chunk and a tEXt chunk with the Author. They go like this:
# Create a new tIME chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11tIME
year, month, day, hour, min, sec = 2020, 12, 25, 12, 0, 0   # Midday Christmas day 2020
tIME = b'tIME' + struct.pack('!HBBBBB', year, month, day, hour, min, sec)
tIME = struct.pack('!I', 7) + tIME + struct.pack('!I', zlib.crc32(tIME))

# Create a new tEXt chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11tEXt
Author = "Author\x00Sir Mark The Great"
tEXt = b'tEXt' + bytes(Author.encode('ascii'))
tEXt = struct.pack('!I', len(Author)) + tEXt + struct.pack('!I', zlib.crc32(tEXt))

# Open output filename and write...
# ... stuff preceding IDAT as created by OpenCV
# ... new pHYs as created by us above
# ... new tIME as created by us above
# ... new tEXt as created by us above
# ... IDAT onwards as created by OpenCV
with open(filename, "wb") as out:
    out.write(buffer[0:IDAToffset])
    out.write(pHYs)
    out.write(tIME)
    out.write(tEXt)
    out.write(buffer[IDAToffset:])
Keywords: OpenCV, PIL, Pillow, dpi, density, imwrite, PNG, chunks, pHYs chunk, Python, image, image-processing, tEXt chunk, tIME chunk, author, comment
I am trying to read text from an image.
I get better results if I break the image into small chunks, but the problem is that when I try to split the image, it cuts/slices through my characters.
The code I am using:
from __future__ import division
import math
import os
from PIL import Image

def long_slice(image_path, out_name, outdir, slice_size):
    """slice an image into parts slice_size tall"""
    img = Image.open(image_path)
    width, height = img.size
    upper = 0
    left = 0
    slices = int(math.ceil(height / slice_size))

    count = 1
    for slice in range(slices):
        # if we are at the end, set the lower bound to be the bottom of the image
        if count == slices:
            lower = height
        else:
            lower = int(count * slice_size)

        # set the bounding box! The important bit
        bbox = (left, upper, width, lower)
        working_slice = img.crop(bbox)
        upper += slice_size

        # save the slice
        working_slice.save(os.path.join(outdir, "slice_" + out_name + "_" + str(count) + ".png"))
        count += 1

if __name__ == '__main__':
    # slice_size is the max height of the slices in pixels
    long_slice("/python_project/screenshot.png", "longcat", os.getcwd(), 100)
Sample image: the image I want to process
Expected/what I am trying to do:
I want to split every line into a separate image without cutting the characters
Line 1:
Line 2:
Current result: characters in the image are cropped
I don't want to cut the image at fixed pixel heights, since each document will have different spacing and line widths
Thanks
Jk
Here is a solution that finds the brightest rows in the image (i.e., the rows without text) and then splits the image on those rows. So far I have just marked the sections, and am leaving the actual cropping up to you.
The algorithm is as follows:
Find the sum of the luminance (I am just using the red channel) of every pixel in each row
Find the rows whose sums are at least 0.999 times as bright as the brightest row (0.999 is the threshold I am using)
Mark those rows
Here is the code that will return a list of these rows:
def find_lightest_rows(img, threshold):
    line_luminances = [0] * img.height

    for y in range(img.height):
        for x in range(img.width):
            line_luminances[y] += img.getpixel((x, y))[0]

    line_luminances = [x for x in enumerate(line_luminances)]
    line_luminances.sort(key=lambda x: -x[1])
    lightest_row_luminance = line_luminances[0][1]

    lightest_rows = []
    for row, lum in line_luminances:
        if lum > lightest_row_luminance * threshold:
            lightest_rows.append(row)  # append, since lightest_rows is a list

    # sort back into top-to-bottom order, as in the output below
    return sorted(lightest_rows)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 ... ]
After colouring these rows red, we have this image:
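Since the cropping itself is left to the reader, here is a rough sketch of one way to finish the job with PIL: treat runs of non-bright rows as text bands and crop each band out. split_on_bright_rows is a hypothetical helper, and the file name is a placeholder:

from PIL import Image

def split_on_bright_rows(img, bright_rows):
    # Crop the image into horizontal bands separated by runs of bright rows.
    bright = set(bright_rows)
    bands = []
    start = None
    for y in range(img.height):
        if y not in bright and start is None:
            start = y  # entering a text band
        elif y in bright and start is not None:
            bands.append(img.crop((0, start, img.width, y)))
            start = None  # leaving a text band
    if start is not None:  # image ends inside a band
        bands.append(img.crop((0, start, img.width, img.height)))
    return bands

img = Image.open("screenshot.png")
for i, band in enumerate(split_on_bright_rows(img, find_lightest_rows(img, 0.999)), 1):
    band.save("line_%d.png" % i)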
From my understanding, image normalization means rescaling every pixel to a value between 0 and 1, am I right?
But what does the following code mean?
image_size = 28      # Pixel width and height.
pixel_depth = 255.0  # Number of levels per pixel.

for image in image_files:
    image_file = os.path.join(folder, image)
    try:
        image_data = (ndimage.imread(image_file).astype(float) -
                      pixel_depth / 2) / pixel_depth  # WHY ??
        if image_data.shape != (image_size, image_size):
            raise Exception('Unexpected image shape: %s' % str(image_data.shape))
        dataset[num_images, :, :] = image_data
        num_images = num_images + 1
    except IOError as e:
        print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')
Image normalization is merely the process of changing the range of pixel intensity values.
The choice of the new range is up to you.
In the case you've shown, it looks like the range -0.5 .. 0.5 has been chosen.
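A quick numeric check of that mapping (just a sketch with arbitrary pixel values):

import numpy as np

pixel_depth = 255.0
pixels = np.array([0.0, 127.5, 255.0])

normalized = (pixels - pixel_depth / 2) / pixel_depth
print(normalized)  # [-0.5  0.   0.5] -> 0 maps to -0.5, 255 maps to 0.5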
Apologies for a basic question. I have been checking out the for loops here and here, and if we analyse the first piece of code:
for (int i = 0; i < CFDataGetLength(pixelData); i += 4) {
    pixelBytes[i]     // red
    pixelBytes[i+1]   // green
    pixelBytes[i+2]   // blue
    pixelBytes[i+3]   // alpha
}
The variable i is being incremented from 0 to the length of the array pixelData, in steps of 4.
However how does pixelBytes[i+3] access the alpha channel of the image? So for example if i=5, how does pixelBytes[5+3] equal the alpha channel instead of just accessing the 8th element of pixelBytes?
If i starts at zero and is incremented by 4 each time, how can it ever equal 5?
Presumably, the data is stored with each channel occupying one byte: first red, then green, then blue, then alpha, then red again, and so on. The for loop mirrors this layout by incrementing i by four each time, so if pixelBytes[i+1] is the first green value on the first pass, on the second pass it will be four bytes later and thus the second green value.
Sometimes it helps to unroll the loop on a sheet of paper:
// First pixel
RGBA
^          Index 0 = i(0) + 0
 ^         Index 1 = i(0) + 1
  ^        Index 2 = i(0) + 2
   ^       Index 3 = i(0) + 3

i += 4

// Second pixel
RGBA RGBA
     ^     Index 4 = i(4) + 0
      ^    Index 5 = i(4) + 1
       ^   Index 6 = i(4) + 2
        ^  Index 7 = i(4) + 3

i += 4

// Third pixel
RGBA RGBA RGBA
          ^     Index 8  = i(8) + 0
           ^    Index 9  = i(8) + 1
            ^   Index 10 = i(8) + 2
             ^  Index 11 = i(8) + 3
You have colours stored in the RGBA format. In the RGBA format, one colour is stored in 4 bytes, the first byte being the value for red (R), second is green (G), third is blue (B), and last is alpha (A).
Your own code explains this pretty well in its comments:
pixelBytes[i] // red
pixelBytes[i+1] // green
pixelBytes[i+2] // blue
pixelBytes[i+3] // alpha
It is important to note though, that if i is not a multiple of 4, you're not going to be reading the colours correctly anymore.
While the code isn't shown, pixelBytes is most likely an array whose size is the total number of colours times 4, which is the same as the total number of bytes used to represent the colours (since each colour is stored in 4 bytes).
A typical 32 bit pixel consists of four channels, alpha, red, green and blue.
My guess is that pixelbytes is a byte buffer of these, so:
pixelbuffer[0] = r
pixelbuffer[1] = g
pixelbuffer[2] = b
pixelbuffer[3] = a
as your code says.
On each iteration, the loop advances the counter by four bytes (4 × 8 bits = 32 bits), which is exactly the offset to the next 32-bit pixel. The individual components can then be accessed through a byte offset (i + <0-3>).
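To make the indexing concrete, here is a small sketch in Python with a made-up two-pixel buffer (the values are arbitrary):

# Two RGBA pixels: opaque red, then half-transparent blue.
pixel_bytes = bytes([255, 0, 0, 255,   # pixel 0: R, G, B, A
                     0, 0, 255, 128])  # pixel 1: R, G, B, A

for i in range(0, len(pixel_bytes), 4):  # step 4 = one pixel per iteration
    r, g, b, a = pixel_bytes[i:i + 4]
    print("pixel %d: R=%d G=%d B=%d A=%d" % (i // 4, r, g, b, a))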