I have an image and I would like to detect most of the lines in it. I am using the HoughLines function in OpenCV.
My code looks like this:
import cv2 as cv
import numpy as np

src_img = cv.imread("./snapshot_0711.png")
gray_img = cv.cvtColor(src_img, cv.COLOR_BGR2GRAY)
kernel_size = 5
blur_img = cv.GaussianBlur(gray_img, (kernel_size, kernel_size),0)
dst_img = cv.Canny(blur_img, 100, 250, None, 3)
cv.imshow("dt", dst_img)
lines = cv.HoughLines(dst_img, 1, np.pi * 1 / 180, 100, None, 0, 0)
It detected many lines, but the result (first case) is not satisfactory:
Is there anything wrong in my code? Do I need a preprocessing step?
Using the LineSegmentDetector recommended by @Micka, the result looks like this (second case):
I want to improve the first case. In the second case's output, is there a way to connect the small segments?
Thanks.
I keep running into this error and could not find a definitive solution for it. My inputs are 48x48, which does not match the input shape that resnet101 expects. How can I adapt my input to fit resnet101? My code is below; it should help explain the problem.
if __name__ == "__main__":
    vid = cv2.VideoCapture(0)
    emotions = []
    while vid.isOpened():
        image = cv2.imread("/home/berkay/Desktop/angry_man.jpg")
        _, frame = vid.read()
        # convert the image to grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # detect faces with the Haar cascade face detector
        faces = face_cascade.detectMultiScale(frame)
        for (x, y, w, h) in faces:
            # take the region of interest of the face only, in gray
            roi_gray = gray[y:y+h, x:x+w]
            resized = cv2.resize(roi_gray, (48, 48))  # resize to a 48x48 image
            # predict the mood
            img = img2tensor(resized)
            prediction = predict(img)
At that point, I get this error:
weight of size [64, 3, 7, 7], expected input[1, 1, 229, 229] to have 3 channels, but got 1 channels instead
How can I fix this? Thanks in advance.
You can modify the input layer of resnet so that it accepts single-channel tensor inputs:
In [1]: import torch
   ...: from torch import nn
   ...: from torchvision.models import resnet101
In [2]: model = resnet101()
In [3]: model.conv1 = nn.Conv2d(1, 64, kernel_size=(2, 2))
In [4]: model(torch.rand(10, 1, 48, 48))
Out[4]:
tensor([[-0.5015, 0.6124, 0.1370, ..., 1.2181, -0.4707, 0.3285],
[-0.4776, 1.1027, 0.0161, ..., 0.6363, -0.4733, 0.6218],
[-0.3935, 0.8276, -0.0316, ..., 0.6853, -0.4735, 0.6424],
...,
[-0.2986, 1.1758, 0.0158, ..., 0.7422, -0.4422, 0.4792],
[-0.2668, 0.7884, -0.1205, ..., 1.1445, -0.6249, 0.6697],
[-0.2139, 1.0412, 0.2326, ..., 0.8332, -0.8744, 0.4827]],
grad_fn=<AddmmBackward0>)
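(You will probably want to adjust the kernel size, stride, and padding as well; the original conv1 in torchvision's resnet101 uses a 7x7 kernel with stride 2 and padding 3.)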
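A different route from the one above, sketched here only as an assumption about what fits your pipeline, is to leave the network untouched and replicate the grayscale channel three times so the input has the 3 channels that conv1 expects. The helper name gray_to_three_channels is hypothetical, and the random array stands in for the 48x48 face crop:

```python
import numpy as np

def gray_to_three_channels(gray):
    """Turn an (H, W) grayscale array into a (1, 3, H, W) batch.

    Hypothetical helper: the same gray values are copied into
    all three channels, so no new information is added.
    """
    gray = np.asarray(gray, dtype=np.float32)
    stacked = np.repeat(gray[np.newaxis, :, :], 3, axis=0)  # (3, H, W)
    return stacked[np.newaxis, ...]                         # (1, 3, H, W)

# Stand-in for the resized 48x48 face region
face = np.random.randint(0, 256, size=(48, 48), dtype=np.uint8)
batch = gray_to_three_channels(face)
print(batch.shape)
```

The resulting array could then be handed to torch.from_numpy before calling the model. Whether this or a single-channel conv1 works better depends on whether you want to keep pretrained first-layer weights.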
I want to split an image of a table at the vertical lines into three images as shown below. Is it possible? The width of each column is variable. And the sad thing is that the left vertical line is drawn down from the header as you can see.
Input image (input.png)
Output image (output1.png)
Output image (output2.png)
Output image (output3.png)
Update 1
And the sad thing is that the left vertical line is drawn down from the header as you can see.
By that I mean that the following image B would be easier to split, but my case is A.
Update 2
I am trying the approach @HansHirse suggested. I expect sub_image_1.png, sub_image_2.png and sub_image_3.png to be stored in the out folder, but no luck so far. I'm looking into it.
https://github.com/zono/ocr/blob/16fd0ec9a2c7d2e26279ec53947fe7fbab9f526d/src/opencv.py
$ git clone https://github.com/zono/ocr.git
$ cd ocr
$ git checkout 16fd0ec9a2c7d2e26279ec53947fe7fbab9f526d
$ docker-compose up -d
$ docker exec -it ocr /bin/bash
$ python3 opencv.py
Since your table is perfectly aligned, you can inverse binary threshold your image, and count (white) pixels along the y-axis to detect the vertical lines:
You'll need to clean the peaks, since you might get plateaus for the thicker lines.
That'd be my idea in Python OpenCV:
import cv2
import numpy as np
from skimage import io # Only needed for web reading images
# Web read image via scikit-image; convert to OpenCV's BGR color ordering
img = cv2.cvtColor(io.imread('https://i.stack.imgur.com/BTqBs.png'), cv2.COLOR_RGB2BGR)
# Inverse binary threshold grayscale version of image
img_thr = cv2.threshold(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 128, 255, cv2.THRESH_BINARY_INV)[1]
# Count pixels along the y-axis, find peaks
thr_y = 200
y_sum = np.count_nonzero(img_thr, axis=0)
peaks = np.where(y_sum > thr_y)[0]
# Clean peaks
thr_x = 50
temp = np.diff(peaks).squeeze()
idx = np.where(temp > thr_x)[0]
peaks = np.concatenate(([0], peaks[idx+1]), axis=0) + 1
# Save sub-images
for i in np.arange(peaks.shape[0] - 1):
    cv2.imwrite('sub_image_' + str(i) + '.png', img[:, peaks[i]:peaks[i+1]])
I get the following three images:
As you can see, you might want to modify the selection by +/- 1 pixel, if an actual line is only 1 pixel wide.
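The plateau clean-up mentioned above can also be done by grouping consecutive column indices and keeping one centre per group. This is only a sketch of that idea, not the code used above; the function name plateau_centers and the sample indices are made up for illustration:

```python
import numpy as np

def plateau_centers(indices):
    """Collapse runs of consecutive column indices into one centre each.

    Hypothetical helper: a thick line shows up as several adjacent
    columns above the threshold; each such run yields one cut position.
    """
    indices = np.asarray(indices)
    if indices.size == 0:
        return []
    # A new run starts wherever the gap to the previous index is > 1
    breaks = np.where(np.diff(indices) > 1)[0] + 1
    runs = np.split(indices, breaks)
    return [int(run.mean()) for run in runs]

# Made-up column indices: three thick lines and one thin line
peaks = np.array([0, 1, 2, 150, 151, 400, 401, 402, 640])
centers = plateau_centers(peaks)
print(centers)
```

Using the centre of each run (rather than its first column) is a design choice; for cropping you may prefer the last column of the left run and the first column of the right run so that no line pixels end up in the sub-images.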
Hope that helps!
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.8.1
NumPy: 1.18.1
OpenCV: 4.2.0
----------------------------------------
OpenCV has a line detection function, HoughLines.
You can filter the lines that are returned by passing min_theta and max_theta, both in radians. Note that theta is the angle of the line's normal, so vertical lines have theta close to 0 (or close to pi if they lean the other way), while theta around pi/2 (88 to 92 degrees) selects horizontal lines. For vertical lines with a small margin you could use, for example, 0 and np.deg2rad(2).
This is an edited sample taken from the OpenCV documentation:
import sys
import math
import cv2 as cv
import numpy as np

def main(argv):
    default_file = 'img.png'
    filename = argv[0] if len(argv) > 0 else default_file
    # Loads an image
    src = cv.imread(cv.samples.findFile(filename), cv.IMREAD_GRAYSCALE)
    # Some preparation of the photo: edge detection
    dst = cv.Canny(src, 50, 200, None, 3)
    # Copy edges to the images that will display the results in BGR
    cdst = cv.cvtColor(dst, cv.COLOR_GRAY2BGR)
    cdstP = np.copy(cdst)
    # srn and stn stay 0; the last two arguments are min_theta and max_theta in radians
    lines = cv.HoughLines(dst, 1, np.pi / 180, 150, None, 0, 0, 0, np.deg2rad(2))
You can get the x, y coordinates of each line and draw it using the following code.
    if lines is not None:
        for i in range(0, len(lines)):
            # Each entry is [[rho, theta]]
            rho = lines[i][0][0]
            theta = lines[i][0][1]
            a = math.cos(theta)
            b = math.sin(theta)
            x0 = a * rho
            y0 = b * rho
            pt1 = (int(x0 + 1000*(-b)), int(y0 + 1000*(a)))
            pt2 = (int(x0 - 1000*(-b)), int(y0 - 1000*(a)))
            cv.line(cdst, pt1, pt2, (0,0,255), 3, cv.LINE_AA)
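To see how (rho, theta) maps to a drawable line, here is a small OpenCV-free sketch of the same conversion. In this parameterization theta is the angle of the line's normal, so theta = 0 gives a vertical line at x = rho and theta = pi/2 gives a horizontal line at y = rho. The function name polar_to_endpoints and the sample values are illustrative only:

```python
import math

def polar_to_endpoints(rho, theta, half_length=1000):
    """Convert a (rho, theta) line to two far-apart points on it.

    (x0, y0) is the point on the line closest to the origin;
    (-sin(theta), cos(theta)) is the direction along the line.
    """
    a, b = math.cos(theta), math.sin(theta)
    x0, y0 = a * rho, b * rho
    pt1 = (int(round(x0 - half_length * b)), int(round(y0 + half_length * a)))
    pt2 = (int(round(x0 + half_length * b)), int(round(y0 - half_length * a)))
    return pt1, pt2

vertical = polar_to_endpoints(120, 0.0)            # both points share x
horizontal = polar_to_endpoints(80, math.pi / 2)   # both points share y
print(vertical, horizontal)
```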
Alternatively, you can use HoughLinesP, which lets you specify a minimum line length and so helps with filtering. The lines are also returned as x, y pairs for each end point, which makes them easier to work with.
    linesP = cv.HoughLinesP(dst, 1, np.pi / 180, 50, None, 50, 10)
    if linesP is not None:
        for i in range(0, len(linesP)):
            # Each entry is [[x1, y1, x2, y2]]
            l = linesP[i][0]
            cv.line(cdstP, (l[0], l[1]), (l[2], l[3]), (0,0,255), 3, cv.LINE_AA)
    cv.imshow("Source", src)
    cv.imshow("Detected Lines (in red) - Standard Hough Line Transform", cdst)
    cv.imshow("Detected Lines (in red) - Probabilistic Line Transform", cdstP)
    cv.waitKey()
    return 0

if __name__ == "__main__":
    main(sys.argv[1:])
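To crop your image, you can take the x coordinates of the lines you detected and use NumPy slicing.
# Sort the x coordinates so that neighbouring lines bound one column
xs = sorted(int(l[0][0]) for l in linesP)
for x_left, x_right in zip(xs[:-1], xs[1:]):
    if x_right > x_left:  # skip duplicate detections of the same line
        # src is the image loaded above: all rows, columns between two lines
        column = src[:, x_left:x_right]
        cv.imshow('slice', column)
        cv.waitKey(0)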
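As a quick check of the slicing itself, here is a sketch on a synthetic array, with no line detection involved. The function name split_columns and the x coordinates are made up; in practice the coordinates would come from the detected lines plus the image borders:

```python
import numpy as np

def split_columns(image, x_coords):
    """Cut an image into vertical strips between sorted x coordinates.

    Hypothetical helper: duplicates are dropped and the coordinates
    are sorted, so the input order does not matter.
    """
    xs = sorted(set(int(x) for x in x_coords))
    return [image[:, left:right] for left, right in zip(xs[:-1], xs[1:])]

# Synthetic 40x300 BGR image standing in for the table
image = np.zeros((40, 300, 3), dtype=np.uint8)
columns = split_columns(image, [200, 0, 90, 300])
widths = [c.shape[1] for c in columns]
print(widths)
```

Note that NumPy indexes images as [rows, columns], so the x range goes in the second position.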
I am trying to read text from an image.
I get better results if I break the image into small chunks, but the problem is that when I split the image, it cuts through my characters.
The code I am using:
from __future__ import division
import math
import os
from PIL import Image
def long_slice(image_path, out_name, outdir, slice_size):
    """slice an image into parts slice_size tall"""
    img = Image.open(image_path)
    width, height = img.size
    upper = 0
    left = 0
    slices = int(math.ceil(height/slice_size))
    count = 1
    for slice in range(slices):
        #if we are at the end, set the lower bound to be the bottom of the image
        if count == slices:
            lower = height
        else:
            lower = int(count * slice_size)
        #set the bounding box! The important bit
        bbox = (left, upper, width, lower)
        working_slice = img.crop(bbox)
        upper += slice_size
        #save the slice
        working_slice.save(os.path.join(outdir, "slice_" + out_name + "_" + str(count)+".png"))
        count +=1

if __name__ == '__main__':
    #slice_size is the max height of the slices in pixels
    long_slice("/python_project/screenshot.png","longcat", os.getcwd(), 100)
Sample image (the image I want to process):
Expected result (what I am trying to do):
I want to split every line of text into a separate image without cutting through the characters.
Line 1:
Line 2:
Current result: the characters in the image are cropped.
I don't want to cut the image at fixed pixel offsets, since each document will have different spacing and line widths.
Thanks
Jk
Here is a solution that finds the brightest rows in the image (i.e., the rows without text) and then splits the image on those rows. So far I have just marked the sections, and am leaving the actual cropping up to you.
The algorithm is as follows:
1. Find the sum of the luminance (I am just using the red channel) of every pixel in each row.
2. Find the rows whose sums are at least 0.999 times (the threshold I am using) the sum of the brightest row.
3. Mark those rows.
Here is the code that will return a list of these rows:
def find_lightest_rows(img, threshold):
    # Sum the red channel of every pixel in each row
    line_luminances = [0] * img.height
    for y in range(img.height):
        for x in range(img.width):
            line_luminances[y] += img.getpixel((x, y))[0]
    # Pair each sum with its row index and sort brightest first
    line_luminances = [x for x in enumerate(line_luminances)]
    line_luminances.sort(key=lambda x: -x[1])
    lightest_row_luminance = line_luminances[0][1]
    lightest_rows = []
    for row, lum in line_luminances:
        if(lum > lightest_row_luminance * threshold):
            lightest_rows.append(row)
    return lightest_rows
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 ... ]
After colouring these rows red, we have this image:
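For the cropping step that was left open, one possible sketch is to turn the list of light rows into (top, bottom) bands of text rows. This is an assumption about how you might finish it, not part of the code above; the function name text_bands and the sample row list are hypothetical:

```python
def text_bands(lightest_rows, height):
    """Return (top, bottom) row ranges that are NOT in lightest_rows.

    Hypothetical helper: each band covers rows top..bottom-1 and
    corresponds to one block of text between blank rows.
    """
    light = set(lightest_rows)
    bands = []
    start = None
    for y in range(height):
        if y not in light and start is None:
            start = y                    # first dark row opens a band
        elif y in light and start is not None:
            bands.append((start, y))     # first light row closes it
            start = None
    if start is not None:
        bands.append((start, height))    # band runs to the bottom edge
    return bands

# Made-up example: a 25-row image with blank rows 0-2, 10-11 and 20
bands = text_bands([0, 1, 2, 10, 11, 20], height=25)
print(bands)
```

With Pillow, each band could then be cut out with img.crop((0, top, img.width, bottom)). You may want to pad each band by a few rows so ascenders and descenders are not clipped.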
Imagine we have a DoubleTensor of size 5x32x3000 and we want to convert it to a DoubleTensor of size 5x32x100 to feed into further layers. What I would do is the following:
local seq = nn.Sequential()
seq:add(nn.SplitTable(1))
seq:add(nn.MapTable():add(nn.Linear(3000,100)))
seq:add(nn.JoinTable(1)):add(nn.View(5,32,100))
This looks a bit complicated, I feel like there should be a more efficient way. Can you come up with a better solution?
I have tried this; it outputs size (5, 32, 100) as you wanted:
data = torch.Tensor(5, 32, 3000)
mul = torch.Tensor(3000, 100)
res = torch.mm(data:view(5*32, 3000), mul):view(5, 32, 100)
print(res:size())
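The same idea (flatten the two leading dimensions, multiply once, reshape back) can be checked in NumPy. This is only an analogue for illustration, not Torch code, and the random weight matrix stands in for a learned nn.Linear weight without a bias:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 32, 3000))
weight = rng.standard_normal((3000, 100))   # stand-in for the linear layer's weights

# Flatten (5, 32) into one axis, do a single matrix product, reshape back
flat = data.reshape(5 * 32, 3000) @ weight
res = flat.reshape(5, 32, 100)

# Broadcasting matmul over the leading axis gives the same result
direct = data @ weight
print(res.shape, np.allclose(res, direct))
```

The point is that a linear map applied to the last dimension does not care how the leading dimensions are grouped, so no split/map/join is needed.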
Another way could be:
seq = nn.Sequential()
seq:add(nn.SplitTable(1)):add(nn.MapTable():add(nn.Linear(3000,100)))
seq:add(nn.JoinTable(1))
from turtle import *
color('red', 'yellow')
begin_fill()
while True:
    forward(200)
    left(170)
    if abs(pos()) < 1:
        break
end_fill()
done()
I do not understand this part of the code: if abs(pos()) < 1:. What does it mean?
This code draws a star with red lines & filled with yellow. The abs(pos()) < 1 statement is used to compare the current turtle location with the original starting turtle position after executing each iteration of the while statement. If the turtle position is less than 1 unit away, the while statement terminates and the end_fill() statement executes to complete the yellow color fill.
Comment out the if statement and watch what happens, also, experiment with different numbers in the abs(pos())<1 expression, including 10, 20, 30, etc. to see the effect.
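pos() returns the turtle's position as a 2D vector, and abs() of that vector is its length, so abs(pos()) is the turtle's distance from the origin (0, 0). if abs(pos()) < 1: therefore means the turtle has come back to the starting point. Hope that clarifies it.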
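To see when that condition first becomes true without opening a turtle window, the loop can be simulated with plain trigonometry. This is a sketch under the assumption that the turtle starts at (0, 0) facing east, as in the question; the function name steps_until_home is made up:

```python
import math

def steps_until_home(length=200.0, turn_deg=170.0, tol=1.0, max_steps=1000):
    """Count forward/left steps until the distance from (0, 0) drops below tol.

    Mirrors the loop in the question: move forward, turn left,
    then check the distance from the origin.
    """
    x = y = 0.0
    heading = 0.0  # degrees, 0 = east
    for step in range(1, max_steps + 1):
        x += length * math.cos(math.radians(heading))
        y += length * math.sin(math.radians(heading))
        heading = (heading + turn_deg) % 360.0
        if math.hypot(x, y) < tol:   # same test as abs(pos()) < 1
            return step
    return None

steps = steps_until_home()
print(steps)
```

For a 170 degree turn this reports 36 steps, which matches the fact that 36 turns of 170 degrees add up to a whole number of full rotations.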
Another option is to use 'if t.heading() == 0:'. If my understanding is correct, when t.heading() == 0 the turtle is facing east, the direction it started drawing from. This has worked for all the angles I've tried so far.
Using 'if abs(pos()) < 1:', I can only draw images at the origin (0, 0). (Maybe there's a way to draw images at other locations using 'if abs(pos()) < 1:', but I haven't figured out how.)
Using 'if t.heading() == 0:', I can draw images anywhere on the screen.
import turtle
wn = turtle.Screen()
wn.title("Drawing Geometric Shapes")
t = turtle.Turtle()
t.color('red', 'yellow')
t.speed(0)
#=====================================
def star(x, y, length, angle):
    t.penup()
    t.goto(x, y)
    t.pendown()
    t.begin_fill()
    while True:
        t.forward(length)
        t.left(angle)
        if t.heading() == 0: #================
            break
    t.end_fill()
# ( x, y, length, angle)
star(-470, 300, 100, 120)
star( 360, 320, 100, 160)
star(-450, -340, 100, 100)
star( 360, -340, 100, 170)
star(-360, 0, 750, 178)
t.penup()
t.goto(-500, 0)