So, I'm a very experienced developer trying to get into some machine learning/neural network code.
Essentially I need a HUGE dataset, so my first problem is finding a way to label a lot of images quickly. Take this as an example.
I was thinking I could use template matching, searching the main image for the smaller plate image shown below it. That way I would only need to get permission to use this data, and I could label it very quickly.
When using OpenCV (following its examples) I get some very odd results that don't find the plate: it does draw boxes, but not around the plate. In testing it got very close a few times, but not often. The code is:
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('./image2.jpg', cv.IMREAD_GRAYSCALE)
img2 = img.copy()
template = cv.imread('./Plate2.test.png', cv.IMREAD_GRAYSCALE)
w, h = template.shape[::-1]

# All the 6 methods for comparison in a list
methods = ['TM_CCOEFF', 'TM_CCOEFF_NORMED', 'TM_CCORR',
           'TM_CCORR_NORMED', 'TM_SQDIFF', 'TM_SQDIFF_NORMED']

for meth in methods:
    img = img2.copy()
    method = getattr(cv, meth)  # look up the constant by name instead of using eval()
    # Apply template matching
    res = cv.matchTemplate(img, template, method)
    min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
    # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take the minimum
    if method in [cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv.rectangle(img, top_left, bottom_right, 255, 2)
    plt.subplot(121), plt.imshow(res, cmap='gray')
    plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
    plt.subplot(122), plt.imshow(img, cmap='gray')
    plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
    plt.suptitle(meth)
    plt.show()
My first guess is that this isn't working because the plate in the main image is oriented differently from the template (plain template matching slides the template over the image without rotating or rescaling it).
The second thing I should point out is that I am NOT a Python programmer, so I'm learning that too, and this is my first time touching OpenCV, so I'm trying to apply what I DO understand about object detection to things I don't properly understand.
What I want is to get the bounding-box coordinates of the smaller plate within the MAIN image. That way I can (with permission) quickly create a decent dataset to train on; otherwise, I have to label it all manually :-(
Any help would be greatly appreciated. I have a lot of examples working, but this was an interesting problem I couldn't find any reading on.
In my mind the steps are:
1) Find the plate and create a bounding box.
2) Train on the dataset, across as many images as possible, for object detection on said plates.
3) When testing the plate needs extracting from the main image and then a perspective transform applying.
4) If you wanted to, then you'd do text extraction once you've got the plate flattened out.
UPDATE:
So I tried SIFT, following an online example, and the results are as follows (note this image is already in the public domain from the website above). Not quite on target!
UPDATE 2
I've managed to cobble together a solution from an article, as JD suggested in the comments. Essentially, it lets me label enough images to train a neural network that, in turn, is much better at detecting the plates. I'll post an update with the answer soon.
I have an image of a text document. It includes text and block-schemes (flowcharts). The main problem is detecting the block-schemes. I think there are two approaches to this task: 1) detect the geometric primitives that make up the scheme; 2) detect the whole scheme.
How can I solve this task? Please suggest some approaches.
UPDATE 1
I am trying to detect where in the document the block-scheme is placed. An example is shown in the picture below. I am not trying to detect the text inside the block-scheme.
UPDATE 2 The main problem is that I need to find block-schemes of many different kinds, and even partial block-schemes.
You can do either 1) object detection or 2) semantic segmentation. I would suggest segmentation, because boundary extraction is crucial for your application.
I'm assuming you have the pages of the documents as images.
The following are the steps involved in a segmentation project.
Dataset
Collect the images of the pages you need to solve your problem and apply preprocessing steps such as resizing, to bring all images in your dataset to a common shape and to reduce the number of computations performed. Be sure to maintain variability in your samples.
Next, annotate the regions of the images that you are interested in and mark each with a name. This amounts to assigning a class (as in classification) to certain regions of the image. You can use the following tools for this:
Labelme -- (my recommendation)
VGG Image Annotator -- (a highly portable tool written in HTML, but with fewer features than Labelme)
Model
You can use the U-Net model for your task (see the U-Net paper). It is easy to implement, but performs robustly on many real-world tasks such as yours.
We have done something similar at work and described it in a blog post, which explains in detail each step of the pipeline, from the data collection stage to the results.
Literature on Document Layout Analysis.
https://arxiv.org/pdf/1804.10371.pdf -- The authors use U-Net with a ResNet-50 encoder and achieve very good results compared to previous approaches.
https://github.com/leonlulu/DeepLayout -- A Python implementation of a page layout analysis tool using a DeepLab v2 model, which does semantic segmentation.
Conclusion
The approach presented here might seem tedious and time-consuming, but it is robust to variability in the documents at test time. Comment below if you have any questions.
I would prefer to have more examples of the types of diagram you are searching for, but based on the example you have given, here is my attempt at a naive solution.
1) Resize image to a manageable size to improve speed and reduce operations.
2) Use morphological open to cluster all the dark objects together.
3) Binarize the dark objects.
4) Label the objects using OpenCV connected components. This gives us the bounding box of each region.
5) Cluster overlapping bounding boxes together.
6) Analyze each bounding box to find the one containing the diagram. Here you could apply a more sophisticated algorithm, like box detection or even arrow detection, but for your example I think a simple box ratio is sufficient.
Here is the code for the implementation:
import cv2
import numpy as np

# Fill every component's bounding box (label 0 is the background and is skipped)
def fill_rects(image, stats):
    for i, stat in enumerate(stats):
        if i > 0:
            p1 = (int(stat[0]), int(stat[1]))
            p2 = (int(stat[0] + stat[2]), int(stat[1] + stat[3]))
            cv2.rectangle(image, p1, p2, 255, -1)

# image name
img_name = 'test_image.png'
# Load image file as grayscale and blur it
diagram = cv2.imread(img_name, cv2.IMREAD_GRAYSCALE)
diagram = cv2.blur(diagram, (5, 5))
fScale = 0.25
# Make it smaller to speed everything up and make clustering easier
small_img = cv2.resize(diagram, (0, 0), fx=fScale, fy=fScale)
img_h, img_w = np.shape(small_img)
# Morphological open (erode, then dilate): on dark-on-white content this fills
# small light gaps, clustering nearby dark objects together
fat_img = cv2.morphologyEx(small_img, cv2.MORPH_OPEN, None, iterations=1)
# Threshold strong (dark) signals; THRESH_BINARY_INV makes them white
_, bin_img = cv2.threshold(fat_img, 210, 255, cv2.THRESH_BINARY_INV)
# Analyse connected components
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(bin_img)
# Cluster all the intersecting bounding boxes together by filling them in
rsmall, csmall = np.shape(small_img)
new_img1 = np.zeros((rsmall, csmall), dtype=np.uint8)
fill_rects(new_img1, stats)
# Analyse the new connected components to get the merged (filled) regions
num_labels_new, labels_new, stats_new, centroids_new = cv2.connectedComponentsWithStats(new_img1)
# Check for regions that satisfy the conditions corresponding to a diagram
min_dia_width = img_w * 0.1
dia_regions = []
for i, stat in enumerate(stats_new):  # analyse the merged regions from step 5
    if i > 0:
        # get basic dimensions
        x, y, w, h = stat[0:4]
        # calculate ratio
        ratio = w / float(h)
        # if condition met, save in list (scaled back to full-size coordinates)
        if ratio < 1 and w > min_dia_width:
            dia_regions.append((x / fScale, y / fScale, w / fScale, h / fScale))

# For display purposes
diagram_disp = cv2.imread(img_name)
for region in dia_regions:
    x, y, w, h = (int(v) for v in region)
    cv2.rectangle(diagram_disp, (x, y), (x + w, y + h), (0, 255, 0), 2)

labels_disp = np.uint8(200 * labels / np.max(labels)) + 50
labels_disp2 = np.uint8(200 * labels_new / np.max(labels_new)) + 50
cv2.imshow('small_img', small_img)
cv2.imshow('fat_img', fat_img)
cv2.imshow('bin_img', bin_img)
cv2.imshow("labels", labels_disp)
cv2.imshow("labels_disp2", labels_disp2)
cv2.imshow("diagram_disp", diagram_disp)
cv2.waitKey(0)
Here is the result for another type of input.
One of the problems I am working on is OCR on documents. A few of the paystub documents have a highlighted line with dots to set off important elements such as Gross Pay, Net Pay, etc.
These dots cause erroneous OCR results: the engine reads them as ':' characters and doesn't give the desired output. I have tried many image-processing approaches, e.g. with ImageMagick, to remove these dots, but in each case the quality of all the text is degraded, resulting in poor OCR.
The ImageMagick command I have tried is:
convert mm150.jpg -kuwahara 3 mm2.jpg
I have also tried connected components, erosion with kernels, etc., but each method fails in some way.
I would like to know if there is a method I should follow, or whether I am missing some image-processing capability.
This issue can be resolved using OpenCV's connectedComponentsWithStats function. I found the idea in the question "How do I remove the dots / noise without damaging the text?"
I changed it a bit to fit my needs, and this is the code that gave me the desired output:
import cv2
import numpy as np
import sys

img = cv2.imread(sys.argv[1], cv2.IMREAD_GRAYSCALE)
# Dark text and dots become the white (255) foreground
_, blackAndWhite = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(blackAndWhite, 4, cv2.CV_32S)
sizes = stats[1:, cv2.CC_STAT_AREA]  # area of every component except the background (label 0)
img2 = np.zeros(labels.shape, np.uint8)
for i in range(0, nlabels - 1):
    if sizes[i] >= 8:  # filter out small dotted regions
        img2[labels == i + 1] = 255
res = cv2.bitwise_not(img2)
cv2.imwrite('res.jpg', res)
The output file is clean, with the dotted band removed, and it gives perfect OCR results.
I'm an experienced developer, new to Machine Learning. I'm experimenting with Keras/TensorFlow, starting with the mnist_mlp.py example. I installed Keras and TensorFlow using pip on a Mac.
In order to understand the inner workings better, instead of running the file ('python mnist_mlp.py'), I'm cutting and pasting the file contents into a Python (2.7.12) interactive window.
Everything runs fine and I get the 98.4% test accuracy as noted in the comments of that file.
What I want to do next is to feed it novel input and use model.predict() to see how it performs. I create 28x28 images in GIMP and bring them into my Python session (being careful to convert from 4-channel, 8-bit RGBA images to a linear single-channel floating-point array).
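For reference, here is a minimal NumPy-only sketch of that kind of conversion. It assumes a fully opaque 28x28 RGBA array (alpha is ignored) and the `(1, 784)` float input that mnist_mlp.py uses after `reshape(60000, 784)` and dividing by 255. `rgba_to_mnist_vector` and its `invert` flag are hypothetical; the flag exists because MNIST digits are light strokes on a dark background, so a dark-on-light drawing would need inverting to match.

```python
import numpy as np

def rgba_to_mnist_vector(rgba, invert=False):
    """Convert a 28x28x4 uint8 RGBA array into a (1, 784) float32 row for the MLP."""
    rgba = np.asarray(rgba, dtype=np.float32)
    if rgba.shape != (28, 28, 4):
        raise ValueError("expected a 28x28 RGBA array, got %r" % (rgba.shape,))
    # ITU-R BT.601 luma weights (the ones Pillow uses for convert('L')); alpha is ignored
    gray = 0.299 * rgba[..., 0] + 0.587 * rgba[..., 1] + 0.114 * rgba[..., 2]
    x = gray / 255.0  # same [0, 1] scaling as x_train /= 255 in the example
    if invert:
        x = 1.0 - x  # dark-on-light drawing -> MNIST-style light-on-dark
    return x.reshape(1, 784).astype(np.float32)
```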
When I feed this into the model, I get results that look strange to me: some images are correctly categorized while others are wildly off.
They look like perfectly reasonable digits to me, and they match the MNIST examples pretty closely. When I extract the array back out and look at it, it looks OK, so it doesn't seem to be a flipping or flopping issue. When I feed MNIST images in the same way, they appear to work correctly.
I'm not sure what's going on here. Is it a case of overfitting? Why is the validation data set the same as the test set?
Test images and Python code with instructions can be found here:
https://s3.amazonaws.com/stackoverflow-47799896/StackOverflow_47799896.zip
Thanks.
EDIT: I tried the same test with the convnet example (mnist_cnn.py) and got slightly better results but still similar errors. If anyone wants to try that, they can use the same functions in the readme.py file but make these changes:
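import numpy as np

x = np.ndarray((1, 28, 28, 1), dtype='float32')

def l(s):
    # Read a raw 28x28 RGBA dump into the global input array x
    with open(s, 'rb') as fd:
        _ = fd.read(1)  # skip the first byte
        for i in range(28):
            for j in range(28):
                v = ord(fd.read(1))  # one byte per 4-byte pixel
                x[0][i][j][0] = v / 255.0
                _ = fd.read(3)  # skip the other 3 bytes of the pixel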
EDIT 2: Interestingly, if I replace the first 19 items in the training data set (out of 60,000) with my images in the MLP case, I get at or near perfect prediction of all my images after training. Does this suggest overfitting?
I would like to see what my ImageDataGenerator yields to my network.
Edit:
Removed the channel_shift_range; I had accidentally left it in the code.
Generator
genNorm = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
                             height_shift_range=0.1, zoom_range=0.1, horizontal_flip=True)
Get Batches
batches = genNorm.flow_from_directory(path + 'train', target_size=(224, 224),
                                      class_mode='categorical', batch_size=64)
x_batch, y_batch = next(batches)
Plot Images
for i in range(0, 32):
    image = x_batch[i]
    plt.imshow(image.transpose(2, 1, 0))
    plt.show()
Result
Generator Output
Is this normal or am I doing something wrong here?
The strange colors result from your channel shift. Do you really need it to augment your samples? Is a value of 10 (which is very high) really what you want?
In addition, another and likely more efficient way of checking what your generator yields is to set a directory with save_to_dir (a parameter of the flow/flow_from_directory functions). There you'll find all the images that have been transformed and delivered to your fit/flow function.
Edit:
You still somehow seem to invert your images during processing or while displaying them. I assume the original images look more like this:
Save the results of your ImageDataGenerator to a directory and compare these with the results that you see with plt.show.
Try this; change the generator as follows:
import numpy as np

def my_preprocessing_func(img):
    image = np.array(img)
    return image / 255

genNorm = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
                             height_shift_range=0.1, zoom_range=0.1, horizontal_flip=True,
                             preprocessing_function=my_preprocessing_func)
That worked for me,
Bruno
Keras performs its image operations with the Pillow backend, which uses the 'RGB' channel order by default, so you don't need to reverse the channel axis in your plt.imshow() call.
Only with cv2.imread() (which uses 'BGR' order by default) may you need plt.imshow(img[:, :, ::-1]) to display the image correctly.
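Putting these points together, a helper along the lines below converts a generator sample into something `plt.imshow` renders as-is. It is a sketch: `to_displayable` is a hypothetical name and both the layout detection and the value-range check are heuristics. Channels-first arrays are moved to channels-last with `transpose(1, 2, 0)`, which keeps rows and columns in place (unlike `transpose(2, 1, 0)`); BGR is optionally flipped to RGB; and float data in [0, 255] is converted to uint8, since matplotlib clips float RGB input to [0, 1].

```python
import numpy as np

def to_displayable(image, channels_first=None, bgr=False):
    """Return a uint8 (H, W, 3) or (H, W) copy of `image` that plt.imshow can show directly."""
    img = np.asarray(image)
    if channels_first is None:
        # Heuristic: a leading axis of size 1 or 3 and a different last axis means (C, H, W)
        channels_first = img.ndim == 3 and img.shape[0] in (1, 3) and img.shape[-1] not in (1, 3)
    if channels_first:
        img = img.transpose(1, 2, 0)  # (C, H, W) -> (H, W, C), rows and columns unchanged
    if bgr:
        img = img[:, :, ::-1]  # OpenCV channel order (BGR) -> RGB
    if img.dtype != np.uint8:
        img = img.astype(np.float32)
        if img.max() <= 1.0:
            img = img * 255.0  # heuristic: data was already scaled to [0, 1]
        img = np.clip(img, 0, 255).astype(np.uint8)
    if img.ndim == 3 and img.shape[-1] == 1:
        img = img[:, :, 0]  # single channel: plain 2-D array for imshow
    return img
```

In the question's loop this would be `plt.imshow(to_displayable(x_batch[i]))`.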
I'm trying to create an example using the Keras bundled with the latest version of TensorFlow from Google. This example should be able to classify a classic image of an elephant. The code looks like this:
# Import a few libraries for use later
from PIL import Image as IMG
from tensorflow.contrib.keras.python.keras.preprocessing import image
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import InceptionV3
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import preprocess_input, decode_predictions
# Get a copy of the Inception model
print('Loading Inception V3...\n')
model = InceptionV3(weights='imagenet', include_top=True)
print ('Inception V3 loaded\n')
# Read the elephant JPG
elephant_img = IMG.open('elephant.jpg')
# Convert the elephant to an array
elephant = image.img_to_array(elephant_img)
elephant = preprocess_input(elephant)
elephant_preds = model.predict(elephant)
print ('Predictions: ', decode_predictions(elephant_preds))
Unfortunately I'm getting an error when trying to evaluate the model with model.predict:
ValueError: Error when checking : expected input_1 to have 4 dimensions, but got array with shape (299, 299, 3)
This code is based on the excellent coremltools-keras-inception example and will be expanded once this is figured out.
This error occurs because the model always expects a batch of examples, not a single example. This diverges from the common understanding of models as mathematical functions of their inputs. Models expect batches because:
Models are designed to compute faster on batches, which speeds up training.
Some algorithms take the batch nature of the input into account (e.g. Batch Normalization or GAN training tricks).
So the four dimensions are a leading sample (batch) dimension followed by the three image dimensions.
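In NumPy terms, turning one image into a batch of one just means adding a leading axis. A minimal sketch, where the zero array stands in for the preprocessed elephant:

```python
import numpy as np

single = np.zeros((299, 299, 3), dtype=np.float32)  # one preprocessed image
batch = np.expand_dims(single, axis=0)               # add the leading batch axis
print(batch.shape)  # (1, 299, 299, 3)

# Several images stack into a batch the same way
batch_of_two = np.stack([single, single], axis=0)
print(batch_of_two.shape)  # (2, 299, 299, 3)
```

`model.predict` then returns one row of predictions per image in the batch.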
Actually, I found the answer. Even though the documentation gives a three-dimensional input shape when the top layer is included, the model is still set up to take a batch of images. So we need to add this before the prediction line:
import numpy
elephant = numpy.expand_dims(elephant, axis=0)
Then the tensor is in the right shape and everything works correctly. I am still uncertain why the documentation states that the input vector should be (3x299x299) or (299x299x3) when it clearly wants 4 dimensions.
Be careful!