I am implementing a backend in Rust to run ONNX inferences on images using the opencv crate. Everything's working great but I have some doubts about a function I have written to crop an image at a specified location:
// assume coordinates are always valid
img
    .col_range(&opencv::core::Range::new(xmin, xmax).unwrap()).unwrap()
    .row_range(&opencv::core::Range::new(ymin, ymax).unwrap()).unwrap();
Is this the only way to accomplish a simple Mat crop? Hope I am not missing a better and more efficient solution.
There's nothing inefficient about what you've written. OpenCV Mats use data-sharing by default, so there's no worry that you're needlessly copying large portions of the original image. There are many ways to "slice and dice" the data, and what you've chosen is a sensible way to crop an image. In fact, it's the primary way you would do this in other languages (from Cropping an Image using OpenCV):
Python: cropped_image = img[80:280, 150:330]
C++: Mat cropped_image = img(Range(80,280), Range(150,330));
What I would recommend, which may be more clear and direct in Rust, is to use Mat::roi (a.k.a. the "region of interest" constructor):
let cropped_image = Mat::roi(&img, opencv::core::Rect {
    x: xmin,
    y: ymin,
    width: xmax - xmin,
    height: ymax - ymin,
}).unwrap();
See also:
How to crop a CvMat in OpenCV?
How to crop an image in OpenCV using Python
contrast_img = cv2.addWeighted(img, 2.5, np.zeros(img.shape, img.dtype), 0, 0)
How can I change the contrast of only half of the image? I'm using the above code; what do I need to change in my code to do so?
You could slice out the top half of the image and apply the contrast function to just that.
top_half = img[0:height // 2, :, :]
Note that cv2.addWeighted returns a new array rather than modifying its input in place, so applying the contrast call to top_half won't change the original img; just write the result of your addWeighted call back into that slice:
img[0:height // 2, :, :] = contrast_top_half  # the array returned by cv2.addWeighted(top_half, ...)
Note: images in OpenCV are NumPy arrays with shape [height, width, channels] (grayscale images or other single-channel representations might be missing the last dimension). I'm assuming here that you're working with an RGB (BGR in OpenCV) image.
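Putting it together, a minimal sketch using the same addWeighted call from your snippet (the input and output paths are placeholders):

import cv2
import numpy as np

img = cv2.imread('input.jpg')                 # placeholder input
height = img.shape[0]

# Slice out the top half; this is a view into img, not a copy
top_half = img[0:height // 2, :, :]

# Apply the contrast adjustment from the question to just that region
contrast_top_half = cv2.addWeighted(top_half, 2.5,
                                    np.zeros(top_half.shape, top_half.dtype), 0, 0)

# addWeighted returned a new array, so write it back into the original image
img[0:height // 2, :, :] = contrast_top_half

cv2.imwrite('output.jpg', img)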
So I want to segment a tree from an aerial image.
Sample image (original image):
And I expect a result like this (or better):
The first thing I did was use the threshold function in OpenCV, and I didn't get the expected result (it can't segment the tree crown). I then used the black-and-white filter in Photoshop with some adjusted parameters (the result is shown below), applied the threshold and a morphological filter, and got the result shown above.
My question: is there a way to do the segmentation on the image without using Photoshop first and produce a segmented image like the second one (or better)? Or maybe is there a way to produce an image like the third one?
PS: you can read about the Photoshop B&W filter here: https://dsp.stackexchange.com/questions/688/whats-the-algorithm-behind-photoshops-black-and-white-adjustment-layer
You can do it in OpenCV. The code below will basically do the same operations you did in Photoshop. You may need to tune some of the parameters to get exactly what you want.
#include "opencv2\opencv.hpp"
using namespace cv;
int main(int, char**)
{
Mat3b img = imread("path_to_image");
// Use HSV color to threshold the image
Mat3b hsv;
cvtColor(img, hsv, COLOR_BGR2HSV);
// Apply a treshold
// HSV values in OpenCV are not in [0,100], but:
// H in [0,180]
// S,V in [0,255]
Mat1b res;
inRange(hsv, Scalar(100, 80, 100), Scalar(120, 255, 255), res);
// Negate the image
res = ~res;
// Apply morphology
Mat element = getStructuringElement( MORPH_ELLIPSE, Size(5,5));
morphologyEx(res, res, MORPH_ERODE, element, Point(-1,-1), 2);
morphologyEx(res, res, MORPH_OPEN, element);
// Blending
Mat3b green(res.size(), Vec3b(0,0,0));
for(int r=0; r<res.rows; ++r) {
for(int c=0; c<res.cols; ++c) {
if(res(r,c)) { green(r,c)[1] = uchar(255); }
}
}
Mat3b blend;
addWeighted(img, 0.7, green, 0.3, 0.0, blend);
imshow("result", res);
imshow("blend", blend);
waitKey();
return 0;
}
The resulting image is:
The blended image is:
This has been an interesting topic of research in the past - mainly in the remote sensing literature.
While the morphological methods proposed using OpenCV will work in certain cases, you might want to consider more sophisticated approaches (depending on how variable your data is and how robust a detector you want to build).
For example, this paper, and those that cite it, give you a flavour of what has been attempted.
Pragmatically speaking - I think a neat solution would be one founded more on statistical texture analysis. There are many ways to classify (and then count) regions of an image as belonging to a texture (co-occurrence matrices, filter banks, textons, wavelets, etc.).
Sadly, this is an area where OpenCV is rather deficient - it only provides a subset of the useful algorithms out there... However, here are a few quick ideas (none of which I have tried directly; they're just approaches I'm aware of that can be built on top of OpenCV):
Use OpenCV's Gabor filter support and cluster the filter responses (for example; see the sketch after this list).
You could also possibly train an OpenCV SVM with Local Binary Patterns.
A new library - but probably not so relevant for static images - LIBDT
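As a rough sketch of that first idea, you could cluster per-pixel Gabor filter responses with k-means (the kernel parameters and the input path here are arbitrary placeholders and would need tuning for aerial imagery):

import cv2
import numpy as np

img = cv2.imread('aerial.png', cv2.IMREAD_GRAYSCALE)   # placeholder input

# Build a small Gabor filter bank over a few orientations
responses = []
for theta in np.arange(0, np.pi, np.pi / 4):
    kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                lambd=10.0, gamma=0.5, psi=0)
    responses.append(cv2.filter2D(img, cv2.CV_32F, kernel))

# Stack the per-pixel filter responses into feature vectors and cluster them
features = np.stack(responses, axis=-1).reshape(-1, len(responses)).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, _ = cv2.kmeans(features, 2, None, criteria, 5, cv2.KMEANS_RANDOM_CENTERS)

# Reshape the cluster labels back into an image-sized mask
mask = (labels.reshape(img.shape) * 255).astype(np.uint8)
cv2.imwrite('texture_clusters.png', mask)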
Anyways, I hope you get something that just works for your purposes!
I'm trying to develop an App that uses Tesseract to recognize text from documents taken by a phone's cam. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and a Threshold method for binarization, but the result is pretty bad.
Here is the image I'm using for tests:
And here the preprocessed image:
What other filters can I use to make the image more readable for Tesseract?
I described some tips for preparing images for Tesseract here:
Using tesseract to recognize license plates
In your example, there are several things going on...
You need to get the text to be black and the rest of the image white (not the reverse). That's what character recognition is tuned on. Grayscale is ok, as long as the background is mostly full white and the text mostly full black; the edges of the text may be gray (antialiased) and that may help recognition (but not necessarily - you'll have to experiment)
One of the issues you're seeing is that in some parts of the image the text is really "thin" (and gaps in the letters show up after thresholding), while in other parts it is really "thick" (and letters start merging). Tesseract won't like that :) It happens because the input image is not evenly lit, so a single threshold doesn't work everywhere. The solution is to do "locally adaptive thresholding", where a different threshold is calculated for each neighborhood of the image. There are many ways of doing that, but check out for example (a short sketch follows this list):
Adaptive gaussian thresholding in OpenCV with cv2.adaptiveThreshold(...,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,...)
Local Otsu's method
Local adaptive histogram equalization
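For instance, a minimal adaptive-thresholding sketch (the input path, block size, and constant are placeholders you'd tune for your scans):

import cv2

gray = cv2.imread('document.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder input

# Each pixel is compared against a Gaussian-weighted mean of its own
# neighbourhood, so uneven lighting no longer breaks the binarization
binary = cv2.adaptiveThreshold(gray, 255,
                               cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY,
                               blockSize=31,   # neighbourhood size, must be odd
                               C=10)           # constant subtracted from the local mean

cv2.imwrite('binarized.jpg', binary)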
Another problem you have is that the lines aren't straight. In my experience Tesseract can handle a very limited degree of non-straight lines (a few percent of perspective distortion, tilt or skew), but it doesn't really work with wavy lines. If you can, make sure that the source images have straight lines :) Unfortunately, there is no simple off-the-shelf answer for this; you'd have to look into the research literature and implement one of the state of the art algorithms yourself (and open-source it if possible - there is a real need for an open source solution to this). A Google Scholar search for "curved line OCR extraction" will get you started, for example:
Text line Segmentation of Curved Document Images
Lastly: I think you would do much better working with the Python ecosystem (ndimage, skimage) than with OpenCV in C++. The OpenCV Python wrappers are fine for simple stuff, but for what you're trying to do they won't be enough; you'll need to grab many pieces that aren't in OpenCV (of course you can mix and match). Implementing something like curved-line detection in C++ will take an order of magnitude longer than in Python (and that's true even if you don't know Python yet).
Good luck!
Scanning at 300 dpi (dots per inch) is not officially a standard for OCR (optical character recognition), but it is considered the gold standard.
Converting the image to grayscale generally improves text-recognition accuracy.
I have written a module that reads text from an image and preprocesses it for optimal OCR results: Image Text Reader.
import tempfile

import cv2
import numpy as np
from PIL import Image

IMAGE_SIZE = 1800
BINARY_THRESHOLD = 180

def process_image_for_ocr(file_path):
    # TODO : Implement using opencv
    temp_filename = set_image_dpi(file_path)
    im_new = remove_noise_and_smooth(temp_filename)
    return im_new

def set_image_dpi(file_path):
    im = Image.open(file_path)
    length_x, width_y = im.size
    factor = max(1, int(IMAGE_SIZE / length_x))
    size = factor * length_x, factor * width_y
    # size = (1800, 1800)
    im_resized = im.resize(size, Image.LANCZOS)  # Image.ANTIALIAS in older Pillow
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.jpg')
    temp_filename = temp_file.name
    im_resized.save(temp_filename, dpi=(300, 300))
    return temp_filename

def image_smoothening(img):
    ret1, th1 = cv2.threshold(img, BINARY_THRESHOLD, 255, cv2.THRESH_BINARY)
    ret2, th2 = cv2.threshold(th1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    blur = cv2.GaussianBlur(th2, (1, 1), 0)
    ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return th3

def remove_noise_and_smooth(file_name):
    img = cv2.imread(file_name, 0)
    filtered = cv2.adaptiveThreshold(img.astype(np.uint8), 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                     cv2.THRESH_BINARY, 41, 3)
    kernel = np.ones((1, 1), np.uint8)
    opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
    closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
    img = image_smoothening(img)
    or_image = cv2.bitwise_or(img, closing)
    return or_image
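A minimal usage sketch for the module above (the file name is a placeholder, and feeding the result to pytesseract is my assumption about how you'd consume it):

import pytesseract
from PIL import Image

processed = process_image_for_ocr('document.jpg')    # returns a NumPy array
text = pytesseract.image_to_string(Image.fromarray(processed))
print(text)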
Note: this should be a comment on Alex I's answer, but it's too long, so I'm posting it as an answer.
from "An Overview of the Tesseract OCR engine, by Ray Smith, Google Inc." at https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf
"Processing follows a traditional step-by-step
pipeline, but some of the stages were unusual in their
day, and possibly remain so even now. The first step is
a connected component analysis in which outlines of
the components are stored. This was a computationally
expensive design decision at the time, but had a
significant advantage: by inspection of the nesting of
outlines, and the number of child and grandchild
outlines, it is simple to detect inverse text and
recognize it as easily as black-on-white text. Tesseract
was probably the first OCR engine able to handle
white-on-black text so trivially."
So it seems black text on a white background isn't strictly required; white-on-black should work too.
You can play around with the OCR configuration by changing the --psm and --oem values; in your case specifically I would suggest using
--psm 3
--oem 2
You can also look at the following link for further details: here
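If you're calling Tesseract through pytesseract, these flags can be passed via the config argument; a minimal sketch (pytesseract and the input path are assumptions on my part):

import cv2
import pytesseract

img = cv2.imread('document.jpg')   # placeholder input
# --psm 3: fully automatic page segmentation; --oem 2: legacy + LSTM engines combined
text = pytesseract.image_to_string(img, config='--psm 3 --oem 2')
print(text)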
I guess you have used a global approach to binarization, which is why the whole image is not binarized uniformly. You can use an adaptive thresholding technique for binarization instead. You can also do some skew correction, perspective correction, and noise removal for better results (a rough deskewing sketch follows below).
Refer to this Medium article to learn about the above-mentioned techniques along with code samples.
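One common deskewing recipe is to rotate by the angle of the minimum-area rectangle fitted around the text pixels. A rough sketch (the input path and threshold parameters are placeholders, and the angle convention of minAreaRect differs between OpenCV versions, so the sign correction may need adjusting):

import cv2
import numpy as np

gray = cv2.imread('document.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder input
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 31, 10)

# Estimate the skew from the minimum-area rectangle around the foreground pixels
coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
# With the classic convention (angle in [-90, 0)) this correction applies:
if angle < -45:
    angle = -(90 + angle)
else:
    angle = -angle

# Rotate about the image centre to undo the skew
h, w = gray.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
cv2.imwrite('deskewed.jpg', deskewed)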
For wavy text like yours there is this fantastic Python code on GitHub, which transforms the text to straight lines: https://github.com/tachylatus/page_dewarp.git (this is the most up-to-date version of MZucker's original post, and the mechanics are explained here: https://mzucker.github.io/2016/08/15/page-dewarping.html)
Basically, I want to be able to compare two histograms, but not of whole images, just specific areas. I have image A and a specific rectangular region on it that I want to compare to another image B. Is there a way to get the histogram of a definable rectangular region of an image? I have the x, y position of the rectangular area, as well as its width and height, and want to get its histogram. I'm using OpenCV with Python.
Sorry if that isn't very clear :(
(I'm setting up a program that takes a picture of a circuit board and checks each solder pad for consistency with an image of a perfect board. If one pad is off, the program raises a flag saying that specific pad is off by x percent, rather than flagging the whole board.)
Note: The following is in C++ but I think it is not hard to find the equivalent functions for python.
You can find the histogram of an image using this tutorial. So for example for the lena image we get:
In your case, since you have the rectangle coordinates, you can just extract the ROI of the image:
// C++ code
cv::Mat image = cv::imread("lena.png", 0);
cv::Rect roiRect = cv::Rect(150, 150, 250, 250);
cv::Mat imageRoi = image(roiRect);
and then find the histogram of just the ROI in the same way as above:
Is this what you wanted (in theory at least), or did I misunderstand?
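Since you're working in Python, here is a rough equivalent of the C++ snippet above (the file name and coordinates are placeholders; swap in your pad's x, y, width and height):

import cv2
from matplotlib import pyplot as plt

img = cv2.imread('board.png', 0)       # placeholder input, loaded as grayscale

# x, y position plus width and height of the pad you want to inspect
x, y, w, h = 150, 150, 250, 250
roi = img[y:y + h, x:x + w]            # NumPy slicing gives a view of that region

# Histogram of just the ROI: 256 bins over pixel values 0..255
hist = cv2.calcHist([roi], [0], None, [256], [0, 256])

plt.plot(hist)
plt.show()

cv2.compareHist can then score this histogram against the one computed from the same region of the reference board.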
I have this project where I need (on iOS) to detect simple geometric shapes inside an image.
After searching the internet I have concluded that the best tool for this is OpenCV. The thing is that up until two hours ago I had no idea what OpenCV was, and I have never done anything even remotely involving image processing. My main experience is JS/HTML, C#, SQL, Objective-C...
Where do I start with this?
I have found this answer that I was able to digest, and from reading other material I understand that OpenCV should return an array of shapes with their points/corners; is this true? Also, how would it represent a circle or a half circle?
Also what about the shape orientation?
Do you know of any Demo iOS project that can demonstrate a similar functionality?
If you have only these regular shapes, there is a simple procedure as follows:
Find contours in the image (the image should be binary, as given in your question).
Approximate each contour using the approxPolyDP function.
Check the number of elements in the approximated contour of each shape; that's what identifies the shape. For example, a square will have 4 and a pentagon will have 5. Circles will have more; I didn't know how many in advance, so I just checked (I got 16 for the circle and 9 for the half-circle).
Now assign a color to each vertex count, run the code on your test image, check each contour's count, and fill it with the corresponding color.
Below is my example in Python:
import numpy as np
import cv2

img = cv2.imread('shapes.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

# findContours returns (image, contours, hierarchy) in OpenCV 3 and
# (contours, hierarchy) in OpenCV 2/4; [-2:] keeps the unpacking working in all of them
contours, h = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]

for cnt in contours:
    approx = cv2.approxPolyDP(cnt, 0.01 * cv2.arcLength(cnt, True), True)
    print(len(approx))
    if len(approx) == 5:
        print("pentagon")
        cv2.drawContours(img, [cnt], 0, 255, -1)
    elif len(approx) == 3:
        print("triangle")
        cv2.drawContours(img, [cnt], 0, (0, 255, 0), -1)
    elif len(approx) == 4:
        print("square")
        cv2.drawContours(img, [cnt], 0, (0, 0, 255), -1)
    elif len(approx) == 9:
        print("half-circle")
        cv2.drawContours(img, [cnt], 0, (255, 255, 0), -1)
    elif len(approx) > 15:
        print("circle")
        cv2.drawContours(img, [cnt], 0, (0, 255, 255), -1)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Below is the output:
Remember, it works only for regular shapes.
Alternatively, to find circles you can use HoughCircles. You can find a tutorial here, and a brief sketch follows.
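A minimal sketch of what that might look like (the parameters are arbitrary and would need tuning for your image):

import cv2
import numpy as np

img = cv2.imread('shapes.png')
gray = cv2.medianBlur(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 5)

# dp=1: accumulator at full resolution; minDist keeps detected centres apart
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=100, param2=30, minRadius=5, maxRadius=100)
if circles is not None:
    for c in np.round(circles[0]).astype(int):
        x, y, r = int(c[0]), int(c[1]), int(c[2])
        cv2.circle(img, (x, y), r, (0, 255, 255), 2)

cv2.imshow('circles', img)
cv2.waitKey(0)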
Regarding iOS, the OpenCV devs are developing some iOS samples this summer, so visit their site, www.code.opencv.org, and contact them.
You can find slides from their tutorial here: http://code.opencv.org/svn/gsoc2012/ios/trunk/doc/CVPR2012_OpenCV4IOS_Tutorial.pdf
The answer depends on the presence of other shapes, the level of noise (if any), and the invariance you want to provide for (e.g. rotation, scaling, etc.). These requirements will define not only the algorithm but also the required pre-processing stages for extracting features.
Template matching, which was suggested above, works well when shapes aren't rotated or scaled and when there are no similar shapes around; in other words, it finds the best translation in the image where the template is located:
double minVal, maxVal;
Point minLoc, maxLoc;
Mat image, templ, result; // templ is your shape ("template" is a reserved word in C++)
matchTemplate(image, templ, result, TM_CCOEFF_NORMED);
minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc); // maxLoc is the answer
Geometric hashing is a good method to get invariance in terms of rotation and scaling; this method would require extraction of some contour points.
Generalized Hough transform can take care of invariance and noise, and requires minimal pre-processing, but it is a bit harder to implement than the other methods. OpenCV has such transforms for lines and circles.
When the number of possible shapes is limited, calculating moments or counting convex hull vertices may be the easiest solution: OpenCV structural analysis. A small sketch follows.
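A quick sketch of that idea in Python (the same functions exist in the C++ API; 'shapes.png' is a placeholder, and comparing the descriptors against a reference shape, e.g. with cv2.matchShapes, is up to you):

import cv2

img = cv2.imread('shapes.png', cv2.IMREAD_GRAYSCALE)   # placeholder input
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]

for cnt in contours:
    hull = cv2.convexHull(cnt)
    m = cv2.moments(cnt)
    # Hu moments are rotation- and scale-invariant descriptors of the contour
    hu = cv2.HuMoments(m).flatten()
    print(len(hull), hu)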
You can also use template matching to detect shapes inside an image.