How to detect text in a photo - OpenCV

I am researching the best way to detect text in a photo using open source libraries.
I think the standard way is as follows (note: steps 1 - 4 all use OpenCV):
1) Detect the outline of the document
2) Transform the document so it's flat and cropped, using said outline
3) Make the background of the document white, using a filter
4) Feed the resulting image to Tesseract
Is this the optimal process, or is there a better way or better tools?
Also, what happens in the case where the photo doesn't have a document outline (it's possible that steps 1 & 2 are redundant)?
Is there any way to automatically detect the document orientation (i.e. portrait/landscape)?
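For reference, a minimal Python sketch of steps 2-4 (pytesseract is assumed as the Tesseract wrapper, and the four corner points from step 1 are assumed to be known and ordered; finding them is sketched in the answer below):

```python
import cv2
import numpy as np
import pytesseract

def ocr_document(image, corners):
    # Hedged sketch of steps 2-4. `corners` is assumed to be the four outline
    # points from step 1, ordered top-left, top-right, bottom-right, bottom-left.
    (tl, tr, br, bl) = corners
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))

    # Step 2: warp the document so it is flat and cropped.
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(corners.astype(np.float32), dst)
    flat = cv2.warpPerspective(image, M, (width, height))

    # Step 3: push the paper background towards white with an adaptive threshold.
    gray = cv2.cvtColor(flat, cv2.COLOR_BGR2GRAY)
    clean = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                  cv2.THRESH_BINARY, 21, 10)

    # Step 4: feed the result to Tesseract.
    return pytesseract.image_to_string(clean)
```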

I think your process is fine. I've used a similar process for an Android project.
I think that the only way you can discover whether a document is portrait or landscape is to reason about the lengths of the sides of the bounding box of your outline.
I don't think there's an automatic way to do this, but maybe you can find the outermost contour that can be approximated with a 4-segment polyline (all doable in OpenCV). To get this you'll have to work with the contour hierarchy and contour approximation (see cv2.approxPolyDP).
That's how I would go about automatic outline detection. As I said, the rest of your algorithm seems just fine to me.
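A rough sketch of that idea in Python, assuming OpenCV 4's findContours signature (the Canny and approximation parameters are guesses you'd want to tune):

```python
import cv2

def find_document_outline(image):
    # Look for the largest outer contour that can be approximated
    # by a 4-point polyline, as described above.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    # RETR_EXTERNAL keeps only the outermost contours
    # (a shortcut around walking the full hierarchy).
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in sorted(contours, key=cv2.contourArea, reverse=True):
        perimeter = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.02 * perimeter, True)
        if len(approx) == 4:
            # Portrait vs. landscape can be guessed from the bounding box.
            x, y, w, h = cv2.boundingRect(approx)
            orientation = "portrait" if h >= w else "landscape"
            return approx.reshape(4, 2), orientation
    return None, None
```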
PS. I'll leave my Android project GitHub link. I don't know if it can be useful to you, but here I specify the outline by dragging some handles, then transform the image and feed it to Tesseract, using Java and OpenCV. Yeah It's a very bad idea to do that in the main thread of an Android app and yeah, the app is not finished. I just wanted to experiment with OCR, so I didn't care much of performance and usability, since this was not intended to use, but just for studying.

Look up the stroke width transform.
What it does is detect edges that have more or less the same width with respect to their opposite edge. That picks up things like drainpipes (which can be eliminated in a later pass) but also the majority of text. While it's conceptually similar to a distance transform, the published method uses rather ad hoc normal-projection methods and Canny edge detection.
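For reference, a very rough Python/OpenCV sketch of that core idea (the published method also groups pixels into candidate letters, which is omitted here):

```python
import cv2
import numpy as np

def stroke_width_transform(gray, max_stroke=30):
    # Rough sketch: for each Canny edge pixel, walk along the gradient
    # direction until an opposing edge is hit, and record the distance
    # travelled as the stroke width at that edge pixel.
    edges = cv2.Canny(gray, 100, 200)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-6
    gx, gy = gx / mag, gy / mag   # unit gradient directions

    h, w = gray.shape
    swt = np.full((h, w), np.inf, dtype=np.float32)
    for y, x in zip(*np.nonzero(edges)):
        dx, dy = gx[y, x], gy[y, x]   # flip the sign for light-on-dark text
        for step in range(1, max_stroke):
            nx, ny = int(round(x + dx * step)), int(round(y + dy * step))
            if not (0 <= nx < w and 0 <= ny < h):
                break
            if edges[ny, nx]:
                # Accept only if the opposite edge points roughly back at us.
                if gx[ny, nx] * dx + gy[ny, nx] * dy < -0.7:
                    swt[y, x] = step
                break
    return swt   # small, consistent values cluster on text strokes
```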

Related

How to restrict findTransformECC to a partial affine transform with scale but without shear?

I built a stereoscopic camera mobile app that performs automatic alignment using findTransformECC, and the app works pretty well with it. I know I should probably be using stereoRectifyUncalibrated, preceded by keypoint and descriptor matching, etc., but I get bad results from that despite many different attempted approaches, and I'm super frustrated. So instead, I'm sticking with findTransformECC (at least for now). At the moment I'm using MotionType.Euclidean (restricted to translations and rotations), but I would like to change that.
So far the app has worked by having the user take one picture and then move to the side to capture the next (the cha-cha method). But now I'm adding the ability to have two phones capture simultaneously. The problem is that the focal length and sensor size (angular field of view) may differ between the two cameras, so in order to align the two pictures I need to allow scaling/zooming. However, if I want to do that with findTransformECC, I can only step up from Euclidean to Affine; there doesn't seem to be anything in between. That is, it seems I cannot allow scaling without also allowing shearing, and I don't want shearing.
Put another way, I'd like the type of transform you can get from estimateRigidTransform(array, array, false) (a partial affine), but rather than using keypoints as that function does, I want to use findTransformECC, because in my experiments it just seems to be more reliable.
(https://github.com/KRA2008/crosscam/blob/develop/AutoAlignment/OpenCV.cs is the auto-alignment code if that helps at all)
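Not from the post itself, but one workaround that might satisfy the "scale without shear" constraint: estimate a full affine with findTransformECC and then project away the shear. A rough Python sketch, assuming single-channel inputs:

```python
import cv2
import numpy as np

def ecc_partial_affine(reference, moving):
    # Estimate a full affine with findTransformECC, then project its 2x2
    # linear part onto the nearest rotation-times-uniform-scale matrix,
    # which removes the shear component (an approximation, not exact ECC
    # optimisation over the similarity group).
    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    _, warp = cv2.findTransformECC(reference, moving, warp,
                                   cv2.MOTION_AFFINE, criteria)

    linear = warp[:, :2]
    u, s, vt = np.linalg.svd(linear)
    rotation = u @ vt            # closest pure rotation (ignores reflections)
    scale = s.mean()             # single uniform scale factor
    warp[:, :2] = (scale * rotation).astype(np.float32)
    return warp                  # 2x3 similarity: translation + rotation + scale
```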
Take a look at a Fourier-Mellin transform based approach: https://github.com/Smorodov/LogPolarFFTTemplateMatcher
It will give you the offset, scale, and rotation parameters, nothing more.
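For reference, a rough Python sketch of the Fourier-Mellin idea behind that repository, recovering only rotation and scale (same-sized grayscale inputs assumed; sign conventions may need flipping depending on which image is the reference):

```python
import cv2
import numpy as np

def rotation_and_scale(im1, im2):
    # The FFT magnitude is translation-invariant, and in log-polar coordinates
    # rotation and scale turn into plain shifts that phaseCorrelate can find.
    f1 = np.float32(np.fft.fftshift(np.abs(np.fft.fft2(im1))))
    f2 = np.float32(np.fft.fftshift(np.abs(np.fft.fft2(im2))))

    h, w = im1.shape
    center = (w / 2.0, h / 2.0)
    max_radius = min(h, w) / 2.0
    m = w / np.log(max_radius)      # column-to-log(radius) scale used by warpPolar
    flags = cv2.INTER_LINEAR | cv2.WARP_POLAR_LOG
    lp1 = cv2.warpPolar(f1, (w, h), center, max_radius, flags)
    lp2 = cv2.warpPolar(f2, (w, h), center, max_radius, flags)

    (dx, dy), _ = cv2.phaseCorrelate(lp1, lp2)
    rotation = 360.0 * dy / h       # rows span the full 0-360 degree range
    scale = np.exp(dx / m)          # columns span log(radius)
    # Note: the FFT magnitude is symmetric, so the angle is ambiguous by 180 degrees.
    return rotation, scale
```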

Best approach for coding a painting app on iOS / iPad

I’m trying to build a drawing/painting app for the iPad, with textured brush tips and paper.
So far, all the drawing app example code I've come across seems to work by stroking a path. However, I'd like to actually apply a texture all along the path, to simulate, say, an oil brush or charcoal.
Here is an example of a brush tip texture: Brush tip
The result when painting with the same brush tip: Result
In the results, the top output is what it looks like when the "brush tip" texture is applied far apart along the path.
The bottom result is the texture applied with very small steps along the path. Those who've worked in Photoshop with custom brushes will find this familiar.
I had once prototyped this in Processing years ago (I've since lost the source code), and got it to work in real-time.
In Processing, I converted both the brush tip PNG and the canvas (the image I'm painting onto) into arrays of integers. Then I simply copied the values from the brush tip into the canvas texture at the appropriate indices. At the end of the cycle, I displayed the image for that time step, and repeated this dozens of times in between each point returned by the mouse.
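For what it's worth, here is a platform-agnostic NumPy sketch of that stamping idea (the function and parameter names are made up for illustration, and an RGBA brush tip over an RGB canvas is assumed):

```python
import numpy as np

def stamp_stroke(canvas, brush, p0, p1, spacing=2.0):
    # Place the brush-tip image onto the canvas at small, evenly spaced
    # steps along the segment between two input points.
    bh, bw = brush.shape[:2]
    distance = np.hypot(p1[0] - p0[0], p1[1] - p0[1])
    steps = max(int(distance / spacing), 1)
    for t in np.linspace(0.0, 1.0, steps + 1):
        x = int(p0[0] + t * (p1[0] - p0[0])) - bw // 2
        y = int(p0[1] + t * (p1[1] - p0[1])) - bh // 2
        if x < 0 or y < 0 or x + bw > canvas.shape[1] or y + bh > canvas.shape[0]:
            continue  # in this sketch, simply skip stamps that fall off the canvas
        patch = canvas[y:y + bh, x:x + bw]
        alpha = brush[..., 3:4] / 255.0           # brush alpha channel
        patch[...] = (1 - alpha) * patch + alpha * brush[..., :3]
    return canvas
```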
How would I approach this in iOS, and in real-time? I tried this (https://blog.avenuecode.com/how-to-use-uikit-for-low-level-image-processing-in-swift) but it's way too slow.
This makes me believe Metal might be the only way forward. Is that true, or am I complicating this unnecessarily?
Thank you for any guidance!
PS. I'm coding in Swift 5, targeting iOS 13, in Xcode 11.5.
Welcome!
I recommend you check out Core Image. It's Apple's framework for image processing (at a higher level than Metal, though it can integrate with Metal). Unfortunately, the documentation is a bit outdated, but I'm sure you can translate it into Swift.
Here Apple describes how you would realize a painting app with Core Image and here you can download the corresponding sample project.

Complex number app - graphing with Core Plot, PowerPlot, or something else?

I'm coding an iOS app that will explain complex numbers to the user. Complex numbers can be displayed in Cartesian coordinates, and that's what I want to do: draw one or more vectors on the screen.
I am looking for the easiest way to draw 3 vectors into a coordinate system that adjusts itself to the vector size (if the x-coordinate is greater than the y-coordinate, scale both axes to the x-coordinate, and vice versa).
I tried using Core Plot, which I think is way too multifunctional for my purpose.
Right now I am working with PowerPlot, and my coordinate system already looks okay, but I still encounter some problems (the x- and y-axes are set to the x and y values, which results in a 45-degree line no matter the user input).
The functionality of the examples in Core Plot and PowerPlot doesn't seem to meet my needs.
My last two approaches were using HTML in a web view, and doing it all myself with Quartz (not the simple way...).
Do you have any advice on how to do this the simple way? It seems like a simple problem.
If you don't want to do much actual graphing and plotting, then using Core Plot or similar sounds like overkill to me. The extra bloat of adding Core Plot to your project, not to mention the time taken to understand how to use it, might not be worth it for some simple graphics.
Quartz is well equipped for the job of showing a few vectors on the screen, assuming you're not interested in fancy 3D graphics. There are plenty of tutorials and examples of using Core Graphics (AKA Quartz) to draw lines etc. If you're going the Quartz route, perhaps get some simple line drawing going in Quartz, then ask more questions if you need help with the maths aspect of it.
The typical technique used when rendering with Quartz is to override drawRect in a subclass of UIView and place calls to Core Graphics drawing functions in there.
A decent question and example of Quartz line drawing is here:
How do I draw a line on the iPhone?
If you aren't averse to using Google Chart Image, you can load reasonably complex data sets in a simple manner by calling the appropriate URL and then putting the image in a UIImageView. It takes very little code: here is a blog post explanation with sample code.
The limitations are:
- The length of the data set is restricted by the maximum URL length you can request from Google (2,048 characters, which is a lot once encoded), though I've plotted 120 data points in 4 series.
- A network connection is required (at least to get the initial chart).
- Perhaps the biggest problem: the API is deprecated and will be discontinued at some point in 2015. You would then have to switch to the UIWebView/JavaScript Google Chart API implementation...
Sample image:

OpenCV issue with image subtraction?

I am trying to subtract 2 images using the function cvAbsDiff(img1, img2, dest);
It works, but sometimes when I bring my hand in front of my head or body, the hand is not clear and the background comes into the picture... the background image (head) overlays my foreground (hand).
It works correctly on plain surfaces, i.e. when the background is even, like a wall.
Please check out my image so that you can better understand my problem:
http://www.2shared.com/photo/hJghiq4b/bg_overlays_foreground.html
If you have any solution or hint, please help me.
There's nothing wrong with your code. Background subtraction is not a preferred way to do motion detection or silhouette detection because it's not very robust. The problem arises because the background and the foreground are similar in colour in many regions, which on subtraction pushes the foreground to the back. You might try using:
- optical flow for motion detection (sketched below)
- if your task is just detecting a silhouette or a hand, training a HOG classifier over it
If you don't want to try a new approach, you could play around with the threshold value (in your case 30). When you subtract similarly coloured regions, the difference is less than 30, so thresholding at 30 just blacks them out. You could also try HSV or some other colour space.
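A minimal sketch of the optical-flow option mentioned above, using OpenCV's dense Farneback implementation in Python (the magnitude threshold is just a guess):

```python
import cv2
import numpy as np

def motion_mask(prev_gray, curr_gray, min_magnitude=2.0):
    # Compute dense Farneback optical flow between two consecutive frames
    # and keep the pixels that moved more than min_magnitude.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return (magnitude > min_magnitude).astype(np.uint8) * 255
```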
Putting in the relevant code would help. Also knowing what you're actually trying to achieve.
Which two images are you subtracting? I've done subtraction of subsequent images (images taken with a delay of a fraction of a second), and the background subtraction generally gives you the edges of moving objects, for example the edges of a hand, and not the entire silhouette of the hand. I'm guessing you're taking the difference of the current frame and a static startup frame. It's possible that parts aren't different enough (skin + skin).
I've got some computer problems tonight; I'll test it out tomorrow (please post at least the steps you actually carry through, though) and let you know.
I'm still not sure what your ultimate goal is, although I'm guessing you want to do some gesture-recognition (since you have a vector called "fingers").
As Manpreet said, your biggest problem is robustness, and that is from the subjects having similar color.
I reproduced your image by having my face in the static comparison image and then moving it. If I started with only the background, it was already much more robust and in any case didn't display any "overlaying".
Quick fix is, make sure to have a clean subject-free static image.
Otherwise, you'll want a dynamic comparison image; the simplest would be comparing frame_n with frame_n-1. This will generally give you just the moving edges though, so if you want the entire silhouette you can either:
1) Use a different segmenting algorithm (which is what I recommend: background subtraction is fast, and you can use it to determine a much smaller ROI in which to search, then use a different algorithm for more robust segmentation), or
2) Try to make a compromise between the static and dynamic comparison image, for example an average of the past 10 frames or something like that. I don't know how well this works, but it would be quite simple to implement and worth a try :).
Also, try with CV_THRESH_OTSU instead of 30 for your threshold value, see if you like that better.
Also, I noticed the output often flares (regions which haven't changed switch from black to white). Checking with the live stream, I'm quite certain it's because the webcam is autofocusing/adjusting white balance etc. If you're getting that too, turning off the autofocus etc. should help (which, by the way, isn't done through OpenCV but depends on the camera; possibly check this: How to programatically disable the auto-focus of a webcam?)
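Putting those suggestions together (a running-average comparison image, absolute difference, and Otsu thresholding), a rough Python sketch might look like this; the parameter values are guesses:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
background = None   # running-average comparison image (float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    if background is None:
        background = gray.astype(np.float32)
    # Blend the current frame into the background model (roughly the
    # "average of the past frames" compromise suggested above).
    cv2.accumulateWeighted(gray, background, 0.1)

    diff = cv2.absdiff(gray, cv2.convertScaleAbs(background))
    # Otsu picks the threshold automatically instead of hard-coding 30.
    _, mask = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    cv2.imshow("motion", mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```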

Recognize moving objects and differentiate them from the background?

I am working on a project where I take a video with a camera and convert the video to frames (this part of the project is done).
What I'm facing now is how to detect moving objects in these frames and differentiate them from the background so that I can distinguish between them.
I recently read an awesome CodeProject article about this. It discusses several approaches to the problem and then walks you step by step through one of the solutions, with complete code. It's written at a very accessible level and should be enough to get you started.
One simple way to do this (if little noise is present; I'd recommend a smoothing kernel, though) is to compute the absolute difference of two consecutive frames. You'll get an image of the things that have "moved". The background needs to be pretty static for this to work. If you always take the absolute difference between the current frame and the nth frame, you'll have a grayscale image containing the objects that moved. The object has to differ from the background colour or it will disappear...
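A small Python sketch of that consecutive-frame difference, with a smoothing kernel as suggested (the filename and threshold are placeholders; the same logic applies if you loop over already-extracted frame images instead of the video):

```python
import cv2

cap = cv2.VideoCapture("input_video.avi")   # placeholder filename
ok, prev = cap.read()
prev = cv2.GaussianBlur(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY), (5, 5), 0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (5, 5), 0)

    # Absolute difference of two consecutive (smoothed) frames: pixels that
    # changed show up bright, the static background stays dark.
    diff = cv2.absdiff(prev, gray)
    _, moving = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    cv2.imshow("moving objects", moving)
    prev = gray
    if cv2.waitKey(30) & 0xFF == 27:   # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```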
