I am currently developing commercial software. I need to add Chinese character and word detection, but the Scene Text Detection functions seem to only detect English characters and words. I searched on Google and nothing related showed up.
I will feed a scanned A4 paper image to the application so it can find certain Chinese words based on pre-set conditions. For example, the image contains the word "你好" ("Hello" in Chinese) twice, but it should be extracted only once and saved as a string, when it meets the pre-set condition of being next to the title 姓名 (Name).
Here is a small illustration of the example:
Greeting: 你好
姓名(Name): 你好 <--- detect this word only
Can someone with decent experience in OpenCV or EmguCV please help me out?
If a custom dataset is needed to achieve my goal, can someone guide me on how to train a dataset for word detection in OpenCV or EmguCV?
OpenCV or EmguCV alone is not your solution. You need a deep neural network (DNN) framework such as TensorFlow.
Related
I want to detect the offers on a page in an offer-catalogue.
Example pages:
One offer consists of a price, a description and an image. I have marked an offer on each page with a red square.
Am I in over my head?
Any suggestions are greatly appreciated,
Thanks!
This problem will require a lot of work using different technologies. I'll try to write a basic step-by-step process that you could choose to follow. I'd like to point out that this would not return a very high level of accuracy and would depend on how accurate each component is.
Extract contours from images: OpenCV has a simple tutorial on contouring. Try to find the right number of iterations for the morphological transforms so that the image and the text next to it get grouped into one contour.
Use OCR to extract text from these contours: Tesseract OCR is probably the best option you have. You might have to convert your image to binary or grayscale as a pre-processing step to improve your results.
Make a corpus which determines which text corresponds to a discount: Something as simple as a python list should do. Make a list of key words that indicate a promotional offer/discount. Map your OCR results to this corpus to determine if the text is talking about an offer.
I'm sorry, but I cannot make out the small logos next to the descriptions in your images. If any small stamp/logo corresponds to a discount, you could also try template matching techniques.
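The corpus step (step 3) is the simplest part to sketch. The keyword list and the sample OCR strings below are hypothetical examples; you would build the list from your own catalogues:

```python
# Minimal sketch of mapping OCR output to a keyword corpus to flag offers.
# The keyword list is a hypothetical example; extend it for your catalogue.
OFFER_KEYWORDS = ["sale", "discount", "% off", "offer", "deal", "save"]

def looks_like_offer(ocr_text: str) -> bool:
    """Return True if the OCR'd text contains any promotional keyword."""
    text = ocr_text.lower()
    return any(keyword in text for keyword in OFFER_KEYWORDS)

# Example OCR results from two contours (made up for illustration)
print(looks_like_offer("Big SALE - 20% off all shoes"))  # True
print(looks_like_offer("Fresh milk, 1 litre"))           # False
```

In practice you would also want fuzzy matching (OCR output is noisy), but a plain substring check is a reasonable first pass.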
What is the best way to go about reading characters in a grid-like format, like a Sudoku puzzle? I tried using this tutorial to get started with Tesseract OCR, but it was pretty inconsistent even with similar fonts. I want to be able to read a pattern of characters and store it in a multidimensional array. How can I train Tesseract to do this like I would train it to interpret different fonts?
I would go with OpenCV (http://opencv.org/) to interpret the Sudoku grid and then I would recognise the numbers using Tesseract. You can check these tutorials: https://www.raywenderlich.com/93276/implementing-tesseract-ocr-ios, https://github.com/BloodAxe/OpenCV-Tutorial and https://github.com/aptogo/OpenCVForiPhone. I hope it helps you.
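Once OpenCV has cropped and deskewed the grid, slicing it into 81 cells before feeding each one to Tesseract is straightforward. A minimal sketch with NumPy, assuming the grid has already been rectified to a square image:

```python
import numpy as np

def split_grid(grid_img: np.ndarray, rows: int = 9, cols: int = 9):
    """Split a rectified Sudoku grid image into a rows x cols list of cells."""
    h, w = grid_img.shape[:2]
    ch, cw = h // rows, w // cols
    # Each cell would then be passed to the OCR engine (e.g. Tesseract)
    # individually, and the results stored in a 2-D array mirroring the puzzle.
    return [[grid_img[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
             for c in range(cols)] for r in range(rows)]

grid = np.zeros((450, 450), dtype=np.uint8)  # placeholder 450x450 grid image
cells = split_grid(grid)
print(len(cells), len(cells[0]), cells[0][0].shape)  # 9 9 (50, 50)
```

Recognizing one digit per cell is also much easier for Tesseract than reading the whole grid at once, which is likely why the tutorial results were inconsistent.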
I'm a complete beginner in character recognition as well as machine learning in general.
I want to write a program which is able to process the following input:
A Chinese character (in either pixels of vector format), for example:
The decomposition of the previous character, i.e., for the example above:
and the information that they are aligned horizontally.
The decomposition of a Chinese character always consists of 3 things: 2 other characters and the pattern describing how the 2 characters form the initial character (this is called the composition kind). In the example above the composition kind is "aligned horizontally".
Given such an input, I want my program to tell which pixels or which contours in the initial character belong to which subcharacter in its decomposition.
Where to start?
Well, I can't say that I provide a full answer but think about:
1) Reading the papers on how the Google Translate app works. You know, when you point your iPhone's camera at text and it instantly translates the text (even preserving the fonts!). It supports Chinese, so it would be interesting for you to see whether they solved a similar task and how they did it.
2) Another big question to answer - how to prepare your input data. You will need to provide at least some input data, i.e. decompositions of at least some characters. Try to do this manually for a couple of characters and try to formalize exactly what you are doing - this will help you better formulate what exactly you want your algorithm to do.
3) Try to use a deep neural net with your data from #2. Use something with convolutional layers. Pre-train it with an RBM (restricted Boltzmann machine). After that, take a really close look at the resulting neural network. Don't expect to get good results, but looking into the network's layers will help you understand what the net has learned from the data and might provide some insight into where to move next.
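For step 2, a first attempt at formalizing the training data could be a small record type. The field names and the example decomposition below are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class Decomposition:
    """One training example: a character, its two component
    subcharacters, and the composition kind describing their layout."""
    character: str
    first_part: str
    second_part: str
    kind: str  # e.g. "horizontal" or "vertical"

# 好 = 女 + 子, aligned horizontally (illustrative example)
example = Decomposition(character="好", first_part="女",
                        second_part="子", kind="horizontal")
print(example.kind)  # horizontal
```

Writing a few dozen of these by hand will quickly expose the edge cases (characters with ambiguous splits) that the algorithm will have to handle.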
I am doing an assignment about DSP. I want to apply DSP to OCR. I searched around the internet but found not much. Please give me some keywords or documents about this. I don't know how to convert an image file into digital signals. I know that images are stored as digital data, but I don't know the connection between a file on a computer and what I learnt in DSP (signals, transforms, filters, ...).
Digital Signal Processing is the study of pattern processing by logical algorithmic processes.
It involves measurements, statistics, geometry, and many other digital tasks that you may not realize you already know.
The patterns are often 2D graphs, 3D data, or sounds, in a variety of bit depths and resolutions...
Question: How do you convert an image file into digital signals? An image file is an X,Y graph. You mostly read through it with loops.
You can represent some images as vectors. You can represent writings as vectors in a PC as well, like "L" is two vectors.
99 percent of common image DSP uses vectors and rasters.
So when you talk about OCR signal processing, have a think about how a human brain can recognize the letters.
Apprentice readers first locate the line and travel forwards through it.
Then they use the space around every letter to single out individual letters.
Then they compare the letters to their memory, to recall whether each one is A, a, B, b, C, c.
That gives you an idea of a way for a computer to do it.
Find lines of characters: they have long horizontal spacings that are easy to see in an X,Y loop.
Find elements in every line that are separated by a vertical spacer.
Compare each such character using the simplest OCR science.
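Step 1 (finding lines of characters via horizontal spacing) is usually implemented as a horizontal projection profile: sum the ink in each pixel row and treat all-blank rows as line separators. A minimal sketch on a synthetic binary image:

```python
import numpy as np

def find_text_lines(binary_img: np.ndarray):
    """Return (start_row, end_row) pairs for each horizontal band of ink."""
    row_ink = binary_img.sum(axis=1)       # total ink per pixel row
    in_line, lines, start = False, [], 0
    for y, ink in enumerate(row_ink):
        if ink > 0 and not in_line:        # blank-to-ink edge: a line starts
            in_line, start = True, y
        elif ink == 0 and in_line:         # ink-to-blank edge: the line ends
            in_line = False
            lines.append((start, y))
    if in_line:                            # line touching the bottom edge
        lines.append((start, len(row_ink)))
    return lines

# Synthetic page: two "text lines" at rows 10-19 and 40-49
page = np.zeros((60, 100), dtype=np.uint8)
page[10:20, 5:95] = 1
page[40:50, 5:95] = 1
print(find_text_lines(page))  # [(10, 20), (40, 50)]
```

Step 2 is the same idea rotated 90 degrees: run a vertical projection inside each detected line to split it into characters.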
OCR represents only a tiny fraction of DSP science, but OCR probably has dozens and dozens of PhD theses, web tutorials and GitHub projects for you to search through.
Once you have worked out steps 1, 2 and 3 of your own logic, you can compare other methods and learn from them. By the time you have scanned 10, 20 or 100 pages on OCR, you will probably have found some nicely illustrated web pages that make the whole process clear.
If you are intending to use Matlab, this is the way to do it: read the image and compute its descriptors. Each letter in the alphabet should have its own unique descriptors.
% Read the image as a grayscale intensity matrix
I = imreadbw('data/box.jpg');
% Compute SIFT frames (f) and descriptors (d); gss/dogss are the
% Gaussian and difference-of-Gaussian scale spaces
[f,d,gss,dogss] = sift(I,'verbosity',1,'boundarypoint',0,'threshold',.0282,'firstoctave',-1,'edgethreshold',0);
% Quantize the descriptors to 8-bit integers
d = uint8(512*d);
I am developing an application to read letters and numbers from an image using OpenCV in C++. I first converted the given colour image and colour template to binary images, then called cvMatchTemplate(). This method just highlighted the areas where the template matches, and not clearly. I don't just want to see the area; I need to parse the characters (letters & numbers) from the image. I am new to OpenCV. Does anybody know another method to get the result?
The image is taken from a camera; the sample image is shown above. I need to get all the text from the LED display (130 and Delft Tanthaf).
Friends, I tried the sample face detection application, and it detects faces. The HaarCascade file is provided with OpenCV; I just loaded that file and called cvHaarDetectObjects(). To detect letters, I created an XML file using the letter_recog.cpp application provided with OpenCV. But when loading this file, it shows an error (OpenCV error: Unspecified error > in unknown function, file ........\ocv\opencv\src\cxcore\cxpersistence.cpp, line 4720). I searched the web for this error and found information about the lib files used. I did so, but the error still remains. Is the error in my XML file, or in the way I load it ((CvHaarClassifierCascade*)cvLoad("built xml file name",0,0,0);)? Please help...
Thanks in advance
As of OpenCV 3.0 (in active dev), you can use the built-in "scene text" object detection module ~
Reference: http://docs.opencv.org/3.0-beta/modules/text/doc/erfilter.html
Example: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp
The text detection is built on these two papers:
[Neumann12] Neumann L., Matas J.: Real-Time Scene Text Localization
and Recognition, CVPR 2012. The paper is available online at
http://cmp.felk.cvut.cz/~neumalu1/neumann-cvpr2012.pdf
[Gomez13] Gomez L. and Karatzas D.: Multi-script Text Extraction from
Natural Scenes, ICDAR 2013. The paper is available online at
http://refbase.cvc.uab.es/files/GoK2013.pdf
Once you've found where the text in the scene is, you can run any sort of standard OCR against those slices (Tesseract OCR is common). And there's now an end-to-end sample in opencv using OpenCV's new interface to Tesseract:
https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp
Template matching tends not to be robust for this sort of application because of lighting inconsistencies, orientation changes, scale changes, etc. The typical way of solving this problem is to bring in machine learning. What you are trying to do by training your own boosting classifier is one possible approach. However, I don't think you are doing the training correctly. You mentioned that you gave it 1 logo as a positive training image and 5 other images not containing the logo as negative examples? Generally you need training samples on the order of hundreds or thousands or more. You cannot possibly train with 6 training samples and expect it to work.
If you are unfamiliar with machine learning, here is roughly what you should do:
1) You need to collect many positive training samples (from hundred onwards but generally the more the merrier) of the object you are trying to detect. If you are trying to detect individual characters in the image, then get cropped images of individual characters. You can start with the MNIST database for this. Better yet, to train the classifier for your particular problem, get many cropped images of the characters on the bus from photos. If you are trying to detect the entire rectangular LED board panel, then use images of them as your positive training samples.
2) You will need to collect many negative training samples. Their number should be in the same order as the number of positive training samples you have. These could be images of the other objects that appear in the images you will run your detector on. For example, you could crop images of the front of the bus, road surfaces, trees along the road etc. and use them as negative examples. This is to help the classifier rule out these objects in the image you run your detector on. Hence, negative examples are not just any image containing objects you don't want to detect. They should be objects that could be mistaken for the object you are trying to detect in the images you run your detector on (at least for your case).
See the following link on how to train the cascade of classifier and produce the XML model file: http://note.sonots.com/SciSoftware/haartraining.html
Even though you mentioned you only want to detect the individual characters instead of the entire LED panel on the bus, I would recommend first detecting the LED panel so as to localize the region containing the characters of interest. After that, either perform template matching within this smaller region, or run a classifier trained to recognize individual characters on patches of pixels in this region obtained using a sliding window approach, possibly at multiple scales. (Note: the Haar cascade boosting classifier you mentioned above will detect characters, but it won't tell you which character it detected unless you train it to detect only that particular character...) Detecting characters in this region in a sliding window manner will give you the order the characters appear in, so you can string them into words, etc.
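The sliding-window scan mentioned above can be sketched as a simple generator over the localized LED-panel region; the window size and stride below are hypothetical parameters you would tune for your images:

```python
import numpy as np

def sliding_windows(region: np.ndarray, win_h: int, win_w: int, stride: int):
    """Yield (x, y, patch) for every window position inside the region.
    Each patch would be fed to the per-character classifier."""
    h, w = region.shape[:2]
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            yield x, y, region[y:y + win_h, x:x + win_w]

# Example: scan a 20x60 panel region with 16x12 windows and stride 4
panel = np.zeros((20, 60), dtype=np.uint8)
windows = list(sliding_windows(panel, win_h=16, win_w=12, stride=4))
print(len(windows))  # 26 candidate patches
```

Because the windows are generated left to right, the classifier hits naturally come out in reading order, which is what lets you string the recognized characters into words.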
Hope this helps.
EDIT:
I happened to chance upon this old post of mine after separately discovering the scene text module in OpenCV 3 mentioned by #KaolinFire.
For those who are curious, this is the result of running that detector on the sample image given by the OP. Notice that the detector is able to localize the text region, even though it returns more than one bounding box.
Note that this method is not foolproof (at least this implementation in OpenCV with the default parameters). It tends to generate false-positives, especially when the input image contains many "distractors".
Here are more examples obtained using this OpenCV 3 text detector on the Google Street View dataset:
Notice that it has a tendency to find "text" between parallel lines (e.g., windows, walls etc). Since the OP's input image is likely going to contain outdoor scenes, this will be a problem especially if he/she does not restrict the region of interest to a smaller region around the LED signs.
It seems that if you are able to localize a "rough" region containing just the text (e.g., just the LED sign in the OP's sample image), then running this algorithm can help you get a tighter bounding box. But you will have to deal with the false-positives though (perhaps discarding small regions or picking among the overlapping bounding boxes using a heuristic based on knowledge about the way letters appear on the LED signs).
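A simple false-positive filter of the kind described above (discarding small regions, then keeping the largest of any pair of heavily overlapping boxes) might look like this; the minimum-area and overlap thresholds are hypothetical parameters:

```python
def filter_boxes(boxes, min_area=100, iou_thresh=0.5):
    """Drop tiny detections, then greedily keep the largest box of any
    heavily overlapping pair. Boxes are (x, y, w, h) tuples."""
    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        return inter / float(aw * ah + bw * bh - inter)

    big = [b for b in boxes if b[2] * b[3] >= min_area]
    big.sort(key=lambda b: b[2] * b[3], reverse=True)   # largest first
    kept = []
    for b in big:
        if all(iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept

# Two near-duplicate detections of one sign, plus one tiny false positive
boxes = [(0, 0, 40, 20), (2, 1, 38, 19), (5, 5, 3, 3)]
print(filter_boxes(boxes))  # [(0, 0, 40, 20)]
```

This is essentially non-maximum suppression with box area standing in for a confidence score; if the detector reports scores, sort by those instead.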
Here are more resources (discussion + code + datasets) on text detection.
Code
Extracting text OpenCV
http://libccv.org/doc/doc-swt/
Stroke Width Transform (SWT) implementation (Python)
https://github.com/subokita/Robust-Text-Detection
Datasets
You will find the Google Street View and MSRA datasets here. Although the images in these datasets are not exactly the same as those of the LED signs on buses, they may be helpful either for picking the "best"-performing algorithm from among several competing algorithms, or for training a machine learning algorithm from scratch.
http://www.iapr-tc11.org/mediawiki/index.php/Datasets_List
See my answer to How to read time from recorded surveillance camera video? You can/should use cvMatchTemplate() to do that.
If you are working with a fixed set of bus destinations, template matching will do.
However, if you want the system to be more flexible, I would imagine you would need some form of contour/shape analysis for each individual letter.
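For a fixed set of destinations, matching amounts to sliding each destination template over the binarized sign and scoring each position. A minimal pure-NumPy sketch using sum-of-squared-differences (cv2.matchTemplate does the same job much faster, with normalized scoring methods such as TM_CCOEFF_NORMED that are more robust to lighting):

```python
import numpy as np

def match_template(image: np.ndarray, template: np.ndarray):
    """Return (best_score, (x, y)) of the best match via sum of squared
    differences; lower scores mean a closer match."""
    ih, iw = image.shape
    th, tw = template.shape
    best = (float("inf"), (0, 0))
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y + th, x:x + tw]
            score = float(((patch - template) ** 2).sum())
            if score < best[0]:
                best = (score, (x, y))
    return best

# Synthetic binary "sign" with the template pasted at (x=7, y=3)
sign = np.zeros((10, 30), dtype=np.float32)
template = np.ones((4, 6), dtype=np.float32)
sign[3:7, 7:13] = template
score, loc = match_template(sign, template)
print(score, loc)  # 0.0 (7, 3)
```

This exhaustive loop also makes template matching's weakness obvious: any scale or rotation change between the template and the sign breaks the pixel-wise comparison, which is why the contour/shape-analysis route is suggested for a more flexible system.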
You can also look at EAST: Efficient Scene Text Detector - https://www.learnopencv.com/deep-learning-based-text-detection-using-opencv-c-python/
Under this link, you have examples in C++ and Python. I used this code to detect bus numbers (after detecting that a given object is a bus).