Arabic OCR in .Net - opencv

I used Tesseract and trained it with each complete word treated as a single character, the way Chinese OCR is done. But creating my own fonts this way is tedious, time-consuming, and slow. The approach works for some scenarios, but I want to train Tesseract on individual Arabic characters.
Alternatively, please suggest anything that could help me develop my own Arabic OCR, with or without Tesseract.
I have looked into OpenCV, but it didn't go well.
I would highly appreciate a quick response.

Tesseract has pre-trained files for a lot of languages, here is the Arabic one.

This is a very old question, but for anyone looking for the same thing: Tesseract 4 now comes with pre-trained Arabic data, alongside many other languages, which can be found here.
And here is a demo of Arabic OCR based on Tesseract 4; you can see how accurate it has become.
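
To make the pre-trained data concrete, here is a minimal sketch of calling the Tesseract command-line tool with the Arabic language code. It assumes Tesseract 4 is installed and on the PATH and that `ara.traineddata` is in its tessdata directory. The helper names and the `page.png` file name are made up for illustration, and Python is used only for brevity; the same command line could be launched from .NET.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="ara", psm=6):
    """Return the argument list for a Tesseract CLI call that prints text to stdout."""
    # Passing "stdout" as the output base makes Tesseract print the text
    # instead of writing a .txt file. --psm 6 assumes a single uniform block of text.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_image(image_path, lang="ara"):
    """Run Tesseract on one image and return the recognized text (hypothetical helper)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        encoding="utf-8",  # Arabic output is decoded as UTF-8
        check=True,
    )
    return result.stdout


# Only builds the command; call ocr_image("page.png") to actually run it.
print(build_tesseract_command("page.png"))
```

Whether page segmentation mode 6 is the right choice depends on the layout of the scanned page, so treat it as a starting point rather than a recommendation.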

Related

Tesseract OCR iOS detect text from handwritten form and autofill online form with the text

I have used Tesseract for extracting text from scanned documents and I am able to fetch text from scanned documents. Now I want to extract text from a handwritten form (Hard copy) and use that text to autofill my online form (soft copy of the same handwritten form).
Does anybody know how to do that?
Thanks in advance for the help.
Tesseract OCR is quite powerful, but does have the following limitations:
Unlike some OCR engines (like those used by the U.S. Postal Service to sort mail), Tesseract is unable to recognize handwriting and is limited to about 64 fonts in total.
Tesseract requires a bit of preprocessing to improve the OCR results; images need to be scaled appropriately, have as much image contrast as possible, and have horizontally-aligned text.
Finally, Tesseract OCR only works on Linux, Windows, and Mac OS X.
Original article: https://www.raywenderlich.com/93276/implementing-tesseract-ocr-ios
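
The preprocessing advice above (maximize contrast before OCR) can be sketched roughly as follows. This is a plain-Python illustration on a tiny grayscale image stored as a list of rows, not code from the article. The function names are hypothetical, and a real app would do this through an image library. Scaling and deskewing are not covered here.

```python
def stretch_contrast(gray, low=0, high=255):
    """Linearly rescale pixels so the darkest becomes `low` and the brightest `high`."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in gray]
    span = hi - lo
    # Integer arithmetic with round-half-up keeps the result deterministic.
    return [
        [low + ((p - lo) * (high - low) + span // 2) // span for p in row]
        for row in gray
    ]


def binarize(gray, threshold=128):
    """Map each pixel to 0 (ink) or 255 (paper) using a fixed global threshold."""
    return [[0 if p < threshold else 255 for p in row] for row in gray]


# A faded scan: every value sits in the narrow range 100..150.
faded = [
    [100, 110, 150],
    [120, 150, 148],
]
stretched = stretch_contrast(faded)
print(stretched)            # [[0, 51, 255], [102, 255, 245]]
print(binarize(stretched))  # [[0, 0, 255], [0, 255, 255]]
```

A fixed threshold of 128 is an assumption that only suits evenly lit scans; unevenly lit photos usually need an adaptive threshold instead.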

How to detect character in natural text image?

I have a project about Character Recognition (using openCV libraries).
I don't know how to detect character in text image.
Can you recommend some methods to do this?
Thanks all!
Here is a tutorial; it is dated and uses the C-style API, though. This online book has a lot of material related to OCR using OpenCV in chapter 5. Many people have done work integrating Tesseract (an OCR engine) with OpenCV, so you might want to check that out.

Keyword extraction from short dutch texts

I would like to extract keywords from short Dutch texts. Is there an API or a library I could use for this?
In case none are available for Dutch, any tips on how to extract them myself are also appreciated. I have already tried it myself by running the texts through a part-of-speech tagger and a lemmatizer, but from there I find it quite difficult to extract decent keywords. TF-IDF is not useful since the texts are too short to give good results.
I prefer Java, but implementations in any other language are also very welcome.
Here is my video series on text mining with RapidMiner. It shows how to easily get the TF-IDF and more:
http://vancouverdata.blogspot.ca/2010/11/text-analytics-with-rapidminer-loading.html
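
For readers who have not seen the weighting itself, here is a rough sketch of TF-IDF in plain Python. It is not RapidMiner's implementation. It assumes whitespace tokenization, raw term frequency, and `log(N / df)` as the inverse document frequency, and the three Dutch sentences are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def top_keywords(score, k=2):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(score, key=lambda term: (-score[term], term))[:k]


texts = [
    "de rode fiets staat in de schuur",
    "de rode auto staat in de garage",
    "de trein staat in de stad",
]
scores = tf_idf(texts)
print(top_keywords(scores[0]))  # ['fiets', 'schuur']
```

With texts this short, almost every term occurs once, so the ranking is driven almost entirely by the IDF part. That matches the difficulty described in the question.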

What libraries can I use to modify a video?

I'm new to video processing and I'm wondering what libraries I can use to do things like detecting letters, drawing boxes around them and so on. If you can name me a couple of good ones, I'd appreciate it very much!
OpenCV (Open Source Computer Vision) is a cross-platform library of programming functions for real-time computer vision.
It provides interfaces for both the C and C++ programming languages.
As for detecting the text region and drawing boxes around it, you can take a look at this article, which explains how to do this using OpenCV. For better OCR capabilities, I think Tesseract is the best open-source tool available right now.
I've worked on a similar project some time ago and used OpenCV to detect the text region and then tesseract to do proper text recognition.
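
To give an idea of what "detecting the text region" involves, here is a small pure-Python sketch that finds a bounding box around each connected blob of ink pixels in a binarized frame. This is not OpenCV's API and not the code from my project, just the underlying idea. In practice you would let OpenCV do this (for example with `findContours` and `boundingRect`) and pass each cropped box to Tesseract. The `find_boxes` name and the toy image are invented for illustration.

```python
from collections import deque


def find_boxes(binary):
    """Return (top, left, bottom, right) for each 4-connected blob of 1-pixels."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# 1 = ink, 0 = background: a "C"-like shape and an "I"-like shape.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(find_boxes(image))  # [(1, 1, 3, 2), (1, 5, 3, 5)]
```

For video, the same step would run on each frame (or every few frames). Letters made of several disconnected parts, such as "i", come out as separate boxes and need merging afterwards.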

OCR for Devanagari (Hindi / Marathi / Sanskrit)

Does anybody know of any recent work on optical character recognition for Indian scripts using modern machine learning techniques? I know of some research being done at ISI, Calcutta, but to the best of my knowledge nothing new has come up in the last 3-4 years, and OCR for Devanagari is sadly lacking!
FYI: There's an article in the New York Times from 2003 referencing a tool called ILT.
This is surely too old to be useful, but is cool: a video of the Ingalls speaking on Sanskrit and OCR. (Daniel H. H. Ingalls, Sr., Sanskrit professor and translator, and his son Dan Ingalls, computer scientist involved with Smalltalk etc.) The first half is Ingalls Sr. describing a project to automatically analyze text, and the second is by Ingalls Jr. describing how he implemented OCR for Sanskrit from scratch.
