I am trying to read text from an image using the Microsoft Cognitive Services Vision API. A sample of my request is below.
POST https://centralindia.api.cognitive.microsoft.com/vision/v1.0/ocr?language=unk&detectOrientation=true HTTP/1.1
Host: centralindia.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: <subscription key>
{"url":"https://i.imgur.com/tu8fNUM.png"}
But the result comes back blank. If I use Tesseract to read this image, I get 25178 as the text, as expected. I do not want to go with Tesseract because it would add the Tesseract installable as a prerequisite dependency.
Can anyone please help me understand what I am doing wrong here?
Also attaching the original and scaled versions of the image for reference.
You are not doing anything wrong: I tried with the link you provided and got nothing detected, just like you.
It looks like the OCR capability is not good enough to detect this value.
I see that ImageMogr2 is some kind of tool used by qiniu.com (a Chinese hosting provider). Could someone help me understand what it is, and what similar technology is available from other hosting providers?
Yes.
You may notice that a very similar service provided by Tencent Cloud has exactly the same name.
It's an image processing utility that can scale, crop, and rotate images on the fly using URI programming: you define the image processing command and parameters in the request URI, and you get back a processed version of the original image you uploaded earlier.
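As a rough illustration of what that looks like (the domain and file name below are made up, and the parameter syntax is taken from qiniu's imageMogr2 documentation, so verify the exact spelling there):

http://your-bucket.example.com/photo.jpg?imageMogr2/thumbnail/!50p/rotate/90

This single URL asks the service to scale the uploaded photo.jpg to 50% of its size and rotate it 90 degrees, with no separate API call needed.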
You can easily find their documentation and some simple examples on their website.
e.g. https://developer.qiniu.com/dora/api/1270/the-advanced-treatment-of-images-imagemogr2
But I'm not sure whether you can read Chinese.
There are similar solutions provided by a US company, e.g.
https://cloudinary.com/
I am writing a 'clean-room' program that requires parsing/unparsing of JPEGs. I have found all the information I need to parse/unparse baseline JPEGs, but I cannot find the information that I need to parse/unparse progressive JPEGs.
I need to be able to convert the compressed data to macroblocks and back, so most available frameworks are too high level. I also want to understand what is going on, hence the 'clean room' approach.
Can anybody help me please? A specification of the SOF1 header would be useful, as would be the layout of the compressed data in the scan segment.
Thanks in advance.
If you want to figure this out, I'd get this book:
https://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=sr_1_5?ie=UTF8&qid=1486949641&sr=8-5&keywords=jpeg
It explains it all in easy-to-understand terms. The author has source code at
http://colosseumbuilders.com/sourcecode/imagelib403.zip
that is designed to be easy to understand.
The SOF1 header is the same as all other SOF headers. You need to have a copy of the JPEG standard (as obtuse as it is). The other sources above will help you get through it.
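To make the frame-header layout concrete, here is a minimal sketch in Python of locating and decoding an SOFn segment; the field layout below is the one shared by the SOF0/SOF1/SOF2 frame headers in the standard, and the file name is made up.

import struct

def parse_sof(data):
    """Find the first SOFn marker (0xFFC0-0xFFC3: baseline, extended
    sequential, progressive, lossless) and decode its frame header."""
    i = 2  # skip the SOI marker (0xFFD8)
    while i < len(data) - 1:
        if data[i] != 0xFF:
            i += 1
            continue
        marker = data[i + 1]
        if marker == 0xFF:        # fill byte, keep scanning
            i += 1
            continue
        if 0xC0 <= marker <= 0xC3:
            # Frame header: length, precision, height, width, component count.
            length, precision, height, width, ncomp = struct.unpack(
                ">HBHHB", data[i + 2:i + 10])
            components = []
            for c in range(ncomp):
                cid, sampling, qtable = data[i + 10 + 3 * c:i + 13 + 3 * c]
                components.append({"id": cid,
                                   "h_sampling": sampling >> 4,
                                   "v_sampling": sampling & 0x0F,
                                   "quant_table": qtable})
            return {"marker": hex(marker), "precision": precision,
                    "height": height, "width": width,
                    "components": components}
        if marker in (0x01, 0xD8, 0xD9) or 0xD0 <= marker <= 0xD7:
            i += 2                # standalone markers have no length field
        else:
            seg_len, = struct.unpack(">H", data[i + 2:i + 4])
            i += 2 + seg_len      # skip over this segment
    return None

with open("example.jpg", "rb") as f:  # made-up file name
    print(parse_sof(f.read()))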
I have tried to implement the ellipse fitting algorithm described in the following paper: "ElliFit: An unconstrained, non-iterative, least squares based geometric ellipse fitting method", by Prasad, Leung, Quek. A free version can be downloaded online from http://azadproject.ir/wp-content/uploads/2014/07/2013-ElliFit-A-non-constrainednon-iterative-least-squares-based-geometric-Ellipse-Fitting-method.pdf
The authors did not provide any publicly available implementation.
I have implemented the algorithm in Mathematica, I believe I have implemented it correctly, yet it fails to correctly find the fit parameters. The PDF of the experiment can be downloaded here: http://zvrba.net/downloads/ElliFit-fail-example.pdf
Did somebody else try to implement this particular algorithm and, if yes, what is the key to get it working? Is there a "bug" in the paper? Can somebody take another look at my implementation and see whether there's a bug there?
I know it's been almost a year since this question, but it seems that the authors have now provided public source code for ElliFit, both a MATLAB version and an OpenCV version.
Both are available on the author's homepage. In case the homepage goes offline for some reason, both source codes are shared on Google and are available here (MATLAB) and here (OpenCV).
At the time of writing, I have not personally tested their code, but am planning to use them for a project. I will post any updates here in the next few days.
EDIT:
I got around to testing the code sooner than I expected. I gave the OpenCV code a try. It works pretty well, as demonstrated by the image below (ignore the "almost-closed" ellipses; they're an artifact caused by something else in my code).
As you can see, it works pretty well most of the time. There are some failure cases too (the small ellipse on the spray bottle next to the cup).
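For anyone who wants a quick sanity check against their own ElliFit port, OpenCV's stock least-squares fitter makes a handy baseline. To be clear, this is not ElliFit, just cv2.fitEllipse, and the point generation below is invented purely for the example.

import numpy as np
import cv2

# Sample noisy points from a known ellipse: centre (50, 30), semi-axes 40 and
# 20, rotated 30 degrees. This is synthetic data just for the demonstration.
t = np.linspace(0, 2 * np.pi, 200)
theta = np.deg2rad(30)
x = 50 + 40 * np.cos(t) * np.cos(theta) - 20 * np.sin(t) * np.sin(theta)
y = 30 + 40 * np.cos(t) * np.sin(theta) + 20 * np.sin(t) * np.cos(theta)
pts = np.stack([x, y], axis=1).astype(np.float32)
pts += np.random.normal(scale=0.5, size=pts.shape).astype(np.float32)

# fitEllipse needs at least 5 points and returns ((cx, cy), (w, h), angle),
# where w and h are the full axis lengths (so roughly 80 and 40 here).
(cx, cy), (w, h), angle = cv2.fitEllipse(pts)
print(f"centre=({cx:.1f}, {cy:.1f}) axes=({w:.1f}, {h:.1f}) angle={angle:.1f}")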
I am interested in making an app that reads "text" from a file and prints it into a label.
I would receive the data from a server, and the label would display that text.
Does anyone have basic websites, documents I should take a look at?
I need a starting point. I'm NOT interested in any code, as I am doing this as a learning experience. Do you know of a good place to start?
Thanks!
Start by taking a look at NSURLConnection. You can find a basic outline (and code) here. There are lots of examples around this if you Google it.
I downloaded the EverNote API Xcode Project but I have a question regarding the OCR feature. With their OCR service, can I take a picture and show the extracted text in a UILabel or does it not work like that?
Or is the extracted text not shown to me, but only used for the photo search function?
Has anyone ever had any experience with this or any ideas?
Thanks!
Yes, but it looks like it's going to be a bit of work.
When you get an EDAMResource that corresponds to an image, it has a property called recognition that returns an EDAMData object that contains the XML that defines the recognition info. For example, I attached this image to a note:
I inspected the recognition info that was attached to the corresponding EDAMResource object, and found this:
The XML I found is on pastie.org, because it's too big to fit in an answer.
As you can see, there's a LOT of information here. The XML is defined in the API documentation, so this would be where you parse the XML and extract the relevant information yourself. Fortunately, the structure of the XML is quite simple (you could write a parser in a few minutes). The hard part will be to figure out what parts you want to use.
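As a sketch of that parsing step (shown in Python just to illustrate the structure, not as iOS code; the recoIndex/item/t element names and the w confidence attribute are what I'd expect from the recognition XML, but double-check them against the API documentation):

import xml.etree.ElementTree as ET

def best_guesses(reco_xml):
    """For each recognised region (item) in the recognition XML, return the
    candidate text with the highest confidence weight plus its bounding box."""
    root = ET.fromstring(reco_xml)
    guesses = []
    for item in root.iter("item"):
        box = {k: item.get(k) for k in ("x", "y", "w", "h")}
        candidates = [(int(t.get("w", "0")), t.text or "")
                      for t in item.findall("t")]
        if candidates:
            weight, text = max(candidates)
            guesses.append((text, weight, box))
    return guesses

# Hypothetical snippet in the same shape as the recognition XML above.
sample = """<recoIndex objType="image">
  <item x="10" y="20" w="120" h="40">
    <t w="80">hello</t>
    <t w="30">hullo</t>
  </item>
</recoIndex>"""

for text, weight, box in best_guesses(sample):
    print(text, weight, box)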
It doesn't really work like that. Evernote doesn't really do "OCR" in the pure sense of turning document images into coherent paragraphs of text.
Evernote's recognition XML (which you can retrieve via the technique that @DaveDeLong shows above) is most useful as an index to search against; the service provides sets of rectangles and sets of possible words/text fragments with probability scores attached. This makes a great basis for matching search terms, but a terrible one for constructing a single string that represents the document.
(I know this answer is like 4 years late, but Dave's excellent description doesn't really address this philosophical distinction that you'll run up against if you try to actually do what you were suggesting in the question.)