I am trying to create a simple tool that uses the functionality of this website, http://cat.prhlt.upv.es/mer/, which parses handwritten strokes into a math formula. I noticed that they mention it converts the input to InkML or MathML.
I also noticed, from the question "Tradeoff between LaTex, MathML, and XHTMLMathML in an iOS app?", that you can use MathJax to convert certain input to MathML.
What I need clarification/assistance with is this: how can I take input (say, from finger strokes) or a picture, convert it to a format I can submit to this website from an iOS device, and then read the result shown at the top of the page? I have already done everything involved in taking a picture or drawing an equation on an iPhone, but I am confused about how to feed that to the site in order to get a result.
Is this possible, and if so how?
I think there's a misunderstanding here. http://cat.prhlt.upv.es/mer/ isn't an API for converting images into formulae; it's just an example demonstration of the Seshat software tool.
If you're looking to convert hand-drawn math expressions into LaTeX or MathML (which can then be pretty-printed on your device), you want to compile Seshat and feed your input to it, not to the website. My answer here explains how to format input for Seshat.
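Seshat reads ink from a file rather than from a web form, so the strokes captured on the device have to be serialized first. The sketch below assumes the plain-text SCG_INK layout used by the sample expressions that ship with Seshat (an `SCG_INK` header line, the number of strokes, then for each stroke its point count followed by one `x y` pair per line); verify that layout against those sample files before relying on it. The function name `strokes_to_scgink` and the rounding of touch coordinates to integers are illustrative choices, not part of Seshat.

```python
from typing import List, Sequence, Tuple

# A stroke is the ordered list of (x, y) touch points between finger-down and finger-up.
Point = Tuple[float, float]


def strokes_to_scgink(strokes: Sequence[Sequence[Point]]) -> str:
    """Serialize strokes as SCG_INK text (layout assumed from Seshat's sample files)."""
    lines: List[str] = ["SCG_INK", str(len(strokes))]
    for stroke in strokes:
        lines.append(str(len(stroke)))
        # Assumption: integer coordinates, so round the floating-point touch positions.
        lines.extend(f"{round(x)} {round(y)}" for x, y in stroke)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two strokes drawing an "x": one diagonal in each direction.
    x_strokes = [[(0.0, 0.0), (10.0, 10.0)], [(10.0, 0.0), (0.0, 10.0)]]
    print(strokes_to_scgink(x_strokes), end="")
```

The resulting text would be saved to a file and passed to the compiled Seshat binary; its README documents the command-line options.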
Related
Local travel cards in Saint Petersburg, Russia, have very long ID numbers that are hard to read and type into a web page when topping up the card online. So I want to build a small app that takes a photo of a travel card and parses the number out.
The task is a bit easier than free-form recognition:
the card has a well-known, fixed size
the ID numbers have a known length, are located in a well-known position on the card, and contain digits only, no letters (okay, there are two variations I think, and maybe they will add one or two more in the future)
even the font is known in advance
even the first several digits are the same on most cards (so far only two prefixes are used)
How would you do it? Are there any libraries tuned not for general OCR but for a "hinted" OCR like the one I need?
Best regards,
Artem.
P.S.
Actually a free/cheap web service for this task would also be good enough
Yes. Google has a library called Tesseract, and there is an iOS SDK on GitHub that you can import into your application. The SDK has documentation that explains how to set it up in your app, and it has methods that return a string containing the text on the card. BUT it will be ALL of the text from the card. So the best thing to do would be to:
1. "Clip" the original image to extract a sub-image that shows only the portion of the card you want to get the numbers from.
2. Process this sub-image with Tesseract to retrieve the string you are looking for.
3. Then parse the string and pick out the data that you need.
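Step 3 can be sketched in Python as follows, assuming the OCR output for the clipped region is available as a plain string. `KNOWN_PREFIXES` and `NUMBER_LENGTH` are hypothetical placeholders for the real card format, and the letter-to-digit substitutions are a common heuristic for digits-only fields, not something Tesseract does for you.

```python
import re
from typing import Optional

# Hypothetical card format: replace with the real prefixes and number length.
KNOWN_PREFIXES = ("9643", "3078")
NUMBER_LENGTH = 12

# Letters that OCR commonly returns in place of digits in a digits-only field.
LETTER_TO_DIGIT = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def extract_card_number(ocr_text: str) -> Optional[str]:
    """Return the first run of NUMBER_LENGTH digits that starts with a known prefix."""
    # Map look-alike letters to digits, then drop spaces, line breaks and other noise.
    digits = re.sub(r"\D", "", ocr_text.translate(LETTER_TO_DIGIT))
    for start in range(len(digits) - NUMBER_LENGTH + 1):
        candidate = digits[start:start + NUMBER_LENGTH]
        if candidate.startswith(KNOWN_PREFIXES):
            return candidate
    return None


if __name__ == "__main__":
    print(extract_card_number("9643 1O23 4567"))  # prints 964310234567 (the "O" becomes "0")
```

Keeping the acceptance check strict (known prefix plus exact length) makes it less likely that OCR noise is accepted as a card number.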
But be warned: it can be a bit quirky. The SDK tends to recognize words best in images that were scanned, not photographed. Although it is an advanced piece of technology, it isn't perfect, so to get it to work as well as possible for you, try to use scanned copies of the originals.
Best of luck.
The ideal solution for you would have three components:
1) Detection of the card. This is useful because, with detection, end users have a much easier time actually using the scanner: they can hold the phone above the card in an arbitrary orientation.
2) An accurate OCR component, ideally customizable for the exact font on the card and the exact position of the number on the card.
3) A parsing mechanism. This would let you obtain the exact string written on the card without writing a huge amount of OCR parsing code.
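Once the card has been detected and its image rectified (perspective-corrected so the card fills an upright rectangle), a fixed card size makes item 2's "exact position" concrete: a field position measured in millimetres maps directly to pixels. The sketch below assumes the travel card has the standard ID-1 size used by bank cards, 85.60 mm × 53.98 mm (ISO/IEC 7810). `NUMBER_FIELD_MM` is a hypothetical field position that you would measure on a real card, and detection and rectification are assumed to have happened already.

```python
from typing import Tuple

# ID-1 card size from ISO/IEC 7810, in millimetres.
ID1_WIDTH_MM = 85.60
ID1_HEIGHT_MM = 53.98

# Hypothetical position of the number field, measured from the card's top-left corner:
# (x, y, width, height) in millimetres. Measure this on a real card.
NUMBER_FIELD_MM = (5.0, 40.0, 60.0, 8.0)

Rect = Tuple[int, int, int, int]


def field_to_pixels(card_width_px: float, card_height_px: float,
                    field_mm: Tuple[float, float, float, float] = NUMBER_FIELD_MM) -> Rect:
    """Map a field given in millimetres to a pixel crop rectangle on a rectified card image."""
    scale_x = card_width_px / ID1_WIDTH_MM
    scale_y = card_height_px / ID1_HEIGHT_MM
    x, y, w, h = field_mm
    return (round(x * scale_x), round(y * scale_y), round(w * scale_x), round(h * scale_y))


if __name__ == "__main__":
    # A rectified card image of 856 x 540 pixels (about 10 pixels per millimetre).
    print(field_to_pixels(856, 540))
```

Cropping to this rectangle before OCR is exactly the "clip" step: the OCR engine then only ever sees the number field.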
BlinkID SDK has all of this. It has a preset for detecting cards in the ID-1 format. It has an integrated OCR engine. And it provides RegexParser, where you can define the exact format of the text you're trying to extract from the document.
BlinkID was initially built for scanning ID documents, which have properties very similar to those of the problem you're trying to solve.
Note: I'm one of the developers working on BlinkID.
I am programming a website on the subject of chemistry, and for obvious reasons I also have to include structural and molecular formulas on that site. I want to have as few images as possible on the site and would therefore like to know how I can render LaTeX code on my website, so I can show everything I could do in LaTeX itself.
Thanks in advance.
As outlined in a previous comment, Chemistry.SE has enabled mhchem in MathJax to allow the rendering of simple formulas and reaction equations. The MathJax documentation gives some directions.
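For reference, this is what mhchem input looks like. The sketch below is a standalone LaTeX document using the mhchem package; the `\ce{...}` commands are the part that carries over to a web page, where MathJax renders them once its mhchem extension is loaded (how it is loaded depends on the MathJax version, so check the documentation for the version you use). The formulas are arbitrary examples.

```latex
\documentclass{article}
\usepackage[version=4]{mhchem}

\begin{document}
% A molecular formula: subscripts are written as plain digits.
Sulfuric acid: \ce{H2SO4}

% A reaction with stoichiometric coefficients and a forward arrow.
\ce{2 H2 + O2 -> 2 H2O}

% An equilibrium arrow.
\ce{N2 + 3 H2 <=> 2 NH3}
\end{document}
```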
As far as structures of organic molecules are concerned, I usually draw them using BkChem and export them as PNG images.
If I understand you correctly, you would like to avoid the images themselves and not just the act of drawing. Therefore, the idea of generating the drawings from a linear representation (InChI, SMILES) using Open Babel will probably not convince you.
As a matter of fact, it is possible to draw structures in LaTeX using chemfig, and there have been requests to support this package in MathJax. However, it seems that so far the strong dependence of chemfig on TikZ has prevented this.
Right now, I am using TesseractOCR for iOS to scan images and convert them into text. I want to be able to find a word and highlight it in the original image, so I am thinking of scanning the document word by word and looking for the phrase or word passed in by the user. However, I can't find any resources on the TesseractOCR website that point me in this direction. So basically, I am looking to scan an image word by word so I can find a phrase. I need to be able to highlight the word on the original image, which is why I think I should scan the original image word by word. Is there any way I can scan the original image word by word using TesseractOCR (probably involving detecting whitespace)? If so, any relevant resources would be helpful. If I can't use TesseractOCR, should I be using something else, or is it not possible at all?
Thanks in advance.
TesseractOCR for iOS has an API call that returns recognized blocks by iterator level. You can set the iterator level to G8PageIteratorLevelWord to obtain words.
What's also important is that each recognized block has a boundingBox property, which gives the location of the block in the image. You can use this to highlight the word on the image.
If, after this, you want to find a specific phrase or word in the obtained set of words, you will have to be a little more creative :) OCR results can contain errors, so you can't rely on exact string matching; use fuzzy matching instead. Also, searching for phrases (as opposed to single words) raises questions about the layout of OCR results, because the words of one phrase aren't always adjacent in the OCR result.
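A minimal sketch of the fuzzy phrase search, in Python for brevity. It assumes the word-level blocks have already been turned into (text, bounding box) pairs in reading order, a hypothetical stand-in for each block's text and boundingBox. It uses `difflib.SequenceMatcher` as the similarity measure and, per the layout caveat above, only finds phrases whose words are adjacent in the OCR output. The 0.8 threshold is an arbitrary starting value.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)
Word = Tuple[str, Box]                   # recognized text plus its bounding box


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; tolerates small OCR errors."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box covering all given boxes, e.g. for highlighting a whole phrase."""
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    return (left, top, right - left, bottom - top)


def find_phrase(words: Sequence[Word], phrase: str, threshold: float = 0.8) -> List[Box]:
    """Return one highlight box per run of adjacent words that fuzzily matches phrase."""
    targets = phrase.split()
    n = len(targets)
    if n == 0:
        return []
    matches: List[Box] = []
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        # Average the per-word similarity so one misread letter does not reject the phrase.
        score = sum(similarity(text, target) for (text, _), target in zip(window, targets)) / n
        if score >= threshold:
            matches.append(union_box([box for _, box in window]))
    return matches


if __name__ == "__main__":
    ocr_words = [
        ("Tbe", (10, 10, 30, 12)),
        ("quick", (45, 10, 40, 12)),
        ("brown", (90, 10, 45, 12)),
        ("f0x", (140, 10, 25, 12)),  # "fox" misread with a zero
    ]
    print(find_phrase(ocr_words, "brown fox"))
```

Each returned box covers the whole matched phrase, so it can be drawn over the original image directly, after converting it to the image's coordinate system if the boxes are not already in pixels.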
Note: my company, MicroBlink, offers a commercial OCR engine for mobile devices. On iOS you can easily try it using CocoaPods:
pod try PPBlinkOCR
BlinkOCR solves all of the problems above, and you can contact us for support while you use it.
Would it be possible to extract the text, images, and LaTeX equations from a particular website so that you can directly assemble your own PDF without the objects being blurry? Only the images would have a fixed resolution.
I realize that there are a couple of ways of generating a PDF indirectly. Rendering a PDF of the Wolfram MathWorld page on the Riemann Zeta Function, for instance, is possible by printing it and saving it as a PDF via Chrome, but as you zoom in more closely, the LaTeX equations and text naturally become blurry. I tried downloading "Wolfram's CDF Player," but it contains only the syntax for Mathematica's libraries, not the helpful explanations that Wolfram MathWorld provides. What would be required for me to extract the text, images, and LaTeX equations into a PDF file without them being blurry?
Unless you have access to the LaTeX source that was used to produce the images in a way that isn't apparent from your question, the answer is "you cannot." Casual inspection of the website linked implies that the LaTeX that is used to produce the equations is not readily available (it's probably on a backend system somewhere that produces the images that get put on the web server).
To a browser, it's just an image. The method by which the image was produced is irrelevant to how it appears on the web page and how it would appear in a PDF (i.e., more pixelated than desired).
Note that if a website uses a vector-graphics format like SVG instead of a pixel based format like PNG or JPEG, then those will translate to PDF cleanly, and will zoom nicely. That's a choice that would be made by the webmaster of the site in question.
Inspecting the source reveals that the GIFs depicting each equation have alt-text that approximates the LaTeX that would render them (it might be Mathematica code; I'm not familiar with Wolfram's tools). Extracting a reasonable source wouldn't be impossible, but it would be hard. The site is laid out with tables, so even with something like Beautiful Soup, parsing the HTML could be tricky. Some equations are broken up into several GIFs, so parsing them would be even trickier. You'd also have to convert whatever the alt-text is into LaTeX.
All in all, if you don't need to do a zillion pages, I'd suggest copy-pasting the text, saving the images, grabbing the alt-text of each image and doing the converting yourself.
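Grabbing the alt-text can be automated with Python's standard-library HTML parser, which is indifferent to table-based layouts because it simply reports every `<img>` tag it encounters. This is a sketch that works on a saved copy of a page; the sample markup and alt-text below are made up for illustration and are not MathWorld's actual format, and converting the alt-text to LaTeX is still left to you.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

ImageInfo = Tuple[Optional[str], Optional[str]]  # (src, alt)


class ImageAltCollector(HTMLParser):
    """Collect the src and alt attributes of every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[ImageInfo] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> tags via the default handle_startendtag.
        if tag == "img":
            attributes = dict(attrs)
            self.images.append((attributes.get("src"), attributes.get("alt")))


def image_alt_texts(html: str) -> List[ImageInfo]:
    collector = ImageAltCollector()
    collector.feed(html)
    collector.close()
    return collector.images


if __name__ == "__main__":
    sample = (
        '<table><tr><td><img src="eq1.gif" alt="zeta(s)=sum_(k=1)^(infty)1/(k^s)"></td></tr></table>'
        '<p>Some text.</p><img src="eq2.gif" alt="x" />'
    )
    for src, alt in image_alt_texts(sample):
        print(src, alt)
```

Because the images come back in document order, equations that are split across several GIFs appear as consecutive entries, which at least keeps the pieces next to each other.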
For the given example, you could download the Mathematica notebook for that page. Maybe it is possible to parse something from that.
I got a bunch of .DOC documents. I'm not even positive they are Word documents, but even if they are, I need to open and parse them with e.g. Python to extract information from them.
The problem is, I couldn't figure out how they were encoded: UltraEdit's Conversion function wouldn't correct the text no matter which encoding I tried. OpenOffice 3.2 also failed to display the contents correctly (guessing Windows-1252).
Here's an example, hoping that someone knows what code page it is:
"lÕAssemblŽe gŽnŽrale" instead of "l'Assemblée générale"
Thank you for any tip.
Greenstone digital library http://www.greenstone.org/ provides pretty good text extraction from Word documents, including encoding detection.
Running MS Word in server mode gives you a range of scripting options; I'm sure detecting the encoding will be possible.
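For what it's worth, the sample in the question is consistent with Mac Roman text being displayed as Windows-1252: in Mac Roman, byte 0xD5 is a right single quotation mark and 0x8E is "é", while Windows-1252 shows those same bytes as "Õ" and "Ž". A minimal Python sketch that reverses that particular mix-up, assuming the same diagnosis holds for the other files (worth checking on more samples):

```python
def repair_mojibake(text: str, shown_as: str = "cp1252", actual: str = "mac_roman") -> str:
    """Undo a wrong decode: recover the original bytes by encoding with the codec the
    text was displayed in, then decode them with the codec they were written in."""
    return text.encode(shown_as).decode(actual)


if __name__ == "__main__":
    print(repair_mojibake("lÕAssemblŽe gŽnŽrale"))  # prints l’Assemblée générale
```

Note that the apostrophe comes back as a typographic one (U+2019). If you extract the raw text bytes from the files yourself, decoding them directly with `bytes.decode("mac_roman")` avoids the round trip.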