I am using Tesseract for text recognition.
How can I detect the spacing between blocks of text and create, e.g., a PDF or .doc file with the same spacing?
Let's say that the source page contains 3 columns of text (like a newspaper). How can I recognize this text with the appropriate padding and margins relative to each other and to the page?
Can you suggest an example, a library that does this, or just an algorithm?
Can I put a UIButton, UITextField, or UITextView inline between pieces of text in Swift?
Examples:
"What is your **[UITextField]** about downloading free music files from the Internet?**[UIButton]**"
"What is your
**[UITextField]** about
downloading free
music files from the
Internet?**[UIButton]**"
As in the second example, the layout may change depending on the situation.
Not a particularly easy task. You can use attributed text to stylize a word and make it "tappable" (like a hyperlink on a webpage), but that wouldn't solve the issue of embedding an editable text field in the string.
One approach would be to embed "placeholder" text in your string, then find the bounding box for that word and overlay a textfield or button. You'd have to be sure to account for things like word wrap, and it would take some experimentation to get the widths right.
So, you might set the text of your label to:
What is your UITextFieldGoesHere about downloading free music files from the Internet? UIButtonHere
Then use code to find the bounding box / rect of UITextFieldGoesHere and position a text field on top of that, so it covers the word and looks like it is inline. Same thing with the UIButtonHere.
If your button title might be simply "OK", and you don't want it wide with left-right padding, change that placeholder in your string to something like OKB ... just make sure it is unique so you can find it.
Lots of examples out there for finding the bounding box / rect of a word in a label... use Google (or your favorite search engine) to search for:
uilabel find bounding box of specific word
I'm looking for a technique to detect text in a document.
For example, for a plain .txt file it's easy: there are many libraries, APIs and SDKs for image processing, and they usually have methods implementing OCR algorithms.
But consider a "complex" printed document whose structure is well known and deterministic, for example the summary page of a pension program's annual report. I want to extract only the "bottom line" number. I know there is a header at the top center, a table in the middle, a paragraph at the bottom left, and, at the bottom right, the paragraph I'm looking for.
What is the approach to extract text from the document, grouped and associated with its location on the document? The main task here is a technique for analysing the structure of the document against a predefined structure. Once we know which specific paragraph we are working on, the rest is easy: apply the standard OCR API mentioned above and collect the data in a custom data structure.
For example, in the linked document (page 1): what approach ensures that every time I apply a pure OCR API I know exactly which part of the predefined template I am working on? The document template has:
Top section divided into 3 horizontal parts.
Middle section: Title and then first table, another title and then another table.
Bottom section: some text on right corner.
Thanks,
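One way to picture the "predefined template" idea is as a list of named rectangles in page-relative coordinates, which are converted to pixel boxes for whatever resolution the page was scanned at. The sketch below is only an illustration under that assumption: the Region class, the crop_boxes helper, and all the fractions in TEMPLATE are hypothetical and not measured from the actual report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A named rectangle in page-relative coordinates (0.0 to 1.0)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, page_width, page_height):
        """Return (left, top, right, bottom) in whole pixels for a page size."""
        return (
            round(self.left * page_width),
            round(self.top * page_height),
            round(self.right * page_width),
            round(self.bottom * page_height),
        )


# Hypothetical template: the fractions are illustrative placeholders only.
TEMPLATE = [
    Region("header", 0.25, 0.00, 0.75, 0.15),
    Region("tables", 0.00, 0.15, 1.00, 0.75),
    Region("bottom_left", 0.00, 0.75, 0.50, 1.00),
    Region("bottom_right", 0.50, 0.75, 1.00, 1.00),
]


def crop_boxes(template, page_width, page_height):
    """Map each region name to its pixel box for the given page size."""
    return {r.name: r.to_pixels(page_width, page_height) for r in template}
```

Each pixel box could then be used to crop the scanned page (Pillow's Image.crop, for instance, accepts a (left, upper, right, lower) tuple) before handing only that crop to the OCR API, so the recognized text is already associated with a template region. This presumes the scans are aligned with the template; skewed or shifted scans would need registration first.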
I have cells underneath figures in an IPython notebook that contain figure caption text. I would like them to be centre-aligned ('center'). I use "<center>" in the markdown, which gives exactly the appearance I'm after in the notebook. But when converting to LaTeX with nbconvert, the text gets shunted over to the left.
So is there a way to get nbconvert to recognize text alignment in markdown cells when converting to latex?
Thanks.
You have actually asked two different questions:
is there a way to get nbconvert to recognize text alignment in markdown cells
figure caption (centering) in nbconvert
ad 1)
Pandoc is used to convert the markdown to LaTeX. Unfortunately, pandoc removes raw HTML from markdown when converting to LaTeX (it also removes raw LaTeX when converting markdown to HTML).
So it is not that straightforward to use HTML tags to format the output in both HTML and LaTeX. This formatting may be achieved based on cell metadata, but that is not trivial currently.
ad 2)
Nevertheless, it is possible to create caption-like text that works with both HTML and LaTeX.
Here we have to distinguish between captions for pyout or stream data (e.g. IPython.display.Image) and markdown images.
pyout and stream
A possible approach is to create a Caption class like
class Caption():
    def __init__(self, s):
        self.s = s
    def _repr_html_(self):
        return '<center>{0}</center>'.format(self.s)
    def _repr_latex_(self):
        return '\\begin{center}\n' + self.s + '\n\\end{center}'
which is called after the image. Note that both should be called with the IPython.display.display method, e.g. as a one-liner
display(Image('image.jpg'),Caption('Figure Caption'))
This approach allows you to process the caption text with Python, e.g. to add figure numbers.
If you want to add such a caption to a matplotlib plot, it is a bit more tricky as the wrong ordering has to be overcome. A possible approach is to plot using this snippet
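%matplotlib inline
import matplotlib.pyplot as plt
from IPython.display import display
plt.plot([1,2])
f = plt.gcf()    # keep a handle on the current figure
plt.close()      # close it so the inline backend does not show it out of order
display(f, Caption('Plot'))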
Note that the default LaTeX template of IPython 1.x doesn't play well with this approach: image and caption are only loosely coupled, so vertical space might be inserted between them when the LaTeX is compiled. The latex_basic template works much better. In IPython master the default templates work fine.
markdown images
Markdown allows images to be included like
![Caption](/files/image)
When converting to LaTeX, pandoc can take the Caption part and create a real LaTeX caption.
Similarly, when converting to HTML, the caption gets embedded in a caption class so that it can easily be styled using CSS.
However, currently IPython requires a "/files/" prefix which is currently not removed, thus the image file won't be found by latex. (Fixed by now)
Be aware that these markdown image calls do not embed but only link the image into the ipynb file, therefore, the image has to remain available.
I'm attempting to draw a richly laid out text view on iPhone that features:
Custom paragraph spacing (kCTParagraphStyleSpecifierParagraphSpacing)
Custom paragraph first-line indentation (kCTParagraphStyleSpecifierFirstLineHeadIndent)
Justified alignment (kCTParagraphStyleSpecifierAlignment)
Finally, a drop cap on my first paragraph
I'm using OHAttributedLabel. The first three points I achieved without much trouble by just setting some paragraph style attributes on my NSAttributedString.
The drop cap I managed to implement by hacking OHAttributedLabel:
Cutting a rectangular region the size of the drop cap out of the main paragraph's CGMutablePathRef by adding an extra CGPathAddRect, as detailed in this excellent blog post.
Drawing the large character in this region with an extra CTFrameDraw call.
My problem: The paragraph styles and the custom text path are incompatible. When I cut a rectangular chunk out of the main text's path, all the paragraph styles seem to get thrown away.
Does anyone know a way to make them work together? Or can anyone think of another way to implement drop caps? (Short of using a UIWebView + CSS, which I'd rather not have the overhead of!)
Thanks!
You can use straight Core Text to achieve this. In the following post I explain the use of 2 framesetters to lay out text with drop caps in a UIView. In the code example (there's also a link to a GitHub repo) you'll be able to see where the paragraph styles are created and applied to the main text view.
https://stackoverflow.com/a/14639864/1218605
I'm trying to put together a LaTeX color box. The xcolor package's \fcolorbox seems to be what I want, but I can't get the rendering quite correct. When I use
\fcolorbox{black}{red}{}
it renders a small box sunken to the bottom of the text line. The best I've managed to do is to fake it with a similar text color:
\fcolorbox{black}{red}{\textcolor{red}{--}}
However, I'm worried that this won't render correctly in all situations with defined colors. Is there a way I can declare an empty text box with full in-line text height? Is there another solution?
I'm basically looking for the code that produces the color boxes all through the document at ftp://ftp.dante.de/pub/tex/macros/latex/contrib/xcolor/xcolor.pdf. The boxes I'm referring to are used throughout, but the first instance is on page 4. Thanks.
The xcolor.dtx file in the same directory as the pdf contains the source for the package and the source for the documentation. The relevant bits from the source for the documentation:
\def\testclr#1#{\@testclr{#1}}
\def\@testclr#1#2{{\fboxsep\z@\fbox{\colorbox#1{#2}{\phantom{XX}}}}}
...
(Answer: 40\% \testclr{green} $+$ 60\% \testclr{yellow} $=$ \testclr{green!40!yellow}, e.g., |\color{green!40!yellow}|)
Basically, use \phantom{} on the contents of your color box, and make sure that at least one of the phantom characters is full-height.
Also, https://tex.stackexchange.com/