Automatic text / HTML annotation / highlighting - machine-learning

Nowadays there are software tools which, when given a text or an HTML page, will output a summary.
I wonder whether anything exists to automatically annotate (or at least highlight) the same documents.
The idea is to keep the full text but highlight the most meaningful parts (somewhat like a summarisation tool would do, I guess), and maybe provide additional inferred insights (?)
I would also like to know how it works, if it exists :) Would it really be very different from summarization, or is it just the same principles with a different "output format"?
I'm looking for something to annotate HTML documents, in the way AnnotatorJS is designed for.

This is not a complete answer, but it can lead to what you want. The first suggestion is looking at GATE. It provides a great annotation framework and as long as you don't want to program anything for it, it is easy to use. The second thing is to search for summarization plug-ins for GATE. GATE has been around for such a long time that I am sure someone has already implemented a summarization plug-in for it.
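On the "same principles, different output format" part of the question: one common family of summarizers is extractive, meaning it scores sentences and keeps the best ones. A highlighter can reuse the same scoring and simply wrap the chosen sentences instead of discarding the rest. The following is only a minimal sketch under that assumption; the word-frequency scoring, the `highlight` function and the use of `<mark>` tags are illustrative choices, not how GATE or any particular tool works.

```python
import html
import re
from collections import Counter


def content_words(sentence):
    """Lowercased words longer than three letters (a crude stop-word filter)."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]


def highlight(text, top_n=1):
    """Return HTML for text with the top_n highest-scoring sentences in <mark>."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = set(ranked[:top_n])

    # Keep every sentence; only the chosen ones get wrapped.
    parts = []
    for i, sentence in enumerate(sentences):
        escaped = html.escape(sentence)
        parts.append(f"<mark>{escaped}</mark>" if i in chosen else escaped)
    return " ".join(parts)
```

A summarizer built on the same scores would return only the chosen sentences; the highlighter returns all of them, which is the whole difference in this sketch.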

How to store math equation/symbol and display them on the web?

I want to build a website where people can create tests with questions and answers. I want people to be able to type math equations and symbols into a textbox or something similar, have them stored in a database, and have them displayed on the web like an image.
My idea is to store the user's input in LaTeX syntax and then display it using MathJax. I don't know whether this is possible or whether there is a better way to do it.
One problem is that user input will mix normal text with "math text" (LaTeX), so how can I separate them and save only the LaTeX text? Please give me some ideas or suggest a way to solve it, thanks.
P.S. I'm building this site in Ruby on Rails. I found the gem mathjax-rails, but it doesn't seem to work.
Consider building off Gollum. It is the backend for the wiki system GitHub uses and works fairly well with LaTeX equations (currently there is a very irritating bug with less-than/greater-than symbols, but it is documented and will likely be fixed in the next release). I started using it this summer to take notes in a math class; an example of a full page of rendered LaTeX equation notes is here.
Note: You must be logged into GitHub for the equations to render.
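On the question of separating normal text from "math text": if the input uses explicit delimiters, a regular expression can split it into ordered text and math segments. This is a rough sketch, not a full LaTeX parser. It assumes `\( ... \)` for inline math and `$$ ... $$` for display math, and the function name `split_math` is made up for illustration.

```python
import re

# Assumed delimiters: \( ... \) for inline math, $$ ... $$ for display math.
MATH = re.compile(r"\\\((.+?)\\\)|\$\$(.+?)\$\$", re.DOTALL)


def split_math(text: str) -> list[tuple[str, str]]:
    """Split text into ("text", ...) and ("math", ...) segments, in order."""
    segments = []
    pos = 0
    for match in MATH.finditer(text):
        if match.start() > pos:
            # Plain text between the previous match and this one.
            segments.append(("text", text[pos:match.start()]))
        # Exactly one group matched; keep the LaTeX without its delimiters.
        latex = match.group(1) if match.group(1) is not None else match.group(2)
        segments.append(("math", latex))
        pos = match.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments
```

It may turn out to be simpler to store the raw input unchanged and let MathJax find the delimiters when the page renders; the split is only needed if the LaTeX really has to be saved separately.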

EverNote OCR feature?

I downloaded the EverNote API Xcode Project but I have a question regarding the OCR feature. With their OCR service, can I take a picture and show the extracted text in a UILabel or does it not work like that?
Or is the text that is extracted not shown to me but only is for the search function of photos?
Has anyone ever had any experience with this or any ideas?
Thanks!
Yes, but it looks like it's going to be a bit of work.
When you get an EDAMResource that corresponds to an image, it has a property called recognition that returns an EDAMData object that contains the XML that defines the recognition info. For example, I attached this image to a note:
I inspected the recognition info that was attached to the corresponding EDAMResource object. The XML I found is on pastie.org, because it's too big to fit in an answer.
As you can see, there's a LOT of information here. The XML is defined in the API documentation, so this would be where you parse the XML and extract the relevant information yourself. Fortunately, the structure of the XML is quite simple (you could write a parser in a few minutes). The hard part will be to figure out what parts you want to use.
It doesn't really work like that. Evernote doesn't really do "OCR" in the pure sense of turning document images into coherent paragraphs of text.
Evernote's recognition XML (which you can retrieve via the technique that @DaveDeLong shows above) is most useful as an index to search against; the service will provide you sets of rectangles and sets of possible words/text fragments with probability scores attached. This makes a great basis for matching search terms, but a terrible one for constructing a single string that represents the document.
(I know this answer is like 4 years late, but Dave's excellent description doesn't really address this philosophical distinction that you'll run up against if you try to actually do what you were suggesting in the question.)
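To make the "rectangles plus weighted candidates" idea concrete, here is a small sketch that reads such an index and uses it for term matching. The sample XML is hypothetical, shaped after the description above (`item` rectangles holding `t` candidates with a `w` weight); check the element and attribute names against the API documentation before relying on them. The helper names `candidates` and `find_term` are also made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: each rectangle lists alternative readings with weights.
SAMPLE = """
<recoIndex>
  <item x="10" y="20" w="80" h="30">
    <t w="87">EVER</t>
    <t w="35">EVEN</t>
  </item>
  <item x="100" y="20" w="90" h="30">
    <t w="52">NOTE</t>
    <t w="48">NODE</t>
  </item>
</recoIndex>
"""


def candidates(xml_text):
    """Return (box, [(weight, text), ...]) per rectangle, best candidate first."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for item in root.iter("item"):
        box = tuple(int(item.get(key)) for key in ("x", "y", "w", "h"))
        words = sorted(
            ((int(t.get("w")), t.text or "") for t in item.findall("t")),
            reverse=True,
        )
        result.append((box, words))
    return result


def find_term(xml_text, term):
    """Return the boxes where any candidate equals term, ignoring case."""
    wanted = term.lower()
    return [box for box, words in candidates(xml_text)
            if any(text.lower() == wanted for _, text in words)]
```

Note that `find_term(SAMPLE, "node")` finds the second rectangle even though "NODE" is not its best candidate. That is what makes the data good for search, and it is also why joining the top candidates into one string gives an unreliable transcript.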

TeX: Add blank page after every content page

I'm currently writing my bachelor thesis and my university wants a one-sided print. The printing and binding will be done by a professional print company, which only accepts two-sided manuscripts.
Because of that I need to add a blank page after every page of content. I don't want to do this manually using \newpage or \clearpage because there are too many pages. Is there any, maybe low level, TeX command or package to do this? Or can you suggest another tool that does this without breaking the PDF?
Thanks for your help!
One option you might look into is to use a double sided layout that allows separate formatting for the even vs. odd pages: e.g. the book class allows this. Then you will need to define the even pages to be blank (presumably you don't want headers printed, or the page count to increment).
An alternative (if you can't get this to look correct for what you need) would be to do the layout single-sided (so that page numbering, etc. is all taken care of), then have a separate LaTeX document which includes the pages, one at a time (pdfpages may be a good package to do this properly), and inserts blank pages (with no headers, etc.) in between. This may end up being more work, but if you have trouble with formatting, it may be the easier way to go.
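The pdfpages route can be scripted. The `pages` option of `\includepdf` accepts `{}` as an empty page, so a list such as `1,{},2,{}` puts a blank after each content page. The sketch below only generates that list and a wrapper document as text. The function names, the `article` class and the file name `thesis.pdf` are placeholders, and the page count has to be supplied by hand.

```python
def blank_interleaved_pages(page_count: int) -> str:
    """Build a pdfpages page list with an empty page ({}) after each page."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return ",".join(f"{n},{{}}" for n in range(1, page_count + 1))


def wrapper_document(pdf_name: str, page_count: int) -> str:
    """Return LaTeX source that includes pdf_name with blanks interleaved."""
    spec = blank_interleaved_pages(page_count)
    return "\n".join([
        r"\documentclass{article}",  # placeholder class; match your paper size
        r"\usepackage{pdfpages}",
        r"\begin{document}",
        rf"\includepdf[pages={{{spec}}}]{{{pdf_name}}}",
        r"\end{document}",
    ])


if __name__ == "__main__":
    # Example: a three-page thesis.pdf becomes a six-page wrapper.
    print(wrapper_document("thesis.pdf", 3))
```

Compiling the printed source next to the single-sided PDF should then give the two-sided file the print shop asks for, with the original page numbering untouched.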
I suspect that you'd be better off doing this by manipulating the output PDF, rather than changing the LaTeX.
For example, if you're able to print to a file on your platform, there might be options in the print dialogue to tweak this. Your PDF viewer may be able to arrange this, if only by inserting blanks every second page. Or there may be a GUI or command-line tool to do the reshuffling for you.
Having said that, I've no specific recommendations for what tool you could use. A quick look around strongly suggests that the pstops tool might be able to do something along these lines, but that only helps if you're generating your PDF from PostScript.
So no recipe, I'm afraid, but this'll probably be a better direction to look.
(or, meta answer: find a different print shop, or phone again and hope you get someone who gives you a different answer!)

Draw and manipulate shapes at run time

What's the best way to draw shapes interactively at run time using Delphi? I need to be able to select, drag and resize the shapes. This will be used to mark up existing images and documents.
This looks like a good starting point, but I'm wondering if there's a more complete library (preferably free) available that will save some time.
Update:
If you're going with a custom solution from scratch, I've seen another example on Delphi Central that might be an even better starting point.
I recommend reading some pages on my site. Everything is explained and all the source code is available; you may find something useful there.
Plugin system in Delphi - Part 2
Not directly what you need: it is a plugin system for Delphi. But all the samples are based on a drawing tool that uses shapes (creating, selecting, resizing). You can review the code and extract what you need.
Sample of manipulating "Maps and Figures"
A sample of how to create, select and move components at runtime (in this case with TImage).
Select shapes visually
Shows different ways to select shapes visually.
The site is in Spanish, but you can generate an automatic translation on the site itself.
In any case, the code is commented.
Regards.
Excuse me for my bad English.
One freeware option would be TssControlSizer. Just set the "control" property to the control you want to resize or move.
Not sure if you've moved on now with this Bruce but if you haven't, it might be worth looking at TMS Components Diagram Studio - it's certainly cheap, and looks quite powerful from the demo.
I would use Flex Graphics (commercial, $499 for one developer, with sources, $1500 for site-license, with source code). When I bought it, it was a lot less than that. So I guess, I wouldn't pay that now. It's a lightweight 'drawing/cad' package.
But as I already own it, I could import a page from the original document as an image, perhaps rendered in PNG or WMF, and then mark it up with lines, etc.
You could think of it as a light "cad" package. It has most primitive shapes, and you can easily create your own new objects or shapes in Delphi classes, that could be "smart shapes" like the ones in Visio.
http://www.flex-graphics.com/
Another commercial component set that I have heard only good things about is TRichView. They have a TRichViewEdit that looks like you could emulate a document markup environment easily with it.
Please check here:
TCAD - 2D graphics component for Delphi
http://www.codeidea.com
I hope this helps.

Setting up help for a Delphi app

What's the best way to set up help (specifically HTML Help) for a Delphi application? I can see several options, all of which have disadvantages. Specifically:
I could set HelpContext in the forms designer wherever appropriate, but then I'm stuck having to track numbers instead of symbolic constants.
I could set HelpContext programmatically. Then I can use symbolic constants, but I'd have more code to keep up with, and I couldn't easily check the text DFMs to see which forms still need help.
I could set HelpKeyword, but since that does a keyword lookup (like Application.HelpKeyword) rather than a topic jump (like Application.HelpJump), I'd have to make sure that each of my help pages has a unique, non-changing, top-level keyword; this seems like extra work. (And there are HelpKeyword-related VCL bugs like this and this.)
I could set HelpKeyword, set an Application.OnHelp handler to convert HelpKeyword requests to HelpJump requests so that I can assign help by topic ID instead of keyword lookup, and add code such as my own help viewer (based on HelpScribble's code) that fixes the VCL bugs and lets HelpJump work with anchors. By this point, though, I feel like I'm working against the VCL rather than with it.
Which approach did you choose for your app?
When I first started researching how to do this several years ago, I first got the "All About help files in Borland Delphi" tutorial from: http://www.ec-software.com/support_tutorials.html
In that document, see the section "Preparing a help file for context sensitive help" (which in my version of the document starts on page 28). It describes a nice numbering scheme you can use to organize your numbers into sections, e.g. starting with 100000 for your main form and continuing with 101000 or 110000 for each secondary form, etc.
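As a rough illustration of such a block-based numbering scheme (written in Python rather than Delphi, purely to show the arithmetic): each form owns a block of numbers, and controls on that form take offsets inside the block. The stride of 1000, the form names and the `control_context` helper are all assumptions made for this sketch; they are not taken from the tutorial.

```python
from enum import IntEnum

BASE = 100000       # main form, as in the example numbers above
FORM_STRIDE = 1000  # assumed gap between forms: 101000, 102000, ...


class HelpContext(IntEnum):
    """Hypothetical forms, each mapped to the start of its number block."""
    MAIN_FORM = BASE
    OPTIONS_DIALOG = BASE + 1 * FORM_STRIDE
    ABOUT_DIALOG = BASE + 2 * FORM_STRIDE


def control_context(form: HelpContext, control_index: int) -> int:
    """Return the help context number for a control inside a form's block."""
    if not 0 <= control_index < FORM_STRIDE:
        raise ValueError("control_index must fit inside the form's block")
    return int(form) + control_index
```

With these assumptions, `control_context(HelpContext.OPTIONS_DIALOG, 5)` is 101005, so the number alone tells you which form a topic belongs to.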
But then I wanted to use descriptive string IDs instead of numbers for my Help topics. I started using THelpRouter, which is part of EC Software's free Help Suite at: http://www.ec-software.com/downloads_delphi.html
But then I settled on a Help tool that supports string IDs directly for topics (I use Dr. Explain: http://www.drexplain.com/), so now I simply use HelpJump, e.g.:
Application.HelpJump('UGQuickStart');
I hope that helps.
We use symbolic constants. Yes, it is a bit more work, but it pays off. Especially because some of our dialogs are dynamically built and sometimes require different help IDs.
I create the help file, which gets the help topic ID, and then go around the forms and set their HelpContext values to them. Since the level of maintenance needed is very low - the form is unlikely to change help file context unless something major happens - this works just fine.
We use Help&Manual. It's a wonderful tool, outputting almost any format you could want (doc, rtf, html, pdf), all from the same source. It will even read in or paste from RTF (e.g. MS Word). It uses topic IDs (strings), which I just keep a list of, and I manually put each one into a form (or class) as it suits me. It sounds difficult, but trust me, you'll spend far longer hating the wrong authoring tool. I spent years finding it!
Brian
