open source controls to convert rich text formatted code to html markup - asp.net-mvc

I am working with ASP.NET MVC. I am trying to display rich text formatted (RTF) content like
{\rtf1\ansi\ansicpg1252\uc1\htmautsp\deff2{\fonttbl{\f0\fcharset0 Times New Roman;}{\f2\fcharset0 Tahoma;}}{\colortbl\red0\green0\blue0;\red255\green255\blue255;}\loch\hich\dbch\pard\plain\ltrpar\itap0{\lang1033\fs24\f2\cf0 \cf0\ql{\f2 {\ltrch AMANDA WITH RC CALLED AND WANTED TO VERIFY THAT WE WERE AFFILIATED WITH SHAUN # JAGGYS. LET HER KNOW WE WERE, SHAUN CALLED RC AS WELL TO VERIFY STATUS OF BD}\li0\ri0\sa0\sb0\fi0\ql\par}
}
}
in the view. This data would actually come from a database table, and I need to display it in an editor-type control. Are there any open source controls that can display rich text format?

Well, I just finished writing an RTF-to-HTML converter that preserves all embedded media and creates a MIME multipart message from the result. This is close to what you want to do. If you aren't interested in writing your own converter, you can look at this CodeProject article and use the author's: http://www.codeproject.com/Articles/27431/Writing-Your-Own-RTF-Converter
The article also describes how the author arrived at that solution.
On my project we simply took the RTF document apart and parsed its contents ourselves. Open source and third-party libraries weren't an option for me.
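To give a rough idea of what "taking the RTF apart" involves, here is a minimal sketch. It is written in Python only to keep it short and language-neutral; it is not the CodeProject converter or the code from my project, and the function name rtf_to_html is made up for illustration. It assumes a simple document in code page 1252, skips table-like groups such as \fonttbl and \colortbl, turns \par into paragraphs, and ignores all character formatting, \uN Unicode escapes and embedded media.

```python
import html
import re

# One token per match: a control word with optional numeric argument,
# a hex escape (\'hh), a control symbol (\{ \} \\ \*), a brace, or plain text.
TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"      # groups 1, 2: control word and argument
    r"|\\'([0-9a-fA-F]{2})"      # group 3: hex-escaped byte
    r"|\\([^a-z'])"              # group 4: control symbol
    r"|([{}])"                   # group 5: group delimiter
    r"|([^\\{}]+)"               # group 6: plain text
)

# Groups that hold tables or binary data rather than document text.
SKIPPED = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def rtf_to_html(rtf: str) -> str:
    """Extract the paragraphs of a simple RTF string as escaped <p> elements."""
    paragraphs, current = [], []
    depth = 0
    skip_depth = None  # depth of the group currently being ignored, if any
    for match in TOKEN.finditer(rtf):
        word, _arg, hexbyte, symbol, brace, text = match.groups()
        if brace == "{":
            depth += 1
        elif brace == "}":
            if skip_depth is not None and depth == skip_depth:
                skip_depth = None  # the ignored group ends here
            depth -= 1
        elif skip_depth is not None:
            continue  # still inside an ignored group
        elif word in SKIPPED or symbol == "*":
            skip_depth = depth  # ignore the rest of the enclosing group
        elif word == "par":
            paragraphs.append("".join(current))
            current = []
        elif word == "tab":
            current.append("\t")
        elif hexbyte:
            current.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif symbol in ("{", "}", "\\"):
            current.append(symbol)  # escaped literal character
        elif text:
            # Raw line breaks in RTF source carry no meaning.
            current.append(text.replace("\r", "").replace("\n", ""))
    if current:
        paragraphs.append("".join(current))
    return "\n".join(
        f"<p>{html.escape(p.strip())}</p>" for p in paragraphs if p.strip()
    )
```

A real converter also has to map fonts, colors, bold/italic runs and pictures to HTML, which is where most of the work goes.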

Related

Processing HTML with UIMA

I am trying to get my head around the UIMA architecture.
I would like to create a pipeline that starts with HTML markup. I need to strip this to plain text, so it can be processed by different annotators, like POS, chunking, entity detection, etc. However I would also like to keep track of which regions correspond to the original html tags, like links, paragraphs, em, etc. Basically I would like a final annotator that takes advantage of structural annotations (from html) and semantic annotations (from the other components), all at once.
So, I can imagine starting off with a component that strips the html markup and adds annotations to keep track of the tags I am interested in. Does such a component exist already? It seems like something a lot of people would want.
If I do have to create it from scratch, what kind of component is it? It's not just a straight annotator, because it needs to change the SOFA: it needs to replace the markup with plain text.
Or should I have it create a new view of the document, so we maintain a markup view and a plain text view of the document? This seems weird, considering I will never care about the markup view again. Also, how would I make sure the other annotators (which I won't be coding myself) operate on the plain text view of the document rather than the markup view?
Depending on the complexity of the markup, some people use Apache Tika, and some people use Boilerpipe.
Here is a blog post from someone who wanted to use Boilerpipe in UIMA but ran into a snag because he wanted to preserve offsets back to the HTML.
Here is the UIMA annotator that calls tika.
UIMA Ruta provides some analysis engines for this task. The HtmlAnnotator creates annotations in the HTML text for the different tags. The HtmlConverter is able to create a new view that contains only the text of the HTML, but with the corresponding annotations for the tags. There are some configuration parameters for handling line breaks and so on. For further processing without sofa mappings in a pipeline, there is the ViewWriter, which is able to copy the new plain text view to the _InitialView of a new file.
DISCLAIMER: I am a developer of UIMA Ruta
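The underlying idea (a plain text view plus tag annotations whose offsets point into that plain text) can be sketched outside UIMA. The following is only an illustration in Python using the standard html.parser module; it is not the Ruta HtmlConverter or any UIMA API, and the names TagTrackingStripper and strip_html are hypothetical.

```python
from html.parser import HTMLParser


class TagTrackingStripper(HTMLParser):
    """Collects plain text and records (tag, begin, end) spans for tracked tags."""

    def __init__(self, tracked=("a", "p", "em")):
        super().__init__(convert_charrefs=True)
        self.tracked = set(tracked)
        self.chunks = []        # pieces of plain text, in document order
        self.length = 0         # length of the plain text emitted so far
        self.open_tags = []     # stack of (tag, begin offset)
        self.annotations = []   # (tag, begin, end) in plain-text offsets

    def handle_starttag(self, tag, attrs):
        if tag in self.tracked:
            self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the most recent matching open tag, tolerating sloppy nesting.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags.pop(i)
                self.annotations.append((tag, begin, self.length))
                break

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def strip_html(markup):
    """Return (plain_text, annotations) with outer spans listed first."""
    parser = TagTrackingStripper()
    parser.feed(markup)
    parser.close()
    ordered = sorted(parser.annotations, key=lambda a: (a[1], -a[2]))
    return "".join(parser.chunks), ordered


text, spans = strip_html('<p>See <a href="x">the <em>docs</em></a> now.</p>')
for tag, begin, end in spans:
    print(tag, repr(text[begin:end]))
```

Because the offsets refer to the stripped text, downstream annotators only ever need to see the plain text view.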

ASP.net MVC Export To Excel

I am currently exporting to Excel using the old HTML trick, where I set the MIME type to application/ms-excel. This has the benefit of nicely formatted tables, but the drawback that the document is not in native Excel format.
I could export it as CSV, but then it would not be formatted.
I have read brief mentions that you can export XML to create the Excel document, but I cannot find much information on this. Does anybody know of any tutorials, or the benefits of this approach? Can tables be formatted using this method?
Thanks.
The easiest way is to parse your table and export it in Excel XML format; see this for example: http://blogs.msdn.com/b/brian_jones/archive/2005/06/27/433152.aspx
It allows you to format the table as you wish (borders, fonts, colors, I think even formulas), and Excel will recognize it as a native Excel format. As a plus, other programs can import Excel XML (e.g. OpenOffice, Excel Viewer), and you do not need Office components installed on the server.
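As a rough illustration of what such a file looks like, here is a small sketch that writes rows in the Excel 2003 XML (SpreadsheetML) layout with a bold header row. It is in Python purely for brevity; the same string building carries over to a C# controller. The function name rows_to_excel_xml is hypothetical, and the element and attribute names are the ones I recall from that format, so check them against the linked article before relying on them.

```python
from xml.sax.saxutils import escape, quoteattr


def rows_to_excel_xml(rows, sheet_name="Sheet1"):
    """Render rows (lists of str/int/float) as an Excel 2003 XML workbook string.

    The first row is treated as a header and rendered in bold.
    """
    out = [
        '<?xml version="1.0"?>',
        '<?mso-application progid="Excel.Sheet"?>',
        '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"',
        ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">',
        ' <Styles><Style ss:ID="header"><Font ss:Bold="1"/></Style></Styles>',
        " <Worksheet ss:Name=" + quoteattr(sheet_name) + ">",
        "  <Table>",
    ]
    for index, row in enumerate(rows):
        style = ' ss:StyleID="header"' if index == 0 else ""
        out.append("   <Row>")
        for value in row:
            # bool is a subclass of int, so exclude it from the numeric case.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            kind = "Number" if is_number else "String"
            out.append(
                f'    <Cell{style}><Data ss:Type="{kind}">'
                f"{escape(str(value))}</Data></Cell>"
            )
        out.append("   </Row>")
    out += ["  </Table>", " </Worksheet>", "</Workbook>"]
    return "\n".join(out)


print(rows_to_excel_xml([["Name", "Qty"], ["A & B", 3]]))
```

The mso-application processing instruction is what lets Windows hand a file saved with an .xml extension to Excel.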
Check out ExcelXmlWriter.
We've been using it for some time and it works well. There are some downsides to the xml format however. Since it's unlikely your end users will have the .xml extension associated with Excel, you end up having to download files as .xls with an Excel mime type. When a user opens a file downloaded in this way they get a warning that the file is not in xls format. If they click through it, the file opens normally.
The only alternative is a paid library that generates native Excel files. That's certainly the best solution, but the last time we looked there were no good free libraries (this may have changed).
Bill Sternberger has blogged a very simple solution here:
export to excel or csv from asp.net mvc
Just today I had to write a routine that exported data to Excel in an MVC application. Here are the details, so that someone may benefit in the future. First the user had to select some date ranges and areas for the report. On the postback, this method was in place, with TheModelTypeList containing the strongly typed data returned from the LINQ/Entity Framework/SQL query:
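// Needs: System.Collections.Generic, System.IO, System.Web.Mvc, System.Xml.Serialization
if (ExportToExcel)
{
    // Serialize the strongly typed list to XML in memory.
    var stream = new MemoryStream();
    var serializer = new XmlSerializer(typeof(List<SomeModelType>));
    serializer.Serialize(stream, TheModelTypeList);
    stream.Position = 0; // rewind so the result is streamed from the start

    // FSR is the FileStreamResult that the action returns.
    FSR = new FileStreamResult(stream, "application/vnd.ms-excel");
}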
The only catch was that the file type was not known when opening the file, so the system prompted for an application to open it. This is a result of the content being XML. I'm still working on that.
I am using SpreadsheetLight, an open source library that makes it ridiculously easy to create, manipulate and save an Excel sheet from C#. You can have an MVC / Web API controller do the work of creating the file and then do one of the following:
Return a URL link to the saved Excel file to the page and invoke Excel to open it with an ActiveX object
Return a Data Content Stream to the page
Return a URL link to the calling page to force an Open / Save As dialog
http://spreadsheetlight.com/

Orbeon Text Areas and RTEs as CDATA

Is there a way in Orbeon to save TextAreas and RTEs as CDATA sections so that line breaks and other formatting entered by the user are preserved? In some use cases it's really important not to change what the user has entered, and I haven't found a way to accomplish this to date.
Thanks!
In general, formatting and line breaks should be preserved by default. If the input is modified, there are three possible "culprits": the RTE component itself, Tagsoup, and clean-html.xsl. The RTE component has certain limitations (AFAIK Orbeon still uses YUI 2); for example, it doesn't handle p elements correctly. Tagsoup and clean-html.xsl should let through most of the standard HTML elements, but they filter out, for example, the canvas element. More on Orbeon's RTE element:
http://wiki.orbeon.com/forms/doc/developer-guide/xforms-controls/textarea-control#TOC-Rich-text-editor-HTML-editor-
So, if the content that arrives at your xforms instance is modified, you will need to debug each of the processing steps to check where the modification took place.
If it's a matter of the RTE component, you could check whether the TinyMCE XBL component works better for you (it uses TinyMCE instead of the YUI 2 RTE; I posted it some months ago on the ops-users ML). If it's a Tagsoup matter, you will have to patch the source code (change the Tagsoup config); there's also a workaround to configure Tagsoup using an external config file (it should be available in the ML archives, too). If it's a clean-html.xsl issue, you can easily create your own clean-html.xsl; this is described in the wiki page (see above). HTH fs

read content in webbrowser input field

This is something I have been trying to do and still can't get done: reading the information typed into a website input field and being able to copy it.
Is there a way I can read the ty
Try these articles about using TWebBrowser and Delphi to read data from a web page:
How to read and write form elements
TWebBrowser OleObject and Document data
In JavaScript,
document.getElementById('input-field-id').value
returns the contents of an input box. What are you trying to do?

extract text from word or pdf based on format (font name and size)

I need to parse a large text (a Word or PDF document of about 1000 pages) and place some of the text from this document into database fields.
The only thing that distinguishes the text I want to extract is its format: it is always "Helvetica-Condensed", size 12.
Can I do that? I know how to use the string functions, but what should I use to test the format?
As I said, the text is stored in a Word document or a PDF.
If there is a third-party component that can do this, please point me to it.
Thanks
There is QuickPDF. The price is $249.00.
The other option is to code it yourself. The file specification is available online, and if you're only trying to rip the text out of the document, it should guide you most of the way.
The only thing to be careful of is documents that are built entirely from images. In that scenario (no matter what you use to read the file) you will also need an OCR type of application. To see whether this is the case, open a sample of the type of file you want to extract text from, select some text, copy it, then try to paste it into Notepad.
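For the Word side, if the document is (or can be saved as) a .docx file, testing the format can be sketched without a third-party component, because a .docx is a zip archive whose word/document.xml stores the font and size of each run. The sketch below is in Python for brevity and makes several assumptions: the function names are made up, only formatting set directly on a run is checked (formatting inherited from styles is missed), and the size is compared in half-points, which is how WordprocessingML stores it. It does not cover legacy .doc files or PDF.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}name notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def runs_with_format(docx_file, font_name, size_pt):
    """Yield the text of runs whose direct formatting matches font and size."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    wanted_size = str(int(size_pt * 2))  # w:sz is stored in half-points
    for run in root.iter(W + "r"):
        props = run.find(W + "rPr")
        if props is None:
            continue  # no direct formatting on this run
        fonts = props.find(W + "rFonts")
        size = props.find(W + "sz")
        if fonts is None or size is None:
            continue
        if fonts.get(W + "ascii") == font_name and size.get(W + "val") == wanted_size:
            yield "".join(t.text or "" for t in run.iter(W + "t"))


def make_sample_docx():
    """Build a tiny in-memory .docx with two differently formatted runs (demo only)."""
    ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    body = (
        f'<w:document xmlns:w="{ns}"><w:body><w:p>'
        '<w:r><w:rPr><w:rFonts w:ascii="Helvetica-Condensed"/><w:sz w:val="24"/></w:rPr>'
        "<w:t>keep me</w:t></w:r>"
        '<w:r><w:rPr><w:rFonts w:ascii="Tahoma"/><w:sz w:val="24"/></w:rPr>'
        "<w:t>skip me</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


for text in runs_with_format(make_sample_docx(), "Helvetica-Condensed", 12):
    print(text)
```

The matching text could then be written to the database fields; for PDF the same test needs a library or parser that exposes the font name and size of each text fragment.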
