Orbeon Text Areas and RTEs as CDATA - orbeon

Is there a way in Orbeon to save TextAreas and RTEs as CDATA sections so that line breaks and other formatting entered by the user are preserved? In some use cases it's really important not to change what the user has entered, and I haven't found a way to accomplish this to date.
Thanks!

In general, formatting and line breaks should be preserved by default. If the input is modified, there are three possible "culprits": the RTE component itself, TagSoup, and clean-html.xsl. The RTE component has certain limitations (AFAIK Orbeon still uses YUI 2); for example, it doesn't handle p elements correctly. TagSoup and clean-html.xsl should let most standard HTML elements through, but they filter out some, for example the canvas element. More on Orbeon's RTE element:
http://wiki.orbeon.com/forms/doc/developer-guide/xforms-controls/textarea-control#TOC-Rich-text-editor-HTML-editor-
So, if the content that arrives at your XForms instance is modified, you will need to debug each of the processing steps to check where the modification took place.
If it's a matter of the RTE component, you could check whether the TinyMCE XBL component works better for you (it uses TinyMCE instead of the YUI 2 RTE; I posted it some months ago on the ops-users ML). If it's a TagSoup matter, you will have to patch the source code (change the TagSoup config); there's also a workaround to configure TagSoup using an external config file (it should be available in the ML archives, too). If it's a clean-html.xsl issue, you can easily create your own clean-html.xsl, as described in the wiki page (see above). HTH fs
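The point that line breaks are preserved by default, without CDATA, is a property of XML itself rather than anything specific to Orbeon. The following is a minimal sketch in Python (not Orbeon code; the element names and sample input are made up for illustration) showing that text containing line breaks and markup characters survives an XML round trip as ordinary escaped element text:

```python
import xml.etree.ElementTree as ET

# Hypothetical user input: a line break plus characters that are special in XML.
user_input = "First line\nSecond line with <b>markup</b> & an ampersand"

# Store the value as ordinary element text; the serializer escapes it.
instance = ET.Element("instance")
ET.SubElement(instance, "comment").text = user_input
serialized = ET.tostring(instance, encoding="unicode")

# Parsing the serialized document gives back exactly what the user typed.
restored = ET.fromstring(serialized).findtext("comment")
print(restored == user_input)
```

One caveat: XML parsers normalize carriage-return/line-feed pairs to a single line feed, with or without CDATA, so a CDATA section would not change that part either.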

Related

Time format for PDF template in Orbeon 4.10

I'm using Orbeon 4.10 to collect data and fill in a PDF from a template. I would like to choose how the time is displayed on the PDF. I have seen the oxf.xforms.format.input.time and oxf.xforms.format.output.time properties, but they seem to only control the form itself.
I have also seen this, but it seems to relate to the date format.
What value do I need to change in my properties?
Thanks
UPDATE: I don't think my solution below actually works. I think this might have worked at some point but might not work anymore. We do have an RFE for this.
You can use the following property:
<property
as="xs:string"
name="oxf.fr.resource.$app.$form.$lang.print.formats.time"
value="[H01]:[m01]:[s01]"/>
where:
$app is the app name
$form is the form name
$lang is the language which applies
You can use wildcards (*) for all of those.
As a workaround, I used a hidden field to format my time correctly for the PDF.
Here is the formatting code that I used in the calculated value: format-time($controlname, '[H01]:[m01]')
Here is the visibility code: $fr-mode = 'email' for the PDF generated from email, or $fr-mode = 'pdf' for the PDF button.
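To make the picture string concrete: [H01] is the zero-padded 24-hour hour, [m01] the zero-padded minute, and [s01] the zero-padded second. The sketch below is Python, not XPath; the mapping table and the format_time helper are assumptions made for illustration and cover only these three components:

```python
from datetime import time

# Assumed mapping from a small subset of XPath picture components to strftime codes.
PICTURE_TO_STRFTIME = {"[H01]": "%H", "[m01]": "%M", "[s01]": "%S"}

def format_time(value: time, picture: str) -> str:
    """Approximate XPath format-time() for the few components listed above."""
    for component, code in PICTURE_TO_STRFTIME.items():
        picture = picture.replace(component, code)
    return value.strftime(picture)

print(format_time(time(9, 5, 30), "[H01]:[m01]"))        # 09:05
print(format_time(time(9, 5, 30), "[H01]:[m01]:[s01]"))  # 09:05:30
```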

How to reference AngularDart component attributes outside of component scope

I've been tasked with creating a webapp using AngularDart. I'm new to Dart and therefore AngularDart also, but have worked through the tutorials at https://angulardart.org/tutorial/.
I've read similar SO questions/answers (How to access a AngularDart components attribute (NgOneWay) outside of the component, eg. on the page it is in?) but given my limited knowledge I don't fully understand how to implement Brian's suggestion, or if indeed it relates to my question! Am I also right in thinking that Vink's suggestion is now irrelevant, as @Controller is now deprecated?
In short, I'm trying to figure out if the following is in fact achievable:
create a top-level <custom-component/> complete with attributes;
within <ng-view/> - and therefore nested components - access <custom-component/>'s attributes. More specifically, bind/listen for changes to these attributes and act accordingly.
For example, given the following
<body>
<custom-component att1="val1" att2="val2"></custom-component>
<ng-view></ng-view>
</body>
and on the assumption that <ng-view/> renders the following markup
<div>
...
<other-component att1="{{customComponent.att1}}" att2="{{customComponent.att2}}"></other-component>
<another-component att1="{{customComponent.att1}}" att2="{{customComponent.att2}}"></another-component>
...
</div>
is it possible to bind customComponent.<<attributeName>> to <other-component/> and <another-component/>?
Or, as is likely, am I misunderstanding the use of AngularDart's @Component/@Decorator?
Should the #att1 and #att2 attributes be moved from <custom-component/> to <ng-view/>?
Alternatively, should I look to architect a different solution altogether? My ultimate goal in this instance is to provide the user with att1/att2 select boxes (from yet another component) which in turn determine the rendered content within each of <other-component/> and <another-component/>.
Any and all suggestions welcome, I won't be offended if you dismiss any/all of the above!!!
My current development environment is as follows:
Dart SDK version 1.9.0-dev.8.0
angular 1.0.0
Many thanks, J
What you can do is move the two components into the content (template) of a wrapper component and bind attributes of both components to fields of the wrapper component.
Usually a better way is to use events, as shown in How to communicate between Angular DART controllers or Angular Dart component events (controllers have been merged with components since then)
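The event-based approach can be sketched independently of AngularDart. The following is a language-neutral illustration written in Python, not AngularDart code; EventBus, the event name and the listener names are all hypothetical. The component holding the select boxes emits a change event, and each interested component subscribes to it instead of reading another component's attributes:

```python
from collections import defaultdict

class EventBus:
    """Tiny publish/subscribe hub shared by otherwise unrelated components."""

    def __init__(self):
        self._listeners = defaultdict(list)  # event name -> list of callbacks

    def on(self, event, listener):
        self._listeners[event].append(listener)

    def emit(self, event, value):
        for listener in self._listeners[event]:
            listener(value)

bus = EventBus()
received = []

# Stand-ins for <other-component/> and <another-component/>.
bus.on("att1-changed", lambda value: received.append(("other", value)))
bus.on("att1-changed", lambda value: received.append(("another", value)))

# Stand-in for the component that hosts the att1 select box.
bus.emit("att1-changed", "val1")
print(received)
```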

open source controls to convert rich text formatted code to html markup

I am working on asp.net mvc. I am trying to display the rich text formatted content like,
{\rtf1\ansi\ansicpg1252\uc1\htmautsp\deff2{\fonttbl{\f0\
fcharset0 Times New Roman;}{\f2\fcharset0 Tahoma;}}{\colortbl\red0\green0\blue0;\red255\green255\blue255;}\loch\hi
ch\dbch\pard\plain\ltrpar\itap0{\lang1033\fs24\f2\cf0 \cf
0\ql{\f2 {\ltrch AMANDA WITH RC CALLED AND WANTED TO
VERIFY THAT WE WERE AFFILIATED WITH SHAUN # JAGGYS. LET HER KNOW WE
WERE, SHAUN CALLED RC AS WELL TO VERIFY STATUS OF BD}\li0\ri0\sa0\sb0\fi0\ql\par}
}
}
in the view. This data could come from a database table, and I need to display it in an editor-type control. Are there any open source controls that are able to display Rich Text Format?
Well, I just got done writing an RTF to HTML converter that maintains all embedded media and creates a MIME multipart message out of it. This is close to what you want to do. Essentially, if you aren't interested in writing your own converter, you can look at this CodeProject article and use his: http://www.codeproject.com/Articles/27431/Writing-Your-Own-RTF-Converter
There are also descriptions of how he arrived at his solution.
On my project we just started ripping apart the RTF document and parsing its contents. Open source and third-party libraries weren't an option for me.
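To give an idea of what "ripping apart" an RTF document involves, here is a deliberately small sketch in Python (not the converter mentioned above, and not C#). It only extracts plain text: it tokenizes control words, control symbols, braces and text runs, skips a few metadata groups, and turns \par into a line break. The rtf_to_text name, the list of skipped destinations, and the omission of \'hh hex escapes and Unicode handling are simplifying assumptions.

```python
import re

# Destination groups that hold metadata rather than document text (non-exhaustive).
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}

# One token per match: control word, control symbol, brace, or a run of plain text.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.DOTALL)

def rtf_to_text(rtf):
    out = []
    skip = False  # True while inside a group whose content is ignored
    stack = []    # saved skip flags, one per open group
    for word, _number, symbol, brace, text in TOKEN.findall(rtf):
        if brace == "{":
            stack.append(skip)
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif word:
            if word in SKIPPED_DESTINATIONS:
                skip = True
            elif word in ("par", "line") and not skip:
                out.append("\n")
        elif symbol:
            if symbol == "*":
                skip = True  # \* marks an ignorable destination
            elif symbol in "\\{}" and not skip:
                out.append(symbol)  # escaped literal character
        elif text and not skip:
            # Raw line breaks in the RTF source are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)

sample = r"{\rtf1\ansi{\fonttbl{\f0 Times;}}\pard Hello \b world\b0\par}"
print(repr(rtf_to_text(sample)))
```

Producing HTML instead of plain text would mean tracking formatting state (bold, font, color) per group on the same stack and emitting tags when it changes.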

Processing HTML with UIMA

I am trying to get my head around the UIMA architecture.
I would like to create a pipeline that starts with HTML markup. I need to strip this to plain text, so it can be processed by different annotators, like POS, chunking, entity detection, etc. However I would also like to keep track of which regions correspond to the original html tags, like links, paragraphs, em, etc. Basically I would like a final annotator that takes advantage of structural annotations (from html) and semantic annotations (from the other components), all at once.
So, I can imagine starting off with a component that strips the html markup and adds annotations to keep track of the tags I am interested in. Does such a component exist already? It seems like something a lot of people would want.
If I do have to create it from scratch, what kind of component is it? It's not just a straight annotator, because it needs to change the SOFA: it needs to replace the markup with plain text.
Or should I have it create a new view of the document, so we maintain a markup view and a plain text view of the document? This seems weird, considering I will never care about the markup view again. Also, how would I make sure the other annotators (which I won't be coding myself) operate on the plain text view of the document rather than the markup view?
Depending on the complexity of the markup, some people use Apache Tika, and some people use Boilerpipe.
Here is a blog post from someone who wanted to use Boilerpipe in UIMA but ran into a snag because he wanted to preserve offsets back to the HTML.
Here is the UIMA annotator that calls Tika.
UIMA Ruta provides some analysis engines for this task. The HtmlAnnotator creates annotations in the HTML text for the different tags. The HtmlConverter is able to create a new view that contains only the text of the HTML, but with the corresponding annotations for the tags. There are some configuration parameters for handling line breaks and so on. For further processing without sofa mappings in a pipeline, there is the ViewWriter, which is able to copy the new plain text view to the _InitialView of a new file.
DISCLAIMER: I am a developer of UIMA Ruta
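The general idea (strip the markup, but remember which plain-text offsets each tag covered) can be sketched outside of UIMA. The following Python sketch uses only the standard library's html.parser; it is not the Ruta HtmlConverter, and the TagOffsetExtractor and strip_html names are made up for illustration:

```python
from html.parser import HTMLParser

class TagOffsetExtractor(HTMLParser):
    """Collects plain text and (tag, begin, end) spans measured in that text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []       # pieces of plain text in document order
        self.length = 0        # length of the plain text collected so far
        self.open_tags = []    # stack of (tag, begin offset)
        self.annotations = []  # finished (tag, begin, end) spans

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, self.length))

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)

    def handle_endtag(self, tag):
        # Close the nearest open tag with this name; ignore stray end tags.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags.pop(i)
                self.annotations.append((tag, begin, self.length))
                break

def strip_html(markup):
    parser = TagOffsetExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks), parser.annotations

text, annotations = strip_html("<p>Hello <em>world</em></p>")
print(text)
print(annotations)
```

Tags that are never closed (for example a bare br) are simply not reported in this sketch. In a UIMA pipeline the same information would live as annotations on a plain text view, so downstream annotators only ever see the stripped text.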

Making tagsoup markup cleansing optional

Tagsoup is interfering with input and formatting it incorrectly. For instance when we have the following markup
Text outside anchor
It is formatted as follows
Text outside anchor
This is a simple example but we have other issues as well. So we made tagsoup cleanup/formatting optional by adding an extra attribute to textarea control.
Here is the diff(https://github.com/binnyg/orbeon-forms/commit/044c29e32ce36e5b391abfc782ee44f0354bddd3).
Textarea would now look like this
<textarea skip-cleanmarkup="true" mediatype="text/html" />
Two questions
Is this the right approach?
If I provide a patch can it make it to orbeon codebase?
Thanks
BinnyG
Erik, Alex, et al
I think there are two questions here:
The first concern is TagSoup and the cleanup that happens OOTB: empty tags are converted to singleton tags, which, when sent to the client browser as markup, get "fixed" by browsers like Firefox; because of the loss of precision, the browsers do the wrong thing.
Turning off this cleanup helps in this case, but for this issue alone it is not really the right answer because it takes away a security feature and a well-formed-markup feature. So there may need to be some adjustment to the handling of at least certain empty tags (other than turning them into invalid singleton tags).
All this brings us to the second concern: do we always want those features in play? Our use case says no. We want the user to be able to spit out whatever markup they want, invalid or not. We're not putting the form in an app that needs to protect the user from cross-site scripting; we're building a tool that lets users edit web pages, hence we have turned off the cleanup.
But turning off cleanup wholesale? It's important that we can do it if that's what our use case calls for, but the implementation we have is all or nothing. It would be nice to be able to define strategies for cleanup and make that function pluggable. For example:
* In the XML config of the system, define a "map" of config names to class names which implement a given strategy. In the XForms definition the author would specify the name from the map.
If TagSoup transforms:
Text outside anchor
Into:
Text outside anchor
Wouldn't that be a bug in TagSoup? If that were the case, then I'd say that it is better to fix this issue rather than disable TagSoup. But it isn't a bug in TagSoup; here is what seems to be happening. Say the browser sends the following to the server:
<a shape="rect"></a>After<br clear="none">
This goes through TagSoup, the result goes through the XSLT clean-up code, and the following is sent to the browser:
<a shape="rect"/>After<br clear="none"/>
The issue is on the browser, which transforms this into:
<a shape="rect">After</a><br clear="none"/>
The problem is that we serialize this as XML with Dom4jUtils.domToString(cleanedDocument), while it would be more prudent to serialize it as HTML. Here we could use the Saxon serializer, which is also used by HTMLSerializer. Maybe you can try changing this code to use it instead of Dom4jUtils.domToString(). Let us know what you find when you get a chance to do that.
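The difference between the two serializations is easy to reproduce outside of Orbeon. The sketch below uses Python's standard ElementTree as a stand-in for dom4j and Saxon (an assumption made purely for illustration; the wrapping div only gives the fragment a single root). It serializes the same cleaned-up tree once with the XML method and once with the HTML method:

```python
import xml.etree.ElementTree as ET

# Cleaned-up fragment as it might look after TagSoup and the XSLT step.
cleaned = ET.fromstring('<div><a shape="rect"/>After<br clear="none"/></div>')

# XML method: the empty <a> is written as a self-closing tag,
# which an HTML parser in the browser treats as an unclosed <a>.
as_xml = ET.tostring(cleaned, encoding="unicode", method="xml")

# HTML method: the empty <a> gets an explicit end tag,
# and the void element <br> is written without one.
as_html = ET.tostring(cleaned, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

With the HTML method the text "After" stays outside the anchor when a browser parses the result, which is the behaviour the original poster expected.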
Binesh and I agree: if there is a bug, it would be a good idea to address the issue closer to the root. But I think the specific issue here is only part of the matter.
We're thinking it would be best to have some kind of name-to-strategy mapping so that RTEs can call in the server-side processing that is right for them, or the default if none is specified.
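A name-to-strategy mapping of this kind could look roughly like the following. This is a Python sketch of the idea only, not Orbeon code: the strategy names, the three example strategies and the clean function are hypothetical, and the regex-based script stripping is far too naive to serve as a real security filter.

```python
import html
import re

def no_cleanup(markup):
    """Pass the markup through untouched (the 'edit web pages' use case)."""
    return markup

def escape_everything(markup):
    """Treat the input as plain text by escaping all markup characters."""
    return html.escape(markup)

def strip_scripts(markup):
    """Naive illustration of a filtering strategy: drop <script> elements."""
    return re.sub(r"(?is)<script\b.*?</script\s*>", "", markup)

# The "map" of config names to strategy implementations.
CLEANUP_STRATEGIES = {
    "none": no_cleanup,
    "escape": escape_everything,
    "strip-scripts": strip_scripts,
}
DEFAULT_STRATEGY = "strip-scripts"

def clean(markup, strategy_name=None):
    """Apply the named strategy, or the default when no name is given."""
    strategy = CLEANUP_STRATEGIES.get(strategy_name or DEFAULT_STRATEGY)
    if strategy is None:
        raise ValueError("unknown cleanup strategy: %r" % strategy_name)
    return strategy(markup)

print(clean("<a></a>Text outside anchor", "none"))
print(clean("a<script>alert(1)</script>b"))
```

In this arrangement the form author would only name a strategy on the control, and unknown names fail loudly instead of silently disabling cleanup.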
