Need to modify XLIFF XSD - localization

I am converting DITA to XLIFF. In my technical solution I have to specialize (modify) xliff-core-1.2-strict.xsd to accommodate few DITA attributes. It means some attributes will go along with "g" tag. For Example:
<g id="00001" newAtt="this is new attribute" xid="009"/>
From the translation side I am not sure how it will work, so my questions:
Is it general practice that LSPs will get different flavor of XLIFF XSDs from different companies?
And is it possible for them to use it in their XLIFF editor by updating updated XLIFF XSD? I tried to explore “Transolution” but did not find any place where to place modified XSD.
Please let me know if you have any thought on this.
Thanks.

Here are my answers to your two questions:
I work for an LSP and I've seen all sorts and flavors of Xliff. Most of them try to stick to OASIS 1.2 transitional schema. Some of TMS/CAT producers added their own extensions. These producers normally provide an XSD so you can validate their Xliffs by adding that XSD to OASIS schema; e.g. SDL extensions to 1.2. When I'm customizing Xliff for a client, I normally do namespaces and provide a simple additional XSD; e.g.:
<trans-unit id="0" translate="yes" resname="msg_foo">
<superduper:uri>http://foo.bar/iJKLM9</superduper:uri>
<source>This is supposed to be a <superduper:g id="00001" newAtt="this is new attribute" xid="009"/> example.</source>
<target state="new"></target>
</trans-unit>
Most of the TMS/CAT tools are very basic (and closed) when it comes to their Xliff filters (or any of their filters for that matter) and I'm sorta kinda sure that they ignore your customized XSD.
Transolution is a very nice tool and my favorite Open Source translation tool. Unfortunately it's been long abandoned and has plenty of defects and shortcomings.
Anyway, if you provide a sample file, I can tell you what happens to non-conforming tags when it's imported into one of common, major CAT tools.
One final note; <g> seems to be retired in Xliff 2.0.

Forget about Transolution for one thing. I would try it with Trados, MemoQ or MemSource. Those are some of the big tools used in the translation business.
Personally I never got an XLIFF with a XSD, nor do I think Trados can handle it.
What do you need to change in the XLIFF?

Related

Add new values to XML dynamically

I have an XML file in my app resources folder. I am trying to update that file with new dictionaries dynamically. In other words I am trying to edit an existing XML file to add new keys and values to it.
First of all can we edit a static XML file and add new dictionary with keys and values to it. What is the best way to do this.
In general, you can read an XML file into a document object (choose your language), use methods to modify it (add your new dictionary), and (re-)write it back out to either the original XML file, or a new one.
That's straightforward ... just roll up the ol' sleeves and code it up.
The real problem comes in with formatting in the XML file before and after said additions.
If you are going to 'unix diff' the XML file before and after, then order is important. Some standard XML processors do better with order than others.
If the order changes behind the scenes, and is gratuitously propagated into your output file, you lose standard diffing advantages, such as some gui differs, and some scm diffs (svn, cvs, etc.).
For example, browse to:
Order of XML attributes after DOM processing
They discuss that DOM loses order where SAX does not.
You can also write a custom XML 'diff'er (there may be such off-the-shelf ... for example check out 'http://diffxml.sourceforge.net/') that compares 2 XML documents tag-by-tag, attribute-by-attribute, etc.
Perhaps some standard XML-related tool such as XSLT will allow you to keep the formatting constant without changing tag or attribute order. You'd have to research that.
BTW, a related problem is the config (.ini) file problem ... many common processors flippantly announce that the write-order may not agree with the read-order.

Specification for vda formats

I'm student and I have to make project which include EDI converter (mine of course). To be honest I have to use ODETTE format and VDA. Does anybody know where I can find any specification for this formats?
Any help would be great !:)
ODETTE is edifact. see http://www.unece.org/trade/untdid/welcome.html
all UN edifact messagges are there.
they have versions (like D 96A etc); so you have to be beware of the version of the message you use.
VDA is used in German automitive.
This is their website: http://www.vda.de/de/index.html
look eg at http://www.vda.de/de/publikationen/publikationen_downloads/index.html?aid=1
They have more than one format: edifact, csv, fixed elngth records.
Yes, you can build your own translator.
Take a look at Bots open source edi translator (http://bots.sourceforge.net),
you might get some idea's ;-))

Overriding JSF's default I18N handling

Like the asker of the question here
Variable substitution JSF Resource Bundle property file
I'm slightly aghast at the inability to reference the value of other property keys in the message bundle.
Although I see how easy to write my own rubbish handler[0] that can do what I want in a custom component, that would leave expressions in templates calling the message bundle still using the default JSF implementation.
Is it possible to override the default JSF handling of the message bundle?
[0] Or better, to use code referenced in one of the answers to the above question
https://code.google.com/p/reflectiveresourcebundle/
You can provide the fully qualified name of a concrete ResourceBundle implementation as "base name" instead of alone the path and filename of the properties files.
E.g.
public class YourCustomResourceBundle extends ResourceBundle {
// ...
}
which can be registered as follows
<application>
<resource-bundle>
<base-name>com.example.YourCustomResourceBundle</base-name>
<var>text</var>
</resource-bundle>
</application>
or declared on a per-view/template basis as follows
<f:loadBundle baseName="com.example.YourCustomResourceBundle" var="text" />
Here are several related questions/answers which contain some concrete code which you could use as a kickoff example:
How to remove the surrounding ??? when message is not found in bundle
internationalization in JSF with ResourceBundle entries which are loaded from database
i18n with UTF-8 encoded properties files in JSF 2.0 appliaction
Everything is possible for those who try. The question is not whether it is is possible but should you do it. And the answer to that question is: probably not.
Referencing other message in a message bundle means you want to build a compound message. So you can re-use part of the message many times just to save small fraction of the disk space or small fraction of development time.
If that is the case, I have a message for you. What you plan to do is called a concatenation and it is the second most common I18n defect. And its implications are as bad as those of hardcoded strings.
Why? Because target languages do not follow the English grammar rules. First, it is common need to re-order the sentence while translating. This might be easy to fix by using (numbered or named) placeholders. On the other hand though, the translation might differ depending on the context. That is, it might need to be translated in totally other way, or simply the word endings might need to be different depending on a grammar case, mood or gender.
My advice is, do not use such shortcuts, it will create more problems than it fixes.
Now you should know why "those stupid Romans" didn't implement it like this: it is against I18n best practices.

Alternatives to gettext?

Are there any general localization/translation alternatives to gettext?
Open source or proprietary doesn't matter.
When I say alternative to gettext, I mean a library for internationalization, with a localization backend of sorts.
The reason I'm asking is because (among other things) I find the way gettext does things slightly cumbersome and static, mostly in the backend bit.
First of all I think gettext is one of the best at this point.
You may take a look on Boost.Locale that may provide a better API and use gettext's dictionary model: http://cppcms.sourceforge.net/boost_locale/docs/ (not official part of Boost, still beta).
Edit:
If you don't like gettext...
These are translation technologies:
OASIS XLIFF
GNU gettext po/mo files
POSIX catalogs
Qt ts/tm files
Java properties,
Windows resources.
Now:
Last two total crap... Very hard to use translate and maintain, do not support plural forms.
Qt ts/tm -- requires usage of Qt framework. Have very similar model to gettext. Not bad solution, but limited to Qt. Not so useful in generic programs.
POSIX catalogs -- nobody uses them, no plural forms support. Crap.
OASIX XLIFF -- "standard" solution, depends on XML, even ICU requires compilation to specific ICU resources for use. Limited translation tools, I don't know any library that supports XLIFF. Plural forms not so easy to use (ICU included some support only in 4.x release).
Now what do we have?
GNU gettext, widely used, has great tools, has great plural forms support, very popular in translators community...
So decide, do you really think that gettext is not so good solution?
I don't think so. You haven't worked with other solutions at all, so try to understand how it works at first place.
Fluent is a new system that offers a number of adaptations that gettext lacks. Where gettext supports pluralization, fluent has a generic framework for the text variants. Where gettext uses the "untranslated" string as its translation key, fluent supports an abstract key (allowing multiple translations for something that just happens to be homonymous in the source language. Here is a more extensive comparison.
An example of fluent .ftl file, taken from firefox's preferences codebase, looks like this:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
blocklist-window =
.title = Block Lists
.style = width: 55em
blocklist-description = Choose the list { -brand-short-name } uses to block online trackers. Lists provided by <a data-l10n-name="disconnect-link" title="Disconnect">Disconnect</a>.
blocklist-close-key =
.key = w
blocklist-treehead-list =
.label = List
blocklist-button-cancel =
.label = Cancel
.accesskey = C
blocklist-button-ok =
.label = Save Changes
.accesskey = S
# This template constructs the name of the block list in the block lists dialog.
# It combines the list name and description.
# e.g. "Standard (Recommended). This list does a pretty good job."
#
# Variables:
# $listName {string, "Standard (Recommended)."} - List name.
# $description {string, "This list does a pretty good job."} - Description of the list.
blocklist-item-list-template = { $listName } { $description }
blocklist-item-moz-std-listName = Level 1 block list (Recommended).
blocklist-item-moz-std-description = Allows some trackers so fewer websites break.
blocklist-item-moz-full-listName = Level 2 block list.
blocklist-item-moz-full-description = Blocks all detected trackers. Some websites or content may not load properly.
Interesting comments about gettext() and all those pro-gettext().
I'm not saying that it ain't working right in most cases, but I tried to manage one project with it and quickly felt overwhelm by the hardness of using it. Maybe there are a few user interfaces for translators today, but I did not even look. The extraction and merging of strings is just not doing it for me.
Now, I thank you Artyom for talking about XLIFF which is a much better solution for my environment since everything is XML. Oh! And there are excellent editors out there. But if you like gettext() you won't find them. 8-)
I'll suggest looking at this one for example:
https://sourceforge.net/projects/wordforge2/
Now, this may give the programmer a nightmare to make it all work, but what we want is a dream come true for the translators (and zero work by the programmer as translations pour in, because I can tell you that with gettext() I had to do all the work!)
There is an alternative from Zend that supports gettext *.po / *.mo files and many more formats.
Many Apache servers cache the translation files because gettext is implemented as a module and the server has to be restarted to refresh the translation data.
The Zend implementation avoids this and supports many more formats:
http://framework.zend.com/manual/1.12/en/zend.translate.html

Adding MS-Word-like comments in LaTeX

I need a way to add text comments in "Word style" to a Latex document. I don't mean to comment the source code of the document. What I want is a way to add corrections, suggestions, etc. to the document, so that they don't interrupt the text flow, but that would still make it easy for everyone to know, which part of the sentence they are related to. They should also "disappear" when compiling the document for printing.
At first, I thought about writing a new command, that would just forward the input to \marginpar{}, and when compiling for printing would just make the definition empty. The problem is you have no guarantee where the comments will appear and you will not be able to distinguish them from the other marginpars.
Any idea?
todonotes is another package that makes nice looking callouts. You can see a number of examples in the documentation.
Since LaTeX is a text format, if you want to show someone the differences in a way that they can use them (and cherry pick from them) use the standard diff tool (e.g., diff -u orig.tex new.tex > docdiffs). This is the best way to annotate something like LaTeX documents, and can be easily used by anyone involved in the production of a document from LaTeX sources. You can then use standard LaTeX comments in your patch to explain the changes, and they can be very easily integrated. If the document lives in a version control system of some sort, just use the VCS to generate a patch file that can be reviewed.
I have used changes.sty, which gives basic change colouring:
\added{new text}
\deleted{old text}
\replaced{new text}{old text}
All of these take an optional parameter with the initials of the author who did this change. This results in different colours used, and these initials are displayed superscripted after the changed text.
\replaced[MI]{new text}{old text}
You can hide the change marks by giving the option final to the changes package.
This is very basic, and comments are not supported, but it might help.
My little home-rolled "fixme" tool uses \marginpar where possible and goes inline in places (like captions) where that is hard to arrange. This works out because I don't often use margin paragraphs for other things. This does mean you can't finalize the layout until everything is fixed, but I don't feel much pain from that...
Other than that I heartily agree with Michael about using standard tools and version control.
See also:
Tips for collaboratively editing a LaTeX document (which addresses you main question...)
https://stackoverflow.com/questions/193298/best-practices-in-latex
and a self-plug:
How do I get Emacs to fill sentences, but not paragraphs?
You could also try the trackchanges package.
You can use the changebar package to highlight areas of text that have been affected.
If you don't want to do the markup manually (which can be tedious and interrupt the flow of editing) the neat latexdiff utility will take a diff of your document and produce a version of it with markup added to visually display the changes between the two versions in the typeset output.
This would be my preferred solution, although I haven't tested it out on large, multi-file documents.
The best package I know is Easy Review that provides the commenting functionality into LaTeX environment. For example, you can use the following simple commands such as \add{NEW TEXT}, \remove{OLD TEXT}, \replace{OLD TEXT}{NEW TEXT}, \comment{TEXT}{COMMENT}, \highlight{TEXT}, and \alert{TEXT}.
Some examples can be found here.
The todonotes package looks great, but if that proves too cumbersome to use, a simple solution is just to use footnotes (e.g. in red to separate them from regular footnotes).
Package trackchanges.sty works exactly the way changes.sty. See #Svante's reply.
It has easy to remember commands and you can change how edits will appear after compiling the document. You can also hide the edits for printing.

Resources