Application for writing/translating structured text in several languages - translation

I need to write articles in several languages. The structure and appearance of the articles will be the same. I am fine with using HTML/CSS or Markdown (other options could be acceptable as well) to perform this task. The problem is efficient translation. Is there any tool that allows:
keeping the text separate from structure and appearance (similar to Android apps, which have a separate resource directory with files for each supported language, and the text is looked up by id)?
a good GUI, so that I could open tabs with the article in different languages, see what is translated and what is not, and mark an article element (title/paragraph) as translated?
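To make the first requirement concrete, here is a minimal Python sketch of the Android-style idea: one shared template, per-language text looked up by id, and a report of what is still untranslated. The names (`ARTICLE_TEMPLATE`, `RESOURCES`, `render`, `untranslated`) and the sample strings are made up for illustration and do not come from any particular tool.

```python
from string import Template

# Shared structure: identical for every language; text is referenced by id.
ARTICLE_TEMPLATE = Template("<h1>$title</h1>\n<p>$intro</p>")

# Per-language resources, analogous to one strings file per language.
# The ids and texts here are hypothetical sample data.
RESOURCES = {
    "en": {"title": "Hello", "intro": "This is the introduction."},
    "de": {"title": "Hallo"},  # "intro" has not been translated yet
}


def render(lang, default_lang="en"):
    """Fill the template, falling back to the default language per id."""
    merged = {**RESOURCES[default_lang], **RESOURCES.get(lang, {})}
    return ARTICLE_TEMPLATE.substitute(merged)


def untranslated(lang, default_lang="en"):
    """Return ids present in the default language but missing in lang."""
    return sorted(set(RESOURCES[default_lang]) - set(RESOURCES.get(lang, {})))
```

A GUI of the kind asked about would essentially be a view over `untranslated(lang)` plus an editor for the per-language dictionaries.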

Related

What approach do graphical DSL workbenches use: parsers or projections?

To my knowledge there are two approaches that DSL editors use:
1- Parser-based approach to develop textual DSLs: The user specifies a grammar and the workbench generates a parser that recognizes this grammar. The parser builds an Abstract Syntax Tree that is used by code generators and so on.
2- Projectional approach: here there is no parser. The Abstract Syntax Tree is directly edited by the user's gestures, and projection rules specify how the Abstract Syntax Tree is rendered. This allows the use of different notations (textual, graphical, tabular...) at the same time.
Now, when I look at graphical-only DSL workbenches (such as DSL-tools from Microsoft), I wonder what approach they use and what steps are involved behind the definition of the DSL. If it's the projectional approach, then why is it limited to graphical notation only?
My idea is that it uses both: the projectional approach to make the notation graphical, while the models are saved in a specific format (XML, for example) and parsed.
Thank you.
Well, strictly speaking, any "graphical" editor is projectional. The ability of a language workbench to have different notations, like in MPS, is enabled by the fact that the tool has these notations built in, together with the ability to define several editors for the same piece of model. In the case of MPS it's even possible to create new notations as a plugin (so without having to change MPS itself).
I would say that saving the models to any storage medium can't ultimately be anything else than textual or binary. Any editor that wants to save models will serialize to one of those two options, even MPS. So since it wouldn't make sense to say that there is a projectional way to save models, you can either say that both DSL-tools and MPS have the textual approach for saving and a projectional editor, or (my preferred option) simply that both DSL-tools and MPS can produce projectional editors.
Also, I wouldn't agree to call DSL-tools a language workbench. As you can read in https://homepages.cwi.nl/~storm/publications/lwc13paper.pdf a program has to meet a bunch of criteria (more than DSL-tools can meet in my opinion) to be a language workbench.
In general, I would say that any "graphical" language workbench (i.e. a language workbench that produces editors that are not plain text) uses the projectional approach.
An important difference between source and projectional editing environment is the split between persistent storage and editing. Projectional editing systems can choose any persistence mechanism that they choose, while source systems need to have some universal storage mechanism - which is why they are almost always text files. [Martin Fowler]
So if what you are editing is not in the same format as it is stored in, you are using a projectional editor. All non-textual notations (tabular, symbolic, graphical) inherently cannot be stored exactly as they look, so they must be projected.
example: Markdown on this website
An example featuring a commonly used tool that technically (you wouldn't think of it that way) uses a projectional editor could be MS Word, because you can't just open your .docx file quickly in notepad and change the size of the header. You always edit the abstract representation shown to you through a projection.
WYSIWYG word processing systems such as Word, which appear to edit formatted text directly, are essentially structure editors for the underlying marked-up text. [wikipedia]
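The distinction can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Heading` model, the two projection functions, and the JSON storage format are invented for the example and are not how MPS, DSL-tools, or Word actually work internally.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Heading:
    """A tiny abstract model; the user never edits its stored form."""
    level: int
    text: str


def project_as_markdown(node):
    """One notation for the model."""
    return "#" * node.level + " " + node.text


def project_as_html(node):
    """A second notation for the very same model."""
    return f"<h{node.level}>{node.text}</h{node.level}>"


def resize(node, level):
    """An editing gesture: it changes the model directly, nothing is parsed."""
    return Heading(level, node.text)


def save(node):
    """Persistence is independent of either projection (JSON chosen arbitrarily)."""
    return json.dumps(asdict(node))


def load(data):
    """Deserialize the stored form back into the model."""
    return Heading(**json.loads(data))
```

Here `load` does read text, but only the storage format; what the user sees and edits is one of the projections, which is the point made above about serialization versus editing.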
A tangentially related term is Illustrative programming [Fowler], which features the so-called most common "programming language" in the world, Excel.

Most efficient way to translate a Sitecore website to 4 other languages (not having the translators in your Sitecore CMS)

I am looking for a good way to translate an existing Sitecore installation (English is available) into 4 other languages (Russian, Chinese, Portuguese, etc.). A dedicated translation company will translate all texts we deliver into the specified languages, but I'm curious how other companies set this up. I thought about just exporting all Sitecore items which have to be translated using the Database language Export function in Sitecore and having the translation company edit those files. By just replacing the language tags in the XML we should be able to import this file as the newly created other language. However, I'm afraid that this XML structure will be totally useless for a translation company and that they will drown in the codes inside this XML. How can we do this efficiently? Is there any other way than just giving the translators access to the Sitecore environment and having them edit the languages there? Is there any Shared Source Module to achieve this? I still have a lot of questions; is there anyone with some experience in achieving this?
Your primary options are either the language export/import functionality (as you mention), or a workflow-based solution that integrates with your translation agency's Translation Management System (if they have one -- hopefully they do).
The former is better for the initial translation. Typically, your agency should be able to handle translation of content within XML files. A good one can. If you create all needed language versions beforehand and copy the English content into them, it will make the files easier to work with, as they'll already have tags for the new languages in them. I've seen the creation of these layers done with Revolver (http://www.codeflood.net/revolver/), but it could also be done with custom code or workflow.
For ongoing maintenance of your translated content, you'll probably want to integrate through workflow. Clay Tablet Technologies (http://www.clay-tablet.com/) have a middleware component w/ Sitecore integration that can make this easier, depending on your translation agency. You can also do your own workflow-based integration, with workflow commands that allow your users to send content for translation. Then you'd need some sort of listener that pulls the translated content back in, and continues the workflow.
Hope this helps!
You could also check out Lionbridge (http://en-us.lionbridge.com/sitecore-and-lionbridge-announce-partnership-to-help-companies-thrive-across-borders.htm) as a solution.
From my own experience our customers normally use the Sitecore import/export function as a first step and then use Lionbridge or Clay Tablet as a service.
One important thing to think about with translations is the ongoing work. The initial translation is rather simple, but the second and later rounds might be more troublesome. What if different changes have been made in different languages? If local changes were made in the content for, say, the French version, you couldn't just send the English version for a second translation, since you would also have to accommodate the regional changes in the content.
Having worked with literally dozens of Sitecore clients worldwide, and helped get content to and from all the largest and many smaller translation firms, I can attest to the inefficiency of trying to do translation in situ, that is, in Sitecore. I liken it to asking an electrician to come over and rewire your house, but as they reach for their toolbox from the truck you tell them, "Nope, you need to do it by hand".
The very best way to manage anything more than a page or two of content for translation is to export it seamlessly. Deliver it to the LSP in a proper format (XML or XLIFF) and, when possible, auto import it to their TMS. Once translated, the content should then flow seamlessly back into Sitecore.
You can code this yourself — but the pitfalls are non-trivial just on the Sitecore side. (If you want intuitive UI's, scalability, and all the features that meet the needs of translation). Let alone the challenges of connecting to the systems LSP's use. (For example, who here knows the relative merits/risks of using SLD's Nexus connector versus their CTA for connecting to TMS?)
As kindly mentioned above, there are commercially available solutions that meet all these needs and more. So if you've got even a modest amount of content, and want to send that to any translation provider of your choice, I'd be happy to discuss how we can help.
The main issue with translation isn't technical at all: the XML export is a simple enough format and all agencies should be able to deal with it with no problems. As others have suggested, maintenance after the initial translation is slightly more problematic, but they also point to tools to achieve this.
The main issue we've found with translation is actually linguistic: how to achieve phrasing that is consistent and matches the original but is sufficiently adjusted to local requirements. Translation companies usually have software to aid this (libraries of the phrases they translate, etc.), but working with an exported XML file doesn't provide the context of seeing content in situ. A particular item may be translated correctly and the site consistently, but as each page may be built from multiple items there can easily be conflicts between content as presented.
That makes working in the Sitecore backend (maybe with field security settings to limit access) or in the page editor (possibly pre-filling fields with English values) a viable idea.

What is the best way to store multiple language versions of a website?

My web site (on Linux servers) needs to support multiple languages.
What is the best practice to have/store multiple languages versions of the same site?
Some I can think of:
store in DB
different view file for each language
gettext
hard coded words in PHP files (like in phpBB)
With web sites, you really have several categories of content to consider for localization:
1. The article-type content elements that you would in many cases create, edit and publish in a CMS.
2. The smaller content blocks that are common to every page (or a sub-group of pages), such as tagline, blurb, text around a contact form, but also imported content such as a news ticker or ads and affiliate links. Some of these may only appear for one language (for example, if you don't offer some services in some regions, or don't have, say, language-appropriate imported content for a particular language: it can be better to remove an element rather than offering English to people who may not speak it).
3. The purely functional elements, like "Click here to comment", "More...", high-level navigation, etc., which are sometimes part of your template. Some of these may be inside images.
For 1. the main decision is using a CMS or not. If yes, you absolutely need to choose one that supports multiple languages. I'm not up-to-date with recent developments in PHP CMS's, but several of the Django CMS apps (Django-CMS-2, FeinCMS) support multi-language content. Don't forget that date stamps, for example, need to be localized, too (or you can get around this by choosing ISO dates, though that may not always be possible). If you don't use a CMS, and everything is in your HTML files, then gettext is the way to go, and keep the .mo files (and your offline .po files) in folders by language.
For 2. if you have a CMS with good multi-lingual support, get as much as possible inside the CMS. The reason is that these bits do change, and you want to edit your template as little as possible. If you write code yourself, think of ways of exporting all in-CMS strings per language, to hand them to translators. Otherwise, again, gettext. The main issue is that these elements may require hard-coding language-selection code (if $language = X display content1 ...)
For 3., if it's in your template, use gettext. For images, the per-language folders will come in handy, and for heaven's sake choose images whose generation can be automated, or you (or your graphic artist) will go mad editing hundreds of custom images with strings in languages you don't understand.
For both 2. and 3., abstracting from the language selection may help selecting the appropriate blocks or content directory (where localized images or .mo files are kept).
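One way to abstract the language selection is to resolve the language once and derive every localized path from it, instead of scattering `if $language = X` checks through the templates. The Python sketch below assumes a hypothetical `content/<lang>/...` layout and a made-up list of supported languages.

```python
from pathlib import PurePosixPath

SUPPORTED = ("en", "pt", "ru")           # hypothetical set of site languages
DEFAULT = "en"                           # served when nothing else matches
CONTENT_ROOT = PurePosixPath("content")  # hypothetical directory layout


def resolve_language(preferred):
    """Pick the first supported language from an ordered preference list."""
    for tag in preferred:
        base = tag.lower().split("-")[0]  # "pt-BR" -> "pt"
        if base in SUPPORTED:
            return base
    return DEFAULT


def localized_path(lang, relative):
    """Path of a localized asset (image, .mo file, content block)."""
    return str(CONTENT_ROOT / lang / relative)
```

With this in place, templates only ever ask for `localized_path(lang, "img/banner.png")` and never branch on the language themselves.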
What you definitely want to avoid is keeping a pile of HTML files with extensive text content in them that would be a nightmare to maintain.
EDIT: Everything about gettext, .po and .mo files is in the GNU gettext manual (more than you ever wanted to know) or a slightly dated but friendlier tutorial. For PHP, there are the PHP gettext functions, and also the Zend Locale documentation.
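For a feel of the per-language folder layout, here is a small sketch using Python's standard `gettext` module rather than PHP. It expects compiled catalogs at `<localedir>/<lang>/LC_MESSAGES/<domain>.mo`; the domain name `messages` and the helper name `load_translator` are assumptions made for the example. Because the demo directory is empty, the lookup falls back to the untranslated source string.

```python
import gettext
import tempfile


def load_translator(localedir, lang, domain="messages"):
    """Load <localedir>/<lang>/LC_MESSAGES/<domain>.mo.

    With fallback=True a missing catalog yields a translator that simply
    returns the source string instead of raising an error.
    """
    return gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )


# Demo with an empty locale directory: no French catalog exists here,
# so the original English string is returned unchanged.
with tempfile.TemporaryDirectory() as empty_localedir:
    t = load_translator(empty_localedir, "fr")
    greeting = t.gettext("Hello World")
```

Once a real `fr/LC_MESSAGES/messages.mo` is dropped into the locale directory, the same call returns the French text with no code changes.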
I recommend using Zend_Translate's Gettext adapter, which parses .mo files. It is very efficient and supports caching. Your calls would look like:
echo $translation->_("Hello World");
This looks up the locale-specific translation for the given string.
Check out i18n support for php: http://php-flp.sourceforge.net/getting_started_english.htm

Dictionary Component or source code that can check in multiple languages

We are developing an application in which we need to implement spell checking for Indic languages that use ANSI fonts (not UNICODE).
I am looking for a Dictionary Component or Source Code that will allow:
To maintain separate dictionaries like for example Legal, commercial, etc.
Support more than one language
If possible to allow developer to set parsing parameters so that we as developers can determine as to how given text should be broken down in words
Support Addition of words to dictionary (should maintain separate dictionary and not modify original dictionary)
Support custom dialog box so we can design our own dialog box (if required)
Should be able to distinguish case of characters meaning it should not consider cascade and Cascade as same (if possible). There should be some kind of parameters that will allow us to enable/disable this feature
If this dictionary can check spellings in another Windows App that would be an added advantage.
As the link in the comment suggests, I would look at the Addict component suite and plus pack.
The dictionary wizard provides a way of creating specialized dictionaries. There are also APIs "allowing for text parsing, dictionary lookup, text corrections, misspelling suggestions, thesaurus contexts and more."
Addict was written and designed from the ground up to be as robust and flexible as possible. Developers have complete API access to all of Addict's core features, including main dictionaries, control parsers, parsing engine, entities to ignore while parsing, custom dictionaries, suggestions generation, thesaurus file, and much more.
You should separate the spellchecker core from your UI.
That way, your application adds whatever dialog boxes and configuration it needs.
The spellchecker core just works with plain text and returns a list of errors with suggestions.
There are dozens of open-source spellchecker core implementations. You can even use online services like Google's one (look at Google Wave videos).
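The split described above could look roughly like the following Python sketch. It is not the API of Addict or of any existing spellchecker: the class name `SpellCore`, its parameters, and the use of `difflib` for suggestions are all assumptions chosen to mirror the requirements in the question (configurable word splitting, a separate custom dictionary, optional case sensitivity). The default token pattern suits Latin-style letters and would need replacing for other encodings or scripts.

```python
import difflib
import re


class SpellCore:
    """UI-free spellchecker core: plain text in, list of errors out."""

    def __init__(self, main_words, case_sensitive=False,
                 token_pattern=r"[^\W\d_]+"):
        self.case_sensitive = case_sensitive
        # The caller controls how text is broken into words.
        self.token_re = re.compile(token_pattern)
        # The main dictionary is never modified after construction.
        self.main = {self._norm(w) for w in main_words}
        # Words added by the user live in a separate custom dictionary.
        self.custom = set()

    def _norm(self, word):
        """Fold case unless case-sensitive checking is enabled."""
        return word if self.case_sensitive else word.lower()

    def add_word(self, word):
        """Add a word to the custom dictionary only."""
        self.custom.add(self._norm(word))

    def check(self, text):
        """Return [(word, offset, suggestions)] for each unknown word."""
        known = self.main | self.custom
        errors = []
        for match in self.token_re.finditer(text):
            word = match.group()
            if self._norm(word) not in known:
                hints = difflib.get_close_matches(
                    self._norm(word), sorted(known), n=3
                )
                errors.append((word, match.start(), hints))
        return errors
```

A dialog box, or a checker for another Windows application, would then be a thin layer that feeds text to `check` and displays the returned tuples.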

Localization asset management

We have several large products we'd like to integrate with a consistent localization strategy.
We're already doing the right things from a code point of view - ie. strings in resource files.
I'm looking for something that will organize localized strings in a database, and generate the appropriate resource files (ie. .RESX files for .NET, .js files, etc.) during the build process. Ideally, it would also be able to read in the files as well (detecting strings that have been added/removed).
The database would allow us to reuse translations in different products, switch to different technologies, and track what translations are missing in each release.
Has anyone found a good product that handles these requirements? What have others done to manage localized assets?
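To illustrate the kind of pipeline being asked for, here is a minimal Python sketch with an in-memory SQLite table as the string database, a report of missing translations, and two generators. The schema, the function names, and the `STRINGS` variable in the JavaScript output are invented for the example. The .resx output covers only the `<data>` entries; a real .resx file also needs its surrounding root element and header block.

```python
import json
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def open_db():
    """Create an in-memory string database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE strings (key TEXT, lang TEXT, value TEXT, "
        "PRIMARY KEY (key, lang))"
    )
    return db


def put(db, key, lang, value):
    """Insert or update one localized string."""
    db.execute("INSERT OR REPLACE INTO strings VALUES (?, ?, ?)",
               (key, lang, value))


def strings_for(db, lang):
    """Return {key: value} for one language, ordered by key."""
    rows = db.execute(
        "SELECT key, value FROM strings WHERE lang = ? ORDER BY key", (lang,)
    )
    return dict(rows.fetchall())


def missing(db, lang, source_lang="en"):
    """Keys that exist in the source language but not in lang."""
    rows = db.execute(
        "SELECT key FROM strings WHERE lang = ? AND key NOT IN "
        "(SELECT key FROM strings WHERE lang = ?) ORDER BY key",
        (source_lang, lang),
    )
    return [row[0] for row in rows]


def to_js(db, lang):
    """Generate a JavaScript resource file body."""
    return "var STRINGS = " + json.dumps(strings_for(db, lang), sort_keys=True) + ";"


def to_resx_data(db, lang):
    """Generate only the <data> entries of a .resx file."""
    return "\n".join(
        f'<data name={quoteattr(k)} xml:space="preserve">'
        f"<value>{escape(v)}</value></data>"
        for k, v in strings_for(db, lang).items()
    )
```

Run from a build step, `missing` gives the per-release report of untranslated strings, and the generators can be pointed at whichever technology a product currently uses.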
Found some good links in the answers for this question: Do you know of a good program for editing/translating resource (.rc) files?
There's a number of products which we're now evaluating:
http://www.lingobit.com/
http://www.sisulizer.com/
http://www.multilizer.com/
WinTrans - http://www.schaudin.com/
None of them have quite the database-based approach we were initially looking for, but they seem to have the core functionality. Lingobit is an early favorite, but we haven't trialed it in much detail yet. Does anyone have a recommendation between those products (or similar)?
Check out GlobalSight or Alchemy's Catalyst.
Catalyst is a standalone translation memory and localization engine that can be used in your build process (and is used by many large software companies). GlobalSight is a relatively new and open source translation database and workflow tool that looks very promising.

Resources