How does Google Translate provide offline translations? - machine-learning

What specifically puzzles me is that each language corresponds to a single translation file. This suggests there must be a common intermediate representation that connects the different languages and avoids the NxM problem of needing a separate model for every language pair. How exactly does it work?
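To make the premise of the question concrete, here is a minimal sketch of the arithmetic and of translating through a shared pivot. It assumes one file per language that maps to and from a pivot vocabulary; whether Google Translate actually works this way is not established here, and the FILES layout and function names are hypothetical.

```python
def direct_models(n_languages: int) -> int:
    """Models needed if every ordered language pair gets its own model."""
    return n_languages * (n_languages - 1)


def pivot_files(n_languages: int) -> int:
    """Files needed if each language ships one file to/from a shared pivot."""
    return n_languages


# Toy per-language files (hypothetical): each maps words to and from a pivot.
FILES = {
    "es": {"to_pivot": {"gato": "CAT"}, "from_pivot": {"CAT": "gato"}},
    "de": {"to_pivot": {"Katze": "CAT"}, "from_pivot": {"CAT": "Katze"}},
    "fr": {"to_pivot": {"chat": "CAT"}, "from_pivot": {"CAT": "chat"}},
}


def translate(word: str, src: str, dst: str) -> str:
    """Translate by going source -> pivot -> target, using only two files."""
    pivot = FILES[src]["to_pivot"][word]
    return FILES[dst]["from_pivot"][pivot]


if __name__ == "__main__":
    for n in (5, 50, 100):
        print(n, direct_models(n), pivot_files(n))
    print(translate("gato", "es", "de"))
```

With a pivot, adding a new language means adding one file rather than one model per existing language.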

Related

I need a service for generating sentences that include a given set of words

I need to generate random but lexically correct sentences that include my words. A sentence doesn't have to consist entirely of my words, but the more of them it contains, the better.
I have already searched through a lot of resources related to machine learning, but they all cover generating RANDOM text, where I can't influence the result by requiring that at least a couple of my words be present.
Perhaps someone here knows of repositories with suitable libraries or APIs that can do something like this?
After a week of searching, I can say that there is no ready-to-use library or service for this.
If you are a data scientist, you may find this problem interesting enough to build your own startup around, I think.

Saving different sets of values of variables with a changing structure

I have several sets of values (factory settings, user settings...) for a structure of variables, and these values are saved in binary files. When I want to apply a certain setting, I just load the specific file containing the desired values, and these values are applied to the variables according to the structure. This works fine as long as the structure of variables doesn't change.
I can't figure out how to do it when I add a variable but need to retain the values of the rest (when the structure in the program changes, I need to change the files so that they contain the new values according to the new structure while keeping the old ones).
I'm using a PLC system that is programmed in the ST language, but I'm looking for an overall approach to solving this issue.
Thank you.
Providing a solution that is generic and works across different PLC platforms is not an easy task. There are many different ways to accomplish this, depending on the system or interface you actually want to use, e.g. PLC source code, OPC, ADS, MODBUS, or special functions and add-ins from the vendor, and there are further possibilities such as language features on the PLC. I wrote three solutions to this with C#/ST (with OOP extensions) and ADS/OPC communication: one that parses the source code first in C#, one with automatic generation from the PLC side, and one with an automatic registration system for the parameters that uses an EntityFramework-compatible database as a ParameterStore. If you don't want to invest too much time in this, you should try out the parameter management systems provided by your PLC vendor and live with their restrictions.
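One platform-independent idea that fits the question is to store each value under its variable name instead of by position, and to reconcile a loaded file with the current structure. The following is only a minimal sketch of that idea in Python rather than ST; it is not the implementation described in the answer above, and the JSON layout, variable names, and function names are all hypothetical.

```python
import json
from pathlib import Path

# Current structure of variables with factory defaults (hypothetical names).
DEFAULTS = {"speed": 100, "ramp_time": 2.5, "enable_alarm": True}


def save_settings(path, values):
    """Write settings keyed by variable name, so field order is irrelevant."""
    Path(path).write_text(json.dumps(values, indent=2))


def load_settings(path, defaults=DEFAULTS):
    """Load a settings file and reconcile it with the current structure.

    Names still present keep their stored value, newly added variables
    fall back to their default, and names that no longer exist are dropped.
    """
    stored = json.loads(Path(path).read_text())
    return {name: stored.get(name, default) for name, default in defaults.items()}
```

A file written before "ramp_time" and "enable_alarm" existed would still load: its stored "speed" is kept and the two new variables take their defaults. On a PLC the same effect needs some way to enumerate variable names, which is where the vendor-specific options mentioned above come in.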

Most efficient way to translate a Sitecore website to 4 other languages (Not having the translators in your Sitecore CMS)

I am looking for a good way to translate an existing Sitecore installation (English is available) into 4 other languages (Russian, Chinese, Portuguese, etc.). A dedicated translation company will translate all the texts we deliver into the specified languages, but I'm curious how other companies set this up. I thought about exporting all Sitecore items that have to be translated using the database language export function in Sitecore and having the translation company edit those files. By just replacing the language tags in the XML, we should be able to import the file as the newly created language. However, I'm afraid that this XML structure will be useless to a translation company and that they will drown in the codes inside the XML. How can we do this efficiently? Is there any other way than giving the translators access to the Sitecore environment and having them edit the languages there? Is there a Shared Source module that achieves this? I still have a lot of questions; does anyone have experience with this?
Your primary options are either the language export/import functionality (as you mention), or a workflow-based solution that integrates with your translation agency's Translation Management System (if they have one -- hopefully they do).
The former is better for the initial translation. Typically, your agency should be able to handle translation of content within XML files. A good one can. If you create all needed language versions beforehand and copy English content into them, it will make the files easier to work with, as they'll already have tags for the new languages in them. I've seen the creation of these layers done with Revolver (http://www.codeflood.net/revolver/), but it could also be done with custom code or workflow.
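The "create the language versions beforehand and copy English content into them" step can be sketched as follows. This is only an illustration of the idea on a made-up XML layout: the element and attribute names below are assumptions and are NOT the real Sitecore export schema, and the function name is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical export layout, NOT the real Sitecore export schema.
EXPORT = """<items>
  <item id="home">
    <version language="en"><field name="Title">Welcome</field></version>
  </item>
</items>"""


def add_language_versions(xml_text, source_lang, target_langs):
    """Clone each item's source-language version once per missing target language."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        existing = {v.get("language") for v in item.findall("version")}
        source = item.find(f"version[@language='{source_lang}']")
        if source is None:
            continue  # nothing to copy for this item
        for lang in target_langs:
            if lang in existing:
                continue  # never overwrite an existing translation
            clone = copy.deepcopy(source)
            clone.set("language", lang)
            item.append(clone)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_language_versions(EXPORT, "en", ["ru", "pt"]))
```

The translators then overwrite the English text inside the already tagged target-language versions instead of editing language tags themselves.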
For ongoing maintenance of your translated content, you'll probably want to integrate through workflow. Clay Tablet Technologies (http://www.clay-tablet.com/) have a middleware component w/ Sitecore integration that can make this easier, depending on your translation agency. You can also do your own workflow-based integration, with workflow commands that allow your users to send content for translation. Then you'd need some sort of listener that pulls the translated content back in, and continues the workflow.
Hope this helps!
You could also check out Lionbridge (http://en-us.lionbridge.com/sitecore-and-lionbridge-announce-partnership-to-help-companies-thrive-across-borders.htm) as a solution.
From my own experience our customers normally use the Sitecore import/export function as a first step and then use Lionbridge or Clay Tablet as a service.
One important thing to think about with translations is the ongoing work. The initial translation is rather simple, but the second and subsequent rounds might be more troublesome. What if different changes have been made in different languages? If local changes were made to the content of, say, the French version, you couldn't just send the English version for a second translation, since you would also have to accommodate the regional changes in the content.
Having worked with literally dozens of Sitecore clients worldwide, and helped get content to and from all the largest and many smaller translation firms, I can attest to the inefficiency of trying to do translation in situ, that is, in Sitecore. I liken it to asking an electrician to come over and rewire your house, but as they reach for their toolbox from the truck you tell them, "Nope, you need to do it by hand".
The very best way to manage anything more than a page or two of content for translation is to export it seamlessly. Deliver it to the LSP in a proper format (XML or XLIFF) and, when possible, auto import it to their TMS. Once translated, the content should then flow seamlessly back into Sitecore.
You can code this yourself, but the pitfalls are non-trivial just on the Sitecore side (if you want intuitive UIs, scalability, and all the features that meet the needs of translation), let alone the challenges of connecting to the systems LSPs use. (For example, who here knows the relative merits and risks of using SDL's Nexus connector versus their CTA for connecting to TMS?)
As kindly mentioned above, there are commercially available solutions that meet all these needs and more. So if you've got even a modest amount of content, and want to send it to any translation provider of your choice, I'd be happy to discuss how we can help.
The main issue with translation isn't technical at all: the XML export is a simple enough format, and all agencies should be able to deal with it with no problems. As others have suggested, maintenance after the initial translation is slightly more problematic, but they also point to tools for handling it.
The main issue we've found with translation is actually linguistic: how to achieve phrasing that is consistent and matches the original but is sufficiently adjusted to local requirements. Translation companies usually have software to aid this (libraries of the phrases they translate, etc.), but working with an exported XML file doesn't provide the context of seeing content in situ. A particular item may be translated correctly, and the site consistently, but as each page may be built from multiple items, there can easily be conflicts between content as presented.
That makes working in the Sitecore backend (maybe with field security settings to limit access) or in the Page Editor (possibly pre-filling fields with English values) a viable idea.

Converting free-form English text to Spanish, what are the options?

I have an application that will be used by Spanish-speaking people as well as English-speaking people. I am using .resx files and localization to translate all the hard-coded text. I am also retrieving language-specific data from the database for some things that don't change often, like "Category Descriptions". Here is my question, though I think I already know the answer: is there a way to translate free-form text entered by a user? For example, can a string entered and saved to the database in English be displayed in Spanish? One more issue is that these strings often contain engineering terms and technical abbreviations that I don't think could be translated with something like Google Translate. Is there anything else out there? I am thinking that this text can only be translated by a human with knowledge of the terminology and abbreviations used in this particular industry.
There are some online services such as Google Translate, as pointed out by Binary Worrier. However, bear in mind that none of these services gives accurate translations because, as you wrote, translation is a very difficult matter. Current obstacles to good automated translation include, as you wrote, lack of context.
This is a problem even for human translators. Ask a translator for a given sentence in another language, and she'll answer: "OK, what do you mean by this word: X or Y? In which context? Who are you talking to? Is this a formal or informal tone?" and so on.
This is especially true regarding localization where texts are usually very short. This increases the lack of context. Think of a simple menu item: "Load". Is it a name? Is it a verb? Damn, even a human translator needs more information. So don't expect a computer to solve the problem.
Of course, it all depends on the accuracy that you need and how tolerant your users are of bad translations. Google Translate et al. are very successful because people prefer a bad translation to nothing.
If I were you, I'd make a few manual tests with typical texts in your DBs and see if the translation accuracy fits your needs.
BTW, I believe Google Translate is free for a reasonable amount of use. Basically, unless you want to translate the whole of Wikipedia every week, you should be on the safe side ;-)
You can hook into the Google Translate APIs and translate this stuff on the fly. I think there's a charge, though.
I have an answer from my users. Have the users enter the strings in both English and Spanish and store them in the database. Display the correct strings based on the language of the browser. I still have a lot of grunt work to do filling out the .resx files and modifying all the words I need translated.
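The "display the correct strings based on the language of the browser" step can be sketched like this. It is a minimal illustration in Python rather than the .NET stack the question uses; the record layout and the function names are hypothetical, and the parsing covers only the common "tag;q=value" form of the Accept-Language header.

```python
def pick_language(accept_language, supported=("en", "es"), default="en"):
    """Return the best supported language for an Accept-Language header value."""
    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        primary = tag.strip().lower().split("-")[0]  # "es-MX" -> "es"
        candidates.append((quality, primary))
    # Highest quality first; sorted() is stable, so header order breaks ties.
    for quality, primary in sorted(candidates, key=lambda c: -c[0]):
        if quality > 0 and primary in supported:
            return primary
    return default


def display_text(record, accept_language):
    """Pick the user-entered string that matches the browser language."""
    return record[pick_language(accept_language)]


# Hypothetical database record holding the text the user entered in both languages.
record = {"en": "Replace the bearing", "es": "Reemplace el rodamiento"}

if __name__ == "__main__":
    print(display_text(record, "es-MX,es;q=0.9,en;q=0.8"))
```

Falling back to a default language keeps the page usable for browsers that request neither English nor Spanish.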

What do people do with parsers, like ANTLR or JavaCC?

Out of curiosity, I wonder what people can do with parsers, how they are applied, and what people usually create with them.
I know they're widely used in the programming language industry; however, I think this is just a tiny portion of their use, right?
Besides special-purpose languages, my most ambitious use of a parser generator yet (with good old yacc back in C, and again later with pyparsing in Python) was to extract, validate and possibly alter certain meta-info from SQL queries. Parsing SQL properly is a real challenge (especially if you hope to support more than one dialect!-), and a parser generator (and the lexer it sits on top of) at least removes THAT part of the job!-)
They are used to parse text....
To give a more concrete example, where I work we use lex/yacc to parse strings coming over sockets.
Also, the name should give you an idea of what JavaCC is used for (Java Compiler Compiler!)
Generally, to parse domain-specific languages or scripting languages, or to provide similar support for code snippets.
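As a small illustration of what parsing a tiny language involves, here is a hand-written recursive-descent parser and evaluator for arithmetic expressions. It is a sketch in Python, not ANTLR or JavaCC output; a parser generator would produce the equivalent of the Parser class from a grammar file. The grammar and all names below are chosen for this example only.

```python
import re

TOKEN = re.compile(r"\d+(?:\.\d+)?|[-+*/()]")


def tokenize(text):
    """Split the input into number tokens and single-character operator tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class Parser:
    """Grammar:
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')' | '-' factor
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -self.factor()
        if token is None or token in "+*/)":
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)


def evaluate(text):
    """Parse and evaluate an arithmetic expression, rejecting trailing tokens."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Each grammar rule maps to one method, which is the structure a generator derives automatically; operator precedence falls out of expr calling term and term calling factor.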
Previously I have seen one used to parse the command-line output of another software tool. This way the outer tool (VPN software) could reuse the base router IPSec code without modification, as a lot of what was being parsed was IP route tables and other structured, repeated text.
Using a parser allowed simple changes when the formatting changed, instead of trying to find and tweak a hand-written parser. And the output did change a few times over the life of the product.
I used parsers to help process +/- 800 Clipper source files into similar PRGs that could be compiled with Alaska Xbase 32.
You can use it to extend your favorite language by getting its language definition from their repository and then adding what you've always wanted to have. You can pass the regular syntax to your application and handle the extension in your own program.
