Are there any general localization/translation alternatives to gettext?
Open source or proprietary doesn't matter.
When I say alternative to gettext, I mean a library for internationalization, with a localization backend of sorts.
The reason I'm asking is because (among other things) I find the way gettext does things slightly cumbersome and static, mostly in the backend bit.
First of all I think gettext is one of the best at this point.
You may take a look on Boost.Locale that may provide a better API and use gettext's dictionary model: http://cppcms.sourceforge.net/boost_locale/docs/ (not official part of Boost, still beta).
Edit:
If you don't like gettext...
These are translation technologies:
OASIS XLIFF
GNU gettext po/mo files
POSIX catalogs
Qt ts/tm files
Java properties,
Windows resources.
Now:
Last two total crap... Very hard to use translate and maintain, do not support plural forms.
Qt ts/tm -- requires usage of Qt framework. Have very similar model to gettext. Not bad solution, but limited to Qt. Not so useful in generic programs.
POSIX catalogs -- nobody uses them, no plural forms support. Crap.
OASIX XLIFF -- "standard" solution, depends on XML, even ICU requires compilation to specific ICU resources for use. Limited translation tools, I don't know any library that supports XLIFF. Plural forms not so easy to use (ICU included some support only in 4.x release).
Now what do we have?
GNU gettext, widely used, has great tools, has great plural forms support, very popular in translators community...
So decide, do you really think that gettext is not so good solution?
I don't think so. You haven't worked with other solutions at all, so try to understand how it works at first place.
Fluent is a new system that offers a number of adaptations that gettext lacks. Where gettext supports pluralization, fluent has a generic framework for the text variants. Where gettext uses the "untranslated" string as its translation key, fluent supports an abstract key (allowing multiple translations for something that just happens to be homonymous in the source language. Here is a more extensive comparison.
An example of fluent .ftl file, taken from firefox's preferences codebase, looks like this:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
blocklist-window =
.title = Block Lists
.style = width: 55em
blocklist-description = Choose the list { -brand-short-name } uses to block online trackers. Lists provided by <a data-l10n-name="disconnect-link" title="Disconnect">Disconnect</a>.
blocklist-close-key =
.key = w
blocklist-treehead-list =
.label = List
blocklist-button-cancel =
.label = Cancel
.accesskey = C
blocklist-button-ok =
.label = Save Changes
.accesskey = S
# This template constructs the name of the block list in the block lists dialog.
# It combines the list name and description.
# e.g. "Standard (Recommended). This list does a pretty good job."
#
# Variables:
# $listName {string, "Standard (Recommended)."} - List name.
# $description {string, "This list does a pretty good job."} - Description of the list.
blocklist-item-list-template = { $listName } { $description }
blocklist-item-moz-std-listName = Level 1 block list (Recommended).
blocklist-item-moz-std-description = Allows some trackers so fewer websites break.
blocklist-item-moz-full-listName = Level 2 block list.
blocklist-item-moz-full-description = Blocks all detected trackers. Some websites or content may not load properly.
Interesting comments about gettext() and all those pro-gettext().
I'm not saying that it ain't working right in most cases, but I tried to manage one project with it and quickly felt overwhelm by the hardness of using it. Maybe there are a few user interfaces for translators today, but I did not even look. The extraction and merging of strings is just not doing it for me.
Now, I thank you Artyom for talking about XLIFF which is a much better solution for my environment since everything is XML. Oh! And there are excellent editors out there. But if you like gettext() you won't find them. 8-)
I'll suggest looking at this one for example:
https://sourceforge.net/projects/wordforge2/
Now, this may give the programmer a nightmare to make it all work, but what we want is a dream come true for the translators (and zero work by the programmer as translations pour in, because I can tell you that with gettext() I had to do all the work!)
There is an alternative from Zend that supports gettext *.po / *.mo files and many more formats.
Many Apache servers cache the translation files because gettext is implemented as a module and the server has to be restarted to refresh the translation data.
The Zend implementation avoids this and supports many more formats:
http://framework.zend.com/manual/1.12/en/zend.translate.html
Related
I am converting DITA to XLIFF. In my technical solution I have to specialize (modify) xliff-core-1.2-strict.xsd to accommodate few DITA attributes. It means some attributes will go along with "g" tag. For Example:
<g id="00001" newAtt="this is new attribute" xid="009"/>
From the translation side I am not sure how it will work, so my questions:
Is it general practice that LSPs will get different flavor of XLIFF XSDs from different companies?
And is it possible for them to use it in their XLIFF editor by updating updated XLIFF XSD? I tried to explore “Transolution” but did not find any place where to place modified XSD.
Please let me know if you have any thought on this.
Thanks.
Here are my answers to your two questions:
I work for an LSP and I've seen all sorts and flavors of Xliff. Most of them try to stick to OASIS 1.2 transitional schema. Some of TMS/CAT producers added their own extensions. These producers normally provide an XSD so you can validate their Xliffs by adding that XSD to OASIS schema; e.g. SDL extensions to 1.2. When I'm customizing Xliff for a client, I normally do namespaces and provide a simple additional XSD; e.g.:
<trans-unit id="0" translate="yes" resname="msg_foo">
<superduper:uri>http://foo.bar/iJKLM9</superduper:uri>
<source>This is supposed to be a <superduper:g id="00001" newAtt="this is new attribute" xid="009"/> example.</source>
<target state="new"></target>
</trans-unit>
Most of the TMS/CAT tools are very basic (and closed) when it comes to their Xliff filters (or any of their filters for that matter) and I'm sorta kinda sure that they ignore your customized XSD.
Transolution is a very nice tool and my favorite Open Source translation tool. Unfortunately it's been long abandoned and has plenty of defects and shortcomings.
Anyway, if you provide a sample file, I can tell you what happens to non-conforming tags when it's imported into one of common, major CAT tools.
One final note; <g> seems to be retired in Xliff 2.0.
Forget about Transolution for one thing. I would try it with Trados, MemoQ or MemSource. Those are some of the big tools used in the translation business.
Personally I never got an XLIFF with a XSD, nor do I think Trados can handle it.
What do you need to change in the XLIFF?
We parse a lot of JSON in our app - without the back-end, it would be a pretty useless app. I know this goes for a bunch of other apps out there as well. In order to parse JSON, we need a list of keys to get to the data. I'd like to know what is considered 'best practice' or at least 'damn good practice' for managing these paths/string literals. Is there a tool out there that helps manage such keys and reduces duplication?
Hard-coding them is definitely not an option although to be frank, if our back-end programmers change the key, in concept, a simple find/replace in XCode (or whatever IDE you're using) would suffice. It's ugly and unclean and I just feel dirty putting string literals all over my code though.
What I'm currently doing now is putting them all into my PCH file, which means I end up with:
#define kBookmarksSearchResultsIDFieldName #"business.id"
#define kBookmarksSearchResultsNameFieldName #"business.name"
#define kBookmarksSearchResultsThumbnailURLFieldName #"business.display_image.images.small_mobile.source"
#define kBookmarksBusinessCategoryArrayFieldName #"business.categories"
This gets unwieldy real fast though since now I have around a thousand lines of these things in my PCH file.
The other option I'm considering is breaking these up into separate .h files - but then if two components of my app end up using the same key (for example, a business object is embedded into the JSON for a bookmark, or for a review of that business) then I have to import the .h that contains the JSON paths for the business object. So in this case I'm still importing all of the same data, it's just the file organization that's cleaner.
My objectives are:
Easy management of string literals used for parsing JSON
Reduce the amount of duplication needed
Easy changing/replacement of JSON paths if/when needed
Is option 3 that I listed above (separate .h files) my best option? What do you guys use, and am I missing an easy tool out there (and no, JSONModel isn't an option because of the way it requires your JSON keys to match your ivar/property names - our back-end supports a number of platforms so we can't change the JSON keys just for iOS).
Look into using a library such as RestKit which allows you to map a JSON document to a set of Objective-C classes. This means you can read the document in and get an array of objects you can manipulate by properties instead of having to keep track of key names. It's much easier, and Xcode will autocomplete your property names as you work with the classes.
It takes some setup, but you only have to do it once. :)
Just to update this answer - there's a very cool library called Mantle - not perfect, there are some issues with typecasting but still a very solid effort.
I'm using the iPhone library for MeCab found at https://github.com/FLCLjp/iPhone-libmecab . I'm having some trouble getting it to tokenize all possible words. Specifically, I cannot tokenize "吉本興業" into two pieces "吉本" and "興業". Are there any options that I could use to fix this? The iPhone library does not expose anything, but it uses C++ underneath the objective-c wrapper. I assume there must be some sort of setting I could change to give more fine-grained control, but I have no idea where to start.
By the way, if anyone wants to tag this 'mecab' that would probably be appropriate. I'm not allowed to create new tags yet.
UPDATE: The iOS library is calling mecab_sparse_tonode2() defined in libmecab.cpp. If anyone could point me to some English documentation on that file it might be enough.
There is nothing iOS-specific in this. The dictionary you are using with mecab (probably ipadic) contains an entry for the company name 吉本興業. Although both parts of the name are listed as separate nouns as well, mecab has a strong preference to tag the compound name as one word.
Mecab lacks a feature that allows the user to choose whether or not compounds should be split into parts. Note that such a feature is generally hard to implement because not everyone agrees on which compounds can be split and which ones can't. E.g. is 容疑者 a compound made up of 容疑 and 者? From a purely morphological point of view perhaps yes, but for most practical applications probably no.
If you have a list of compounds you'd like to get segmented, a quick fix is to create a user dictionary for the parts they consist of, and make mecab use this in addition to the main dictionary.
There is Japanese documentation on how to do this here. For your particular example, it would involve the steps below.
Make a user dictionary with two entries, one for 吉本 and one for 興業:
吉本,,,100,名詞,固有名詞,人名,名,*,*,よしもと,ヨシモト,ヨシモト
興業,,,100,名詞,一般,*,*,*,*,こうぎょう,コウギョウ,コウギョウ
I suspect that both entries exist in the default dictionary already, but by adding them to a user dictionary and specifying a relatively low specificness indicator (I've used 100 for both -- the lower, the more likely to be split), you can get mecab to tend to prefer the parts over the whole.
Compile the user dictionary:
$> $MECAB/libexec/mecab/mecab-dict-index -d /usr/lib64/mecab/dic/ipadic -u mydic.dic -f utf-8 -t utf-8 ./mydic
You may have to adjust the command. The above assumes:
Mecab was installed from source in $MECAB. If you use mecab installed by a package manager, you might have difficulties finding the mecab-dict-index tool. Best install from source.
The default dictionary is in /usr/lib64/mecab/dict/ipadic. This is not part of the mecab package; it comes as a separate package (e.g. this) and you may have difficulties finding this, too.
mydic is the name of the user dictionary created in step 1. mydic.dic is the name of the compiled dictionary you'll get as output (needs not exist).
Both the system dictionary (-t option) and the user dictionary (-f option) are encoded in UTF-8. This may be wrong, in which case you'll get an error message later when you use mecab.
Modify the mecab configuration. In a system-wide installation, this is a file named /usr/lib64/mecab/dic/ipadic/dicrc or similar. In your case it may be located somewhere else. Add the following line to the end of the configuration file:
userdic = home/myhome/mydic.dic
Make sure the absolute path to the dictionary compiled above is correct.
If you then run mecab against your input, it will split the compound into its parts (I tested it, using mecab 0.994 on a Linux system).
A more thorough fix would be to get the source of the default dictionary and manually remove all compoun nouns you want to get split, then recompile the dictionary. As a general remark, using a CJK tokenizer for a serious application in production mode over a longer period of time usually involves a certain amount of dictionary maintenance (adding/removing entries) regularly.
I'm currently writing some functions that are related to lists that I could possibly be reused.
My question is:
Are there any conventions or best practices for organizing such functions?
To frame this question, I would ideally like to "extend" the existing lists module such that I'm calling my new function the following way: lists:my_funcion(). At the moment I have lists_extensions:my_function(). Is there anyway to do this?
I read about erlang packages and that they are essentially namespaces in Erlang. Is it possible to define a new namespace for Lists with new Lists functions?
Note that I'm not looking to fork and change the standard lists module, but to find a way to define new functions in a new module also called Lists, but avoid the consequent naming collisions by using some kind namespacing scheme.
Any advice or references would be appreciated.
Cheers.
To frame this question, I would ideally like to "extend" the existing lists module such that I'm calling my new function the following way: lists:my_funcion(). At the moment I have lists_extensions:my_function(). Is there anyway to do this?
No, so far as I know.
I read about erlang packages and that they are essentially namespaces in Erlang. Is it possible to define a new namespace for Lists with new Lists functions?
They are experimental and not generally used. You could have a module called lists in a different namespace, but you would have trouble calling functions from the standard module in this namespace.
I give you reasons why not to use lists:your_function() and instead use lists_extension:your_function():
Generally, the Erlang/OTP Design Guidelines state that each "Application" -- libraries are also an application -- contains modules. Now you can ask the system what application did introduce a specific module? This system would break when modules are fragmented.
However, I do understand why you would want a lists:your_function/N:
It's easier to use for the author of your_function, because he needs the your_function(...) a lot when working with []. When another Erlang programmer -- who knows the stdlb -- reads this code, he will not know what it does. This is confusing.
It looks more concise than lists_extension:your_function/N. That's a matter of taste.
I think this method would work on any distro:
You can make an application that automatically rewrites the core erlang modules of whichever distribution is running. Append your custom functions to the core modules and recompile them before compiling and running your own application that calls the custom functions. This doesn't require a custom distribution. Just some careful planning and use of the file tools and BIFs for compiling and loading.
* You want to make sure you don't append your functions every time. Once you rewrite the file, it will be permanent unless the user replaces the file later. Could use a check with module_info to confirm of your custom functions exist to decide if you need to run the extension writer.
Pseudo Example:
lists_funs() -> ["myFun() -> <<"things to do">>."].
extend_lists() ->
{ok, Io} = file:open(?LISTS_MODULE_PATH, [append]),
lists:foreach(fun(Fun) -> io:format(Io,"~s~n",[Fun]) end, lists_funs()),
file:close(Io),
c(?LISTS_MODULE_PATH).
* You may want to keep copies of the original modules to restore if the compiler fails that way you don't have to do anything heavy if you make a mistake in your list of functions and also use as source anytime you want to rewrite the module to extend it with more functions.
* You could use a list_extension module to keep all of the logic for your functions and just pass the functions to list in this function using funName(Args) -> lists_extension:funName(Args).
* You could also make an override system that searches for existing functions and rewrites them in a similar way but it is more complicated.
I'm sure there are plenty of ways to improve and optimize this method. I use something similar to update some of my own modules at runtime, so I don't see any reason it wouldn't work on core modules also.
i guess what you want to do is to have some of your functions accessible from the lists module. It is good that you would want to convert commonly used code into a library.
one way to do this is to test your functions well, and if their are fine, you copy the functions, paste them in the lists.erl module (WARNING: Ensure you do not overwrite existing functions, just paste at the end of the file). this file can be found in the path $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/src/lists.erl. Make sure that you add your functions among those exported in the lists module (in the -export([your_function/1,.....])), to make them accessible from other modules. Save the file.
Once you have done this, we need to recompile the lists module. You could use an EmakeFile. The contents of this file would be as follows:
{"src/*", [verbose,report,strict_record_tests,warn_obsolete_guard,{outdir, "ebin"}]}.
Copy that text into a file called EmakeFile. Put this file in the path: $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/EmakeFile.
Once this is done, go and open an erlang shell and let its pwd(), the current working directory be the path in which the EmakeFile is, i.e. $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/.
Call the function: make:all() in the shell and you will see that the module lists is recompiled. Close the shell.
Once you open a new erlang shell, and assuming you exported you functions in the lists module, they will be running the way you want, right in the lists module.
Erlang being open source allows us to add functionality, recompile and reload the libraries. This should do what you want, success.
In his article The Nature of Lisp, Slava Akhmechet introduces people to lisp by using Ant/NAnt as an example. Is there an implementation of Ant/NAnt in lisp? Where you can use actual lisp code, instead of xml, for defining things? I've had to deal with creating additions to NAnt, and have wished for a way to bypass the xml system in the way Slava shows could be done.
Ant is a program that interprets commands written in some XML language. You can, as justinhj mentioned in his answer use some XML parser (like the mentioned XMLisp) and convert the XML description in some kind of Lisp data and then write additional code in Lisp. You need to reimplement also some of the Ant interpretation.
Much of the primitive stuff in Ant is not needed in Lisp. Some file operations are built-in in Lisp (delete-file, rename-file, probe-file, ...). Some are missing and need to be implemented - alternative you can use one of the existing libraries. Also note that you can LOAD Lisp files into Lisp and execute code - there is also the REPL - so it comes already with an interactive frontend (unlike Java).
Higher level build systems in Common Lisp usually are implementing an abstraction called 'SYSTEM'. There are several of those. ASDF is a popular choice, but there are others. A system has subsystems and files. A system has also a few options. Its components also have options. A system has either a structural description of the components, a description of the dependencies, or a kind descriptions of 'actions' and their dependencies. Typically these things are implemented in an object-oriented way and you can implement 'actions' as Lisp (generic) functions. Lisp also brings functions like COMPILE-FILE, which will use the Lisp compiler to compile a file. If your code has, say, C files - you would need to call a C compiler - usually through some implementation specific function that allows to call external programs (here the C compiler).
As, mentioned by dmitry-vk, ASDF is a popular choice. LispWorks provides Common Defsystem. Allegro CL has their own DEFSYSTEM. Its DEFSYSTEM manual describes also how to extend it.
All the Lisp solution are using some kind of Lisp syntax (not XML syntax), usually implemented by a macro to describe the system. Once that is read into Lisp, it turns into a data representation - often with CLOS instances for the system, modules, etc.. The actions then are also Lisp functions. Some higher-order functions then walk over the component graph/tree and execute actions of necessary. Some other tools walk over the component graph/tree and return a representation for actions - which is then a proposed plan - the user then can let Lisp execute the whole plan, or parts of the plan.
On a Lisp Machine a simple system description looks like this:
(sct:defsystem scigraph
(:default-pathname "sys:scigraph;"
:required-systems "DWIM")
(:serial "package" "copy" "dump" "duplicate" "random"
"menu-tools" "basic-classes" "draw" "mouse"
"color" "basic-graph" "graph-mixins" "axis"
"moving-object" "symbol" "graph-data" "legend"
"graph-classes" "present" "annotations" "annotated-graph"
"contour" "equation" "popup-accept" "popup-accept-methods"
"duplicate-methods" "frame" "export" "demo-frame"))
Above defines a system SCIGRAPH and all files should be compiled and load in serial order.
Now I can see what the Lisp Machine would do to update the compiled code:
Command: Compile System (a system [default Scigraph]) Scigraph (keywords)
:Simulate (compiling [default Yes]) Yes
The plan for constructing Scigraph version Newest for the Compile
operation is:
Compile RJNXP:>software>scigraph>scigraph>popup-accept-methods.lisp.newest
Load RJNXP:>software>scigraph>scigraph>popup-accept-methods.ibin.newest
It would compile one file and load it - I have the software loaded and changed only this file so far.
For ASDF see the documentation mentioned on the CLIKI page - it works a bit different.
Stuart Halloway's upcoming book Programming Clojure goes through the construction of Lancet throughout the book as an example app. Lancet is a Clojure build system which (optionally) integrates directly with Ant. Source code and examples are available.
If all you want to do is generate Ant XML files using Lisp code, you could use something like clj-html for Clojure or CL-WHO for Common Lisp. Generating XML from Lisp s-exps is fun and easy.
Common Lisp's ASDF (Another System Definition Facility) is analogous to Make/Ant (but not a full analogue — it is aimed at building lisp programs, not generic systems like make or ant). It is extensible with Lisp code (subclassing systems, components, adding operations to systems). E.g., there is an asdf-ecs extensions that allows including (and compiling) C source files into system.
Perhaps you could define things in lisp and convert them to XML at the point you pass them to NAnt.
Something like XMLisp makes it easier to go back and forth between the two representations.
Edit: Actually, xml-emitter would make more sense.