Should units of measurement be localized? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 11 days ago.
I am working on an app featuring some measurements entered by human operators. In the config section, an admin enters which measurements and which units they want to use, among other things. The kinds of units anticipated are very diverse and not able to be fully defined. So the plan is to let the admin enter the units in free-form instead of using a select box.
OK so far. But elsewhere, we are displaying the units when the app is localized into one of several different languages. The possible range of languages is known by the app from the beginning.
I'm looking for ideas on how best to handle the entering and displaying of units. I'm by no means a linguistic expert, but I imagine different languages have their own ways of representing the same units, which would imply that if we use free-form text entry, the admin would have to enter the unit's translation into each language. We do that with other kinds of text fields in the app, so it's not a huge problem from a coding standpoint.
But I'm wondering how others handle this kind of situation. It'd be a lot easier NOT to translate the units. But is that reasonable? FWIW, both the admins and the end-users of this system are typical consumers, not necessarily scientists or other analytic types. Also, we need to avoid having our software dependent on 3rd-party services like Google Translate.

The units themselves are partly culturally dependent, like metric vs. imperial vs. US units. If you intend to allow any units, you would in a sense carry out the ultimate localization of units: the individual user would decide on them. I suppose you would still recognize a finite set of units, somehow.
The symbols for units are a different issue. If SI units are used, their symbols are in principle the same across languages and cultures. But there are differences in practice; e.g., in Russia, it is normal to use Russian abbreviations (in Cyrillic letters) instead of standard symbols, e.g. кг and not kg. Moreover, if users can enter units by name, the names need to be localized (even though they tend to be similar, like meter ~ metre ~ Meter ~ metri ~ метр, they are not identical). And many non-SI units don’t even have standardized symbols.
So the set of unit symbols or names recognized would need to be language-dependent.
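A minimal sketch of what a language-dependent unit table could look like, assuming the admin stores one label per language for each unit. The names `UNIT_LABELS` and `unit_label` are hypothetical, and the table entries are only illustrative:

```python
# Hypothetical table: canonical unit key -> {language code: display label}.
# The entries are illustrative, not a complete or authoritative list.
UNIT_LABELS = {
    "kg": {"en": "kg", "ru": "кг"},
    "m": {"en": "m", "ru": "м"},
}


def unit_label(unit_key: str, lang: str, fallback_lang: str = "en") -> str:
    """Return the label for unit_key in lang, falling back gracefully."""
    labels = UNIT_LABELS.get(unit_key, {})
    if lang in labels:
        return labels[lang]
    if fallback_lang in labels:
        return labels[fallback_lang]
    # Unknown unit: show whatever the admin typed rather than failing.
    return unit_key


print(unit_label("kg", "ru"))  # localized label
print(unit_label("kg", "de"))  # no German entry, falls back to English
```

The fallback means a unit entered in only one language still displays everywhere, so translating a label becomes optional rather than mandatory.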

No, they are international when using SI units.

It comes to mind that French-speaking countries use o (for octet) instead of B (for byte). But that is a unit and not an SI prefix.
I have also observed that in countries that write in Cyrillic, k, M, and G are replaced with the corresponding Cyrillic glyphs. I don't know if this is just a convenience that is also accepted, or if it is the actual standard. I believe it is a convenience and that the SI-standard Latin letters are what is actually supposed to be used for SI prefixes. The Latin letters would therefore be correct (at least as well). (IMHO)


Why do languages need libraries? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Can't the languages just include the functions in them?
For example to use the sqrt function in Python you need to import the math library.
Why can't languages already have these functions built in?
Names are a scarce resource.
Would you want to be required to avoid using thousands of names, including things like max, set, read, and cycle?
As I understand, you have two very different questions and a very precise answer is not possible for either one.
Can't the languages just include the functions in them?
This part confuses me: I am not sure whether you mean the explicit import of a function that the programmer has to write in a source file, or whether this is just a duplicate of question 2, which I have also tried to answer.
Reasons for explicit import: to allow multiple implementations of the same logic and to reduce the size of the application executable. For example, suppose a language implements sqrt in a way that is slow, and some other smart programmer writes the same function more efficiently. Wouldn't you like to use the second option rather than the language-provided function? That is only possible if the programmer specifies which sqrt he or she means to use.
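To illustrate the point about choosing between implementations, here is a small sketch in Python. `math.sqrt` is the standard library function; `newton_sqrt` is a hypothetical alternative written only for this example (it is not claimed to be faster), and the explicit names make it unambiguous which one is called:

```python
import math


def newton_sqrt(x: float, iterations: int = 20) -> float:
    """Hypothetical alternative square root using Newton's method."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        # Newton step for f(g) = g*g - x.
        guess = 0.5 * (guess + x / guess)
    return guess


# The caller states explicitly which implementation is meant.
print(math.sqrt(2.0))
print(newton_sqrt(2.0))
```

If `sqrt` were a single built-in name, the second function would have to shadow it or live under a different name anyway; the module prefix keeps both available side by side.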
Why can't languages already have these functions built in?
Because every piece of software needs to be maintained and continuously upgraded (as computing trends change) by a set of people, and everybody is constrained by resources, especially in an open-source environment. So we try to keep the basic language software minimal, so that it can easily be maintained and improved by a core group X, while groups Y and Z take care of non-essential or optional items. That way, the scope of the language stays limited. You should also know that languages contain lots of features which are rarely used.
A proprietary and rich company like Microsoft might think differently and put 1000 dedicated people on its language to try to include everything, but most popular languages originated, and still live, in a non-corporate environment.
The other reason is giving flexibility to the programmer, as already explained. A language which provides everything and requires you to use only those functions would be very inflexible.
If you add the complexity of business domains, such as something specific to aerospace or something specific to healthcare, the scope becomes unlimited very quickly.
Usually, software is divided into two parts, a core part and optional modules, to achieve better maintainability and flexibility and to reduce software size as needed.

Concept Based Text Summarization (Abstraction) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
I am looking for an engine that does AI text summarization based on the concept or meaning of the sentence. I looked at open-source projects like ginger, paraphrase, and ace, but they don't do the job.
The way they work is that they try to find synonyms for each word and substitute them for the current words. This way they generate a lot of alternatives to a sentence, but the meaning is wrong most of the time.
I have worked with Stanford's engine to do something like highlighting an article and, based on that, extracting the most important sentences, but this is still not abstraction, it's extraction.
It would also make sense for the engine I'm looking for to learn over time, so that results improve after each summary.
Please help out here, your help is greatly appreciated!
I don’t know of any open-source project which fits your requirements for abstraction and meaning, as I understand them.
But I have an idea of how to build such an engine and how to train it.
In a few words, I think we all keep some Bayesian-network-like structure in our minds, which helps us not only to classify data but also to form an abstract meaning of a text or message.
Since it is impossible to extract that whole structure of abstract categories from our minds, I think it’s better to build a mechanism which allows us to reconstruct it step by step.
Abstract
The key idea of the proposed solution is to extract the meaning of a conversation using approaches that are easier for an automated computer system to operate with. This will make it possible to create a good illusion of a real conversation with another person.
The proposed model supports two levels of abstraction:
The first, less complex level consists of recognizing a group of words, or a single word, as a group related to a category, an instance, or an instance attribute.
An instance means an instantiation, from a general category, of a real or abstract subject, object, action, attribute, or other kind of entity. An example is a concrete relation between two or more subjects: a concrete relation between an employer and an employee, a concrete city and the country where it is situated, and so on.
This basic meaning-recognition approach allows us to create a bot able to sustain a conversation. This ability is based on recognizing the basic elements of meaning: categories, instances, and instance attributes.
The second, more complicated level is based on recognizing scenarios and storing them in the conversation context along with instances and categories, as well as using them to complete some of the recognized scenarios.
Related scenarios will be used to complete the next message of the conversation. Some scenarios can also be used to generate the next message, or to recognize a meaning element by using conditions and meaning elements from the context.
Something like that:
The basic classification should be entered manually, with later corrections and additions by teachers.
Words and scenarios from a sentence in the conversation can be filled in from the context.
Conversation scenarios/categories can be filled in with previously recognized instances or with instances described later in the conversation (self-learning).
Pic 1 – basic flow of word detection/categorization
Pic 2 – big-picture view of the overall system
Pic 3 – meaning element classification
Pic 4 – a possible basic structure of categories

Using existing human translations to aid machine translation to a new language

In the past, my company has used human, professional translators to translate our software from English into some 13 languages. It's expensive but the quality is high.
The application we're translating contains industry jargon. It also contains a lot of sentence fragments and single words which, out of context, are unlikely to be correctly translated.
I am wondering if there is a machine translation system or service that could use our existing professionally-generated translations to more accurately create a machine translation into any new language.
If an industry term, phrase or sentence fragment has been translated from en-US to es-AR, pt-BR, cs-CZ, etc., then couldn't those prior translations be used as a hint regarding what the correct word choice should be for some new language? They could be used, in a sense, to triangulate. At worst, they could be used to create a majority voting system (e.g. if 9 of 13 languages translated a phrase to the same thing in the new language, we go with it).
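The majority-voting fallback described above could be sketched roughly as follows. This assumes some machine translation step (not shown, and not tied to any particular service) has already produced one candidate in the new language from each existing translation; `majority_vote`, `min_votes`, and the sample strings are all hypothetical placeholders:

```python
from collections import Counter
from typing import Dict, Optional


def majority_vote(candidates: Dict[str, str], min_votes: int) -> Optional[str]:
    """Pick the candidate most pivot languages agree on, or None.

    candidates maps a pivot locale (e.g. "es-AR") to the candidate
    translation in the new language obtained via that pivot.
    """
    # Compare candidates case-insensitively and ignoring outer whitespace.
    normalized = {lang: text.strip().casefold() for lang, text in candidates.items()}
    counts = Counter(normalized.values())
    if not counts:
        return None
    winner, votes = counts.most_common(1)[0]
    if votes < min_votes:
        return None  # no sufficient agreement; leave it for a human
    # Return the first original spelling that matches the winner.
    for lang, text in candidates.items():
        if normalized[lang] == winner:
            return text.strip()
    return None


# Placeholder data, not real translations.
sample = {
    "en-US": "candidate A",
    "es-AR": "candidate A",
    "pt-BR": "candidate B",
    "cs-CZ": "Candidate A",
}
print(majority_vote(sample, min_votes=3))
```

Returning `None` below the threshold keeps low-agreement phrases out of the product, which seems safer for jargon than always accepting the most frequent candidate.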
Is anyone aware of a machine translation service that works like this?
I have no idea about translation systems, but I would guess that such functionality -- custom translations for specific words -- is offered by any commercial system. It is even possible with Google Translate by clicking on the book with the star on its cover.
As a trivial non-invasive method, you could make up, for each target language, a dictionary of the required terminology in the form [word-as-is-translated, word-as-should-be-translated], with an N:1 relationship (within one language, multiple word-as-is-translated entries map to one word-as-should-be-translated; the word-as-is-translated entries depend on the actual translation system).
After preparing those dictionaries, you can simply search the translation result for those words and replace them with the desired words.
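A rough sketch of that search-and-replace step, assuming the dictionary is kept per target language as a mapping from the desired term to the variants the translation system tends to produce (the N:1 relationship). The names and sample entries are hypothetical placeholders:

```python
import re
from typing import Dict, List

# word-as-should-be-translated -> [word-as-is-translated, ...]
# Placeholder entries; a real table would come from your translators.
TERMINOLOGY: Dict[str, List[str]] = {
    "preferred term": ["machine variant one", "machine variant two"],
}


def apply_terminology(text: str, terminology: Dict[str, List[str]]) -> str:
    """Replace known machine-translated variants with the desired term."""
    for preferred, variants in terminology.items():
        # Longest variants first, so a short variant cannot clobber a longer one.
        for variant in sorted(variants, key=len, reverse=True):
            pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
            # A function replacement avoids backslash surprises in `preferred`.
            text = pattern.sub(lambda match: preferred, text)
    return text


print(apply_terminology("Use the machine variant one here.", TERMINOLOGY))
```

Plain substitution like this ignores inflection and agreement, so it fits fixed terms and abbreviations better than words that change form in the target language.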

Converting free form english text to spanish, what are the options?

I have an application that will be used by Spanish-speaking people as well as English-speaking people. I am using .resx files and localization to translate all the hard-coded text. I am also retrieving language-specific data from the database for some things that don't change often, like "Category Descriptions". Here is my question, and I think I already know the answer: is there a way to translate free-form text entered by a user? For example, can a string entered and saved to a database in English be displayed in Spanish? One more issue is that these strings often contain engineering terms and technical abbreviations that I don't think could be translated with something like Google Translate. Is there anything else out there? I am thinking that this text can only be translated by a human with knowledge of the terminology and abbreviations used in this particular industry.
There are some online services such as Google Translate, as pointed out by Binary Worrier. However, one should bear in mind that none of these services gives accurate translations because, as you wrote, translation is a very difficult matter. Current obstacles to good automated translation include, as you wrote, lack of context.
This is a problem even for human translators. Ask a translator for a given sentence in another language. She'll answer: "OK, what do you mean by this word: X or Y? In which context? Who are you talking to? Is this a formal or informal tone?" And so on.
This is especially true regarding localization where texts are usually very short. This increases the lack of context. Think of a simple menu item: "Load". Is it a name? Is it a verb? Damn, even a human translator needs more information. So don't expect a computer to solve the problem.
Of course, it all depends on the accuracy that you need and how tolerant your users are of bad translations. Google Translate et al. are very successful because people prefer a bad translation to nothing.
If I were you, I'd make a few manual tests with typical texts in your DBs and see if the translation accuracy fits your needs.
BTW, I believe Google Translate is free for a reasonable amount of use. Basically, unless you want to translate the whole of Wikipedia every week, you should be on the safe side ;-)
You can hook into the Google Translate APIs and translate this stuff on the fly. I think there's a charge, though.
I have an answer from my users. Have the users enter the strings in both English and Spanish and store them in the database. Display the correct strings based on the language of the browser. I still have a lot of grunt work to do filling out the .resx files and modifying all the words I need translated.
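The approach in that last answer (store both versions, pick one from the browser language) could look roughly like this. It is a sketch in Python rather than .NET, the Accept-Language parsing is deliberately simplified, and `pick_language`, `display_text`, and the sample record are hypothetical:

```python
from typing import Dict, Sequence


def pick_language(accept_language: str,
                  supported: Sequence[str] = ("en", "es"),
                  default: str = "en") -> str:
    """Choose a supported language from an Accept-Language header value."""
    weighted = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Keep only the primary subtag: "es-AR" -> "es".
        weighted.append((quality, tag.strip().lower().split("-")[0]))
    # sorted() is stable, so equal weights keep the header's order.
    for _, primary in sorted(weighted, key=lambda item: item[0], reverse=True):
        if primary in supported:
            return primary
    return default


def display_text(record: Dict[str, str], lang: str, default: str = "en") -> str:
    """Return the stored text for lang, falling back to the default language."""
    return record.get(lang) or record[default]


# One database row holding both user-entered versions (sample data).
row = {"en": "Replace the gasket", "es": "Reemplace la junta"}
print(display_text(row, pick_language("es-AR,es;q=0.9,en;q=0.8")))
```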

Designing a Non-Specific Language Application, e.g. planning for localization

Made this community wiki :3
I'm developing a basic RPG, and one of my goals from the beginning is to make sure that my program is not tied to a specific language. Basically, before I design or start programming any menus, I want to make sure that I can load and display them in any of the supported languages, so I am not hard-coding values.
(It would save me from many migraines down the road)
For this example, let's use Western Left-to-Right languages. English, Spanish, German, French, Italian.
This is a basic example of what I have.
One XML file contains a mapping and design of a conversation.
<conversation>
  <dialog>line1</dialog>
  <dialog>line2</dialog>
</conversation>
Other XML files contain the definitions.
<mappings language="English">
  <line1>This is line 1 in English!</line1>
  <line2>Other lines are contained in language-separated xml files</line2>
</mappings>
Heh. This would work great, except that I forgot that English doesn't assign genders to its words, whereas other languages do. So where one sentence might be enough in English, I might need two sentences in other languages, one for the masculine form and one for the feminine form.
What would be the most conducive way of solving this problem? Right now, I've considered coming up with different mapping tables, one exclusively for masculine-form sentences and the other just for feminine forms. Or just reading from different definition tables.
And another kicker would be based within my game data design. I never thought about it, but I might need to store within my game items and characters their sexes so I can use the correct sentence. However, other languages might have their own specific quirks that I would need to consider as well (though thankfully, from what I know Italian and Spanish are relatively similar, and French possibly as well.)
So, obviously this is a huge task ahead of me. What other design considerations should I think of? Right now, I'm thinking a static class would be easiest: configure the selected language at startup, throw in inputs, and hopefully get a string back.
Any ideas (looking to throw ideas around :P)
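For reference, the "configure a language, pass in a key, get a string back" idea can be sketched against the two XML fragments above. This uses Python's standard `xml.etree.ElementTree`; the `Localizer` class and `render_conversation` function are hypothetical names, and gender handling is deliberately left out:

```python
import xml.etree.ElementTree as ET
from typing import List

CONVERSATION_XML = """<conversation>
  <dialog>line1</dialog>
  <dialog>line2</dialog>
</conversation>"""

ENGLISH_XML = """<mappings language="English">
  <line1>This is line 1 in English!</line1>
  <line2>Other lines are contained in language-separated xml files</line2>
</mappings>"""


class Localizer:
    """Holds the key -> text table for one language."""

    def __init__(self, mappings_xml: str) -> None:
        root = ET.fromstring(mappings_xml)
        self.language = root.get("language")
        # Each child element's tag is the key; its text is the translation.
        self.strings = {child.tag: (child.text or "") for child in root}

    def text(self, key: str) -> str:
        # Fall back to the key itself so missing entries are easy to spot.
        return self.strings.get(key, key)


def render_conversation(conversation_xml: str, localizer: Localizer) -> List[str]:
    root = ET.fromstring(conversation_xml)
    return [localizer.text((dialog.text or "").strip())
            for dialog in root.findall("dialog")]


for line in render_conversation(CONVERSATION_XML, Localizer(ENGLISH_XML)):
    print(line)
```

Using an instance per language rather than a purely static class makes it easy to load two languages at once, for example to fall back to English when a key is missing.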
There are two general ways to approach this: brute force and trying to be clever. Brute force means writing each possible line and including it with your XML files. It's a lot of work, but it will work.
Trying to be clever gets into deep water, fairly fast, particularly if you're trying to cover a whole lot of languages.
You need to keep more information about characters than gender. In Russian, for example, there are different words meaning "you" depending on whether you're being informal or formal (or talking to multiple people), and the verb endings are also different. There are different translations of "please pass the bread" depending on the formality. In other languages, getting the translation right depends on social status.
There are issues, as pawel_dyda pointed out, with singular, plural, and possibly dual forms. Other languages also use different word orders: "The arrows are X coppers each, so to buy Y arrows you'll need Z silver" may require you to keep track of the order of the numbers.
Visual C++ and MFC come with internationalization facilities that are actually pretty good. You'd keep the strings in a resource file, and it's possible to substitute numbers and the like in while keeping the order correct for different languages.
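The same idea, shown in Python rather than MFC: if the placeholders are named instead of positional, each translation can put the numbers wherever its word order needs them. The template keys and the reordered second template are hypothetical; the second one is English, only to show the reordering:

```python
# Each language supplies its own template; the placeholder names stay fixed.
TEMPLATES = {
    "en": "The arrows are {price} coppers each, so to buy {count} arrows "
          "you'll need {total} silver.",
    # Stand-in for a language whose word order puts the total first.
    "demo-reordered": "You'll need {total} silver to buy {count} arrows "
                      "at {price} coppers each.",
}


def render(lang: str, price: int, count: int, total: int) -> str:
    """Fill the named placeholders of the template for lang."""
    return TEMPLATES[lang].format(price=price, count=count, total=total)


print(render("en", price=5, count=20, total=1))
print(render("demo-reordered", price=5, count=20, total=1))
```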
Look up "internationalization" (often abbreviated to "i18n") on the web. There's plenty of stuff out there.
As for genders, you may try encouraging translators to use non-gender-specific translations (which is usually possible in business applications but might be impossible here).
You may also encounter the problem somewhere else. Other (non-English) languages have multiple plural forms. For example: "Your team has acquired 2 swords". No matter how many swords you actually receive, be it 5 or 1000, in English you always end up with one plural sentence. But this is not the case in many languages.
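As a concrete illustration of multiple plural forms, here is a small sketch that picks a plural category per language. Polish is my own choice of example (it is not mentioned above); the rule shown covers whole numbers only, and `plural_category`, `SWORD_FORMS`, and `count_swords` are hypothetical names:

```python
def plural_category(lang: str, n: int) -> str:
    """Return the plural category for a whole number n in lang."""
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "pl":
        if n == 1:
            return "one"
        # 2-4, 22-24, 32-34, ... but not 12-14.
        if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
            return "few"
        return "many"
    raise ValueError(f"no plural rule for language {lang!r}")


# Noun forms per category; translators would supply these.
SWORD_FORMS = {
    "en": {"one": "sword", "other": "swords"},
    "pl": {"one": "miecz", "few": "miecze", "many": "mieczy"},
}


def count_swords(lang: str, n: int) -> str:
    return f"{n} {SWORD_FORMS[lang][plural_category(lang, n)]}"


for n in (1, 2, 5, 22):
    print(count_swords("en", n), "|", count_swords("pl", n))
```

In practice the rules are usually taken from an existing i18n library or the Unicode CLDR data rather than written by hand; the point here is only that the string table needs one entry per category, not one singular and one plural.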

Resources