Phonetic translation from Latin (English, German) to Arabic - machine-learning

I read a few papers about machine translation but did not understand them well.
The language models (in Google Translate) use phonetics and machine learning, as best as I can tell.
My question, then, is: is it possible to take an Arabic word that is spelled phonetically in English and recover the user's intended Arabic word?
For instance, the word 'Hadith' is an English phonetic spelling of the Arabic word 'حديث'. Can I programmatically go from 'Hadith' to Arabic?

Thanks to the Wiki article, I found that there's an entire field of work in this area: transliteration. There was a Google API for this that was deprecated in 2011 and moved to the Google Input Tools service.
The simplest answer is Buckwalter transliteration, but at first glance a 1:1 mapping doesn't seem good enough.
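For reference, Buckwalter is a strict 1:1 table between ASCII symbols and Arabic letters, which is why it only helps when the input is already written in Buckwalter notation. Below is a minimal Python sketch of the core table. It omits tatweel and a few rarer symbols, and the function names are my own. Note that "Hadith" is not Buckwalter: the scheme spells حديث as "Hdyv", or "Hadiyv" with the short vowels marked.

```python
# Core Buckwalter table. The first 26 ASCII symbols map, in order, onto the
# consecutive code points U+0621..U+063A (hamza ... ghain); the next 18 map
# onto U+0641..U+0652 (feh ... sukun, including the short-vowel marks).
_LATIN = "'|>&<}AbptvjHxd*rzs$SDTZEg" "fqklmnhwYyFNKaui~o"
_ARABIC = (
    "".join(chr(c) for c in range(0x0621, 0x063B))
    + "".join(chr(c) for c in range(0x0641, 0x0653))
)
assert len(_LATIN) == len(_ARABIC)  # the mapping must stay strictly 1:1

TO_ARABIC = str.maketrans(_LATIN, _ARABIC)
TO_BUCKWALTER = str.maketrans(_ARABIC, _LATIN)


def buckwalter_to_arabic(text):
    """Convert Buckwalter ASCII to Arabic script; unmapped characters pass through."""
    return text.translate(TO_ARABIC)


def arabic_to_buckwalter(text):
    """Inverse direction: Arabic script back to Buckwalter ASCII."""
    return text.translate(TO_BUCKWALTER)
```

With this table, buckwalter_to_arabic("Hdyv") gives حديث. By contrast, buckwalter_to_arabic("Hadith") gives ح, fatha, د, kasra, ت, ه, which is not the intended word. That is the practical limit of a 1:1 mapping applied to informal spellings.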
I am going to see if there's a way to hack Google Input Tools and call it, even at the CLI level, because their online demo works very well.
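Informal spellings like "Hadith" are ambiguous, so one rough alternative to a strict 1:1 table works in three steps. First, let each Latin letter or digraph map to several possible Arabic letters. Next, generate every combination. Finally, keep only the combinations that appear in a list of real Arabic words. The sketch below assumes a tiny, hypothetical mapping table (CHUNKS) and word list. A usable version would need a full Arabic lexicon and, ideally, frequency data to rank the survivors. Short vowels map to either nothing or a long-vowel letter, because they are usually not written in Arabic.

```python
from itertools import product

# Hypothetical, deliberately incomplete table: each romanized chunk maps to one
# or more plausible Arabic letters. "" means the vowel may be left unwritten.
CHUNKS = {
    "th": ["ث", "ذ"], "sh": ["ش"], "kh": ["خ"], "gh": ["غ"], "dh": ["ذ", "ض"],
    "a": ["", "ا"], "e": ["", "ي"], "i": ["", "ي"], "o": ["", "و"], "u": ["", "و"],
    "b": ["ب"], "d": ["د", "ض"], "f": ["ف"], "h": ["ه", "ح"], "j": ["ج"],
    "k": ["ك", "ق"], "l": ["ل"], "m": ["م"], "n": ["ن"], "q": ["ق"],
    "r": ["ر"], "s": ["س", "ص"], "t": ["ت", "ط"], "w": ["و"], "y": ["ي"],
    "z": ["ز", "ظ"],
}


def segment(word):
    """Greedily split a romanized word into known chunks, trying digraphs first."""
    chunks, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in CHUNKS:
            chunks.append(pair)
            i += 2
        elif word[i] in CHUNKS:
            chunks.append(word[i])
            i += 1
        else:
            raise ValueError(f"no mapping for {word[i]!r} in {word!r}")
    return chunks


def candidates(word):
    """All Arabic spellings the table allows (the count grows exponentially)."""
    options = [CHUNKS[chunk] for chunk in segment(word.lower())]
    return {"".join(letters) for letters in product(*options)}


def transliterate(word, lexicon):
    """Keep only the candidates that appear in a set of real Arabic words."""
    return sorted(candidates(word) & lexicon)
```

For "Hadith" the table allows 32 spellings, including both حدث and حديث, so the word list does all the real work. The greedy digraph rule is also a simplification: a word where "d" and "h" are separate sounds would be mis-split.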

Related

Web Page From English To Urdu converter

I need to convert my website's pages from English to Urdu. For this I was using Google's Translation API, but the Google Translate API is not returning correct translations of the pages.
What should I use to get 99% accurate results when translating pages from English to Urdu?
There are only a few parameters you can specify when using the Google Translate API that can make a difference to your results: the source and model parameters.
Source is the language of the source text. If you don't specify it, it will be detected automatically. As your source language is English, I don't think this will cause any trouble.
Model: Urdu is supported by the Neural Machine Translation (NMT) model, so if you don't specify a model, the nmt model will be used. You can try the base model instead; however, the nmt model is supposed to "provide improved translation for longer and more complex content".
Expecting 99% accuracy from the model is essentially expecting it to be almost perfect.
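The source and model parameters described above correspond to fields in a request to the Cloud Translation Basic (v2) REST endpoint. Below is a minimal sketch using only the Python standard library. It assumes API-key authentication, and you should check the current Google documentation before relying on the field names. The sketch also sets format to "html", which tells the API to leave page markup alone. That matters when you translate whole web pages.

```python
import json
import urllib.parse
import urllib.request

# Cloud Translation Basic (v2) REST endpoint.
ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(texts, api_key, source="en", target="ur",
                            model="nmt", fmt="html"):
    """Build (but do not send) a POST request for the v2 translate endpoint.

    source: set explicitly so that language auto-detection is skipped.
    model:  "nmt" or "base"; try both and compare on your own pages.
    fmt:    "html" keeps page markup intact; use "text" for plain strings.
    """
    body = {"q": list(texts), "source": source, "target": target,
            "model": model, "format": fmt}
    url = ENDPOINT + "?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(texts, api_key, **options):
    """Send the request and return the translated strings (needs network and a valid key)."""
    request = build_translate_request(texts, api_key, **options)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item["translatedText"] for item in payload["data"]["translations"]]
```

Calling translate(["Hello"], "YOUR_API_KEY", model="base") and comparing the result with the default lets you check both models against your own content. YOUR_API_KEY is a placeholder.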

Detect when to use a vs an

I have a service that allows users (admins) to change the terminology the site uses. My designer wants me to use the format "A Group". The problem is that for some terms it should be "An", not "A".
Is there any way to reliably detect which to use? What about localization?
I can brute-force it and get 90% of the way by checking whether the first letter is a consonant or a vowel. That won't work for all words, though (e.g. "an hour", "a university"). And it doesn't cover any language except English.
In my opinion, you have only two options:
1- Check the first letter, and scan the whole phrase for any non-English letters.
2- Provide a dictionary of English nouns; then you can easily look up whether a word needs "a" or "an".
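The two options above can be combined in a minimal English-only sketch. It looks the word up in an exception dictionary first and falls back to the first-letter rule. The exception lists are hypothetical and incomplete, because the real rule depends on pronunciation ("an hour" but "a university"). Nothing here helps with other languages.

```python
# Hypothetical, incomplete exception lists: the article depends on how the
# next word is pronounced, so spelling alone gets these wrong.
AN_EXCEPTIONS = {"hour", "hours", "hourly", "honest", "honor", "honour", "heir"}
A_EXCEPTIONS = {"one", "once", "unit", "union", "unique", "university",
                "user", "users", "usual", "european", "eulogy", "uniform"}


def indefinite_article(phrase):
    """Return "a" or "an" for an English phrase: dictionary first, then first letter."""
    words = phrase.strip().split()
    if not words:
        raise ValueError("empty phrase")
    first_word = words[0].lower()
    if first_word in AN_EXCEPTIONS:
        return "an"
    if first_word in A_EXCEPTIONS:
        return "a"
    return "an" if first_word[0] in "aeiou" else "a"


def with_article(term):
    """Format a term in the designer's style, e.g. "A Group"."""
    return f"{indefinite_article(term).capitalize()} {term}"
```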
Although the "a versus an" issue is very specific, what you're describing here is a natural language processing issue. Essentially you are being asked to write code that generates a grammatically correct piece of text.
I think you should try to explain the implications to the designer, especially if you end up localizing in other languages. Your time is probably better spent working on your app's business logic than on language processing.

Tategaki (japanese vertical writing) in iOS apps

Is there a user control (standard or third-party) for iOS that can display vertical text in East Asian languages? I also need to display ruby characters (furigana/reading aids) next to the text. The result should look like this http://img23.imageshack.us/img23/3262/img0088xa.jpg (Japanese iBooks screenshot)
At this time you will need Core Text or a view using Core Text.
GitHub search fails, but googling in Japanese wins.
http://cocoadays-info.blogspot.jp/2012/01/coretexttextview-lccoretext.html
Blog article in Japanese on this
https://github.com/novi/LTCoreText
Should do the trick.
Too bad GitHub search doesn't find it.
Google Translate may or may not help. I've just forked it and will translate the README soon.
Also found https://github.com/hokuron/CTRVerticalTextView
It seems fairly unfinished, though, and its owner's blog seems to be down.
A Japanese site has this nifty page of bookmarks on the topic.
http://b.hatena.ne.jp/Watson/iOS/CoreText/

Language codes reference

Can someone please give me the official reference to the language (country/region) codes. I'm finding different codes for the same language (es_ES, esp_ESP, etc.) and I can't figure out which one is the right one.
There are several different standards specifying language codes, including ISO 639 with its parts 1-3, and IETF language tags (BCP 47), which describe a system for composing codes rather than a list of the codes themselves.
Which standard is "the right" standard depends on your use case and context. See http://en.wikipedia.org/wiki/Language_codes.
That's because there are several standards for naming languages, and they use different numbers of letters (for example, two-letter ISO 639-1 codes versus three-letter ISO 639-2 codes). You may have to choose which standard to use, and perhaps detect which standard your data source is using.
This is a starting point: http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
These codes combine the specific language with the country in which the language is used. For instance, es_ES means Spanish as used in Spain, and es_AR means Spanish as used in Argentina. For the language part there's the Language Matrix; for the localisation (country) part, you could use the ISO 3166-1 country codes.
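If you receive codes in the es_ES style, a small normalizer to the hyphenated BCP 47 form used by IETF language tags (es-ES) can help keep them consistent. This sketch only checks the shape of the code: a 2- or 3-letter language part, plus an optional region part of either 2 letters (ISO 3166-1 alpha-2) or 3 digits (UN M.49, as in es-419). It does not validate the parts against any registry. A code such as esp_ESP has a three-letter region part, so the sketch rejects it rather than guessing. The function name is my own.

```python
import re

# Language part: 2 letters (ISO 639-1) or 3 letters (ISO 639-2/3).
# Region part (optional): 2 letters (ISO 3166-1 alpha-2) or 3 digits (UN M.49).
_LOCALE = re.compile(r"([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}|[0-9]{3}))?")


def normalize_locale(code):
    """Normalize identifiers like "es_ES" or "ES-ar" to tags like "es-ES" / "es-AR".

    Only the shape is checked; the parts are not looked up in any registry.
    """
    match = _LOCALE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized locale code: {code!r}")
    language, region = match.groups()
    tag = language.lower()
    if region:
        tag += "-" + region.upper()
    return tag
```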
You can find all the region codes in the documentation here.

free to use, in a programmer-friendly format, dictionaries for european languages

I want to experiment with an idea I have of automatically localizing software, or at least suggesting a reasonable translation if a localized string is not available.
I'm not sure this will be working satisfactorily tomorrow morning but I just wanted to play with this idea.
Does anybody know of a dictionary that is free to use and in an easy-to-parse format, and that can help me automatically translate words from English to other European languages (French, German, Spanish, etc.)?
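The idea in the question can be sketched as follows. Use the real localized string when one exists. Otherwise, assemble a word-by-word suggestion from a dictionary and flag it for human review. Both the message catalog and the word dictionary in this sketch are hypothetical plain Python dicts, and the function name is my own.

```python
def localize(key, english, lang, catalog, dictionary):
    """Return (text, is_suggestion) for a UI string.

    catalog:    {(key, lang): translated string}   (hypothetical message catalog)
    dictionary: {(english_word, lang): translated word}   (hypothetical word list)
    """
    if (key, lang) in catalog:
        return catalog[(key, lang)], False  # a real, human-made translation
    words = []
    for word in english.split():
        # Keep the English word when the dictionary has no entry for it.
        words.append(dictionary.get((word.lower(), lang), word))
    return " ".join(words), True  # machine suggestion: needs review
```

With the entries ("open", "de") mapped to "öffnen" and ("file", "de") mapped to "Datei", the string "Open file" comes out as "öffnen Datei". A German UI would say "Datei öffnen", which is exactly the word-order problem raised in the answers below.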
The FreeDict project has quite a few relatively complete dictionaries. Most are from one language to English or vice versa, but some are between two non-English languages as well.
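As far as I know, FreeDict maintains its dictionaries as TEI XML, which Python's standard library can parse. The sketch below assumes entries shaped roughly like <entry><form><orth>word</orth></form><sense><cit type="trans"><quote>translation</quote></cit></sense></entry> in the TEI namespace. Verify that against an actual FreeDict file before relying on it, since the layout may vary between dictionaries. The function names are my own.

```python
import io
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace (assumed for FreeDict files)


def load_tei_dictionary(source):
    """Map each headword (<orth>) to its translations (<cit type="trans"><quote>).

    `source` is a file name or binary file object. iterparse streams the XML,
    so large dictionaries do not have to be loaded into memory all at once.
    """
    result = {}
    for _event, elem in ET.iterparse(source):
        if elem.tag != TEI + "entry":
            continue
        headwords = [o.text.strip() for o in elem.iter(TEI + "orth") if o.text]
        translations = [
            q.text.strip()
            for cit in elem.iter(TEI + "cit") if cit.get("type") == "trans"
            for q in cit.findall(TEI + "quote") if q.text
        ]
        for word in headwords:
            result.setdefault(word, []).extend(translations)
        elem.clear()  # release the finished entry's children
    return result


def load_tei_string(xml_text):
    """Convenience wrapper for experimenting with an XML snippet held in a string."""
    return load_tei_dictionary(io.BytesIO(xml_text.encode("utf-8")))
```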
I don't know of any dictionary, but I would like to point something out. Bear in mind that translation is not a word-for-word process in any sense. The grammatical rules change between languages as well, so word-for-word output leaves sentences unreadable. This is why even companies like Google have trouble making good translation software. Context is very hard to detect programmatically, and context means everything in choosing the right word, the right structure, and so on.
Maybe use a translation API, if there is one. Google only seems to offer a JavaScript API for Language.
You can't even expect to get a reasonable translation with an automatic method. Translating full texts completely correctly is too hard for a computer, and translating short phrases correctly is impossible.
Take, for example, the simple text "Open": without context, it's not even possible to tell whether it's a verb or an adjective. I know that, at least in German, the verb and the adjective translate into two different words.
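The "Open" example can be made concrete. A word-level dictionary has to be keyed by part of speech (or richer context) as well as by the word, and the caller must supply that information, which a bare UI string does not carry. Below is a minimal sketch with a hypothetical two-entry English-German table: "öffnen" for the verb and "offen" for the adjective.

```python
# Hypothetical table keyed by (word, part of speech): "open" needs two German entries.
EN_DE = {
    ("open", "verb"): "öffnen",      # "Open the file"
    ("open", "adjective"): "offen",  # "The door is open"
}


def translate_word(word, pos=None):
    """Translate one word; without a part of speech, ambiguous words are refused."""
    senses = {p: t for (w, p), t in EN_DE.items() if w == word.lower()}
    if not senses:
        raise KeyError(word)
    if pos is not None:
        return senses[pos]
    if len(senses) == 1:
        return next(iter(senses.values()))
    raise ValueError(f"{word!r} is ambiguous: {sorted(senses)}")
```

Refusing to guess is the safer design for UI strings: an error at build time is easier to notice than a wrong word shipped to users.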
Also, computer specific concepts often borrow words from similar concepts outside the computer sphere. Those concepts often have a specific translation, but an automatic translation would sometimes try to translate it as if it was the original meaning, which can give you very strange translations.
After searching for a while, I solved the problem myself by starting to build my own dictionary. I do a lot of translations in my free time. In the beginning it is really boring work, but after a while you end up with a really good dictionary. Some friends of mine use it too, so we all benefit from every new word we translate.

Resources