In my Rails 3 application, users may write messages in a forum. I would like to identify the language of a given message. I'm interested in English, Russian, and Hebrew. Is there any built-in library in Ruby/Rails for such a task? If not, any ideas will be appreciated.
Use this: https://github.com/nashby/wtf_lang
"ruby is so awesome!".lang # => "en"
"ruby is so awesome!".full_lang # => "ENGLISH"
You can use the API provided by Google to guess it with Google Translate.
See here for documentation: http://code.google.com/apis/language/translate/v1/using_rest_langdetect.html
Since you're concerned with languages that use different character sets, you could look at which character codes predominate in your strings. You could then see whether they fall into the code ranges that represent Hebrew or Cyrillic characters.
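A minimal sketch of that character-set idea, assuming a Ruby version whose regex engine supports Unicode script properties (`\p{Hebrew}`, `\p{Cyrillic}`, `\p{Latin}`). The `SCRIPTS` table and the `guess_language_by_script` helper are hypothetical names made up for illustration. Note that this only identifies the script: Cyrillic text is not necessarily Russian and Latin text is not necessarily English, so treat the result as a rough guess.

```ruby
# Hypothetical mapping from a language label to the Unicode script it is written in.
# Assumption: Cyrillic => Russian and Latin => English, which only holds for this forum.
SCRIPTS = {
  hebrew:  /\p{Hebrew}/,
  russian: /\p{Cyrillic}/,
  english: /\p{Latin}/
}.freeze

# Counts the characters belonging to each script and returns the label
# with the most characters, or :unknown if no script matched at all.
def guess_language_by_script(text)
  counts = SCRIPTS.transform_values { |re| text.scan(re).size }
  lang, count = counts.max_by { |_, n| n }
  count.zero? ? :unknown : lang
end
```

Calling `guess_language_by_script(message)` on each forum post would then give one of `:hebrew`, `:russian`, `:english` or `:unknown`. Mixed-script messages simply go to whichever script has the most characters.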
Perhaps you could look at the whatlanguage gem?
Take a look at this blog
http://blog.kenweiner.com/2008/04/server-side-language-detection-with.html
This may be helpful
The Language Detection API provides a Ruby gem for detecting language.
Just a quick demo of WhatLanguage for anyone interested : http://www.youtube.com/watch?v=lNqZ2cqOReo&list=UUJ_3fstMOH-g4yBxtvgAWkw&index=0&feature=plcp
http://rubygems.org/gems/prose Prose does it without a gem. Try it.
Is there a way with the API to convert/translate Revit standard terms such as 'Insulation', '3D view', 'View Templates', 'Detail Level' and other baked-in terms to a given language (such as German, Russian, Chinese, etc.)? I'd like to ensure that the messages I provide in my localized add-in use terms that the user is familiar with (with regard to Revit).
I think Jeremy's answer is probably the way to go for a comprehensive approach.
However, if you're looking for something more self-contained and quick-and-dirty, you could try the LabelUtils class in the Revit API. :)
LabelUtils lets you look up the translated value of all of the thousands of built-in parameters, parameter groups, unit types, etc.
All of the pieces of text that you mentioned above are available as BuiltInParameter translations (although, admittedly, some are not available as plurals).
For example:
LabelUtils.GetLabelFor( BuiltInParameter.RBS_WIRE_INSULATION_PARAM );
==> "Insulation" in English.
(You can see all of the translated English BuiltInParameters in the Revit API reference under the BuiltInParameters page).
Good Luck!
Matt
The Autodesk localisation team uses a cross-product corpus database, NeXLT, for terminology and message translation:
http://langtech.autodesk.com/nexlt
This link is accessible from outside the company, and translation companies working with the localisation team around the world make use of it for translating products for Autodesk platforms.
This answer is already published with a little more background on The Building Coder blog:
http://thebuildingcoder.typepad.com/blog/2014/10/autodesk-open-source-all-over-germany-and-japan.html#4
I am writing documentation in a language other than English (Slovak, actually). I do not want words like Content, Note, and Caution to appear in my Slovak documentation; instead, I want to have Obsah, Poznámka, and Pozor.
After some time googling, I was unable to find a way to do it. Could you give me some advice, please?
In your conf.py you can set the language in this section:
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
language = 'nl'
The list of supported languages, however, does not show support for Slovak yet. But there are good instructions for creating your own internationalization files so that you can add Slovak to the list.
What is the best way to determine the language of Twitter posts?
There is the language parameter that comes with the streaming API, but it doesn't really seem to be very accurate. Even many Japanese posts are labelled as English.
What have others done to sort out the languages?
I've had very good results with this PHP package:
http://pear.php.net/package/Text_LanguageDetect/
It is fast and open source. We use it to select English only posts for a site we run at http://2012twit.com.
Google has language detection within its Translate API, if using evil external services is a goer:
http://code.google.com/apis/language/translate/v1/reference.html#detectResult
Supposing a guy doesn't want users to include months or days or anything that remotely sounds like a time in his text area form. Could he use something besides regex to validate that? Is there a built-in API for that? I do use the Chronic gem somewhere else, but if I used it for this purpose, it would probably be out-of-the-box.
Here's an example.
I'm offering this wonderful sheep and it's only good until June.
Blam! Can't have this then because they said the word, 'June'. Can Ruby detect that?
This doesn't answer my question, but this is the regex alternative. I'm posting it in case anyone else might need it.
/|\/|[Jj][Aa][Nn][Uu][Aa][Rr][Yy]|[Jj][Aa][Nn](\s|\.)|[Ff][Ee][Bb][Rr][Uu][Aa][Rr][Yy]|[Ff][Ee][Bb](\s|\.)|[Mm][Aa][Rr][Cc][Hh]|[Mm][Aa][Rr](\s|\.)|[Aa][Pp][Rr][Ii][Ll]|[Aa][Pp][Rr](\s|\.)|[Jj][Uu][Nn][Ee]|[Jj][Uu][Ll][Yy]|[Aa][Uu][Gg][Uu][Ss][Tt]|[Aa][Uu][Gg](\s|\.)|[Ss][Ee][Pp][Tt][Ee][Mm][Bb][Ee][Rr]|[Ss][Ee][Pp][Tt](\s|\.)|[Nn][Oo][Vv][Ee][Mm][Bb][Ee][Rr]|[Nn][Oo][Vv](\s|\.)|[Dd][Ee][Cc][Ee][Mm][Bb][Ee][Rr]|[Dd][Ee][Cc](\s|\.)|[Ss][Uu][Nn]\.|[Ss][Uu][Nn][Dd][Aa][Yy]|[Mm][Oo][Nn][Dd][Aa][Yy]|[Mm][Oo][Nn](\s|\.)|[Tt][Uu][Ee][Ss][Dd][Aa][Yy]|[Tt][Uu][Ee][Ss](\s|\.)|[Ww][Ee][Dd][Nn][Ee][Ss][Dd][Aa][Yy]|[Ww][Ee][Dd](\s|\.)|[Tt][Hh][Uu][Rr][Ss][Dd][Aa][Yy]|[Tt][Hh][Uu][Rr](\s|\.)|[Ff][Rr][Ii][Dd][Aa][Yy]|[Ff][Rr][Ii](\s|\.)|[Ss][Aa][Tt][Uu][Rr][Dd][Aa][Yy]|[Ss][Aa][Tt](\s|\.))/
I didn't include the word 'May' (or 'may') or a bare 'Sun', but it does check for 'Sun.' with a trailing period.
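For anyone who wants something shorter, here is a rough sketch of the same idea built from the name lists in Ruby's standard `date` library, with the `i` flag instead of spelled-out character classes. It is not equivalent to the pattern above: it uses the library's abbreviations (for example "Sep" rather than "Sept"), it includes every month except May, and it only matches whole words. The constant and method names (`FULL_NAMES`, `ABBR_NAMES`, `DATE_WORDS`, `mentions_date?`) are made up for illustration.

```ruby
require 'date'

# Full month and day names from the standard library, minus the ambiguous "May".
FULL_NAMES = (Date::MONTHNAMES.compact + Date::DAYNAMES) - ['May']

# Abbreviations, minus "May" and "Sun", which are ordinary English words.
ABBR_NAMES = (Date::ABBR_MONTHNAMES.compact + Date::ABBR_DAYNAMES) - %w[May Sun]

# Full names match as whole words; abbreviations must be followed by
# whitespace or a period; "Sun" only counts when written as "Sun.".
DATE_WORDS = /
  \b(?:#{FULL_NAMES.join('|')})\b |
  \b(?:#{ABBR_NAMES.join('|')})(?=[\s.]) |
  \bsun\.
/xi

# Returns true if the text appears to mention a month or a day of the week.
def mentions_date?(text)
  DATE_WORDS.match?(text)
end
```

You could then reject a form submission whenever `mentions_date?(params_text)` is true, where `params_text` stands for whatever string your form provides. Like the original, it will still miss numeric dates and anything phrased as "next week" or "tomorrow".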
I want my Rails app to parse external websites for a trackback URL, but I'm not really sure whether I should just look for a rel="trackback" link or follow the RDF specifications described by sixapart. Or both. WordPress and TechCrunch both only offer a rel="trackback" link, and they should know. On the other hand, maybe some blog only provides RDF and I'm missing the link.
What do you think?
And is there any ready-made gem/plugin out there? (It's really hard to google for trackback...)
Thanks.
UPDATE
I'm now first trying to find the RDF information. If I do not find anything, I look for the link tag. I was referring to the sixapart specifications. Thanks for your help.
I'm checking for both now (first RDF, then link if not successful). I was referring to the sixapart specifications. Thanks for your help!
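For reference, a rough sketch of that "RDF first, then link" order using only regular expressions from the standard library. It assumes the embedded RDF carries the ping URL in a `trackback:ping="..."` attribute, as in the sixapart specification, and that the fallback is an `<a>` or `<link>` tag with `rel="trackback"`. The `find_trackback_url` helper is a hypothetical name, and a real HTML parser would be more robust than these patterns.

```ruby
# Hypothetical helper: returns the trackback ping URL found in +html+, or nil.
def find_trackback_url(html)
  # 1. Embedded RDF: look for a trackback:ping="..." attribute.
  if (m = html.match(/trackback:ping\s*=\s*["']([^"']+)["']/i))
    return m[1]
  end

  # 2. Fall back to the first <a> or <link> tag that has rel="trackback".
  html.scan(/<(?:a|link)\b[^>]*>/i) do |tag|
    next unless tag =~ /\brel\s*=\s*["'][^"']*\btrackback\b[^"']*["']/i
    href = tag[/\bhref\s*=\s*["']([^"']+)["']/i, 1]
    return href if href
  end

  nil
end
```

Pass it the body of the fetched page; a `nil` result means neither form was found.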