Stemming in other languages through the API - eventbrite

Does the Eventbrite search engine stem words in languages other than English? If so, which languages are supported, and how is the language determined?

I did a quick check using the interactive docs here:
http://developer.eventbrite.com/doc/events/event_search/
Spanish-language stemming seems to work in many cases, since many Spanish words are pluralized by adding an "s" to the end. Searches for "gato" and "gatos" return the same result list.
From what I understand, Solr's basic stemming support will automatically try removing or adding the letter 's' at the end of most search keywords. I think it does the same for other common English-language word endings, like 'ing'.
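As a rough illustration of the suffix stripping described above, here is a toy sketch in Python. It is not Solr's actual stemmer; the function name `naive_stem`, the suffix list, and the minimum stem length are all assumptions made for the example.

```python
def naive_stem(word: str) -> str:
    """Strip a couple of common English suffixes. Illustrative only."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Both forms reduce to the same stem, so they would match the same documents.
print(naive_stem("gato"), naive_stem("gatos"))
```

A rule this simple explains why Spanish plurals ending in "s" happen to work, and also why it will miss plurals or inflections that follow other patterns.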
Eventbrite is putting a lot of focus on i18n, and support for international events.
I would try testing several words from the languages that you would like to support. If you find the support for stemming in a particular language to be lacking, please pass your feedback on to the Eventbrite support team.

Related

How do I specify language when storing strings?

I'm currently developing a system that supports several languages. I want to specify these languages as precisely as possible in the database in case of future integrations. (Yes, I know it's a bit YAGNI.)
I've found several ways to define a language:
nb-NO
nb_NO
nb-no
nb_no
nb
These can all mean "Norwegian Bokmål". Which one, if any, is the most correct?
The Locale article on the ArchLinux Wiki specifies a locale as language[_territory][.codeset][@modifier]. I guess the codeset and modifier are only relevant for input. But language is the minimum, and territory may be nice to have should we implement cultural differences regarding currency, decimal points, etc.
Am I overthinking it?
Look at BCP 47
https://tools.ietf.org/html/bcp47
In this day and age you would need to support at least language, script, and region (only language is mandatory).
It depends a lot on what you use these tags for.
If it is spoken content you might care about dialect (for instance Cantonese vs Mandarin Chinese), but not script. In written form you will care about script (Traditional vs. Simplified Chinese), but not dialect.
The complete stack you use to process things also matters a lot. You might use a hyphen as the separator, use grandfathered IDs, or use the -u- extension (see the BCP), then discover that you use a programming language that "chokes" on it. Or you use "he" for Hebrew, but your language (cough Java cough) wants the deprecated "iw".
So you might decide to use the same locale ID as your tech stack, or have a "conversion layer".
If you want things accessible from several technologies, then a conversion layer is your only (reasonable) option.
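A minimal sketch of such a conversion layer, in Python, might look like the following. The function name `to_bcp47` and the mapping table are assumptions made for illustration. It only handles simple language[-script][-region] tags and does not attempt to cover grandfathered tags or extensions such as -u-.

```python
# Deprecated ISO 639 language codes that some platforms still emit,
# mapped to their current equivalents.
DEPRECATED_LANGUAGES = {"iw": "he", "in": "id", "ji": "yi"}


def to_bcp47(tag: str) -> str:
    """Normalize a simple locale id such as 'nb_no' to the form 'nb-NO'."""
    # Drop POSIX codeset and modifier parts, e.g. 'nb_NO.UTF-8' or 'nb_NO@euro'.
    base = tag.split(".")[0].split("@")[0]
    parts = base.replace("_", "-").split("-")

    language = parts[0].lower()
    result = [DEPRECATED_LANGUAGES.get(language, language)]

    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result.append(part.title())  # script, e.g. 'Hant'
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            result.append(part.upper())  # region, e.g. 'NO' or '419'
        else:
            result.append(part.lower())  # anything else is passed through in lowercase
    return "-".join(result)


for raw in ("nb_NO", "nb-no", "nb", "iw_IL", "zh_hant_tw"):
    print(raw, "->", to_bcp47(raw))
```

Storing one canonical form in the database and converting at the edges keeps the stored values independent of whichever platform reads them.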

Internationalization and Localization for Programs (i18n)

I have several projects I've worked on that are set up for internationalization.
From the programming perspective, I have everything pretty much set up and have put all of the strings into an XML file or properties file. I wish to get these files translated into other languages, such as: Italian (it), Spanish (es), German (de), Brazilian Portuguese (pt-br), Chinese Simplified (zh-cn), Chinese Traditional (zh-tw), Japanese (ja), Russian (ru), Hungarian (hu), Polish (pl), and French (fr).
I've considered using services like Google Translate; however, I feel that these automatic translation tools are still a bit weak.
In summary, I'm curious whether others have used professional translation services for their programs. If so, which ones would people recommend, and how did you coordinate the translation updates with the translation teams? Any idea what I should expect to pay? Or is there a better way of doing this that I'm not aware of?
Machine translation services like Google, Bing etc. are not a good choice. As you mention, these services are in reality still in their infancy, and more importantly using them will most likely give your non-English customers a bad impression of your application.
If you want top quality translation, you will need to employ the services of a professional translation agency. Translators need to understand your application in order to translate the text correctly, so providing them with the application itself or screen captures of the English product will help.
You will pay per word - the rates vary from agency to agency, and also from language to language.
The other alternative is using crowd-sourced translations, from GetLocalization for example.
To summarize, proper localization is not just a matter of translating the text - you need to build a relationship with your translators, and ensure they understand your application and the context of the strings that they are translating, otherwise you will end up with a linguistically poor application, that will reflect badly on your company.

Would Google Translate do a good job for translating my software?

Just wondering how users translate their software. I've just finished making my software able to use different languages, but I'm not sure that Google Translate is accurate enough.
No; I wouldn't recommend it.
Machine translators will not handle short domain-specific strings very well.
Your UI is likely to have non-standard words or usages that the translator will choke on or mistranslate.
Also, machine translations tend to look very unprofessional.
I don't think Google Translate would be sufficient. I would recommend getting a collaborator who knows both your language and the language that you want to translate to.
Google Translate might work for lists of words, but it will make a mess of sentences and make your software look really bad. Better to hire, or talk nicely to, a native speaker to get your sentences translated.

Resources for translating common programming terms for UI into other languages

I'm not even sure this is entirely programming related...but here goes:
I need to translate some forms into different languages, specifically Spanish and French. Obviously, it would be good if I knew these other languages fluently, but I don't. Besides using Google Translate, Babel Fish, etc., are there any resources which can assist in this? Mainly I am trying to find out what the translations of OK and Cancel are.
Moreover, I looked to find some programs which have the UI written in these other languages and all I could find were language learning programs.
How do other programmers handle doing this?
Take a look at the Pootle Terminology project:
http://pootle.locamotion.org/projects/terminology/
and Microsoft's UI translations:
http://www.microsoft.com/Language/en-US/Translations.aspx
Both provide translations for common UI terminology in a range of languages.
If you want to do it right, it seems like you need to hire a person to translate for you, preferably a native speaker. I'm sure you can find some services through a quick Google search.
The best resource for this is probably a translator who speaks the language in question fluently and has experience translating user interfaces.
Here's just one example of a company that provides this service:
http://www.ricintl.com/software-localization-services.htm
For Spanish, it is OK to use "OK" and "Cancelar".

How can I implement a semantic ontology in Ruby on Rails?

I'm working on a "Twitter filter", more to learn Ruby on Rails than anything else. The idea is that I use a semantic ontology to look up a user's interests. So if a user says they're interested in "sports", that means flag any tweets that discuss "sports", "golf", "football" and so on.
I'd like to be able to expand it to any hierarchy of topics, though. So if you're interested in Europe, flag all the countries in Europe.
Naturally this is rather complex, so maybe we'd limit it to one or two "levels" of lookup...
How could I do this efficiently? I'm pretty familiar with Java, C and Ruby, and have worked a lot with MySQL.
I'd look into Doug Lenat's Cyc. It's done and open.
I'm not sure if it will help you, but Google has something called Google Sets. You can look on it here: http://labs.google.com/sets
Before you think about programming languages and technology, think about this: What kind of data structure is a "semantic ontology"?
To me that sounds like some kind of a directed graph.
Knowing that, you'll soon find out that it's quite easy to implement such a structure in whatever language and technology you want, and that a lot of languages already have some kind of graph library (e.g. RGL for Ruby).
To me the real problem isn't how to implement such a data structure, or how to do so efficiently, but how to get the semantic information you need to build it out of Twitter (e.g. who tells your application that Europe isn't a part of Spain, but that Spain is a part of Europe?).
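To make the directed-graph idea concrete, here is a minimal sketch in Python rather than Ruby. The `ONTOLOGY` dictionary, its contents, and the function names are hypothetical and hand-written for the example. The depth limit corresponds to the "one or two levels of lookup" mentioned in the question.

```python
from collections import deque

# Hypothetical hand-written ontology: each topic points to its narrower topics.
ONTOLOGY = {
    "sports": ["golf", "football"],
    "football": ["premier league"],
    "europe": ["spain", "norway"],
}


def expand_topic(topic, max_depth=2):
    """Return the topic plus every narrower topic within max_depth edges."""
    seen = {topic}
    queue = deque([(topic, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow edges beyond the depth limit
        for child in ONTOLOGY.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


def matches_interest(tweet, interest, max_depth=2):
    """Flag a tweet if it mentions the interest or any narrower topic."""
    text = tweet.lower()
    return any(term in text for term in expand_topic(interest, max_depth))


print(matches_interest("Great round of golf today", "sports"))
```

Because the edges only point from broader to narrower topics, an interest in "spain" does not pull in "europe", which is the direction the answer says the application needs to know.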
Anyway, have fun implementing it, sounds like a cool project! :-)
I'm not sure what your requirements are. But it seems that either Singular Value Decomposition (SVD) or Support Vector Machines (SVM) will work for you.

Resources