What's the best way to order a list of countries in French from a usability standpoint? - localization

I have a list of countries that I would like to display in a dropdown menu. Because of their French translations, either the list needs to be re-ordered or the country names need to be rewritten.
For example, Canary Islands is translated into Îles Canaries in French. Should I re-order the list so that all the Îles are grouped together? Or should I write it as "Canaries, Îles"? Additionally, will people be able to navigate to Îles by typing the accented Î?

I agree it should be "Îles" first: stick with how someone would say the name. For example, United States is "États-Unis" (literally "States United"), but you wouldn't reverse the order to match the English convention.

Why would you go for "Canaries, Îles"? Just to keep the English order?
Imagine that an English list would have that as "Islands, Canary"
Translate the way a native would expect it (in French)
Sort it following the French rules
Find it when the user types either Î or I
In general, internationalisation is not hard: just turn the tables and think about what you would want if German or French software were translated into English.
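To make the "sort it" and "find it when the user types Î or I" points concrete, here is a minimal Python sketch. It assumes that stripping diacritics is a good enough approximation of French ordering for a country dropdown; a locale-aware collator (for example from ICU) would be the more complete option. The helper names fold, sort_countries and matches_prefix are made up for this illustration.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip diacritics and case so that 'Îles' compares like 'iles'."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def sort_countries(names):
    # Sort on the folded form; the original spelling only breaks ties.
    # Assumption: this approximates French ordering, it is not full collation.
    return sorted(names, key=lambda name: (fold(name), name))


def matches_prefix(name: str, typed: str) -> bool:
    # Type-ahead lookup that treats 'Î' and 'I' as the same keystroke.
    return fold(name).startswith(fold(typed))


countries = ["Islande", "États-Unis", "Îles Canaries", "Espagne", "Inde", "Zambie"]
ordered = sort_countries(countries)
print(ordered)
print([c for c in ordered if matches_prefix(c, "I")])
```

With this key, "Îles Canaries" sits among the other I entries instead of after Z, and typing either I or Î reaches it.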

Related

Is it possible to exclude only French tweets from a Twitter advanced search engine?

To find only tweets in French, I use:
find this words lang:fr
But what if I want to exclude only French from the results and find all the others?
Very easy: just put a minus sign (-) before the lang:fr operator to exclude it :-)
find this words -lang:fr
If I'm not mistaken, all of Twitter's Advanced Search operators, like filter:, near:, within:, include: and many more, can be negated this way to exclude the results matching that operator.

Where is the full list of exemplar cities for English in the CLDR time zone xml?

There are a couple of exemplar cities and metazone names in core/common/main/en.xml from CLDR. However, en.xml does not include the full list the way the files for all the other languages do.
Why is this and where do I find the entire list of exemplar cities for English?
I found the answer in the CLDR bug tracker.

What would be the best search engine design for multi language search?

I have a database in which I store over 3 million documents with titles in different languages.
Each document has the following (simplified) structure:
{
  name: "The Intouchables",
  detail: {
    original_title: "Intouchables",
    spanish_title: "Intocable"
  }
}
My users search in either Spanish or English. MongoDB's text index feature lets you specify a language for each document as well as a default language. Taking this into account, how would you design a great search engine that:
Searches fast (I would like to incorporate autocompletion soon) for titles
Is accurate
User can search in English or Spanish
For the time being, I would like to stick to what MongoDB brings to the table, but I'm open to other technologies if they really change the game (Redis, Elasticsearch, etc.)
Some work I've already done:
I have indexed all my documents with default_language "none". That's inefficient because of the large number of potential stop words that get stored. If I set default_language to English or Spanish, results are not accurate because of stop-word matching (it gives irrelevant results; for example, it gives a good score to titles containing the word "The", which happens a lot).
Some ideas:
Use mongoid_search (keywords based on specified fields) and put a text index on the _keywords field.
Specify a language override for titles in Spanish. Run both an English and a Spanish search (two queries) and intersect the results (not a big fan).

Localized country names

Where can I get the country names in all languages? I need these to localize an application.
The proper location to get this information from is CLDR - Unicode Common Locale Data Repository.
There you can find an updated list of countries (core/common/main), the data is available in numerous formats.
I recommend this site: https://github.com/umpirsky/country-list
List of all countries with names and ISO 3166-1 codes in all languages and data formats.
There's probably an ISO standard document you can buy (a useful standard is ISO 3166-1, I think).
On the other hand, you might just be able to scrape the various language versions of this Wikipedia page, since it has a list of country names. I did a random check and it seemed the entire list was available in at least one non-English language, too.
I know this is an old post, but I found something that might help others who end up viewing this post via a Google search.
This alternative to a select list gives (some) localised country names.
selectToAutocomplete by Jamie Appleseed
Take a look at the data-alternate-spelling tag for the items within the select menu.
IP2Location provides a free CSV-formatted list of country names in 81 different languages. I've found this the most useful list for this purpose. The data can be fairly easily transformed into different formats if required:
https://www.ip2location.com/free/country-multilingual

User input parsing - city / state / zipcode / country

I'm looking for advice on parsing input from a user in multiple combinations of City / State / Zip Code / Country.
A common example would be what Google maps does.
Some examples of input would be:
"City, State, Country"
"City, Country"
"City, Zip Code, Country"
"City, State, Zip Code"
"Zip Code"
What would be an efficient and correct way to parse this input from a user?
If you are aware of any example implementations please share :)
The first step would be to break up the text into individual tokens using spaces or commas as the delimiting characters. For scalability, you can then hand each token to a thread or server (if using a MapReduce-like architecture) to figure out what each token is. For instance:
If we have numbers in the pattern, then it's probably a zip code.
Is the item in the list of known states?
Countries are also fairly easy to handle like states, there's a limited number.
What order are the tokens in compared to the common ways of writing an address? Most input will probably follow the local post office custom for address formats.
Once you have the individual token results, you can glue the parts back together to get a full address. In the cases where there are questions, you can prompt the user what they really meant (like Google maps) and add that information to a learned list.
The easiest way to add that support to an application, assuming you're not trying to build a map system, is to query Google or Yahoo and ask them to parse the data for you.
I am myself very fascinated with how Google handles that. I do not remember seeing anything similar anywhere else.
I believe you try to split the input string into words using various delimiters: space, comma, semicolon, etc. That gives you several combinations. For each combination, you take each word and match it against a country, city, town and postal code database. Then you define some metric to evaluate the overall match for each combination. There should also be cross rules: for example, if the postal code does not match well, but the country, city and town match well and together refer to a valid address, then the metric still yields a high mark.
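As a rough illustration of the "try several delimiters and score each combination" idea, here is a small Python sketch. Everything in it is an assumption made for the example: the tiny COUNTRIES and CITIES sets stand in for a real database, the five-digit postal code pattern is US-style only, and the metric is simply the fraction of tokens that were recognised. It leaves out the cross rules described above.

```python
import re

# Hypothetical miniature gazetteers; a real system would query a database.
COUNTRIES = {"usa", "canada", "france"}
CITIES = {"new york", "paris", "toronto"}
POSTAL_RE = re.compile(r"\d{5}")  # assumption: US-style 5-digit codes only

# Candidate delimiters to try, as regular expressions.
DELIMITER_PATTERNS = [r",", r";", r"\s+"]


def classify(token):
    """Return the kind of place component a token looks like, or None."""
    t = token.strip().casefold()
    if not t:
        return None
    if POSTAL_RE.fullmatch(t):
        return "postal_code"
    if t in COUNTRIES:
        return "country"
    if t in CITIES:
        return "city"
    return None


def score_split(tokens):
    """Toy metric: fraction of tokens that matched one of the lists."""
    if not tokens:
        return 0.0
    recognised = sum(1 for t in tokens if classify(t) is not None)
    return recognised / len(tokens)


def best_split(text):
    """Split the input with each delimiter and keep the best-scoring result."""
    candidates = []
    for pattern in DELIMITER_PATTERNS:
        tokens = [t.strip() for t in re.split(pattern, text) if t.strip()]
        candidates.append((score_split(tokens), tokens))
    return max(candidates, key=lambda candidate: candidate[0])


score, tokens = best_split("New York, 10001, USA")
print(score, [(t, classify(t)) for t in tokens])
```

For this input the comma split wins, because splitting on whitespace breaks "New York" apart and leaves commas stuck to the tokens.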
It is certainly difficult and not an evening coding exercise. It also requires serious computational resources: shared hosting would probably crack under just 10 requests, but a data center could serve it well.
Not sure if there is an example implementation. Many geographical services are offered on a paid basis. Something as sophisticated as Google Maps would likely cost a fortune.
Correct me if I'm wrong.
I found a simple PHP implementation
http://www.eotz.com/2008/07/parsing-location-string-php/
Yahoo seems to have a webservice that offers the functionality (sort of)
http://developer.yahoo.com/geo/placemaker/
OpenStreetMap seems to offer the same search functionality on its homepage
http://www.openstreetmap.org/
Assuming you're only dealing with those four fields (City Zip State Country), there are finite values for all fields except for City, and even that I guess if you have a big city list is also finite. So just split each field by comma then check against each field list.
Assuming we're talking US addresses:
Zip is most obvious, so check for that first.
State has 50x2 options (California or CA), check that next.
Country has ~190x2 options, depending on how encompassing you want to be (US, United States, USA).
Whatever is left over is probably your City.
As far as efficiency goes, it might make sense to check a handful of 'standard' formats first, like Dan suggests.
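Here is a minimal Python sketch of that elimination order (zip, then state, then country, leftovers become the city). It assumes comma-separated US-style input. The lookup tables are deliberately truncated samples and parse_location is a made-up name, so treat it as an illustration rather than a complete parser.

```python
import re

# Truncated sample tables (assumption); fill in all states and countries.
US_STATES = {"california": "CA", "new york": "NY", "texas": "TX"}
STATE_CODES = set(US_STATES.values())
COUNTRIES = {"us": "US", "usa": "US", "united states": "US"}
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")  # 5-digit zip, optional +4 suffix


def parse_location(text):
    """Classify comma-separated parts: zip, then state, then country."""
    result = {"city": None, "state": None, "zip": None, "country": None}
    leftovers = []
    for raw in text.split(","):
        token = raw.strip()
        if not token:
            continue
        key = token.casefold()
        if result["zip"] is None and ZIP_RE.fullmatch(token):
            result["zip"] = token
        elif result["state"] is None and (
            key in US_STATES or token.upper() in STATE_CODES
        ):
            # Normalise both 'California' and 'ca' to the two-letter code.
            result["state"] = US_STATES.get(key, token.upper())
        elif result["country"] is None and key in COUNTRIES:
            result["country"] = COUNTRIES[key]
        else:
            leftovers.append(token)
    # Whatever was not recognised is assumed to be the city.
    if leftovers:
        result["city"] = " ".join(leftovers)
    return result


print(parse_location("Los Angeles, CA, 90001"))
print(parse_location("Austin, Texas, USA"))
print(parse_location("90210"))
```

Because each field is filled at most once, an input that names the same place twice (a city that shares its name with a state, say) pushes the second occurrence into the leftovers, so ambiguous cases still need a prompt or a smarter rule.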
