Rails friendly_id with arabic slug - ruby-on-rails

My question is closely related to this one: "Rails friendly id with non-Latin characters". Following the answer suggested there, I implemented a slightly different solution (I know it's primitive, but I just want to make sure it works before adding more complex behavior).
In my user model I have:
extend FriendlyId
friendly_id :slug_candidates, :use => [:slugged]
def slug_candidates
[
[:first_name, :last_name],
[:first_name, :last_name, :uid]
]
end
def should_generate_new_friendly_id?
first_name_changed? || last_name_changed? || uid_changed? || super
end
def normalize_friendly_id(value)
ERB::Util.url_encode(value.to_s.gsub("\s","-"))
end
Now when I submit "مرحبا" as :first_name through the browser, the slug value is set to "%D9%85%D8%B1%D8%AD%D8%A8%D8%A7-" in the database, which is what I expect (apart from the trailing "-").
However, the URL shown in the browser looks like this: http://localhost:3000/en/users/%25D9%2585%25D8%25B1%25D8%25AD%25D8%25A8%25D8%25A7- , which is not what I want. Does anyone know where these extra %25s are coming from, and why?
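One way to see where the %25 comes from: percent-encoding a string that is already percent-encoded escapes each "%" again as "%25". The sketch below reproduces the two values from the question with ERB::Util.url_encode alone. It assumes (the question does not confirm this) that the stored slug is escaped a second time when the path is built; encode_once and encode_twice are hypothetical helper names.

```ruby
require 'erb'

# Hypothetical helper names, used only for this illustration.
def encode_once(text)
  # What normalize_friendly_id in the question stores as the slug
  ERB::Util.url_encode(text)
end

def encode_twice(text)
  # What a second escaping pass over the stored slug produces
  ERB::Util.url_encode(encode_once(text))
end

name = "\u0645\u0631\u062D\u0628\u0627" # the Arabic word from the question
puts encode_once(name)
puts encode_twice(name)
```

If that assumption holds, storing the raw (unescaped) characters in the slug and letting the URL layer do the escaping avoids the double encoding.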

Meanwhile, I made some progress, so I'm posting my solution here in case it helps someone else.
The %25s in the URL seem to be the result of URL-encoding the '%' characters in my slug. I don't know where this happens, but I modified my normalize_friendly_id function so that it doesn't affect me anymore. Here it is:
def normalize_friendly_id(value)
  sep = '-'
  # Strip out tashkeel (diacritics) etc.
  parameterized_string = value.to_s.gsub(/[\u0610-\u061A\u064B-\u065F\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06ED]/,''.freeze)
  # Turn unwanted chars into the separator
  parameterized_string.gsub!(/[^0-9A-Za-zÀ-ÖØ-öø-ÿ\u0620-\u064A\u0660-\u0669\u0671-\u06D3\u06F0-\u06F9\u0751-\u077F]+/,sep)
  unless sep.nil? || sep.empty?
    re_sep = Regexp.escape(sep)
    # No more than one of the separator in a row.
    parameterized_string.gsub!(/#{re_sep}{2,}/, sep)
    # Remove leading/trailing separator (\A and \z anchor the whole string, not each line).
    parameterized_string.gsub!(/\A#{re_sep}|#{re_sep}\z/, ''.freeze)
  end
  parameterized_string.downcase
end
Some comments on that:
I took only the Latin and Arabic alphabets into account.
Since I allow Arabic characters in the URL, I see no point in keeping the friendly_id behavior of converting e.g. "ü" to "ue", "ö" to "oe", etc., so I leave such characters in the URL.
I also tried to keep characters that may not be used in Arabic but are used in other languages written in the Arabic script, such as Farsi or Urdu. I speak only Arabic, so I guessed which characters might be regular in other languages. For example, is "ڿ" a regular character in any language? I have no idea, but I guess it could well be.
Again, since I speak Arabic, I stripped the tashkeel (diacritics) out of the text. I would say that texts without tashkeel are generally easier to read than texts with it. However, I don't know whether I should handle similar marks in other languages. Any hints are much appreciated.
Last: adding another alphabet would be as easy as adding the appropriate ranges to the regex. One only needs to know which characters should be whitelisted.
I appreciate any comments or suggestions for improvement.
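To illustrate the point about adding another alphabet, here is a minimal plain-Ruby sketch that builds the whitelist from named ranges instead of one long regex. SCRIPT_RANGES, TASHKEEL, DISALLOWED and slugify are hypothetical names, not part of friendly_id, and the Cyrillic range is only an assumed example of an added alphabet.

```ruby
# Hypothetical names for illustration; none of this is friendly_id API.
SCRIPT_RANGES = {
  digits:   "0-9",
  latin:    "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF",
  arabic:   "\u0620-\u064A\u0660-\u0669\u0671-\u06D3\u06F0-\u06F9\u0751-\u077F",
  cyrillic: "\u0400-\u04FF" # assumed example of an added alphabet
}.freeze

# Diacritics to drop before whitelisting (same ranges as the function above).
TASHKEEL   = /[\u0610-\u061A\u064B-\u065F\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06ED]/
# Any run of characters outside the whitelisted ranges.
DISALLOWED = Regexp.new("[^#{SCRIPT_RANGES.values.join}]+")

def slugify(value, sep = "-")
  edge = Regexp.escape(sep)
  value.to_s
       .gsub(TASHKEEL, "")               # strip diacritics
       .gsub(DISALLOWED, sep)            # each run of unwanted chars becomes one separator
       .gsub(/\A#{edge}|#{edge}\z/, "")  # trim leading/trailing separator
       .downcase
end

puts slugify("Привет, мир!")
puts slugify("  Jürgen Müller ")
```

Supporting a further script then means adding one entry to the hash, provided you know which code point ranges should be allowed.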

Related

Globally delimit numbers in rails app?

Is it possible to format all numbers in a rails app to be delimited?
I don't really think this is an i18n issue, as I'm fine with the default delimiter/separator characters. I'm simply trying to avoid putting number_with_delimiter(value) all over my views.
I always want numbers to be displayed as delimited. Always.
So far I've tried extending the Fixnum class with code cribbed from the number_with_delimiter method:
class Fixnum
def delimit(delimiter=",", separator=".")
begin
parts = self.to_s.split('.')
parts[0].gsub!(/(\d)(?=(\d\d\d)+(?!\d))/, "\\1#{delimiter}")
parts.join separator
rescue
self
end
end
end
> 98734578.delimit
=> "98,734,578"
> 223.delimit
=> "223"
So this is a step in the right direction- I like the dot notation as well as the slightly shorter method name. But I'd like to apply this to all instances of a Fixnum inside of a view without having to call .delimit.
Is this a bad idea? Should I be worried about implications this will have on numbers outside of the view context? Is there a better way to accomplish this goal?
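For readers on current Ruby: Fixnum was merged into Integer in Ruby 2.4 and the constant was removed in 3.2, so the extension above would be written against Integer. The following is only a sketch of that same .delimit method in plain Ruby; it does not make the formatting apply globally in views.

```ruby
# Same idea as the Fixnum extension in the question, reopened on Integer.
class Integer
  def delimit(delimiter = ",")
    # Insert the delimiter after every digit that is followed by
    # a whole number of three-digit groups.
    to_s.gsub(/(\d)(?=(\d{3})+(?!\d))/, "\\1#{delimiter}")
  end
end

puts 98734578.delimit
puts 223.delimit
```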

Zipcode , to_i and leading zero in Ruby/Rails

I am trying to save zip codes which are passed in the params as "07306", and "03452", but to_i seems to be converting these string values to 7306 and 3452 before validation, because of which the validation keeps failing.
How do I prevent Ruby from removing the leading zeros?
The zip code is an integer field in the database and the validation checks for the format of the zip using:
validates_format_of :zip, :with => /\A[+\-]?\d+\Z/, :message => "Please enter a valid US zipcode"
Conceptually, it makes no sense for an integer to have a leading 0. Either format the zip code when you use it (i.e. make sure it has the right format and add the leading 0 when converting from integer to string), or save it as a string.
Saving it as a string instead will alleviate that issue, and also would help future-proof things if you decide to support foreign ZIP codes which may or may not have letters in them.
Yeah, I agree that it doesn't make sense to store zips as integers for just this reason. I also think you need to be very sure that you won't ever need non-US postal codes in your app. With those caveats out of the way, though...
If you are unable to modify the database for some reason, you can modify the get method for the zip code, like so:
def zip
val = read_attribute(:zip).to_s
val.length < 5 ? add_leading_zeros(val) : val
end
def add_leading_zeros(val)
while val.length < 5 do
val = "0" + val.to_s
end
val
end
It's kind of hacky, and I really don't recommend doing it this way if you can modify the DB field to be a string (varchar).
You might also want to modify the validation that you're using, as it will allow zip codes of less than 5 characters through.
Maybe use something like this:
validates_format_of :zip, :with => /\A\d{5}\z/
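As a plain-Ruby sketch of the two pieces above (padding on read, strict five-digit check), String#rjust can replace the padding loop. ZIP_FORMAT and format_zip are hypothetical names chosen for this example.

```ruby
# Hypothetical names for illustration.
ZIP_FORMAT = /\A\d{5}\z/ # exactly five digits, anchored to the whole string

def format_zip(value)
  # Pad an integer-stored zip back to five characters with leading zeros
  value.to_s.rjust(5, "0")
end

puts format_zip(7306)
puts ZIP_FORMAT.match?(format_zip(7306))
```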
EDIT: I'll leave this answer here, but I just noticed that the OP already changed the field type in the DB... So, yeah, I feel a little silly for having typed all of this now.

Localizing a text field containing a number in Ruby on Rails

I am currently working on a project to internationalize one of our ruby-on-rails web applications so that it can be used in other countries (France will be the first one in this case).
A particular issue I haven't worked out yet is with the displaying of numeric fields. When displaying numbers for display purposes only, I do the following:
<%= number_to_percentage(tax.rate, :precision => 2)%>
In English, this shows 17.50, but in French it shows 17,50 (with a comma in place of the decimal point) which is as expected. The problem comes in the Edit form, when I show a text field
<%= f.text_field :rate, :size => 15 %>
When this renders a text box on the screen, the text box always shows 17.50 with a full stop rather than a comma for French. I am not sure that is correct.
When I tried doing the following:
<%= f.text_field :rate, :size => 15, :value => number_with_precision(f.object.rate, :precision => 2) %>
This did indeed show 17,50 in the text box for French, but when I click on the Update button to save the form, the Ruby validation kicks in and tells me that 17,50 is not a number (or rather it says "n'est pas un nombre"). I have to enter 17.50 to get it to save.
To be honest, I am not entirely sure on the correct thing to do here. Should all countries enter numbers with full stops in text boxes, or is there a way to get Ruby-on-Rails to display commas, and validate them appropriately?
TL;DR
This is the kind of thing I hate doing over and over again (I'm serving French users, and they're easily confused by dots as the decimal separator).
I exclusively use the delocalize gem now, which does the format translation automatically. You just install the gem and leave your forms as they are; everything should be taken care of for you.
I like to read, give me the long explanation
The basic conversion is quite simple, you have to convert back and forth between the following formats:
The backend one, which is usually English, used by your persistent storage (SQL database, NoSQL store, YAML, flat text file, whatever struck your fancy, ...).
The frontend one, which is whatever format your user prefers. I'm going to use French here to stick to the question*.
* also because I'm quite partial towards it ;-)
This means that you have two points where you need to do a conversion:
Outbound: when outputting your HTML, you will need to convert from English to French.
Inbound: When processing the result of the form POST, you will need to convert back from French to English.
The manual way
Let's say I have the following model, with the rate field as a decimal number with a precision of 2 (eg. 19.60):
class Tax < ActiveRecord::Base
  # rate is a decimal database column; no attr_accessor is needed here
  # (declaring one would shadow the column's own reader and writer)
end
The outbound conversion step (English => French) can be done by overriding text_field_tag:
ActionView::Helpers::FormTagHelper.class_eval do
include ActionView::Helpers::NumberHelper
alias original_text_field_tag text_field_tag
def text_field_tag(name, value = nil, options = {})
value = options.delete(:value) if options.key?(:value)
if value.is_a?(Numeric)
value = number_with_delimiter(value) # this method uses the current locale to format our value
end
original_text_field_tag(name, value, options)
end
end
The inbound conversion step (French => English) will be handled in the model. We will override the rate attribute writer to replace every French separator with the English one:
class Tax < ActiveRecord::Base
  def rate=(rate)
    write_attribute(:rate, rate.gsub(I18n.t('number.format.separator'), '.'))
  end
end
This looks nice because there's only one attribute in the example and one type of data to parse, but imagine having to do this for every number, date or time field in your model. That's not my idea of fun.
This is also a naïve* way of doing the conversions; it does not handle:
Dates
Times
Delimiters (eg. 100,000.84)
* did I already tell you I like French?
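The two conversion steps can be sketched in plain Ruby, including the delimiter case that the naive version skips. NUMBER_FORMATS is a hypothetical stand-in for the values I18n.t('number.format.separator') and I18n.t('number.format.delimiter') would return, and the method names are made up for this example.

```ruby
# Hypothetical format table standing in for the locale files.
NUMBER_FORMATS = {
  en: { separator: ".", delimiter: "," },
  fr: { separator: ",", delimiter: " " }
}.freeze

# Inbound: user input in the locale's format -> backend (English) format.
def delocalize_number(input, locale)
  fmt = NUMBER_FORMATS.fetch(locale)
  # Drop thousands delimiters first, then swap the decimal separator.
  input.to_s.gsub(fmt[:delimiter], "").sub(fmt[:separator], ".")
end

# Outbound: backend number -> string with the locale's decimal separator.
def localize_number(value, locale, precision = 2)
  fmt = NUMBER_FORMATS.fetch(locale)
  format("%.#{precision}f", value).sub(".", fmt[:separator])
end

puts delocalize_number("100 000,84", :fr)
puts localize_number(17.5, :fr)
```

Dates and times are left out on purpose; they need real parsing against the locale's formats.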
Enter delocalize
Delocalize works on the same principle I outlined above, but does the job much more comprehensively:
It handles Date, Time, DateTime and numbers.
You don't have to do anything, just install the gem. It checks your ActiveRecord columns to determine if it's a type that needs conversion and does it automatically.
The number conversions are pretty straightforward, but the date ones are really interesting. When parsing the result of a form, it will try the date formats defined in your locale file in descending order and should be able to understand a date formatted like this: 15 janvier 2012.
Every outbound conversion will be done automatically.
It's tested.
It's active.
One caveat though: it doesn't handle client-side validations. If you're using them, you will have to figure out how to use i18n in your favourite JavaScript framework.
This is the gsub technique :
In your model :
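before_validation :prepare_number
def prepare_number
  # Use the raw form input; the typecast attribute is no longer a String
  raw = rate_before_type_cast
  # gsub returns a new string, so the result has to be assigned back
  self.rate = raw.gsub(/,/, '.') if raw.is_a?(String) && raw.match(/,\S/)
end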

validation of special characters

I want to validate login name with special characters !##S%^*()+_-?/<>:"';. space using regular expression in ruby on rails. These special characters should not be acceptable. What is the code for that?
validates_format_of :username, :with => /\A[A-Za-z0-9.&]*\z/
will work.
You've received regexps in this thread that answer your specific question. You're doing a black-list approach (blocking the characters you don't want) but is this really what's best? I see you didn't cover & or ~, and there are many other special characters that are likely still missing.
If you're trying to block special characters, I'd suggest a white-list approach, as per pablorc's regexp suggestion. It is far more robust because it lists only what you want to allow, i.e. the non-special characters: letters, digits and underscores.
I've gone ahead and created a method for you that does a white-list approach using this regexp.
def valid_login?(str)
  # \A and \z anchor the whole string; ^ and $ would only anchor a single line
  /\A\w*\z/.match?(str)
end
This method, valid_login?, returns true only if the string contains letters, numbers, or underscore, so all of your special characters (plus any other you've left out that do not meet these requirements), are safely detected.
Usage:
> valid_login?("testy")
true
> valid_login?("a b")
false
> valid_login?("a'")
false
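One detail worth checking with this kind of whitelist is the anchoring. In Ruby, ^ and $ match at line boundaries, so a pattern anchored with them accepts input whose first line is clean but which continues after a newline. A short sketch of the difference follows; the constant and method names are hypothetical.

```ruby
# Hypothetical names for illustration.
LINE_ANCHORED   = /^\w*$/   # anchors each line
STRING_ANCHORED = /\A\w*\z/ # anchors the whole string

def valid_login_strict?(str)
  STRING_ANCHORED.match?(str)
end

tricky = "admin\n<script>"
puts LINE_ANCHORED.match?(tricky)   # true: the first line alone satisfies the pattern
puts valid_login_strict?(tricky)    # false: the whole string must be word characters
```

Note that \w* also accepts an empty string, so a presence or length check is still needed alongside it.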
Well, I don't know Rails, but this is what the regex would look like in every other language I know (note the + so that it matches more than a single character):
^[^!##\$%\^\*\(\)\+_\-\?/\<\>:"';\. ]+$
The regex /^\w*$/ allows only letters, digits, and underscores.
Also, you have a cheatsheet and a live ruby regexp editor on http://rubular.com
First off, I would recommend using a gem for login, like authlogic.
The gem can be configured to validate the email address. You also get the benefit of not having to worry about authenticating your users, etc.
Very easy gem to work with too.
validates_format_of :username, :with => /\A[^!##S%\^\*()\+_\-\?\/<>:\"';]+\z/

Best way to encode URLs?

I am currently developing a CMS and want to encode special chars in the URL in a nice way.
I don't want to use Rack::Utils.escape.
Is there already a cool gem available?
Best regards
Look at the stringex gem; it can be used even without Rails, but it includes some extras that make it easier to use with Rails.
Ruby's CGI library should do what you need:
require 'cgi'
url_encoded_string = CGI::escape("'Stop!' said Fred")
# => "%27Stop%21%27+said+Fred"
See http://ruby-doc.org/core/classes/CGI.html
Well, I normally use a handy custom-made method called String.to_slug. I hope you find it useful.
Save this as lib/to_slug.rb and require it in an initializer, or require it only in the model that generates the URLs.
String.class_eval do
#converts accented letters into ascii equivalents (eg. ñ becomes n)
def normalize
#this version is in the forums but didn't work for me
#chars.normalize(:kd).gsub!(/[^\x00-\x7F]/n,'').to_s
mb_chars.normalize(:d).gsub(/[^\x00-\x7F]/n,'').to_s
end
#returns an array of strings containing the words on a string
def words
gsub(/\W/, ' ').split
end
#convert into a nice url-ish string
def to_slug(separator='-')
strip.downcase.normalize.words.join(separator)
end
end
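For use outside Rails, or where mb_chars.normalize is not available, the same steps can be sketched with Ruby's built-in String#unicode_normalize (Ruby 2.2 and later). This is a standalone function rather than a String extension; the name to_slug is reused here only for illustration.

```ruby
# Plain-Ruby sketch of the same steps: decompose accented letters,
# drop the non-ASCII combining marks, then join the words.
def to_slug(str, separator = "-")
  str.strip.downcase
     .unicode_normalize(:nfd)    # e.g. "ñ" becomes "n" plus a combining tilde
     .gsub(/[^\x00-\x7F]/, "")   # remove everything outside ASCII
     .gsub(/\W/, " ")            # punctuation becomes whitespace
     .split
     .join(separator)
end

puts to_slug("  El Niño está aquí ")
```

Like the original, this drops characters that have no ASCII decomposition, so it is not suitable for non-Latin scripts such as Arabic.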

Resources