Completely random identifier of a given length - ruby-on-rails

I would like to generate a completely random "unique" (I will ensure that using my model) identifier of a given length (the length may vary) containing numbers, letters and special characters.
For example:
161551960578281|2.AQAIPhEcKsDLOVJZ.3600.1310065200.0-514191032|
Can someone please suggest the most efficient way to do that in Ruby on Rails?
EDIT: IMPORTANT:
If it is possible please comment on how efficient your proposed solution is because this will be used every time a user enters a website!
Thanks

Using this for an access token is a different story than UUIDs. You need not only pseudo-randomness; the generator must also be a cryptographically secure PRNG. If you don't really care what characters you use (they don't add anything to the security), you could use something like the following, which produces a URL-safe Base64-encoded access token. URL-safeness becomes important if you append the token to URLs, similar to what some Java web apps do: "http://www.bla.com/jsessionid=". If you used raw Base64 strings for that purpose, you could produce invalid URLs.
require 'securerandom'
def produce_token(length=32)
token = SecureRandom.urlsafe_base64(length)
end
The probability that two given tokens collide is 2^(-8 * length), because length counts random bytes, not bits. Since the output is Base64-encoded, the token itself will be about 4/3 * length characters long. If installed, this is based on the native OpenSSL PRNG implementation, so it should be pretty efficient in terms of performance. Should the OpenSSL extension not be installed, /dev/urandom will be used if available, and finally, if you are on a Windows machine, CryptGenRandom would be used as a fallback. Each of these options should be sufficiently performant. E.g., on my laptop, running produce_token a million times finishes in ~6s.
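As a rough check of the size claim above, here is a minimal sketch using only the Ruby standard library (the variable names are illustrative). It generates a token from 32 random bytes and checks that every character is URL-safe. SecureRandom.urlsafe_base64 omits the trailing = padding by default, so 32 bytes come out as 43 characters rather than 44.

```ruby
require 'securerandom'

# Same helper as in the answer; length is the number of random bytes.
def produce_token(length = 32)
  SecureRandom.urlsafe_base64(length)
end

token = produce_token(32)

# URL-safe Base64 only uses letters, digits, '-' and '_'.
url_safe = token.match?(/\A[A-Za-z0-9_-]+\z/)

puts "#{token.length} characters, URL-safe: #{url_safe}"
```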

The best solution is:
require 'active_support/secure_random'
ActiveSupport::SecureRandom.hex(16) # => "00c62d9820d16b52740ca6e15d142854"
This will generate a cryptographically secure random string (i.e. completely unpredictable)
Similarly, you could use a library to generate UUIDs as suggested by others. In that case, be sure to use the random version (version 4) and make sure the implementation uses a cryptosecure random generator.
As with anything related to security, rolling your own is not the best idea (even though I succumbed to it too, see first versions! :-). If you really want a homemade random string, here's a rewrite of tybro0103's approach:
require 'digest/sha1'
ALPHABET = "|,.!-0123456789".split(//) + ('a'..'z').to_a + ('A'..'Z').to_a
def random_string
not_quite_secure = Array.new(32){ ALPHABET.sample }.join
secure = Digest::SHA1.hexdigest(not_quite_secure)
end
random_string # => "2555265b2ff3ecb0a13d65a3d177b326733bc143"
Note that it hashes the random string, otherwise it could be subject to attack.
Performance should be similar.
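Array#sample draws from Ruby's default, non-cryptographic generator, which is why the version above needs the extra hashing step. As a hedged alternative, the sketch below picks every character with SecureRandom.random_number instead, so the custom alphabet is kept and no hashing is needed. The names TOKEN_ALPHABET and secure_random_string are made up for this example.

```ruby
require 'securerandom'

# Hypothetical alphabet for this sketch: the same characters as in the answer.
TOKEN_ALPHABET = ("|,.!-0123456789".chars + ('a'..'z').to_a + ('A'..'Z').to_a).freeze

# Builds a string of the given length, choosing each character with the
# cryptographically secure generator instead of Array#sample.
def secure_random_string(length = 32)
  Array.new(length) do
    TOKEN_ALPHABET[SecureRandom.random_number(TOKEN_ALPHABET.size)]
  end.join
end

puts secure_random_string(42)
```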

Universally Unique Identifiers (UUIDs) are tricky to generate yourself ;-) If you want something really reliable, use the uuid4r gem and call it with UUID4R::uuid(1). This will spit out a UUID based on the time and a hardware id (the computer's MAC address). So it's unique even across multiple machines when generated at the exact same time.
A requirement for uuid4r is the ossp-uuid C library, which you can install with the package manager of your choice (apt-get install libossp-uuid libossp-uuid-dev on Debian or brew install ossp-uuid on a Mac with Homebrew, for example) or by manually downloading and compiling it, of course.
The advantage of using uuid4r over a manual (simpler?) implementation is that a) it is truly unique and not just "some sort of pseudo random number generator kind of sometimes reliable" and b) it's fast (even with higher UUID versions) because it uses a native extension to the C library.
require 'rubygems'
require 'uuid4r'
UUID4R::uuid(1) #=> "67074ea4-a8c3-11e0-8a8c-2b12e1ad57c3"
UUID4R::uuid(1) #=> "68ad5668-a8c3-11e0-b5b7-370d85fa740d"
update:
regarding speed, see my (totally not scientific!) little benchmark over 50k iterations
               user     system      total        real
version 1  0.600000   1.370000   1.970000 (  1.980516)
version 4  0.500000   1.360000   1.860000 (  1.855086)
so on my machine, generating a uuid takes ~0.04 milliseconds (roughly 2 seconds for all 50000 iterations). Hope that's fast enough for you.
(following the "benchmark")
require 'rubygems'
require 'uuid4r'
require 'benchmark'
n = 50000
Benchmark.bm do |bm|
bm.report("version 1") { n.times { UUID4R::uuid(1) } }
bm.report("version 4") { n.times { UUID4R::uuid(4) } }
end
Update on heroku: the gem is available on heroku as well
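If installing a native library is not an option, Ruby's standard library can produce random (version 4) UUIDs through SecureRandom.uuid. The sketch below is only an illustration of that alternative, not part of the uuid4r answer. It checks the version digit and variant bits that mark a version 4 UUID.

```ruby
require 'securerandom'

# Version 4 UUIDs carry a literal 4 as the version digit and one of
# 8, 9, a, b as the first digit of the fourth group (the variant bits).
UUID_V4_PATTERN = /\A\h{8}-\h{4}-4\h{3}-[89ab]\h{3}-\h{12}\z/

uuid = SecureRandom.uuid
puts uuid
puts "looks like a version 4 UUID: #{uuid.match?(UUID_V4_PATTERN)}"
```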

def random_string(length=32)
chars = (0..9).to_a.concat(('a'..'z').to_a).concat(('A'..'Z').to_a).concat(['|',',','.','!','-'])
str = ""; length.times {str += chars.sample.to_s}
str
end
The Result:
>> random_string(42)
=> "a!,FEv,g3HptLCImw0oHnHNNj1drzMFM,1tptMS|rO"

It is a bit trickier to generate random letters in Ruby 1.9 vs 1.8 due to the change in behavior of characters. The easiest way to do this in 1.9 is to generate an array of the characters you want to use, then randomly grab characters out of that array.
See http://snippets.dzone.com/posts/show/491

You can check implementations here I used this one

I used the current time in milliseconds to generate a unique, random-looking identifier.
Time.now.to_f # => 1656041985.488494
Time.now.to_f.to_s.gsub('.', '') # => "16560419854884948"
This gives a 17-digit number.
Sometimes it gives only 16 digits, because a trailing 0 after the decimal point is dropped by to_f.
So I used ljust(17, '0') to pad it.
Example:
Time.now.to_f.to_s.gsub('.', '').ljust(17, '0') # => "1656041985488490"
Then I used to_s(36) to convert it into a short alphanumeric string.
Time.now.to_f.to_s.gsub('.', '').ljust(17, '0').to_i.to_s(36) # => "4j26hz9640k"
The argument of to_s(36) is the radix (base 36).
https://apidock.com/ruby/v2_5_5/Integer/to_s
If you want to limit the length, you can select the first few digits of the time in milliseconds (String#first comes from ActiveSupport):
Time.now.to_f.to_s.gsub('.', '').ljust(17, '0').first(12).to_i.to_s(36) # => "242sii2l"
But if you want uniqueness down to the millisecond, I would suggest keeping at least the first(15) digits of the time.
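The string manipulation above depends on how a Float happens to be printed. As a hedged variation, the sketch below reads the clock as an integer number of microseconds and appends a few random hex digits, because two calls in the same microsecond (or in two processes) would otherwise produce the same value. The method name time_based_id and the suffix length are assumptions made for this example.

```ruby
require 'securerandom'

# Hypothetical helper: base-36 timestamp plus a short random suffix.
def time_based_id(random_bytes = 4)
  # Integer microseconds since the Unix epoch, so no Float formatting is involved.
  micros = Process.clock_gettime(Process::CLOCK_REALTIME, :microsecond)
  # SecureRandom.hex(n) returns 2 * n lowercase hex digits.
  micros.to_s(36) + SecureRandom.hex(random_bytes)
end

puts time_based_id
```

The timestamp prefix keeps ids roughly sortable by creation time. The random suffix only lowers the chance of a clash, so a uniqueness check in the model is still needed.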

Related

Ruby: how to generate unique alphabetic string in ruby

Is there any built-in method in Ruby which will generate a unique alphabetic string every time (it should contain only letters, no numbers)?
I have tried SecureRandom, but it doesn't provide a method that returns a string containing only letters.
SecureRandom has a method choose which:
[...] generates a string that randomly draws from a source array of characters.
Unfortunately it's private, but you can call it via send:
SecureRandom.send(:choose, [*'a'..'z'], 8)
#=> "nupvjhjw"
You could also monkey-patch Random::Formatter:
module Random::Formatter
def alphabetic(n = 16)
choose([*'a'..'z'], n)
end
end
SecureRandom.alphabetic
#=> "qfwkgsnzfmyogyya"
Note that the result is totally random and therefore not necessarily unique.
UUID are designed to have extremely low chance of collision. Since UUID only uses 17 characters, it's easy to change the non-alphabetic characters into unused alphabetic slots.
SecureRandom.uuid.gsub(/[\d-]/) { |x| (x.ord + 58).chr }
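To make the character shift above easier to follow, here is a small sketch of the same substitution wrapped in two helpers (the names letters_only and alphabetic_uuid are made up for this example). Adding 58 to the character code moves the digits '0'..'9' to 'j'..'s' and the dash to 'g', none of which clash with the hex letters 'a'..'f'.

```ruby
require 'securerandom'

# Shifts digits and dashes into unused letters:
# '0'..'9' (codes 48..57) become 'j'..'s', and '-' (code 45) becomes 'g'.
def letters_only(text)
  text.gsub(/[\d-]/) { |char| (char.ord + 58).chr }
end

def alphabetic_uuid
  letters_only(SecureRandom.uuid)
end

puts letters_only("0-9a") # => "jgsa"
puts alphabetic_uuid
```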
Is there any built-in method in Ruby which will generate a unique alphabetic string every time (it should contain only letters, no numbers)?
This is not possible. The only way that a string can be unique if you are generating an unlimited number of strings is if the string is infinitely long.
So, it is impossible to generate a string that will be unique every time.
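In practice, uniqueness therefore has to be enforced by remembering what has already been issued and retrying on a clash. The sketch below shows the idea with an in-memory Set. In a Rails app a unique database index would play that role instead. All names here (LETTERS, random_letters, unique_letters) are hypothetical.

```ruby
require 'securerandom'
require 'set'

LETTERS = [*'a'..'z', *'A'..'Z'].freeze

# A random string of letters only; random, but not guaranteed unique.
def random_letters(length)
  Array.new(length) { LETTERS[SecureRandom.random_number(LETTERS.size)] }.join
end

# Retries until the candidate is not in `taken`, then records it.
# Set#add? returns nil when the element is already present.
def unique_letters(length, taken)
  loop do
    candidate = random_letters(length)
    return candidate if taken.add?(candidate)
  end
end

issued = Set.new
puts unique_letters(8, issued)
puts unique_letters(8, issued)
```

The retry loop only terminates while unused strings of that length remain, which is the practical form of the impossibility argument above.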
def get_random_string(length=2)
source=("a".."z").to_a + ("A".."Z").to_a
key=""
length.times{ key += source[rand(source.size)].to_s }
key
end
How about something like this, if you like some monkey-patching? I have set the default length to 2 here; feel free to change it to suit your needs.
get_random_string(7)
I used the time in milliseconds and converted it to base 36, which gives me a unique alphanumeric value; since it depends on the time, it is very unlikely to repeat.
Example: Time.now.to_f.to_s.gsub('.', '').ljust(17, '0').to_i.to_s(36) # => "4j26m4zm2ss"
Take a look at this for full answer: https://stackoverflow.com/a/72738840/7365329
Try this one
length = 50
Array.new(length) { [*"A".."Z", *"a".."z"].sample }.join
# => bDKvNSySuKomcaDiAlTeOzwLyqagvtjeUkDBKPnUpYEpZUnMGF

Strip out thousands delineator specific to the locale

I have a rails app in which users input numbers in large quantities. They often use the thousands delimiter (e.g. 1,000,000,000) to help keep their large numbers human-readable (I don't want to disallow delimiter because doing so would increase the chance of incorrect data).
ActiveSupport/Rails has the handy method number_with_delimiter so that an int 1234567 is displayed as 1,234,567. Is there a method to do the reverse?
note: I don't want to simply strip out a comma, since commas are used as a decimal point in many locales (e.g. European)
To answer your general question, you can determine the "delimiter" (thousands-separator) and the "separator" (decimal separator) from the Rails localization system directly:
I18n.t('number.format.separator') # <= '.' on a US English system
I18n.t('number.format.delimiter') # <= ',' on a US English system
So you can do this:
better = input_string.gsub(I18n.t('number.format.delimiter'), '')
Or, if you prefer to be more aggressive and remove all non-numerical and non-decimal input:
better = input_string.gsub(/[^\d#{I18n.t('number.format.separator')}]/, '')
Note, though, that the second example will also remove negative signs, if that matters to you.
It is also worth noting that ActiveRecord will do this for you:
my_model.update_attributes(some_float: "1,234.50") # <= sets some_float to 1234.5
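The delimiter-stripping idea can also be sketched in plain Ruby, with the two symbols passed in as parameters. In a Rails app they could come from the I18n.t lookups shown above. The method name parse_localized_number is an assumption made for this example. Unlike the aggressive regex, it keeps a leading minus sign and raises on input that is not a number.

```ruby
# Hypothetical helper: strips the thousands delimiter, normalizes the decimal
# separator to '.', and converts the result with Float().
def parse_localized_number(input, delimiter: ',', separator: '.')
  # gsub and sub with String patterns match literally, so '.' is not a wildcard here.
  normalized = input.strip.gsub(delimiter, '').sub(separator, '.')
  Float(normalized) # raises ArgumentError for anything that is not a number
end

puts parse_localized_number("1,234,567.50")
puts parse_localized_number("1.234.567,50", delimiter: '.', separator: ',')
puts parse_localized_number("-1,000")
```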

Compressing a hex string in Ruby/Rails

I'm using MongoDB as a backend for a Rails app I'm building. Mongo, by default, generates 24-character hexadecimal ids for its records to make sharding easier, so my URLs wind up looking like:
example.com/companies/4b3fc1400de0690bf2000001/employees/4b3ea6e30de0691552000001
Which is not very pretty. I'd like to stick to the Rails url conventions, but also leave these ids as they are in the database. I think a happy compromise would be to compress these hex ids to shorter collections using more characters, so they'd look something like:
example.com/companies/3ewqkvr5nj/employees/9srbsjlb2r
Then in my controller I'd reverse the compression, get the original hex id and use that to look up the record.
My question is, what's the best way to convert these ids back and forth? I'd of course want them to be as short as possible, but also url-safe and simple to convert.
Thanks!
You could represent a hexadecimal id in a base higher than 16 to make its string representation shorter. Ruby has built-in support for working with bases from 2 up to 36.
b36 = '4b3fc1400de0690bf2000001'.hex.to_s(36)
# => "29a6dblglcujcoeboqp"
To convert it back to a 24-character string you could do something like this:
'%024x' % b36.to_i(36)
# => "4b3fc1400de0690bf2000001"
To achieve better "compression" you could represent the id in base higher than 36. There are Ruby libraries that will help you with that. all-your-base gem is one such library.
I recommend base 62 representation as it only uses 0-9, a-z and A-Z characters which means it is URL safe by default.
Even with base 62 representation you end up with still unwieldy 16-character ids:
'4b3fc1400de0690bf2000001'.hex.to_base_62
# => "UHpdfMzq7jKLcvyr"
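If adding a gem is not desired, a base 62 round trip can be sketched with plain integer arithmetic. This is only an illustration: the digit order below (0-9, a-z, A-Z) is an assumption and need not match what the all-your-base gem produces, and the helper names are made up.

```ruby
# Assumed digit order for this sketch; other libraries may order them differently.
BASE62_DIGITS = [*'0'..'9', *'a'..'z', *'A'..'Z'].freeze

# Converts a hexadecimal string to base 62 by repeated division.
def hex_to_base62(hex)
  number = hex.hex
  return BASE62_DIGITS[0] if number.zero?

  digits = []
  while number > 0
    number, remainder = number.divmod(62)
    digits << BASE62_DIGITS[remainder]
  end
  digits.reverse.join
end

# Converts back and left-pads with zeros to the original width (24 for a Mongo id).
def base62_to_hex(text, width = 24)
  number = text.each_char.reduce(0) { |acc, char| acc * 62 + BASE62_DIGITS.index(char) }
  format("%0#{width}x", number)
end

short = hex_to_base62('4b3fc1400de0690bf2000001')
puts short
puts base62_to_hex(short)
```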
Sidestepping Rails convention a bit, another compromise is to use as the "URL id" the base 32 representation of the created_at date of the object.
aCompany.created_at
# => Sat Aug 13 20:05:35 -0500 2011
aCompany.created_at.to_i.to_s(32)
# => "174e7qv"
This way you get super short ids (7 characters) without having to keep track of a special purpose attribute (in MongoMapper, it's a simple matter of adding timestamps! in the model to get automatic created_at and updated_at attributes).
You can use base64 to make it shorter. Make sure that you are using '-' and '_' instead of '+' and '/'. You can also chop off the padding (=).
Code to convert from a hex value to base 64
def md5_hex_to_base64(str)
  # Pack the hex digits into raw bytes (H* = hex string, high nibble first):
  # a 32-character MD5 hex digest becomes 16 bytes
  raw = [str].pack("H*")
  # Base64-encode the bytes (m = base64 encoded output)
  [raw].pack("m")
end
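For the Mongo ids from the question, the URL-safe variant and the way back could look like the sketch below. It follows the advice above ('-' and '_' instead of '+' and '/', padding removed). The helper names are hypothetical. A 24-character hex id is 12 bytes, which encodes to exactly 16 characters.

```ruby
# Hex string -> raw bytes -> Base64 without line feeds -> URL-safe alphabet.
def hex_to_urlsafe_base64(hex)
  raw = [hex].pack('H*')
  [raw].pack('m0').tr('+/', '-_').delete('=')
end

# Reverses the steps: restore the standard alphabet, decode, then hex-encode.
def urlsafe_base64_to_hex(text)
  raw = text.tr('-_', '+/').unpack('m').first
  raw.unpack('H*').first
end

short = hex_to_urlsafe_base64('4b3fc1400de0690bf2000001')
puts short
puts urlsafe_base64_to_hex(short)
```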

How to make a Ruby string safe for a filesystem?

I have user entries as filenames. Of course this is not a good idea, so I want to drop everything except [a-z], [A-Z], [0-9], _ and -.
For instance:
my§document$is°° very&interesting___thisIs%nice445.doc.pdf
should become
my_document_is_____very_interesting___thisIs_nice445_doc.pdf
and then ideally
my_document_is_very_interesting_thisIs_nice445_doc.pdf
Is there a nice and elegant way for doing this?
I'd like to suggest a solution that differs from the old one. Note that the old one uses the deprecated returning. By the way, it's anyway specific to Rails, and you didn't explicitly mention Rails in your question (only as a tag). Also, the existing solution fails to encode .doc.pdf into _doc.pdf, as you requested. And, of course, it doesn't collapse the underscores into one.
Here's my solution:
def sanitize_filename(filename)
# Split the name when finding a period which is preceded by some
# character, and is followed by some character other than a period,
# if there is no following period that is followed by something
# other than a period (yeah, confusing, I know)
fn = filename.split /(?<=.)\.(?=[^.])(?!.*\.[^.])/m
# We now have one or two parts (depending on whether we could find
# a suitable period). For each of these parts, replace any unwanted
# sequence of characters with an underscore
fn.map! { |s| s.gsub /[^a-z0-9\-]+/i, '_' }
# Finally, join the parts with a period and return the result
return fn.join '.'
end
You haven't specified all the details about the conversion. Thus, I'm making the following assumptions:
There should be at most one filename extension, which means that there should be at most one period in the filename
Trailing periods do not mark the start of an extension
Leading periods do not mark the start of an extension
Any sequence of characters beyond A–Z, a–z, 0–9 and - should be collapsed into a single _ (i.e. underscore is itself regarded as a disallowed character, and the string '$%__°#' would become '_' – rather than '___' from the parts '$%', '__' and '°#')
The complicated part of this is where I split the filename into the main part and extension. With the help of a regular expression, I'm searching for the last period, which is followed by something else than a period, so that there are no following periods matching the same criteria in the string. It must, however, be preceded by some character to make sure it's not the first character in the string.
My results from testing the function:
1.9.3p125 :006 > sanitize_filename 'my§document$is°° very&interesting___thisIs%nice445.doc.pdf'
=> "my_document_is_very_interesting_thisIs_nice445_doc.pdf"
which I think is what you requested. I hope this is nice and elegant enough.
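To see how the assumptions listed above play out on edge cases, the sketch below restates the same function compactly and runs it on a few illustrative inputs: several periods, a leading period, a trailing period, and a doubled period.

```ruby
# Compact restatement of the function above: split at the last "real"
# extension period, then collapse each run of disallowed characters into '_'.
def sanitize_filename(filename)
  parts = filename.split(/(?<=.)\.(?=[^.])(?!.*\.[^.])/m)
  parts.map { |part| part.gsub(/[^a-z0-9\-]+/i, '_') }.join('.')
end

puts sanitize_filename('a.b.c')   # => "a_b.c"   (only the last period survives)
puts sanitize_filename('.bashrc') # => "_bashrc" (a leading period is no extension)
puts sanitize_filename('report.') # => "report_" (a trailing period is no extension)
puts sanitize_filename('a..b')    # => "a_.b"
```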
From http://web.archive.org/web/20110529023841/http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:
def sanitize_filename(filename)
returning filename.strip do |name|
# NOTE: File.basename doesn't work right with Windows paths on Unix
# get only the filename, not the whole path
name.gsub!(/^.*(\\|\/)/, '')
# Strip out the non-ascii character
name.gsub!(/[^0-9A-Za-z.\-]/, '_')
end
end
In Rails you might also be able to use ActiveStorage::Filename#sanitized:
ActiveStorage::Filename.new("foo:bar.jpg").sanitized # => "foo-bar.jpg"
ActiveStorage::Filename.new("foo/bar.jpg").sanitized # => "foo-bar.jpg"
If you use Rails you can also use String#parameterize. This is not particularly intended for that, but you will obtain a satisfying result.
"my§document$is°° very&interesting___thisIs%nice445.doc.pdf".parameterize
For Rails I found myself wanting to keep any file extensions but using parameterize for the remainder of the characters:
filename = "my§doc$is°° very&itng___thsIs%nie445.doc.pdf"
cleaned = filename.split(".").map(&:parameterize).join(".")
Implementation details and ideas see source: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/inflector/transliterate.rb
def parameterize(string, separator: "-", preserve_case: false)
# Turn unwanted chars into the separator.
parameterized_string.gsub!(/[^a-z0-9\-_]+/i, separator)
#... some more stuff
end
If your goal is just to generate a filename that is "safe" to use on all operating systems (and not to remove any and all non-ASCII characters), then I would recommend the zaru gem. It doesn't do everything the original question specifies, but the filename produced should be safe to use (and still keep any filename-safe unicode characters untouched):
Zaru.sanitize! " what\ēver//wëird:user:înput:"
# => "whatēverwëirduserînput"
Zaru.sanitize! "my§docu*ment$is°° very&interes:ting___thisIs%nice445.doc.pdf"
# => "my§document$is°° very&interesting___thisIs%nice445.doc.pdf"
There is a library that may be helpful, especially if you're interested in replacing weird Unicode characters with ASCII: unidecode.
irb(main):001:0> require 'unidecoder'
=> true
irb(main):004:0> "Grzegżółka".to_ascii
=> "Grzegzolka"

How do I replace accented Latin characters in Ruby?

I have an ActiveRecord model, Foo, which has a name field. I'd like users to be able to search by name, but I'd like the search to ignore case and any accents. Thus, I'm also storing a canonical_name field against which to search:
class Foo
validates_presence_of :name
before_validation :set_canonical_name
private
def set_canonical_name
self.canonical_name ||= canonicalize(self.name) if self.name
end
def canonicalize(x)
x.downcase. # something here
end
end
I need to fill in the "something here" to replace the accented characters. Is there anything better than
x.downcase.gsub(/[àáâãäå]/,'a').gsub(/æ/,'ae').gsub(/ç/, 'c').gsub(/[èéêë]/,'e')....
And, for that matter, since I'm not on Ruby 1.9, I can't put those Unicode literals in my code. The actual regular expressions will look much uglier.
ActiveSupport::Inflector.transliterate (requires Rails 2.2.1+ and Ruby 1.9 or 1.8.7)
example:
>> ActiveSupport::Inflector.transliterate("àáâãäå").to_s
=> "aaaaaa"
Rails has already a builtin for normalizing, you just have to use this to normalize your string to form KD and then remove the other chars (i.e. accent marks) like this:
>> "àáâãäå".mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
=> "aaaaaa"
Better yet is to use I18n:
1.9.3-p392 :001 > require "i18n"
=> false
1.9.3-p392 :002 > I18n.transliterate("Olá Mundo!")
=> "Ola Mundo!"
I have tried a lot of these approaches, but none of them met all of these requirements:
Respect spaces
Respect the 'ñ' character
Respect case (I know it is not a requirement of the original question, but it is not difficult to convert a string to lowercase)
The solution has been this:
# coding: utf-8
string.tr(
"ÀÁÂÃÄÅàáâãäåĀāĂ㥹ÇçĆćĈĉĊċČčÐðĎďĐđÈÉÊËèéêëĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħÌÍÎÏìíîïĨĩĪīĬĭĮįİıĴĵĶķĸĹĺĻļĽľĿŀŁłÑñŃńŅņŇňʼnŊŋÒÓÔÕÖØòóôõöøŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšſŢţŤťŦŧÙÚÛÜùúûüŨũŪūŬŭŮůŰűŲųŴŵÝýÿŶŷŸŹźŻżŽž",
"AAAAAAaaaaaaAaAaAaCcCcCcCcCcDdDdDdEEEEeeeeEeEeEeEeEeGgGgGgGgHhHhIIIIiiiiIiIiIiIiIiJjKkkLlLlLlLlLlNnNnNnNnnNnOOOOOOooooooOoOoOoRrRrRrSsSsSsSssTtTtTtUUUUuuuuUuUuUuUuUuUuWwYyyYyYZzZzZz"
)
– http://blog.slashpoundbang.com/post/12938588984/remove-all-accents-and-diacritics-from-string-in-ruby
You have to modify the character list a little to respect the 'ñ' character, but that is an easy job.
My answer: the String#parameterize method:
"Le cœur de la crémiére".parameterize
=> "le-coeur-de-la-cremiere"
For non-Rails programs:
Install activesupport: gem install activesupport then:
require 'active_support/inflector'
"a&]'s--3\014\xC2àáâã3D".parameterize
# => "a-s-3-3d"
Decompose the string and remove non-spacing marks from it.
irb -ractive_support/all
> "àáâãäå".mb_chars.normalize(:kd).gsub(/\p{Mn}/, '')
aaaaaa
You may also need this if used in a .rb file.
# coding: utf-8
The normalize(:kd) part here splits out diacriticals where possible (e.g. the single "n with tilde" character is split into an n followed by a combining tilde character), and the gsub part then removes all the diacritical characters.
I think that maybe you don't really want to go down that path. If you are developing for a market that has these kinds of letters, your users will probably think you are a sort of ...pip.
Because 'å' isn't even close to 'a' in any meaning to a user.
Take a different road and read up about searching in a non-ASCII way. This is exactly the kind of case for which Unicode and collation were invented.
A very late PS:
http://www.w3.org/International/wiki/Case_folding
http://www.w3.org/TR/charmod-norm/#sec-WhyNormalization
Besides that, I have no idea why the link to collation goes to an MSDN page, but I'll leave it there. It should have been http://www.unicode.org/reports/tr10/
This assumes you use Rails.
"anything".parameterize.underscore.humanize.downcase
Given your requirements, this is probably what I'd do... I think it's neat, simple and will stay up to date in future versions of Rails and Ruby.
Update: dgilperez pointed out that parameterize takes a separator argument, so "anything".parameterize(" ") (deprecated) or "anything".parameterize(separator: " ") is shorter and cleaner.
Convert the text to normalization form D, remove all codepoints with unicode category non spacing mark (Mn), and convert it back to normalization form C. This will strip all diacritics, and your problem is reduced to a case insensitive search.
See http://www.siao2.com/2005/02/19/376617.aspx and http://www.siao2.com/2007/05/14/2629747.aspx for details.
The key is to use two columns in your database: canonical_text and original_text. Use original_text for display and canonical_text for searches. That way, if a user searches for "Visual Cafe," she sees the "Visual Café" result. If she really wants a different item called "Visual Cafe," it can be saved separately.
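On Ruby 2.2 or later, the decompose, strip and recompose steps described above can be sketched with the built-in String#unicode_normalize, without Rails. The method names strip_diacritics and canonicalize are illustrative. Letters without a Unicode decomposition, such as 'æ', pass through unchanged and would need an explicit replacement.

```ruby
# Decompose to NFD, drop non-spacing marks (category Mn), recompose to NFC.
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').unicode_normalize(:nfc)
end

# Canonical form for searching: no diacritics, all lowercase.
def canonicalize(text)
  strip_diacritics(text).downcase
end

puts strip_diacritics("àáâãäå")  # => "aaaaaa"
puts canonicalize("Visual Café") # => "visual cafe"
```

The result would be stored in canonical_text, and the user's search term would go through the same method before the comparison.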
To get the canonical_text characters in a Ruby 1.8 source file, do something like this:
register_replacement([0x008A].pack('U'), 'S')
You probably want Unicode decomposition ("NFD"). After decomposing the string, just filter out anything not in [A-Za-z]. ã will decompose to "a~" (approximately - the diacritical will become a separate character), so the filtering leaves a reasonable approximation. Note that ligatures such as æ have no Unicode decomposition, so they need an explicit replacement (e.g. "ae") before filtering.
iconv:
http://groups.google.com/group/ruby-talk-google/browse_frm/thread/8064dcac15d688ce?
=============
a perl module which i can't understand:
http://www.ahinea.com/en/tech/accented-translate.html
============
brute force (there's a lot of those critters!):
http://projects.jkraemer.net/acts_as_ferret/wiki#UTF-8support
http://snippets.dzone.com/posts/show/2384
I had problems getting the foo.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s solution to work. I'm not using Rails and there was some conflict with my activesupport/ruby versions that I couldn't get to the bottom of.
Using the ruby-unf gem seems to be a good substitute:
require 'unf'
foo.to_nfd.gsub(/[^\x00-\x7F]/n,'').downcase
As far as I can tell this does the same thing as .mb_chars.normalize(:kd). Is this correct? Thanks!
If you are using PostgreSQL >= 9.4 as your database, maybe you could enable its "unaccent" extension in a migration, which I think does what you want, like this:
def self.up
enable_extension "unaccent" # Does not fail if it already exists
end
In order to test, in the console:
2.3.1 :045 > ActiveRecord::Base.connection.execute("SELECT unaccent('unaccent', 'àáâãäåÁÄ')").first
=> {"unaccent"=>"aaaaaaAA"}
Note that the result is still case-sensitive at this point.
Then, maybe use it in a scope, like:
scope :with_canonical_name, ->(name) {
  # Pass the name as a bind parameter instead of interpolating it into the SQL
  where("unaccent(foos.name) ILIKE unaccent(?)", name)
}
The ILIKE operator makes the search case-insensitive. There is another approach, using the citext data type. Here is a discussion about these two approaches. Notice also that use of PostgreSQL's lower() function is not recommended.
This will save you some DB space, since you will no longer require the canonical_name field, and perhaps make your model simpler, at the cost of some extra processing in each query, in an amount depending on whether you are using ILIKE or citext, and on your dataset.
If you are using MySQL maybe you can use this simple solution, but I have not tested it.
lol.. I just tried this and it is working. I'm still not sure why, but when I use these 4 lines of code:
str = str.gsub(/[^a-zA-Z0-9 ]/,"")
str = str.gsub(/[ ]+/," ")
str = str.gsub(/ /,"-")
str = str.downcase
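it automatically removes any accent from filenames, which is what I was trying to do (remove the accents from filenames and then rename them). Hope it helped :) (This works when the names are in decomposed Unicode form, as filenames on macOS often are: the combining marks are stripped and the base letters stay. A precomposed 'é' would be removed entirely.)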

Resources