Generating a unique token on the fly with Rails

I want to generate a token in my controller for a user in the "user_info_token" column. However, I want to check that no user currently has that token. Would this code suffice?
begin
  @new_token = SecureRandom.urlsafe_base64
  user = User.find_by_user_info_token(@new_token)
end while user != nil
@seller.user_info_token = @new_token
Or is there a much cleaner way to do this?

If your token is long enough and generated by a cryptographically secure [pseudo-]random number generator, then you do not need to verify that the token is unique. You do not need to generate tokens in a loop.
16 raw source bytes is long enough for this effective guarantee. When formatted for URL-safety, the result will be longer.
# Base-64 (url-safe) encoded bytes, 22 characters long
SecureRandom.urlsafe_base64(16)
# Base-36 encoded bytes, naturally url-safe, ~25 characters long
SecureRandom.hex(16).to_i(16).to_s(36)
# Base-16 encoded bytes, naturally url-safe, 32 characters long
SecureRandom.hex(16)
This is because the probability that the 16-byte (128-bit) token is non-unique is so vanishingly small that it is virtually zero. There is only a 50% chance of there being any repetition after approximately 2^64 = 18,446,744,073,709,551,616 ≈ 1.845 × 10^19 tokens have been generated. If you start generating one billion tokens per second, it will take approximately 2^64 / (10^9 × 3600 × 24 × 365.25) ≈ 600 years until there is a 50% chance of any repetition having occurred at all.
But you're not generating one billion tokens per second. Let's be generous and suppose you were generating one token per second. The time frame until a 50% chance of even one collision becomes 600 billion years. The planet will have been swallowed up by the sun long before then.
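As a minimal sketch (assuming the user_info_token column from the question), the whole thing then collapses to a single assignment with no lookup loop:
require 'securerandom' # loaded automatically in a Rails app
@seller.user_info_token = SecureRandom.urlsafe_base64(16)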

The cleanest solution I found:
@seller.user_info_token = loop do
  token = SecureRandom.urlsafe_base64
  break token unless User.exists?(user_info_token: token)
end
And something very clean but with potential duplicates (very few though):
@seller.user_info_token = SecureRandom.uuid
Random UUID probability of duplicates
Edit: of course, add a unique index to your :user_info_token. Searching for a user with the same token will be much quicker, and the database will raise an exception if, by chance, two users are saved at the exact same moment with the exact same token!
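A sketch of that safety net, assuming a users table (the migration line plus an optional retry on the rare race):
add_index :users, :user_info_token, unique: true

begin
  @seller.save!
rescue ActiveRecord::RecordNotUnique
  @seller.user_info_token = SecureRandom.urlsafe_base64
  retry
end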

I have many models I apply unique tokens to. For this reason I've created a Tokened concern in app/models/concerns/tokened.rb
module Tokened
  extend ActiveSupport::Concern

  included do
    after_initialize do
      self.token = generate_token if self.token.blank?
    end
  end

  private

  def generate_token
    loop do
      key = SecureRandom.base64(15).tr('+/=lIO0', 'pqrsxyz')
      break key unless self.class.find_by(token: key)
    end
  end
end
In any model I want to have unique tokens, I just do
include Tokened
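For example (Order is just an illustrative model name):
class Order < ActiveRecord::Base
  include Tokened
end

Order.new.token # set by the after_initialize callback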
But yes, your code looks fine too.

Rails 5 comes with this feature; you only need to add the following line to your model:
class User < ActiveRecord::Base
  has_secure_token
end
Since Rails 5 has not been released yet, you can use the has_secure_token gem. You can also see my blog post for more info about it: https://coderwall.com/p/kb97gg/secure-tokens-from-rails-5-to-rails-4-x-and-3-x
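A minimal sketch of what that looks like in practice (the column name defaults to token; the migration details here are assumed):
add_column :users, :token, :string
add_index :users, :token, unique: true

user = User.create!   # token is filled in automatically
user.regenerate_token # issues a fresh token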

Maybe you can do something using the current time. Then you won't need to check whether the token was already taken by another user.
require 'digest'

new_token = Digest::MD5.hexdigest(Time.now.to_i.to_s + rand(999999999).to_s)
user.user_info_token = new_token

You can try the trick below to get a unique token; it's simple, and it's what I used in my project:
CREDIT_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def create_credit_key(count = 25)
  credit_key = ""
  length = CREDIT_CHARS.length
  count.times do
    # pick one random character from the set on each pass
    rand = Random.rand(0.0..1.0)
    credit_key += CREDIT_CHARS[(length * rand).to_i]
  end
  credit_key
end
Using a digest would be even easier; here I tried to generate the key without using any digest algorithm.
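For comparison, on Ruby 2.5+ the standard library can produce an equivalent alphanumeric key in one line (a sketch, not part of the original trick):
require 'securerandom'

credit_key = SecureRandom.alphanumeric(25)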

Related

spaCy: optimizing tokenization

I'm currently trying to tokenize a text file where each line is the body text of a tweet:
"According to data reported to FINRA, short volume percent for $SALT clocked in at 39.19% on 12-29-17 http://www.volumebot.com/?s=SALT"
"#Good2go #krueb The chart I posted definitely supports ng going lower. Gobstopper' 2.12, might even be conservative."
"#Crypt0Fortune Its not dumping as bad as it used to...."
"$XVG.X LOL. Someone just triggered a cascade of stop-loss orders and scooped up morons' coins. Oldest trick in the stock trader's book."
The file is 59,397 lines long (a day's worth of data) and I'm using spaCy for pre-processing/tokenization. It's currently taking me around 8.5 minutes and I was wondering if there were any way of optimising the following code to be quicker as 8.5 minutes seems awfully long for this process:
import time
from datetime import timedelta
from os import listdir
from os.path import isfile, join
import spacy

nlp = spacy.load('en')

def token_loop(path):
    store = []
    files = [f for f in listdir(path) if isfile(join(path, f))]
    start_time = time.monotonic()
    for filename in files:
        with open("./data/" + filename) as f:  # note: assumes path is "./data/"
            for line in f:
                tokens = nlp(line.lower())
                tokens = [token.lemma_ for token in tokens
                          if not token.orth_.isspace()
                          and token.is_alpha
                          and not token.is_stop
                          and len(token.orth_) != 1]
                store.append(tokens)
    end_time = time.monotonic()
    print("Time taken to tokenize:", timedelta(seconds=end_time - start_time))
    return store
Although it says files, it's currently only looping over 1 file.
Just to note, I only need this to tokenize the content; I don't need any extra tagging etc.
It sounds like you haven't optimised the pipeline yet. You'll get a significant speed up from disabling the pipeline components you don't need, like so:
nlp = spacy.load('en', disable=['parser', 'tagger', 'ner'])
This should get you down to about the two-minute mark, or better, on its own.
If you need a further speed up, you can look at multi-threading using nlp.pipe. Docs for multi-threading are here:
https://spacy.io/usage/processing-pipelines#section-multithreading
You can use nlp.pipe(all_lines) instead of nlp(line) for faster processing; see spaCy's documentation: https://spacy.io/usage/processing-pipelines

Why is this RegExp taking 16 minutes to process on Rails?

I've written a function to remove email addresses from my data using gsub. The code is below. The problem is that it takes a total of 27 minutes to execute the function on a set of 10,000 records. (16 minutes for the first pattern, 11 minutes for the second). Elsewhere in the code I process about 20 other RegExp's using a similar flow (iterating through data.each) and they all finish in less than a second. (BTW, I recognize that my RegExp's aren't perfect and may catch some strings that aren't email addresses.)
Is there something about these two RegExp's that is causing the processing time to be so high? I've tried it on seven different data sources all with the same result, so the problem isn't peculiar to my data set.
def remove_email_addresses!(data)
  email_patterns = [
    /[[:graph:]]+@[[:graph:]]+/i,
    /[[:graph:]]+ +at +[^ ][ [[:graph:]]]{0,40} +dot +com/i
  ]
  data.each do |row|
    email_patterns.each do |pattern|
      row[:title].gsub!(pattern, "") unless row[:title].blank?
      row[:description].gsub!(pattern, "") unless row[:description].blank?
    end
  end
end
Check that your faster code isn't just doing var =~ /blah/ matching, rather than replacement: that is several orders of magnitude faster.
In addition to reducing backtracking and replacing + and * with ranges for safety, as follows...
email_patterns = [
  /\b[-_.\w]{1,128}@[-_.\w]{1,128}/i,
  /\b[-_.\w]{1,128} {1,10}at {1,10}[^ ][-_.\w ]{0,40} {1,10}dot {1,10}com/i
]
... you could also try "unrolling your loop", though this is unlikely to cause any issues unless there is some kind of interaction between the iterators (which there shouldn't be, but...). That is:
data.each do |row|
  row[:title].gsub!(email_patterns[0], "") unless row[:title].blank?
  row[:description].gsub!(email_patterns[0], "") unless row[:description].blank?
  row[:title].gsub!(email_patterns[1], "") unless row[:title].blank?
  row[:description].gsub!(email_patterns[1], "") unless row[:description].blank?
end
Finally, if this causes little to no speedup, consider profiling with something like ruby-prof to find out whether the regexes themselves are the issue, or whether there's a problem in the do iterator or the unless clauses instead.
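A sketch of what that profiling could look like with ruby-prof (gem install ruby-prof; remove_email_addresses! and data are the method and records from the question):
require 'ruby-prof'

result = RubyProf.profile do
  remove_email_addresses!(data)
end
RubyProf::FlatPrinter.new(result).print(STDOUT)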
Could it be that the data is large enough that it causes issues with paging once read in? If so, might it be faster to read the data in and parse it in chunks of N entries, rather than process the whole lot at once?

Generating a unique and random 6 character long string to represent link in ruby

I am generating a unique and random alphanumeric string segment to represent certain links that will be generated by the users. To do that I first reached for a UUID to ensure its uniqueness and randomness, but, as per my requirements, the string shouldn't be more than 5 characters long, so I dropped that idea.
Then I decided to generate such a string using Ruby's random function and the current timestamp.
The code for my random string goes like this:
temp = DateTime.now
temp = temp + rand(DateTime.now.to_i)
temp = temp.hash.abs.to_s(36)
What I did is store the current DateTime in a temp variable and then generate a random number, passing the current datetime as a parameter. The second line adds the current datetime and the random number together, and the third hashes the sum into a base-36 string.
Soon I found that, while testing my application on two different machines and sending the request at the same time, it generated the same string (though it's rare) once after more than 100 trials.
Now I'm thinking that I should add one more parameter, like the MAC address or the client IP address, before calling to_s(36) on the temp variable. But I can't figure out how to do it, and even then I don't know whether it will be unique or not...
Thanks....
SecureRandom in Ruby uses the process id (if available) and the current time. You can use the urlsafe_base64(n = 16) class method to generate the sequence you need. Given your requirements, I think this is your best bet.
Edit: After a bit of testing, I still think that this approach will generate non-unique keys. The way I solved this problem for barcode generation was:
barcode = barcode_sql_id_hash("#{sql_id}#{keyword}")
Here, your keyword can be time + pid.
If you are certain that you will never need more than a given amount M of unique values, and you don't need more than rudimentary protection against guessing the next generated id, you can use a Linear Congruential Generator to generate your identifiers. All you have to do is remember the last id generated, and use that to generate a new one using the following formula:
newid = (A * oldid + B) mod M
If 2³² distinct id values are enough to suit your needs, try:
def generate_id
  if @lcg
    @lcg = (1664525 * @lcg + 1013904223) % (2**32)
  else
    @lcg = rand(2**32) # random seed
  end
end
Now just pick a suitable set of characters to represent the id in as few as 6 characters. Uppercase and lowercase letters do the trick, since (26+26)^6 > 2^32:
ENCODE_CHARS = [*?a..?z, *?A..?Z]

def encode(n)
  6.times.map {
    n, mod = n.divmod(ENCODE_CHARS.size)
    ENCODE_CHARS[mod]
  }.join
end
Example:
> 10.times { n = generate_id ; puts "%10d = %s" % [n, encode(n)] }
2574974483 = dyhjOg
3636751446 = QxyuDj
368621501 = bBGvYa
1689949688 = yuTgxe
1457610999 = NqzsRd
3936504298 = MPpusk
133820481 = PQLpsa
2956135596 = yvXpOh
3269402651 = VFUhFi
724653758 = knLfVb
Due to the nature of the LCG, the generated id will not repeat until all 2³² values have been used exactly once each.
There is no way you can generate a unique UUID with only five chars: with letters and numbers you have an alphabet of around 56 chars, so there is a maximum of 56^5 combinations, approximately 551 million (around 2^29).
If with this scheme you were to generate 10,000 UUIDs (a very low number of UUIDs), you would have roughly a 9% chance of at least one collision (by the birthday approximation; see the sketch below).
When using crypto, the standard definition of a big-enough space to avert collisions is around 2^80.
To put this into perspective, your proposed space is even smaller than that of a plain random 32-bit integer (2^32 is 8 times the size you are proposing), and generating a bare random 32-bit integer would clearly be a bad idea.
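A quick back-of-the-envelope check of that collision figure, using the birthday approximation:
space = 56**5 # ~5.5e8 possible 5-char ids
n = 10_000    # ids generated
p_collision = 1 - Math.exp(-n * (n - 1) / (2.0 * space))
# => ~0.087, i.e. roughly a 9% chance of at least one collision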

Ruby File IO - Failed to allocate memory

Below is a method which inserts records into the devices database. I am having a problem where I get a 'failed to allocate memory' error.
It is being run on a Windows Mobile device with quite limited memory.
There are 10 models, one is reasonably large with 108,000 records.
The error occurs when executing this line (f.readlines().each do |line|) but it occurs after the largest model has already been inserted.
Is the memory not being released by the block that is iterating through the lines? Or is there something else happening?
Any help on this matter would be greatly appreciated!
def insertRecordsIntoRhom(models)
  updateAmount = 45 / models.length
  GC.enable
  models.each_with_index do |model, i|
    csvColumns = Array.new
    db = ::Rho::RHO.get_src_db(model)
    db.start_transaction
    begin
      j = 0
      f = File.new("#{model}.csv")
      f.readlines().each do |line|
        # extract columns from header line of csv
        if j == 0
          csvColumns = getCsvFieldFromHeader(line)
          j += 1
          next
        end
        eval(models[i] + ".create(#{csvPutIntoHash(line, csvColumns)})")
      end
      f.close
      db.commit
    rescue
      db.rollback
    end
  end
end
IO#readlines returns an Array, i.e. it reads the whole file and returns a list of all the lines. No line can be garbage collected until you are completely done iterating that list.
Since you only need one line at a time, you should use IO#each_line instead. This will read only a little bit at a time and pass you lines one by one. Once you are done with a line, it can be garbage collected while the rest of the file is being processed.
Finally, note that Ruby comes bundled with a good CSV library; you probably want to use that, if you can, instead of rolling your own.
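A sketch of the streaming rewrite (model_class stands in for the eval-based model lookup in the question):
require 'csv'

# Reads one row at a time; processed rows can be garbage collected.
CSV.foreach("#{model}.csv", headers: true) do |row|
  model_class.create(row.to_hash)
end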

Ruby on Rails random number not working

I've been at this for a while. I am building a simple lottery website and generating random tickets. On my local machine random numbers are generated; however, on the server they are duplicated.
I have tried multiple versions of what I have, but the duplication is the same.
I need to create a random ticket number per ticket and ensure that it hasn't already been created.
This is my 50th-or-so version:
a = Account.find(current_account)
numTics = params[:num_tickets].to_i
t = a.tickets.where(:item_id => item.id).count
total = t + numTics
if total > 5
  left = 5 - t
  flash[:message] = "The total amount of tickets you can purchase per item is five. You can purchase #{left} more tickets."
  redirect_to buy_tickets_path(item.id)
else
  i = 1
  taken = []
  random = Random.new
  taken.push(random.rand(100.10000000000000))
  code = 0
  while i <= numTics do
    while(true)
      code = random.rand(100.10000000000000)
      if !taken.include?(code)
        taken.push(code)
        if Ticket.exists?(:ticket_number => code) == false
          a.tickets.build(
            :item_id => item.id,
            :ticket_number => code
          )
          a.save
          break
        end
        code = 0
      end
    end
    i = i + 1
  end
  session['item_name'] = item.name
  price = item.price.to_i * 0.05
  total = price * numTics
  session['amount_due'] = total.to_i
  redirect_to confirmation_path
end
You should be using SecureRandom if possible, not Random. It works the same way but is cryptographically secure and doesn't need to be initialized the way Random does:
SecureRandom.random_number * 100.1
If you're using Ruby 1.8.7 you can try the ActiveSupport::SecureRandom equivalent.
Also if you're generating lottery tickets, you will want to make sure your generator is cryptographically secure. Generating random numbers alone is probably not sufficient. You will likely want to apply some other function to generate these.
Keep in mind that most actual lotteries do not generate random tickets at the point of purchase, but generate large batches in advance and then issue these to purchasers. This means you are able to preview the tickets and ensure they are sufficiently random.
The problem is not with Ruby's pseudo-random number generator but the fact that you are creating generators all the time with Random.new. As explained in this answer, you should not have to call Random.new more than once. Store the result in a global object and you'll be good to go, as in the sketch below.
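A minimal sketch of that fix (names are illustrative):
RNG = Random.new # created once, at load time

def random_ticket_code
  RNG.rand(100.1)
end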
