Ruby on Rails: Is generating obfuscated and unique identifiers for users on a website using SecureRandom safe and reasonable? - ruby-on-rails

Internally, my website stores Users in a database indexed by an integer primary key.
However, I'd like to associate Users with a number of unique, difficult-to-guess identifiers that will each be used in different circumstances. Examples:
One for a user profile URL: So a User can be found and displayed by a URL that does not include their actual primary key, preventing the profiles from being scraped.
One for a no-login email unsubscribe form: So a user can change their email preferences by clicking through a link in the email without having to log in, preventing other people from being able to easily guess the URL and tamper with their email preferences.
As I see it, the key characteristics I'll need for these identifiers are that they are not easily guessed, that they are unique, and that knowing the primary key or one identifier will not make it easy to find the others.
In light of that, I was thinking about using SecureRandom::urlsafe_base64 to generate multiple random identifiers whenever a new user is created, one for each purpose. As they are random, I would need to do database checks before insertion in order to guarantee uniqueness.
Could anyone provide a sanity check and confirm that this is a reasonable approach?

The method you are using relies on a secure random generator, so guessing the next URL will be hard even for someone who knows one of them. When generating random sequences, this is a key aspect to keep in mind: non-secure random generators can become predictable, and having one value can help predict what the next one would be. You are probably OK on this one.
Also, the documentation for urlsafe_base64 says that the default random length is 16 bytes. This gives you 8^16 different possible values (2.81474977 × 10^14). This is not a huge number. For example, it means that a scraper doing 10,000 requests a second will be able to try all possible identifiers in about 900 years. It seems acceptable for now, but computers are becoming faster and faster, and depending on the scale of your application this could be a problem in the future. Just making the first parameter bigger can solve this issue though.
Lastly, something that you should definitely consider: the possibility of your database being leaked. Even if your identifiers are bulletproof, your database might not be, and an attacker might be able to get a list of all identifiers. You should definitely hash the identifiers in the database with a secure hashing algorithm (with appropriate salts, the same as you would do for a password). Just to give you an idea of how important this is, with a recent GPU, SHA-1 can be brute forced at a rate of 350,000,000 tries per second. A 16-byte key (the default for the method you are using) hashed using SHA-1 would be guessed in about 9 days.
In summary: the algorithm is good enough, but increase the length of keys and hash them in the database.
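As a minimal sketch of that summary, the snippet below generates a longer token and keeps only a digest of it for storage. The 32-byte length, the choice of SHA-256, and the helper names generate_token, token_digest and token_matches? are illustrative assumptions, not part of the answer. The digest is left unsalted here so that it could still be looked up by value; a per-record salt, as the answer suggests, would need a different lookup strategy.

```ruby
require 'securerandom'
require 'digest'

# Generate a URL-safe token from `bytes` random bytes.
# 32 is an assumed length; SecureRandom's default is 16.
def generate_token(bytes = 32)
  SecureRandom.urlsafe_base64(bytes)
end

# Digest to store in the database instead of the raw token.
# SHA-256 is an illustrative choice, not prescribed by the answer.
def token_digest(token)
  Digest::SHA256.hexdigest(token)
end

# Hash the token presented in a request and compare it with the stored digest.
def token_matches?(presented, stored_digest)
  token_digest(presented) == stored_digest
end

token  = generate_token        # would be placed in the emailed link
stored = token_digest(token)   # would be written to the database
```

With this layout, a leaked table exposes only digests, and the raw token exists only in the link sent to the user.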

Because the generated ids will not be related to any other data, they are going to be very hard (practically impossible) to guess. To quickly validate their uniqueness and find users, you'll have to index them in the DB.
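You'll also need to write a function that checks for uniqueness and returns an unused id, something like:
require 'securerandom'

# Returns a random identifier that no existing User has in the given column.
def generate_id(field_name)
  loop do
    rnd = SecureRandom.urlsafe_base64
    # Keep the candidate only if it is not already taken.
    return rnd unless User.exists?(field_name => rnd)
  end
end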
As a last security check, try to verify the correspondence between an identifier and the user information (at least the email) before making any changes.
That said, it seems a good approach to me.

Related

Creating Unique Access Codes Per Email Address in Rails

I'm looking to create a system for a classified ads-type site that allows users to create ad postings without going through any kind of account registration process. I want to have a unique access code associated with each email address that users use to make posts. This access code will later be used by users to gain access to the set of posts that they've made in the past.
So these access codes should be not only unique but also secure / unguessable. Any suggestions for what I can look into for implementing this with Ruby on Rails? I haven't been able to find much in researching the topic - most related discussion seems to be around encrypting passwords, hashing, etc, so any general direction is appreciated.
Thanks!
SecureRandom.hex(n=nil): ::hex generates a random hexadecimal string.
The argument n specifies the length, in bytes, of the random number to be generated. The length
of the resulting hexadecimal string is twice n.
If n is not specified, 16 is assumed. It may be larger in future.
The result may contain 0-9 and a-f.
p SecureRandom.hex #=> "eb693ec8252cd630102fd0d0fb7c3485"
p SecureRandom.hex #=> "91dc3bfb4de5b11d029d376634589b61"
You can generate a hash with a salt to create an identifier for each email address. You should make sure two people cannot get the same hash.
It's worth mentioning that this random and unique access code will be much longer and harder to remember than a username and password.
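One way to read "a hash with a salt" is a keyed hash (HMAC) of the email address, which always yields the same code for the same address. The sketch below assumes a server-side secret; the constant ACCESS_CODE_SECRET and the helper access_code_for are hypothetical names, and the normalization step is an assumption as well.

```ruby
require 'openssl'

# Hypothetical server-side secret; in practice it would come from configuration.
ACCESS_CODE_SECRET = 'replace-with-a-long-random-secret'

# Derive a deterministic access code from an email address.
# The address is normalized first so that case and surrounding whitespace do not matter.
def access_code_for(email, key = ACCESS_CODE_SECRET)
  normalized = email.strip.downcase
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, normalized)
end
```

Unlike a purely random SecureRandom.hex value, this code does not need to be stored to be recomputed, but anyone who learns the secret can derive every code.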

Mongodb: Is it a good idea to create a unique index on web URLs?

My document looks like:
{"url": "http://some-random-url.com/path/to/article",
"likes": 10
}
The url needs to be unique. Is it a good idea to have a unique index on the url? The URL can be long, resulting in larger index size, more memory footprint, and slower overall performance. Is it a good idea to generate a hash from the url (i am thinking about using murmur3) and create a unique index on that instead. I am assuming that the chances of collision are pretty low, as described here: https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
Does anyone see any drawbacks to this approach? The new document will look like (with a unique index on u_hash instead of url):
{"url": "http://some-random-url.com/path/to/article",
"likes": 10,
"u_hash": "<murmur3 hash of url>"
}
UPDATE
I will not be doing regex queries on the url, only complete URL lookups. I am more concerned about the performance of this lookup, as I believe it will also be used internally by MongoDB to maintain the unique index, and hence affect write performance as well (plus a longer index). Additionally, my understanding is that MongoDB doesn't perform well for long text indexes, as it wasn't designed for that purpose. I may be wrong though, and it could depend only on whether or not that index fits into RAM. Any pointers?
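For illustration, the u_hash field described in the question could be computed as below. This is only a sketch: murmur3 is not part of Ruby's standard library, so MD5 from Digest is swapped in as a stand-in fixed-length hash, and the helper name url_hash is hypothetical.

```ruby
require 'digest'

# Fixed-length hash of a URL for use as a compact unique key.
# MD5 stands in for murmur3 here; it is not the hash named in the question.
def url_hash(url)
  Digest::MD5.hexdigest(url)
end

url = 'http://some-random-url.com/path/to/article'
doc = { 'url' => url, 'likes' => 10, 'u_hash' => url_hash(url) }
```

Whatever hash is used, the indexed value has the same length regardless of how long the URL is, which is the property the question is after.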
I'd like to expand on the answer of @AlexRyan. While he is right in general, there are some things which need to be taken into consideration for this use case.
First of all, we have to differentiate between a unique index and the _id field.
When the URL needs to be unique in your use case, there has to be a unique index. What we have to decide is whether to use the URL itself or a hashed value of it. The hashing itself would not help with the search, as the hash sum saved in a field would be treated as a string by MongoDB. It may save space (URLs may be longer than their hash value), thereby reducing the memory needed for the index. However, doing so takes away the possibility to search for parts of the URL in the index, for example with
db.collection.find({url:{$regex:/stackoverflow/}})
With a unique index on url, this query would use an index, which will be quite fast. Without such (unique) index, this query will result in a comparably slow collection scan.
Plus, creating the hash each and every time before querying, updating or inserting doesn't make these operations faster.
This leaves us with the fact that creating a hash sum and a unique index on it may save some RAM at the cost of making queries on the actual field slower by orders of magnitude. It also introduces the need to create a hash sum each and every time. Having an index on both the URL and its hashed value would not make sense at all.
Now to the question of whether it is a good idea to use the URL as _id one way or the other. Since URLs usually are distinct by nature (they are supposed to return the same content) and the likes are related to that uniqueness, I would tend to use the URL as the id. Since you need the unique index on _id anyway, it serves several purposes here: you have your id for the document, you ensure uniqueness of the URL and, in case you use the natural representation of the URL, it will even be queryable in an efficient way.
Use a unique index on url
db.interwebs.createIndex({ "url" : 1 }, { "unique" : true })
and not a hashed index. Hashed indexes in MongoDB are meant to be used for hashed shard keys and not for unique constraints. From the hashed index docs,
Hashed indexes support sharding a collection using a hashed shard key. Using a hashed shard key to shard a collection ensures a more even distribution of data.
and
You may not create compound indexes that have hashed index fields or specify a unique constraint on a hashed index
If url needs to be unique and you will use it to look up documents, it's absolutely worth having a unique index on url. If you want to use url as the primary key for documents, you can store the url value in the _id field. This field is normally a driver-generated ObjectId but it can be any value you like. There's always a unique index on _id in a MongoDB collection so you get the unique index "for free".
I think the answer is "it depends".
Choosing keys that have no real world meaning embedded in them may save you pain in the future. This is especially true if you decide you need to change it but you have a lot of foreign keys referencing it.
Most database management systems offer you a way to generate unique IDs.
In Oracle, you might use a sequence.
In MySQL you might use AUTO_INCREMENT when you define the table itself.
The way that mongodb assigns unique ids to documents is different than in relational databases. They use ObjectIDs for this purpose.
One of the interesting things about ObjectIDs is that they are generated by the driver.
Because of the algorithm that is used to generate them, they are guaranteed to be unique even if you have a large cluster of app and database servers.
You can learn more about them here:
http://docs.mongodb.org/manual/reference/object-id/
A lot of engineering work has gone into ensuring that ObjectIds are unique.
I use them by default unless there is a really good reason not to.
So far, I have not found a really good reason to not use them.
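To make the driver-side generation concrete, here is a rough sketch of an ObjectId-like value following the layout in the current MongoDB documentation: a 4-byte timestamp, a 5-byte per-process random value, and a 3-byte counter. The class ObjectIdSketch is hypothetical and is not the MongoDB driver's implementation; real applications should use the driver's own ObjectId type.

```ruby
require 'securerandom'

# Hypothetical illustration of the 12-byte ObjectId layout; not the driver's code.
class ObjectIdSketch
  def initialize
    @process_random = SecureRandom.random_bytes(5)   # chosen once per process
    @counter = SecureRandom.random_number(2**24)     # counter starts at a random value
    @lock = Mutex.new
  end

  # Returns a 24-character hex string: timestamp, process value, counter.
  def next_id(now = Time.now)
    count = @lock.synchronize { @counter = (@counter + 1) % 2**24 }
    timestamp = [now.to_i].pack('N')     # 4 bytes, big-endian seconds since the epoch
    counter   = [count].pack('N')[1, 3]  # low 3 bytes of the counter
    (timestamp + @process_random + counter).unpack1('H*')
  end
end
```

Because the timestamp comes first, ids generated later sort after earlier ones at one-second granularity, and no coordination with the database is needed to produce them.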

Persist data between requests invisible from the user

Here's my problem: I want to show the user a hash of a random number, then that user has to guess whether the number is higher or lower than a certain threshold. After the game, I want to show him the number so he can verify that the number has not changed during play.
But how do I do that? I have thought about using Redis to store the random number and the user id in there, but it seems like there are easier solutions.
Thanks!
Edit: In the end I chose Redis because it's server side only for maximum security and I may find some additional uses for it.
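The "show a hash first, reveal the number later" idea in the question can be sketched as a small commit-and-reveal helper. The random nonce is an assumption added here, because the hash of a small number on its own could be brute forced by trying every candidate; the helper names commit and verify_commitment are hypothetical.

```ruby
require 'securerandom'
require 'digest'

# Build a commitment to `number`. Only :commitment is shown to the player
# before the game; :number and :nonce stay on the server until it ends.
def commit(number)
  nonce = SecureRandom.hex(16)
  { number: number,
    nonce: nonce,
    commitment: Digest::SHA256.hexdigest("#{number}:#{nonce}") }
end

# After the game, the player can recompute the hash from the revealed values.
def verify_commitment(number, nonce, commitment)
  Digest::SHA256.hexdigest("#{number}:#{nonce}") == commitment
end
```

The secret pair still has to live somewhere the user cannot read it between requests, which is what the Redis or cookie storage discussed here is for.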
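You could use a "signed" cookie. This prevents the user from tampering with the value and adds no server-side overhead. Note that a signed cookie is only encoded, not hidden, so the user can still read it; if the value must stay secret, Rails also provides encrypted cookies through cookies.encrypted.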
See http://api.rubyonrails.org/classes/ActionDispatch/Cookies.html
To write:
cookies.signed[:guess] = guess
To read:
cookies.signed[:guess]
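To show roughly what signing a value involves, here is a plain-Ruby sketch that appends an HMAC to a Base64-encoded payload. It is not Rails' actual cookie format; the constant COOKIE_SIGNING_KEY and the helpers sign_value, read_signed and secure_equal? are hypothetical names.

```ruby
require 'openssl'

COOKIE_SIGNING_KEY = 'hypothetical-server-secret' # assumed; never sent to the client

def hmac_for(payload, key)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, payload)
end

# Compare two strings without stopping at the first differing byte.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Produce "<base64 payload>--<hmac>".
def sign_value(value, key = COOKIE_SIGNING_KEY)
  payload = [value.to_s].pack('m0')
  "#{payload}--#{hmac_for(payload, key)}"
end

# Return the original value, or nil if the signature does not match.
def read_signed(cookie, key = COOKIE_SIGNING_KEY)
  payload, mac = cookie.split('--', 2)
  return nil if payload.nil? || mac.nil?
  return nil unless secure_equal?(hmac_for(payload, key), mac)
  payload.unpack1('m0')
end
```

In this sketch the payload is only Base64-encoded, so anyone holding the cookie can decode it; the signature only makes modifications detectable.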

Questions about implementing surrogate key in Ruby on Rails

For an upcoming project we need to have unique real world identifiers that are exposed to users for things like Account Numbers or Case Numbers (like a bug tracking ID). These will always be system generated and unchangeable. Right now we plan to run strictly on Heroku.
While (as my name would suggest) I am new to the wonderfulness that is Ruby on Rails, I have a long background in enterprise application development. I'm trying to bridge what I have done in the past with doing it the "RoR way".
Obviously RoR has wonderful primary key support. I have read dozens of posts here recommending to adapt business requirements to just use the out of the box id/key methodology.
So let me describe what I am trying to accomplish and please let me know if you have faced similar objectives and what approach you took.
1) Would like to have a human readable key with a consistent length. There is value in always having an Account ID or Transaction ID that is the same length (for form validation, training sales staff, etc.) Using Ruby's innate key generation one could just add buffer characters (e.g. 100000 instead of 1).
2) Compactness: My initial plan was to go with a base 36 unique key (e.g. 36 values [0..9],[a..z]). As part of our API/interface we plan on exposing certain non-confidential objects based on a shortform URL (e.g. xx.co/000001). I like the idea of being able to have a five character identifier in base 36 vs. 7+ in decimal.
So I can think of two possible approaches:
a) add my own field and develop my own unique key generator (or maybe someone will point me to one).
b) Pad leading digits (and I assume I can force the unique key generation to start at 1xxxxxxx rather than 0000001). Then use the to_s(36) method to convert it to and from base 36 for all interactions with humans. Maybe even store the actual ID value in the database in the base 36 format to avoid ongoing conversions, but always do the conversion before a query to avoid the need to have another index.
I'm leaning towards approach B, as it seems like it would be optimal from a DB performance standpoint and that it would require the least investment in non-value added overhead. Once again, any real world experience with these topics and thoughts on the best approach would be greatly appreciated.
Thanks in advance!
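The base 36 conversion described in approach (b) can be sketched with Ruby's built-in Integer#to_s and String#to_i. The helper names to_public_id and from_public_id and the default width of 5 are illustrative assumptions.

```ruby
# Convert an integer id to a fixed-width base 36 string, padded with zeros.
def to_public_id(id, width = 5)
  id.to_s(36).rjust(width, '0')
end

# Convert a base 36 string back to the integer id used in queries.
def from_public_id(code)
  code.to_i(36)
end

to_public_id(1)          # => "00001"
to_public_id(36**5 - 1)  # => "zzzzz"
from_public_id('00001')  # => 1
```

Five base 36 characters cover ids up to 36**5 - 1 (60,466,175); beyond that the string grows past the fixed width, so the width would need to be chosen with the expected volume in mind.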
I would never use the primary key in a Rails table for anything of business importance. There will come a day when someone on the business end will want to change it, and it'll end up being an enormous pain in the butt and will invalidate a bunch of URLs you and your users thought were canonical and will mess up all your foreign keys and blah blah blah. It's just a really bad idea and I would encourage you not to do it.
The Rails way to do this is have a new column, called something like number or bug_tracking_number or whatever strikes your fancy, and before_validation implement a callback that gives it a value. This is where you can let your creativity shine; something like this sounds like what you want:
before_validation( :on => :create ) do
  self.number = CaseNumber.count + 1
end
You can pad the number there, ensure its uniqueness, or do whatever else you want.

Unique Identifiers that are User-Friendly and Hard to Guess

My team is working on an application with a legacy database that uses two different values as unique identifiers for a Group object: Id is an auto-incrementing Identity column whose value is determined by the database upon insertion. GroupCode is determined by the application after insertion, and is "Group" + theGroup.Id.
What we need is an algorithm to generate GroupCodes that:
Are unique.
Are reasonably easy for a user to type in correctly.
Are difficult for a hacker to guess.
Are either created by the database upon insertion, or are created by the app before the insertion (i.e. not dependent on the identity column).
The existing solution meets the first two criteria, but not the last two. Does anyone know of a good solution to meet all of the above criteria?
One more note: Even though this code is used externally by users, and even though Id would make a better identifier for other tables to link their foreign keys to, the GroupCode is used by other tables to refer to a specific Group.
Thanks in advance.
Would it be possible to add a new column? It could consist of the Identity and a random 32-bit number.
That 64 bit number could then be translated to a «Memorable Random String». It wouldn't be perfect security wise but could be good enough.
Here's an example using Ruby and the Koremutake gem.
require 'koremu'
# http://pastie.org/96316 adds Array.chunk
identity=104711
r=rand(2**32)<<32 # in this example 5946631977955229696
ka = KoremuFixnum.new(r+identity).to_ka.chunk(3)
ka.each {|arr| print KoremuArray.new(arr).to_ks + " "}
Result:
TUSADA REGRUMI LEBADE
Also check out Phonetically Memorable Password Generation Algorithms.
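Independently of the Koremutake encoding, the "identity plus a random 32-bit number" idea can be sketched with the standard library alone. Here the combined 64-bit value is rendered in base 36 rather than as syllables; the helper names make_code and identity_from are hypothetical, and SecureRandom is used in place of rand as an assumption.

```ruby
require 'securerandom'

# Combine a 32-bit identity with 32 random bits and render the result in base 36.
# A 64-bit value needs at most 13 base 36 digits, so the code is padded to 13.
def make_code(identity)
  raise ArgumentError, 'identity must fit in 32 bits' unless identity.between?(0, 2**32 - 1)
  combined = (SecureRandom.random_number(2**32) << 32) | identity
  combined.to_s(36).rjust(13, '0')
end

# Recover the identity from the low 32 bits of the decoded value.
def identity_from(code)
  code.to_i(36) & 0xFFFFFFFF
end
```

The identity part keeps each code unique, while the random upper half is what makes a neighbouring code hard to guess.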
Have you looked into Base32/Base36 content encoding? A Base32 representation of an Identity seed column will make it unique and easy to enter, but definitely not secure. However, most non-programmers will have no idea how the string value is generated.
Also using Base32/36 you can maintain normal database integer based primary keys.
