Rails ActiveRecord validation: maximum length of a string? - ruby-on-rails

I have these string / text fields in my database migration file:
t.string :author
t.string :title
t.string :summary
t.text :content
t.string :link
And these are my questions:
Every string / text attribute should have a maximum length validation for two reasons: security (you don't want to receive a few MB of text input) and the database (if string = varchar, MySQL has a 255-character limit). Is that right, or is there any reason not to have a maximum length validation for every string / text attribute in the database?
If I don't care about the exact length of author and title, as long as they are not too long to be stored as strings, should I set a maximum length of 255 for each of those?
If the maximum possible length of URL is about 2000 characters, is it safe to store links as strings, and not as texts? Should I be validating a maximum length of the link attribute if I am already validating its format using regexp?
Should a content (text) attribute have a maximum length just to protect the database from the input of an unlimited length? For example, is setting a maximum length of a text field to 100,000 characters reasonable, or is this totally pointless and inefficient?
I understand that these questions might seem unimportant to some people, but this is input validation, which is required for any application, and I think it's worth being rather paranoid here.

The question is great, and perhaps people with more knowledge of rails/mysql internals will be able to expand more.
1) Whether to put a validation in the model depends on where you want the failure to happen when the limit is exceeded. The model is the best option, since it will most likely cover most objects that use the model.
Another alternative is simply limiting form fields using the maxlength attribute.
The first option does not work for optional fields.
2) I am not aware of any rule of thumb. Use whatever you know is the longest and make it a bit bigger.
3) My rule is that anything above 255 is text. You can find more info on this Here
4) If the column holds the same content - there might be value in that. Some use cases might have different maxlength depending on content type or user.
All of the above is also affected by how strict data validation requirements are in the project.
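As a rough illustration of the model-level check discussed above, here is a plain-Ruby sketch. In a Rails model the same rule would normally be written with the built-in length validator, for example validates :title, length: { maximum: 255 }. The MAX_LENGTHS table, the 2,000 and 100,000 limits, and the length_errors helper are assumptions made for this example and are not part of Rails.

```ruby
# Hypothetical per-attribute limits; pick values that match your own columns.
MAX_LENGTHS = {
  author:  255,
  title:   255,
  summary: 255,
  link:    2_000,
  content: 100_000
}.freeze

# Returns a hash of attribute => error message for every value that is too long.
def length_errors(attributes)
  attributes.each_with_object({}) do |(name, value), errors|
    limit = MAX_LENGTHS[name]
    next if limit.nil? || value.nil? # unknown or absent (optional) fields are skipped

    if value.length > limit
      errors[name] = "is too long (maximum is #{limit} characters)"
    end
  end
end

post = { author: "Ann", title: "x" * 300, link: nil }
p length_errors(post)
```

Only the 300-character title is reported here; the nil link is skipped, which is one way to keep a length rule from rejecting optional fields.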

Related

Is there ever reason to use string data type when using rails 6 with postgres database?

I see here that :text data type seems to perform exactly the same as :string
This comment particularly:
PostgreSQL implementation prefers text. The only difference for pg string/text is constraint on length for string. No performance differences.
Is there any good reason to ever use :string when making a rails app using a postgresql database?
No difference in performance, difference in semantics.
Several libraries (for example simpleform) look at the data type of the field in the database and perform differently depending on it. Simple form will add a number input if it's a number, a checkbox if it's a boolean and so on. For this case, it will add a single line text field for string and a multiline text box for text.
Since there is no difference in performance, you can use either, but it's still useful to denote semantics.

Bean Validation Min/Max wrong message

I'm using Min/Max Beanvalidation. Here is an example:
@Min(value = 100, message="too low")
@Max(value = 1000, message="too high")
private Integer example;
If I enter 99 I get the correct message "too low". If I enter 1001 I also get the correct message "too high". If I enter a very high number, e.g. 10000000000, I get a generic message, which I found out is this one: javax.faces.converter.BigIntegerConverter.BIGINTEGER={2}. So I suspect that if the user enters a number that is larger than the actual field type allows, he will get another message.
This is actually not what I want to achieve. I always want to show the user "too high". Is there a way to achieve this?
There are really two things going on: conversion and validation. In a first step, JSF needs to take your string input and convert it to a number. This is where you get the error: your value cannot be converted to an Integer. If conversion works, JSF populates your model, and that's where validation kicks in. If validation then fails, you get the defined Bean Validation messages. So what can you do:
Configure the JSF message for javax.faces.converter.BigIntegerConverter.BIGINTEGER={2} to be more descriptive
Change the datatype, for example use BigInteger. In this case the conversion from string to number will work
Use a string in the bean and validate the string. You will probably then need to convert it to a number at a different point, but that depends on your use case.
The maximum for Integer in Java is 2^31 - 1, which is just over 2.1 billion. The input you used, 10 billion, is beyond that maximum and would overflow the field, so it is not valid for the field type, regardless of any validation you may have in place. You could switch the field type to BigInteger and then override the default validation messages to fit your needs, but that may be overkill given the purpose of your question. You can also have custom messages.
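The two phases can be sketched outside JSF. The following Ruby sketch checks the 32-bit range by hand (Ruby integers do not overflow) and reports a value that cannot fit a Java Integer the same way as one that exceeds the maximum, so the user always sees "too high". The check_example name and the constants are hypothetical; this is an illustration of the idea, not how JSF itself is implemented.

```ruby
INT32_MAX = 2**31 - 1 # 2_147_483_647, the largest Java Integer
MIN_VALUE = 100
MAX_VALUE = 1_000

# Returns nil when the input is acceptable, otherwise an error message.
def check_example(input)
  # Conversion step: the text must be a whole number at all.
  return "not a number" unless input.match?(/\A-?\d+\z/)

  value = input.to_i
  # A value that would not fit the field type is reported like any other
  # value that is too large, instead of as a generic conversion error.
  return "too high" if value > INT32_MAX
  # Validation step: the range check on the converted value.
  return "too high" if value > MAX_VALUE
  return "too low" if value < MIN_VALUE

  nil
end

%w[99 500 1001 10000000000].each do |input|
  puts "#{input}: #{check_example(input).inspect}"
end
```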
Why not just limit the amount of characters in the inputfield in the frontend, for example
<h:inputText maxlength="4"/>
I'd guess it's possible to bypass if you really want, but I wouldn't worry too much about the usability for someone hacking the site :-)

Big Integers and Custom Validation

I'm somewhat new to Rails and I'm trying to learn about custom validations.
One common requirement in Brazil are CPF/CNPJ/RG fields. They are a type of identification number and follow a specific format.
For example:
CPFs are 11 digit numbers. They follow this pattern: xxx.xxx.xxx-xx
I'm trying to store them in an Integer field but I'm getting (Using Postgres):
PG::Error: ERROR: value "xxxxxxxxxxx" is out of range for type integer
What is the proper way to store this? Bigint (How?)? A string?
My second question is:
How can I specify a custom validation (a method) for this field that could be called somewhat like this:
class User < ActiveRecord::Base
  validates :cpf, presence: true, uniqueness: true, cpf: true
end
Assuming performance is not critical, strings are fine. That way you can keep the dots and dashes. As mentioned by others in this thread, bigint or numeric may be far more performant if that's a concern.
If you keep the field a string, you can easily validate it with regex:
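validates_format_of :cpf, with: /\A[0-9]{3}\.[0-9]{3}\.[0-9]{3}-[0-9]{2}\z/
(\A and \z anchor the whole string; in Ruby, ^ and $ match at every line break, so a multi-line value could slip through.)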
For small tables, just store as text to preserve the format.
For big tables, performance and storage size may be an issue. If your pattern is guaranteed, you may very well store the number as bigint and format it on retrieval with to_char():
Write:
SELECT translate('111.222.333-55', '.-', '')::bigint
This also serves as partial validation. Only digits, . and - are allowed in your string. The pattern might still be violated, so you have to check explicitly with something like what @Michael provided.
Read:
SELECT to_char(11122233355, 'FM000"."000"."000"-"00')
Returns:
111.222.333-55
Don't forget the leading FM in the pattern to remove the leading whitespace (where a negative sign might go for numbers).
A bigint occupies 8 bytes on disk and can easily store 11-digit numbers.
text (or varchar) needs 1 byte plus the actual string, which amounts to 15 bytes in your case.
Plus, processing bigint is generally a bit faster than processing text of equal length.
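If the formatting is done in the application rather than in SQL, the same round trip can be sketched in Ruby. The cpf_to_integer and integer_to_cpf names are hypothetical; the sketch assumes the input has already been checked against the xxx.xxx.xxx-xx pattern.

```ruby
# Strip the punctuation and keep the 11 digits as an integer
# (the counterpart of translate(...)::bigint).
def cpf_to_integer(formatted)
  formatted.delete(".-").to_i
end

# Rebuild the xxx.xxx.xxx-xx presentation, restoring leading zeros
# (the counterpart of to_char with the FM000... pattern).
def integer_to_cpf(number)
  digits = format("%011d", number)
  "#{digits[0, 3]}.#{digits[3, 3]}.#{digits[6, 3]}-#{digits[9, 2]}"
end

stored = cpf_to_integer("111.222.333-55")
puts stored
puts integer_to_cpf(stored)
```

Zero padding matters here: a number that starts with 0 loses that digit when stored as an integer, so it has to be restored on output.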
Personally I would always store these values as bigint and apply formatting on input/output (as Erwin suggests) or in the application.
The main reasons are storage efficiency (as Erwin mentions) and efficiency of comparison. When you compare 11111111112 to 11111111113 as text, PostgreSQL will use language-specific collation rules that are correct for text, but may not be what you want for numbers. They're also slow; a recent question on SO reported a five-fold speed-up in text comparisons by using the COLLATE "C" option to force plain POSIX collations; numeric collations are faster again.
Most of these standard numbers have their own internal check-sums, often a variant of the Luhn algorithm. Validating these in a CHECK constraint is likely to be a good idea. You can implement a Luhn algorithm check against integers pretty easily in PL/PgSQL or plain SQL; I wrote some samples on the PostgreSQL wiki.
Whatever you do, make sure you have a CHECK constraint on the column that validates the number on storage, so you don't get invalid and nonsensical values stored.
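For the application-side check, here is a minimal Ruby sketch of a checksum validation. It assumes the commonly described CPF rule, in which the two check digits are computed with descending weights modulo 11 (not Luhn), and it assumes the convention that numbers made of one repeated digit are rejected. Verify both against an authoritative specification before relying on this. The method names are hypothetical.

```ruby
# Computes one check digit from the digits before it (assumed mod-11 rule):
# weights run from digits.length + 1 down to 2.
def cpf_check_digit(digits)
  weight = digits.length + 1
  sum = digits.each_with_index.sum { |digit, index| digit * (weight - index) }
  remainder = sum % 11
  remainder < 2 ? 0 : 11 - remainder
end

# Accepts "xxx.xxx.xxx-xx" or a plain 11-digit string.
def cpf_valid?(value)
  digits = value.to_s.delete(".-")
  return false unless digits.match?(/\A\d{11}\z/)
  return false if digits.chars.uniq.length == 1 # e.g. 111.111.111-11 (assumed convention)

  numbers = digits.chars.map(&:to_i)
  numbers[9] == cpf_check_digit(numbers[0, 9]) &&
    numbers[10] == cpf_check_digit(numbers[0, 10])
end

puts cpf_valid?("111.444.777-35")
puts cpf_valid?("111.444.777-36")
```

In a Rails app, logic like this could live in a class named CpfValidator that inherits from ActiveModel::EachValidator and implements validate_each(record, attribute, value), which is what makes an option such as cpf: true available in validates.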

floating point precision in ruby on rails model validations

I am trying to validate a dollar amount using a regex:
^[0-9]+\.[0-9]{2}$
This works fine, but whenever a user submits the form and the dollar amount ends in 0(zero), ruby(or rails?) chops the 0 off.
So 500.00 turns into 500.0 thus failing the regex validation.
Is there any way to make ruby/rails keep the format entered by the user, regardless of trailing zeros?
I presume your dollar amount is of decimal type. So, any value the user enters in the field is cast from a string to the appropriate type before it is saved to the database. Validation applies to the values already converted to numeric types, so a regex is not really a suitable validation filter in your case.
You have a couple of possibilities to solve this, though:
Use validates_numericality_of. That way you leave the conversion completely to Rails, and just check whether the amount is within a given range.
Use validate_each method and code your validation logic yourself (e.g. check whether the value has more than 2 decimal digits).
Validate the attribute before it's been typecasted:
"This is especially useful in validation situations where the user might supply a string for an integer field and you want to display the original string back in an error message. Accessing the attribute normally would typecast the string to 0, which isn't what you want."
So, in your case, you should be able to use:
validates_format_of :amount_before_type_cast, :with => /\A[0-9]+\.[0-9]{2}\z/, :message => "must contain dollars and cents, separated by a period"
Note, however, that users might find it tedious to follow your rigid entry rules (I would really prefer being able to type 500 instead of 500.00, for example), and that in some locales the period is not the decimal separator (if you ever plan to internationalize your app).
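The difference between the raw string and the typecast value can be seen in plain Ruby. In this sketch a Float stands in for the typecast attribute (an assumption; a decimal column would give a BigDecimal instead), and the matches_amount_format? helper is hypothetical.

```ruby
AMOUNT_FORMAT = /\A[0-9]+\.[0-9]{2}\z/

# Checks the textual form of a value against the dollars-and-cents pattern.
def matches_amount_format?(value)
  value.to_s.match?(AMOUNT_FORMAT)
end

raw  = "500.00" # what the user typed
cast = raw.to_f # stand-in for the value after type casting

puts matches_amount_format?(raw)  # the original string still has two decimals
puts cast.to_s                    # the cast value prints without the trailing zero
puts matches_amount_format?(cast) # so the same pattern no longer matches
puts format("%.2f", cast)         # two decimals can be restored on output
```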
In general if you wish to “remember” the decimal precision of a floating point value, you should use a decimal type, not a binary float.
On the other hand, I'm not certain why you would wish to force the string representation in such a strict manner… How about accepting any number and formatting it with e.g. number_to_currency?
Usually with money it's best to store it as an integer in cents (500 cents is $5.00). I use the Money gem to handle this.

How should I present a cost field to the user, and store it in the database?

Right now I have two fields for cost. One for dollars and one for cents. This works, but it is a bit ugly. It also doesn't allow the user to enter the term "free" or "no cost" if they want. But if I only have one field, I might have to make my parser a bit smarter. What do you think?
On the server side, I combine dollars and cents to store them as decimals in my database. Mainly so that I can gather statistics (cost averages, etc.) quickly.
Do you think it is better to store the cost as a string? Then whenever I actually use the cost for stats or other purposes, I would convert it to a decimal at that point. Or am I on the right track?
There is a rule in database design that states that "atomic data" should not be split. By this rule a price, or cost is such an example of atomic data and therefore it should never be split among multiple columns just like you shouldn't split a phone number among multiple columns (unless you really have a very good reason for it - very rare)
Use a DECIMAL data type. Something like DECIMAL(8,3) should work and it's supported by all ANSI SQL compliant database products!
You can consult Joe Celko's "Thinking In Sets" book for a discussion of this topic. See section 1.6.2, pages 21-22.
EDIT -
It seems from your question that you are also concerned with how to accept user's input in a form that resembles the price (xxxx.xx) - hence the two input boxes, for the whole dollars, and the pennies.
I recommend using a single input box and then doing input validation using regular expressions to match your format (i.e. something like [0-9]+(\.[0-9]{1,3})? would probably work but could be improved). You could then parse the validated string to a Decimal type in your language, or just pass it as a string into your database; SQL will know how to cast it to a DECIMAL type.
Keep the whole cost as decimal. If it's free, then keep the cost as 0. In presentation if cost is zero - write "free" instead of 0.
I generally store the cost as the lowest unit (pennies) and then convert it to whole dollars later.
So a cost of $4.50 gets stored as 450. Free items would be -1 pennies. You could store free things as 0 pennies as well, this gives you the flexibility to use 0 and -1 to mean two slightly different things (free vs no sale?).
It also makes it easier to support countries that don't use cents if you choose to go that route.
As for presenting the data entry field, I personally don't like it when I have to keep switching fields for tiny things (like when they break up phone numbers into 3 fields, or IP addresses into 4). I'd present one field, and let the users type the decimal point in themselves. That way, your users don't have to tab (or click, if they are unfamiliar with tab) to the next field.
Use cents: store 450 for $4.50. This will save you from the problems that arise very often because floating point operations are not exact. Just try the following expression in irb:
0.4 - 0.3 == 0.1
It returns false, all because of floating point representation inaccuracies.
In my models I'm always using:
# price is an integer column that holds the amount in cents
def price_with_cents
  price / 100.0
end

def price_with_cents=(num)
  # round instead of truncating: 4.35 * 100 is 434.99999999999994 as a float
  self.price = (num.to_f * 100).round
end
And the name of column is just price and integer type.
I don't have much experience with decimal columns and their representation in Ruby (which can be a float, and that is problematic, as I've shown at the beginning).
Don't allow garbage to make it to your database. If you're expecting a dollar amount in a field, then make sure it's valid before it gets in there. This will allow you to report better on the data and allow simpler formatting on output.
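The float pitfalls mentioned in this answer can be checked directly in Ruby. The dollars_to_cents helper is a hypothetical name used only for this sketch.

```ruby
puts 0.4 - 0.3 == 0.1 # false: binary floats cannot represent these values exactly
puts 40 - 30 == 10    # true: integer cents compare exactly

# Truncating after multiplying by 100 can lose a cent; rounding does not.
puts((4.35 * 100).to_i)  # 434
puts((4.35 * 100).round) # 435

# Converts a user-supplied dollar amount to integer cents.
def dollars_to_cents(amount)
  (amount.to_f * 100).round
end

puts dollars_to_cents("4.35")
```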
I suggest making this a single field with validation on update or insert.
if field != SpecialFreeTag then
  try to convert to decimal
  if fail then report to user
  otherwise accept value
Use try parse or regular expressions to help with the validation.
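The pseudocode above could look roughly like this in Ruby. It is a sketch under several assumptions: the special tags are "free" and "no cost", amounts have at most two decimals, and the result is returned as integer cents (a decimal value would work equally well). The parse_cost name and both constants are hypothetical.

```ruby
FREE_TAGS   = ["free", "no cost"].freeze  # hypothetical special values
COST_FORMAT = /\A(\d+)(?:\.(\d{1,2}))?\z/ # dollars with optional cents

# Returns the cost in cents, or raises ArgumentError so the caller
# can report the problem back to the user.
def parse_cost(input)
  text = input.to_s.strip.downcase
  return 0 if FREE_TAGS.include?(text)

  match = COST_FORMAT.match(text)
  raise ArgumentError, "invalid cost: #{input.inspect}" unless match

  dollars = match[1].to_i
  cents   = (match[2] || "").ljust(2, "0").to_i # ".5" means 50 cents
  dollars * 100 + cents
end

["4.50", "4.5", "500", "Free"].each do |value|
  puts "#{value}: #{parse_cost(value)}"
end
```

Working on the string directly avoids a float conversion, so no rounding question arises.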
I would store the cost as decimal with the scale being no less than 2 and maybe even 3-5. If something is bought in bulk the unit cost could easily include fractions of a cent. Free items have a cost of 0. If the cost is unknown then allow null values also.

Resources