Regex password validation for non-sequence data - ruby-on-rails

I'm attempting to validate that my passwords may not contain any sequential characters such as "123" or "abcd" etc. I'm new to regex and i'm trying to do it by having the following code in my User model:
validate :password_complexity
def password_complexity
if password.present? and not password.match(/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d). /)
errors.add :password, "must include at least one lowercase letter, one uppercase letter, and one digit"
end
end

This is probably somewhat outside the bounds of regex. I would recommend looking into zxcvbn, which has also been ported into a ruby gem of the same name and even exists as a devise plugin. This will provide you with significantly more robust verification. You may also consider just inspecting the code for these projects, for inspiration, if you are determined to learn how one might implement something like this from scratch.

I agree that this isn't a job for a regex: maybe you could do it but it would be very complicated. Here's how i would do it.
def contains_sequential_characters?(string)
sequential = false
string.split("").each_with_index do |char,i|
if (string[i+1] == string[i]+1) && char =~ /[a-y0-8]/i
sequential = true
end
end
sequential
end
At the heart of this solution is that calling an array index on a string (eg "foo"[2]) will give you the ascii code of the character at that position. We only care about ascii codes that are next to each other between a-z and 0-9, which is why i'm also matching the first character against a-y (not z) and 0-8 (not 9).

You can use the below regex, which will accept a minimum of 13 digits and will not accept either sequential digits or sequential characters:
(?=.*?[0-9])(?!.*((12)|(23)|(34)|(45)|(56)|(67)|(78)|(90)|(01)))(?=.*?[a-zäöüß])(?!.*((ab)|(bc)|(cd)|(ef)|(fg)|(gh)|(hi)|(ij)|(jk)|(kl)|(lm)|(mn)|(no)|(op)|(pq)|(qr)|(rs)|(st)|(tu)|(uv)|(vw)|(wx)|(xy)|(yz)))(?=.*?[A-ZÄÖÜ])(?!.*((AB)|(BC)|(CD)|(EF)|(FG)|(GH)|(HI)|(IJ)|(JK)|(KL)|(LM)|(MN)|(NO)|(OP)|(PQ)|(QR)|(RS)|(ST)|(TU)|(UV)|(VW)|(WX)|(xy)|(yz))).{13,}

Related

rails validation with regex

I'm reading agile web development with rails 6.
In chapter 7, Task B: validation and unite testing
class Product < ApplicationRecord
validates :image_url, allow_blank: true, format: {
with: %r{\.(gif|jpg|png)\z}i,
}
what does the i mean in the end here?
It should mean that it's ending with .git or .jpg or .png
The i in your query tells the regex to match using a case insensitive match. There is nothing really unique to rails here so you may want to look into regexes in general to learn all the different terms you can use to modify your expression.
The expression %r{\.(gif|jpg|png)\z}i is equivalent to /\.(gif|jpg|png)\z/i
the \. means the period character
the | is an or as you stated
the \z is end of string with some caveats that you can read more about here: http://www.regular-expressions.info/anchors.html
and the i is incentive case matching
This means you would match 'test.jpg', 'test.JPg', 'test.JPG' or any permutation of those three characters in any case preceded by a period that occurs at the end of the string.
Here are the docs for regex formats in ruby specific:
https://ruby-doc.org/2.7.7/Regexp.html
And here is something where you can play with and learn regexes in general and try some expressions yourself:
https://regexr.com
short explain:
The "i" at the end of the regular expression is a modifier that makes the expression case-insensitive. This means that it will match both upper and lowercase letters in the image URL.

Specifying minimum length when using FFaker::Internet.user_name

I have a spec that keeps failing because FFaker::Internet.user_name generates a word that is less than 5 characters.
How do I specify a minimum length in this stmt:
username { FFaker::Internet.user_name }
String#ljust — Returns a copy of self of a given length, right-padded with a given other string.
username { FFaker::Internet.user_name.ljust(5,"12345") }
From what I see you try to use FFaker in your factory. Why overcomplicate things for your specs, when you could define sequence
sequence(:username) do |n|
"username-#{n}"
end
But the question is valid and you may have some legitimate needs to use ffaker, and there are many ways to do it. You can just concatenate username twice, why not?
username { FFaker.username + FFaker.username }
Or keep looking for a username that length is of minimal lenght:
username do
do
name = FFaker.username
while name.length < 5
name
end
Or monkeypatch ffaker and implement it yourself https://github.com/ffaker/ffaker/blob/0578f7fd31c9b485e9c6fa15f25b3eca724790fe/lib/ffaker/internet.rb#L43 + https://github.com/ffaker/ffaker/blob/0578f7fd31c9b485e9c6fa15f25b3eca724790fe/lib/ffaker/name.rb#L75
for example
class FFaker
def long_username(min_length = 5)
fetch_sample(FIRST_NAMES.select {|name| name.length >= min_lenght })
end
end
There're many ways you can achieve this, but if I had to do it, I'd do something like
(FFaker::Internet.user_name + '___')[0...5]
#=> "Lily_"
There are three underscores because after the quick lookup to the name list, I found the minimum length of first name is two characters so two plus three will always be at least five characters.
I'm only taking five character substring so as to not always have trailing underscore, but that's just my personal preference, you can use username plus three underscores and your test case will do fine.
You can't, but you could do FFaker::Name.name.join, this generates first name and middle name
You can also build it manually with regex using FFaker::String like so:
# 5 characters username, with first character being a letter
username { FFaker::String.from_regexp(/[a-zA-Z]\w{4}/) }
# random 5-10 characters username with first character being a letter
regex = Regexp.new("[a-zA-Z]\\w{#{Random.rand(4..9)}}")
username { FFaker::String.from_regexp(regex) }
Another idea is to add some random numbers to the end. This worked for me, was easy to implement, and looks fairly natural (since humans tend to do it when creating usernames in the wild).
E.g. this adds "1234" to the end of usernames:
"steve" + "1234"
and with Faker:
Faker::Internet.unique.user_name + "1234"
Note: if you want a random string (instead of "1234", try some of these approaches. However, it may not be necessary if you're already using the Faker .unique method as in the above example.

Formfield should have 4 numbers followed by 2 letters in Rspec

I am writing an application where a user could fill in their zipcode. In the Netherlands a zipcode has a format of 4 times a number, followed by 2 letters. For example, 1234AB.
In my test I have written so far:
before(:each) do
#zipcode = Zipcode.new
#zipcode.zipcode = "1234AB"
#zipcode.house_number = 2
end
it "should have a valid zipcode" do
#zipcode.zipcode.should_not be_empty
#zipcode.zipcode.should be_a(String)
#zipcode.zipcode.length.should == 6
end
How ca I write the test that it checks if there are 4 numbers followed by 2 letters? And how should I write that in code it self?
You will want to use a Regex (regular expression). For your example, it would be quite simple:
/\A[0-9]{4}[A-Z]{2}\z/
which means: Line start, 4 times a digit, 2 times a uppercase letter, line end.
(I know this might not look simple if you're not familiar with the concept, but just read up on it and it will become clear very quickly. I recommend to read up on regexes in a Ruby context to know about all the quirks of Ruby regular expressions if you don't already.)
To use this regex inside your spec, I think you can use the following:
#zipcode.zipcode.should match(/\A[0-9]{4}[A-Z]{2}\z/)
To use it to actually validate this format inside your model, use this:
validates :zipcode, format: /\A[0-9]{4}[A-Z]{2}\z/

Conditional Regular Expression testing of a CSV

I am doing some client side validation in ASP.NET MVC and I found myself trying to do conditional validation on a set of items (ie, if the checkbox is checked then validate and visa versa). This was problematic, to say the least.
To get around this, I figured that I could "cheat" by having a hidden element that would contain all of the information for each set, thus the idea of a CSV string containing this information.
I already use a custom [HiddenRequired] attribute to validate if the hidden input contains a value, with success, but I thought as I will need to validate each piece of data in the csv, that a regular expression would solve this.
My regular expression work is extremely weak and after a good 2 hours I've almost given up.
This is an example of the csv string:
true,3,24,over,0.5
to explain:
true denotes if I should validate the rest. I need to conditionally switch in the regex using this
3 and 24 are integers and will only ever fall in the range 0-24.
over is a string and will either be over or under
0.5 is a decimal value, of unknown precision.
In the validation, all values should be present and at least of the correct type
Is there someone who can either provide such a regex or at least provide some hints, i'm really stuck!
Try this regex:
#"^(true,([01]?\d|2[0-4]),([01]?\d|2[0-4]),(over|under),\d+\.?\d+|false.*)$"
I'll try to explain it using comments. Feel free to ask if anything is unclear. =)
#"
^ # start of line
(
true, # literal true
([01]?\d # Either 0, 1, or nothing followed by a digit
| # or
2[0-4]), # 20 - 24
([01]?\d|2[0-4]), # again
(over|under), # over or under
\d+\.?\d+ # any number of digits, optional dot, any number of digits
| #... OR ...
false.* # false followed by anything
)
$ # end of line
");
I would probably use a Split(',') and validate elements of the resulting array instead of using a regex. Also you should watch out for the \, case (the comma is part of the value).

How to make a Ruby string safe for a filesystem?

I have user entries as filenames. Of course this is not a good idea, so I want to drop everything except [a-z], [A-Z], [0-9], _ and -.
For instance:
my§document$is°° very&interesting___thisIs%nice445.doc.pdf
should become
my_document_is_____very_interesting___thisIs_nice445_doc.pdf
and then ideally
my_document_is_very_interesting_thisIs_nice445_doc.pdf
Is there a nice and elegant way for doing this?
I'd like to suggest a solution that differs from the old one. Note that the old one uses the deprecated returning. By the way, it's anyway specific to Rails, and you didn't explicitly mention Rails in your question (only as a tag). Also, the existing solution fails to encode .doc.pdf into _doc.pdf, as you requested. And, of course, it doesn't collapse the underscores into one.
Here's my solution:
def sanitize_filename(filename)
# Split the name when finding a period which is preceded by some
# character, and is followed by some character other than a period,
# if there is no following period that is followed by something
# other than a period (yeah, confusing, I know)
fn = filename.split /(?<=.)\.(?=[^.])(?!.*\.[^.])/m
# We now have one or two parts (depending on whether we could find
# a suitable period). For each of these parts, replace any unwanted
# sequence of characters with an underscore
fn.map! { |s| s.gsub /[^a-z0-9\-]+/i, '_' }
# Finally, join the parts with a period and return the result
return fn.join '.'
end
You haven't specified all the details about the conversion. Thus, I'm making the following assumptions:
There should be at most one filename extension, which means that there should be at most one period in the filename
Trailing periods do not mark the start of an extension
Leading periods do not mark the start of an extension
Any sequence of characters beyond A–Z, a–z, 0–9 and - should be collapsed into a single _ (i.e. underscore is itself regarded as a disallowed character, and the string '$%__°#' would become '_' – rather than '___' from the parts '$%', '__' and '°#')
The complicated part of this is where I split the filename into the main part and extension. With the help of a regular expression, I'm searching for the last period, which is followed by something else than a period, so that there are no following periods matching the same criteria in the string. It must, however, be preceded by some character to make sure it's not the first character in the string.
My results from testing the function:
1.9.3p125 :006 > sanitize_filename 'my§document$is°° very&interesting___thisIs%nice445.doc.pdf'
=> "my_document_is_very_interesting_thisIs_nice445_doc.pdf"
which I think is what you requested. I hope this is nice and elegant enough.
From http://web.archive.org/web/20110529023841/http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:
def sanitize_filename(filename)
returning filename.strip do |name|
# NOTE: File.basename doesn't work right with Windows paths on Unix
# get only the filename, not the whole path
name.gsub!(/^.*(\\|\/)/, '')
# Strip out the non-ascii character
name.gsub!(/[^0-9A-Za-z.\-]/, '_')
end
end
In Rails you might also be able to use ActiveStorage::Filename#sanitized:
ActiveStorage::Filename.new("foo:bar.jpg").sanitized # => "foo-bar.jpg"
ActiveStorage::Filename.new("foo/bar.jpg").sanitized # => "foo-bar.jpg"
If you use Rails you can also use String#parameterize. This is not particularly intended for that, but you will obtain a satisfying result.
"my§document$is°° very&interesting___thisIs%nice445.doc.pdf".parameterize
For Rails I found myself wanting to keep any file extensions but using parameterize for the remainder of the characters:
filename = "my§doc$is°° very&itng___thsIs%nie445.doc.pdf"
cleaned = filename.split(".").map(&:parameterize).join(".")
Implementation details and ideas see source: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/inflector/transliterate.rb
def parameterize(string, separator: "-", preserve_case: false)
# Turn unwanted chars into the separator.
parameterized_string.gsub!(/[^a-z0-9\-_]+/i, separator)
#... some more stuff
end
If your goal is just to generate a filename that is "safe" to use on all operating systems (and not to remove any and all non-ASCII characters), then I would recommend the zaru gem. It doesn't do everything the original question specifies, but the filename produced should be safe to use (and still keep any filename-safe unicode characters untouched):
Zaru.sanitize! " what\ēver//wëird:user:înput:"
# => "whatēverwëirduserînput"
Zaru.sanitize! "my§docu*ment$is°° very&interes:ting___thisIs%nice445.doc.pdf"
# => "my§document$is°° very&interesting___thisIs%nice445.doc.pdf"
There is a library that may be helpful, especially if you're interested in replacing weird Unicode characters with ASCII: unidecode.
irb(main):001:0> require 'unidecoder'
=> true
irb(main):004:0> "Grzegżółka".to_ascii
=> "Grzegzolka"

Resources