The Ruby %r{ } expression - ruby-on-rails

In a model there is a field
validates :image_file_name, :format => { :with => %r{\.(gif|jpg|jpeg|png)$}i }
It looks pretty odd to me. I am aware that this is a regular expression, but I would like:
to know what exactly it means. Is %r{value} equal to /value/?
to be able to replace it with the normal Ruby regex literal /some regex/ or =~. Is this possible?

%r{} is equivalent to the /.../ notation, but allows you to have '/' in your regexp without having to escape them:
%r{/home/user}
is equivalent to:
/\/home\/user/
This is only a syntactic convenience, for legibility.
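To check the equivalence yourself, here is a minimal sketch. The constant names and sample paths are made up for illustration; the point is only that both literals accept and reject the same strings:

```ruby
# Two spellings of the same pattern; only the literal syntax differs.
WITH_PERCENT_R = %r{/home/user}
WITH_SLASHES   = /\/home\/user/

# Hypothetical sample paths, chosen only for illustration.
SAMPLE_PATHS = ["/home/user/notes.txt", "/var/log/syslog"]

SAMPLE_PATHS.each do |path|
  puts "#{path}: %r{} => #{WITH_PERCENT_R.match?(path)}, // => #{WITH_SLASHES.match?(path)}"
end
```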
Edit:
Note that you can use almost any non-alphabetic character pair instead of '{}'.
These variants work just as well:
%r!/home/user!
%r'/home/user'
%r(/home/user)
Edit 2:
Note that the %r{}x variant ignores whitespace, making complex regexps more readable. Example from GitHub's Ruby style guide:
regexp = %r{
  start         # some text
  \s            # white space char
  (group)       # first group
  (?:alt1|alt2) # some alternation
  end
}x
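As a rough illustration of how this could apply to the pattern from the question, the sketch below spells the image-extension regex out with the x flag. Note two assumptions of mine: the constant name IMAGE_EXT is invented, and I anchor with \z instead of the question's $:

```ruby
# The image-extension pattern written in extended (x) mode.
# Whitespace and comments inside the literal are ignored by the regex engine.
IMAGE_EXT = %r{
  \.                  # a literal dot
  (gif|jpg|jpeg|png)  # one of the allowed extensions
  \z                  # end of the string
}xi

puts IMAGE_EXT.match?("logo.PNG") # true
puts IMAGE_EXT.match?("logo.pdf") # false
```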

With %r, you can use almost any delimiters.
You could use %r{} or %r[] or %r!! etc.
The benefit of using other delimiters is that you don't need to escape the / used in a normal regex literal.

\. => contains a dot
(gif|jpg|jpeg|png) => then, either one of these extensions
$ => the end of the line, nothing after it on that line (in Ruby, \z is the anchor for the end of the whole string)
i => case insensitive
And it's the same as writing /\.(gif|jpg|jpeg|png)$/i.
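One thing worth trying for yourself is how the end anchor behaves on multi-line input, since in Ruby $ stops at a line break. A small sketch, with invented constant names and sample strings:

```ruby
# Same pattern, two different end anchors.
LINE_END   = /\.(gif|jpg|jpeg|png)$/i   # $  : end of a line
STRING_END = /\.(gif|jpg|jpeg|png)\z/i  # \z : end of the whole string

PLAIN     = "photo.PNG"
MULTILINE = "photo.png\nanything else"

puts LINE_END.match?(PLAIN)         # true
puts STRING_END.match?(PLAIN)       # true
puts LINE_END.match?(MULTILINE)     # true  ($ matches before the newline)
puts STRING_END.match?(MULTILINE)   # false (the string does not end in .png)
```

If the value can ever contain a newline, \z is probably the anchor you want.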

This regexp matches all strings that end with .gif, .jpg...
You could replace it with
/\.(gif|jpg|jpeg|png)$/i

It means that image_file_name must end ($) with a dot and one of gif, jpg, jpeg or png.
Yes, %r{} means exactly the same as // but in %r{} you don't need to escape /.

Related

rails validation with regex

I'm reading Agile Web Development with Rails 6.
In chapter 7, Task B: Validation and Unit Testing
class Product < ApplicationRecord
  validates :image_url, allow_blank: true, format: {
    with: %r{\.(gif|jpg|png)\z}i,
  }
end
What does the i at the end mean here?
It should mean that it's ending with .gif or .jpg or .png
The i in your regex tells it to match case-insensitively. There is nothing really unique to Rails here, so you may want to look into regexes in general to learn all the different modifiers you can use in your expression.
The expression %r{\.(gif|jpg|png)\z}i is equivalent to /\.(gif|jpg|png)\z/i
the \. means the period character
the | is an or as you stated
the \z is end of string with some caveats that you can read more about here: http://www.regular-expressions.info/anchors.html
and the i is case-insensitive matching
This means you would match 'test.jpg', 'test.JPg', 'test.JPG', or any other mix of upper and lower case in one of the three extensions, as long as it is preceded by a period and occurs at the end of the string.
Here are the Ruby-specific docs for regular expressions:
https://ruby-doc.org/2.7.7/Regexp.html
And here is something where you can play with and learn regexes in general and try some expressions yourself:
https://regexr.com
Short explanation:
The "i" at the end of the regular expression is a modifier that makes the expression case-insensitive. This means that it will match both upper and lowercase letters in the image URL.
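To see what the modifier changes, you can compare the pattern with and without it in irb. This is just a sketch; the constant names are mine and the file names are the ones mentioned above:

```ruby
EXT_WITH_I    = %r{\.(gif|jpg|png)\z}i  # the pattern from the book
EXT_WITHOUT_I = %r{\.(gif|jpg|png)\z}   # same pattern, case-sensitive

%w[test.jpg test.JPg test.JPG].each do |name|
  puts "#{name}: with i => #{EXT_WITH_I.match?(name)}, without i => #{EXT_WITHOUT_I.match?(name)}"
end

# Regexp#casefold? reports whether the i flag is set.
puts EXT_WITH_I.casefold?    # true
puts EXT_WITHOUT_I.casefold? # false
```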

Validate some text in a ruby on rails model

I want to validate a text string to make sure it is safe. I do not want to escape it, as I want to display it.
I have tried
validates :description, :format => { :with => /^[\-$ ?!."'\/,a-z0-9]+$/i }
and it works in that it passes text with characters that are allowed and fails when characters not listed above are included.
But Brakeman issues a message that advocates replacing the ^ with \A and the $ with /z. However if I do this, the validator fails all tests.
It is not working because you are using a forward slash (/) instead of a backslash (\). /z matches the characters / and z literally. What you want is \z or \Z, which mean the following:
\z means the end of the string, whereas
\Z means the end of the string, or just before a trailing \n.
So use whichever fits your case best!
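A quick way to see the difference between the two anchors is to feed both variants a string with a trailing newline. The sketch below reuses the character class from the question; the constant names and sample strings are invented:

```ruby
# The question's character class, anchored with \A and either \z or \Z.
STRICT_END  = /\A[\-$ ?!."'\/,a-z0-9]+\z/i  # \z: nothing at all after the match
LENIENT_END = /\A[\-$ ?!."'\/,a-z0-9]+\Z/i  # \Z: tolerates one trailing newline

puts STRICT_END.match?("Hello, world!")   # true
puts LENIENT_END.match?("Hello, world!")  # true
puts STRICT_END.match?("Hello\n")         # false
puts LENIENT_END.match?("Hello\n")        # true
puts STRICT_END.match?("<script>")        # false
```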

Regex to validate string having only characters (not special characters), blank spaces and numbers

I am using Ruby on Rails 3.0.9 and I would like to validate a string that can have only characters (not special characters - case insensitive), blank spaces and numbers.
In my validation code I have:
validates :name,
  :presence => true,
  :format => { :with => regex } # Here I should set the 'regex'
How should I state the regex?
There are a couple of ways of doing this. If you only want to allow ASCII word characters (no accented characters like Ê or letters from other alphabets like Ӕ or ל), use this:
/^[a-zA-Z\d\s]*$/
If you want to allow only numbers and letters, including letters from other languages, on Ruby 1.8.7, use this:
/^(?:[^\W_]|\s)*$/u
If you want to allow only numbers and letters, including letters from other languages, on Ruby 1.9.x, use this (note that this one also lets underscores and hyphens through):
/^[\p{Word}\w\s-]*$/
Also, if you are planning to use 1.9.x regex with unicode support in Ruby on Rails, add this line at the beginning of your .rb file:
# coding: utf-8
You're looking for:
[a-zA-Z0-9\s]+
The + says one or more so it'll not match empty string. If you need to match them as well, use * in place of +.
In addition to what has been said, assign any of these regular expressions to the regex variable in your code, for instance:
regex = /^[a-zA-Z\d\s]*$/
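Outside of Rails, the same check can be tried in plain Ruby. The sketch below is only an illustration: valid_name? and both constants are hypothetical names, I anchor with \A and \z rather than ^ and $, and \p{Alnum} is my own substitute for the Unicode-aware variants above (it excludes underscore and hyphen):

```ruby
ASCII_NAME   = /\A[a-zA-Z\d\s]*\z/    # ASCII letters, digits, whitespace
UNICODE_NAME = /\A[\p{Alnum}\s]*\z/   # letters and digits from any script, whitespace

# Hypothetical helper: true when the whole string fits the given pattern.
def valid_name?(name, regex = ASCII_NAME)
  regex.match?(name)
end

puts valid_name?("John Smith 3")              # true
puts valid_name?("John_Smith")                # false
puts valid_name?("Jos\u00E9")                 # false (é is not ASCII)
puts valid_name?("Jos\u00E9", UNICODE_NAME)   # true
```

Because of the *, an empty string passes both patterns; in the validation above that case is already caught by :presence => true.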

Regular expression for valid subdomain in Ruby

I'm attempting to validate a string of user input that will be used as a subdomain. The rules are as follows:
Between 1 and 63 characters in length (I take 63 from the number of characters Google Chrome appears to allow in a subdomain, not sure if it's actually a server directive. If you have better advice on valid max length, I'm interested in hearing it)
May contain a-zA-Z0-9, hyphen, underscore
May not begin or end with a hyphen or underscore
EDIT: From input below, I've added the following:
4. Should not contain consecutive hyphens or underscores.
Examples:
a => valid
0 => valid
- => not valid
_ => not valid
a- => not valid
-a => not valid
a_ => not valid
_a => not valid
aa => valid
aaa => valid
a-a-a => valid
0-a => valid
a&a => not valid
a-_0 => not valid
a--a => not valid
aaa- => not valid
My issue is I'm not sure how to specify with a RegEx that the string is allowed to be only one character, while also specifying that it may not begin or end with a hyphen or underscore.
Thanks!
You can't have underscores in proper subdomains, but do you need them? After trimming your input, do a simple string length check, then test with this:
/^[a-z\d]+(-[a-z\d]+)*$/i
With the above, you won't get consecutive - characters, e.g. a-bbb-ccc passes and a--d fails.
/^[a-z\d]+([-_][a-z\d]+)*$/i
Will allow non-consecutive underscores as well.
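Putting the length check and the second pattern together might look roughly like this. It is a sketch only: valid_subdomain? is a made-up helper name, and I swapped ^ and $ for \A and \z so that the whole string is tested:

```ruby
SUBDOMAIN = /\A[a-z\d]+([-_][a-z\d]+)*\z/i

# Hypothetical helper: length check first, then the pattern.
def valid_subdomain?(input)
  input.length.between?(1, 63) && SUBDOMAIN.match?(input)
end

# The examples from the question.
%w[a 0 - _ a- -a a_ _a aa aaa a-a-a 0-a a&a a-_0 a--a aaa-].each do |s|
  puts "#{s} => #{valid_subdomain?(s) ? 'valid' : 'not valid'}"
end
```

The single-character case works because the group after the first [a-z\d]+ is optional.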
Update: you'll find that, in practice, underscores are disallowed and all subdomains must start with a letter. The solution above does not allow internationalised subdomains (punycode). You're better off using this
/\A([a-z][a-z\d]*(-[a-z\d]+)*|xn--[\-a-z\d]+)\z/i
I'm not familiar with Ruby regex syntax, but I'll assume it's like, say, Perl. Sounds like you want:
/^(?![-_])[-a-z\d_]{1,63}(?<![-_])$/i
Or if Ruby doesn't use the i flag, just replace [-a-z\d_] with [-a-zA-Z\d_].
The reason I'm using [-a-zA-Z\d_] instead of the shorter [-\w] is that, while nearly equivalent, \w will allow special characters such as ä rather than just ASCII-type characters. That behavior can be optionally turned off in most languages, or you can allow it if you like.
Some more information on character classes, quantifiers, and lookarounds
/^([a-z0-9][a-z0-9\-\_]{0,61}[a-z0-9]|[a-z0-9])$/i
I took it as a challenge to create a regex that should match only strings with non-repeating hyphens or underscores and also check the proper length for you:
/^([a-z0-9]([_\-](?![_\-])|[a-z0-9]){0,61}[a-z0-9]|[a-z0-9])$/i
The middle part uses a lookaround to verify that.
^[a-zA-Z]([-a-zA-Z\d]*[a-zA-Z\d])?$
This simply enforces the standard in an efficient way without backtracking. It does not check the length, but regex is inefficient at things like that. Just check the string length (1 to 63 chars).
/[^\W\_](.+?)[^\W\_]$/i should work for ya (try out http://rubular.com/ to test regular expressions)
EDIT: actually, this doesn't handle strings of only one or two letters/numbers. Try /([^\W\_](.+?)[^\W\_])|([a-z0-9]{1,2})/i instead, and tinker with it in rubular until you get exactly what ya want (if this doesn't take care of it already).

How to make a Ruby string safe for a filesystem?

I have user entries as filenames. Of course this is not a good idea, so I want to drop everything except [a-z], [A-Z], [0-9], _ and -.
For instance:
my§document$is°° very&interesting___thisIs%nice445.doc.pdf
should become
my_document_is_____very_interesting___thisIs_nice445_doc.pdf
and then ideally
my_document_is_very_interesting_thisIs_nice445_doc.pdf
Is there a nice and elegant way for doing this?
I'd like to suggest a solution that differs from the old one. Note that the old one uses the deprecated returning. By the way, it's anyway specific to Rails, and you didn't explicitly mention Rails in your question (only as a tag). Also, the existing solution fails to encode .doc.pdf into _doc.pdf, as you requested. And, of course, it doesn't collapse the underscores into one.
Here's my solution:
def sanitize_filename(filename)
  # Split the name when finding a period which is preceded by some
  # character, and is followed by some character other than a period,
  # if there is no following period that is followed by something
  # other than a period (yeah, confusing, I know)
  fn = filename.split(/(?<=.)\.(?=[^.])(?!.*\.[^.])/m)
  # We now have one or two parts (depending on whether we could find
  # a suitable period). For each of these parts, replace any unwanted
  # sequence of characters with an underscore
  fn.map! { |s| s.gsub(/[^a-z0-9\-]+/i, '_') }
  # Finally, join the parts with a period and return the result
  return fn.join('.')
end
You haven't specified all the details about the conversion. Thus, I'm making the following assumptions:
There should be at most one filename extension, which means that there should be at most one period in the filename
Trailing periods do not mark the start of an extension
Leading periods do not mark the start of an extension
Any sequence of characters beyond A–Z, a–z, 0–9 and - should be collapsed into a single _ (i.e. underscore is itself regarded as a disallowed character, and the string '$%__°#' would become '_' – rather than '___' from the parts '$%', '__' and '°#')
The complicated part of this is where I split the filename into the main part and extension. With the help of a regular expression, I'm searching for the last period, which is followed by something else than a period, so that there are no following periods matching the same criteria in the string. It must, however, be preceded by some character to make sure it's not the first character in the string.
My results from testing the function:
1.9.3p125 :006 > sanitize_filename 'my§document$is°° very&interesting___thisIs%nice445.doc.pdf'
=> "my_document_is_very_interesting_thisIs_nice445_doc.pdf"
which I think is what you requested. I hope this is nice and elegant enough.
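If the lookaround-based split feels too dense, a similar result can probably be reached with File.extname and File.basename from the standard library. This is an alternative sketch, not the function above: sanitize_with_extname is a hypothetical name, and unlike the original it also drops any directory part, because File.basename does that.

```ruby
# Hypothetical alternative: let File.extname find the last extension,
# then collapse every run of disallowed characters into one underscore.
def sanitize_with_extname(filename)
  ext   = File.extname(filename)        # e.g. ".pdf" for "a.doc.pdf"
  base  = File.basename(filename, ext)  # e.g. "a.doc"
  clean = ->(part) { part.gsub(/[^a-z0-9\-]+/i, "_") }

  return clean.call(base) if ext.empty?
  "#{clean.call(base)}.#{clean.call(ext[1..-1])}"
end

puts sanitize_with_extname("my$doc is.very&nice.doc.pdf")
# => my_doc_is_very_nice_doc.pdf
```

Edge cases such as a trailing period depend on how File.extname treats them in your Ruby version, so test those separately if they matter.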
From http://web.archive.org/web/20110529023841/http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:
def sanitize_filename(filename)
  returning filename.strip do |name|
    # NOTE: File.basename doesn't work right with Windows paths on Unix
    # get only the filename, not the whole path
    name.gsub!(/^.*(\\|\/)/, '')
    # Strip out the non-ascii character
    name.gsub!(/[^0-9A-Za-z.\-]/, '_')
  end
end
In Rails you might also be able to use ActiveStorage::Filename#sanitized:
ActiveStorage::Filename.new("foo:bar.jpg").sanitized # => "foo-bar.jpg"
ActiveStorage::Filename.new("foo/bar.jpg").sanitized # => "foo-bar.jpg"
If you use Rails you can also use String#parameterize. This is not particularly intended for that, but you will obtain a satisfying result.
"my§document$is°° very&interesting___thisIs%nice445.doc.pdf".parameterize
For Rails I found myself wanting to keep any file extensions but using parameterize for the remainder of the characters:
filename = "my§doc$is°° very&itng___thsIs%nie445.doc.pdf"
cleaned = filename.split(".").map(&:parameterize).join(".")
For implementation details and ideas, see the source: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/inflector/transliterate.rb
def parameterize(string, separator: "-", preserve_case: false)
  # Turn unwanted chars into the separator.
  parameterized_string.gsub!(/[^a-z0-9\-_]+/i, separator)
  #... some more stuff
end
If your goal is just to generate a filename that is "safe" to use on all operating systems (and not to remove any and all non-ASCII characters), then I would recommend the zaru gem. It doesn't do everything the original question specifies, but the filename produced should be safe to use (and still keep any filename-safe unicode characters untouched):
Zaru.sanitize! " what\ēver//wëird:user:înput:"
# => "whatēverwëirduserînput"
Zaru.sanitize! "my§docu*ment$is°° very&interes:ting___thisIs%nice445.doc.pdf"
# => "my§document$is°° very&interesting___thisIs%nice445.doc.pdf"
There is a library that may be helpful, especially if you're interested in replacing weird Unicode characters with ASCII: unidecode.
irb(main):001:0> require 'unidecoder'
=> true
irb(main):004:0> "Grzegżółka".to_ascii
=> "Grzegzolka"
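If adding a gem is not an option, Ruby's built-in String#unicode_normalize gets part of the way there. This is a rough sketch and not a replacement for the library above: strip_accents is a made-up helper name, and it only removes combining marks, so letters without a decomposition (such as ł) pass through unchanged.

```ruby
# Hypothetical helper: decompose accented letters (NFD), then drop the
# combining marks (Unicode category Mn) that the decomposition produced.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# "Grzegżółka" written with escapes so the example does not depend on file encoding.
puts strip_accents("Grzeg\u017C\u00F3\u0142ka")  # ż and ó lose their marks, ł stays
```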
