Assistance with Some Interesting Syntax in Some Ruby Code I've Found

I'm currently reading Agile Web Development With Rails, 3rd edition. On page 672, I came across this method:
def capitalize_words(string)
string.gsub(/\b\w/) { $&.upcase }
end
What is the code in the block doing? I have never seen that syntax. Is it similar to the array.map(&:some_method) syntax?

It's Title Casing The Input. Inside the block, $& is a built-in variable holding the current match (\b\w, i.e. the first letter of each word), which is then uppercased.

You've touched on one of the few things I don't like about Ruby :)
The magic variable $& contains the matched string from the previous successful pattern match. So in this case, it'll be the first character of each word.
This is mentioned in the RDoc for String#gsub:
http://ruby-doc.org/core/classes/String.html#M000817
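If $& feels too cryptic, the same match is also reachable through Regexp.last_match. Here is a minimal sketch of the book's method written that way; the method name capitalize_words_explicit is made up for this illustration.

```ruby
# Same behavior as the book's method, with the match spelled out.
# capitalize_words_explicit is a hypothetical name used only for this example.
def capitalize_words_explicit(string)
  string.gsub(/\b\w/) do
    # Regexp.last_match(0) is the full text of the current match, like $&.
    Regexp.last_match(0).upcase
  end
end

p capitalize_words_explicit("agile web development")
# => "Agile Web Development"
```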

gsub replaces everything that matched the regex with the result of the block. So yes, in this case you're matching the first letter of each word, then replacing it with the upcased version.
As to the slightly bizarre syntax inside the block, this is equivalent (and perhaps easier to understand):
def capitalize_words(string)
string.gsub(/\b\w/) {|x| x.upcase}
end
Or even slicker:
def capitalize_words(string)
string.gsub(/\b\w/, &:upcase)
end
As to the regex (courtesy of the pickaxe book), \b matches a word boundary, and \w any 'word character' (alphanumerics and underscore). So \b\w matches the first character of each word.
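To see that the three spellings really behave the same, here is a quick side-by-side sketch. The method names are invented for the comparison. It also shows one caveat of \b\w: an apostrophe creates a word boundary too, so the letter after it gets upcased.

```ruby
# Three spellings of the same substitution; the method names are made up for this comparison.
def cap_magic(string)
  string.gsub(/\b\w/) { $&.upcase }
end

def cap_block_arg(string)
  string.gsub(/\b\w/) { |x| x.upcase }
end

def cap_symbol(string)
  string.gsub(/\b\w/, &:upcase)
end

p cap_magic("hello big world")     # => "Hello Big World"
p cap_block_arg("hello big world") # => "Hello Big World"
p cap_symbol("hello big world")    # => "Hello Big World"

# Caveat: the apostrophe is not a word character, so "t" starts a new "word".
p cap_symbol("don't stop")         # => "Don'T Stop"
```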

Related

How to remove from string before __

I am building a Rails 5.2 app.
In this app I got outputs from different suppliers (I am building a webshop).
The name of the shipping provider is in this format:
dhl_freight__233433
It could also be in this format:
postal__US-320202
How can I remove everything before (and including) the __ so that all that remains is what comes after the __, for example 233433?
Perhaps some sort of RegEx.
A very simple approach would be to use String#split and then pick the second part that is the last part in this example:
"dhl_freight__233433".split('__').last
#=> "233433"
"postal__US-320202".split('__').last
#=> "US-320202"
You can use a very simple Regexp and ask the resulting MatchData for the post_match part:
p "dhl_freight__233433".match(/__/).post_match
# another (magic) way to access the post_match part:
p $'
Postscript: Learnt something from this question myself: you don't even have to use a RegExp for this to work. Just "asddfg__qwer".match("__").post_match does the trick (it does the conversion to regexp for you)
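One thing to watch with the match approach: match returns nil when the string contains no __, so calling post_match on the result raises NoMethodError. A minimal sketch of a nil-safe version, assuming Ruby 2.3+ for the &. operator; the helper name after_double_underscore is hypothetical.

```ruby
# after_double_underscore is a hypothetical helper name.
# Returns the text after the first "__", or nil when there is no "__" at all.
def after_double_underscore(name)
  # match returns nil on failure; &. skips post_match in that case (Ruby 2.3+).
  name.match(/__/)&.post_match
end

p after_double_underscore("dhl_freight__233433") # => "233433"
p after_double_underscore("postal__US-320202")   # => "US-320202"
p after_double_underscore("no_separator_here")   # => nil
```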
r = /[^_]+\z/
"dhl_freight__233433"[r] #=> "233433"
"postal__US-320202"[r] #=> "US-320202"
The regular expression matches one or more characters other than an underscore, followed by the end of the string (\z). The ^ at the beginning of the character class reads, "other than any of the characters that follow".
See String#[].
This assumes that the last underscore is preceded by another underscore. If it might not be, and in that case there should be no match, add a positive lookbehind:
r = /(?<=__)[^_]+\z/
This requires the match to be preceded by two underscores.
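A short sketch of the difference between the plain pattern and one with a (?<=__) lookbehind in front of it. The helper names are made up for the comparison.

```ruby
# Helper names are hypothetical; each wraps one of the two patterns.
def tail(name)
  name[/[^_]+\z/]          # trailing run of non-underscore characters
end

def tail_after_double_underscore(name)
  name[/(?<=__)[^_]+\z/]   # same, but only when "__" comes right before it
end

p tail("dhl_freight__233433")                         # => "233433"
p tail_after_double_underscore("dhl_freight__233433") # => "233433"

# With a single underscore before the tail, only the lookbehind version rejects the string.
p tail("dhl_freight_233433")                          # => "233433"
p tail_after_double_underscore("dhl_freight_233433")  # => nil
```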
There are many ruby ways to extract numbers from string. I hope you're trying to fetch numbers out of a string. Here are some of the ways to do so.
Ref- http://www.ruby-forum.com/topic/125709
line.delete("^0-9")
line.scan(/\d/).join('')
line.tr("^0-9", '')
Of the above, delete is the fastest way to pull the digits out of a string.
All of the above extract the digits from the string and join them. If the string is "String-with-67829___numbers-09764", the output would be "6782909764".
If you want the numbers split up like this instead, ["67829", "09764"]:
line.split(/[^\d]/).reject { |c| c.empty? }
Hope these answers help you! Happy coding :-)
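For reference, a small sketch of both results on the sample string. The helper names are hypothetical, and scan(/\d+/) is an alternative I'm swapping in for the split-and-reject version, since it collects each run of digits directly.

```ruby
# Helper names are hypothetical; both are one-liners over core String methods.
def digits_joined(line)
  line.delete("^0-9")   # drop every character that is not 0-9
end

def digit_groups(line)
  line.scan(/\d+/)      # collect each run of consecutive digits
end

sample = "String-with-67829___numbers-09764"
p digits_joined(sample) # => "6782909764"
p digit_groups(sample)  # => ["67829", "09764"]
```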

Rails strip all except numbers commas and decimal points

Hi I've been struggling with this for the last hour and am no closer. How exactly do I strip everything except numbers, commas and decimal points from a rails string? The closest I have so far is:-
rate = rate.gsub!(/[^0-9]/i, '')
This strips everything but the numbers. When I try to add commas to the expression, everything gets stripped. I got the above from somewhere else, and as far as I can gather:
^ = not
Everything to the left of the comma gets replaced by what's in the '' on the right
No idea what the /i does
I'm very new to gsub. Does anyone know of a good tutorial on building expressions?
Thanks
Try:
rate = rate.gsub(/[^0-9,\.]/, '')
Basically, ^ means "not" when it appears at the start of the character class brackets [] you are using, so you can just add the comma and the period to the list. The backslash before the period is optional here: inside a character class a period is already literal, while outside one it is a special character that means "match any character".
Also, be aware of whether you are using gsub or gsub!
gsub! has the bang, so it edits the instance of the string you're passing in, rather than returning another one.
So if using gsub! it would be:
rate.gsub!(/[^0-9,\.]/, '')
And rate would be altered.
If you do not want to alter the original variable, then you can use the version without the bang (and assign the result to a different var):
cleaned_rate = rate.gsub(/[^0-9,\.]/, '')
I'd just google for tutorials. I haven't used one. Regexes are a LOT of time and trial and error (and table-flipping).
This is a cool tool to use with a mini cheat-sheet on it for ruby that allows you to quickly edit and test your expression:
http://rubular.com/
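One more gsub vs gsub! detail worth knowing: gsub! returns nil when it had nothing to replace, so writing rate = rate.gsub!(...) can leave rate set to nil. A minimal sketch; clean_rate is a hypothetical helper name.

```ruby
# clean_rate is a hypothetical helper: keep only digits, commas and periods.
def clean_rate(rate)
  rate.gsub(/[^0-9,.]/, '')   # non-destructive; always returns a String
end

p clean_rate("$1,234.50 USD") # => "1,234.50"
p clean_rate("1,234.50")      # => "1,234.50"

# gsub! modifies the receiver and returns nil when there was nothing to replace.
already_clean = "1,234.50".dup
p already_clean.gsub!(/[^0-9,.]/, '') # => nil
p already_clean                       # => "1,234.50"
```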
You can just add the comma and period in the square-bracketed expression:
rate.gsub(/[^0-9,.]/, '')
You don't need the i for case-insensitivity for numbers and symbols.
There's lots of info on regular expressions, regex, etc. Maybe search for those instead of gsub.
You can use this:
rate = rate.gsub(/[^0-9\.\,]/, '')
Also check this out to learn more about regular expressions:
http://www.regexr.com/

regex basic url expression

Hi I'm creating a regular expression (ruby) to test the beginning and end of string. I have both parts but can't join them.
Beginning of string
\A(http:\/\/+)
End of string
(.pdf)\z
How to join?
Bonus if it could validate in-between and accept anything (to avoid http://.pdf)
By the way, rubular http://rubular.com is a neat place to validate expressions
Use .+ to match any character except \n one or more times.
\A(http:\/\/+).+(\.pdf)\z
Should match http://www.stackoverflow.com/bestbook.pdf but not http://.pdf
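Here is how that could look as a small Ruby check. This is a sketch, not the only way to write it: pdf_url? is a hypothetical name, I've dropped the capture groups and the + after the second slash since they aren't needed for a yes/no test, and match? assumes Ruby 2.4+.

```ruby
# pdf_url? is a hypothetical helper name; it only accepts plain http:// URLs, as in the question.
def pdf_url?(url)
  # \A and \z anchor the whole string; .+ demands at least one character before ".pdf".
  url.match?(/\Ahttp:\/\/.+\.pdf\z/)
end

p pdf_url?("http://www.stackoverflow.com/bestbook.pdf") # => true
p pdf_url?("http://.pdf")                               # => false
p pdf_url?("http://example.com/file.pdf.exe")           # => false
```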

Retaining the pattern characters while splitting via Regex, Ruby

I have the following string
str="HelloWorld How areYou I AmFine"
I want this string into the following array
["Hello","World How are","You I Am", "Fine"]
I have been using the following regex. It splits in the right places, but it also drops the characters that matched the pattern, and I want to keep them.
What I get is
str.split(/[a-z][A-Z]/)
=> ["Hell", "orld How ar", "ou I A", "ine"]
It omits the matching pattern.
Can anyone help me retain these characters in the resulting array?
In Ruby 1.9 you can use positive lookahead and positive lookbehind (lookahead and lookbehind regex constructs are also called zero-width assertions). They check the surrounding characters without consuming them, so the split happens at a zero-width position and you won't lose your border characters:
str.split /(?<=[a-z])(?=[A-Z])/
=> ["Hello", "World How are", "You I Am", "Fine"]
Ruby 1.8 does not support lookbehind (only lookahead), so this split won't work there. I recommend using Ruby 1.9 if possible.
If you are forced to use ruby 1.8.7, I think regex won't help you and the best solution I can think of is to build a simple state machine: iterate over each character in your original string and build first string until you encounter border condition. Then build second string etc.
Three answers so far, each with a limitation: one is Rails-only and breaks with an underscore in the original string, another is Ruby 1.9 only, and the third always has a potential error with its special character. I really liked the split on zero-width assertion answer from @Alex Kliuchnikau, but the OP needs Ruby 1.8, which doesn't support lookbehind. Here's an answer that uses only zero-width lookahead and works fine in 1.8 and 1.9, using String#scan instead of #split.
str.scan /.*?[a-z](?=[A-Z]|$)/
=> ["Hello", "World How are", "You I Am", "Fine"]
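The split and scan versions can be compared side by side. A sketch with hypothetical helper names; note that the two are not fully interchangeable, because the scan pattern needs each piece to end in a lowercase letter.

```ruby
# Helper names are hypothetical.
def split_keep_borders(str)
  # Split at the empty position between a lowercase and an uppercase letter.
  str.split(/(?<=[a-z])(?=[A-Z])/)
end

def scan_keep_borders(str)
  # Lookahead-only variant: take text up to a lowercase letter that is
  # followed by an uppercase letter or the end of the line.
  str.scan(/.*?[a-z](?=[A-Z]|$)/)
end

str = "HelloWorld How areYou I AmFine"
p split_keep_borders(str) # => ["Hello", "World How are", "You I Am", "Fine"]
p scan_keep_borders(str)  # => ["Hello", "World How are", "You I Am", "Fine"]

# Difference: a trailing all-uppercase piece survives split but is dropped by scan.
p split_keep_borders("fooBAR") # => ["foo", "BAR"]
p scan_keep_borders("fooBAR")  # => ["foo"]
```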
I think this will do the job for you
str.underscore.split(/_/).each do |s|
s.capitalize!
end

How to make a Ruby string safe for a filesystem?

I have user entries as filenames. Of course this is not a good idea, so I want to drop everything except [a-z], [A-Z], [0-9], _ and -.
For instance:
my§document$is°° very&interesting___thisIs%nice445.doc.pdf
should become
my_document_is_____very_interesting___thisIs_nice445_doc.pdf
and then ideally
my_document_is_very_interesting_thisIs_nice445_doc.pdf
Is there a nice and elegant way for doing this?
I'd like to suggest a solution that differs from the old one. Note that the old one uses the deprecated returning. By the way, it's anyway specific to Rails, and you didn't explicitly mention Rails in your question (only as a tag). Also, the existing solution fails to encode .doc.pdf into _doc.pdf, as you requested. And, of course, it doesn't collapse the underscores into one.
Here's my solution:
def sanitize_filename(filename)
# Split the name when finding a period which is preceded by some
# character, and is followed by some character other than a period,
# if there is no following period that is followed by something
# other than a period (yeah, confusing, I know)
fn = filename.split /(?<=.)\.(?=[^.])(?!.*\.[^.])/m
# We now have one or two parts (depending on whether we could find
# a suitable period). For each of these parts, replace any unwanted
# sequence of characters with an underscore
fn.map! { |s| s.gsub /[^a-z0-9\-]+/i, '_' }
# Finally, join the parts with a period and return the result
return fn.join '.'
end
You haven't specified all the details about the conversion. Thus, I'm making the following assumptions:
There should be at most one filename extension, which means that there should be at most one period in the filename
Trailing periods do not mark the start of an extension
Leading periods do not mark the start of an extension
Any sequence of characters beyond A–Z, a–z, 0–9 and - should be collapsed into a single _ (i.e. underscore is itself regarded as a disallowed character, and the string '$%__°#' would become '_' – rather than '___' from the parts '$%', '__' and '°#')
The complicated part of this is where I split the filename into the main part and extension. With the help of a regular expression, I'm searching for the last period, which is followed by something else than a period, so that there are no following periods matching the same criteria in the string. It must, however, be preceded by some character to make sure it's not the first character in the string.
My results from testing the function:
1.9.3p125 :006 > sanitize_filename 'my§document$is°° very&interesting___thisIs%nice445.doc.pdf'
=> "my_document_is_very_interesting_thisIs_nice445_doc.pdf"
which I think is what you requested. I hope this is nice and elegant enough.
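The assumptions listed above can be checked against a few edge cases. This sketch restates the same regex approach with explicit parentheses (which avoids Ruby's ambiguous-argument warnings); the name sanitize_filename_checked is made up to mark it as a restatement.

```ruby
# Same approach as the answer, written with explicit parentheses.
# sanitize_filename_checked is a hypothetical name for this restatement.
def sanitize_filename_checked(filename)
  # Split only at the last period that has a character before it and a non-period after it.
  parts = filename.split(/(?<=.)\.(?=[^.])(?!.*\.[^.])/m)
  # Collapse every run of disallowed characters into one underscore, then rejoin.
  parts.map { |s| s.gsub(/[^a-z0-9\-]+/i, '_') }.join('.')
end

p sanitize_filename_checked('archive.tar.gz') # => "archive_tar.gz"
p sanitize_filename_checked('.bashrc')        # => "_bashrc"  (leading period is not an extension)
p sanitize_filename_checked('report.')        # => "report_"  (trailing period is not an extension)
```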
From http://web.archive.org/web/20110529023841/http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:
def sanitize_filename(filename)
returning filename.strip do |name|
# NOTE: File.basename doesn't work right with Windows paths on Unix
# get only the filename, not the whole path
name.gsub!(/^.*(\\|\/)/, '')
# Strip out the non-ascii character
name.gsub!(/[^0-9A-Za-z.\-]/, '_')
end
end
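Since returning is no longer available in current Rails, here is a plain-Ruby sketch of the same two gsub steps. It is an adaptation, not the blog's code, and sanitize_basename is a hypothetical name. Unlike the answer above, it replaces each bad character one-for-one and does not collapse runs.

```ruby
# sanitize_basename is a hypothetical name; same two gsub steps, no `returning`.
def sanitize_basename(filename)
  name = filename.strip
  # Keep only the part after the last slash or backslash.
  name = name.gsub(/^.*(\\|\/)/, '')
  # Replace every character outside 0-9, A-Z, a-z, period and hyphen with an underscore.
  name.gsub(/[^0-9A-Za-z.\-]/, '_')
end

p sanitize_basename(' C:\docs\my file.txt ') # => "my_file.txt"
p sanitize_basename('/tmp/my§doc.pdf')       # => "my_doc.pdf"
```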
In Rails you might also be able to use ActiveStorage::Filename#sanitized:
ActiveStorage::Filename.new("foo:bar.jpg").sanitized # => "foo-bar.jpg"
ActiveStorage::Filename.new("foo/bar.jpg").sanitized # => "foo-bar.jpg"
If you use Rails you can also use String#parameterize. This is not particularly intended for that, but you will obtain a satisfying result.
"my§document$is°° very&interesting___thisIs%nice445.doc.pdf".parameterize
For Rails I found myself wanting to keep any file extensions but using parameterize for the remainder of the characters:
filename = "my§doc$is°° very&itng___thsIs%nie445.doc.pdf"
cleaned = filename.split(".").map(&:parameterize).join(".")
For implementation details and ideas, see the source: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/inflector/transliterate.rb
def parameterize(string, separator: "-", preserve_case: false)
# Turn unwanted chars into the separator.
parameterized_string.gsub!(/[^a-z0-9\-_]+/i, separator)
#... some more stuff
end
If your goal is just to generate a filename that is "safe" to use on all operating systems (and not to remove any and all non-ASCII characters), then I would recommend the zaru gem. It doesn't do everything the original question specifies, but the filename produced should be safe to use (and still keep any filename-safe unicode characters untouched):
Zaru.sanitize! " what\ēver//wëird:user:înput:"
# => "whatēverwëirduserînput"
Zaru.sanitize! "my§docu*ment$is°° very&interes:ting___thisIs%nice445.doc.pdf"
# => "my§document$is°° very&interesting___thisIs%nice445.doc.pdf"
There is a library that may be helpful, especially if you're interested in replacing weird Unicode characters with ASCII: unidecode.
irb(main):001:0> require 'unidecoder'
=> true
irb(main):004:0> "Grzegżółka".to_ascii
=> "Grzegzolka"
