How to combine Ruby regexp conditions - ruby-on-rails

I need to check if a string is a valid image URL.
I want to check the beginning and the end of the string as follows:
Must start with http(s):
Must end with .jpg|.png|.gif|.jpeg
So far I have:
(https?:)
I can't seem to indicate beginning of string \A, combine patterns, and test end of string.
Test strings:
"http://image.com/a.jpg"
"https://image.com/a.jpg"
"ssh://image.com/a.jpg"
"http://image.com/a.jpeg"
"https://image.com/a.png"
"ssh://image.com/a.jpeg"
Please see http://rubular.com/r/PqERRim5RQ
Using Ruby 2.5

Using your very own demo, you could use
^https?:\/\/.*(?:\.jpg|\.png|\.gif|\.jpeg)$
See the modified demo.
One could even simplify it to:
^https?:\/\/.*\.(?:jpe?g|png|gif)$
See a demo for the latter as well.
This uses anchors (^ and $) on both sides, marking the start and end of the input. Additionally, remember to escape the dot (\.) if you want to match a literal . character.
There's quite some ambiguity going on in the comments section, so let me clarify this:
^ - matches at the start of a line
(in Ruby, ^ and $ always match at line boundaries; the /m flag only makes . match newlines)
$ - matches at the end of a line
\A - is the very start of a string (irrespective of multilines)
\z - is the very end of a string (irrespective of multilines)
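To make the difference concrete, here is a minimal sketch. The two-line input string is a made-up example, not one of the test strings from the question; it only shows how a line-anchored pattern can accept input that a string-anchored pattern rejects.

```ruby
# Hypothetical input: the first line is not an image URL, the second one is.
input = "javascript:alert(1)\nhttp://image.com/a.jpg"

line_anchored   = /^https?:\/\/.*\.(?:jpe?g|png|gif)$/
string_anchored = /\Ahttps?:\/\/.*\.(?:jpe?g|png|gif)\z/

# ^ and $ are satisfied by the second line alone.
line_result = line_anchored.match?(input)
# \A and \z require the whole string to be the URL.
string_result = string_anchored.match?(input)

puts "line anchors:   #{line_result}"
puts "string anchors: #{string_result}"
```

For validation, the string anchors are usually the safer choice, since the check then covers the entire value rather than any one line of it.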

You may use
reg = %r{\Ahttps?://.*\.(?:png|gif|jpe?g)\z}
The point is:
When testing at online regex testers, you are testing a single multiline string, but in real life, you will validate lines as separate strings. So, in those testers, use ^ and $ and in real code, use \A and \z.
To match a string rather than a line, you need the \A and \z anchors.
Use %r{pat} syntax if you have many / characters in your pattern; it is cleaner.
Online Ruby test:
urls = ['http://image.com/a.jpg',
'https://image.com/a.jpg',
'ssh://image.com/a.jpg',
'http://image.com/a.jpeg',
'https://image.com/a.png',
'ssh://image.com/a.jpeg']
reg = %r{\Ahttps?://.*\.(?:png|gif|jpe?g)\z}
urls.each { |url|
puts "#{url}: #{(reg =~ url) == 0}"
}
Output:
http://image.com/a.jpg: true
https://image.com/a.jpg: true
ssh://image.com/a.jpg: false
http://image.com/a.jpeg: true
https://image.com/a.png: true
ssh://image.com/a.jpeg: false

The answers here are quite good, but if you wanted to avoid using a complicated regex and communicate your intent more clearly to a reader, you could let URI and File do the heavy lifting for you.
(And since you're using 2.5, let's use #match? instead of other regex-matching methods.)
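require 'uri'

def valid_url?(url)
  # Let URI parse the URL.
  uri = URI.parse(url)
  # Is the scheme exactly http or https, and is the extension one of the expected formats?
  # Both patterns are anchored so that a scheme like "xhttp" or an extension like ".jpgx"
  # does not slip through; to_s guards against a missing scheme or path (e.g. a bare "a.jpg").
  uri.scheme.to_s.match?(/\Ahttps?\z/i) && File.extname(uri.path.to_s).match?(/\A\.(png|jpe?g|gif)\z/i)
rescue URI::InvalidURIError
  # If it's an invalid URL, URI will raise this error.
  # We'll return `false`, because a URL that can't be parsed by URI isn't valid.
  false
end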
urls.map { |url| [url, valid_url?(url)] }
#=> Results in:
'http://image.com/a.jpg', true
'https://image.com/a.jpg', true
'ssh://image.com/a.jpg', false
'http://image.com/a.jpeg', true
'https://image.com/a.png', true
'ssh://image.com/a.jpeg', false
'https://image.com/a.tif', false
'http://t.co.uk/proposal.docx', false
'not a url', false
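One practical difference between the two approaches shows up when the URL carries a query string. The sketch below uses a made-up URL (not from the question) and compares the string-anchored regex from the earlier answer with a check on the parsed path; treat it as an illustration rather than a recommendation of either.

```ruby
require 'uri'

# Hypothetical URL with a query string after the file name.
url = 'http://image.com/a.jpg?size=large'

# The pure regex requires the string itself to end with the extension.
regex_result = %r{\Ahttps?://.*\.(?:png|gif|jpe?g)\z}.match?(url)

# URI separates the path from the query, so the extension is checked on "/a.jpg" only.
uri = URI.parse(url)
uri_result = uri.scheme.to_s.match?(/\Ahttps?\z/i) &&
             File.extname(uri.path.to_s).match?(/\A\.(?:png|gif|jpe?g)\z/i)

puts "regex: #{regex_result}"
puts "URI:   #{uri_result}"
```

Whether such URLs should count as valid depends on your requirements; the point is only that the two checks are not equivalent.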

Related

Jenkins/Groovy: How to pull specific part of a string

I have a string that looks like:
data = ABSIFHIEHFINE -2938 NODFNJN {[somedate]} oiejfoen
I need to pull {[somedate]} only with {[]} included.
I tried data.substring(0, data.indexOf(']}')) to remove the end of the string, but it also removes the symbols that I need to keep.
I need to pull {[somedate]} only with {[]} included.
def data = 'ABSIFHIEHFINE -2938 NODFNJN {[somedate]} oiejfoen'
// you could do error checking on these to ensure
// >= 0 and end > start and handle that however
// is appropriate for your requirements...
def start = data.indexOf '{['
def end = data.indexOf ']}'
def result = data[start..(end+1)]
assert result == '{[somedate]}'
You can do it using regular expression search:
data = "ABSIFHIEHFINE -2938 NODFNJN {[somedate]} oiejfoen"
def matcher = data =~ /\{\[.+?\]\}/
if( matcher ) {
echo matcher[0]
}
else {
echo "no match"
}
Output:
{[somedate]}
Explanations:
=~ is the find operator. It creates a java.util.regex.Matcher.
The string between the forward slashes (which is just another way to define a string literal), is the regular expression: \{\[.+?\]\}
RegEx breakdown:
\{\[ - literal { and [ which must be escaped because they have special meaning in RegEx
.+? - any character, at least one, as few as possible (non-greedy, to support finding multiple substrings enclosed in {[]})
\]\} - literal ] and } which must be escaped because they have special meaning in RegEx
You can test the RegEx only or use Groovy IDE to test the full sample code (replace echo by println).

Tidy long string in Ruby

I have a method in Ruby, which needs an API URL:
request_url = "http://api.abc.com/v3/avail?rev=#{ENV['REV']}&key=#{ENV['KEY']}&locale=en_US&currencyCode=#{currency}&arrivalDate=#{check_in}&departureDate=#{check_out}&includeDetails=true&includeRoomImages=true&room1=#{total_guests}"
I want to format it to be more readable. It should take arguments.
request_url = "http://api.abc.com/v3/avail?
&rev=#{ENV['REV']}
&key=#{ENV['KEY']}
&locale=en_US
&currencyCode=#{currency}
&arrivalDate=#{check_in}
&departureDate=#{check_out}
&includeDetails=true
&includeRoomImages=true
&room1=#{total_guests}"
But of course that introduces line breaks. I tried a heredoc, but I want the result to be on one line.
I would prefer to not build URI queries by joining strings, because that might lead to URLs that are not correctly encoded (see a list of characters that need to be encoded in URIs).
There is the Hash#to_query method in Ruby on Rails that does exactly what you need, and it ensures that the parameters are correctly URI encoded:
base_url = 'http://api.abc.com/v3/avail'
arguments = {
rev: ENV['REV'],
key: ENV['KEY'],
locale: 'en_US',
currencyCode: currency,
arrivalDate: check_in,
departureDate: check_out,
includeDetails: true,
includeRoomImages: true,
room1: total_guests
}
request_url = "#{base_url}?#{arguments.to_query}"
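Hash#to_query comes from Rails (ActiveSupport). Outside Rails, a similar result can be had with URI.encode_www_form from the standard library. The sketch below uses placeholder values in place of the ENV variables and local variables from the question, so the values shown are assumptions for illustration only.

```ruby
require 'uri'

base_url = 'http://api.abc.com/v3/avail'

# Placeholder values; in the question these come from ENV and local variables.
arguments = {
  locale: 'en_US',
  currencyCode: 'USD',
  arrivalDate: '09/01/2024',
  includeDetails: true,
  room1: 2
}

# encode_www_form percent-encodes keys and values and joins the pairs with "&".
request_url = "#{base_url}?#{URI.encode_www_form(arguments)}"
puts request_url
```

Note how the slashes in the date are encoded as %2F, which plain string interpolation would not do.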
You could use an array and join the strings:
request_url = [
"http://api.abc.com/v3/avail?",
"&rev=#{ENV['REV']}",
"&key=#{ENV['KEY']}",
"&locale=en_US",
"&currencyCode=#{currency}",
"&arrivalDate=#{check_in}",
"&departureDate=#{check_out}",
"&includeDetails=true",
"&includeRoomImages=true",
"&room1=#{total_guests}",
].join('')
Even easier, you can use the %W array shorthand notation so you don't have to write out all the quotes and commas:
request_url = %W(
http://api.abc.com/v3/avail?
&rev=#{ENV['REV']}
&key=#{ENV['KEY']}
&locale=en_US
&currencyCode=#{currency}
&arrivalDate=#{check_in}
&departureDate=#{check_out}
&includeDetails=true
&includeRoomImages=true
&room1=#{total_guests}
).join('')
Edit: Of course, spickermann makes a very good point above on better ways to accomplish this specifically for URLs. However, if you're not constructing a URL and just working with strings, the above methods should work fine.
You can extend strings in Ruby using the line continuation operator. Example:
request_url = "http://api.abc.com/v3/avail?" \
"&rev=#{ENV['REV']}" \
"&key=#{ENV['KEY']}"

How to Build regular expression pattern in rails

I am writing Rails code where I want to check whether
params[:name] contains any of the characters = , / \
and return true if it does, or false otherwise.
How do I build a regex pattern for this? If a better way exists, that would help too.
sanitized = params[:name].scan(/[=,\/\\]/)
if sanitized.empty?
# No such character in params[:name]
else
# oops, found at least 1
end
HTH
I don't know if it's achieved the status of "idiomatic", but I think the most compact way of achieving this in Ruby is with double !:
!!(params[:name] =~ /[=,\/\\]/)
as discussed in How to return a boolean value from a regex
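On Ruby 2.4 and later, String#match? returns a boolean directly, so the double negation is not needed. A minimal sketch, where the constant FORBIDDEN and the helper name forbidden_chars? are hypothetical and a plain string stands in for params[:name]:

```ruby
# Character class for = , / and \ (inside %r{...} the slash needs no escaping).
FORBIDDEN = %r{[=,/\\]}

# Hypothetical helper: true if the name contains any forbidden character.
# to_s makes a missing (nil) parameter return false instead of raising.
def forbidden_chars?(name)
  name.to_s.match?(FORBIDDEN)
end

puts forbidden_chars?('report=final')
puts forbidden_chars?('report_final')
```

In a controller this could be called as forbidden_chars?(params[:name]).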

Regex in Ruby: expression not found

I'm having trouble with a regex in Ruby (on Rails). I'm relatively new to this.
The test string is:
http://www.xyz.com/017010830343?$ProdLarge$
I am trying to remove "$ProdLarge$". In other words, the $ signs and anything between.
My regular expression is:
\$\w+\$
Rubular says my expression is ok. http://rubular.com/r/NDDQxKVraK
But when I run my code, the app says it isn't finding a match. Code below:
some_array.each do |x|
logger.debug "scan #{x.scan('\$\w+\$')}"
logger.debug "String? #{x.instance_of?(String)}"
x.gsub!('\$\w+\$','scl=1')
...
My logger debug line shows a result of "[]". String is confirmed as being true. And the gsub line has no effect.
What do I need to correct?
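Use /regex/ instead of 'regex'. When scan and gsub are given a String, they search for that text literally instead of treating it as a pattern: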
> "http://www.xyz.com/017010830343?$ProdLarge$".gsub(/\$\w+\$/, 'scl=1')
=> "http://www.xyz.com/017010830343?scl=1"
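A short sketch of the difference, using the URL from the question. The last variant assumes the pattern has to arrive as a string (for example from configuration), in which case Regexp.new can compile it first.

```ruby
url = 'http://www.xyz.com/017010830343?$ProdLarge$'

# String argument: searched for literally (backslashes included), so nothing is found.
literal_scan = url.scan('\$\w+\$')
literal_gsub = url.gsub('\$\w+\$', 'scl=1')

# Regexp literal: interpreted as a pattern.
regex_gsub = url.gsub(/\$\w+\$/, 'scl=1')

# A pattern held in a string can be compiled with Regexp.new.
compiled_gsub = url.gsub(Regexp.new('\$\w+\$'), 'scl=1')

p literal_scan
puts literal_gsub
puts regex_gsub
puts compiled_gsub
```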
Don't use a regex for this task, use a tool designed for it, URI. To remove the query:
require 'uri'
url = URI.parse('http://www.xyz.com/017010830343?$ProdLarge$')
url.query = nil
puts url.to_s
=> http://www.xyz.com/017010830343
To change to a different query use this instead of url.query = nil:
url.query = 'scl=1'
puts url.to_s
=> http://www.xyz.com/017010830343?scl=1
URI will automatically encode values if necessary, saving you the trouble. If you need even more URL management power, look at Addressable::URI.

Why does this regex check return true for this string?

I need a regex that will determine if a string is a tweet URL. I've got this
Regexp.new(/http:|https:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i)
Why does it return true for the following?
"http://i.stack.imgur.com/QdOS0.jpg".match(Regexp.new(/http:|https:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i))? true : false
=> true
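Alternation (|) has the lowest precedence, so your pattern reads as "http:" OR "https://(...)". The bare http: branch will always match a URL starting with http: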
Try the following:
/https?:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i
The question mark will make the s optional, thus matching http or https.
Your regex could be abbreviated like this:
#^https?://(?:www\.|mobile\.)?twitter\.com/.*?/status(?:es)?/.*$#i
explanation:
#                 regex delimiter (in Ruby, write /.../ or %r{...} instead)
^                 start of line
https?            http or https
://               ://
(?:               start of non capture group
www\.|mobile\.    www. or mobile.
)?                end of group, optional
twitter\.com/     twitter.com/
.*?               any number of any char, not greedy
/status           /status
(?:es)?           non capture group that optionally contains `es`
/.*               / followed by any number of any char
$                 end of line
#i                delimiter and case insensitive
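The same abbreviated pattern could be written in Ruby roughly as follows. This is a sketch: the constant name TWEET_URL and the sample tweet URLs are made up, non-capturing groups are written as (?:...), and \A and \z are used so the whole string has to match.

```ruby
# %r{...} avoids escaping the slashes; the trailing i makes it case insensitive.
TWEET_URL = %r{\Ahttps?://(?:www\.|mobile\.)?twitter\.com/.*?/status(?:es)?/.*\z}i

samples = [
  'https://twitter.com/someuser/status/12345',
  'http://mobile.twitter.com/someuser/statuses/12345',
  'http://i.stack.imgur.com/QdOS0.jpg'
]

# Map each sample URL to whether it matches the pattern.
results = samples.map { |url| [url, TWEET_URL.match?(url)] }
results.each { |url, ok| puts "#{url}: #{ok}" }
```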
No need for regular expressions here (as usual).
require 'uri'
uri = URI.parse("http://www.twitter.com/status/12345")
p uri.host.split('.')[-2] == 'twitter' # returns true
More docs at: http://ruby-doc.org/stdlib/
You should group your OR-Clauses, like this:
(http:|https:)
Additionally, it wouldn't hurt to specify beginning and end of it:
^(http:|https:).*$
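To see the effect of grouping, the sketch below compares an ungrouped and a grouped alternation against the image URL from the question. The patterns are shortened to a single twitter.com branch for readability, which is a simplification of the original regex.

```ruby
image_url = 'http://i.stack.imgur.com/QdOS0.jpg'

# Ungrouped: reads as "http:" OR "https://twitter.com".
ungrouped = /http:|https:\/\/twitter\.com/
# Grouped: the alternation covers only the scheme.
grouped = /(?:http:|https:)\/\/twitter\.com/

ungrouped_result = ungrouped.match?(image_url)
grouped_result   = grouped.match?(image_url)

puts "ungrouped: #{ungrouped_result}"
puts "grouped:   #{grouped_result}"
```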
The start of your regex specifies an option of just 'http:', which naturally matches the URL you are testing. Depending on how strict you need your check to be, you could just remove the http/https parts from the start of the regex.
While many other answers show you a better regex, the reason is that /foo|bar/ will match either foo or bar, and what you wrote was /http:|.../, hence every URL containing http: will be matched.
See @giraff's answer for how you could have written the alternation to do what you expect, or @M42's or @Koraktor's answers for a better regexp.
And as posted in the comments, note that you can write a regex literal as %r{...} instead of /.../, which is nice when you want to use / characters in your regex without escaping them.
