I'm trying to pull the user_id from a foursquare URL, like this one:
https://foursquare.com/user/99999999
The following regex pulls exactly what I need (a run of digits at the end of the line):
\d+$
However, I'm not sure how to set a string equal to the matched characters. I'm aware of sub and gsub, but those methods substitute a matched string for something else.
I'm looking for a way to extract just the part of a string that matches my regex (if it exists).
I like to use the return value of match().
Anything wrapped in a capturing group () in the regex is stored in the resulting MatchData, which you can index like an array:
"https://foursquare.com/user/99999999".match(/(\d+)\z/)[1] #=> "99999999"
>> "https://foursquare.com/user/99999999"[/(\d+)\z/, 1]
=> "99999999"
>> "https://foursquare.com/user/99999999" =~ /(\d+)\z/
=> 28
>> $1
=> "99999999"
>> "https://foursquare.com/user/99999999".split('/').last
=> "99999999"
There are many ways. I personally like String#[], though.
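One practical difference between these options is what happens when the URL has no trailing digits: match(...) returns nil, so match(...)[1] raises NoMethodError, while String#[] simply returns nil. Also note that \z anchors at the very end of the string, whereas $ also matches just before a newline. A minimal sketch, using a hypothetical helper name foursquare_user_id:

```ruby
# Hypothetical helper: returns the trailing run of digits in a URL, or nil.
def foursquare_user_id(url)
  # String#[] with a regex and a capture index returns nil when nothing matches,
  # so no separate nil check is needed (unlike url.match(...)[1]).
  url[/(\d+)\z/, 1]
end

p foursquare_user_id("https://foursquare.com/user/99999999") # => "99999999"
p foursquare_user_id("https://foursquare.com/user/")         # => nil
```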
I would like to match instances of a word in a string, as long as the word is not in a URL.
For example, find the instances of 'hello' in the following:
hello this is a regex problem http://geocities.com/hello/index.html?hello! Hello how are you!
The simplest regex for this problem is:
/\bhello\b/i
However, this returns all four instances of 'hello', including the two contained within the URL.
I have experimented with negative look-behinds for 'http' but so far nothing has worked. Any ideas?
Here are several solutions based on The Best Regex Trick Ever for 1) counting matches outside of a URL, 2) removing matches not in a URL, and 3) wrapping the matches with a tag outside of a URL:
s = "hello this is a regex problem http:"+"//geocities.com/hello/index.html?hello! Hello how are you!"
# Counting
p s.scan(/https?:\/\/\S*|(hello)/i).flatten.compact.count
## => 2
# Removing
p s.gsub(/(https?:\/\/\S*)|hello/i, '\1')
## => " this is a regex problem http://geocities.com/hello/index.html?hello!  how are you!"
# Wrapping with a tag
p s.gsub(/(https?:\/\/\S*)|(hello)/i) { $1 || "<span>#{$2}</span>" }
## => "<span>hello</span> this is a regex problem http://geocities.com/hello/index.html?hello! <span>Hello</span> how are you!"
You may wrap the hello pattern in word boundaries, \bhello\b, if you need to match it only as a whole word.
Notes
.scan(/https?:\/\/\S*|(hello)/i).flatten.compact.count matches either a URL starting with http or https, or hello, which is captured into Group 1. Because the pattern contains a capturing group, .scan returns only the captured substrings, and the group is nil whenever a URL is matched; .flatten turns the per-match arrays into one flat array, .compact removes the nil items, and .count returns the number of items left.
.gsub(/(https?:\/\/\S*)|hello/i, '\1') captures URLs into Group 1 and matches every hello outside of URLs without capturing it. Each match is replaced with \1, the backreference to Group 1, so URLs are put back unchanged, while a bare hello is replaced with an empty string (Group 1 did not participate).
s.gsub(/(https?:\/\/\S*)|(hello)/i) { $1 || "<span>#{$2}</span>" } captures URLs into Group 1 and hellos into Group 2. If Group 1 matched, $1 puts the URL back into the string unchanged; otherwise, the Group 2 value is wrapped in tags and inserted back into the string.
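The same alternation can also report where each match outside a URL occurs, which the scan and gsub calls above don't expose directly. A sketch, using a hypothetical helper name hello_positions_outside_urls, that walks the string with Regexp#match(str, pos) and reads the positions from each MatchData:

```ruby
# Hypothetical helper: returns [matched_text, start_index] for each whole-word
# "hello" (case-insensitive) that is not inside an http/https URL.
def hello_positions_outside_urls(text)
  # Alternative 1 consumes whole URLs; only alternative 2 sets group 1.
  pattern = /https?:\/\/\S*|\b(hello)\b/i
  results = []
  pos = 0
  while (m = pattern.match(text, pos))
    results << [m[1], m.begin(1)] if m[1] # group 1 is nil when a URL matched
    # Neither alternative can match an empty string, so this always advances.
    pos = m.end(0)
  end
  results
end

s = "hello this is a regex problem http://geocities.com/hello/index.html?hello! Hello how are you!"
p hello_positions_outside_urls(s) # => [["hello", 0], ["Hello", 75]]
```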
If I understand correctly, you need to get the words after the URL. You can simply use whitespace (\s) as the delimiter:
"http://geocities.com/hello/index.html?hello! Hello how are you!".scan(/\s(\w+)/i)
=> [["Hello"], ["how"], ["are"], ["you"]]
Or
"http://geocities.com/hello/index.html?hello! Hello how are you!".scan(/\s(hello)/i)
=> [["Hello"]]
Here, we can match the URLs, alternated with our desired words in a capturing group, using an expression similar to:
http[^\s]+|(hello|you)
Advice
The fourth bird advises that:
I would go for the word boundaries and only hello in the group: \bhttp\S+|\b(hello)\b
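Either pattern plugs into the scan(...).flatten.compact idiom used earlier on this page. A quick comparison on the question's sample sentence:

```ruby
s = "hello this is a regex problem http://geocities.com/hello/index.html?hello! Hello how are you!"

# Pattern from the answer: URLs are matched but not captured; "hello" or "you" is captured.
loose = s.scan(/http[^\s]+|(hello|you)/i).flatten.compact
p loose # => ["hello", "Hello", "you"]

# Suggested refinement: word boundaries and only "hello" in the capturing group.
strict = s.scan(/\bhttp\S+|\b(hello)\b/i).flatten.compact
p strict # => ["hello", "Hello"]
```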
[PostgreSQL(9.4), Rails(4.1)]
The problem:
I have a table with the names of tools. The name column is of hstore type and looks like this: name -> ('en': value, 'de': value). It's worth noting that 'de' is irrelevant to this problem, because all names are stored only under the 'en' key.
Next, I have to construct a search query that will find the right record, but the format of the query text is unknown, e.g.:
In DB:
WQXZ 123GT, should match query: WQXZ_123-GT
In DB:
Three Words Name 123-D45, should match query: Three_WORDS_NAME 123D45
and so on...
Solution:
To make this happen, I want to normalize both the stored value and the query so that they become identical. To do this I need to downcase both values, remove all whitespace, and remove all non-alphanumeric characters, so the values above become:
wqxz123gt == wqxz123gt
and
threewordsname123d45 == threewordsname123d45
I have no problem formatting the search value in Ruby:
"sTR-in.g24 3".downcase.gsub(/\s/, "").gsub(/\W/, "") # => "string243"
But I can't work out how to do this in the SQL search query, so that it looks like:
Tool.where("CODE_I_AM_LOOKING_FOR(name -> 'en') = (?)", value.downcase.gsub(/\s/, "").gsub(/\W/, ""))
Thank you for your time.
UPD: I can downcase in the query:
Tool.where("lower(name -> 'en') = (?)", value.downcase)
But it solves only a part of the problem (downcase). The whitespaces and non-word characters (dots, dashes, underscores, etc.) are still an issue.
You can use the Postgres replace function to remove spaces, then the lower function to match on that value, like this:
Tool.where("lower(replace(name -> 'en', ' ', '')) = (?)", value.downcase.gsub(/\s/, "").gsub(/\W/, "") )
I hope this helps.
Nitin Srivastava's answer pointed me in the right direction. All I needed was the regexp_replace function.
So the proper query is:
Tool.where(
  "lower(regexp_replace((name -> 'en'), '[^a-zA-Z0-9]+', '', 'g')) = ?",
  # Strip the same characters as the SQL side; gsub(/\W/, "") would keep underscores
  value.gsub(/[^a-zA-Z0-9]+/, "").downcase
)
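One caveat worth checking: in Ruby, \W matches any character other than [a-zA-Z0-9_], so the chained gsub(/\s/, "").gsub(/\W/, "") used in the question keeps underscores, while the SQL side strips them with [^a-zA-Z0-9]+. With the question's own example, WQXZ_123-GT would normalize to wqxz_123gt and never match. A minimal sketch of a Ruby-side normalizer that mirrors the SQL expression (strip everything except ASCII letters and digits, then lowercase), using a hypothetical method name normalize_tool_name:

```ruby
# Hypothetical helper: mirrors lower(regexp_replace(x, '[^a-zA-Z0-9]+', '', 'g')).
def normalize_tool_name(value)
  # Remove every character that is not an ASCII letter or digit (including "_"),
  # then lowercase, in the same order as the SQL expression.
  value.gsub(/[^a-zA-Z0-9]+/, "").downcase
end

p "WQXZ_123-GT".downcase.gsub(/\s/, "").gsub(/\W/, "") # => "wqxz_123gt" (underscore survives)
p normalize_tool_name("WQXZ_123-GT")                   # => "wqxz123gt"
p normalize_tool_name("Three_WORDS_NAME 123D45")       # => "threewordsname123d45"
```

The result can then be passed as the bind value to Tool.where in place of the chained gsub calls.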
I am using the iOS regular expression engine to match any text of the form:
"[h1]test text[/h1]"
I wrote: #"\\[h1]([^.]*)[/h1\\]]"
to match this form, but it only works sometimes; other times it matches text beyond the last bracket. Is this the best way to match these strings, or what do you suggest?
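I would recommend using (.*?) instead of ([^.]*).
It looks like what you want is "between [h1] and [/h1], match anything." That would be (.*?), where the ? makes the match lazy so it stops at the first closing tag.
What you have is "between [h1] and [/h1], match anything that is not a period (.)." Since [^.]* is also greedy, it consumes as much as it can before backtracking, so the match can run past the closing tag you intended.
In addition, there is a problem with your ending: [/h1\\]] is a character class followed by a literal ], so it means "one of /, h, 1, or ], then ]" rather than the string [/h1]. I think you want \\[/h1], which means end with the string [/h1].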
The final regex would be #"\\[h1](.*?)\\[/h1]".
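The difference between the greedy and lazy forms is easy to see on a string with two headings. The check below uses Ruby's regex engine rather than the iOS (ICU) engine from the question, assuming both treat .*, .*? and escaped brackets the same way, and it escapes the brackets on both sides:

```ruby
text = "[h1]first[/h1] and [h1]second[/h1]"

# Greedy: .* runs to the LAST [/h1], swallowing the text in between.
greedy = text[/\[h1\](.*)\[\/h1\]/, 1]
p greedy # => "first[/h1] and [h1]second"

# Lazy: .*? stops at the FIRST [/h1] after each [h1].
lazy = text.scan(/\[h1\](.*?)\[\/h1\]/).flatten
p lazy # => ["first", "second"]
```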
I have a long list of information stored in a variable and I need to run some regex expressions against that variable and get various pieces of information from what is found.
How can you store the line that matches a regex expression in a variable?
How can you get the line number of the line that matches a regex expression?
Here is an example of what I'm talking about.
body = "service timestamps log datetime msec localtime show-timezone
service password-encryption
!
hostname switch01
!
boot-start-marker"
If I search for the line that contains "hostname" I need the line number, in this case it would be 4. I also need to store the line "hostname switch01" as another variable.
Any ideas?
Thanks!
First, convert the string to lines with body.split("\n") (use double quotes: in Ruby, '\n' in single quotes is a literal backslash followed by n, not a newline). Then pair each line with its index using .each_with_index, and keep the matching lines with .select { |line, line_nr| line =~ your_regex }. Putting it all together:
body.split("\n").each_with_index
  .select { |line, line_nr| line =~ your_regex }
  .map { |line, line_nr| line_nr + 1 } # each_with_index counts from 0, so add 1
This gives you the line numbers (starting at 1) of all lines matching your_regex.
Let's say you have an object file that provides a #lines method:
lines = file.lines.each_with_index.select {|line, i| line =~ /regex/ }
If you already have a list of lines you can leave out the call to #lines. If you have a string you can use string.split("\n").
This leaves the variable lines containing an array of 2-element arrays, each holding a line that matched your regex and the index of that line in the original file (counting from 0).
Breakdown
file.lines gets the lines (the other options mentioned above, such as string.split("\n"), work here as well). We then pair each element with its index using #each_with_index, because you want to store the index too. This has the same effect as #map.with_index {|e, i| [e, i]}, i.e. it maps every element to [element, index]. We then use #select to keep only the lines that match your regex (FYI, =~ is the matching operator in Ruby, Perl and other languages). That's all; you may need to transform the data further, depending on how you process it.
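For the exact case in the question (a 1-based line number plus the matching line's text), a compact sketch using String#each_line with Enumerator#with_index(1), so numbering starts at 1 rather than 0:

```ruby
body = "service timestamps log datetime msec localtime show-timezone
service password-encryption
!
hostname switch01
!
boot-start-marker"

# each_line yields each line with its trailing "\n"; with_index(1) numbers from 1.
matches = body.each_line.with_index(1)
              .select { |line, _nr| line =~ /hostname/ }
              .map { |line, nr| [line.chomp, nr] } # drop the newline from the stored line

# matches.first is nil when nothing matched, so both variables would be nil.
hostname_line, hostname_nr = matches.first
p hostname_line # => "hostname switch01"
p hostname_nr   # => 4
```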
I have a list of names in my DB and I need to sort them alphabetically. However, I need to show the Greek names first, and then the Latin ones. For example, I have:
[Jale, Βήτα, Άλφα, Ben]
and I need to order it like this:
[Άλφα, Βήτα, Ben, Jale]
Any suggestions would be much appreciated :)
I like to solve these problems by playing around in irb. Here's one way you could go about finding this solution. First, we'll define our test array:
>> names = %w{Jale Βήτα Άλφα Ben}
=> ["Jale", "Βήτα", "Άλφα", "Ben"]
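To solve this, let's first transform the array into 2-tuples containing a flag that indicates whether the name is Greek, followed by the name itself. We want the flag to be sortable, so we match each name against a regex for Latin-only word characters (in Ruby, \w matches only [a-zA-Z0-9_]) and coerce the result to a string: =~ returns 0 for a match and nil otherwise, and nil.to_s is "", which sorts before "0".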
>> names.map{|name| [(name =~ /^\w+$/).to_s, name]}
=> [["0", "Jale"], ["", "Βήτα"], ["", "Άλφα"], ["0", "Ben"]]
Then we'll sort the 2-tuples:
>> names.map{|name| [(name =~ /^\w+$/).to_s, name]}.sort
=> [["", "Άλφα"], ["", "Βήτα"], ["0", "Ben"], ["0", "Jale"]]
We now have a sort order with the Greek names first, then the Latin names. We can shorten this into our solution:
>> names.sort_by{|name| [(name =~ /^\w+$/).to_s, name]}
=> ["Άλφα", "Βήτα", "Ben", "Jale"]
I gave one solution above. Another approach is to partition the names into Greek and Latin, sort within each group, then flatten the two arrays into one:
>> names.partition{|name| name !~ /^\w+$/}.map(&:sort).flatten
=> ["Άλφα", "Βήτα", "Ben", "Jale"]
This might be a little more elegant and understandable than my other solution, but it is less flexible. Note that name !~ /^\w+$/ returns true if the name contains a character outside [a-zA-Z0-9_], i.e. (for this data) if it is Greek.
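If the intent is specifically "Greek first", a Unicode script property can state that directly instead of relying on \w being ASCII-only. A sketch assuming Ruby 2.4+ (for String#match?) and assuming a name counts as Greek when its first character is in the Greek script; the integer flag replaces the "" / "0" string trick:

```ruby
names = %w[Jale Βήτα Άλφα Ben]

# Sort key: 0 for names starting with a Greek-script letter, 1 otherwise,
# then the name itself to order alphabetically within each group.
sorted = names.sort_by { |name| [name.match?(/\A\p{Greek}/) ? 0 : 1, name] }
p sorted # => ["Άλφα", "Βήτα", "Ben", "Jale"]
```

Note that the within-group order is still plain codepoint order, so accented and unaccented letters are not collated the way a dictionary would.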