Ruby regex to find words starting with # - ruby-on-rails

I'm trying to write a very simple regex to find all words in a string that start with the symbol # and then change each of those words into a link, the way Twitter lets you mention other usernames.
So far I have written this:
def username_link(s)
  s.gsub(/\#\w+/, "<a href='/username'>username</a>").html_safe
end
I know it's very basic and not much, but I'd rather write it on my own right now, to fully understand it, before searching GitHub to find a more complex one.
What I'm trying to find out is how I can reference the matched word and put it in place of username. Once I can do that, I can easily strip the first character, #, from it.
Thanks.

You can capture using parentheses and backreference with \1 (and \2, and so on):
def username_link(s)
  s.gsub(/#(\w+)/, "<a href='/\\1'>\\1</a>").html_safe
end
See also this answer
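In plain Ruby, with no Rails and therefore no html_safe, the same capture can be reused in two other ways. In a single-quoted string '\1' needs no doubled backslash. In the block form, gsub sets $1 for each match. The helper names below are illustrative:

```ruby
# Illustrative helper names; plain Ruby, so there is no html_safe here.

# Single-quoted replacement: '\1' already is the backreference, no doubling needed.
def link_mentions(s)
  s.gsub(/#(\w+)/, '<a href="/\1">\1</a>')
end

# Block form: gsub runs the block once per match and sets $1 to capture group 1.
def link_mentions_with_block(s)
  s.gsub(/#(\w+)/) { %(<a href="/#{$1}">#{$1}</a>) }
end

puts link_mentions("hi #tim and #ann")
# => hi <a href="/tim">tim</a> and <a href="/ann">ann</a>
```

The block form is useful when the replacement needs Ruby logic, for example $1.downcase. Calling html_safe on the whole result also marks the rest of the user's text as safe HTML. Escaping that text first (for example with ERB::Util.html_escape) may be worth considering.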

You should use gsub with back references:
str = "I know it's very basic and not much, but #tim I'd rather write it on my own."
def username_to_link(str)
  str.gsub(/\#(\w+)/, '<a href="/\1">\1</a>')
end
puts username_to_link(str)
#=> I know it's very basic and not much, but <a href="/tim">tim</a> I'd rather write it on my own.

The following regex handles corner cases that the other answers ignore:
def auto_username_link(s)
  s.gsub(/(^|\s)\#(\w+)($|\s)/, "\\1<a href='/\\2'>\\2</a>\\3").html_safe
end
It ignores strings like "someone#company" or "#username-1" while still converting mentions in text like "Hello #username rest of message".
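One caveat: (^|\s) and ($|\s) consume the whitespace they match. In "#a #b", the space after #a is used up by the first match, so #b is never converted. Below is a sketch of the same rules written with zero-width lookarounds instead. This is a swapped-in technique, not the answer author's code, and the method name and link markup are illustrative:

```ruby
# Hypothetical variant of auto_username_link using zero-width lookarounds.
#   (?<!\S)  not preceded by a non-space: start of string or whitespace
#   (?!\S)   not followed by a non-space: end of string or whitespace
# Lookarounds match no characters, so the spaces around a mention
# stay available to the next match.
def auto_username_link_lookaround(s)
  s.gsub(/(?<!\S)#(\w+)(?!\S)/, '<a href="/\1">\1</a>')
end

puts auto_username_link_lookaround("#a #b")
# => <a href="/a">a</a> <a href="/b">b</a>
```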

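How about this:
def convert_names_to_links(str)
  str = " " + str # pad the string so a leading "#" also has a character before it
  result = str.gsub(
    /
    (?<=\W)  # Look for a non-word character (space, punctuation, etc.) preceding
    \#       # a literal "#" character (escaped, because # starts a comment in extended mode), followed by
    (\w+)    # one or more word characters, captured as group 1
    /xm,     # Standard normalizing flags
    '<a href="/\1">\1</a>'
  )
  result[1..-1] # drop the padding space again
end
my_str = "#tim #tim #tim, ##tim,#tim t#mmy?"
puts convert_names_to_links(my_str)
--output:--
<a href="/tim">tim</a> <a href="/tim">tim</a> <a href="/tim">tim</a>, #<a href="/tim">tim</a>,<a href="/tim">tim</a> t#mmy?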
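The leading-space padding is there only because (?<=\W) needs some character in front of the #. A negative lookbehind, (?<!\w), means "not preceded by a word character" and also succeeds at the very start of the string. That allows a sketch without the padding and without the result[1..-1] slice. The method name and link markup are illustrative:

```ruby
# Hypothetical method name; link markup is illustrative.
def convert_names_to_links_no_padding(str)
  # (?<!\w): the "#" must not be preceded by a word character.
  # A negative lookbehind also succeeds at position 0, where there is no character at all.
  str.gsub(/(?<!\w)\#(\w+)/, '<a href="/\1">\1</a>')
end

puts convert_names_to_links_no_padding("#tim #tim #tim, ##tim,#tim t#mmy?")
# => <a href="/tim">tim</a> <a href="/tim">tim</a> <a href="/tim">tim</a>, #<a href="/tim">tim</a>,<a href="/tim">tim</a> t#mmy?
```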

Related

Looping through array targeting upcase letters only

I'm trying to loop through a string, which I have converted to an array, and target only the uppercase letters so that I can insert an empty space before each capitalized letter. My code finds the first capital letter and adds the space, but I'm struggling to do the same for the next capital letter, which in this case is "T". Any advice would be appreciated. Thanks
def break_camel(str)
  # ([A-Z])/.match(str)
  saved_string = str.freeze
  cap_index = str.index(/[A-Z]/)
  puts(cap_index)
  x = str.split('').insert(cap_index, " ")
  x.join
end
break_camel("camelCasingTest")
It's much easier to operate on your string directly, using String#gsub, than breaking it into pieces, operating on each piece then gluing everything back together again.
def break_camel(str)
  str.gsub(/(?=[A-Z])/, ' ')
end
break_camel("camelCasingTest")
#=> "camel Casing Test"
break_camel("CamelCasingTest")
#=> " Camel Casing Test"
This converts a "zero-width position", immediately before each capital letter (and after the preceding character, if there is one), to a space. The expression (?=[A-Z]) is called a positive lookahead.
If you don't want to insert a space if the capital letter is at the beginning of a line, change the method as follows.
def break_camel(str)
  str.gsub(/(?<=.)(?=[A-Z])/, ' ')
end
break_camel("CamelCasingTest")
#=> "Camel Casing Test"
(?<=.) is a positive lookbehind that requires the capital letter to be preceded by any character for the match to be made.
Another way of writing this is as follows.
def break_camel(str)
  str.gsub(/(?<=.)([A-Z])/, ' \1')
end
break_camel("CamelCasingTest")
#=> "Camel Casing Test"
Here the regular expression matches a capital letter that is not at the beginning of the line and saves it to capture group 1. It is then replaced by a space followed by the contents of capture group 1.
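A lookahead is zero-width, so it can also serve as a split point without consuming the capital letter. String#split does not produce a leading empty string for a zero-width match at position 0. This version therefore needs no lookbehind to handle a capital first letter. The method name is illustrative:

```ruby
# Hypothetical name: split at each zero-width position just before a capital,
# then rejoin the pieces with single spaces.
def break_camel_split(str)
  str.split(/(?=[A-Z])/).join(' ')
end

puts break_camel_split("camelCasingTest") # prints camel Casing Test
puts break_camel_split("CamelCasingTest") # prints Camel Casing Test
```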
I think your approach needs to keep reapplying your method until it is no longer needed. One extension of your code is to use recursion:
def break_camel(str)
  regex = /[a-z][A-Z]/
  if str.match(regex)
    cap_index = str.index(regex)
    str.insert(cap_index + 1, " ") # note: String#insert modifies str (the caller's string) in place
    break_camel(str)
  else
    str
  end
end
break_camel("camelCasingTest") #=> "camel Casing Test"
Notice the call to break_camel inside the method itself; that is the recursive step. Another way is to use the scan method with an appropriate regex to pick out the words, then join them back together.
In code:
'camelCasingTest'.scan(/[A-Z]?[a-z]+/).join(' ') #=> "camel Casing Test"
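Do you have to implement your own?
ActiveSupport's titleize (https://apidock.com/rails/ActiveSupport/Inflector/titleize) looks like it has this covered, although it also capitalizes the first word: "camelCasingTest".titleize returns "Camel Casing Test".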

Remove certain regex from a string in Rails

I am building a tweet-like system that includes #mentions and #hashtags. Right now, I need to take a tweet that will come to the server like this:
hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)
and save it in the database as:
hi #Bob D whats the deal with #red
I have the flow of what the code looks like in my mind but can't get it to work. Basically, I need to do the following:
Scan the string for any [#...] (an array-like structure that begins with a #)
Delete the parentheses after the array-like structure (so for [#Bob D](member:Bob D), remove everything in parentheses)
Remove the brackets surrounding a substring that begins with # (meaning, delete the [] from [#...])
I will also need to do the same for the #hashtags. I'm almost certain this can be done with regular expressions and the slice! method, but I'm really having trouble coming up with the regular expressions needed and the control flow.
I think it would be something like this:
a = "hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)"
substring = a.scan <regular expression here>
substring.each do |matching_substring| # the loop should get rid of the parentheses but not the brackets
  a.slice! matching_substring
end
#Something here should get rid of brackets
The problem with the code above is that I can't figure out the regex and it doesn't get rid of the brackets.
This regex should work:
/(\[(#.*?)\]\((.*?)\))/
You can use Rubular to test it.
The ? after the * makes it non-greedy, so each .*? stops as early as it can instead of running on to the last ] or ), and every link is captured as a separate match.
The code would look something like this:
a = "hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)"
matches = a.scan(/(\[(#.*?)\]\((.*?)\))/)
matches.each do |match|
  a.sub!(match[0], match[1]) # replaces [#Bob D](member:Bob D) with #Bob D
  match[1] # the part in the brackets, without the brackets
  match[2] # the part in the parentheses, without the parentheses
end
a #=> "hi #Bob D whats the deal with #red"
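To see what the non-greedy ? changes, compare a greedy and a lazy version of the bracket part alone on the same string. This is a standalone sketch, and the variable names are illustrative:

```ruby
a = "hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)"

# Greedy: .* grabs as much as it can, so the capture runs to the LAST "]".
greedy = a.scan(/\[(#.*)\]/).flatten
# Lazy: .*? grabs as little as it can, so each capture stops at the next "]".
lazy = a.scan(/\[(#.*?)\]/).flatten

p greedy # => ["#Bob D](member:Bob D) whats the deal with [#red"]
p lazy   # => ["#Bob D", "#red"]
```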
Consider this:
str = "hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)"
BRACKET_RE_STR = '\[
  (
    [#]
    [^\]]+
  )
\]'
PARENS_RE_STR = '\(
  [^)]+
\)'
BRACKET_RE = /#{BRACKET_RE_STR}/x
PARENS_RE = /#{PARENS_RE_STR}/x
BRACKET_AND_PARENS_RE = /#{BRACKET_RE_STR}#{PARENS_RE_STR}/x
str.gsub(BRACKET_AND_PARENS_RE) { |s| s.sub(PARENS_RE, '').sub(BRACKET_RE, '\1') }
# => "hi #Bob D whats the deal with #red"
The longer or more complex a pattern is, the harder it is to maintain or update, so keep patterns as small as possible. Build complex patterns from simple ones so they're easier to debug and extend.
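Ruby can also compose patterns from Regexp objects directly. Interpolating a Regexp into another regex literal embeds it as a non-capturing (?-mix:...) group that keeps its own flags, so separate string constants are optional. The capture inside BRACKET is still group 1 in the combined pattern. A sketch under that approach, with illustrative constant and method names:

```ruby
BRACKET = /\[(#[^\]]+)\]/       # "[#Bob D]", capturing "#Bob D"
PARENS  = /\([^)]+\)/           # "(member:Bob D)"
LINK    = /#{BRACKET}#{PARENS}/ # both pieces back to back, each wrapped in (?-mix:...)

# Hypothetical helper: replace each "[#...](...)" with just the "#..." part.
def strip_link_markup(str)
  str.gsub(LINK, '\1')
end

puts strip_link_markup("hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)")
# => hi #Bob D whats the deal with #red
```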

ruby on rails regular expressions

In my Rails application I have a generic search that displays the matching results. To produce matching results, I replace blank spaces with the "%" symbol. It works perfectly, but only if there is a space in the search term. If I enter a single word, it says "no matching string".
class TweetsController < ApplicationController
  def index
    city = params[:show]
    search_term = params[:text]
    search_term[" "] = "%"
    city_coordinates = Coordinates.where('city=?', city)
    @tweets = if (city_coordinates.count == 1 && city_coordinates.first.valid_location?)
      Tweets.for_coordinates(city_coordinates.first) & Tweets.where("tweet_text LIKE ?", "%#{search_term}%").all
    else if (Coordinates.count != 1)
      Tweets.for_user_location(city) & Tweets.where("tweet_text LIKE ?", "%#{search_term}%").all
    else
      @tweets = Tweets.where("%tweet_text% LIKE ? ", "%#{search_term}%").all
    end
    end
  end
end
I only get output if I type two words, like "Harbhajan Singh" or "VVS Laxman". If I type a single word, it says no matching strings. I need output whether the user enters a single word or two or more words. Can anybody help me with this?
Probably, you are getting an
IndexError: string not matched
That's because when params[:text] contains a single word, this code
search_term[" "] = "%"
raises the error.
You might want to read the String documentation for more details. It states:
If the regular expression or string used as the index doesn't match a position in the string, IndexError is raised.
Hope this helps.
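A minimal sketch of a fix, assuming the intent is to turn every space into %. String#tr replaces all occurrences and is simply a no-op when there is no space. By contrast, str[" "] = "%" replaces only the first space and raises when there is none. The helper name like_pattern is hypothetical:

```ruby
# Hypothetical helper: builds the LIKE pattern from what the user typed.
def like_pattern(search_term)
  # tr replaces every " " with "%" and does nothing when there is no space.
  "%#{search_term.tr(' ', '%')}%"
end

puts like_pattern("Harbhajan")  # prints %Harbhajan%
puts like_pattern("VVS Laxman") # prints %VVS%Laxman%

# For comparison, the []= form from the question raises on a single word:
term = String.new("Harbhajan") # an unfrozen copy that may be modified
begin
  term[" "] = "%"
rescue IndexError => e
  puts e.message # prints "string not matched"
end
```

In the controller this might become Tweets.where("tweet_text LIKE ?", like_pattern(params[:text])), keeping the question's model and column names.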
I'm not too great with regular expressions myself, so I usually turn to Rubular. It helps you build and test regular expressions for Ruby.

Interpret newlines as <br>s in markdown (Github Markdown-style) in Ruby

I'm using markdown for comments on my site and I want users to be able to create line breaks by pressing enter instead of space space enter (see this meta question for more details on this idea)
How can I do this in Ruby? You'd think Github Flavored Markdown would be exactly what I need, but (surprisingly), it's quite buggy.
Here's their implementation:
# in very clear cases, let newlines become <br /> tags
text.gsub!(/^[\w\<][^\n]*\n+/) do |x|
  x =~ /\n{2}/ ? x : (x.strip!; x << "  \n")
end
This logic requires that the line start with a \w for a linebreak at the end to create a <br>. The reason for this requirement is that you don't want to mess with lists: (But see the edit below; I'm not even sure this makes sense)
* we don't want a <br>
* between these two list items
However, the logic breaks in these cases:
[some](http://google.com)
[links](http://google.com)
*this line is in italics*
another line
> the start of a blockquote!
another line
I.e., in all of these cases there should be a <br> at the end of the first line, and yet GFM doesn't add one.
Oddly, this works correctly in the JavaScript version of GFM.
Does anyone have a working implementation of "new lines to <br>s" in Ruby?
Edit: It gets even more confusing!
If you check out Github's official Github Flavored Markdown repository, you'll find yet another newline to <br> regex!:
# in very clear cases, let newlines become <br /> tags
text.gsub!(/(\A|^$\n)(^\w[^\n]*\n)(^\w[^\n]*$)+/m) do |x|
  x.gsub(/^(.+)$/, "\\1  ")
end
I have no clue what this regex means, but it doesn't do any better on the above test cases.
Also, it doesn't look like the "don't mess with lists" justification for requiring that lines start with word characters is valid to begin with. I.e., standard markdown list semantics don't change regardless of whether you add 2 trailing spaces. Here:
* item 1
* item 2
* item 3
In the source of this question there are 2 trailing spaces after "item 1", and yet if you look at the HTML, there is no superfluous <br>
This leads me to think the best regex for converting newlines to <br>s is just:
text.gsub!(/^[^\n]+\n+/) do |x|
  x =~ /\n{2}/ ? x : (x.strip!; x << "  \n")
end
Thoughts?
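For experimenting, the regex proposed in the question can be wrapped in a non-mutating method. This is only a sketch. The method name is hypothetical, and it deliberately uses rstrip instead of strip!, a change from the question's version, so that leading indentation (which Markdown uses for code blocks) is preserved:

```ruby
# Hypothetical method name; wraps the regex proposed in the question.
def hard_wrap_newlines(text)
  text.gsub(/^[^\n]+\n+/) do |chunk|
    # A chunk ending in two or more newlines is a paragraph break: leave it alone.
    next chunk if chunk =~ /\n{2}/
    # Otherwise drop trailing whitespace and add Markdown's two-space line break.
    chunk.rstrip + "  \n"
  end
end

p hard_wrap_newlines("*this line is in italics*\nanother line\n")
# => "*this line is in italics*  \nanother line  \n"
p hard_wrap_newlines("para one\n\npara two\n")
# => "para one\n\npara two  \n"
```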
I'm not sure if this will help, but I just use simple_format()
from ActionView::Helpers::TextHelper
ActionView simple_format
my_text = "Here is some basic text...\n...with a line break."
simple_format(my_text)
output => "<p>Here is some basic text...\n<br />...with a line break.</p>"
Even if it doesn't meet your specs, looking at the .gsub! calls in the simple_format() source code might help you write your own version of the required Markdown handling.
A little too late, but perhaps useful for other people. I've gotten it to work (but not thoroughly tested) by preprocessing the text using regular expressions, like so. It's hideous as a result of the lack of zero-width lookbehinds, but oh well.
# Append two spaces to a simple line, if it ends in newline, to render the
# markdown properly. Note: do not do this for lists, instead insert two newlines. Also, leave double newlines
# alone.
text.gsub!(/^ ([\*\+\-]\s+|\d+\s+)? (.+?) (\ \ )? \r?\n (\r?\n|[\*\+\-]\s+|\d+\s+)? /xi) do
  full, pre, line, spaces, post = $~.to_a
  if post != "\n" && pre.blank? && post.blank? && spaces.blank?
    "#{pre}#{line}  \n#{post}"
  elsif pre.present? || post.present?
    "#{pre}#{line}\n\n#{post}"
  else
    full
  end
end

Assistance with Some Interesting Syntax in Some Ruby Code I've Found

I'm currently reading Agile Web Development With Rails, 3rd edition. On page 672, I came across this method:
def capitalize_words(string)
  string.gsub(/\b\w/) { $&.upcase }
end
What is the code in the block doing? I have never seen that syntax. Is it similar to the array.map(&:some_method) syntax?
It's Title Casing The Input. Inside the block, $& is a built-in variable holding the current match (\b\w, i.e. the first letter of each word), which is then uppercased.
You've touched on one of the few things I don't like about Ruby :)
The magic variable $& contains the matched string from the previous successful pattern match. So in this case, it'll be the first character of each word.
This is mentioned in the RDoc for String.gsub:
http://ruby-doc.org/core/classes/String.html#M000817
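Here is a small sketch of where $& comes from, outside gsub. After any successful match, Ruby sets $~ (the MatchData) and $& (the matched text). Regexp.last_match gives the same information under a more descriptive name. Variable names are illustrative:

```ruby
text = "hello world"
position = text =~ /w\w+/ # 6, the index where the match starts

whole   = $&                   # "world": text of the last successful match
via_md  = $~[0]                # "world": $~ is the MatchData, and $& is the same as $~[0]
via_api = Regexp.last_match(0) # "world": a more descriptive spelling of $&

# Inside a gsub block, the same variables are refreshed for every match.
titled = "hello world".gsub(/\b\w/) { Regexp.last_match(0).upcase }
puts titled # prints Hello World
```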
gsub replaces everything that matched the regex with the result of the block. So yes, in this case you're matching the first letter of each word, then replacing it with the upcased version.
As to the slightly bizarre syntax inside the block, this is equivalent (and perhaps easier to understand):
def capitalize_words(string)
  string.gsub(/\b\w/) { |x| x.upcase }
end
Or, even slicker:
def capitalize_words(string)
  string.gsub(/\b\w/, &:upcase)
end
As to the regex (courtesy of the pickaxe book), \b matches a word boundary, and \w matches any "word character" (alphanumerics and underscore), so \b\w matches the first character of each word.
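To see what \b\w actually picks out, scan returns each matched character. The strings are illustrative. The second example shows a side effect worth knowing: an apostrophe is not a word character, so capitalize_words would also capitalize the letter after it:

```ruby
# Each element is one character matched by \b\w: a word character right after a word boundary.
firsts = "hello world_wide web-site".scan(/\b\w/)
p firsts # => ["h", "w", "w", "s"]
# "_" is a word character, so "world_wide" has no boundary inside it;
# "-" is not, so "site" counts as a new word.

# An apostrophe is not a word character either, so the letter after it is a "first character" too.
shouted = "don't stop".gsub(/\b\w/, &:upcase)
p shouted # => "Don'T Stop"
```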
