Ruby, gsub and regex - ruby-on-rails

Quick background: I have a string which contains references to other pages. The pages are linked to using the format: "#12". A hash followed by the ID of the page.
Say I have the following string:
str = 'This string links to the pages #12 and #125'
I already know the IDs of the pages that need linking:
page_ids = str.scan(/#(\d*)/).flatten
=> ["12", "125"]
How can I loop through the page ids and link the #12 and #125 to their respective pages? The problem I've run into is if I do the following (in rails):
page_ids.each do |id|
str = str.gsub(/##{id}/, link_to("##{id}", page_path(id)))
end
This works fine for #12 but it links the "12" part of #125 to the page with ID of 12.
Any help would be awesome.

Instead of extracting the ids first and then replacing them, you can simply find and replace them in one go:
str = str.gsub(/#(\d+)/) { link_to("##{$1}", page_path($1)) }
Even if you can't leave out the extraction step because you need the ids somewhere else as well, this should be much faster, since it doesn't have to go through the entire string for each id.
PS: If str isn't referred to from anywhere else, you can use str.gsub! instead of str = str.gsub
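As a minimal sketch of the one-pass approach outside of Rails: the page_path and link_to methods below are hypothetical stand-ins for the Rails helpers (the /pages/ URL scheme is an assumption), and the pattern uses \d+ so that a bare "#" with no digits is left alone.

```ruby
# Hypothetical stand-ins for the Rails helpers, only so the sketch runs on its own.
def page_path(id)
  "/pages/#{id}"
end

def link_to(text, href)
  %(<a href="#{href}">#{text}</a>)
end

str = 'This string links to the pages #12 and #125'

# The block runs once per match, so #12 and #125 are each replaced exactly once.
linked = str.gsub(/#(\d+)/) { link_to("##{$1}", page_path($1)) }
puts linked
```

Because every reference is handled in a single scan, the shorter ID can never be matched inside the longer one.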

If your IDs always end at word boundaries, you can match that:
page_ids.each do |id|
str = str.gsub(/##{id}\b/, link_to("##{id}", page_path(id)))
end
You only need to add the word boundary anchor \b to the search pattern; it is not needed in the replacement.
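A small sketch of what the \b anchor changes, using the placeholder text [12] in place of the real link markup (the placeholder is only for illustration):

```ruby
str = 'This string links to the pages #12 and #125'

# Without the anchor, "#12" also matches the first three characters of "#125".
without_boundary = str.gsub(/#12/, '[12]')

# With \b, the match must be followed by a non-word character or the end of the string.
with_boundary = str.gsub(/#12\b/, '[12]')

puts without_boundary
puts with_boundary
```

If the same ID can appear more than once in page_ids, calling uniq on the array first avoids replacing text inside a link that an earlier iteration already produced.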

Related

Check if String Contains an Emoji in Ruby

In ruby, here is how you can check for a substring in a string:
str = "hello world"
str.include?("lo")
=> true
When I am attempting to save an emoji in a text column in a rails application (the text column within a mysql database is utf8), it comes back with this error:
Incorrect string value: \xF0\x9F\x99\x82
For my situation in a rails application, it suffices to see if an emoji is present in the submitted text. If an emoji is present: raise a validation error. Example:
class MyModel < ApplicationRecord
validate :cannot_contain_emojis
private
def cannot_contain_emojis
if my_column.include?("/\xF0")
errors.add(:my_column, 'Cannot include emojis')
end
end
end
Note: The reason I am checking for \xF0 is that, according to this site, all or most emojis appear to begin with this byte.
This however does not work. It continues to return false even when it is true. I'm pretty sure the issue is that my include statement doesn't work because the emoji is not converted to bytes for the comparison.
Question
How can I make a validation to check that an emoji is not passed in?
Example bytes for a smiley face in UTF8: \xF0\x9F\x99\x82
You can use the Emoji Unicode property to test for Emoji using a Regexp, something like this:
def cannot_contain_emojis
if /\p{Emoji}/ =~ my_column
errors.add(:my_column, 'Cannot include emojis')
end
end
Unicode® Technical Standard #51 "UNICODE EMOJI" contains a more sophisticated regex:
\p{RI} \p{RI}
| \p{Emoji}
( \p{EMod}
| \x{FE0F} \x{20E3}?
| [\x{E0020}-\x{E007E}]+ \x{E007F} )?
(\x{200D} \p{Emoji}
( \p{EMod}
| \x{FE0F} \x{20E3}?
| [\x{E0020}-\x{E007E}]+ \x{E007F} )?
)*
[Note: some of those properties are not implemented in Onigmo / Ruby.]
However, checking for Emojis is probably not going to be enough. It is pretty clear that your text processing is broken at some point. And if it is broken by an Emoji, then there is a chance it will also be broken by my name, or the name of Ruby's creator 松本 行弘, or by the completely normal English word “naïve”.
Instead of playing a game of whack-a-mole trying to detect every Emoji, mathematical symbol, Arabic letter, typographically correct punctuation mark, etc., it would be much better simply to fix the text processing.
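One possible reading of the error, assuming the column uses MySQL's legacy utf8 character set (which stores at most three bytes per character): the bytes \xF0\x9F\x99\x82 are a four-byte UTF-8 sequence, so the column cannot hold them, and switching the column and connection to utf8mb4 would be the usual fix. If the column cannot be changed, a sketch like the following flags exactly the characters that would not fit. The method name is hypothetical.

```ruby
# Hypothetical helper: true when the text contains any character whose
# UTF-8 encoding is longer than three bytes.
def needs_utf8mb4?(text)
  text.each_char.any? { |ch| ch.bytesize > 3 }
end

puts needs_utf8mb4?("smile \u{1F642}")   # four-byte character present
puts needs_utf8mb4?("naïve")             # two-byte character only
puts needs_utf8mb4?("松本 行弘")          # three-byte characters only
```

This checks the storage limit itself rather than trying to enumerate emoji, so accented letters and CJK text pass while any four-byte character is caught.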
I found Jörg's solution was only working when passing in the string itself and not a variable. Not sure why that is.
/\p{Emoji}/ =~ "🎃"
=> 0
value = "1f383"
=> "1f383"
/\p{Emoji}/ =~ value
=> 0
/\p{Emoji}/ =~ "hello"
=> nil
Regardless I'd recommend using the unicode-emoji gem, as its approach is comprehensive. Its source code and documentation can be found on GitHub.
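A note on the irb session above: the string "1f383" is the hexadecimal code point written out as text, not the emoji character, and it still matches because the ASCII digits (along with # and *) carry the Unicode Emoji property. The sketch below shows this and compares it with \p{Emoji_Presentation}, which is narrower; note that the narrower property also misses emoji that default to text presentation, so treat it as an illustration rather than a complete check.

```ruby
value   = "1f383"       # hex digits as plain text, not the pumpkin itself
pumpkin = "\u{1F383}"   # the actual emoji character

# \p{Emoji} matches at index 0, on the digit "1".
p(/\p{Emoji}/ =~ value)

# \p{Emoji_Presentation} ignores plain digits but still matches the pumpkin.
p(/\p{Emoji_Presentation}/.match?(value))
p(/\p{Emoji_Presentation}/.match?(pumpkin))
```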

ruby regex with quotes

I'm trying to pass more than one regex parameter for parts of a string that needs to be replaced. Here's the string:
str = "stands in hall &quot;Let&#39;s go get to first period everyone&quot; Students continue moving to seats."
Here is the expected string:
str = "stands in hall "Let's go get to first period everyone" Students continue moving to seats."
This is what I tried:
str.gsub(/&#39;|&quot;/, "&#39;" => "\'", "&quot;" => "\"")
This is what I got:
"stands in hall \"Let's go get to first period everyone\" Students continue moving to seats."
How do I get the quotes in while sending in two regex parameters using gsub?
This is an HTML unescaping problem.
require 'cgi'
CGI.unescape_html(str)
This gives you the correct answer.
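A minimal sketch of that call, assuming the original string carried HTML entities such as &quot; and &#39;. It uses CGI.unescapeHTML, the method that unescape_html is an alias for.

```ruby
require 'cgi'

escaped = 'stands in hall &quot;Let&#39;s go get to first period everyone&quot; Students continue moving to seats.'

# Converts named and numeric entities back into the characters they stand for.
plain = CGI.unescapeHTML(escaped)
puts plain
```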
From my comments on this question:
Your updated version is correct. The only reason the slashes are in your final line of code is that it's an escape sequence so that you don't mistakenly think the first slash is used to terminate the string. Try assigning your output and printing it:
str1 = str.gsub(/&#39;|&quot;/, "&#39;" => "\'", "&quot;" => "\"")
puts str1
and you'll see that the slashes are gone when str1 is printed using puts.
The difference is that irb (which is what I assume you're using to run this sample code) echoes each result by calling its inspect method, which for strings shows the value as a quoted literal with escape sequences.
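The difference between the two views can be seen directly with a short sketch (the sample string is shortened for illustration):

```ruby
quoted = 'stands in hall "Let\'s go" now'

# inspect returns the literal form that irb echoes after "=>":
# wrapped in double quotes, with each inner double quote escaped.
shown_by_irb = quoted.inspect

puts shown_by_irb   # contains backslashes before the inner double quotes
puts quoted         # the actual characters, with no backslashes
```

The backslashes exist only in the inspected representation; the string itself never contains them.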
Because I did not understand unescaping characters I found an alternative solution that might be the "rails-way"
Can you use <%= raw 'some_html' %>
My final solution ended up being this instead of messy regex and requiring CGI
<%= raw evidence_score.description %>
Unescaping HTML string in Rails

How do I scan url for a specific string with spaces and special characters?

I'm using stringscanner on my request URL in order to get the name of the user's currently selected category, but I've been having difficulty dealing with spaces and special characters.
request.url.scan(/\?category=\w+/).to_s.gsub('?category=', '')
URL examples followed by result
http://localhost:3000/search?category=dog&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog.com&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93 => ["dog"]
I'm trying to get ["dog"] ["dog.com"] and ["dog cat"], but am currently stuck. Any ideas?
Note: Considering removing spaces from categories and replacing them with dashes as multiple spaces could be problematic, but if it's possible to create one function to rule them all, that would be awesome.
This is Rails, is there a reason you're not just using params[:category]?
If you are trying to extract params, you could use Rack::Utils.parse_query:
require 'rack'
require 'uri'
uri = "http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93"
result = Rack::Utils.parse_query(URI(uri).query) #=> {"category"=>"dog cat", "search"=>"", "utf8"=>"\xE2\x9C\x93"}
result["category"] #=> "dog cat"
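Outside of a Rack or Rails process, the same parsing can be sketched with only the standard library. The method name category_from is hypothetical; URI.decode_www_form handles both the + for spaces and percent-encoded characters.

```ruby
require 'uri'

# Hypothetical helper: returns the decoded "category" query parameter, or nil.
def category_from(url)
  query = URI(url).query.to_s
  URI.decode_www_form(query).to_h['category']
end

puts category_from('http://localhost:3000/search?category=dog&search=&utf8=%E2%9C%93')
puts category_from('http://localhost:3000/search?category=dog.com&search=&utf8=%E2%9C%93')
puts category_from('http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93')
```

Parsing the query string avoids the \w+ limitation in the original pattern, which stops at the first dot, plus sign, or other non-word character.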

Generate a link_to on the fly if a URL is found inside the contents of a db text field?

I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do something like this without having to work out how to parse the text and add link tags around each match (from one of ['http:', 'https:', 'ftp:'] to the first space after it)?
Thank You!
Ruby 1.8.7, Rails 2.3.5
Make a helper :
def make_urls(text)
urls = %r{(?:https?|ftp|mailto)://\S+}i
html_text = text.gsub urls, '<a href="\0">\0</a>'
html_text
end
In the view, just call this helper and you will get the expected output.
For example:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"
There are many ways to accomplish your goal. One way would be to use Regex. If you have never heard of regex, this wikipedia entry should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you will use the array returned above to substitute hyperlinks for the text links.
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
I hope this helps you.
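The find-then-replace loop above can wrap a URL twice if the same URL appears more than once in the text, because each gsub! pass also matches the copies already inside the generated tags. A single-pass sketch avoids that. The method name autolink and the exact pattern are assumptions for illustration, not a full URL grammar.

```ruby
require 'cgi'

# Assumed pattern: a scheme followed by everything up to whitespace, a quote, or an angle bracket.
URL_PATTERN = %r{(?:https?|ftp)://[^\s<>"]+}i

# Hypothetical helper: wraps each URL in an anchor tag in one scan of the text.
def autolink(text)
  text.gsub(URL_PATTERN) do |url|
    safe = CGI.escapeHTML(url)   # escape &, quotes, and angle brackets for the attribute
    %(<a href="#{safe}">#{safe}</a>)
  end
end

puts autolink('See http://www.google.com/ and again http://www.google.com/ for details.')
```

Only the URLs are escaped here; if the surrounding text comes from users, it would need escaping as well before being rendered as HTML.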

Nokogiri/Ruby array question

I have a quick question. I am currently writing a Nokogiri/Ruby script and have the following code:
fullId = doc.xpath("/success/data/annotatorResultBean/annotations/annotationBean/concept/fullId")
fullId.each do |e|
e = e.to_s()
g.write(e + "\n")
end
This spits out the following text:
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D008715</fullId>
I wanted just the text between the <fullId> tags saved, without the <fullId> and </fullId> markup. What am I missing?
Bobby
I think you want to use the text() accessor (which returns the child text values), rather than to_s() (which serializes the entire node, as you see here).
I'm not sure what the g object you're calling write on is, but the following code should give you an array containing all of the text in the fullId nodes:
doc.xpath(your_xpath).map {|e| e.text}

Resources