Rails, Radiant, and Regex - ruby-on-rails

I'm working on a Rails site that uses the Radiant CMS and am building stateful navigation as per the first method in this link.
I'm matching the URL on regular expressions to determine whether or not to show the active state of each navigation link. As an example, here are two sample navigation elements, one for the Radiant URL /communications/ and one for /communications/press_releases/:
<r:if_url matches="/communications\/$/"><li class="bottom-border selected">Communications</li></r:if_url>
<r:unless_url matches="/communications\/$/"><li class="bottom-border">Communications</li></r:unless_url>
<r:if_url matches="/communications\/press_releases/"><li class="bottom-border selected">Press Releases</li></r:if_url>
<r:unless_url matches="/communications\/press_releases/"><li class="bottom-border">Press Releases</li></r:unless_url>
Everything works for the Press Releases page: when the URL is /communications/press_releases, the Press Releases nav item gets the 'selected' class and the Communications nav item is unselected. The Communications regular expression, however, doesn't seem to work: when the URL is /communications/, neither element has the 'selected' class (so the regex must be failing to match). Yet I've tested
>> "/communications/".match(/communications\/$/)
=> #<MatchData:0x333a4>
in IRB, and as you can see, the regular expression seems to be working fine. What might be causing this?
TL;DR: "/communications/" matches /communications\/$/ in the Ruby shell but not in the context of the Radiant navigation. What's going on here?

From Radiant's wiki, it looks like you don't need to wrap your regexes in slashes or escape slashes inside them. Try:
<r:if_url matches="/communications/$"><li class="bottom-border selected">Communications</li></r:if_url>
<r:unless_url matches="/communications/$"><li class="bottom-border">Communications</li></r:unless_url>
<r:if_url matches="/communications/press_releases/"><li class="bottom-border selected">Press Releases</li></r:if_url>
<r:unless_url matches="/communications/press_releases/"><li class="bottom-border">Press Releases</li></r:unless_url>
What is happening behind the scenes is that Radiant calls Regexp.new on the string in matches, so the regex you were trying to match before was this one:
Regexp.new '/communications\/$/'
# => /\/communications\/$\//
which translates to 'slash, communications, slash, end-of-line, slash', which I really doubt is what you want.
Ruby regexes are interesting in that they distinguish between start (^) and end ($) of a line and start (\A) and end (\Z) of the string; \Z still matches just before a trailing newline, while \z matches only at the very end. That's why you will sometimes see people using \A and \Z (or \z) in their regexes.
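To see the difference outside Radiant, here is a small sketch in plain Ruby. It assumes, as described above, that Radiant hands the attribute value straight to Regexp.new; the variable names are only for illustration.

```ruby
# Assumption: Radiant passes the attribute value straight to Regexp.new.
original = Regexp.new('/communications\/$/')   # the slashes become part of the pattern
fixed    = Regexp.new('/communications/$')

url = "/communications/"
original_matches = !original.match(url).nil?   # false: a literal "/" is required after end-of-line
fixed_matches    = !fixed.match(url).nil?      # true

# $ stops at the end of any line; \z only at the very end of the string.
multiline      = "/communications/\nextra"
dollar_matches = !Regexp.new('/communications/$').match(multiline).nil?    # true
z_matches      = !Regexp.new('/communications/\z').match(multiline).nil?   # false

puts original_matches, fixed_matches, dollar_matches, z_matches
```

For URLs, which normally contain no newlines, $ and \z behave the same; the distinction matters mostly when validating user input.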

Related

Why are "tel:" links removed in sanitization, and how to allow them

I am using Rails sanitize helper to clean up input text from users, that may be formatted as markdown.
I noticed that the method strips down tel: links, and I wonder why, and how can I allow them.
>> sanitize("<a href='http://123'>click</a>")
=> "click"
>> sanitize("<a href='tel:123'>click</a>")
=> "<a>click</a>"
Of course, I have tried figuring it out from the page linked above, but was unable to. I would prefer to avoid writing a "scrubber" class, or any other class for that simple task.
I have also tried what I think means "allow all hrefs" but it did not have any effect (even after restarting the server).
# In config/application.rb
config.action_view.sanitized_allowed_attributes = ['href']
In Rails 4, Loofah is used for sanitizing HTML. To know more, please visit this link.
The Rails team extracted this feature into a separate gem.
If you check this line, Loofah::HTML5::WhiteList::ALLOWED_PROTOCOLS doesn't have tel in its list, so it is stripped from anchor tags.
Solution:
Create an initializer that adds tel to the above set of protocols:
Loofah::HTML5::WhiteList::ALLOWED_PROTOCOLS.add('tel')
Restart the app and this should work.

Rails I18n _html suffix rule and translate helper called from a controller

According to the Rails docs, the translate (or t) helper delegates to I18n#translate but also performs several additional functions, among which is: "it’ll mark the translation as safe HTML if the key has the suffix _html".
I would expect it to work equally in views and in controllers, but in my experience it doesn't: t(:hello_html) works as expected in views (it marks the translation as html_safe), but it does not mark the result as safe HTML when invoked from a controller.
To reproduce the problem, add hello_html: '<i>Hello world</i>' to your locales/en.yml and flash.now[:notice] = t(:hello_html) to any convenient action of any controller. For me that resulted in escaped HTML markup in the flash message area, which I did not expect.
My questions are:
is there anyone else who experienced or is able to reproduce this problem?
what is it: a rails bug, a feature, or just my project's unique "oddity"?
is there any easy way to work this around?
(Tested in rails 3.2.11 and 3.2.13)
You are correct about this functionality not being available to controllers given that the overloaded .t method is defined in ActionView::Helpers::TranslationHelper. I think this is probably an oversight as opposed to an actual bug.
Off the top of my head, there are two ways you can get around this in your project:
Call .html_safe in your controller (this worked for me in a quick test).
flash[:notice] = t(:hello_html).html_safe
Send the translation key as the flash message, as opposed to the actual message:
Controller:
flash[:translate_notice] = :hello_html
View:
%div= t flash[:translate_notice]
Granted, the latter option might get a bit messy if you need to pass interpolations, YMMV.

What characters are allowed in a dynamic segment (param) in Rails?

I am using Rails and have a user entered field that can become a param in the URL. I'd like to add a validation that stops the users from entering any fields that will cause routing errors, as currently if the user enters a value like that we get an error "No route matches [GET]..." So far I know periods and slashes are not allowed...
What regex should I use for my validation? Or what regex does Rails use by default for dynamic segments?
Since no one has actually answered the question, only suggested workarounds (which are probably better, if you are in the right circumstances to use them), I experimented to find the characters that cause issues. I tested all punctuation available on a standard US keyboard. I also tested space and (horizontal) tab. I did not test any extended Unicode punctuation, nor control characters.
The characters I found to cause problems in Rails 3.2.9, using webrick and the composite_primary_keys gem, are:
,/.%
To validate that a field contains none of these characters:
validates :field_name, :format => { :with => /\A[^,\/\.%]*\z/,
  :message => "commas, slashes, periods, and percent signs (,/.%) are not allowed" }
Many of the other characters I tried are not valid directly in URLs, but Rails automatically URL encodes them so they do not cause an issue.
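As a quick check outside Rails, the validation pattern above can be exercised in plain Ruby. This is only a sketch: safe_segment? and the sample values are made up for illustration.

```ruby
# Pattern from the validation above: the whole string must be free of , / . %
SAFE_SEGMENT = /\A[^,\/\.%]*\z/

# Hypothetical helper: true when the value contains none of the problem characters.
def safe_segment?(value)
  !SAFE_SEGMENT.match(value).nil?
end

samples = ["john-doe", "hello world", "john.doe", "a/b", "a,b", "50%"]
results = samples.map { |s| [s, safe_segment?(s)] }.to_h

results.each { |value, ok| puts "#{value.inspect}: #{ok}" }
```

Only "john-doe" and "hello world" pass. Note that the * quantifier also accepts an empty string, so a separate presence validation is still needed if the field is required.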
As mentioned in the comments on the original question, some of these characters can be enabled by changing Rails' default configuration, but doing so disables other Rails features. To enable them, you need to add :constraints or :id settings to your route definition.
I have not completely tested enabling all these characters, but I believe the consequences are:
Ch  Consequence of enabling use
--  ---------------------------------------
,   Must not use gem composite_primary_keys
/   Limits ability to route to child items
.   Disables automatic format handling
%   Not sure this can be enabled
Maybe you can let the user enter whatever they like, then use to_param + parameterize to build the URL; if you want a regex, take a look at the parameterize source code.
For an example of to_param, the documentation, and the source code, see:
http://apidock.com/rails/ActiveSupport/Inflector/parameterize
Hope it helps!
From the Rails source, in Action Pack's action_dispatch/journey/path/pattern.rb:
@separator_re = "([^#{separator}]+)" # where separator comes from @separators = "/.?"
So the default regular expression used to match a dynamic segment seems to be:
/([^\/\.\?]+)/
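To illustrate what such a pattern accepts, here is a sketch in plain Ruby. It is not the actual Journey code: it only rebuilds a segment pattern from the separators "/.?" (with the one-or-more quantifier from the source line) and anchors it inside a hypothetical /users/:id route.

```ruby
# Assumption: a dynamic segment is "one or more characters that are not a separator".
separators = "/.?"
segment    = "([^#{Regexp.escape(separators)}]+)"

# Hypothetical route "/users/:id", anchored at both ends of the path.
route = Regexp.new("\\A/users/#{segment}\\z")

plain  = route.match("/users/42")          # matches, segment is "42"
dotted = route.match("/users/john.doe")    # nil: "." is a separator
nested = route.match("/users/a/b")         # nil: "/" is a separator

puts plain[1]
puts dotted.inspect
puts nested.inspect
```

A comma or percent sign passes this pattern, which is consistent with the earlier answer attributing those failures to other layers (the composite_primary_keys gem and URL decoding) rather than to the segment regex itself.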

#! as opposed to just # in a permalink

I'm designing a permalink system and I just noticed that Twitter and Hipmunk both prefix their permalinks with #!. I was wondering why this is, and if the exclamation point in particular is there for a reason. Wouldn't #/ work just as well, since they're no doubt using a framework that lets them redirect queries to certain templates with a regex URL parser?
http://www.hipmunk.com/#!BOS.SEA,Dec15.Jan02
http://twitter.com/#!/dozba
My only guess is it's because browsers use # to link to an anchor element. Is this why the exclamation point is appended?
This is done to make an "AJAX" page crawlable [by google] for indexing -- It does not affect the other well-defined semantics of the fragment identifier at all!
See Making AJAX Applications Crawlable: Getting Started
Briefly, the solution works as follows: the crawler finds a pretty AJAX URL (that is, a URL containing a #! hash fragment). It then requests the content for this URL from your server in a slightly modified form. Your web server returns the content in the form of an HTML snapshot, which is then processed by the crawler. The search results will show the original URL.
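The "slightly modified form" is, as far as I understand Google's published scheme, the same URL with the #! fragment moved into an _escaped_fragment_ query parameter. A rough sketch of that mapping follows; crawler_url is a made-up helper name, and the set of escaped characters is my reading of the scheme rather than a verified implementation.

```ruby
# Hypothetical helper: rewrite a "#!" URL into the form a crawler would request.
def crawler_url(pretty_url)
  base, fragment = pretty_url.split("#!", 2)
  return pretty_url if fragment.nil?   # no "#!" present, nothing to rewrite

  # Assumption: control characters, space, "#", "%", "&" and "+" are percent-escaped.
  escaped   = fragment.gsub(/[\x00-\x20\#%&+]/) { |c| format("%%%02X", c.ord) }
  separator = base.include?("?") ? "&" : "?"
  "#{base}#{separator}_escaped_fragment_=#{escaped}"
end

puts crawler_url("http://twitter.com/#!/dozba")
puts crawler_url("http://www.hipmunk.com/#!BOS.SEA,Dec15.Jan02")
```

The server is then expected to answer that request with an HTML snapshot of whatever the JavaScript would have rendered for the fragment.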
I am sure other search-engines are also following this lead/protocol.
Happy coding.
Also, it is actually perfectly valid, at least per HTML5, to have an element with an ID of "!foo", so the reasoning in the post is invalid. See the article "The id attribute just got more classy":
HTML5 gets rid of the additional restrictions on the id attribute. The only requirements left — apart from being unique in the document — are that the value must contain at least one character (can’t be empty), and that it can’t contain any space characters.
My guess is that both pages use this in their JavaScript to differ between # (a link to an anchor) and their custom #! which loads some additional content using Ajax.
In that case pretty much everything else would work after the # sign.

Extracting email addresses in an html block in ruby/rails

I am creating a parser that guards against spamming and harvesting of emails from a block of text that comes from tinyMCE (so it may or may not have HTML tags in it).
I've tried regexes, and so far this has been successful:
/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
The problem is that I need to ignore all email addresses in mailto hrefs. For example:
test@mail.com
should return only the second email address.
For background on what I'm doing: I'm reversing the email addresses in a block, so the above example would look like this:
moc.liam@tset
The problem with my current regex is that it also replaces the one in the href. Is there a way to do this with a single regex, or do I have to check for one and then the other? Can I do this with gsub alone, or do I have to use some Nokogiri/Hpricot magic to parse the mailtos? Thanks in advance!
Here were my references btw:
so.com/questions/504860/extract-email-addresses-from-a-block-of-text
so.com/questions/1376149/regexp-for-extracting-a-mailto-address
I'm also testing using this:
http://rubular.com/
edit
Here's my current helper code:
def email_obfuscator(text)
  text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) do |m|
    # wrap each matched address, reversed, in an anti-spam span
    "<span class='anti-spam'>#{m.reverse}</span>"
  end
end
which results in this:
<a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>
Another option if lookbehind doesn't work:
/\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i
This would match all emails, then you can manually check if first captured group is "mailto:" then skip this match.
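A minimal sketch of this capture-group approach, assuming the address separator in the pattern is @ and reusing the asker's email_obfuscator name and span markup:

```ruby
# Group 1 is the optional "mailto:" prefix, group 2 is the address itself.
EMAIL = /\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

def email_obfuscator(text)
  text.gsub(EMAIL) do |match|
    prefix  = Regexp.last_match(1)
    address = Regexp.last_match(2)
    # Leave mailto: addresses untouched; reverse and wrap everything else.
    prefix ? match : "<span class='anti-spam'>#{address.reverse}</span>"
  end
end

html = %q(<a href="mailto:test@gmail.com">test@gmail.com</a>)
puts email_obfuscator(html)
# prints: <a href="mailto:test@gmail.com"><span class='anti-spam'>moc.liamg@tset</span></a>
```

Because the optional prefix is consumed as part of the match, an address such as mailto:john.doe@example.com is skipped as a whole, including the part after the dot.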
Would this work?
/\b(?<!mailto:)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
The (?<!mailto:) is a negative lookbehind, which will ignore any matches starting with mailto:
I don't have Ruby set up at work, unfortunately, but it worked with PHP when I tested it...
Why not just store all the matched emails in an array and remove any duplicates? You can do this easily with the ruby standard library and (I imagine) it's probably quicker/more maintainable than adding more complexity to your regex.
emails = ["email_one@example.com", "email_one@example.com", "email_two@example.com"]
emails.uniq # => ["email_one@example.com", "email_two@example.com"]
