What characters are allowed in a dynamic segment (param) in Rails? - ruby-on-rails

I am using Rails and have a user-entered field that can become a param in the URL. I'd like to add a validation that stops users from entering values that will cause routing errors; currently, if a user enters such a value, we get the error "No route matches [GET]...". So far I know periods and slashes are not allowed...
What regex should I use for my validation? Or what regex does Rails use by default for dynamic segments?

No one has actually answered the question; the other answers only suggest workarounds (which are probably better, if your circumstances allow them). So I experimented to find the characters that cause issues. I tested all punctuation available on a standard US keyboard, plus space and (horizontal) tab. I did not test extended Unicode punctuation or control characters.
The characters I found to cause problems in Rails 3.2.9, using WEBrick and the composite_primary_keys gem, are:
,/.%
To validate that a field contains none of these characters:
validates :field_name, :format => { :with => /\A[^,\/\.%]*\z/,
:message => "commas, slashes, periods, and percent signs (,/.%) are not allowed"}
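The pattern in this validation can also be checked outside Rails. The following is a minimal plain-Ruby sketch; the constant name SAFE_SEGMENT and the sample values are made up for illustration and were not part of the original test run.

```ruby
# Same pattern as in the validation above: no commas, slashes, periods, or percent signs.
SAFE_SEGMENT = /\A[^,\/\.%]*\z/

# Hypothetical sample values, chosen only to exercise each rejected character.
samples = ["john-doe", "john.doe", "a/b", "50%", "a,b", "hello world"]

samples.each do |value|
  status = SAFE_SEGMENT.match?(value) ? "ok" : "rejected"
  puts "#{value.inspect}: #{status}"
end
```

Only "john-doe" and "hello world" pass here; the space is accepted because, as noted, Rails URL-encodes it.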
Many of the other characters I tried are not valid directly in URLs, but Rails automatically URL encodes them so they do not cause an issue.
As mentioned in the comments on the original question, some of these characters can be enabled by changing Rails' default configuration, but doing so disables other Rails features. To enable them, add :constraints or :id settings to your route definition.
I have not completely tested enabling all these characters, but I believe the consequences are:
Ch Consequence of enabling use
-- ---------------------------------------
, Must not use gem composite_primary_keys
/ Limits ability to route to child items
. Disables automatic format handling
% Not sure this can be enabled

Maybe you can let the user enter whatever they like, then use to_param + parameterize to build the URL; if you want a regex, take a look at the parameterize source code.
For an example of to_param, plus the documentation and source code, see:
http://apidock.com/rails/ActiveSupport/Inflector/parameterize
Hope it helps!
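To give a feel for what this approach produces, here is a rough plain-Ruby stand-in for parameterize. It is not the ActiveSupport implementation (the real method also transliterates accented characters, among other things); the method name simple_slug and the sample string are hypothetical.

```ruby
# Simplified stand-in for ActiveSupport's parameterize (assumption: ASCII input only).
def simple_slug(text)
  text.downcase
      .gsub(/[^a-z0-9\-_]+/, "-") # replace each run of other characters with one dash
      .gsub(/-{2,}/, "-")         # collapse repeated dashes
      .gsub(/\A-|-\z/, "")        # drop a leading or trailing dash
end

puts simple_slug("Hello, World / 50% off.")
```

In a model, the equivalent idea would be to return a slug like this from to_param so the problematic characters never reach the URL.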

From the Rails code in actionpack, action_dispatch/journey/path/pattern.rb:
@separator_re = "([^#{separator}]+)" # where separator comes from @separators = "/.?"
So the default regular expression used to match a dynamic segment seems to be:
/([^\/\.\?]+)/
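A small sketch of how such a pattern behaves, built the same way from a separators string. This only mirrors the interpolation quoted above and is not the actual Rails class; the sample path is made up.

```ruby
# Assumption: "/.?" is the separator set quoted from the Rails source above.
separators = "/.?"
segment_re = Regexp.new("([^#{Regexp.escape(separators)}]+)")

# Each run of non-separator characters becomes one segment.
path = "john.doe/1"
p path.scan(segment_re).flatten
# => ["john", "doe", "1"]
```

This shows why a value such as "john.doe" is not captured as a single dynamic segment by default: the period ends the match.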

Related

Regex to normalize topic links in Discourse forum

I am using the Discourse forum software. In its current state, Discourse presents links to topics in two ways: with and without a post number at the end.
Example:
forum.domain.com/t/some-topic/23
forum.domain.com/t/some-topic/23/5
The first one is what I want and the second one I want to not be displayed in the forum at all.
I've written a post about it on Discourse forum but didn't receive an answer what Regex to put in the permalink normalization input field in the admin section.
I was told that there is an option to do it using permalink normalization like so (It's an example shown in the admin under the Regex input text, I didn't write it):
permalink normalizations
Apply the following regex before matching permalinks,
for example: /(topic.*)\?.*/\1 will strip query strings from topic routes.
Format is regex+string use \1 etc. to access captures
I don't know what Regex I should use in order to remove the numerical value of the post number from links. I need it only for topic links.
This is the routes.rb routing library and this is the permalink.rb library (I think the permalink library should give a better clue about how to achieve this). I have no idea how to approach this, because it seems that I need some knowledge of Discourse routing to make it work. For example, I don't understand why (topic.*) is part of the regex or what it means, so their example doesn't help me find a solution.
In the admin I have an input field in which I need to put the normalization regex code.
I need help with the Regex. I need the regex to work with all topics.
Things I've tried that didn't work out:
/(\/\d+)\/\d+$/\1
/(t/[^/]+/\d+).*/\1
/(\/\d+)\/[0-9]+$/\1
/(\/\d+)\/[0-9]+/\1
/(\/\d+)\/\d+$/\1/
/(forum.domain.com(\/\w+)*\/\d+)\/\d+(?=\s|$)/\1
Note: The Permalink Normalization input field treats the character | as a separator to separate between several Regex expressions.
I think this may be the expression you are looking for to put in the settings field:
/(t\/.*\/\d+)(\/\d+)/\1
You can see it working on Rubular.
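The same check can be done locally. This is a plain-Ruby sketch that applies the expression with String#sub; it assumes the normalization amounts to a regex substitution and does not call any Discourse code. The URLs are the examples from the question.

```ruby
# Pattern from the suggestion above; group 1 is kept, the trailing /<post number> is dropped.
pattern = /(t\/.*\/\d+)(\/\d+)/

urls = [
  "forum.domain.com/t/some-topic/23",
  "forum.domain.com/t/some-topic/23/5"
]

urls.each { |url| puts url.sub(pattern, '\1') }
```

Both lines print forum.domain.com/t/some-topic/23: the first URL does not match and is left unchanged, the second loses its post number.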
However, the code that generates the url is not using the normalization code, so the expression is being ignored.
You could try normalizing the permalink there:
def last_post_url
  url = "#{Discourse.base_uri}/t/#{slug}/#{id}/#{posts_count}"
  url = Permalink.normalize_url url
  url
end
I didn't fully understand your question, but if I got it right, you are saying that you want links with /some-number at the end but don't want links with /some-number/some-number at the end. If that is the case, the regex is:
forum\.domain\.com\/t\/[^0-9\/]+\/\d{1,9}$
You can replace 'forum' with your forum name and 'domain' with your domain name.
This will remove trailing "/<digits>" after another "/<digits>":
/(forum.domain.com(\/\w+)*\/\d+)\/\d+(?=\s|$)/\1

Rails, Radiant, and Regex

I'm working on a Rails site that uses the Radiant CMS and am building stateful navigation as per the first method in this link.
I'm matching the URL on regular expressions to determine whether or not to show the active state of each navigation link. As an example, here are two sample navigation elements, one for the Radiant URL /communications/ and one for /communications/press_releases/:
<r:if_url matches="/communications\/$/"><li class="bottom-border selected">Communications</li></r:if_url>
<r:unless_url matches="/communications\/$/"><li class="bottom-border">Communications</li></r:unless_url>
<r:if_url matches="/communications\/press_releases/"><li class="bottom-border selected">Press Releases</li></r:if_url>
<r:unless_url matches="/communications\/press_releases/"><li class="bottom-border">Press Releases</li></r:unless_url>
Everything's working fine for the Press Releases page--that is, when the URL is /communications/press_releases the Press Releases nav item gets the 'selected' class appropriately, and the Communications nav item is unselected. However, the Communications regular expression doesn't seem to be functioning correctly, as when the URL is /communications/ neither element has the 'selected' class (so the regex must be failing to match). However, I've tested
>> "/communications/".match(/communications\/$/)
=> #<MatchData:0x333a4>
in IRB, and as you can see, the regular expression seems to be working fine. What might be causing this?
TL;DR: "/communications/" matches /communications\/$/ in the Ruby shell but not in the context of the Radiant navigation. What's going on here?
From Radiant's wiki, it looks like you don't need to wrap your regexes in /s or escape the /s inside them. Try:
<r:if_url matches="/communications/$"><li class="bottom-border selected">Communications</li></r:if_url>
<r:unless_url matches="/communications/$"><li class="bottom-border">Communications</li></r:unless_url>
<r:if_url matches="/communications/press_releases/"><li class="bottom-border selected">Press Releases</li></r:if_url>
<r:unless_url matches="/communications/press_releases/"><li class="bottom-border">Press Releases</li></r:unless_url>
What is happening behind the scenes is that Radiant calls Regexp.new on the string in matches, so the regex you were actually matching against before was this one:
Regexp.new '/communications\/$/'
# => /\/communications\/$\//
which translates to 'slash communications slash end-of-line slash' which I really doubt is what you want.
Ruby regexes are interesting in that they have anchors for both start (^) and end ($) of line as well as start (\A) and end (\z) of string; \Z also matches at the end of the string but allows a trailing newline. That's why you will sometimes see people using \A and \z (or \Z) in their regexes.
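The Regexp.new behaviour described above is easy to reproduce in IRB without Radiant. A minimal sketch, using the two strings from the question and the suggested fix (the variable names are arbitrary):

```ruby
# The string as originally written in the matches attribute, slashes included.
wrapped = Regexp.new('/communications\/$/')
# The same pattern without the surrounding slashes or escaping.
bare = Regexp.new('/communications/$')

path = "/communications/"
puts wrapped.match?(path) # false: a literal "/" is required after end-of-line
puts bare.match?(path)    # true
```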

Rails i18n strings auto-lowercased?

I've noticed that in my new Rails 3.0 application all German i18n strings are converted to lowercase (except for the first letter).
When having a string like this:
de:
email: "E-Mail"
the output is always like "E-mail". Same story with all the other strings - uppercase letters within a sentence are auto-converted to lowercase.
Is this default behaviour that I have to disable, or is there some other problem? I have set the locale correctly, as these strings do actually work.
Thanks for your help
Arne
There should be no modifications to the content you specify as part of the internationalization process. It sounds like something is calling humanize on the string before it is output. Some of the standard Rails form helper methods do this I believe. If you just output the translation using t('email') you should see 'E-Mail' correctly.
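humanize itself needs ActiveSupport, but the visible effect on this particular string can be reproduced in plain Ruby with String#capitalize, which likewise keeps the first letter uppercase and lowercases the rest. This is only an illustration of the symptom, not the code path Rails takes.

```ruby
# The translation value from the question.
label = "E-Mail"

# Plain-Ruby approximation of what the label ends up looking like.
puts label.capitalize # E-mail
```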
Update:
From your comments it seems like it is a label that is causing the problem. If you explicitly specify the text for the label rather than relying on the default behaviour you will get the translation exactly as you specify. So,
<%= f.label(:email, t('email')) %>
should generate the correct label from the translations.
However, it isn't ideal. I think you may also run into problems with the generated validation error messages.
I had the same issue and solved it by adding the _html suffix to the I18n translation key. It seems that using this suffix suppresses the humanize call.

Extracting email addresses in an html block in ruby/rails

I am creating a parser that guards against spamming and harvesting of emails from a block of text that comes from TinyMCE (so it may or may not have HTML tags in it).
I've tried regexes, and so far this one has been successful:
/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
The problem is, I need to ignore all email addresses inside mailto hrefs. For example:
test@mail.com
should only return the second email address.
To give some background on what I'm doing: I'm reversing the email addresses in a block, so the above example would look like this:
moc.liam@tset
The problem with my current regex is that it also replaces the one in the href. Is there a way for me to do this with a single regex, or do I have to check for one and then the other? Can I do this just by using gsub, or do I have to use some Nokogiri/Hpricot magic to parse the mailtos? Thanks in advance!
Here were my references btw:
so.com/questions/504860/extract-email-addresses-from-a-block-of-text
so.com/questions/1376149/regexp-for-extracting-a-mailto-address
im also testing using this:
http://rubular.com/
edit
here's my current helper code:
def email_obfuscator(text)
  text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) do |m|
    "<span class='anti-spam'>#{m.reverse}</span>"
  end
end
which results in this:
<a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>
Another option if lookbehind doesn't work:
/\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i
This would match all emails, then you can manually check if first captured group is "mailto:" then skip this match.
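A sketch of how the capture-group check could look with gsub, assuming real addresses use the usual @ separator. The constant EMAIL_RE, the method name obfuscate_emails, and the sample HTML are hypothetical; the span markup follows the helper in the question.

```ruby
# Optional "mailto:" prefix in group 1, the address itself in group 2.
EMAIL_RE = /\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

def obfuscate_emails(text)
  text.gsub(EMAIL_RE) do |match|
    prefix = Regexp.last_match(1)
    email = Regexp.last_match(2)
    if prefix
      match # inside a mailto: href, leave the whole match untouched
    else
      "<span class='anti-spam'>#{email.reverse}</span>"
    end
  end
end

# Hypothetical input resembling the TinyMCE output described in the question.
html = '<a href="mailto:test@mail.com">test@mail.com</a>'
puts obfuscate_emails(html)
```

Only the link text is reversed; the address in the href is returned as-is because its match carries the "mailto:" capture.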
Would this work?
/\b(?<!mailto:)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
The (?<!mailto:) is a negative lookbehind, which will ignore any matches starting with mailto:
I don't have Ruby set up at work, unfortunately, but it worked with PHP when I tested it...
Why not just store all the matched emails in an array and remove any duplicates? You can do this easily with the ruby standard library and (I imagine) it's probably quicker/more maintainable than adding more complexity to your regex.
emails = ["email_one@example.com", "email_one@example.com", "email_two@example.com"]
emails.uniq # => ["email_one@example.com", "email_two@example.com"]

what if html_escape would stop escaping '&'?

Is there any danger if the Rails html_escape function stopped escaping '&'? I tested a few cases and it doesn't seem to create any problems. Can you give me a counterexample? Thanks.
If you put an unescaped "&" into an HTML attribute, it would make your page invalid. For example:
Link
The page is now invalid as the & indicates an entity. This is true for any usage of an & on a page (for example, view source and hopefully you'll notice that Stack Overflow escapes the & signs in this post!)
The following would make the above example valid:
Link
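To see the escaping in question without a Rails app, the standard library's ERB::Util.html_escape can be used, which performs the same kind of escaping. The URL below is a made-up example, not the one from the original answer.

```ruby
require "erb"

# Hypothetical URL with two query parameters joined by "&".
url = "/search?q=ruby&lang=en"

# html_escape turns the bare "&" into "&amp;", which is what a validator expects in an attribute.
escaped = ERB::Util.html_escape(url)
puts %(<a href="#{escaped}">Link</a>)
# <a href="/search?q=ruby&amp;lang=en">Link</a>
```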
Additional Note
& characters do need to be escaped in URLs if you want to validate your markup against the W3C validator. Example:
Line 9, Column 38: & did not start a character reference.
(& probably should have been escaped as &amp;.)
Example
change an url with adding some argument
