What if html_escape stopped escaping '&'? - ruby-on-rails

Is there any danger if the Rails html_escape function stopped escaping '&'? I tested a few cases and it doesn't seem to create any problems. Can you give me a counterexample? Thanks.

If you put an unescaped "&" into an HTML attribute, it would make your page invalid. For example:
Link
The page is now invalid, because the & marks the start of an entity. This is true for any use of an & on a page (for example, view the source and you'll notice that Stack Overflow escapes the & signs in this post!)
The following would make the above example valid:
Link
Additional Note
& characters do need to be escaped in URLs that appear inside HTML attributes if you want your markup to validate against the W3C validator. Example:
Line 9, Column 38: & did not start a character reference.
(& probably should have been escaped as &amp;.)
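To answer the original question with a concrete counterexample, here is a small Ruby sketch. It uses `ERB::Util.html_escape` from the standard library (as far as character escaping goes, Rails' `html_escape` does the same thing, and additionally leaves `html_safe` strings alone) and a hypothetical `escape_without_amp` that skips '&'. `CGI.unescapeHTML` stands in for the browser decoding entities when it renders the page. The assumption is a user who literally typed the text `&lt;`, e.g. in a comment about HTML.

```ruby
require "erb"
require "cgi"

# Hypothetical variant of html_escape that escapes everything except '&'.
def escape_without_amp(text)
  text.gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;").gsub("'", "&#39;")
end

# What the user literally typed: they want readers to see the text "&lt;".
input = "Use &lt; to write <"

full    = ERB::Util.html_escape(input) # => "Use &amp;lt; to write &lt;"
partial = escape_without_amp(input)    # => "Use &lt; to write &lt;"

# CGI.unescapeHTML approximates what the browser displays after decoding entities.
puts CGI.unescapeHTML(full)    # prints: Use &lt; to write <   (what the user typed)
puts CGI.unescapeHTML(partial) # prints: Use < to write <      (the user's text changed)
```

Without escaping '&', output is no longer a faithful round trip: any user text that happens to look like an entity (`&lt;`, `&copy;`, `&#39;` and so on) is silently reinterpreted by the browser.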

Difference between these three URLs?

Can somebody explain how it matters which characters are used to pass parameters in a URL, e.g.:
1: www.domain.com/folder1/folder2/file.html?param=9?val=ty5?test
2: www.domain.com/folder1/folder2/file.html#param=93#val=t5y5?test=9
3: www.domain.com/folder1/folder2/file.html&param=9?val=ty5&test=90#poiu
Basically, I want to know what these three characters (#, &, ?) do in a URL. I see them most of the time. Can I use something other than them,
e.g.: www.domain.com/folder1/folder2/file.html*param=9_val+ty5#test
? indicates the start of the query string.
& separates the key-value pairs of the query string.
# indicates an anchor. Here's more on anchor links.
Note that all three of your URLs are incorrect.
Valid URL:
http://domain/path/file?name=value&name=value#anc
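In Ruby, the standard `URI` library splits a URL along exactly these delimiters. A minimal sketch using the valid example URL above:

```ruby
require "uri"

url = URI.parse("http://domain/path/file?name=value&name=value#anc")

p url.path     # => "/path/file"
p url.query    # => "name=value&name=value"  (after '?', before '#')
p url.fragment # => "anc"                    (after '#'; browsers don't send it to the server)

# '&' separates the key=value pairs inside the query string.
p URI.decode_www_form(url.query) # => [["name", "value"], ["name", "value"]]
```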
I notice you've edited your question with an additional question:
can I use something other than that, e.g.:
www.domain.com/folder1/folder2/file.html*param=9_val+ty5#test
You can use whatever you like in the query string or anchor part, as long as it is URL-encoded.
This Wikipedia article goes into detail and gives some good examples.
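As a sketch of what "URL-encoded" means in practice, Ruby's `URI.encode_www_form` percent-encodes each key and value so that characters like '+' or '&' inside a value can't be confused with delimiters. The parameter names and values below are illustrative only.

```ruby
require "uri"

# Illustrative values: '+' and '&' are data here, not delimiters.
params = [["param", "9_val+ty5"], ["q", "a&b c"]]

query = URI.encode_www_form(params)
puts query # prints: param=9_val%2Bty5&q=a%26b+c

# Decoding restores the original values ('+' in the query means a space).
p URI.decode_www_form(query) # => [["param", "9_val+ty5"], ["q", "a&b c"]]
```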
A ? indicates the start of the query
A & separates the parameters in the query
A # identifies a fragment of the HTML resource to be rendered. It's often used to identify which part of the page the browser should bring into view, e.g. a heading.
? indicates that the URL contains query string values.
& is used to separate multiple query string values. Example:
www.abc.com/page?id=abc&pwd=def
And # is new to me; this is the first time I've seen it.
1: ? is used for separating the back-end code from its arguments. Note that the .html file extension doesn't necessarily mean the back-end code is HTML.
2: # is used to link to anchors within the HTML page.
3: & is used for separating arguments from one another. In this case, file.html is also an argument itself, while the back-end code is the "/", which can be anything, e.g. index.php, default.asp, index.do. It all depends on your URL rewriting.

Why do ampersands (&) need to be encoded in JSF? Is there a way around this?

I often have JavaScript on my JSF XHTML pages that has && in it, which I end up having to encode as &amp;&amp;.
For example, when I place the following in my JSF XHTML page file:
I am an & sign
I get the error:
The entity name must immediately follow the '&' in the entity reference
One way to fix this appears to be to change the '&' to &amp;, which I find less desirable than just writing '&'.
It also appears that, for cases where I use '&' in JavaScript, I can wrap the JavaScript in CDATA tags; when wrapped in CDATA tags, I can write '&' without having to escape it as &amp;, which is a good workaround for keeping the JavaScript on my page readable.
But what happens when I want to use a literal '&' elsewhere on the page, outside <script> tags, where I cannot as easily wrap the code in CDATA tags? Must I always escape '&' as &amp; in these cases?
Note: trying to use …'s ability to escape values does not seem to fix the issue.
Facelets is an XML-based view technology. Any characters that have special treatment by the XML parser need to be XML-escaped when the intent is to present them literally. That covers, among others, < and &. The < indicates the start of an XML tag like so <foo> and the & indicates the start of an XML entity like so &amp;. The < must be escaped as &lt; and the & as &amp;.
Not escaping them in Facelets would result in the following exception for <
javax.faces.view.facelets.FaceletException: Error Parsing /test.xhtml: Error Traced[line: 42] The content of elements must consist of well-formed character data or markup.
and the following one for &
javax.faces.view.facelets.FaceletException: Error Parsing /test.xhtml: Error Traced[line: 42] The entity name must immediately follow the '&' in the entity reference.
This is not specifically related to JavaScript, this applies to the entire view, including "plain text". Those characters just happen to be JavaScript operators as well. There's no way to go around this, that's just how XML is specified. In JavaScript, there's however one more way to avoid escaping or using CDATA blocks: just put that JS code in its own .js file which you load by <script> or <h:outputScript>.
In EL, there is also the && operator, which also needs to be escaped as &amp;&amp;, but fortunately there's an alias for this operator: the and operator.
See also:
Mozilla Developer Network - Writing JavaScript for XHTML
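If you generate such pages or inline scripts from code, the escaping and CDATA options from the answer above can be sketched in a few lines of Ruby. `CGI.escapeHTML` produces the XML-escaped form; the `cdata` helper is hypothetical and shows the one catch with CDATA: the sequence "]]>" cannot appear inside a section, so it has to be split across two sections.

```ruby
require "cgi"

js = "if (a < b && b < c) { go(); }"

# Option 1: escape the XML-special characters, as Facelets requires for inline text.
puts CGI.escapeHTML(js) # prints: if (a &lt; b &amp;&amp; b &lt; c) { go(); }

# Option 2 (hypothetical helper): wrap the script in a CDATA section.
# "]]>" would end the section early, so it is split across two sections.
def cdata(text)
  "<![CDATA[" + text.gsub("]]>", "]]]]><![CDATA[>") + "]]>"
end

puts cdata(js) # prints: <![CDATA[if (a < b && b < c) { go(); }]]>
```

The third option from the answer, moving the code into its own .js file, avoids both kinds of escaping entirely.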
It's because & is a special character in XML: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
FYI, I tried to write the (c) character in my JSF page.
An error was raised when I wrote &copy;: "copy is referenced but not declared"
When I wrote &amp;copy; I got the raw string back.
I could display the special character using a numeric (Unicode) character reference, e.g. &#169;
This code worked for me:
<h:outputText value="&amp;copy;" escape="false" />
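XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;); HTML names such as &copy; exist only where a DTD declares them, which is why &copy; fails in the Facelets parser while a numeric reference always works. Ruby's `CGI.unescapeHTML` happens to decode exactly those five names plus numeric references, so it can illustrate the difference (this is an analogy, not what JSF runs internally):

```ruby
require "cgi"

# Numeric character references work in any XML or HTML document.
p CGI.unescapeHTML("&#169;") == "\u00A9" # => true
p CGI.unescapeHTML("&#xA9;") == "\u00A9" # => true

# The XML-predefined names are decoded...
p CGI.unescapeHTML("&lt;&amp;&gt;")      # => "<&>"

# ...but an HTML-only name like &copy; is left untouched.
p CGI.unescapeHTML("&copy;")             # => "&copy;"
```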

regular expression for emails NOT ending with replace script

I'm currently modifying my regex for this:
Extracting email addresses in an html block in ruby/rails
Basically, I'm making another obfuscator that uses ROT13, by parsing a block of text for all links that contain a mailto: href (using hpricot). One use case this doesn't catch is when the user just typed in an email address (without turning it into a link via TinyMCE).
So here's the basic flow of my method:
1. parse a block of text for all tags with href="mailto:..."
2. replace each tag with a javascript function that changes this into ROT13 (using this script: http://unixmonkey.net/?p=20)
3. once all links are obfuscated, pass the resulting block of text into another function that parses for all emails(this one has an email regex that reverses the email address and then adds a span to that email - to reverse it back)
Step 3 is supposed to clean the block of text of remaining emails that AREN'T in <a href> tags (meaning they weren't parsed by hpricot). The problem is that the emails that were converted to ROT13 are still found by my regex. What I want to catch are just the emails that WEREN'T CONVERTED to ROT13.
How do I do this? Well, all emails that WERE CONVERTED have a trailing "'.replace" after them, meaning I need to get all emails WITHOUT that string. So far I have this regex:
/\b([A-Z0-9._%+-]+#[A-Z0-9.-]+.[A-Z]{2,4}('.replace))\b/i
but this gets all the emails WITH the trailing '.replace. I want the opposite, and I'm currently stumped. Any help from regex gurus out there?
MORE INFO:
Here's the regex + the block of text im parsing:
http://www.rubular.com/r/NqXIHrNqjI
As you can see, the first two 'email addresses' are already obfuscated using ROT13. I need a regex that gets the emails ohhellzyeah#ribute.com and kaboom#yahoo.com
On negative lookaheads
You can use a negative lookahead to assert that a pattern doesn't match.
For example, the following regex matches all strings that don't end with the string ".replace":
^(?!.*\.replace$).*$
As another example, this regex matches all a*b*, except aabb:
^(?!aabb$)a*b*$
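Both lookahead examples can be checked in Ruby. One thing to note: in Ruby, ^ and $ match at every line break, so this sketch anchors to the whole string with \A and \z instead.

```ruby
# Matches a*b* strings except exactly "aabb".
not_aabb = /\A(?!aabb\z)a*b*\z/
p %w[ab aabb aab abbb].grep(not_aabb) # => ["ab", "aab", "abbb"]

# Matches strings that do not end with ".replace".
no_replace = /\A(?!.*\.replace\z).*\z/
p ["x.replace", "x.replace(y)", "plain"].grep(no_replace) # => ["x.replace(y)", "plain"]
```

In the email case, '.replace is not at the end of the input, so the lookahead has to be placed inside the email pattern rather than at the start of the string.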
Ideally,
See also
regular-expressions.info/Lookaheads and anchors
Flavor comparison - unfortunately, Ruby 1.8 doesn't support lookbehinds (Ruby 1.9 and later do)
Specific solution
The following regex works in this scenario (see it on rubular.com):
/\b([A-Z0-9._%+-]+#(?![A-Z0-9.-]*'\.replace\b)[A-Z0-9.-]+\.[A-Z]{2,4})\b/i
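A quick way to try the specific solution in plain Ruby. The sample text is made up (the original rubular input isn't reproduced here) and, like the thread, writes '#' where a real address would have '@'. The negative lookahead sits right after the '#', so an address is skipped whenever its domain is immediately followed by '.replace.

```ruby
# The regex from the answer above ('#' stands in for '@', as in the question).
EMAIL = /\b([A-Z0-9._%+-]+#(?![A-Z0-9.-]*'\.replace\b)[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

# Made-up sample: one already-obfuscated address followed by two plain ones.
text = "document.write('nyvpr#rknzcyr.pbz'.replace(/[a-z]/gi, rot13)); " \
       "mail kaboom#yahoo.com or ohhellzyeah#ribute.com"

p text.scan(EMAIL).flatten # => ["kaboom#yahoo.com", "ohhellzyeah#ribute.com"]
```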

Why is this query string invalid?

In my ASP.NET MVC page I create a link that renders as follows:
http://localhost:3035/Formula/OverView?colorId=349405&paintCode=744&name=BRILLANT%20SILVER&formulaId=570230
According to the W3C validator, this is not correct, and it errors after the first ampersand. It complains about the & not being encoded, the entity &p not being recognised, etc.
AFAIK the & shouldn't be encoded because it is the separator for the key-value pairs.
For those who care: I send these parameters as a query string and not as "/"-separated values because there is no decent way of passing optional parameters that I know of.
To put all the bits together:
an anchor (<a>) tag's href attribute needs an encoded value
& encodes to &amp;
to encode an '&' when it is part of your parameter's value, use %26
Wouldn't encoding the ampersand into &amp; make it part of my parameter's value?
I need it to separate the second variable from the first.
Indeed, by encoding my href value, I do get rid of the errors. What I'm wondering now, however, is what to do if, for example, my colorId were "123&456", where the ampersand is part of the value.
Since the separator has to be encoded, what do I do with encoded ampersands? Do they need to be encoded twice, so to speak?
So to get the URL:
www.mySite.com/search?query=123&456&page=1
What should my href value be?
Also, I think I'm about the first person in the world to care about this. Go check the web and count the pages that get their query strings validated by the W3C validator.
Special characters that are part of attributes should generally be encoded as entities. Thus you need &amp; instead of just &.
It works even if it doesn't validate because most browsers are very, very, very lenient in what to accept.
In addition, if you are outputting XHTML you have to encode every entity everywhere, not just inside the attributes.
All HTML attributes need to use character entities. The only place you don't need to change & into &amp; is within script blocks.
Anywhere in an HTML document where you want an & to display directly next to something other than whitespace, you need to use the character entity &amp;. If it is part of an attribute, the &amp; will work as though it were an &. If the document is XHTML, you need to use character entities everywhere, even if you don't have something immediately next to the &. You can also use other character entities as part of attributes to treat them as though they were the actual characters.
If you want to use an ampersand as part of a URL in a way other than as a separator for parameters, you should use %26.
As an example...
Hello
Would send the user to http://localhost/Hello, with name=Bob and text=you & me "forever".
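The original markup of that link didn't survive, but the two layers of encoding can be sketched in Ruby: first URL-encode each parameter value (so the '&' inside text becomes %26), then HTML-escape the whole URL for the href attribute (so the separator '&' becomes &amp;). `URI.encode_www_form` and `ERB::Util.html_escape` are from Ruby's standard library; the original answer's exact output may have differed (e.g. %20 instead of + for spaces).

```ruby
require "uri"
require "erb"
require "cgi"

# Layer 1: URL-encode the parameter values; '&' inside a value becomes %26.
query = URI.encode_www_form([["name", "Bob"], ["text", 'you & me "forever"']])
url   = "http://localhost/Hello?" + query
# url == 'http://localhost/Hello?name=Bob&text=you+%26+me+%22forever%22'

# Layer 2: HTML-escape the URL for the attribute; the separator '&' becomes &amp;.
link = %(<a href="#{ERB::Util.html_escape(url)}">Hello</a>)
puts link
# prints: <a href="http://localhost/Hello?name=Bob&amp;text=you+%26+me+%22forever%22">Hello</a>

# The browser undoes layer 2 when reading the attribute; the server undoes layer 1.
href = CGI.unescapeHTML(ERB::Util.html_escape(url))
p URI.decode_www_form(URI.parse(href).query)
# => [["name", "Bob"], ["text", "you & me \"forever\""]]
```

The same two steps answer the "123&456" follow-up: the value's ampersand becomes %26, and only the separator becomes &amp;.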
This is a slightly confusing concept to some people, I've found. When you put &amp; in an HTML page, such as in <a href="abc?def=5&amp;ghi=10">, the URL is actually abc?def=5&ghi=10. The HTML parser converts the entity to an ampersand.
Think of it exactly the same way as escaping quotes in a string:
// though you define your string like this:
myString = "this is \"something\" you know?"
// the string is ACTUALLY: this is "something" you know?
// when you look at the HTML, you see:
<a href="foo?bar=1&amp;baz=2">
// but the url is ACTUALLY: foo?bar=1&baz=2

Encoding of XHTML and & (ampersand)

My website is XHTML Transitional compliant except for one thing: the & (ampersand) in the URLs is written as it is, instead of &amp;
That is, all the URLs in my pages are usually like this:
Foo
But XHTML validator generates this error:
cannot generate system identifier for general entity "y"
... and it wants the URL to be written like this:
Foo
The problem is that Internet Explorer and Firefox don't handle the URL correctly and ignore the y parameter. How can I make this link work and validate correctly?
It seems to me that it is impossible to write XHTML pages if the browsers don't work with strict encoded XHTML URLs.
Do you want to see it in action? See the difference between these two links (copy and paste them as they are):
http://stackoverflow.com/search?q=ff&sort=newest
and
http://stackoverflow.com/search?q=ff&amp;sort=newest
I have just tried this. What you attempted to do is correct. In HTML, if you are writing a link, the & characters should be encoded as &amp;. You would only encode the & as %26 if you wanted a parameter value to contain an ampersand. I just wrote a simple HTML page that contained a link: Click me
and it worked fine: default2.aspx received the parameters intended and the source passed validation.
The encoding of & as &amp; is required in HTML, not in the link itself. When the browser sees &amp; in the HTML source for a link, it interprets it as an ampersand and the link target is as intended. If you paste a URL into your browser's address bar, it does not expect HTML and does not try to interpret any HTML encoding the URL may contain. This is why the example links you suggest we copy/paste into a browser don't work, and why we wouldn't expect them to work.
If you post a bit more of your actual code we might be able to see what you have done wrong, but you appear to be heading in the right direction by using &amp; in your anchor tags.
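To see why pasting the &amp; version into the address bar breaks the link, compare the two ways of reading it. This sketch uses `CGI.unescapeHTML` to mimic the browser reading an href attribute, and `URI.decode_www_form` to mimic a server splitting the query string; it assumes a reasonably recent Ruby, where `decode_www_form` splits only on '&' by default.

```ruby
require "cgi"
require "uri"

# The link exactly as it is written inside an HTML href attribute.
attribute = "http://stackoverflow.com/search?q=ff&amp;sort=newest"

# Inside HTML, the browser decodes &amp; first, so the real URL is:
p CGI.unescapeHTML(attribute) # => "http://stackoverflow.com/search?q=ff&sort=newest"

# Pasted into the address bar, no HTML decoding happens, so the server sees
# a parameter literally named "amp;sort" and no "sort" parameter at all.
p URI.decode_www_form(URI.parse(attribute).query) # => [["q", "ff"], ["amp;sort", "newest"]]
```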
It was my fault: the hyperlink control already encoded &, so my URL http://foo?x=1&amp;y=2 was encoded to http://foo?x=1&amp;amp;y=2
Normally the &amp; inside the URL is correctly handled by browsers, as you stated.
You could use &amp; instead of & in your URL within your page.
That should allow it to be validated as strict XHTML...
Foo
Note, if used by an ASP.NET Request.QueryString function, the query string doesn't use XML encoding; it uses URL encoding:
/mypath/mypage?b=%26stuff
So you need to provide a function translating '&' into %26.
Note: in that case, Server.URLEncode("neetu & geetu"), which would produce neetu+%26+geetu, is not what you want, since you need to translate &amp; into %26, not just '&'. You must add a replace() call applied to the URLEncode result, in order to replace '%26amp;' by '%26'.
To be even more thorough: use &#38;, a numeric character reference.
Because &amp; is a character entity reference:
Character entity references are defined in the markup language definition. This means, for example, that for HTML only a specific range of characters (defined by the HTML specification) can be represented as character entity references (and that includes only a small subset of the Unicode range).
That's coming from the wise people at W3C (read this for more).
Of course, this is not a very big deal, but the suggestion of W3C is that the numeric one will be valid and useable everywhere and always, while the named one is 'fine' for HTML but nothing more.
The problem is worse than you think - try it in Safari. &amp; gets converted to &#38; and the hash ends the URL.
The correct answer is to not output XHTML - there's no reason that justifies spending more time on development and alienating Mac users.
