In Reflected XSS, why do we need to sanitize single quote, double quote, ampersand, and backslash - asp.net-mvc

Based on this article
https://resources.infosecinstitute.com/topic/how-to-prevent-cross-site-scripting-attacks/
Reflected XSS happens when injected data is reflected in the response. I get the idea that if, for example, I have a search box in my page and the search term entered by a user is displayed in the page, someone could write as a search term:
<script>alert('x');</script>
and that would be read as regular HTML element in the page that displays the response.
But let's say greater-than and less-than are already blocked in the input (meaning an attacker wouldn't be able to put in script tags or any tag at all). What's the issue if I allow single quote, double quote, ampersand, and backslash to be reflected in the response? I'm trying to make sense of it, but I'm not sure I'm understanding it correctly.

Today the web stack is big and complex with many languages. We have HTML, CSS, JavaScript, VB-Script, SVG, URLs…
Each with its own rules for:
Encoding
Quoting
Commenting
Escaping
Also, each one can be nested inside the others: JavaScript inside HTML attributes, URLs inside CSS, CSS and script inside HTML, and so on.
And just replacing < and > fixes some issues, but not all of them, because you don't know where your data will end up. Is it in HTML text? In an HTML attribute? Inside a JavaScript string? Each context needs a different encoding to become safe.
So, the world is a bit more complicated...
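For example (a minimal sketch; the q parameter and the markup around it are made up): suppose only < and > are stripped, and the search term is echoed back inside an attribute value:

<input type="text" name="q" value="USER INPUT ECHOED HERE">

An attacker who can still use a double quote submits this search term:

" onfocus="alert(document.cookie)" autofocus="

and the reflected page then renders as:

<input type="text" name="q" value="" onfocus="alert(document.cookie)" autofocus="">

No angle brackets are needed; the double quote alone closes the attribute value and adds an event handler that runs script. A single quote does the same inside a single-quoted attribute or a JavaScript string literal, and a backslash can undo naive quote-escaping: if the server turns ' into \' but leaves \ alone, the attacker sends \' which becomes \\' after escaping, so the backslash is consumed and the quote still terminates the string. That is why every character that can end or alter the surrounding context has to be encoded for that specific context (HTML entity encoding, attribute encoding, JavaScript string escaping, URL encoding, and so on).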

Related

How to get http tag text by id using lua

There is a webpage parser, which takes a page containing several tags, in a certain structure, where divs are badly nested. I need to extract a certain div element, and copy it and all its content to a new html file.
Since I am new to Lua, I may need basic clarification for things that might seem simple.
Thanks,
How easy the data is to extract is going to depend largely on the page itself. If the page uses the exact same tag markup throughout its entirety, it'll be much more difficult to extract what you want than if the tags are distinctly named.
If you're able to find a version of the page that returns JSON format, then you're that much better off. Here's a snippet of code from something I wrote to grab definitions from a webpage that did not have a JSON format:
local actualword, definition = string.match(wayup,"<html.-<td class='word'>%c(.-)%c</td>.-<div class=\"definition\">(.-)</div>")
Essentially, this code searched down the page until it found the class "word", and took the word after it (%c is the pattern for control characters). It continued on to "definition" and captured that, as well.
As you can see, it's a bit convoluted, but I had the luck of having specifically named tags for what I wanted.
This is edited to fit your comment. As a side note that I should have mentioned before, if you're familiar with regular expressions, you can use its model to capture what you need. In this case, it's capturing the string in its totality:
local data = string.match(page, "(<div id=\"aa\"><div>.-</div>.-</div>)")
It's rarely the fault of the language, but rather the webpage itself, that makes it hard to data mine anything. Since webpages could literally have hundreds of lines of code, it's hard to pinpoint exactly what you want without coming across garbage information. It's why I prefer a simplified result such as json, since Lua has a json module that can encode/decode and you can get your precise information.
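To connect this back to your original goal (copying the matched div and its contents into a new HTML file), here is a rough sketch in Lua, assuming the page has been saved locally as page.html and that the div really does have id="aa" as in the pattern above:

-- read the saved page into a string
local f = assert(io.open("page.html", "r"))
local page = f:read("*a")
f:close()

-- capture the whole target div (same pattern idea as above; adjust the id
-- and nesting to match your actual page)
local data = string.match(page, "(<div id=\"aa\"><div>.-</div>.-</div>)")

if data then
  -- write the captured markup into a new HTML file
  local out = assert(io.open("extracted.html", "w"))
  out:write("<html>\n<body>\n", data, "\n</body>\n</html>\n")
  out:close()
end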

Any problems with using a period in URLs to delimiter data?

I have some easy-to-read URLs for finding data that belongs to a collection of record IDs, which currently use a comma as a delimiter.
Example:
http://www.example.com/find:1%2C2%2C3%2C4%2C5
I want to know if I can change the delimiter from a comma to a period. Since a period is not a special character in a URL, it won't have to be encoded.
Example:
http://www.example.com/find:1.2.3.4.5
Are there any browsers (Firefox, Chrome, IE, etc) that will have a problem with that URL?
There are some related questions here on SO, but none that specifically say whether it's a good or bad practice.
To me, that looks like a resource with an odd query string format.
If I understand correctly this would be equal to something like:
http://www.example.com/find?id=1&id=2&id=3&id=4&id=5
Since your filter is acting like a multi-select (IDs instead of search fields), that would be my guess at a standard equivalent.
Browsers should not have any issues with it, as long as the application's route mechanism handles it properly. And as long as you are not building that query-like thing with an HTML form (in which case you would need JS or some rewrites, ew!).
May I ask why not use a more standard URL and query string? Perhaps something that includes the element class (/reports/search?name=...), just to make it clear what is being queried by find. Just curious; I know sometimes standards don't apply.
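As for the route mechanism handling it, splitting the period-delimited list is a one-liner in most languages. A throwaway illustration in Lua (only because it is the language used elsewhere on this page; the parsing code itself is hypothetical):

-- "find:1.2.3.4.5" is the route segment from the example URL
local segment = "find:1.2.3.4.5"
local list = segment:match("^find:(.+)$")   -- "1.2.3.4.5"

local ids = {}
for id in list:gmatch("[^.]+") do           -- split on periods
  ids[#ids + 1] = tonumber(id)
end
-- ids now holds { 1, 2, 3, 4, 5 }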

Grails: User inputs formatted string, but formatting not preserved

I am just starting a very basic program in Grails (never used it before, but it seems to be very useful).
What I have so far is:
in X.groovy,
a String named parameters, with a constraint of maximum length 50000, plus a couple of other strings, dates, etc.
in XController.groovy,
static scaffold = X;
It displays the scaffold UI (very handy!), and I can add parameter strings and the other objects associated with it.
My problem is that the parameters string is a long string with formatting that is pasted in by the user. When it is displayed on the browser, however, it does not retain any carriage returns.
What is the best way to go about this? I'm a very beginner at Grails and still have lots and lots of learning to do on this account. Thanks.
The problem is that the string is being displayed as HTML, which does not render \n as a new line by default. You need to wrap the text in <pre> (see: http://www.w3schools.com/tags/tag_pre.asp) or replace the \n with <br/> tags to display it correctly to the user.
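For instance, in the GSP view that shows the record (a sketch only; it assumes a domain class X with a parameters property and the conventional xInstance variable, and that you have generated the scaffolded views so you can edit them):

<%-- Option 1: keep the pasted formatting exactly as entered --%>
<pre>${fieldValue(bean: xInstance, field: "parameters")}</pre>

<%-- Option 2: HTML-encode the text, then turn newlines into <br/> tags --%>
${xInstance.parameters.encodeAsHTML().replaceAll("\n", "<br/>")}

Note that with Option 2 the result depends on your Grails version: if ${} output is HTML-escaped by default, the <br/> tags themselves get escaped and you would need to mark the expression as raw output.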

#! as opposed to just # in a permalink

I'm designing a permalink system and I just noticed that Twitter and Hipmunk both prefix their permalinks with #!. I was wondering why this is, and if the exclamation point in particular is there for a reason. Wouldn't #/ work just as well, since they're no doubt using a framework that lets them redirect queries to certain templates with a regex URL parser?
http://www.hipmunk.com/#!BOS.SEA,Dec15.Jan02
http://twitter.com/#!/dozba
My only guess is it's because browsers use # to link to an anchor element. Is this why the exclamation point is appended?
This is done to make an "AJAX" page crawlable by Google for indexing; it does not affect the other well-defined semantics of the fragment identifier at all!
See Making AJAX Applications Crawlable: Getting Started
Briefly, the solution works as follows: the crawler finds a pretty AJAX URL (that is, a URL containing a #! hash fragment). It then requests the content for this URL from your server in a slightly modified form. Your web server returns the content in the form of an HTML snapshot, which is then processed by the crawler. The search results will show the original URL.
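Concretely, the "slightly modified form" is the _escaped_fragment_ query parameter: the crawler takes everything after the #! and moves it into that parameter. For the Twitter URL above, that would look roughly like this:

http://twitter.com/#!/dozba                       (pretty URL the browser shows)
http://twitter.com/?_escaped_fragment_=/dozba     (URL the crawler actually requests)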
I am sure other search-engines are also following this lead/protocol.
Happy coding.
Also, it is actually perfectly valid, at least per HTML5, to have an element with an ID of "!foo", so the reasoning in the post is invalid. See the article "The id attribute just got more classy":
HTML5 gets rid of the additional restrictions on the id attribute. The only requirements left — apart from being unique in the document — are that the value must contain at least one character (can’t be empty), and that it can’t contain any space characters.
My guess is that both pages use this in their JavaScript to differentiate between # (a link to an anchor) and their custom #!, which loads some additional content using Ajax.
In that case pretty much everything else would work after the # sign.

Encoding of XHTML and & (ampersand)

My website is XHTML Transitional compliant except for one thing: the ampersands (&) in the URLs are written as-is, instead of as &amp;.
That is, all the URLs in my pages are usually like this:
<a href="http://foo?x=1&y=2">Foo</a>
But XHTML validator generates this error:
cannot generate system identifier for general entity "y"
... and it wants the URL to be written like this:
<a href="http://foo?x=1&amp;y=2">Foo</a>
The problem is that Internet Explorer and Firefox don't handle the URL correctly and ignore the y parameter. How can I make this link work and validate correctly?
It seems to me that it is impossible to write XHTML pages if the browsers don't work with strict encoded XHTML URLs.
Do you want to see it in action? See the difference between these two links (copy and paste them as they are):
http://stackoverflow.com/search?q=ff&amp;sort=newest
and
http://stackoverflow.com/search?q=ff&sort=newest
I have just tried this. What you attempted to do is correct. In HTML, if you are writing a link, the & characters should be encoded as &amp;. You would only encode the & as %26 if you wanted a parameter value to contain an ampersand. I just wrote a simple HTML page that contained a link along the lines of <a href="default2.aspx?x=1&amp;y=2">Click me</a> (the parameter names here are just an example),
and it worked fine: default2.aspx received the parameters intended and the source passed validation.
The encoding of & as &amp; is required in the HTML, not in the link itself. When the browser sees the &amp; in the HTML source for a link, it will interpret it as an ampersand and the link target will be as intended. If you paste a URL into your browser address bar it does not expect it to be HTML and does not try to interpret any HTML encoding it may contain. That is why the example links you suggest we copy/paste into a browser don't work, and why we wouldn't expect them to work.
If you post a bit more of your actual code we might be able to see what you have done wrong, but you appear to be heading in the right direction by using &amp; in your anchor tags.
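A minimal illustration of the distinction this answer is drawing (the page and parameter names are made up):

<!-- the & here only separates two parameters, so in HTML source it is written &amp; -->
<a href="search.aspx?q=shoes&amp;sort=price">Shoes, sorted by price</a>

<!-- here the parameter VALUE itself contains an ampersand, so that character is URL-encoded as %26 -->
<a href="search.aspx?q=tom%26jerry">Search for tom&amp;jerry</a>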
It was my fault: the hyperlink control already encoded the &, so my URL http://foo?x=1&y=2 was being encoded to http://foo?x=1&amp;y=2.
Normally the &amp; inside the URL is correctly handled by browsers, as you stated.
You could use &amp; instead of & in the URLs within your page.
That should allow it to validate as strict XHTML...
<a href="http://foo?x=1&amp;y=2">Foo</a>
Note, if used by an ASP.NET Request.QueryString function, the query string doesn't use XML encoding; it uses URL encoding:
/mypath/mypage?b=%26stuff
So you need to provide a function translating '&amp;' into '%26'.
Note: in that case, Server.URLEncode("neetu & geetu"), which would produce neetu+%26+geetu, is not what you want, since you need to translate '&amp;' into '%26', not just '&'. You must add a replace() call applied to the URLEncode result, in order to replace '%26amp;' with '%26'.
To be even more thorough: use &#38;, a numeric character reference.
Because &amp; is a character entity reference:
Character entity references are defined in the markup language
definition. This means, for example, that for HTML only a specific
range of characters (defined by the HTML specification) can be
represented as character entity references (and that includes only a
small subset of the Unicode range).
That's coming from the wise people at W3C (read this for more).
Of course, this is not a very big deal, but the suggestion of W3C is that the numeric one will be valid and useable everywhere and always, while the named one is 'fine' for HTML but nothing more.
The problem is worse than you think - try it in Safari. &amp; gets converted to &#38; and the hash ends the URL.
The correct answer is to not output XHTML - there's no reason that justifies spending more time on development and alienating Mac users.
