We're using ASP.NET MVC and the RouteCollection class to route URLs in our web app. This works normally until we pass a URL containing the text "PRN" anywhere inside it. When this happens, the routing does not occur and a 400 Page Not Found error is returned to the client. It's as if something throws the error before the route collection is even consulted, because the route the URL should take is never touched (by that I mean the breakpoint in the underlying code is never hit, though the exact same URL without the string "PRN" does hit the breakpoint).
So I thought it might be a request validation issue: maybe Microsoft decided to throw an exception when the URL contains "PRN" because it resembles "print" or "porn". But if that were the case, we'd see the "A potentially dangerous Request.Form value was detected from the client" error, and we don't.
Researching this has been a hassle because Google thinks PRN should return results for "porn", which means 98% of my search results are invalid (and inappropriate). Using the "-porn" clause in Google drops your results down to about 10-30 hits, all useless.
Does anyone know why a URL containing the string "PRN" will not route properly? If you have any posts or threads to point me to, that would be awesome (again, Google has failed me).
I'm maintaining a legacy .NET 4.7.2 MVC application that uses Umbraco CMS 7.15.7. We started getting failed requests registered in Azure Application Insights with the following message
A potentially dangerous Request.Path value was detected from the client (:).
I've seen that it is related to requests such as these (below is a sample request that usually occurs in Application Insights logs, I've just changed the real website domain):
timestamp [UTC]: 2021-09-28T21:50:31.1808953Z
name: GET /companies/ https:/www.sample-site.com/companies/sample-company
url: https://www.sample-site.com/companies/ https:/www.sample-site.com/companies/sample-company
success: False
resultCode: 400
duration: 0.1136
performanceBucket: <250ms
itemType: request
customDimensions: {"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.1')"}
operation_Name: GET /companies/ https:/www.sample-site.com/companies/sample-company
client_Type: PC
client_IP: 0.0.0.0
sdkVersion: web:2.9.0-23612
I realise that the problem (:) is because the url and the name end up being somehow "doubled" (https://www.sample-site.com/companies/ https:/www.sample-site.com/companies/sample-company). My goal is not to allow such requests, but rather find the cause of them.
Here are my conclusions as to what might be causing this issue:
The failed requests seem to occur periodically and in bulk, every 5-10 hours. No product page is requested twice within one batch. This led me to believe that a bot might be causing them, but the pattern isn't always clear: sometimes two requests are minutes apart.
The requests seem to occur mostly for product pages that are under a certain node in Umbraco (in this instance, the companies node). However, there are some product pages under that node in Umbraco that don't have the occurrence.
The failed request contains https:/ with only one "/", which is odd and suggests that this part is concatenated at some point.
The few successful requests under the companies node always have a 301 status code, which leads me to believe that the issue could be related to a broken redirect rule somewhere.
The error started occurring after we changed some content in Umbraco for certain product pages (mostly canonicalUrl properties of product page nodes). But then again, there are some product pages that do have the property filled out, but do not get the issue.
I can never relate these requests to a user or session in Application Insights
I am aware that this might be too specific to debug and that it could be caused by practically anything, but I have a feeling that it's somehow related to Umbraco. Could somebody with Umbraco experience, who knows how the name property could end up being generated as GET /companies/ https:/www.sample-site.com/companies/sample-company, give me some clues?
As additional information, here is a successfully executed request to a product page under the same node in Umbraco:
timestamp [UTC]: 2021-09-29T07:27:54.2739984Z
name: GET /companies/second-company
url: https://www.sample-site.com/companies/second-company
success: True
resultCode: 301
duration: 6.4651000000000005
performanceBucket: <250ms
itemType: request
customDimensions: {"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.1')"}
operation_Name: GET /companies/second-company
client_Type: PC
client_IP: 0.0.0.0
sdkVersion: web:2.9.0-23612
Can you add ValidateRequest="false" to the page directive at the top of your page?
Also,
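As you are using .NET 4.7.2, you need to allow the character in question by overriding the invalid path characters in the web.config file. Note that <, > and & are XML-escaped inside the attribute value.
<system.web>
  <httpRuntime
    requestPathInvalidCharacters="&lt;,*,&gt;,%,&amp;,\,?" />
</system.web>
I have removed the colon (:). The original default string is
<httpRuntime
  requestPathInvalidCharacters="&lt;,&gt;,*,%,&amp;,:,\,?" />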
Check these SO1 and SO2 with related discussions.
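To illustrate why the colon trips the check, here is a minimal TypeScript sketch that mimics the idea of the configured character list. It is not ASP.NET's actual implementation; the function name findInvalidPathChars and both constants are hypothetical, and the path is the sample one from the question.

```typescript
// The default list quoted above, and the relaxed list with ":" removed.
// These are the parsed attribute values (no XML escaping), so "\\" is a single backslash.
const DEFAULT_INVALID: string = "<,>,*,%,&,:,\\,?";
const RELAXED_INVALID: string = "<,*,>,%,&,\\,?";

// Hypothetical helper: returns the characters from the comma-separated
// list that occur anywhere in the request path.
function findInvalidPathChars(path: string, invalidList: string): string[] {
  return invalidList
    .split(",")
    .filter((ch: string) => ch.length > 0 && path.indexOf(ch) !== -1);
}

// Path taken from the failed request in the question:
// findInvalidPathChars("/companies/ https:/www.sample-site.com/companies/sample-company", DEFAULT_INVALID)
//   returns [":"]
// With RELAXED_INVALID the same path returns [].
```

Relaxing the list only makes the 400 go away; it does not explain where the doubled URL comes from.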
After doing some research, I found the cause of the issue, and I believe sharing it could benefit others even though quite some time has passed: the requests with the incorrect URL were sent by an external SEO tool that we used, so the error originated outside of our system. The SEO tool sent these requests exactly as they are shown here, e.g. https://www.sample-site.com/companies/ https:/www.sample-site.com/companies/sample-company, so they weren't malformed somewhere in our request pipeline.
How do I clear/remove query string parameters that my MVC action doesn't require or support?
For instance, my action requires, say, an id and a bool flag, so the URL would be something like: http://localhost:someport/controller/action/?id=1&remove=true
But if a user types in something like http://localhost:someport/controller/action/?id=1&remove=true&some-junk-param=0
then I want some-junk-param to be removed and not shown in the address bar when the request is processed.
Any thoughts?
If you need to get rid of unwanted query string parameters, you have two general options:
Do it on the server side. You can achieve this only with a redirect: when the browser requests a URL with a bad query string, the server redirects it to the URL with a good query string.
Caveats:
In this case there is a redundant request just for cleaning the query string.
The user will have junk in the browser history.
Do it on the client side. The ASP.NET MVC model binder picks up only the expected parameters from the query string, so there is nothing wrong with having other values in it. You can check the URL on the client side with JavaScript and rewrite it, with or without changing history, using the History API (IE10+).
Caveats:
In this case you have to keep the list of allowed parameters consistent between the JS and C# code.
Of course each approach suits its own cases, but judging by the caveats the second is better, because it affects developer experience whereas the first affects user experience.
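The client-side option could look roughly like the following TypeScript sketch. The allow-list ALLOWED_PARAMS and the helper stripUnknownParams are hypothetical names, and port 5000 in the example stands in for "someport" from the question.

```typescript
// Hypothetical allow-list: the parameters the example action expects.
// It has to be kept in sync with the C# action's signature by hand.
const ALLOWED_PARAMS: string[] = ["id", "remove"];

// Returns the URL with every query parameter outside `allowed` removed.
function stripUnknownParams(rawUrl: string, allowed: string[]): string {
  const parsed = new URL(rawUrl);
  const toDelete: string[] = [];
  // Collect first, delete afterwards, so the collection is not
  // modified while it is being iterated.
  parsed.searchParams.forEach((_value: string, key: string) => {
    if (allowed.indexOf(key) === -1) {
      toDelete.push(key);
    }
  });
  toDelete.forEach((key: string) => parsed.searchParams.delete(key));
  return parsed.toString();
}

// stripUnknownParams(
//   "http://localhost:5000/controller/action/?id=1&remove=true&some-junk-param=0",
//   ALLOWED_PARAMS)
//   returns "http://localhost:5000/controller/action/?id=1&remove=true"
//
// In a browser, the cleaned URL can replace the current history entry
// without a reload:
//   history.replaceState(null, "", stripUnknownParams(location.href, ALLOWED_PARAMS));
```

Using replaceState rather than pushState avoids leaving the junk URL as a separate entry in the browser history.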
In my ASP.NET MVC 3 site, a real route looks like /FF.mvc/116/MVt?m=01-12-2012, but some of my users are getting errors and they have a weird route like
/FF.mvc/116/ossw=((qncufuh)niah(`r)mt
Any idea where this ossw=((qncufuh)niah(`r)mt is coming from?
My hunch is that your application's pages are indexed by a search engine (Google), perhaps against your wishes :). If you search for anything in Google, for example your app's name, you will see a similar ossw=((qncufuh)niah(`r)mt string in the address bar when the results are returned.
Some employee has probably searched for the page link in Google and tried to access it from there.
In order to prevent search spiders from indexing your application's pages, add a robots.txt file to your application.
Have you tried something like:
string url = "/FF.mvc/116/MVt?m=" + HttpUtility.UrlEncode("01-12-2012");
I don't know whether the '-' character causes the error.
The ((qncufuh).... part might come from some strange language encoding.
ASP.Net MVC 3.0, .NET 4.0, IIS 7
I know this has been asked many times, but I still can't figure out what's wrong.
I get these messages only occasionally (less than 1 a day), and I get about 4k visits daily.
Here is a link to the error report:
http://wowreforge.com/elmah.axd/detail?id=6CBE6DCA-88C2-45E7-AF53-A53061B8E25D
(notice there are links to XML and JSON detailed reports)
The first thing to note is that the URL (PATH) contains a UTF-8 encoded character: /US/Warsong/Spartan%C3%B6
The second thing is that the request is HEAD, not GET.
Neither one of those details should result in the error I receive, I think.
The original URL was:
http://wowreforge.com/US/Warsong/Spartan%C3%B6?reforge=--52145254126214646464--3214325254&crit=7&dodge=90&exp=19&haste=1&hit=10&mastery=100&parry=67&spi=0
I have tried this URL with both GET and HEAD request, but wasn't able to reproduce the error.
Anything else I can poke at?
Notice that PATH_TRANSLATED = E:\web\wowreforgec\htdocs\EU\Kael%27Thas\Acekhor. It looks like the URL encoded character %27 is not being translated to ' before looking up the path of the file on disk. The % character is forbidden by the default configuration of the RequestPathInvalidCharacters property, thus the input is considered dangerous and an exception is thrown.
Edit
The HttpUtility.UrlDecode(string s) method should transform /EU/Kael%27Thas/Acekhor into /EU/Kael'Thas/Acekhor. This method (or one of similar function) should be called at the point where the virtual path is resolved to a physical path. Are you using a custom method to transform the virtual path into a physical path?
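As a rough illustration of the decoding step described above, the following TypeScript sketch uses the built-in decodeURIComponent as a stand-in for HttpUtility.UrlDecode. The two are not identical (for example, UrlDecode also turns "+" into a space), and decodePath is a hypothetical helper name.

```typescript
// Hypothetical helper: percent-decodes each segment of a virtual path
// separately, so the "/" separators themselves are left untouched.
function decodePath(path: string): string {
  return path
    .split("/")
    .map((segment: string) => decodeURIComponent(segment))
    .join("/");
}

// decodePath("/EU/Kael%27Thas/Acekhor")   returns "/EU/Kael'Thas/Acekhor"
// decodePath("/US/Warsong/Spartan%C3%B6") returns "/US/Warsong/Spartanö"
```

If the path reaches the file-system mapping still containing "%", as PATH_TRANSLATED suggests, it would be rejected by a character check such as the one described above.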
If an extra character (like a period, a comma, a bracket or even a letter) is accidentally added to a URL on the stackoverflow.com domain, no 404 error page is shown. Instead, the URL corrects itself and the user is led to the relevant webpage.
For instance, the 4 extra letters I added to the end of a valid SO URL to demonstrate this are automatically removed when you access the URL below:
https://stackoverflow.com/questions/194812/list-of-freely-available-programming-booksasdf
I guess this has something to do with ASP.NET MVC Routing. How is this feature implemented?
Well, this is quite simple to explain, I guess, even without knowing the code behind it:
The text is just candy for search engines and people reading the URL:
the URL works just as well with the text removed completely!
The only really important part is the question ID that's also embedded in the "path".
This is because EVERYTHING after http://stackoverflow.com/questions/194812 is ignored. It is just there to make the link more descriptive when it is posted somewhere.
Internally the URL is mapped to a handler, e.g., by a rewrite, that transforms into something like: http://stackoverflow.com/questions.php?id=194812 (just an example, don't know the correct internal URL)
This also makes the URL search engine friendly, besides being more readable to humans.
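The idea of routing on the ID alone and treating the text as decoration can be sketched in TypeScript as follows. This is not Stack Overflow's actual implementation; parseQuestionPath, canonicalPath and the SLUGS table are hypothetical stand-ins for a route handler and a database lookup.

```typescript
interface QuestionRoute {
  id: number;
  slug: string;
}

// Hypothetical lookup table standing in for a database of question titles.
const SLUGS: { [id: number]: string } = {
  194812: "list-of-freely-available-programming-books",
};

// Extracts the numeric ID; whatever follows it is kept only for comparison.
function parseQuestionPath(path: string): QuestionRoute | null {
  const match = /^\/questions\/(\d+)(?:\/(.*))?$/.exec(path);
  if (match === null) {
    return null;
  }
  return { id: Number(match[1]), slug: match[2] || "" };
}

// Returns the canonical path for the requested question, or null if the
// path is not a question URL or the ID is unknown. A handler could redirect
// to this path whenever it differs from the one requested.
function canonicalPath(path: string): string | null {
  const route = parseQuestionPath(path);
  if (route === null || SLUGS[route.id] === undefined) {
    return null;
  }
  return "/questions/" + route.id + "/" + SLUGS[route.id];
}

// canonicalPath("/questions/194812/list-of-freely-available-programming-booksasdf")
//   returns "/questions/194812/list-of-freely-available-programming-books"
```

Because only the ID takes part in the lookup, a mistyped or missing slug still resolves to the same question.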