Potentially dangerous Request.Path due to doubled url - asp.net-mvc

I'm maintaining a legacy .NET 4.7.2 MVC application that uses Umbraco CMS 7.15.7. We started getting failed requests registered in Azure Application Insights with the following message:
A potentially dangerous Request.Path value was detected from the client (:).
It is related to requests like the one below (a sample of what usually shows up in the Application Insights logs; I've only changed the real website domain):
timestamp [UTC]: 2021-09-28T21:50:31.1808953Z
name: GET /companies/ https:/www.sample-site.com/companies/sample-company
url: https://www.sample-site.com/companies/ https:/www.sample-site.com/companies/sample-company
success: False
resultCode: 400
duration: 0.1136
performanceBucket: <250ms
itemType: request
customDimensions: {"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.1')"}
operation_Name: GET /companies/ https:/www.sample-site.com/companies/sample-company
client_Type: PC
client_IP: 0.0.0.0
sdkVersion: web:2.9.0-23612
I realise that the colon (:) triggers the error because the url and the name somehow end up "doubled" (https://www.sample-site.com/companies/ https:/www.sample-site.com/companies/sample-company). My goal is not to allow such requests, but rather to find what causes them.
Here are my conclusions as to what might be causing this issue:
The failed requests occur periodically and in bulk, every 5-10 hours. No product page is requested twice within one bulk. This led me to believe that a bot might be causing them, but the pattern isn't always clear: sometimes two requests are minutes apart.
The requests occur mostly for product pages under a certain node in Umbraco (in this instance, the companies node). However, some product pages under that node never show the issue.
The failed request contains https:/ with only one "/", which is odd and suggests that this part is concatenated at some point.
The few successful requests under the companies node always have a 301 status code, which leads me to believe that the issue could be related to a broken redirect rule somewhere.
The error started occurring after we changed some content in Umbraco for certain product pages (mostly the canonicalUrl properties of product page nodes). But then again, some product pages that have the property filled out do not get the issue.
I can never relate these requests to a user or session in Application Insights.
I am aware that this might be too specific to debug and that practically anything could cause it, but I have a feeling that it's somehow related to Umbraco. Could somebody with Umbraco experience, who knows how the name property could end up being generated as GET /companies/ https:/www.sample-site.com/companies/sample-company, point me to some clues?
As additional information, here is a successfully executed request to a product page under the same node in Umbraco:
timestamp [UTC]: 2021-09-29T07:27:54.2739984Z
name: GET /companies/second-company
url: https://www.sample-site.com/companies/second-company
success: True
resultCode: 301
duration: 6.4651000000000005
performanceBucket: <250ms
itemType: request
customDimensions: {"_MS.ProcessedByMetricExtractors":"(Name:'Requests', Ver:'1.1')"}
operation_Name: GET /companies/second-company
client_Type: PC
client_IP: 0.0.0.0
sdkVersion: web:2.9.0-23612

Can you add ValidateRequest="false" at the top of your page?
Also, as you are using .NET 4.7.2, you need to allow the colon in the request path in the web.config file (inside the attribute value, the characters <, > and & must be XML-escaped):
<system.web>
  <httpRuntime requestPathInvalidCharacters="&lt;,&gt;,*,%,&amp;,\,?" />
</system.web>
I have removed the colon (:); the original default string is
<httpRuntime requestPathInvalidCharacters="&lt;,&gt;,*,%,&amp;,:,\,?" />
Check these SO1 and SO2 with related discussions.
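As a rough model of why the colon matters, the sketch below assumes the path check is a plain containment test of each configured character against the request path. The function name first_invalid_char is hypothetical, and the sketch is written in Python only to illustrate the idea; it is not the framework's actual code.

```python
# Default list as quoted in the answer above, comma-separated.
DEFAULT_INVALID = "<,>,*,%,&,:,\\,?"
# The same list with the colon removed, as the answer suggests.
RELAXED_INVALID = "<,>,*,%,&,\\,?"


def first_invalid_char(path, invalid_chars=DEFAULT_INVALID):
    """Return the first configured character found in path, or None."""
    for ch in invalid_chars.split(","):
        if ch and ch in path:
            return ch
    return None


doubled_path = "/companies/ https:/www.sample-site.com/companies/sample-company"
normal_path = "/companies/second-company"

print(first_invalid_char(doubled_path))                   # the colon is reported
print(first_invalid_char(doubled_path, RELAXED_INVALID))  # nothing is reported
print(first_invalid_char(normal_path))                    # nothing is reported
```

Under this assumption, removing the colon from the list only stops the 400 response; it does not explain where the doubled url comes from.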

After doing some research, I found the cause of the issue, and I'm sharing it even though quite some time has passed, since it could be useful to others. The requests with the incorrect URL were sent by an external SEO tool that we used, so the error originated outside of our system. The SEO tool sent these requests exactly as they are shown here, e.g. https://www.sample-site.com/companies/ https:/www.sample-site.com/companies/sample-company, so they weren't malformed somewhere in our request pipeline.
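For anyone triaging similar entries in exported request logs, here is a minimal sketch in Python that flags urls whose path contains a second, embedded url. The function name has_embedded_url and the regular expression are assumptions made for illustration; they are not part of Application Insights or Umbraco.

```python
import re
from urllib.parse import urlsplit

# Matches a scheme marker such as "http:/" or "https://" inside a path.
EMBEDDED_SCHEME = re.compile(r"https?:/", re.IGNORECASE)


def has_embedded_url(url):
    """Return True if the path portion of url contains another url."""
    path = urlsplit(url).path
    return EMBEDDED_SCHEME.search(path) is not None


doubled = ("https://www.sample-site.com/companies/ "
           "https:/www.sample-site.com/companies/sample-company")
normal = "https://www.sample-site.com/companies/second-company"

print(has_embedded_url(doubled))
print(has_embedded_url(normal))
```

Grouping the flagged requests by user agent or client IP (where available) may then help confirm whether a single external tool is sending them.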

Related

Grails: Don't Run Filters on a Forward (without using flash variables)

I'm writing a plugin that keeps track of the pages that a user has visited in an application (for the purpose of a back button). It does this with a filter that runs for every controller/action and keeps a list of visited pages. Everything is working great, except that in applications that use forwards, the plugin records two entries for one page, since Grails filters run on every request, even when that request is just a forward (i.e. an internal redirect).
Since this is a plugin (that has to be application agnostic) I can't simply set a flash variable whenever a forward is used to check if a forward has occurred. Is there any way to determine if a forward has occurred in a filter? I'm exploring the different values in the request variable and how they differ between a normal request and a forward, but things can get quite confusing. Any help is highly appreciated.
P.S. The main difference I noticed so far is that the request.forwardURI and request.requestURI differ for a forward, however, the requestURI is in a special format that I currently don't know how to convert to match the forwardURI.
For example for a normal request:
request.forwardURI = '/short-url' (as set in URLMappings) or '/controller/action'
request.requestURI = '/grails/controller/action.dispatch'
For a forward:
request.forwardURI = '/short-url' (as set in URLMappings) or '/controller/action'
request.requestURI = '/grails/forwardedController/forwardedAction.dispatch'

Grails UrlMappings alphanumeric id

I managed to set up a domain class with a string id and the assigned generator, so the url to show/edit is /controller/action/alphanumeric_id.
It worked fine until I ran vulnerability tests and found a problem: when the id contains a slash (or backslash), even encoded as %2F (or %5C), the browser gets a 400 Bad Request error.
My mapping is the default one, /$controller/$action?/$id?, and I put a validator in the constraints to see what is going on, but the request doesn't even get there when it contains these characters.
If I access /controller/action/?id=alphanumeric_id all goes well, but I wonder whether there is a way to keep using the default short url.
EDIT:
Create a new grails application (mine is version 2.1.3);
Create a controller;
In any action do println params.id (you can do the same in UrlMappings if you want);
Try to access your controller at /appname/controller/action/abc: it works fine;
Now try /appname/controller/action/a%2Fbc or /appname/controller/action/a%5Cbc: it gives 400 Bad Request;
Other combinations from %00 up to %FF should work as well, but not all of them do; %00, for example, does not work either.
BUGGED COMBINATIONS: %00 %2F %5C
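To see what the three failing sequences have in common, the short Python sketch below decodes the ids from the steps above. It only shows the decoded characters; that a server in front of the application rejects them for this reason is an assumption, not something the question confirms.

```python
from urllib.parse import unquote

# Ids taken from the reproduction steps, plus the %00 case.
raw_ids = ["abc", "a%2Fbc", "a%5Cbc", "a%00bc"]

# Map each raw id to its percent-decoded form.
decoded = {raw: unquote(raw) for raw in raw_ids}

for raw, value in decoded.items():
    print(repr(raw), "->", repr(value))
```

The three failing ids decode to a forward slash, a backslash and a NUL character, all of which are special in file system paths.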

grails scope questions - page, request, flash

The grails manual shows the following example:
<g:set var="now" value="${new Date()}" scope="request" />
and also indicates by default variables defined by the set are page scope (out of the page, request, flash, session, and application choices). I'm wondering what the difference between page and request scope is, and what an example use of the difference might be.
Also, with the flash scope, the manual indicates: "Grails supports the concept of flash scope as a temporary store for attributes which need to be available for this request and the next request only. Afterwards the attributes are cleared. This is useful for setting a message directly before redirection."
It isn't immediately apparent to me how redirection relates to "this request and the next request", since the example of redirection they give is redirecting from one controller action to another, which doesn't result in two pages/HTTP responses being sent to the client?
Hopefully those two questions make sense -- i.e. high level difference between page and request scope, and how redirecting between actions is useful for flash scope?
A redirect(controller: "foo", action: "bar") equals a new request (in the context of a servlet at least). That is why you need flash to be a sort of 'two requests scope': the action you get sent to treats your redirection as a new request. You can explicitly avoid this by using chain().
As for the difference between the page and request scope, my understanding is that the page scope is more or less the model a given view / render process operates on whereas the request is for the entire request cycle. Meaning that whatever you pass off to the view in an action return (or the stuff you put in model: [] of a render()) is the 'page scope'.
As for the manual example, I have no clue why they would show any scoping at all in a view g:set operation; setting variables in the view should generally be avoided anyway (separation of concerns and all that jazz).
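The "this request and the next request only" behaviour can be pictured with a toy model. The sketch below is written in Python for illustration; the FlashScope class is hypothetical and is not how Grails implements flash, it only mimics the lifetime described in the manual quote.

```python
class FlashScope:
    """Toy model of a flash scope: values live for this request and the next."""

    def __init__(self):
        self._previous = {}  # values stored during the previous request
        self._current = {}   # values stored during the current request

    def __setitem__(self, key, value):
        self._current[key] = value

    def get(self, key, default=None):
        if key in self._current:
            return self._current[key]
        return self._previous.get(key, default)

    def end_request(self):
        # Values from the previous request are dropped; values from this
        # request survive for exactly one more request.
        self._previous = self._current
        self._current = {}


flash = FlashScope()

# Request 1: the action stores a message, then answers with a redirect.
flash["message"] = "Company saved"
flash.end_request()

# Request 2: the browser follows the redirect; the message is still there.
message_after_redirect = flash.get("message")
flash.end_request()

# Request 3: any later request no longer sees the message.
message_later = flash.get("message")

print(message_after_redirect, message_later)
```

A request-scoped value, by contrast, would already be gone in request 2, which is why a message set right before a redirect needs flash.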

MVC .NET Urls aren't routed using the RouteCollection

We're using MVC .NET and the RouteCollection class to route URLs in our web app. This works normally until we pass a URL containing the text "PRN" anywhere inside it. When this happens, routing does not occur and a 400 Page Not Found error is returned to the client. It's as if something throws the error before the route collection is even consulted, because the route the URL should take is never touched (by that I mean the breakpoint in the underlying code is never hit, though the exact same URL without the string "PRN" will hit the breakpoint).
So I thought it might be a request validation issue: maybe Microsoft decided to throw exceptions when the URL contains "PRN" because it resembles "print" or "porn". But if that were the case, we'd see the "A potentially dangerous Request.Form value was detected from the client" error, and we don't.
Researching this has been a hassle because Google thinks PRN should return results for "porn", which means 98% of my search results are invalid (and inappropriate). Using the "-porn" clause in Google drops your results down to about 10-30 hits, all useless.
Does anyone know why a URL containing the string "PRN" will not route properly? If you have any posts or threads to point me to, that would be awesome (again, Google has failed me).

Rails 404 handler for non-Rails URLs

I've inherited a site with hundreds of scattered HTML and non-framework PHP files, which I am porting to Ruby on Rails 3.0.
As functionality is added in the Rails app, the corresponding pages are deleted from the document root; but, because there are often links to these in Google or from external sites, simply returning a 404 is not acceptable.
A URL like '/contact.php' should redirect to '/app/contact/', for example.
For the first few cases of this, I created simple stub html files at the old locations, with Meta tags within to perform the redirect. This doesn't scale well, particularly once I start replacing product pages, of which there are thousands.
My preference is to delete the old pages, then have the 404 handler dispatch these to the new Rails app, which will examine the URL using regexes and database lookup to try to figure out what the replacement page is, then issue a 301 redirect to that new page.
In httpd.conf, I placed the directive:
ErrorDocument 404 /app/error/handle404
# /app/error is a rails url.
When I hit "http://localhost/does-not-exist", this causes my ErrorController to be invoked, as expected.
However, within the controller, I cannot find the original path ("/does-not-exist") anywhere in request, request.headers, or ENV - I've been calling likely methods like request.request_uri (which contains /app/error/handle404), and examining request.headers and ENV without finding the expected original path.
The Apache access_log shows only the request for /does-not-exist, indicating that it transparently invoked /app/error/handle404 (without doing a redirect or causing a second request to be made).
How can I get access to the original URL?
Edit: to clarify, here is the sequence of events:
User hits legacy path like http://mysite/foo.php, probably coming from some ancient link from a blog.
...but foo.php no longer exists!
this is a 404, thus Apache invokes ErrorDocument
directive is "ErrorDocument 404 /railsapp/error/handle404"
Rails routes this to ErrorController action "handle404" - this is working correctly
problem: in ErrorController, request.request_uri and request.headers do not provide any clue as to which URL the user was actually trying to get to, like "/foo.php"; I need to know the original URL to serve up an appropriate replacement page.
As I couldn't find the original, non-rewritten URL in the Rails request, I ended up doing it in PHP - plain, old-fashioned, non-framework PHP with explicit mysqli_*() calls.
The PHP error handler receives the necessary information in the $_SERVER hash; $_SERVER['REQUEST_URI'] contains the original URI that I needed.
I look this up in a database, and if I find a corresponding entry, issue a 301 redirect to the new location; if there's no entry, I simply display a 404 page to the user.
Simplified (PHP):
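$url = $_SERVER['REQUEST_URI'];   // original URI requested by the client
$redir = lookupRedirect($url);    // database stuff here
if (!$redir) {
    // No replacement known: show the regular 404 page
    include '404.phtml';
} else {
    // The third argument sets the 301 status together with the Location header
    header('Location: ' . $redir['new_url'], true, 301);
    exit;
}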
It's an ugly kluge, but I just couldn't find a way to make the Rails app aware of the error URL.
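The decision logic of that handler, separated from the web server, can be sketched as follows. This is a Python illustration with assumed names: resolve_legacy_url and the LEGACY_REDIRECTS dictionary are hypothetical, and the dictionary stands in for the database lookup.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the database table of legacy paths.
LEGACY_REDIRECTS = {
    "/contact.php": "/app/contact/",
}


def resolve_legacy_url(request_uri, redirects=LEGACY_REDIRECTS):
    """Return (status, headers) for a request that hit the 404 handler."""
    # Ignore any query string when looking up the legacy path.
    path = urlsplit(request_uri).path
    new_url = redirects.get(path)
    if new_url is None:
        return 404, {}
    return 301, {"Location": new_url}


print(resolve_legacy_url("/contact.php"))
print(resolve_legacy_url("/contact.php?ref=blog"))
print(resolve_legacy_url("/does-not-exist"))
```

Dropping the query string before the lookup is a design choice of this sketch; a real table might need to match on it as well.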

Resources