I'm trying to rewrite a URL with a Cloudflare Transform Rule, but I need it to make two changes at once, and I want the match to be driven by a regular expression.
For example:
https://test1-server.cloud.servers/account/login
Should be rewritten to:
https://test1-machine.cloud.servers/forms/login
And
https://prod2-server.cloud.servers/account/login
Should be rewritten to:
https://prod2-machine.cloud.servers/forms/login
Could this be done with Cloudflare?
You should be able to achieve this, although it may take more than just a transform rule. It seems that, with a transform rule, you can only change the path of the final request -
https://developers.cloudflare.com/rules/transform/url-rewrite/examples
So, I don't think changing the subdomain is possible. However, in theory, if both your subdomains point at the same IP address, you could use a "Request Header Modification" rule to intercept and update the Host header value -
https://developers.cloudflare.com/rules/transform/request-header-modification/examples
So your condition would be the same for both rules, but in the request header rule, you'd be changing the host header to test1-machine.cloud.servers.
Here are all available functions you can make use of (for all transform rule types) -
https://developers.cloudflare.com/firewall/cf-firewall-language/functions#transformation-functions
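As a rough sketch, the pair of rules might look like the following. Note the assumptions: regex matching and the regex_replace() function are only available on Business and Enterprise plans, and whether you may modify the Host header at all is also plan-restricted, so check what your plan allows.

    # Filter expression, shared by both rules:
    http.host matches "^(test1|prod2)-server\.cloud\.servers$"
      and http.request.uri.path eq "/account/login"

    # URL rewrite rule - dynamic path rewrite:
    regex_replace(http.request.uri.path, "^/account/", "/forms/")

    # Request header modification rule - dynamic Host header value:
    regex_replace(http.host, "^([a-z0-9]+)-server\.", "${1}-machine.")

With the capture group, the same pair of rules covers both test1 and prod2 instead of needing one rule per subdomain.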
I found this example for using netscaler to rewrite requests to an internal server on a specific port.
set transform action trans_action_RSA_SS -priority 1000
    -reqUrlFrom "https://rsa.domain.public"
    -reqUrlInto "https://rsa.domain.local:7004"
    -resUrlFrom "https://rsa.domain.local:7004"
    -resUrlInto "https://rsa.domain.public"
I'd like to expand on the example to point the local destination at a vserver.
Assume my vserver is called INTERNALVSERVER and also assume that it
is configured as a load balancer in front of 3 nodes (I suspect the specifics of that are irrelevant to this situation).
I just want to ensure that my urltransform applies to my vserver properly. Conceptually I'm going for something like this:
set transform action trans_action_RSA_SS -priority 1000
    -reqUrlFrom "https://rsa.domain.public"
    -reqUrlInto "https://INTERNALVSERVER:7004"
    -resUrlFrom "https://INTERNALVSERVER:7004"
    -resUrlInto "https://rsa.domain.public"
Apparently the example would work as advertised as long as https://rsa.domain.local points at a vserver/service group. So there's no need to change anything in the rules, just make sure "rsa.domain.local" or the equivalent for your setup is pointing to the right place.
Although the above solution would work, from the web server's security perspective it is not a good idea to transform an external URL directly to an internal one, as it can leave your internal web server open to denial-of-service attacks. Also, if your web admin has not secured the web server, SQL injection could cause big enough problems for you.
Instead, I would suggest using load balancing plus responder policies to pick out valid URL traffic and discard all other web traffic.
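A minimal sketch of that approach (the policy name and path here are hypothetical; DROP is one of the built-in responder actions):

    add responder policy drop_invalid_pol "!HTTP.REQ.URL.PATH.STARTSWITH(\"/valid-app/\")" DROP
    bind lb vserver INTERNALVSERVER -policyName drop_invalid_pol -priority 100 -type REQUEST

Anything that doesn't match the expected URL prefix is then dropped before it ever reaches the back-end nodes.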
I'm getting a 301 redirect from A to B:
A http://example.com/path?value1=a&value2=b
B http://example.com/path/?value1=a&value2=b
Is B the correct form or is the server at fault?
URLs representing directories usually end in a trailing slash, but the URL itself is filesystem-agnostic, so both forms are correct.
The trailing slash is almost certainly added by Apache's mod_dir. The documentation gives some reasons why this is good practice, although:
If you don't want this effect and the reasons above don't apply to
you, you can turn off the redirect ...
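(The "turn off the redirect" part refers to mod_dir's DirectorySlash directive. A minimal sketch, with a made-up directory path; note the Apache documentation warns that disabling it can have information-disclosure side effects when combined with autoindexing:)

    # Disable mod_dir's trailing-slash redirect for one directory
    <Directory "/var/www/example/path">
        DirectorySlash Off
    </Directory>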
http://example.com/something/somewhere//somehow/script.js
Does the double slash break anything on the server side? I have a script that parses URLs, and I was wondering whether it would break anything (or change the path) if I replaced multiple slashes with a single slash, especially on the server side: some frameworks, like CodeIgniter and Joomla, use segmented URL schemes and routing, and I'd like to know whether it breaks anything there.
RFC 2396, which defines the generic URI syntax, specifies a single slash as the path separator.
However, unless you're using some kind of URL rewriting (in which case the rewriting rules may be affected by the number of slashes), the URI usually maps to a path on disk, and in most modern operating systems (Linux/Unix, Windows) multiple path separators in a row have no special meaning, so /path/to/foo and /path//to////foo would eventually map to the same file.
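A quick way to see the filesystem side of this (Python used here purely as a neutral demonstration language):

    import os.path

    # Repeated separators collapse to a single one during normalization:
    print(os.path.normpath("/path/to/foo"))      # /path/to/foo
    print(os.path.normpath("/path//to////foo"))  # /path/to/foo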
An additional thing that might be affected is caching. Since both your browser and the server cache individual pages (according to their caching settings), requesting the same file multiple times via slightly different URIs might affect the caching (depending on the server and client implementation).
The correct answer to this question is: it depends on the server's implementation!
Preface: a double slash is syntactically valid according to RFC 2396, which defines URL path syntax. As amn explains, it implies an empty URI segment. Note, however, that RFC 2396 defines only the syntax, not the semantics, of paths, including empty path segments, so it is up to your server to decide the semantics of an empty path.
You didn't mention the server software stack you're using, perhaps you're even rolling your own? So please use your imagination as to what the semantics could be!
Practically, I would like to point out some everyday, semantics-related reasons to avoid double slashes even though they are syntactically valid:
Since not everyone expects an empty segment to be valid, it can cause bugs. And even though your server technology of today might be compatible with it, either your server technology of tomorrow or the next version of your server technology of today might decide not to support it any more. Example: the ASP.NET MVC Web API library throws an error when you try to specify a route template with a double slash.
Some servers might interpret // as indicating the root path. This can either be on-purpose, or a bug - and then likely it is a security bug, i.e. a directory traversal vulnerability.
Because it is sometimes a bug, and a security bug, some clever server stacks and firewalls will see the substring '//', deduce you are possibly making an attempt at exploiting such a bug, and therefore they will return 403 Forbidden or 400 Bad Request etc, and refuse to actually do any further processing of the URI.
URLs don't have to map to filesystem paths. So even if // in a filesystem path is equivalent to /, you can't guarantee the same is true for all URLs.
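To see that a standards-compliant URL parser really does keep the empty segment (so if it disappears, that is the server's decision, not the parser's), here is a quick Python check:

    from urllib.parse import urlsplit

    # The empty segment survives parsing; its meaning is up to the server:
    print(urlsplit("http://host/pages//foo.html").path.split("/"))
    # ['', 'pages', '', 'foo.html']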
Consider the declaration of the relevant path-absolute non-terminal in "RFC3986: Uniform Resource Identifier (URI): Generic Syntax" (specified, as is typical, in ABNF syntax):
path-absolute = "/" [ segment-nz *( "/" segment ) ]
Then consider the segment declaration a few lines further down in the same document:
segment = *pchar
If you can read ABNF, the asterisk (*) specifies that the following element pchar may be repeated any number of times to make up a segment, including zero times. Learning this and re-reading the path-absolute declaration above, you can see that a potentially empty segment implies that the second "/" may repeat indefinitely, hence allowing valid combinations like ////// (an arbitrary run of at least one /) as part of path-absolute (which itself is used in specifying the rule describing a URI).
As all URLs are URIs, we can conclude that yes, URLs may contain multiple consecutive forward slashes, per the quoted RFC.
But it's not like everyone follows or implements URI parsers per specification, so I am fairly sure there are non-compliant URI/URL parsers and all kinds of software that stacks on top of these where such corner cases break larger systems.
One thing you may want to consider is that it might affect your page indexing in a search engine. According to this web page,
A URL with the same path repeated 3 times will not be indexed in Google
The example they use is:
example.com/path/path/path/
I haven't confirmed this would also be true if you used example.com///, but I would certainly want to find out if SEO was critical for my website.
They mention that "This is because Google thinks it has hit a URL trap." If anyone else knows the answer for sure, please add a comment to this answer; otherwise, I thought it relevant to include this case for consideration.
Yes, it can most definitely break things.
The spec considers http://host/pages/foo.html and http://host/pages//foo.html to be different URIs, and servers are free to assign different meanings to them. However, most servers will treat paths /pages/foo.html and /pages//foo.html identically (because the underlying file system does too). But even when dealing with such servers, it's easily possible for extra slash to break things. Consider the situation where a relative URI is returned by the server.
http://host/pages/foo.html + ../images/foo.png = http://host/images/foo.png
http://host/pages//foo.html + ../images/foo.png = http://host/pages/images/foo.png
Let me explain what that means. Say your server returns an HTML document that contains the following:
<img src="../images/foo.png">
If your browser obtained that page using
http://host/pages/foo.html # Path has 2 segments: "pages" and "foo.html"
your browser will attempt to load
http://host/images/foo.png # ok
However, if your browser obtained that page using
http://host/pages//foo.html # Path has 3 segments: "pages", "" and "foo.html"
you'll probably get the same page (because the server probably doesn't distinguish /pages//foo.html from /pages/foo.html), but your browser will erroneously try to load
http://host/pages/images/foo.png # XXX
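You can reproduce this resolution behaviour with any RFC 3986-compliant resolver, for example Python's urljoin:

    from urllib.parse import urljoin

    # The extra (empty) segment in the base URL shifts where ".." lands:
    print(urljoin("http://host/pages/foo.html", "../images/foo.png"))
    # http://host/images/foo.png
    print(urljoin("http://host/pages//foo.html", "../images/foo.png"))
    # http://host/pages/images/foo.png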
You may be surprised, for example, when building links for resources in your app.
<script src="mysite.com/resources/jquery//../angular/script.js"></script>
will not resolve to mysite.com/resources/angular/script.js but to mysite.com/resources/jquery/angular/script.js, which is probably not what you wanted.
Double slashes are evil; try to avoid them.
Your question is "does it break anything". In terms of the URL specification, extra slashes are allowed. Don't read the RFC; here is a quick experiment you can try to see whether your browser silently mangles the URL:
echo '<?= $_SERVER["REQUEST_URI"];' > tmp.php
php -S localhost:4000 tmp.php
I tested macOS 10.14 (18A391) with Safari 12.0 (14606.1.36.1.9) and Chrome 69.0.3497.100; requesting http://localhost:4000/hello//world, both get the result:
/hello//world
This indicates that the extra slash is visible to the web application.
Certain use cases will break when using a double slash, including URL redirects/routing that expect a single-slashed URL, and other CGI applications that analyze the URI directly.
But for normal cases of serving static content, such as your example, you will still get the correct content. The client will, however, get a cache miss for the same content accessed with a different number of slashes.
I currently do automatic detection of hyperlinks within text in my program. I made it very simple and only look for http:// or www.
However, a user suggested to me that I extend it to other forms, e.g.: https:// or .com
Then I realized it might not stop there because there's ftp and mailto and file, all the other top level domains, and even email addresses and file paths.
What I think is best is to limit it to what's practical by following some standard set of hyperlink-detection rules that are already in common use: maybe how Microsoft Word does it, or how RichEdit does it, or a better standard you might know of.
So my question is:
Is there a built in function that I can call from Delphi to do the detection, and if so, what would the call look like? (I plan in the future to go to FireMonkey, so I would prefer something that will work beyond Windows.)
If there isn't a function available, is there some place I can find a documented set of rules of what is detected in Word, in RichEdit, or any other set of rules of what should be detected? That would then allow me to write the detection code myself.
Try the PathIsURL function, which is declared in the ShLwApi unit.
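To get a feel for what PathIsURL accepts before wiring it up in Delphi, here is a quick check via Python's ctypes (Windows-only and purely illustrative; in Delphi you would call the same function through the ShLwApi unit):

    import ctypes

    # PathIsURLW lives in shlwapi.dll and checks URL *format*, not reachability.
    shlwapi = ctypes.WinDLL("Shlwapi")
    shlwapi.PathIsURLW.argtypes = [ctypes.c_wchar_p]
    shlwapi.PathIsURLW.restype = ctypes.c_bool

    print(shlwapi.PathIsURLW("http://www.example.com"))  # True
    print(shlwapi.PathIsURLW("www.example.com"))         # False: no scheme

Bear in mind it is a Windows API, so it won't help with the FireMonkey cross-platform goal.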
The following regex, taken from RegexBuddy's library, might get you started (I can't make any claims about its performance).
Regex
Match; JGsoft; case insensitive:
\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]
Explanation
URL: Find in full text
The final character class makes sure that if a URL is part of some text,
punctuation such as a comma or full stop after the URL is not interpreted as part
of the URL.
Matches (whole or partial)
http://regexbuddy.com
http://www.regexbuddy.com
http://www.regexbuddy.com/
http://www.regexbuddy.com/index.html
http://www.regexbuddy.com/index.html?source=library
You can download RegexBuddy at http://www.regexbuddy.com/download.html.
Does not match
regexbuddy.com
www.regexbuddy.com
"www.domain.com/quoted URL with spaces"
support@regexbuddy.com
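If you want to try the pattern outside RegexBuddy, this particular pattern happens to work unchanged in Python's regex flavor; a small sketch:

    import re

    # The RegexBuddy pattern from above; (?:...) replaces the original
    # capturing group so findall() returns whole matches, not just schemes.
    URL_RE = re.compile(
        r"\b(?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]",
        re.IGNORECASE,
    )

    text = "Download it at http://www.regexbuddy.com/download.html, thanks."
    print(URL_RE.findall(text))
    # ['http://www.regexbuddy.com/download.html']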
For a set of rules you might look into RFC 3986
A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource. This
specification defines the generic URI syntax and a process for
resolving URI references that might be in relative form, along with
guidelines and security considerations for the use of URIs on the
Internet.
A regex that validates a URL as specified in RFC 3986 would be
^
(# Scheme
 [a-z][a-z0-9+\-.]*:
 (# Authority & path
  //
  ([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
  ([a-z0-9\-._~%]+                            # Named host
  |\[[a-f0-9:.]+\]                            # IPv6 host
  |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
  (:[0-9]+)?                                  # Port
  (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
 |# Path without authority
  (/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
 )
|# Relative URL (no scheme or authority)
 ([a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?  # Relative path
 |(/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?)                            # Absolute path
)
# Query
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
$
Regular expressions may be the way to go here: define the various patterns that you deem to be appropriate hyperlinks.
I am reworking the URL formats of my project. The basic format of our search URLs is this:
www.projectname/module/search/<search keyword>/<exam filter>/<subject filter>/... other params ...
When searching with no search keyword and no exam filter, the URL will be:
www.projectname/module/search///<subject filter>/... other params ...
My question is: why don't we see such URLs with back-to-back slashes (3 slashes after www.projectname/module/search)? Please note that I am not using .htaccess rewrite rules in my project anymore. This URL works perfectly, functionally speaking. So, should I use this format?
For more details on why we chose this format, please check my other question:
Suggest best URL style
Web servers will typically collapse multiple slashes before the application gets to see the request, for a mix of compatibility and security reasons. When serving plain files, it is usual for any number of slashes between path segments to behave as a single slash.
Blank URL path segments are not invalid in URLs, but they are typically avoided because relative URLs with blank segments may parse unexpectedly. For example, in /module/search, a link to //subject/param is not resolved relative to the current page; it is a scheme-relative link to the host subject with the path /param.
Whether you can see the multiple-slash sequences from the original URL depends on your server and application framework. In CGI, for example (and other gateway standards based on it), the PATH_INFO variable that is typically used to implement routing will usually omit multiple slashes. But on Apache there is a non-standard environment variable REQUEST_URI which gives the original form of the request without having elided slashes or done any %-unescaping like PATH_INFO does. So if you want to allow empty path segments, you can, but it'll cut down on your deployment options.
The empty string is not the only string that makes a bad path segment. An encoded / (%2F), \ (%5C) or null byte (%00) is blocked by default by many servers, so you can't put just any old string in a segment; it will have to be processed to remove some characters (often 'slug'-ified to remove all but letters and numbers). Whilst you are doing this, you may as well replace the empty string with _.
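A minimal sketch of that kind of 'slug'-ification (the exact character rules here are an assumption; adapt them to your project):

    import re

    def slugify(segment: str) -> str:
        # Keep only letters and digits, collapse everything else to "-",
        # and never emit an empty segment ("_" as the placeholder, as above).
        slug = re.sub(r"[^a-zA-Z0-9]+", "-", segment).strip("-").lower()
        return slug or "_"

    print(slugify("C# in Depth!"))  # c-in-depth
    print(slugify(""))              # _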
Probably because it's not clearly defined whether or not the extra / should be ignored.
For instance, http://news.bbc.co.uk/sport and http://news.bbc.co.uk//////////sport both display the same page in Firefox and Chrome. That server is treating the two URLs as the same thing, whereas your server evidently does not.
I'm not sure whether this behaviour is defined somewhere or not, but it does seem to make sense (at least for the BBC website: if I type an extra /, it does what I meant it to do).