Detecting main URL with IdHTTPProxyServer - delphi

I want to make an application that redirects websites.
It has a table of "domains" and "redirect domains".
When a request matches a domain, it redirects to the corresponding redirect domain.
If nothing matches, it redirects to a default page.
So I created a Delphi application with IdHTTPProxyServer.
I have configured it to work even with HTTPS, using "ssleay32.dll" and "libeay32.dll".
Everything works great.
It uses the "IdHTTPProxyServerHTTPBeforeCommand" event to redirect, like this:
with AContext.Connection.IOHandler do
begin
  WriteLn('HTTP/1.0 302 Moved Temporarily');
  WriteLn('Location: ' + RedirectURL);
  WriteLn('Connection: close');
  WriteLn;
end;
But how do I distinguish an event call for the main URL (the one the user typed in the address bar) from calls for other URLs?
The "IdHTTPProxyServerHTTPBeforeCommand" event is called many times while a page is loading, for stat counters, Facebook like buttons, etc. I don't want to redirect all of them to the default page.
If this is not possible with IdHTTPProxyServer, are there any other options in Delphi or in another language that can generate a native executable (C++ preferred)?
Thank You

From the perspective of a proxy (or the target HTTP server, for that matter), there is no difference whatsoever between a user-typed URL and other URLs. Every HTTP request is self-contained and independent of every other HTTP request. They have to be processed as-is on a per-request basis.
If you want to ignore dependent URLs (images, scripts, etc), you will have to know ahead of time what the initial URL is, parse the data that is retrieved from that URL, keep track of any URLs the data refers to, and then ignore those URLs if you see them being requested later. However, there is nothing in the HTTP protocol to tell you what the initial URL is. There is a Referer request header that may help at times, as it is filled in when a browser is requesting dependent resource files, but it is also filled in when the user navigates around from one page to another, so you can't rely on the Referer by itself. You will have to implement your own discovery logic to figure out the initial URL based on more analysis of the URLs being requested by a given client over time.
Only the client really knows what it is requesting and why; a proxy is just a gateway to reach it. So there is only so much smart filtering you can do in a proxy without knowing what the client is actually doing.
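One heuristic that can narrow things down, offered as a sketch and not a reliable fix: current browsers attach Fetch Metadata headers to many requests, and a top-level navigation normally carries `Sec-Fetch-Mode: navigate` together with `Sec-Fetch-Dest: document`. Browsers typically send these only on secure (HTTPS) requests, and older clients do not send them at all, so the sketch falls back to a rough check of the Accept header. This is Python to illustrate the decision logic only, not Indy code; `REDIRECTS`, `DEFAULT_URL`, `is_top_level_navigation` and `choose_redirect` are hypothetical names.

```python
# Hypothetical redirect table and default page (assumptions for illustration).
REDIRECTS = {"example.com": "https://example.org/"}
DEFAULT_URL = "https://default.example/"


def is_top_level_navigation(headers):
    """Guess whether a request is a page navigation and not a sub-resource."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    mode = h.get("sec-fetch-mode")
    dest = h.get("sec-fetch-dest")
    if mode is not None or dest is not None:
        # Fetch Metadata is present: trust it.
        return mode == "navigate" and dest == "document"
    # Fallback heuristic: navigations usually ask for HTML first.
    return h.get("accept", "").startswith("text/html")


def choose_redirect(host, headers):
    """Return the URL to redirect to, or None to let the request through."""
    if not is_top_level_navigation(headers):
        return None
    host = host.lower().split(":")[0]  # drop an optional port
    return REDIRECTS.get(host, DEFAULT_URL)
```

Requests for images, scripts and tracking pixels then pass through untouched, while page loads are redirected. A client can send any headers it likes, so this remains a guess about intent.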

Related

How do I make a dynamic URL for a 404 xhtml page?

I have defined a location for the page in the xml
<error-page>
  <error-code>404</error-code>
  <location>/faces/public/error-page-not-found.xhtml</location>
</error-page>
but I want the URL to be like below:
faces/{variable}/public/error-page-not-found.xhtml
where the value of the variable will change according to different situations
This question is a bit subjective, though in general HTTP errors are handled by the server, most of the time by the scripting language on the server (and occasionally by the HTTP server software directly).
For example, the Apache HTTP server allows rewrites, so you can request a page at example.com/123 even though there is no "123" file there. The code that handles that request would determine whether a resource exists for it; if not, your server-side code (PHP, ColdFusion, Perl, ASP.NET, etc.) would need to return an HTTP 404. The server code would then put a small snippet, such as the one you have above, into the body of the response.
You would not need to redirect to an error page; you would simply respond with the HTTP 404 status and any XML you'd use to notify the visitor that there is nothing there. HTTP server software such as Apache can't really produce code (it can only reference or rewrite some file to be used for certain requests).
Generally speaking if you have a website that uses a database you'd do the following...
Parse the URL requested so you can determine what the visitor requested.
Determine if a resource should be retrieved for that request (e.g. make a query to the database).
Once you know whether a resource is available or not, you then either show the resource (e.g. a member's profile) or serve the HTTP status (401: not signed in at all; 403: signed in, though not authorized, where no increase in privileges will grant permission; 404: not found; etc.) and display the corresponding content.
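A minimal sketch of those three steps, written in Python for brevity and not tied to any particular server stack. The `PROFILES` dictionary stands in for a database, and the `/profile/<name>` URL scheme and the `handle` function are assumptions made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical data store: profile name -> record.
PROFILES = {"alice": {"owner": "alice"}}


def handle(url, user=None):
    """Return (status, body) for a requested URL."""
    # Step 1: parse the URL to see what was requested.
    path = urlsplit(url).path.strip("/")
    parts = path.split("/") if path else []
    if len(parts) != 2 or parts[0] != "profile":
        return 404, "Not found"
    # Step 2: check whether a resource exists for that request.
    profile = PROFILES.get(parts[1])
    if profile is None:
        return 404, "Not found"
    # Step 3: serve the resource or the appropriate status.
    if user is None:
        return 401, "Sign in required"
    if user != profile["owner"]:
        return 403, "Forbidden"
    return 200, f"Profile of {parts[1]}"
```

The 404 body is generated by the same code path that handles every other request, so it can vary with the requested URL without any redirect to a fixed error page.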
I would highly recommend that you read about Apache rewrites and PHP, especially its $_SERVER array (e.g. <?php print_r($_SERVER);?>). You'd use Apache to rewrite all requests to a file, so even if they request /1, /a, /about, /contact/, etc., they all get processed by a single PHP file where you first determine what the requested URL is. There are tons of questions here and elsewhere on the web that will help you get a quick start on handling all that, such as this: Redirect all traffic to index.php using mod_rewrite. If you do not know how to set up a local HTTP web server, I highly recommend looking into XAMPP; it's what I started out with years ago. Good luck!

HTTP Redirect on a browser without showing intermediate window

I have two servers someserver.com and anotherserver.com
What I need is when a user clicks on someserver.com he or she will be redirected to anotherserver.com
Currently, when I do a redirect programmatically on the server (ASP.NET MVC, IIS),
what a user sees is: 1) someserver.com is loaded 2) anotherserver.com is loaded.
What I want is that when a user clicks on someserver.com, he sees only anotherserver.com in his browser.
Does http protocol allow it?
Thanks!
There are a bunch of approaches to this, but the simplest one, if you own the domain, is to just use domain forwarding at the DNS level. Then it won't involve your web server at all. Otherwise, the browser will always have to load the first site, if only briefly, before loading the second one. You can optimize this by just sending a redirect, which should be barely noticeable. Another option would be client-side JavaScript, but if you know on the client that you want to go to the new URL, you could just use a standard hyperlink to it at that point (so I assume this is not an option).
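To illustrate why a plain redirect is barely noticeable, here is a minimal sketch of a server that answers every request with a redirect and an empty body, so the browser has nothing from the first site to render. It is a Python WSGI app, not ASP.NET MVC; `redirect_app` is a hypothetical name, and using 301 (permanent) instead of 302 is an assumption about the intent.

```python
def redirect_app(environ, start_response):
    """Answer every request with a redirect to the same path on the other host."""
    target = "https://anotherserver.com" + (environ.get("PATH_INFO") or "/")
    query = environ.get("QUERY_STRING", "")
    if query:
        target += "?" + query
    # No body is sent: the browser goes straight to the Location URL.
    start_response(
        "301 Moved Permanently",
        [("Location", target), ("Content-Length", "0")],
    )
    return [b""]
```

For a local try-out, the app can be served with `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()`.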

check if url can be loaded in an iframe

Snip.ly nicely checks if the entered web address can be used in an iframe.
I'd like to replicate it in Ruby. Looking through their code, they send an ajax request to their server, and that's where they do the validation.
Even after extensive googling, I couldn't find anything that could help me accomplish that.
My use case is that we let users add news listings to their page, which are shown in iframes, and we would like to show whether the entered URL can be used in an iframe.
You can figure out some cases by checking the X-Frame-Options header. But as you mentioned in the comments, it does not work all the time.
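As a rough sketch of that header check (in Python, though the same logic ports directly to Ruby): besides X-Frame-Options, a site can also forbid framing through the `frame-ancestors` directive of Content-Security-Policy. The function name `may_be_framed` is hypothetical, and the check is deliberately conservative: any `frame-ancestors` list that is not a bare `*` is treated as blocking.

```python
def may_be_framed(headers):
    """Best-effort guess from response headers; True means no blocking header was found."""
    h = {k.lower(): v for k, v in headers.items()}

    # X-Frame-Options: DENY, SAMEORIGIN, or the obsolete ALLOW-FROM.
    xfo = h.get("x-frame-options", "").strip().lower()
    if xfo in ("deny", "sameorigin") or xfo.startswith("allow-from"):
        return False

    # Content-Security-Policy: look for a frame-ancestors directive.
    csp = h.get("content-security-policy", "")
    for directive in csp.split(";"):
        tokens = directive.strip().split()
        if tokens and tokens[0].lower() == "frame-ancestors":
            sources = [t.lower() for t in tokens[1:]]
            if "*" not in sources:
                return False  # conservative: assume our origin is not listed
    return True
```

A `True` result only means that neither header objects; pages can still break out of frames with JavaScript, which is one reason the check does not work all the time.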
In my experience, it's best to side-step the problem altogether.
If you reverse-proxy your request through your rails server, then you can display pretty much anything all the time in your iframe.
The following is an example of the process. I'm assuming that your server is your-server.com and the user wants to list a page on user.com/list. The way it works would be:
1. Set an iframe's src to https://your-server.com/proxy?url=https://user.com/list
2. Intercept the request and extract the url: https://user.com/list
3. Perform an HTTP request on https://user.com/list to fetch the content
4. Return it to the browser as if it came from your own server
This approach works pretty much all the time, but it then has other limitations:
- you should reverse proxy any asset on that page that has a relative url; otherwise the css/images may be broken
- you must handle ajax requests on that page
You can fix these as well, by transforming the html before step 4.
You could use https://github.com/waterlink/rack-reverse-proxy for steps 2 and 3, instead of re-implementing your own reverse proxy.
You could set it up using the following code in config/application.rb:
config.middleware.insert(0, Rack::ReverseProxy) do
  reverse_proxy_options timeout: 10 # avoids waiting for pages that take forever to load
  reverse_proxy(/proxy\?url=(.*)/, '$1') # reverse proxy on the url parameter
end
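For the relative-URL limitation mentioned earlier, one possible transformation is to rewrite `src` and `href` attributes to absolute URLs before returning the HTML. The sketch below is in Python and uses a naive regular expression, which is an assumption for brevity; a real implementation would more likely use an HTML parser. `absolutize` is a hypothetical helper name.

```python
import re
from urllib.parse import urljoin

# Naive matcher for quoted src/href attributes (illustration only).
ATTR = re.compile(r'\b(src|href)=(["\'])(.*?)\2', re.IGNORECASE)


def absolutize(html, base_url):
    """Rewrite relative src/href values so they resolve against base_url."""
    def fix(match):
        attr, quote, value = match.groups()
        return f"{attr}={quote}{urljoin(base_url, value)}{quote}"
    return ATTR.sub(fix, html)
```

With a base of https://user.com/list, `/a.png` becomes https://user.com/a.png, and URLs that are already absolute are left unchanged by `urljoin`.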

MVC 5 how to achieve POST that behaves like a redirect to GET with content

My client redirects to a https://domain.com/Controller/GetInfo?Querystring method. Now my query string is getting dangerously close to the 2K limit, so I need to reproduce this behavior but pack my query string into the content of the messages. Since it would be heresy (etc.) to try a GET with content, I'll use a POST. However, I can't redirect to a POST since a Redirect has no content.
So, what I am looking for is the best MVC 5 pattern to resolve this: I need to provide lots of content, but I want the resulting page hosted on my remote server (i.e. as if I had redirected)
Also, since I use load balanced servers in azure, I'd prefer maintaining my clean stateless server if at all possible (else I'll have to introduce session caching).
@AntP is absolutely right in the comments above. If your query string is approaching 2K, then you're abusing it.
If there's a particular object you're referencing, then you can simply include the id or some other identifying piece of it and use that to look it up again from your data store.
If there's no persistent record of the object, then you can use something like Session or TempData to store it between one request and the next.
Regardless, it's not possible to redirect with a request body, which also means it's not possible to redirect using POST. (A 307 or 308 response does make the client repeat the original method and body, but the server still cannot supply new content for it.) The reason for this is that a redirect is not something the server does, but rather the client. The server merely suggests that the client go to a different URL. It's then up to the client (web browser) to issue a new request for that URL. Since the client is the one issuing the request, it makes the decision about what data is or isn't included in that request, not the server.
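A minimal sketch of the "pass an id instead of the data" idea, in Python and not MVC 5. The in-memory `_STORE` dictionary is a stand-in for whatever shared store the load-balanced servers can all reach (a database or distributed cache); it and the function names are assumptions for illustration.

```python
import uuid
from urllib.parse import urlencode

# Stand-in for a shared store reachable from every server (assumption).
_STORE = {}


def stash(payload):
    """Save the large payload and return a short token that identifies it."""
    token = uuid.uuid4().hex
    _STORE[token] = payload
    return token


def redirect_location(payload, base="https://domain.com/Controller/GetInfo"):
    """Build a short redirect URL that carries only the token."""
    return f"{base}?{urlencode({'id': stash(payload)})}"


def load(token):
    """Fetch and discard the payload when the redirected request arrives."""
    return _STORE.pop(token, None)
```

The redirect URL stays short no matter how large the payload grows, at the cost of one write and one read against the shared store.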

What are the steps involved from entering a web site address to the page being displayed on the browser?

And how can the process be speeded up from a developer point of view?
There are a lot of things going on.
When you first type in an address, the browser will look up the hostname in DNS, if it is not already in the browser cache.
Then the browser sends an HTTP GET request to the remote server.
What happens on the server is really up to the server, but it should respond with an HTTP response that includes headers, which may describe the content to the browser and how long it is allowed to be cached. The response might be a redirect, in which case the browser will send another request to the redirected page.
Obviously, server response time will be one of the critical points for perceived performance, but there are many other things to it.
When a response is returned from the server, the browser will do a few things. First it will parse the HTML returned and create its DOM (Document Object Model) from that. Then it will run any startup JavaScript on the page before the page is ready to be displayed in the browser. Remember that if the page contains any resources such as external stylesheets, scripts, images and so on, the browser will have to download those before it can display the page. Each resource is a separate HTTP GET, and there is some latency involved in each one. Therefore, one thing that in some cases can greatly reduce load times is to use as few external resources as possible, and to make sure they are cached on the client (so the browser doesn't have to fetch them for each page view).
To summarize, to optimize performance for a web page, you want to look at, as a minimum:
Server response time
Bandwidth / content transfer time.
Make sure you have a small and simple DOM (especially if you need to support IE6).
Make sure you understand client side caching and the settings you need to set on the server.
Make sure you make the client download as little data as possible. Consider GZipping resources and perhaps dynamic content also (dependent on your situation).
Make sure you don't have any CPU-intensive JavaScript on page load.
You might want to read up on the HTTP protocol, as well as some of the best practices. A couple of tools you can use are YSlow and Google Page Speed.
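As a small illustration of the caching and GZip points in the list above, the sketch below compresses a response body and attaches a `Cache-Control` header. It is Python and framework-agnostic; `prepare_response` and the one-day `max_age` default are assumptions, and a real server should only compress when the request's `Accept-Encoding` header lists gzip.

```python
import gzip


def prepare_response(body: bytes, max_age: int = 86400):
    """Return (headers, compressed_body) for a cacheable, gzipped response."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        # Let the browser reuse the resource for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
    }
    return headers, compressed
```

Repetitive text such as HTML, CSS and JavaScript usually compresses well; already compressed formats such as JPEG or PNG gain little.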
What are the steps involved from entering a web site address to the page being displayed on the browser?
The steps are something like:
Get the IP address of the URL
Create a TCP (HTTP) connection to the IP address, and request the specified page
Receive/download the page via TCP/HTTP; the page may consist of several files/downloads: e.g. the HTML document, CSS files, javascript files, image files ...
Render the page
And how can the process be speeded up from a developer point of view?
Measure to discover which of these steps is slow:
It's only worth optimizing whichever step is the slow one (no point in optimizing steps which are already fast)
The answer to your question varies depending on which step it is.
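A tiny timing harness along those lines, as a sketch: each step is passed in as a callable and timed separately, so the slow one stands out. The names `time_steps` and `slowest` are hypothetical, and the steps you pass in (DNS lookup, download, render) would be your own code.

```python
import time


def time_steps(steps):
    """Run (name, callable) pairs in order and return {name: seconds}."""
    timings = {}
    for name, step in steps:
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


def slowest(timings):
    """Name of the step that took the longest."""
    return max(timings, key=timings.get)
```

In practice the browser's developer tools report the same breakdown per request, which is usually the quickest way to get these numbers.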

Resources