How does the URL I type in lead to the eventual content I see in my browser? - url

I'm trying to figure out how these all work together, and there are bits and pieces of information all over the internet.
Here's what I (think) I know:
1) When you enter a url into your browser that gets looked up in a domain name server (DNS), and you are sent an IP address.
2) Your computer then follows this IP address to a server somewhere.
3) On the server there are nameservers, which direct you to the specific content you want within the server. -> This step is unclear to me.
4) With this information, your request is received and the server relays site content back to you.
Is this correct? What do I have wrong? I've done a lot of searching over the past week, and the thing I think I'm missing is the big picture explanation of how all these details tie together.
Smaller questions:
a) How does the nameserver know which site I want directions to?
b) How can a site like GoDaddy own urls? Why do I have to pay them yearly fees, and why can't I buy a url outright?
I'm looking for a cohesive explanation of how all this stuff works together. Thanks!

How contents get loaded when I put a URL in a browser ?
Well there some very well docs available on this topic each step has its own logic and algorithms attached with it, here I am giving you a walk through.
Step 1: DNS Lookup : Domain name get converted into IP address, in this process domain name from the URL is used to find IP address of the associated server machine by looking up records on multiple servers called name servers.
Step 2: Service Request : Once the IP address is known, as service request depending on protocol is created in form of packets and sent to the server machine using IP address. In case of a browser normally it will be a HTTP request; in other cases it can be something else.
Step 3: Request handling: Depending on the service request and underlying protocol, request is handled by a software program which lives normally on the server machine whose address was discovered in previous step. As per the logic programmed on the server program it will return a appropriate response in case of HTTP its called HTTP Response.
Step 4: Response handling: In this step the requesting program in your case a browser receives the response as mentioned in the previous step and renders it and display it as per defined in the protocol, in case of HTTP a HTTP body is extracted and rendered, which is written in HTML.
How does the nameserver know which site I want directions to
URL has a very well defined format, using which a browser find out a hostname/domain name which is used in turn to find out the associated IP address; there are different algorithms that name-servers runs to find out the correct server machine IP.
Find more about DNS resolution here.
How can a site like GoDaddy own urls? Why do I have to pay them yearly fees, and why can't I buy a url outright?
Domain name are resources which needed management and regulation which is done ICANN they have something called registries from which registrar(like GoDaddy) get domains and book them for you; the cost you pay is split up between ICANN and registrar.
Registrar does a lot of work for you, eg setup name server provide hosting etc.
Technically you can create you own domain name but it won't be free off course because you will need to create a name server, need to replicate it other servers and that way you can have whatever name you want (has too be unique); a simple way to do that is by editing your local hosts files in linux it is located at /etc/hosts and in windows it is located at C:\Windows\System32\drivers\etc\hosts but its no good on internet, since it won't be accepted by other servers.

(Precise and detailed description of this process would probably take too much space and time to write, I am sure you can google it somewhere). So, although very simplified, you have pretty good picture of what is going on, but some clarifications are needed (again, I will be somewhat imprecise) :
Step 2: Your computer does follow the IP address received in step 1, but the request set to that IP address usually contains one important piece of information called 'Host header', that is the actual name as you typed in your browser.
Step 3: There is no nameserver involved here, the software(/hardware) is usually called 'webserver' (for example Apache, IIS, nginx etc...). One webserver can serve one or many different sites. In case there are more than one, webserver will use the 'Host header' to direct you to the specific content you want.
ICAAN 'owns' the domain names, and the registration process involves technical and administrative effort, so you pay registrars to handle that.

Related

Given 2 URLs, is it possible to know if the resources are on the same web server?

I am accessing 2 URLs. The domain name/server part is the same. The resource part is different.
The URLs are like the following:
https://aa.bb.com/dir1/dir2
https://aa.bb.com/dir3
When I access the first URL, I get redirected to the second URL. Is it possible that the second URL be hosted on a different web server than the first or both resources would be on the same web server?
If by web server you mean physical computer, absolutely they could be on different servers. Google and Akamai, among others, have large collections of machines serving the same domain names. It helps with speed, since you are likely to receive pages from a server near you.
In general, it does not appear to be possible to reliably tell whether you are talking to the exact same server before and after a redirect. First, it is difficult to test for IP addresses from a Web page (see, e.g., this question and this one). Second, even if the IP addresses are the same before and after the redirect, they may be on different machines. For example, TCP anycast can change which server you are talking to without changing the IP address. Also, network address translation and load-balancing may change which server you are talking to behind a firewall, which you would probably have no way of finding out unless the server provided some ID of its own.

how to add subdomain name from current url using .htacces rules

I have a URL link like,
http://domain.com/abs/def/city and,
i want to display it as http://city.domain.com/ABC/def
using .htaccess.
Can any one help me by providing .ht access rules.
I want to write .htaccess rules for each city name in URL act as sub domain name.
Also i want it to be dynamic as there are different cities are available in site.
i am using below code in .htaccess file, but not working properly.
RewriteRule ^index.php/(.)/(.)/([^/]+)$ http://$3.domain/$1/$2/$3 [R=301,L]
is there any way to get my requirement using or by modifying my above code or by some other .htaccess code.
Sorry, but what you ask is not possible. This is a typical missunderstanding about url rewriting:
Url rewriting rewrites (manipulates) incoming requests on the server side before processing them. It is not possible to alter outgoing content such that contained urls are changed by this means.
There are solutions for that though:
apaches proxy module can "map" one url into the scope of some other url
there are also modules for automatic post processing of generated html markup
more exotic or creative solutions exist, it depends on your situation in the end...
But usually the easiest is to change the application (typically just its central configuration) such that it contains final urls (pointing to the subdomain in your case). Then you can indeed use the rewriting module to "re-map" those to the previous scope when future incoming requests refer to them (they got clicked).
Ok, second step getting additional info from your comments:
Just to get this clear: you understand that it is not possible to change the link you send out by means of rewriting, but you want to change the url shown in the browser after the user has clicked on some city link? That is something different to what you wrote before, that actually is possible. Great.
If the rewriting works as you want it to (you see the desired url in the browsers address bar), then we can go on. The error message indicates a name resolution problem, that has nothing to do with rewriting. Most likely the domain "cambridge.192.168.2.107" cannot be resolved, which is actually not surprising. You cannot mix ip addresses and names, it is either or.
Also I see that you are using internal, non-routable addresses. So you also are responsible for the name resolution yourself, since no public DNS server can guess what you are setting up internally. Did you do that?
I suggest these steps:
stop using an ip addres for this, use a domain name.
since you are working internally, take care that that domain name is actually resolved to your local systems ip address. How you do this depends on your setup and system, obviously. Most likely you need some entry in the file /etc/hosts or similar.
you need to take care that also those "subdomain names" get resolved to the same address. This is not trivial, again it depends on the setting and system you locally use.
if that name resolution works, then you should see a request in your http servers access log file. Then and only then it makes sense to go on...

What happening when typing in the browser URL?

I've been asked this question several times on interview and each time could not give clear answer. So my question is, what is happening when we are typing URL in browser I know that this URL is translated into IP via DNS and it is obtained via GET method. But what is happening in details? Could someone please tell me?
The URL has several parts that mean different things, you can read up on it when you search for "parts of URL" for example.
Essentially when you have a URL like:
http://server.domain.com/path/to/script.php?var=value&var2=value2
then http is the protocol used for the transfer (this can be http, https, ftp or other),
server.domain.com is the DNS of the server to be contacted (which is resolved using DNS) which is itself composed of parts (com is the 1st level domain, domain is 2nd and server is 3rd - read up on DNS resolution to understand more)
the part "/path/to/script.php?var=value&var2=value2" is handed to the server where:
"/path/to/" is path to the document/script being called
"script.php" is the name of the script and
"?var=value&var2=value2" are the parameters handed to the script:
var is going to have the value of "value" and var2 of "value2"
and that's the whole process :)

SEO Destroyed By URL Forwarding - Can't figure out another way

We design and host websites for our clients/sales force. We have our own domain: http://www.firstheartland.com
Our agents fill out a series of forms on our website that are loaded into a database. The database then renders the website as a database driven website.
/repwebsites/repSite.cfm?link=&rep=rick.higgins
/repwebsites/repSite.cfm?link=&rep=troy.thompson
/repwebsites/repSite.cfm?link=&rep=david.kover
The database application reads which "rep" the site is for and the appropriate page to display from the query string. The page then outputs the content and the appropriate CSS to style the page and give it its own individual branding.
We have told the user to use Domain Name Forwarding to get the users to their spot on our server. However, everyone seems to be getting indexed under our domain instead of their own. We could in theory assign an new IP to them, the cost is not the issue.
The issue is how we would possibly accomplish this.
With all of that said, them being indexed under our domain would still be OK as long as they would actually show up high in the ranking for their search term.
For instance, an agent owns TroyLThompson.com. If I search Troy L Thompson, It does not show up in my search. Only, "troy thompson first heartland" works (they show up third)
Apart from scrapping the whole system, I don't know what to do. I'm very open to ideas.
I'm sure you can get this to work as most hosting companies will host hundreds of websites on a single server (i.e. multiple domains on one IP).
I think you need your clients to update the nameservers for their domains (i.e. DNS) to return the IP address of your hosting server. Then you need to configure your server to return the right website based on the domain that was originally requested.
That requires your "database driven website" to look in the HTTP request and check which domain was originally requested, then it can handle the request accordingly.
- If you are using Apache, see how to configure Apache to host multiple domains on one IP address.
- If you are using Microsoft IIS, maybe Host-Header Routing is what you need.
You will likely need code changes on your "database driven website" to cope with these changes.
I'm not sure that having a dedicated IP address per domain will help much, as then you have to find a way to host all those IP addresses from a single web server. However, if your web server architecture already supports a shared database and multiple servers, then that approach might work well for you, especially if you expect the load from some domains to be so heavy that you need a dedicated web server for them.
Google does not include URL in its index which return a 301 status code. The reason is pretty obvious on second thought, because the redirect tells Google "Whatever was here before has moved there, please update your references". One solution I can see is setting up Apache virtual hosts on your server for each external domain, and have each rep configure their domain's DNS A record to point to the IP address of your server.

Best way to redirect image requests to a different webserver?

I am trying to reduce the load on my webservers by adding an "Image server" (a dedicated server for handling image requests), and redirecting all requests for .gif,.jpg,.png etc., to it.
My question is, what is the best way to handle the redirection?
At the firewall level? (can I do this using iptables?)
At the load balancer level? (can ldirectord handle this?)
At the apache level - using rewrite rules?
Thanks for any suggestions on the best way to do this.
--Update--
One thing I would add is that these are domains that are hosted for 3rd parties, so I can't expect all the developers to modify their code and point their images to another server.
The further up the chain you can do it, the better.
Ideally, do it at the DNS level by using a different domain for your images (eg imgs.example.com)
If you can afford it, get someone else to do it by using a CDN (Content delivery network).
-Update-
There are also 2 featuers of apache's mod_rewrite that you might want to look at. They are all described well at http://httpd.apache.org/docs/1.3/misc/rewriteguide.html.
The first is under the heading "Dynamic Miror" in the above document, that uses the mod_rewrite Proxy flag [p]. This lets your server silently fetch files from another domain and return them.
The second is to just redirect the request to the new domain. This second option puts less strain on your server, but requests still need to come in and it slows down the final rendering of the page, as each request needs to make an essentially redundant request to your server first.
i agree with rikh. If you want images to be served from a different webserver, then serve them on a different web-server. For example:
<IMG src="images/Brett.jpg">
becomes
<IMG src="http://brettnesbitt.akamia-technologies.com/images/Brett.jpg">
Any kind of load balancer will still feed the image from the web-server's pipe, which is what you're trying to avoid.
i, of course, know what you really want. What you really want is for any request like:
GET images/Brett.jpg HTTP/1.1
to automatically get converted into:
HTTP/1.1 307 Temporary Redirect
Location: http://brettnesbitt.akamia-technologies.com/images/Brett.jpg
this way you don't have to do any work, except copy the images to the other web-server.
That i really don't know how to do.
By using the phrase "NAT", it implies that the firewall/router receives HTTP requests, and you want to forward the request to a different internal server if the HTTP request was for image files.
This then begs the question about what you're actually trying to save. No matter which internal web-server services the HTTP request, the data is still going to have to flow through the firewall/router's pipe.
The reason i bring it up is because the common scenario when someone wants to serve images from a different server is because they want to split up high-bandwidth, mostly static, low-CPU cost content from their actual logic.
Only using NAT to re-write the packet and send it to a different server will not work towards that common issue.
The other reason might be because images are not static content on your system, and a request to
GET images/Brett.jpg HTTP/1.1
actually builds an image on the fly, with a high-CPU cost, or only using with data available (i.e. SQL Server database) to ServerB.
If this is the case then i would still use a different server name on the image request:
GET http://www.brettsoft.com/default.aspx HTTP/1.1
GET http://imageserver.brettsoft.com/images/Brett.jpg HTTP/1.1
i understand what you're hoping for, with network packet inspection to override the NAT rule and send it to another server - i've never seen any such thing that can do that.
It sounds more "proxy-ish", where the web-proxy does this. (i.e. pfSense and m0n0wall can't do it)
Which then leads to a kind of solution we used once: a custom web-server that analyzes the request, makes the appropriate request off some internal server, and binary writes the response to the client.
That pain in the ass solution was insisted upon by a "security consultant", who apparently believes in security through obscurity.
i know IIS cannot do such things for you itself - i don't know about other web-server products.
i just asked around, and apparently if you wanted to write a custom kernel module for you linux based router, you could have it inspect packets and take appropriate action. Such a module might exist. There are, apparently, plenty of other open-sourced modules to use as a starting point.
But i'd rather shoot myself in the head.

Resources