In terms of S3 URLs, are there really two kinds? And why? What are the different syntaxes?
bucket.s3.amazonaws.com/key
and
s3.amazonaws.com/bucket/key
Is this it? Why are there 2? Are there more? Are these correct?
AWS is deprecating the old path-style URLs:
https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/
Old vs. New S3 currently supports two different addressing models:
path-style and virtual-hosted style. Let’s take a quick look at each
one. The path-style model looks like either this (the global S3
endpoint):
https://s3.amazonaws.com/jbarr-public/images/ritchie_and_thompson_pdp11.jpeg
https://s3.amazonaws.com/jeffbarr-public/classic_amazon_door_desk.png
Or this (one of the regional S3 endpoints):
https://s3-us-east-2.amazonaws.com/jbarr-public/images/ritchie_and_thompson_pdp11.jpeg
https://s3-us-east-2.amazonaws.com/jeffbarr-public/classic_amazon_door_desk.png
In this example, jbarr-public and jeffbarr-public are bucket names;
/images/ritchie_and_thompson_pdp11.jpeg and
/classic_amazon_door_desk.png are object keys.
Even though the objects are owned by distinct AWS accounts and are in
different S3 buckets (and possibly in distinct AWS regions), both of
them are in the DNS subdomain s3.amazonaws.com. Hold that thought
while we look at the equivalent virtual-hosted style references
(although you might think of these as “new,” they have been around
since at least 2010):
https://jbarr-public.s3.amazonaws.com/images/ritchie_and_thompson_pdp11.jpeg
https://jeffbarr-public.s3.amazonaws.com/classic_amazon_door_desk.png
These URLs reference the same objects, but the objects are now in
distinct DNS subdomains (jbarr-public.s3.amazonaws.com and
jeffbarr-public.s3.amazonaws.com, respectively). The difference is
subtle, but very important. When you use a URL to reference an object,
DNS resolution is used to map the subdomain name to an IP address.
With the path-style model, the subdomain is always s3.amazonaws.com or
one of the regional endpoints; with the virtual-hosted style, the
subdomain is specific to the bucket. This additional degree of
endpoint specificity is the key that opens the door to many important
improvements to S3.
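The two styles differ only in where the bucket name appears, so building them is plain string assembly. A minimal Ruby sketch, using the bucket and key from the examples above:

```ruby
# Sketch: the same object addressed in both S3 URL styles.
bucket = "jbarr-public"
key    = "images/ritchie_and_thompson_pdp11.jpeg"

path_style    = "https://s3.amazonaws.com/#{bucket}/#{key}" # bucket in the path
virtual_style = "https://#{bucket}.s3.amazonaws.com/#{key}" # bucket in the hostname

puts path_style    # https://s3.amazonaws.com/jbarr-public/images/ritchie_and_thompson_pdp11.jpeg
puts virtual_style # https://jbarr-public.s3.amazonaws.com/images/ritchie_and_thompson_pdp11.jpeg
```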
S3 provides multiple URL patterns for the same object because of virtual hosting of buckets, which makes it possible to publish data at the root of a (virtual) host.
With the bucket-first URL style - bucket.s3.amazonaws.com/key - you can simply add files like favicon.ico and robots.txt at the root, whereas with the other pattern - s3.amazonaws.com/bucket/key - there is no notion of a root directory where you could put those files. I got this info from the AWS docs.
Content snippet from the AWS S3 page on Virtual Hosting of Buckets:
In general, virtual hosting is the practice of serving multiple web
sites from a single web server. One way to differentiate sites is by
using the apparent host name of the request instead of just the path
name part of the URI. An ordinary Amazon S3 REST request specifies a
bucket by using the first slash-delimited component of the Request-URI
path. Alternatively, you can use Amazon S3 virtual hosting to address
a bucket in a REST API call by using the HTTP Host header. In
practice, Amazon S3 interprets Host as meaning that most buckets are
automatically accessible (for limited types of requests) at
http://bucketname.s3.amazonaws.com. Furthermore, by naming your bucket
after your registered domain name and by making that name a DNS alias
for Amazon S3, you can completely customize the URL of your Amazon S3
resources, for example, http://my.bucketname.com/.
Besides the attractiveness of customized URLs, a second benefit of
virtual hosting is the ability to publish to the "root directory" of
your bucket's virtual server. This ability can be important because
many existing applications search for files in this standard location.
For example, favicon.ico, robots.txt, crossdomain.xml are all expected
to be found at the root.
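The Host-header interpretation that snippet describes can be sketched roughly like this (a deliberate simplification of what S3 actually does, written as plain Ruby):

```ruby
# Simplified sketch of how S3 derives bucket and key from a request:
# virtual-hosted style takes the bucket from the Host header,
# path-style takes it from the first path segment.
def bucket_and_key(host, path)
  if host =~ /\A(.+)\.s3\.amazonaws\.com\z/
    [$1, path.sub(%r{\A/}, "")]                    # virtual-hosted style
  else
    bucket, key = path.sub(%r{\A/}, "").split("/", 2)
    [bucket, key]                                  # path-style
  end
end

bucket_and_key("jbarr-public.s3.amazonaws.com", "/images/x.jpeg")
# => ["jbarr-public", "images/x.jpeg"]
bucket_and_key("s3.amazonaws.com", "/jbarr-public/images/x.jpeg")
# => ["jbarr-public", "images/x.jpeg"]
```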
Related
I have a complex web app at example-app.com, hosting fully on AWS using ELB and Route 53 for DNS. It's a Rails app.
I'm running an experiment that I'm using in the Rails app, at example-app.com/test. I want to set up new-domain-app.com to point at example-app.com/test, and have the URL cloaked to always be new-domain-app.com. It's a single-page site, so it shouldn't require any navigation.
I'm having a lot of trouble figuring out how to set up my DNS on Route 53 to accomplish this. Does anyone have good ideas on what this Route 53 configuration should look like?
AWS offers a very simple way to implement this -- with CloudFront. Forget about the fact that it's marketed as a CDN. It's also a reverse proxy that can prepend a fixed value onto the path, and send a different hostname to the back-end server than the one typed into the browser, which sounds like what you need.
Create a CloudFront web distribution.
Configure the new domain name as an alternate domain name for the distribution.
For the origin server, put your existing hostname.
For the origin path, put /test -- or whatever string you want prefixed onto the path sent by the browser.
Configure the cache behavior as needed -- enable forwarding of the query string or cookies if needed and any headers your app wants to see, but not Host.
Point your new domain name at CloudFront... But before you do that, note that your CloudFront distribution has a dxxxexample.cloudfront.net hostname. After the distribution finishes setting up (the "In Progress" status goes away, usually in 5 to 20 minutes) your site should be accessible at the cloudfront.net hostname.
How this works: when you type http://new-domain-app.com into the browser, CloudFront adds the origin path onto the path the browser sends, so GET / HTTP/1.1 becomes GET /test/ HTTP/1.1. This configuration simply prefixes every request's path with the string you specified as the origin path and sends the result on to the origin server. The browser address bar does not change, because this is not a redirect. The Host header sent by the browser is replaced with the hostname of the origin server when the request is forwarded to the origin.
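A tiny Ruby sketch of that transformation (the path and hostnames mirror the configuration described above):

```ruby
# Sketch of CloudFront's request rewrite for this setup: prepend the
# configured origin path, and swap the Host header for the origin's.
ORIGIN_PATH = "/test"           # the "origin path" from the distribution
ORIGIN_HOST = "example-app.com" # the origin server hostname

def to_origin_request(browser_path)
  { path: ORIGIN_PATH + browser_path, host: ORIGIN_HOST }
end

to_origin_request("/") # => { path: "/test/", host: "example-app.com" }
```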
What you are trying to do is not possible with DNS alone. Route 53 is a DNS service, and you cannot configure a hostname (e.g. new-domain-app.com) to point to a URL (e.g. http://example-app.com/test) using DNS.
However, you are probably using the wrong tool for the job. If example-app.com/test is indeed a simple, static, single-page site, then you do not need to host it inside the Rails app. Instead, you can host it in an AWS S3 bucket, and then point new-domain-app.com at that bucket using Route 53.
See the following for details:
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/RoutingToS3Bucket.html
DNS knows about domains, not URLs. DNS simply converts names to IP addresses.
You can't do what you are asking for using just DNS and ELB. What you can do, however, is set up a separate vhost for new-domain-app.com that points to your example-app.com site and accomplishes what you want with some sort of redirection rule that fires only for new-domain-app.com.
I'm not sure this qualifies as an SO question; it is more likely a Server Fault question. Specifics about your web server and OS platform would be helpful in getting more specific advice.
So here's some details:
You already have example-app.com setup and working
You create a CNAME entry pointing new-domain-app.com to example-app.com or you can make an A record pointing to the same IP. If you already have example-app.com pointing to a different IP address, then use a subdomain (test.example-app.com) to isolate it.
Setup a new vhost on your server that basically duplicates the existing vhost for new-domain-app.com. The only thing you need to change is the server name configuration.
Why does this work? Because HTTP/1.1 introduced the Host header, which browsers send along and which web servers use in vhosting to determine which virtual host should handle an incoming request. When the server sees that the client browser asked for "new-domain-app.com", it routes the request to the matching vhost.
Rather than doing some fancy proxying, which certainly could be used to get a similar result, you can just add a redirection rule that looks for requests with the host new-domain-app.com and redirects those to example-app.com/test. In Apache that is done with mod_rewrite, which people often use by putting rules in the ubiquitous .htaccess file, but the same can be done in nginx and other common web servers; the specifics differ slightly for each.
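As a rough sketch (Apache 2.x with mod_rewrite enabled; the hostnames are the ones from the question, and the exact pattern differs between vhost and .htaccess context):

```apache
RewriteEngine On
# Fire only when the browser asked for new-domain-app.com
RewriteCond %{HTTP_HOST} ^new-domain-app\.com$ [NC]
# Redirect to the /test subtree of the main app
RewriteRule ^/?(.*)$ http://example-app.com/test/$1 [R=302,L]
```

Note that a redirect like this does change the browser's address bar; keeping the URL cloaked would require proxying instead.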
I have gone through the process of creating a CloudFront distribution with the Origin Domain Name pointing to my main Rails application, where assets (images, CSS, JS, etc.) are located at /assets.
However, by default, the CloudFront distribution is mirroring the entire domain (including dynamic pages).
How can I limit it to just the /assets sub-tree?
PS This is the article I am following:
https://devcenter.heroku.com/articles/using-amazon-cloudfront-cdn
Thanks!
Since the default cache behavior can't (afaik) be removed, this seems like a clever "serverless" solution:
Create a bucket in S3. The name won't matter. Don't put anything in it.
Add a second origin to your CloudFront distribution, selecting the new bucket as the origin.
Create a second cache behavior with path pattern /assets/* pointing to your original origin.
Change the default cache behavior to use the new S3 origin (the unused, empty bucket).
CloudFront will forward requests for /assets/* to your existing server, where they will be handled as now, but all other requests will be sent to the empty bucket, which has no content and no permissions, so the response will be 403 Forbidden.
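The resulting routing logic amounts to this (a simplified Ruby sketch of CloudFront's path-pattern matching; the origin names are made up):

```ruby
# Simplified sketch of the two-behavior setup described above:
# /assets/* goes to the real app server, everything else falls
# through to the default behavior (the empty bucket).
def origin_for(path)
  path.start_with?("/assets/") ? :app_server : :empty_bucket
end

origin_for("/assets/app.css") # => :app_server
origin_for("/admin")          # => :empty_bucket
```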
Optionally, add an appropriate "robots.txt" file to that otherwise-empty bucket, and make it publicly readable, so CloudFront will serve it up to any crawlers that visit your CloudFront distribution, disallowing them from indexing, which should hopefully prompt them to remove any already-indexed results and not try to index the assets or any other paths they might have already learned by crawling the previously-exposed content at the "wrong" URL.
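That optional robots.txt in the otherwise-empty bucket could be as simple as:

```
User-agent: *
Disallow: /
```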
How do they differ in terms of backend/frontend architecture and implementation?
I know how to use domain.com/xyz, where the page is rendered based on the information in xyz; how would that be done with xyz.domain.com?
For example tumblr urls are like username.tumblr.com. How do they differ from, say, facebook pages where urls are like facebook.com/username?
Domains
Domains are an autonomous administrative structure. Assets within the same organization can be grouped into a domain. Public-facing domains connect to the internet in some fashion, be it directly or behind security measures. They don't even need a web server.
tumblr             .com
------             ----
2nd-level domain   TLD (top-level domain)
Anything after (to the left of) tumblr is what is called a sub-domain, or lower-level domain. Sub-domains represent a smaller autonomous administrative organization within the main domain; e.g., Microsoft has an HR department, so hr.microsoft.company would correspond to this situation.
Paths
username in facebook.com/username represents a path to a resource on the domain facebook.com, most likely (duh) on their web server at port 80. I realize this is an oversimplification, since Facebook probably uses a complex structure to deliver their content, but nevertheless, it is in the right general area.
Going along with the HR analogy, they may maintain a series of forms for employees to access. Those would be stored as a resource rather than their own separate administrative structure.
hr.microsoft.company/forms/i9_tax.form
The Difference
The difference between a path and a subdomain is that a path represents a resource on its domain's web server, while a subdomain is content, served either by the same web server as the 2nd-level domain or by a different one, but with its own DNS record. A subdomain served by the same host as the 2nd-level domain would typically be a "CNAME" (canonical name, i.e. an alias) record in the DNS zone, while a completely different web server would get an "A" (address) record mapping the name to its own IP.
So domain.com/index.html points to the index.html file within domain.com's public HTML directory, whereas xyz.domain.com points either to the hostname of a completely different web server or to a directory within domain.com's file structure (like domain.com/useassubdomain/xyz), but (again) with its own DNS record. Both can be configured like any ole' webpage (as long as the servers running them support it).
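As a sketch, BIND-style zone records for these cases might look like this (all names and addresses are made up):

```
; A record: maps the bare domain to an address
domain.com.        IN  A      203.0.113.10
; CNAME: xyz served from the same infrastructure, aliased to the main name
xyz.domain.com.    IN  CNAME  domain.com.
; A record: a subdomain hosted on a completely different server
other.domain.com.  IN  A      198.51.100.7
```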
A few reasons you might choose a subdomain over a resource in the original domain's subdirectories are that you want to:
Distinguish regions or language (ja.wikipedia.org)
Distinguish a branch with different goals than the larger organization (windowsupdate.microsoft.com)
Sub-brands
Delegate ownership or administration of content (including custom content like tumblr)
I plan to use S3 + Cloudfront for hosting static images for an e-commerce web app. Each user-uploaded image can be used for several end points.
Example user actions in the back office :
Uploads image flower.jpg in his media library
Creates a flower product with id 1
Creates another flower product with id 2
Assigns image flower.jpg to illustrate both products
I was thinking about a convention over configuration mechanism such as :
Uploaded images have a unique name, like flower.jpg in this case
When it is used to illustrate an item, apply a convention: point p1.jpg and p2.jpg to flower.jpg, the same way symlinks work
All three following URLs would return the same file :
http://aws-s3/my_app/flower.jpg
http://aws-s3/my_app/p1.jpg
http://aws-s3/my_app/p2.jpg
Can I do that with AWS ?
I did not find any such thing in the API docs, except for the temporary public URL, which comes with two no-gos: 1) they expire, and 2) I cannot choose the URL
Can I do that with another CDN ?
Thanks.
I believe that to accomplish such a thing your best bet is going to be to use EC2 (pricing) with S3.
My reasoning is that S3, as you say, doesn't allow for redirect URLs. To accomplish what you want, you would need to actually upload the file to each place, which would greatly increase your costs.
However, what you can do is use EC2 as a webserver. I'll leave it up to you to decide on your configuration, but you can do anything on EC2 you could do on any server - like set up redirects.
For reference, here's a good tutorial on setting up Apache on Ubuntu Server, and here's one on setting up Apache redirects.
I think you can now use S3, Route 53 and CloudFront to achieve the same. The question is old, but this may be useful for someone looking now.
Can one of the Amazon services (their S3 data service, or otherwise) be used to offload server of static files for a Ruby on Rails app, but still support the app's authentication & authorization?
That is, when the user's browser has downloaded the initial HTML for one page of the Ruby on Rails application and then goes back for static content (e.g. an image or CSS file), that request would be:
(a) routed directly to the Amazon service (no RoR cycles used to serve it, or bandwidth), BUT
(b) the browser request for this item (e.g. an image) would still have to go through an authentication/authorization layer based on the user model in the Ruby on Rails application - in other words to ensure not just anyone could get the image...
thanks
The answer is a yes, with a but. You can use a feature of S3 that allows you to create links to secured S3 objects with a small time to live; the default is 5 minutes. This works for any S3 object that was uploaded as private, which means the browser will only have that many seconds to request the file from S3. Example code from the docs for the AWS gem:
S3Object.url_for('beluga_baby.jpg', 'marcel_molina')
You can also specify an expires_in or expires option per file. The bad thing is that you would need to create a helper for your stylesheet, image, and js links to create the proper S3 URLs.
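For reference, here is a rough Ruby sketch of what such a signed, expiring URL looks like under the hood, using the legacy signature-version-2 query-string auth that the aws-s3 gem used. The keys here are fake, and real code should rely on the gem's url_for rather than hand-rolling this:

```ruby
require "openssl"
require "base64"
require "cgi"

# Fake credentials for illustration only.
ACCESS_KEY = "AKIDEXAMPLE"
SECRET_KEY = "fake-secret"

# Build a (legacy, SigV2-style) expiring S3 URL: sign the method,
# expiry timestamp, and resource path with the secret key.
def presigned_url(bucket, key, expires_at)
  string_to_sign = "GET\n\n\n#{expires_at}\n/#{bucket}/#{key}"
  sig = Base64.strict_encode64(
    OpenSSL::HMAC.digest("SHA1", SECRET_KEY, string_to_sign)
  )
  "https://#{bucket}.s3.amazonaws.com/#{key}?AWSAccessKeyId=#{ACCESS_KEY}&Expires=#{expires_at}&Signature=#{CGI.escape(sig)}"
end

url = presigned_url("marcel_molina", "beluga_baby.jpg", 1_700_000_300)
```

Anyone with the URL can fetch the object until the Expires timestamp passes, after which S3 rejects the request.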
I would recommend that you set up a domain name for your S3 bucket, like "examples3.amazonaws.com", and put all your standard image files and CSS there as public. Then set that as the asset host in your Rails config, and use the secure links only for static files that really need them.