How to set up app to disallow Cloudfront from fetching anything? - ruby-on-rails

I use Rails 5.2 with CloudFront for assets.
How do I set up the app so that CloudFront cannot fetch anything except assets?

CloudFront doesn't have an explicit way to "allow only" certain path prefixes, since they will ultimately match the default * cache behavior if they don't match any others, but there are several ways of working around this, depending on the level of sophistication and complexity that suits your taste... but all of them would start with this step:
Create a new cache behavior using the desired path pattern, such as /assets/*, and select your existing origin to handle these requests.
At this point, CloudFront still works as before, it's just internally considering the asset requests to match one behavior and everything else to match the other.
So, what we need next is something different for the "other."
The simplest solution is to create a second Origin, using the Origin Domain Name invalid.invalid. This is a syntactically valid hostname that points to a nonexistent target (the .invalid TLD is reserved for such purposes).
After creating this origin, edit your default cache behavior to use this new origin.
With this change in place and propagated, CloudFront will process /assets/* requests as before, but will throw an error on any other path. (The error is 502 Bad Gateway, if I remember correctly).
This accomplishes the simple purpose of blocking all other requests.
If you want to be a bit more proactive, and actually redirect requests back to the main site, you can accomplish this by creating an empty bucket in S3 and selecting the "Redirect requests" option under Static website hosting. In the "target bucket or domain" box, put your main web site hostname. Then take the "Endpoint" shown in the Static website hosting box and use that as your origin hostname in CloudFront, for the default cache behavior. Any requests that arrive at CloudFront for anything other than /assets/* will receive a redirect back to the main site.
This option may be the better option if your CDN has been inadvertently picked up by search engines, because the links will redirect back to the main site.
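The routing described above can be modelled in a few lines of Ruby. This is only a simplified sketch of "first matching cache behavior wins, with the default * behavior last"; the Behavior struct and the origin name "app-origin" are made up for illustration and are not part of any AWS API.

```ruby
# Simplified model of how a distribution picks a cache behavior.
# Behavior and "app-origin" are hypothetical names, not AWS objects.
Behavior = Struct.new(:path_pattern, :origin)

BEHAVIORS = [
  Behavior.new("/assets/*", "app-origin"), # the existing Rails origin
  Behavior.new("*", "invalid.invalid")     # default behavior, always last
].freeze

# Return the origin of the first behavior whose pattern matches the path.
# FNM_DOTMATCH lets "*" match any path, including ones containing dots.
def origin_for(path, behaviors = BEHAVIORS)
  match = behaviors.find do |b|
    File.fnmatch(b.path_pattern, path, File::FNM_DOTMATCH)
  end
  match.origin
end
```

With this model, origin_for("/assets/application.css") selects the real origin, while any other path falls through to the default behavior and its unreachable origin.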

Related

Rails APIs and path based load balancer routing

We're breaking our monolithic Rails application in to microservices. Our services are hosted on AWS and are behind ALBs. We cannot use host based routing as we are multi-tenant via subdomain, and it would be an SSL nightmare to maintain the required certs for each tenant/environment/service combination. So we are using path-based API routing with rules on the load balancer. A request looks like this:
Client -> www.example.com/api/:service_name/the_rest_of_the_path -> ALB -> route to rails service by name of :service_name
Because ALB cannot modify the path of a request before it sends it on to the service, when it reaches the Rails services the path is still /api/:service_name/the_rest_of_the_path. This means that in order to route to the proper controllers/actions in this case, we'd need to actually create a Rails scope or namespace of /api/:service_name. This would work in theory but it has two drawbacks.
Firstly it means local developers have to deal with ALB/client specific concerns -- the path used for external service/cluster routing for ALB.
The second is that it couples the application to that path. If the load balancer decided the path should be /:service_name/the_rest_of_the_path instead then it would mean changing the application code in conjunction with the load balancer rules to accommodate it. It's not optimal and I'd prefer to avoid it if at all possible.
I thought then perhaps we could introduce a webserver to the mix, in between the load balancer and the application layer. I worked on a proof of concept for this and had it stripping out /api/:service_name before it got to the service -- leaving the Rails app with just "the_rest_of_the_path" which is all it cares about. Great! Perfect! Or so I thought.
It works well enough for routing initial requests. However, it falls flat when any redirect or link is built from the current path (as Rails sees it).
In the event /api/:service_name is stripped off before it hits the service, any subsequent links or redirects made from the Rails server itself naturally do not include it in there any longer. You may be on www.example.com/api/:service_name/foo/bar but Rails only thinks you're at /foo/bar. When it tries to tack something on to the path for a redirect or link like /foo/bar/baz, it loses the thing that identifies what service to send it to so the route dies at the load balancer.
This has particularly been an issue with Omniauth/OAuth2 flows for us. Omniauth wants to live at /auth/:provider by default. If the request path is actually /api/:service_name/auth/:provider then it won't match and the OAuth flow won't initiate. Further, if there is a failure with the OAuth flow, Omniauth will hard redirect to www.example.com/auth/failure, which of course does not resolve as the LB does not know where to route the request to.
If we provide a path_prefix to Omniauth as /api/:service_name/auth then it won't match when testing locally at /auth and it won't initiate the flow there.
We won't have control over all of the gems we use and where they redirect to so my question is: Is there a proper way of hanging Rails API microservices off a path on a load balancer, and not have to pull teeth to preserve the necessary prefix in all routes and links and redirects? Something that is essentially a global base href that we can set there, but not set locally so that we can continue to develop at localhost:3000/path instead of remembering to use (and coupling with) an LB path like localhost:3000/api/:service_name/path ?
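For illustration only, one Rack-level mechanism that fits this description is SCRIPT_NAME: the Rack environment splits a request path into SCRIPT_NAME (the mount point) and PATH_INFO (what the app routes on). The sketch below is a hypothetical middleware, not something from the question; the class name MountPrefix and the idea of reading the prefix from configuration are assumptions. It moves the load balancer prefix from PATH_INFO into SCRIPT_NAME, and does nothing when no prefix is configured, as on localhost.

```ruby
# Hypothetical Rack middleware: moves a mount prefix from PATH_INFO
# into SCRIPT_NAME so the app routes on the remainder of the path.
class MountPrefix
  def initialize(app, prefix)
    @app = app
    @prefix = prefix.to_s.chomp("/") # nil or "" disables the middleware
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    if !@prefix.empty? && (path == @prefix || path.start_with?("#{@prefix}/"))
      # Copy the env rather than mutating the caller's hash.
      env = env.merge(
        "SCRIPT_NAME" => env["SCRIPT_NAME"].to_s + @prefix,
        "PATH_INFO"   => path.delete_prefix(@prefix)
      )
    end
    @app.call(env)
  end
end
```

Code that builds URLs from SCRIPT_NAME would then include the prefix again in links and redirects, while the router only sees /auth/:provider. Whether every gem in a given stack honors SCRIPT_NAME is not guaranteed and would need to be checked per gem.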

Rails ActiveStorage: how to avoid one redirect for each image?

If you use ActiveStorage and you have a page with N images you get N additional requests to your Rails app (i.e. N redirects). That means wasting a lot of server resources if you have tens of images on a page.
I know that the redirect is useful for signed URLs. However I wonder why Rails does not precompute the final signed URL and embed that into the HTML page... In this way we could keep the advantages of signed URLs / protected files, without making N additional calls to the Rails server.
Is it possible to include the final URL / pre-signed URL of image variants directly in the HTML (thus avoiding the redirect)? Otherwise, why is that impossible?
After days of reasoning and tests, I am really excited about my final solution, which I explain below. This is an opinionated approach to images and may not represent the current Rails Way™️, however it has incredible advantages for websites that serve many public images, in particular:
When you serve a page with N images you don't get 1 + N requests to your app server, instead you get only 1 request for the page
The images are served through a CDN and this improves the loading time
The bucket is not completely public, instead it is protected by Cloudflare
The images are cached by Cloudflare, which greatly reduces your S3 bill
You greatly reduce the number of API requests (i.e. exists) to S3
This solution does not require large changes to Rails, and thus it is straightforward to switch back to Rails default behavior in case of problems
Here's the solution:
Create an S3 bucket and configure it to host a public website (e.g. call it storage.example.com) - you can even disable the public access at bucket level and allow access only to the Cloudflare IPs using a bucket policy
Go to Cloudflare and configure a CNAME for storage.example.com that points to your domain; you need to use Flexible SSL (you can use a page rule for the subdomain); use page rules to set heavy caching: set Cache Everything and set a very long value (e.g. 1 year) for Browser Cache TTL and Edge Cache TTL
In your Rails application you can keep using private storage / ACL, which is the default Rails behavior
In your Rails application call @post.variant(...).processed after every update or creation of @post; then in your views use 'https://storage.example.com/' + @post.variant(...).key (note that we don't call processed here in the views to avoid additional checks in S3); you can also have a rake task that calls processed on each object, in case you need to regenerate the variants; this works perfectly if you have only a few variants (e.g. 1 image / variant per post) that are changed infrequently
Most of the above steps are optional, so you can combine them based on your needs.
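The URL-building step in the views can be wrapped in a small helper. This is a minimal sketch, assuming the bucket is served at the storage.example.com hostname used above; the constant STORAGE_HOST and the method name cdn_url_for are hypothetical.

```ruby
# Hypothetical helper; STORAGE_HOST mirrors the example hostname above.
STORAGE_HOST = "https://storage.example.com"

# Build a direct CDN URL from a blob or variant key. The key is passed in
# as a string, so nothing here triggers an existence check against S3.
def cdn_url_for(key, host: STORAGE_HOST)
  "#{host.chomp('/')}/#{key.to_s.delete_prefix('/')}"
end
```

In a view this would be called with the key of a variant that was already processed when the record was saved, so rendering the page costs no extra requests to the app server or to S3.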
You can use the service_url to create direct links to your resources.
We don't use Rails views in our project so my knowledge about the view layer is rusty. I think you could put it in a dedicated helper and then use it from your views.

Route 53 - Special domain for a single page on existing server

I have a complex web app at example-app.com, hosting fully on AWS using ELB and Route 53 for DNS. It's a Rails app.
I'm running an experiment inside the Rails app, at example-app.com/test. I want to set up new-domain-app.com to point at example-app.com/test, and have the URL cloaked so that it always shows new-domain-app.com. It's a single page site, so it shouldn't require any navigation.
I'm having a lot of trouble figuring out how to set up my DNS on Route 53 to accomplish this. Does anyone have good ideas on what this Route 53 configuration should look like?
AWS offers a very simple way to implement this -- with CloudFront. Forget about the fact that it's marketed as a CDN. It's also a reverse proxy that can prepend a fixed value onto the path, and send a different hostname to the back-end server than the one typed into the browser, which sounds like what you need.
Create a CloudFront web distribution.
Configure the new domain name as an alternate domain name for the distribution.
For the origin server, put your existing hostname.
For the origin path, put /test -- or whatever string you want prefixed onto the path sent by the browser.
Configure the cache behavior as needed -- enable forwarding of the query string or cookies if needed and any headers your app wants to see, but not Host.
Point your new domain name at CloudFront... But before you do that, note that your CloudFront distribution has a dxxxexample.cloudfront.net hostname. After the distribution finishes setting up (the "In Progress" status goes away, usually in 5 to 20 minutes) your site should be accessible at the cloudfront.net hostname.
How this works: When you type http://example.com into the browser, CloudFront will add the origin path onto the path the browser sends, so GET / HTTP/1.1 becomes GET /test/ HTTP/1.1. This configuration just prefixes every request's path with the string you specified as the origin path, and sends it on to the server. The browser address bar does not change, because this is not a redirect. The host header sent by the browser is replaced with the hostname of the origin server when the request is sent to the origin.
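The rewrite described in the previous paragraph amounts to simple string handling. The sketch below is a simplified model of it, not an AWS API; OriginRequest and to_origin_request are hypothetical names.

```ruby
# Simplified model of the origin-path rewrite; not an AWS API.
OriginRequest = Struct.new(:host, :path)

# The origin path is prefixed onto the path sent by the browser, and the
# Host header is replaced with the origin's hostname.
def to_origin_request(viewer_path, origin_host:, origin_path:)
  OriginRequest.new(origin_host, "#{origin_path.chomp('/')}#{viewer_path}")
end
```

For a browser request for / with origin path /test, the origin server sees /test/ under its own hostname, while the address bar keeps showing the new domain.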
What you are trying to do is not possible. Route53 is a DNS system, and you can not configure a hostname (e.g. new-domain-app.com) to point to URL (e.g. http://example-app.com/test) using DNS.
However, you are probably using a wrong tool for the job. If example-app.com/test is indeed a simple, static, single page site, then you do not need to host it inside Rails app. Instead, you can host it on AWS S3 bucket, and then you can point new-domain-app.com to that bucket using Route53.
See the following for details:
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/RoutingToS3Bucket.html
DNS knows about Domains, not url's. DNS simply converts names to IP addresses.
You can't do what you are asking for just using DNS and ELB. However, what you can do is have a separate VHOST for new-domain-app.com that points to your example-app.com site and accomplishes what you want using some sort of redirection rule that only fires for new-domain-app.com.
I'm not sure that this qualifies as an SO question, and more likely is a serverfault question. Specifics about your webserver and OS platform would be helpful in getting more specific advice.
So here's some details:
You already have example-app.com setup and working
You create a CNAME entry pointing new-domain-app.com to example-app.com or you can make an A record pointing to the same IP. If you already have example-app.com pointing to a different IP address, then use a subdomain (test.example-app.com) to isolate it.
Setup a new vhost on your server that basically duplicates the existing vhost for new-domain-app.com. The only thing you need to change is the server name configuration.
Why does this work? Because HTTP 1.1 included the HOST header that browsers send along, and web servers use in vhosting to determine which virtual host to route an incoming request to. When it sees that the client browser wanted "example-app.com" it routes the request to the appropriate vhost.
Rather than having to do some fancy proxying, which certainly can be used to get to a similar result, you can just add a redirection rule that looks for requests for the host new-domain-app.com and redirects those to example-app.com/test. In Apache that uses mod_rewrite, which people often configure by putting rules in the ubiquitous .htaccess file, but it can also be done in nginx and other common web servers. The specifics are slightly different for each.

How do I limit AWS CloudFront so that it only serves requests from a single directory on my domain?

I have gone through the process of creating a CloudFront distribution with the Origin Domain Name pointing to my main Rails application, where assets (images, CSS, JS, etc.) are located at /assets.
However, by default, the CloudFront distribution is mirroring the entire domain (including dynamic pages).
How can I limit it to just the /assets sub-tree?
PS This is the article I am following:
https://devcenter.heroku.com/articles/using-amazon-cloudfront-cdn
Thanks!
Since the default cache behavior can't (afaik) be removed, this seems like a clever "serverless" solution:
Create a bucket in S3. The name won't matter. Don't put anything in it.
Add a second origin to your CloudFront distribution, selecting the new bucket as the origin.
Create a second cache behavior with path pattern /assets/* pointing to your original origin.
Change the default cache behavior to use the new S3 origin (the unused, empty bucket).
CloudFront will forward requests for /assets/* to your existing server, where they will be handled as now, but all other requests will be sent to the empty bucket, which has no content and no permissions, so the response will be 403 Forbidden.
Optionally, add an appropriate "robots.txt" file to that otherwise-empty bucket, and make it publicly readable, so CloudFront will serve it up to any crawlers that visit your CloudFront distribution, disallowing them from indexing, which should hopefully prompt them to remove any already-indexed results and not try to index the assets or any other paths they might have already learned by crawling the previously-exposed content at the "wrong" URL.

How does HTML5 AppCache handle redirects?

If I include in my application cache manifest:
/example.html
and this redirects to
https://s3.amazonaws.com/longURL/example.html?dynamicauthenticationparameters
will this work?
The current draft HTML5 specification seems to be silent on redirects for content files (as opposed to the manifest itself) apart from referring to a manual redirect flag, which apparently is set but (as far as I can tell) never actually used.
(The intention is to avoid proxying some S3 content, but to still make it available offline using the cache mechanism. JavaScript and LocalStorage would presumably be a workaround if the above can't be done).
Any pointers to the relevant part of a spec and/or current browser implementation behavior would be helpful.
The current specification now states that if the resource is redirected to a different origin, then this is treated as a failure and the local cached copy (or fallback) is used instead.
In section 5.6.4 of http://www.w3.org/TR/2011/WD-html5-20110525/offline.html it states that:
Redirects are fatal because they are either indicative of a network
problem (e.g. a captive portal); or would allow resources to be added
to the cache under URLs that differ from any URL that the networking
model will allow access to, leaving orphan entries; or would allow
resources to be stored under URLs different than their true URLs. All
of these situations are bad.
So sadly you can't serve some pages from Amazon S3 or Cloudfront.
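To make "redirected to a different origin" concrete: an origin is the combination of scheme, host, and port. The following sketch compares two URLs on that basis; the method names are hypothetical, and www.example.com in the usage note is an assumed host for the manifest entry. It only illustrates the origin comparison, not how any browser implements AppCache.

```ruby
require "uri"

# Origin as a [scheme, host, port] tuple. URI fills in the default
# port for the scheme when none is given explicitly.
def origin_of(url)
  uri = URI.parse(url)
  [uri.scheme.to_s.downcase, uri.host.to_s.downcase, uri.port]
end

# True when following the redirect would leave the original origin.
def cross_origin_redirect?(from_url, to_url)
  origin_of(from_url) != origin_of(to_url)
end
```

A manifest entry served from https://www.example.com/example.html that redirects to https://s3.amazonaws.com/longURL/example.html?dynamicauthenticationparameters crosses origins under this comparison, which is the case described above as a failure.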
