"Request has expired" when using S3 with Active Storage - ruby-on-rails

I'm using ActiveStorage for the first time.
Everything works fine in development, but in production (Heroku) my images disappear for no apparent reason.
They displayed fine at first, but now no image is shown. In the console I can see this error:
GET https://XXX.s3.amazonaws.com/variants/Q7MZrLyoKKmQFFwMMw9tQhPW/XXX 403 (Forbidden)
If I try to visit that URL directly I get an XML
<Error>
<Code>AccessDenied</Code>
<Message>Request has expired</Message>
<X-Amz-Expires>300</X-Amz-Expires>
<Expires>2018-07-24T13:48:25Z</Expires>
<ServerTime>2018-07-24T15:25:37Z</ServerTime>
<RequestId>291D41FAC6708334</RequestId>
<HostId>lEVGuwA6Hvlm/i40PeXaje9SEBYks9+uk6DvBs=</HostId>
</Error>
This is what I have in the view
<div class="cover" style="background-image: url('<%= rails_representation_path(experience.thumbnail) %>')"></div>
This is what I have in the model
def thumbnail
  self.cover.variant(resize: "300x300").processed
end
In simple words, I don't want the images to expire; I want them to always be available.
Thanks

ActiveStorage does not support non-expiring links. It uses expiring (private) links, and only supports uploading files as private to your service.
This was a problem for me too, so I wrote two patches (use with caution) for S3 only: a simple one (~30 lines) that overrides ActiveStorage to work only with non-expiring (public) links, and another that adds an acl option to the has_one_attached and has_many_attached methods.
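For reference, a minimal sketch of what the first kind of override can look like, assuming Rails 5.2's ActiveStorage::Service::S3Service internals (this is an illustration, not the actual patch, and it assumes the objects are publicly readable, e.g. via a bucket policy or a public-read ACL on upload):

# config/initializers/active_storage_public_urls.rb (hypothetical)
Rails.application.config.to_prepare do
  ActiveStorage::Service::S3Service.class_eval do
    # Return a permanent public URL instead of an expiring presigned one.
    # `bucket` is the Aws::S3::Bucket resource the service already holds.
    def url(key, **)
      bucket.object(key).public_url
    end
  end
end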
Hope it helps.

Your question doesn't say so, but it's common to use a CDN like AWS CloudFront with a Rails app; especially on Heroku, you probably want to conserve compute power.
Here is what happens in that scenario. You render a page as usual, and all the images are requested from the asset host, which is the CDN, because that's how it is configured to integrate. It's set up to fetch anything it doesn't find in its cache from the origin, which is your application again.
At first, all image requests are passed through. The ActiveStorage controller creates signed URLs for them, and the CDN passes them on, but also caches them.
Now comes the problem: the signed URL expires after 5 minutes by default, but the CDN usually caches for much longer. That's because assets are usually digested, meaning they are invalidated not by time but by name, on any change.
The solution is simple: increase the expiry of the signed URL so it is longer than the cache's TTL. Then the cache drops the cached signed URL before it becomes invalid.
Set the URL expiry using ActiveStorage::Service.url_expires_in in 5.2, or directly via Rails.application.config.active_storage.service_urls_expire_in in an initializer; see this answer for details.
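For example (a sketch; one week is arbitrary, the point is that it must exceed the CDN's TTL):

# config/initializers/active_storage.rb
# Signed ActiveStorage URLs should outlive the CDN's cache TTL.
Rails.application.config.active_storage.service_urls_expire_in = 1.week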
To set the cache TTL in CloudFront: open the AWS console, pick the distribution, open the Behaviors tab, and scroll down to the TTL fields (Minimum TTL, Default TTL, Maximum TTL).
Then optionally issue an invalidation to force re-caching of all contents.
Keep in mind there is a security trade-off. If the image contents are private, then they most likely don't belong in a CDN, and shouldn't have long-lasting temporary URLs either. In that case, choose a solution that exempts attachments from the CDN altogether; your application will then have to handle the additional load of signing all attached assets' URLs on top of rendering the relevant page.
Further, keep in mind that this isn't necessarily a good solution, more of a workaround. With the above setup you cache redirects, and the heavier requests still hit your storage bucket directly. The usual scenario for CDNs is large media, not lightweight redirects. You do relieve the app of handling a lot of requests, though; whether that is a worthwhile optimization is something to evaluate for your case.

I had this same issue, but after I corrected the time on my computer the problem was resolved. The cause was a difference between my machine's clock and the AWS servers' time, which made the signed requests look expired.

#production.rb
Change
config.active_storage.service = :local
To
config.active_storage.service = :amazon
The name should match whatever you defined the service as in config/storage.yml (e.g. amazon).
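For reference, the matching config/storage.yml entry might look like this (bucket, region and credential keys are placeholders):

amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: my-app-bucket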

Related

Automatically upload assets to S3 without assets_sync

I just did my first deploy to Heroku and, apart from my images, the assets work. I was reading about how to move the assets to S3 (and then cache them with CloudFront) when I found this gist:
https://gist.github.com/schneems/9374188
"I hate asset_sync"
Using asset sync can cause failures, is difficult to debug, un-needed, and adds extra complexity. Don't use it. Instead use https://devcenter.heroku.com/articles/using-amazon-cloudfront-cdn
The problem is, I can't find how to sync assets automatically the way the gem does. What's the best alternative to using the asset_sync gem?
Although an old question, in case anyone finds this question and is hoping for an answer here are my own findings.
Cloudfront has, for some time now, allowed users to set an origin value in their configuration. You want to set that to your application host. If you were deploying to a site accessible via https://myapp.com, then you would use that as your Cloudfront origin. Any cache misses from Cloudfront will then route to your application layer at https://myapp.com, appending whatever path info is present in the request (e.g. /assets/css/whatever.css). This means that your application must be able to serve these static assets. If it can, then you're all set. If not, check the Rails guide for how to enable that.
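In a Rails 5+ app that usually comes down to something like this (on Heroku the RAILS_SERVE_STATIC_FILES env var flips the same switch):

# config/environments/production.rb
config.public_file_server.enabled = ENV['RAILS_SERVE_STATIC_FILES'].present?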
Caveat! You cannot use a non-publicly-accessible URL for the origin. What does that mean? If you are provisioning your own pre-production application instances hidden behind a VPC, for example, then you cannot use those instances as your origin; Cloudfront cannot be granted special access to them. There's a workaround if you read Cloudfront's documentation on serving private content: basically, you make your application publicly accessible to anyone with the appropriate links, but enforce application-level constraints that disallow access to anyone not using a specially signed URL or cookie.
This is very old, but the answers here are not satisfying. The gist and articles linked do not intend to say the asset_sync gem is bad. The article intends to say you shouldn't be serving your assets from anywhere (S3 or not) but your application. Uploading your assets anywhere runs into the same issues mentioned in the gist, whether or not you're using this particular gem.
The suggestion is to serve your assets from your application and then use a CDN such as Cloudfront to ease the load on your server. You do not need to upload anything to Cloudfront. Instead, Cloudfront will request assets from your application on demand and cache them automatically.
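For context, pointing the asset host at a CloudFront distribution is a one-line change in Rails (the distribution domain below is a placeholder):

# config/environments/production.rb
# Asset URLs are rendered against the CDN; cache misses pull from the app (the origin).
config.action_controller.asset_host = 'https://d1234abcd.cloudfront.net'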
However, Cloudfront (or any CDN) does not address all use cases of asset_sync. If you're running into slug-size issues and you've already cleaned repos and unnecessary dependencies, you might still need asset_sync to get your slug under 500 MB and keep Heroku happy. Of course, you can forgo asset_sync and use a custom implementation to upload to S3 (or anywhere else).

PDF caching on heroku with cloudflare

I'm having a problem getting the caching I need to work using CloudFlare.
We use CloudFlare for caching all our assets on S3, which works 100%, using a separate cdn subdomain.
We also use CloudFlare for our main site (hosted on Heroku) as well, e.g. www.
My problem is I can't get CloudFlare to cache PDFs that are generated by our Rails app. I'm using the WickedPDF gem to dynamically generate certain PDFs for invoices, etc. I don't want to upload these as files to, say, S3, but we would like CloudFlare to cache them so they don't get generated each and every time, as generating these PDFs is a little CPU-intensive.
CloudFlare is turned on and "accelerating" the subdomain in question, and we're using SSL, but the PDFs never seem to cache properly.
Is there something else we need to do to ensure these get cached? Or maybe there's another solution that would work on Heroku? (E.g. we can't use page caching since it relies on the filesystem.) I also checked the WickedPDF documentation to see if we could do anything else, but found nothing about expiry controls.
Thanks,
We should actually cache it as long as the resources are on-domain & not being delivered through a third-party resource in some way.
Keep in mind:
1. Our caching depends on the number of requests for the resources (at least three).
2. Caching is very much data center dependent (in other words, if your site receives a lot of traffic at a data center it is going to be cached; if your site doesn't get a lot of traffic in another data center it may not cache).
I would open a support ticket if you're still having issues.
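One thing worth double-checking on the Rails side (not mentioned in the answer above, so treat it as an assumption) is whether the PDF response is sent with cache-friendly headers at all; an edge cache generally won't store a response marked private or no-cache. A sketch with WickedPDF:

# Hypothetical controller action generating the invoice PDF
def show
  expires_in 1.hour, public: true   # sends Cache-Control: public, max-age=3600
  render pdf: "invoice-#{params[:id]}"
end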

Rails implementation for securing S3 documents

I would like to protect my S3 documents behind my Rails app such that if I go to:
www.myapp.com/attachment/5, the app should authenticate the user prior to displaying/downloading the document.
I have read similar questions on stackoverflow but I'm not sure I've seen any good conclusions.
From what I have read there are several things you can do to "protect" your S3 documents.
1) Obfuscate the URL. I have done this. I think it's a good thing to do so no one can guess the URL. For example, it would be easy to "walk" the URLs if your S3 URLs are obvious: https://s3.amazonaws.com/myapp.com/attachments/1/document.doc. Having a URL such as:
https://s3.amazonaws.com/myapp.com/7ca/6ab/c9d/db2/727/f14/document.doc seems much better.
This is great to do, but it doesn't resolve the issue of URLs being passed around via email or websites.
2) Use an expiring URL as shown here: Rails 3, paperclip + S3 - Howto Store for an Instance and Protect Access
For me, however, this is not a great solution because the URL is exposed (even if only for a short period of time) and another user could conceivably reuse it while it is still valid. You have to tune the expiry to allow enough time for the download without leaving too much time for copying. It just seems like the wrong solution.
3) Proxy the document download via the app. At first I tried to just use send_file: http://www.therailsway.com/2009/2/22/file-downloads-done-right but the problem is that it only works with static/local files on your server, not files served from another site (S3/AWS). I can, however, use send_data to load the document into my app and immediately serve it to the user. The problem with this solution is obvious: twice the bandwidth and twice the time (to load the document into my app and then send it back to the user).
I'm looking for a solution that provides the full security of #3 but does not require the additional bandwidth and time for loading. It looks like Basecamp is "protecting" documents behind their app (via authentication) and I assume other sites are doing something similar but I don't think they are using my #3 solution.
Suggestions would be greatly appreciated.
UPDATE:
I went with a 4th solution:
4) Use amazon bucket policies to control access to the files based on referrer:
http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingBucketPolicies.html
UPDATE AGAIN:
Well, #4 can easily be worked around via a browser's developer tools. So I'm still in search of a solid solution.
You'd want to do two things:
Make the bucket and all objects inside it private. The naming convention doesn't actually matter; the simpler the better.
Generate signed URLs, and redirect to them from your application. This way, your app can check whether the user is authenticated and authorized, then generate a fresh signed URL and redirect them to it (use a temporary redirect such as 302 rather than a 301, since signed URLs expire and permanent redirects may be cached by browsers). This means the file never goes through your servers, so there's no load or bandwidth on you. Here are the docs for presigning a GET Object request:
https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Presigner.html
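A minimal sketch of that flow (controller, model and bucket names are made up for illustration):

# app/controllers/attachments_controller.rb (hypothetical)
class AttachmentsController < ApplicationController
  before_action :authenticate_user!   # whatever your auth layer provides

  def show
    attachment = current_user.attachments.find(params[:id])  # authorization check

    url = Aws::S3::Presigner.new.presigned_url(
      :get_object,
      bucket: 'my-private-bucket',
      key: attachment.s3_key,
      expires_in: 60
    )

    # Temporary redirect; the file itself never passes through your app servers.
    redirect_to url
  end
end

On Rails 7+ you would also pass allow_other_host: true to redirect_to, since the signed URL points at a different host.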
I would vote for number 3; it is the only truly secure approach. Once you hand the user an S3 URL, that URL is valid until its expiration time, and a crafty user could exploit that window. The only question is whether that would affect your application.
Perhaps you could set the expiry time lower, which would minimise the risk?
Take a look at an excerpt from this post:
Accessing private objects from a browser

All private objects are accessible via an authenticated GET request to the S3 servers. You can generate an authenticated url for an object like this:

S3Object.url_for('beluga_baby.jpg', 'marcel_molina')

By default authenticated urls expire 5 minutes after they were generated. Expiration options can be specified either with an absolute time since the epoch with the :expires options, or with a number of seconds relative to now with the :expires_in options.
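The excerpt's own example of those options is cut off, but if memory serves the old aws-s3 gem accepted them roughly like this (treat the exact call as an assumption):

# Old aws-s3 gem, illustrative only: expiry given in seconds relative to now
S3Object.url_for('beluga_baby.jpg', 'marcel_molina', :expires_in => 10.minutes)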
I have been trying to do something similar for quite some time now. If you don't want to use the bandwidth twice, then the only way to do this is to let S3 serve the file directly. Now, I am totally with you about the exposed URL. Were you able to come up with any alternative?
I found something that might be useful in this regard - http://docs.aws.amazon.com/AmazonS3/latest/dev/AuthUsingTempFederationTokenRuby.html
Once a user logs in, an AWS session with their IP as part of the AWS policy should be created, and this can then be used to generate the signed URLs. So if somebody else grabs the URL, the signature will not match, since the source of the request will be a different IP. Let me know if this makes sense and is secure enough.
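A rough sketch of that idea with the aws-sdk gem (the policy, names and duration are assumptions, and note that NATs and proxies can make source-IP checks unreliable):

require 'aws-sdk-core'  # provides Aws::STS::Client
require 'json'

# Temporary credentials that only allow GETs coming from the user's IP.
def temporary_credentials_for(user_ip)
  policy = {
    'Version'   => '2012-10-17',
    'Statement' => [{
      'Effect'    => 'Allow',
      'Action'    => 's3:GetObject',
      'Resource'  => 'arn:aws:s3:::my-private-bucket/*',
      'Condition' => { 'IpAddress' => { 'aws:SourceIp' => user_ip } }
    }]
  }

  Aws::STS::Client.new.get_federation_token(
    name: 'download-session',
    policy: JSON.generate(policy),
    duration_seconds: 900
  ).credentials
end

Those credentials can then be handed to an S3 client/presigner so the signed URLs it produces are only honoured for requests from that source IP.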

How to proxy files from S3 through rails application to avoid leeching?

In order to avoid hot-linking, S3 bandwidth leeching, etc., I would like to make my bucket private and serve the files through a Rails app. The concept in general sounds very easy, but I am not entirely sure which approach would be best for the situation.
I am using paperclip for general asset management. Is there any built-in way to achieve this type of proxy?
In general I can easily parse the URLs from paperclip and point them back to my own controller. What should happen from that point? Should I simply use Net::HTTP to download the image and then serve it with send_data? In between I want to log the referer and set proper Cache-Control headers, since I have a reverse proxy in front of the app. Is Net::HTTP + send_data a reasonable approach in this case?
Maybe the whole idea is really bad for reasons I am not aware of at the moment? In general I believe that revealing direct S3 links to a public bucket is dangerous and could result in some serious problems in case of leeching / hot-linking...
Update:
If you have any other ideas that can reduce the S3 bill and prevent hot-linking / leeching in any way, please share, even if they are not directly related to Rails.
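For reference, the naive proxy described in the question would look roughly like this (paths and names are placeholders; the answers below explain why signed URLs are usually the better trade-off):

# Hypothetical controller action proxying a private S3 object
require 'net/http'

def show
  # NOTE: sanitize params[:path] in real code; for a private bucket this URL
  # would itself need to be presigned server-side so the app can read the object.
  uri = URI("https://my-bucket.s3.amazonaws.com/#{params[:path]}")
  s3_response = Net::HTTP.get_response(uri)   # loads the whole object into memory

  Rails.logger.info "asset referer: #{request.referer}"
  expires_in 1.hour, public: true             # let the reverse proxy in front cache it

  send_data s3_response.body,
            type: s3_response['Content-Type'],
            disposition: 'inline'
end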
Use a private bucket (or private files) and signed URLs to the files stored on S3.
The signature includes an expiration time (e.g. 10 minutes from now, whatever you would like to set), as well as a cryptographic hash. S3 will refuse to serve files if the signature is invalid or if the expiration time has passed.
This is useful because only you can create valid URLs to your private files in S3, and you can control how long the URLs remain valid. This prevents leeching, because leechers can't sign their own URLs and, if they get hold of a URL that you signed, it will expire very shortly and cannot be used after that.
Since there wasn't a nuts and bolts answer above, here's a small code sample of how to stream a file that's stored on S3.
render :text => proc { |response, output|
  # Stream the S3 object in chunks straight to the response body
  AWS::S3::S3Object.stream(path, bucket) do |segment|
    output.write segment
    output.flush # not sure if this is needed
  end
}
Depending on your webserver this may (mongrel) or may not (webrick) work, so don't get too frustrated if it doesn't stream in development.
Provide temporary pre-signed URLs:
def show
  redirect_to Aws::S3::Presigner.new.presigned_url(
    :get_object,
    bucket: 'mybucket',
    key: '/folder/file.pdf',
    expires_in: 60
  )
end
S3 still serves the content, so you offload that work from Rails (which is very slow at it); it also handles HTTP caching and HEAD requests, and you can put Amazon's CDN (CloudFront) in front of it.
I'd probably avoid doing this, at least until I had no other choice.
You need to take into account that you'll probably also add to the bandwidth bill if you download the image each time. Also, processing each image through a script requires additional CPU and RAM. Not the greatest outlook, IMHO.
I would probably enable the access logs for Amazon S3, write a small tool to analyze usage, and change the permissions on the bucket/object in case usage goes through the roof. Run this as a cronjob every 10 minutes or so and you should be safe.
You could also use s3stat. They also offer a free plan.
Edit: As per my recommendation for Varnish, I'm adding a link to a blog entry about preventing hotlinking using Varnish.

Why would you upload assets directly to S3?

I have seen quite a few code samples/plugins that promote uploading assets directly to S3. For example, if you have a user object with an avatar, the file upload field would load directly to S3.
The only way I see this being possible is if the user object is already created in the database and your S3 bucket + path is something like
user_avatars.domain.com/some/id/partition/medium.jpg
But then if you had an image tag that tried to access that URL when an avatar was not uploaded, it would yield a bad result. How would you handle checking for existence?
Also, it seems like this would not work well for most has_many associations. For example, if a user had many songs/MP3s, where would you store those and how would you access them?
Also, your validations will be shot.
I am having trouble thinking of situations where direct upload to S3 (or any cloud) is a good idea and was hoping people could clarify either proper use cases, or tell me why my logic is incorrect.
Why pay for storage/bandwidth/backups/etc. when you can have somebody in the cloud handle it for you?
S3 (and other Cloud-based storage options) handle all the headaches for you. You get all the storage you need, a good distribution network (almost definitely better than you'd have on your own unless you're paying for a premium CDN), and backups.
Allowing users to upload directly to S3 takes even more of the bandwidth load off of you. I can see the tracking concerns, but S3 makes it pretty easy to handle that situation. If you look at the direct upload methods, you'll see that you can force a redirect on a successful upload.
Amazon will then pass the following to the redirect handler: bucket, key, etag
That should give you what you need to track the uploaded asset after success. Direct uploads give you the best of both worlds. You get your tracking information and it unloads your bandwidth.
Check this link for details: Amazon S3: Browser-Based Uploads using POST
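With the modern aws-sdk-s3 gem, generating such a browser upload form might look like this (a sketch; bucket, key and redirect URL are placeholders):

require 'aws-sdk-s3'
require 'securerandom'

# Policy-signed POST: the browser uploads straight to S3 and is then
# redirected back to the app with bucket, key and etag query params.
post = Aws::S3::Resource.new(region: 'us-east-1')
         .bucket('my-uploads-bucket')
         .presigned_post(
           key: "avatars/#{SecureRandom.uuid}/${filename}",
           success_action_redirect: 'https://myapp.example.com/uploads/complete',
           content_length_range: 0..10_485_760   # cap uploads at 10 MB
         )

post.url     # form action for the multipart/form-data upload
post.fields  # hidden inputs (policy, signature, ...) to embed in the form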
If you are hosting your Rails application on Heroku, the reason could very well be that Heroku doesn't allow file uploads larger than 4 MB:
http://docs.heroku.com/s3#direct-upload
So if you would like your users to be able to upload large files, this is the only way forward.
Remember how web servers work.
Unless you're using an async web setup, as you could achieve with Node.js or Erlang (just two examples), every upload request your web application serves ties up an entire process or thread while the file is being uploaded.
Imagine that you're uploading a file that's several megabytes large. Most internet users don't have tremendously fast uplinks, so your web server spends a lot of time doing nothing. While it's doing all of that nothing, it can't service any other requests. Which means your users start to get long delays and/or error responses from the server. Which means they start using some other website to get the same thing done. You can always have more processes and threads running, but each of those costs additional memory which eventually means additional $.
By uploading straight to S3, in addition to the bandwidth savings that Justin Niessner mentioned and the Heroku workaround that Thomas Watson mentioned, you let Amazon worry about that problem. You can have a single-process webserver effectively handle very large uploads, since it punts that actual functionality over to Amazon.
So yeah, it's more complicated to set up, and you have to handle the callbacks to track things, but if you deal with anything other than really small files (and even in those cases), why cost yourself more money?
Edit: fixing typos

Resources