I would like to know if someone is facing the same problem getting Rails asset files from an AWS S3 bucket,
and why I keep getting "Access Denied" when I try to fetch the CSS uploaded by AssetSync.
Thank you very much.
By default, objects on S3 are "private" -- they are only accessible if you prove that you own them by providing credentials (e.g. a signature) in the query string.
To make the objects publicly accessible (ie, without having to sign the requests), you need to attach a policy to the bucket.
To add that permission, go to S3 on the AWS Management Console, click on your bucket, select Properties, and there you will see "Permissions". Try that.
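For example, a minimal public-read bucket policy can be attached from that console editor, or programmatically; below is a rough sketch using the aws-sdk-s3 gem, with a placeholder bucket name:
require "aws-sdk-s3" # assumes the aws-sdk-s3 gem and configured AWS credentials
require "json"

bucket_name = "my-assets-bucket" # placeholder; use your own bucket name

# Public-read policy: lets anyone GET objects in the bucket without signing requests.
policy = {
  "Version"   => "2012-10-17",
  "Statement" => [{
    "Sid"       => "PublicReadGetObject",
    "Effect"    => "Allow",
    "Principal" => "*",
    "Action"    => "s3:GetObject",
    "Resource"  => "arn:aws:s3:::#{bucket_name}/*"
  }]
}.to_json

Aws::S3::Client.new.put_bucket_policy(bucket: bucket_name, policy: policy)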
The problem at hand
I have a Rails app.
Users will be uploading anywhere from 1 to 3,000 files at a time. Sometimes they are zip files, and sometimes they are not. I do not want to hold up the server with these file uploads, so I am looking for a solution to this problem.
The zipped files will have to be unzipped.
I then want to check whether the user has previously uploaded the same files, i.e. if the user has already uploaded the same file(s) one week ago, then this is a problem: (i) either we don't allow that particular file to be uploaded, or (ii) we ask the user: are you sure you want to upload the same file again?
Then I want to store the keys/links to the files within the appropriate models/records on the back end.
I was wondering what the best workflow for handling the above could be, i.e. a very general overview: could AWS Lambda / Google Cloud etc. be best employed to handle the above problem? How would we use the Shrine gem to best handle this situation? Would it make sense to use AWS Lambda rather than background jobs?
My preferences are to use the Shrine gem for uploading.
My Ideas:
On the client side, the user drags and drops the files they want to upload.
All the files are then uploaded (whether zipped or otherwise) to a temporary bucket location via the Shrine gem.
If zip files are uploaded, then perhaps an AWS Lambda function must be triggered to unzip the files. If that's the case, then at the end of the day the keys for these files must somehow be returned to the client to handle validation issues. But then how would the AWS Lambda function return its result to the client side where the request originated? Or rather, should the AWS Lambda function be invoked from the client side, passing in the IDs of the unzipped blobs?
Then we need to run some validations: we want to handle the situation where there are duplicate files. We will need to check with our Rails backend as to whether those files have already been uploaded.
After those validation issues are handled, the user submits the form, and all the keys are stored within the appropriate records.
These ideas are by no means prescriptive.
I am seeking some very general advice on the best way of doing all this. I am by no means constrained to AWS: I could use Google or Azure just as easily. Any guidance on the above would be much appreciated.
Specific questions:
How would the AWS Lambda function get triggered?
How would we be able to return the keys of the uploaded files back to the client?
What do I mean by general overview?
Here are some examples of general overviews:
(1) Uploading & Unzipping files to S3 through Rails hosted on Heroku?
(2) https://www.quora.com/How-do-I-extract-large-zip-files-in-AWS-Lambda
Any pointers in the right direction would be much appreciated.
Cheers!
This isn't a really difficult problem to solve if you are willing to change the process flow a little bit.
On the client side, the user drags and drops the files they want to upload.
When the user requests the upload operation to begin, you can make HTTP GET requests to an API Gateway endpoint backed by a Lambda. The Lambda can query for previous files uploaded by the client and send back a result set showing what files already exist. You then filter those out and send only what is considered new from the client to the server. This will save the user time waiting for the upload to happen and save you work on the S3/Lambda side by not having to store or process duplicates. This isn't a substitute for server-side validation though; you'll still want to do that. For legit clients, this will save you and them a lot of bandwidth and storage.
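A rough sketch of such a duplicate-check Lambda in Ruby, assuming an uploads table keyed by user_id / file_key and API Gateway passing the user ID as a query string parameter:
# Hypothetical Lambda behind API Gateway: GET /uploads?user_id=...
# Returns the object keys this user has already uploaded so the client
# can filter out duplicates before starting the upload.
require "json"
require "aws-sdk-dynamodb" # assumed to be packaged with the Lambda

DYNAMO = Aws::DynamoDB::Client.new
TABLE  = "uploads" # assumed table: hash key "user_id", sort key "file_key"

def handler(event:, context:)
  user_id = event.dig("queryStringParameters", "user_id")

  result = DYNAMO.query(
    table_name: TABLE,
    key_condition_expression: "user_id = :u",
    expression_attribute_values: { ":u" => user_id }
  )

  existing = result.items.map { |item| item["file_key"] }
  { statusCode: 200, body: JSON.generate(existing: existing) }
end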
All the files are then uploaded (whether zipped or otherwise) to a temporary bucket location via the Shrine gem.
This works. As they enter the temp bucket, use a Lambda with an S3 event to process the files: unzip them, push any metadata needed into DynamoDB, and delete the files from the temp bucket. In the temp bucket, I would place the files into a folder that is unique per request and user. I would take the user/client ID and a UUID of some kind and make that your folder name, such as Johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f, or encode this value into a Base64 string and make that your folder name. Store this in DynamoDB with each file uploaded into your permanent bucket, with the Hash Key being the user/client ID, a Sort Key being the full folder path + file name, and an extra attribute of IsProcessed. The IsProcessed attribute will be updated by the Lambda that is processing the files and moving them to their permanent S3 bucket. If there are errors, you can put the error in this field; if processing succeeds, you record that there instead.
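A minimal sketch of that S3-event Lambda in Ruby, assuming the rubyzip gem, a permanent bucket named my-permanent-bucket, and the uploads table described above:
# Hypothetical Lambda fired by an S3 event on the temp bucket: unzips the
# archive, copies the contents to the permanent bucket, records metadata in
# DynamoDB and cleans up the temp object. All names are placeholders.
require "aws-sdk-s3"
require "aws-sdk-dynamodb"
require "zip" # rubyzip, packaged with the Lambda

S3     = Aws::S3::Client.new
DYNAMO = Aws::DynamoDB::Client.new
PERM   = "my-permanent-bucket"
TABLE  = "uploads"

def handler(event:, context:)
  event["Records"].each do |record|
    temp_bucket = record["s3"]["bucket"]["name"]
    key         = record["s3"]["object"]["key"]   # e.g. "Johnathon+<uuid>/archive.zip"
    folder      = key.split("/").first            # the per-request folder name

    zip_io = S3.get_object(bucket: temp_bucket, key: key).body

    Zip::File.open_buffer(zip_io) do |zip|
      zip.each do |entry|
        next if entry.directory?
        perm_key = "#{folder}/#{entry.name}"

        S3.put_object(bucket: PERM, key: perm_key, body: entry.get_input_stream.read)

        DYNAMO.put_item(
          table_name: TABLE,
          item: {
            "user_id"     => folder,    # hash key (user/client id or folder name, per your design)
            "file_key"    => perm_key,  # sort key: full folder path + file name
            "IsProcessed" => "ok"       # or an error message if something failed
          }
        )
      end
    end

    S3.delete_object(bucket: temp_bucket, key: key) # remove from the temp bucket
  end
end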
the keys for these files must somehow be returned to the client to handle validation issues. But then how would the AWS Lambda function return its result to the client side where the request originated? Or rather, should the AWS Lambda function be invoked from the client side, passing in the IDs of the unzipped blobs?
The original API request to push the files to the temp S3 bucket would be able to return the folder name johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f to the client. So let's say you made an HTTP POST to /jobs. You would return 201 Created with an HTTP Location header of /jobs/johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f. Your client can then start polling /jobs/johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f for the status of the process.
Your response back to /jobs/johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f can return the DynamoDB records. This would include all DynamoDB records for the HashKey matching the folder name. Your client side can look at all of the objects in the result set and check the IsProcessed attribute to see if everything worked out ok, or if there were issues.
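A hypothetical Rails controller action backing that GET /jobs/:id endpoint might look roughly like this (table and attribute names are assumptions):
# Hypothetical status endpoint: GET /jobs/:id, where :id is the folder name.
# Returns the DynamoDB records for that folder so the client can poll until
# every file shows IsProcessed.
class JobsController < ApplicationController
  DYNAMO = Aws::DynamoDB::Client.new # assumes aws-sdk-dynamodb is in the Gemfile
  TABLE  = "uploads"                 # assumed table name

  def show
    result = DYNAMO.query(
      table_name: TABLE,
      key_condition_expression: "user_id = :folder",
      expression_attribute_values: { ":folder" => params[:id] }
    )

    render json: {
      files: result.items,
      done:  result.items.all? { |item| item["IsProcessed"] == "ok" }
    }
  end
end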
Then we need to run some validations: we want to handle the situation where there are duplicate files. We will need to check with our Rails backend as to whether those files have already been uploaded.
Handle this with the Lambda that is executed by the temporary bucket. Grab the files from the temp bucket folder, handle your business logic and back-end queries then push them to their final permanent bucket.
After those validation issues are handled, the user submits the form, and all the keys are stored within the appropriate records.
All of this would happen asynchronously, starting when the user submits the form. The client side needs to be able to handle this by making HTTP GET requests to the endpoint mentioned above, checking for the status of the process. This gives you some more flexibility as you can also publish SNS messages on failures as well, such as sending an email to the clients if they upload 3,000 files and you need to spend 30 minutes processing them.
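The failure notification could be a plain SNS publish from the processing Lambda; a small sketch, with a placeholder topic ARN:
require "aws-sdk-sns" # assumed to be available to the Lambda

# Hypothetical topic; email subscribers would be attached to it in SNS.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:upload-failures"

def notify_failure(folder_name, error_message)
  Aws::SNS::Client.new.publish(
    topic_arn: TOPIC_ARN,
    subject:   "Upload processing failed for #{folder_name}",
    message:   error_message
  )
end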
I have a website hosted on Heroku, and using Ruby on Rails with the paperclip gem.
I am trying to prevent hotlinking to the files in my S3 bucket, so I have set everything to private and only allow users to access files through an expiring URL.
I want to provide a more user-friendly page when a user tries to reuse an expired URL. Currently it shows the message below:
<Error>
<Code>AccessDenied</Code>
<Message>Request has expired</Message>
<X-Amz-Expires>300</X-Amz-Expires>
<Expires>2016-04-15T19:41:33Z</Expires>
<ServerTime>2016-04-15T19:41:39Z</ServerTime>
<RequestId>D5DD935553A2CF88</RequestId>
<HostId>
55+rFtFbksDMyBWf5cWwgJ+aWvJKwe5umSXgTEWYKgfoT5QR5sbJY9fRNFIiBAqd35OR2MoiCzQ=
</HostId>
</Error>
Is there a way to customize the error page on S3?
S3 offers custom error pages through the web site endpoints -- but not the REST endpoints... but signed URLs only work on the REST endpoints, and not the web site endpoints.
So, no, there is not a way to directly solve this using only S3.
One option is to use CloudFront, which offers the ability to replace the standard error pages with a custom static page, but the error content is lost and all you have is a static page. You also have to use the CloudFront URL signing mechanism, which is different than S3 (though it also has some advantages, such as wildcard support in a signed URL).
In this answer to a question that is similar, but not a complete duplicate, I demonstrated the way I've used an XSL transform to "style" the S3 error XML: modify the XML returned to the browser to inject a link to the XSL stylesheet, and let the browser do the rest of the work... see the screenshots there.
I'm quite pleased with the solution, though it has what some people would consider a drawback: it requires all of the S3 requests to be served via a proxy server running HAProxy in EC2. There's a small additional cost for the EC2 instance, but no added cost for the bandwidth, since transfer from S3 into EC2 is free and transfer from EC2 to the Internet is the same price as transfer from S3 to the Internet. With this setup, the S3 signed URLs still work. The additional advantages in my application are that this lets me use my own SSL certs with S3 static content (although this capability is also available through CloudFront), and that the proxy's access logs are available in real time.
I'm giving Amazon Web Services a try for the first time and getting stuck on understanding the credentials process.
From a tutorial on awsblog.com, I gather that I can upload a file to one of my AWS "buckets" as follows:
require 'aws-sdk' # needed for the snippet to run

s3 = Aws::S3::Resource.new
s3.bucket('bucket-name').object('key').upload_file('/source/file/path')
In the above circumstance, I'm assuming he's using the default credentials (as described here in the documentation), where he's using particular environment variables to store the access key and secret or something like that. (If that's not the right idea, feel free to set me straight.)
The thing I'm having a hard time understanding is the meaning of the .object('key'). What is this? I've generated a bucket easily enough, but is it supposed to have a specific key? If so, how do I create it? If not, what is supposed to go into .object()?
I figure this MUST be documented somewhere, but I haven't been able to find it (maybe I'm misreading the documentation). Thanks to anyone who gives me some direction here.
Because S3 doesn't have traditional directories, what you would consider the entire 'file path' on your client machine, e.g. \some\directory\test.xls, becomes the 'key'. The object is the data in the file.
Bucket names are unique across all of S3, and keys must be unique within your bucket.
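For example (hypothetical bucket and paths), the whole path simply becomes the object key:
require "aws-sdk-s3" # or require "aws-sdk", depending on your SDK version

s3 = Aws::S3::Resource.new

# The key is the whole "path" inside the bucket; S3 treats the slashes as
# part of the name, there are no real directories.
s3.bucket("my-example-bucket")
  .object("some/directory/test.xls")
  .upload_file("/local/path/to/test.xls")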
As far as the credentials go, there are multiple ways of providing them: one is to supply the access key ID and secret access key right in your code; another is to store them in a config file somewhere on your machine (the location varies by OS). When you are running your code in production, i.e. on an EC2 instance, the best practice is to start your instance with an IAM role assigned, so that anything that runs on that machine automatically has all of the permissions of that role. This is the best/safest option for code that runs in EC2.
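For illustration, the explicit-credentials option looks roughly like this (the keys shown are placeholders; prefer environment variables or an IAM role in practice):
require "aws-sdk-s3"

# Option 1: supply the keys in code (placeholders shown; avoid hard-coding real keys).
s3 = Aws::S3::Resource.new(
  region: "us-east-1",
  credentials: Aws::Credentials.new("AKIA_EXAMPLE_KEY_ID", "example-secret-access-key")
)

# Option 2: rely on the default credential chain: the AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY environment variables, the shared ~/.aws/credentials
# file, or an EC2 instance profile (IAM role).
s3 = Aws::S3::Resource.new(region: "us-east-1")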
I want to download some reports from Google Cloud Storage and I'm trying the Gcloud gem. I managed to successfully connect and now I am able to list my buckets, create one, etc.
But I can't find a way to programmatically get files from buckets which are shared with me. I got an address like gs://pubsite... and I need to connect to that bucket to download some files. How can I achieve that?
Do I need to have billing enabled?
In order to list all the objects in a bucket you can use the Google Cloud Storage Objects list API.
You need to provide the bucket ID, and you need read access to the bucket. You can try the API before implementing it in your code.
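With the Gcloud gem mentioned in the question, listing the objects in a shared bucket looks roughly like this (the project ID is a placeholder, and your credentials need read access to the pubsite bucket):
require "gcloud"

gcloud  = Gcloud.new "my-project-id" # placeholder project; credentials come from your keyfile or environment
storage = gcloud.storage

# A gs://pubsite... URI just names the bucket; reference it by that name.
bucket = storage.bucket "pubsite"

bucket.files.each do |file|
  puts "#{file.name} (#{file.size} bytes)"
end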
I hope that helps.
You do not need billing enabled to download objects from a GCS bucket. Operations on GCS buckets are billed to the project that owns the bucket. You only need to enable billing in order to create a new bucket.
Downloading a single file using the Gcloud gem looks like this:
require "gcloud"
gcloud = Gcloud.new
storage = gcloud.storage
bucket = storage.bucket "pubsite"
file = bucket.file "somefile.png"
file.download "/tmp/somefile.png"
There are some examples at http://googlecloudplatform.github.io/gcloud-ruby/docs/v0.2.0/Gcloud/Storage.html
I am creating an iOS app with S3, currently without CloudFront distributions, as a test before I dive into creating a full-fledged app. In the S3 Management Console I have made my bucket in Singapore, where I live, so CloudFront isn't really needed for this demo. I have to set an endpoint like this:
[s3Client setEndpoint: [AmazonEndpoints s3Endpoint: AP_SOUTHEAST_1]];
Which points to Singapore. The endpoint is the place the bucket needs to send the data off to, right? (i.e. where the user is)
So now I have two questions
If I am using CloudFront, do I need to set an endpoint? How do I even use CloudFront in iOS? I generate a signed URL, then what?
If a user is using the app in some random country, what endpoint (if I need to set one with CloudFront) would I set it to? Would I find their current country via the locale and pick whichever endpoint is closest to it?
Thanks!
A set of files in CloudFront is called a "distribution." When you set up a distribution, you specify one or more "origins", which is/are the canonical source of the files you're serving to your users.
In your case, create a new distribution and specify the S3 bucket as the origin. Then in your application, you'd reference it as http://xxxxxxx.cloudfront.net/hello.png rather than http://mybucket.s3.amazonaws.com/hello.png. CloudFront will automatically fetch hello.png from the S3 bucket the first time someone requests it and cache it.
CloudFront automatically (and near-instantaneously) detects which edge location is closest to the user by routing them based on network latency. You don't have to do any of these calculations yourself.
I'd recommend that you read the caveats that I've listed here though before using CloudFront in your app.
I agree with @jamieb. You should create a new CloudFront distribution and set the S3 bucket as the origin. Then you will no longer use the S3 bucket link; you will use the CloudFront link to view the image. CloudFront will pull the image from S3 and cache it for however long you determine. For example, if the image is going to be viewed constantly by different people in the same region, you will want it cached in the edge location for that region, so when a new user in that region looks it up, they get the image much more quickly.