I would like to use multipart upload to upload large files to a bucket in chunks. The chunks should be uploaded in parallel, which is the reason for using multipart upload (as far as I know, resumable upload doesn't offer this feature).
https://cloud.google.com/storage/docs/multipart-uploads
The flow should look as follows:
backend generates signedUrl for starting the upload
client calls previously generated url
backend generates signedUrl for uploading all the chunks
client uses previously generated url to upload multiple chunks
after all chunks are uploaded, the client confirms the end of the upload and the chunks are merged into the file.
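The client half of this flow can be sketched roughly as below. This is only an illustration under assumptions: `upload_part` is a hypothetical callback that PUTs one chunk to its signed URL and returns the ETag reported for that part, and the part size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def split_into_parts(data: bytes, part_size: int) -> List[Tuple[int, bytes]]:
    """Split data into (part_number, chunk) pairs; part numbers start at 1."""
    return [
        (index + 1, data[offset:offset + part_size])
        for index, offset in enumerate(range(0, len(data), part_size))
    ]


def upload_in_parallel(
    data: bytes,
    part_size: int,
    upload_part: Callable[[int, bytes], str],
    max_workers: int = 4,
) -> Dict[int, str]:
    """Upload every part concurrently and return {part_number: etag}.

    upload_part is a hypothetical callback (not a library function) that
    sends one chunk to its signed URL and returns the part's ETag.
    """
    parts = split_into_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with parts.
        etags = pool.map(lambda part: upload_part(*part), parts)
        return {number: etag for (number, _), etag in zip(parts, etags)}
```

The returned part-number-to-ETag mapping is what the client would hand over when it confirms the end of the upload.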
The problem I encountered is the following:
To perform the PUT request that uploads a chunk, two arguments need to be provided: uploadId and partNumber. According to the documentation, these arguments should be provided as query params. But a signed URL doesn't work if the query params in the request differ from those used when the signature was generated. So, for example, if I generate a signed URL with partNumber=1, I can't use it to upload a chunk with partNumber=2. Is there any way to generate a signed URL with variable query params? Ideally something like https://.../uploadId=*&partNumber=*? Or is the only option to generate a signed URL for every chunk so that each one matches its signature?
Regards
I've checked the documentation, but didn't find anything useful. Unfortunately, there are not many examples for multipart upload.
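To illustrate why a URL signed for partNumber=1 is rejected for partNumber=2, here is a toy signer in which the signature covers the query string. It is not the real signing algorithm used by Cloud Storage (the HMAC scheme, `SECRET`, the `sig` parameter and the example host are all assumptions made for the demonstration); it only shows the principle, and the fallback of generating one URL per part.

```python
import hashlib
import hmac
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical key, for illustration only


def _signature(base_url: str, params: Dict[str, str]) -> str:
    # The sorted query string is part of the signed message.
    canonical = urlencode(sorted(params.items()))
    message = f"PUT\n{base_url}\n{canonical}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, params: Dict[str, str]) -> str:
    """Return base_url with params and a toy signature appended."""
    canonical = urlencode(sorted(params.items()))
    return f"{base_url}?{canonical}&sig={_signature(base_url, params)}"


def verify_url(url: str) -> bool:
    """Recompute the signature from the URL and compare it to the one sent."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    sent = params.pop("sig", "")
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return hmac.compare_digest(sent, _signature(base_url, params))


def urls_for_parts(base_url: str, upload_id: str, part_count: int) -> Dict[int, str]:
    """Generate one signed URL per part, since each partNumber changes the signature."""
    return {
        number: sign_url(base_url, {"uploadId": upload_id, "partNumber": str(number)})
        for number in range(1, part_count + 1)
    }
```

Under this model the backend can still answer a single request from the client by returning the whole batch of per-part URLs at once.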
The problem at hand
I have a Rails app.
Users will be uploading files, anywhere from 1 to 3000 at a time. Sometimes they are zip files, and sometimes they are not. I do not want to hold up the server with these file uploads, so I am looking for a solution to this problem.
The zipped files will have to be unzipped.
I then want to check whether the user has previously uploaded the same files. For example, if the user already uploaded the same file(s) one week ago, then this is a problem: either we don’t allow that particular file to be uploaded, or we ask the user: are you sure you want to upload the same file again?
Then I want to store the keys/links to the files within the appropriate models/records on the back end.
I was wondering what the best workflow for handling the above could be, i.e. a very general overview. In other words, could AWS Lambda, Google cloud computing, etc. be best employed to handle the above problem? How would we use the Shrine gem to best handle this situation? Would it make sense to use AWS Lambda rather than background jobs?
My preference is to use the Shrine gem for uploading.
My Ideas:
In the client side, the user drags and drops the files the user wants to upload.
All the files are then uploaded (whether zipped or otherwise) to a temporary bucket location via the Shrine gem.
If zip files are uploaded, then perhaps an AWS Lambda function must be triggered to unzip the files. If that’s the case, then at the end of the day, the keys for these files must somehow be returned to the client, to handle validation issues – but then how would the AWS Lambda function be able to return this response to the original client side where the request originated? Or rather, should the AWS Lambda function be generated from the client side, passing in the IDs of the unzipped blobs?
Then we need to run some validations: we want to handle the situation where there are duplicate files. We will need to check with our Rails backend as to whether those files have already been uploaded.
After those validation issues are handled, then user submits the form, and all the keys are stored within the appropriate records.
These ideas are by no means prescriptive
I am seeking some very general advice on the best way of doing all this. I am by no means constrained to AWS: I could use Google or Azure just as easily. Any guidance on the above would be much appreciated.
Specific questions:
How would the AWS lambda function get triggered?
How would we be able to return the keys of the uploaded files back to the client?
What do I mean by general overview?
Here are some examples of general overviews:
(1) Uploading & Unzipping files to S3 through Rails hosted on Heroku?
(2) https://www.quora.com/How-do-I-extract-large-zip-files-in-AWS-Lambda
Any pointers in the right direction would be much appreciated.
Cheers!
This isn't a really difficult problem to solve if you are willing to change the process flow a little bit.
In the client side, the user drags and drops the files the user wants to upload.
When the user requests the upload operation to begin you can make HTTP GET requests to an API Gateway endpoint, backed with a Lambda. The Lambda can query for previous files uploaded by the client and send back a result set showing what files already exist. You then filter those out and send only what is considered new from the client to the server. This will save the user time in waiting for the upload to happen and save you time on the S3/Lambda side of not having to store duplicates or process them. This isn't a substitute for server-side validation though, you'll still want to do that. For legit clients, this will save you and them a lot of bandwidth and storage.
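The filtering step could look something like the sketch below. It assumes, purely for illustration, that files are compared by a SHA-256 hash of their content and that the endpoint returns the hashes already stored for the user; the answer itself does not prescribe how "already exists" is determined.

```python
import hashlib
from typing import Dict, Iterable, List


def sha256_hex(content: bytes) -> str:
    """Hex digest used here as the (assumed) identity of a file."""
    return hashlib.sha256(content).hexdigest()


def select_new_files(candidates: Dict[str, bytes], known_hashes: Iterable[str]) -> List[str]:
    """Return the names of candidate files whose content hash is not already known.

    candidates maps file name to content; known_hashes stands in for the
    result set the endpoint would send back.
    """
    known = set(known_hashes)
    return [name for name, content in candidates.items() if sha256_hex(content) not in known]
```

Only the names returned by `select_new_files` would then be uploaded, with the same check repeated on the server side.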
All the files are then uploaded (whether zipped or otherwise) to a temporary bucket location via the Shrine gem.
This works. As they enter the temp bucket, use a Lambda with an S3 event to process the files, unzip files, push any metadata needed into DynamoDB and delete the files from the temp bucket. In the temp bucket, I would place the files into a folder that is unique per request and user. I would take the user/client ID and a UUID of some kind and make that your folder name, such as Johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f, or encode this value into a Base64 string and make that your folder name. Store this in DynamoDB with each file uploaded into your permanent bucket, with the Hash Key being the userid/clientid, a Sort Key being the full folder path + file name and an extra attribute of IsProcessed. The IsProcessed attribute will be updated by your Lambda that is processing the files and moving them to their permanent S3 bucket. If there are errors, you can put the error in this field. If processing succeeds, you record the success in this field.
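A minimal sketch of the folder naming and the per-file record might look like this. The attribute names `UserId` and `FilePath` and the initial value `"pending"` are assumptions for illustration; only `IsProcessed` comes from the description above.

```python
import base64
import uuid
from typing import Dict, Optional


def make_folder_name(user_id: str, request_id: Optional[str] = None, encode: bool = False) -> str:
    """Build a folder name that is unique per user and request.

    A fresh UUID is generated when request_id is not supplied.
    """
    request_id = request_id or str(uuid.uuid4())
    name = f"{user_id}+{request_id}"
    if encode:
        # URL-safe Base64 avoids '/' and '+' in the encoded folder name.
        return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")
    return name


def make_file_record(user_id: str, folder: str, file_name: str) -> Dict[str, str]:
    """Shape of one item to store per uploaded file (attribute names are hypothetical)."""
    return {
        "UserId": user_id,                    # hash key
        "FilePath": f"{folder}/{file_name}",  # sort key: full folder path + file name
        "IsProcessed": "pending",             # later overwritten with success or an error
    }
```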
the keys for these files must somehow be returned to the client, to handle validation issues – but then how would the AWS Lambda function be able to return this response to the original client side where the request originated? Or rather, should the AWS Lambda function be generated from the client side, passing in the IDs of the unzipped blobs?
The original API request to push the files to the temp S3 bucket would be able to return the folder name johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f to the client. So let's say you made an HTTP POST to /jobs. You would return 201 Created with an HTTP Location header of /jobs/johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f. Your client can then start polling /jobs/johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f for the status of the process.
Your response back to /jobs/johnathon+3b5339b8-c8db-4d5c-b678-406fcf073f4f can return the DynamoDB records. This would include all DynamoDB records for the HashKey matching the folder name. Your client side can look at all of the objects in the result set and check the IsProcessed attribute to see if everything worked out ok, or if there were issues.
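As a rough sketch, the status endpoint could collapse the per-file records into one job status like this. It assumes each record carries a `FilePath` attribute and that `IsProcessed` holds `"pending"`, `"done"`, or an error message; those names and values are illustrative, not something the description above fixes.

```python
from typing import Dict, List


def summarize_job(records: List[Dict[str, str]]) -> Dict[str, object]:
    """Collapse per-file records into a single job status for the polling client."""
    pending = [r["FilePath"] for r in records if r["IsProcessed"] == "pending"]
    # Anything that is neither pending nor done is treated as an error message.
    failed = [r["FilePath"] for r in records if r["IsProcessed"] not in ("pending", "done")]
    if not records or pending:
        status = "processing"
    elif failed:
        status = "failed"
    else:
        status = "done"
    return {"status": status, "pending": len(pending), "failed": failed}
```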
Then we need to run some validations: we want to handle the situation where there are duplicate files. We will need to check with our Rails backend as to whether those files have already been uploaded.
Handle this with the Lambda that is executed by the temporary bucket. Grab the files from the temp bucket folder, handle your business logic and back-end queries then push them to their final permanent bucket.
After those validation issues are handled, then user submits the form, and all the keys are stored within the appropriate records.
All of this would happen asynchronously, starting when the user submits the form. The client side needs to be able to handle this by making HTTP GET requests to the endpoint mentioned above, checking for the status of the process. This gives you some more flexibility as you can also publish SNS messages on failures as well, such as sending an email to the clients if they upload 3,000 files and you need to spend 30 minutes processing them.
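The client-side polling could be sketched as below. `fetch_status` is a hypothetical callable that performs the HTTP GET against the job endpoint and returns the parsed response; the `"status"` field with the value `"processing"` and the interval and attempt limits are assumptions for illustration.

```python
import time
from typing import Callable, Dict


def poll_until_finished(
    fetch_status: Callable[[], Dict],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status repeatedly until the job is no longer processing.

    Raises TimeoutError if the job is still processing after max_attempts.
    """
    for attempt in range(max_attempts):
        result = fetch_status()
        if result.get("status") != "processing":
            return result
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next GET
    raise TimeoutError("job did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real waiting.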
I have a rails-api project, which provides the API to access my data.
I use carrierwave to store my files; my model is called User and the file attribute is called image.
So the image attribute contains the file_name, url and some other info.
In order to transfer the file through the API, I added the gem carrierwave-base64.
I understand the upload process. The client app encodes the file to base64, then sends it to the backend in a JSON message. For example:
{user: {email: "test#email.com", image: "data:image/jpg;base64,#{base64_image}"}}
So when the backend receives the JSON request, carrierwave parses the base64 code into a file and stores it locally or on S3.
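For reference, the data-URI format used in that JSON message can be taken apart and rebuilt with a few lines. This is only a sketch of the format itself, not how carrierwave-base64 is implemented; the function names are made up for the example.

```python
import base64
from typing import Tuple


def parse_data_uri(data_uri: str) -> Tuple[str, bytes]:
    """Split 'data:<mime>;base64,<payload>' into (mime_type, decoded bytes)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a data:<mime>;base64,<payload> string")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload, validate=True)


def build_data_uri(mime_type: str, content: bytes) -> str:
    """Inverse of parse_data_uri: wrap raw bytes in a base64 data URI."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

`build_data_uri` is the direction a server would use if it chose to embed the image bytes in a JSON response instead of returning a URL.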
What I do not understand, is the Download process:
When I request the user info, I assumed that the image file would be transferred as base64 in a JSON message, and that the client app would then decode the base64 into a file (image) and display it.
But actually, what I can provide in the JSON data is the file URL, not the base64 code.
The reason I want to get the file (image) from the API server is that I don't want the client app to access S3 directly by URL. So every time the client app wants a file, it will request it from the API server, and the API server will fetch the file and transfer it to the client.
Can anyone explain how to do the download?
Or am I following the wrong strategy, and do I need another API endpoint that responds with a file object, rather than sending the file along with the user model?
Cheers.
Restricting Access to Objects Stored on Amazon S3
https://github.com/thoughtbot/paperclip/wiki/Restricting-Access-to-Objects-Stored-on-Amazon-S3
You did a good job with the upload. But for downloading, you need to send a URL rather than base64; that is the traditional approach.
Also, for security purposes, you can set private (rather than public-read) permissions on the S3 objects while uploading and use expiring_url(60, :thumb) for your clients.
This URL expires after the time you have specified (60 seconds in this example).
I'm trying to make an app where I take pictures from users, add them to a canvas, draw on them, then convert them to a base64 string and upload them.
For this purpose I'm considering using a CDN, but I can't find information on what I can upload to one and how client-side uploading works. I'd like to be able to send the image as base64 along with the name to be given to the file, so that when it arrives at the CDN origin, the base64 image is decoded and saved under the specified name (which I will add to the database on the server). Is this possible? Can I have some kind of save.php file on the CDN origin where I write my logic to save the file and to which I'll send XHR requests? Or how does this whole thing work? I know this question may sound trivial, but I've been searching for hours and still haven't found anything that explains in detail how client-side uploading works for CDNs.
CDNs usually do not provide such an upload service for the client side, so you cannot do it this way.
Based on this sample http://aws.amazon.com/articles/0006282245644577, it is clear how to use multipart upload using the AWS iOS SDK. However, it seems that my uploads are not stitched together correctly when I try to resume an interrupted upload.
I use the code below to resume an upload. Is this the correct way to set the upload id of a multipart upload?
S3InitiateMultipartUploadRequest *initiateRequest = [[S3InitiateMultipartUploadRequest alloc] initWithKey:key inBucket:bucketName];
S3InitiateMultipartUploadResponse *initiateResponse = [self.amazonS3Client initiateMultipartUpload:initiateRequest];
self.multipartUpload = [initiateResponse multipartUpload];
// Set upload id to resume upload
self.multipartUpload.uploadId = uploadId;
I'd appreciate any help or pointers.
Your code should be robust enough to handle cases where you need to track which parts were uploaded. The parts of a multipart upload can be uploaded in several ways (in parallel across multiple threads, or one after the other in sequence).
Whichever approach you use, you can call the listParts API to determine which parts were successfully uploaded. Since you already have the upload ID, your design must support continuing from the next part that is still missing.
GET /ObjectName?uploadId=UploadId HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: Date
Authorization: Signature
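Once the part numbers returned by that request are known, working out what is left to upload is simple arithmetic. The sketch below assumes the same fixed part size is used before and after the interruption and that the existing upload ID is reused; the function name and the tuple layout are made up for the example.

```python
import math
from typing import Iterable, List, Tuple


def remaining_parts(
    file_size: int,
    part_size: int,
    uploaded_part_numbers: Iterable[int],
) -> List[Tuple[int, int, int]]:
    """Return (part_number, start_offset, end_offset) for each part not yet uploaded.

    Part numbers start at 1 and end_offset is exclusive.
    """
    total_parts = math.ceil(file_size / part_size)
    done = set(uploaded_part_numbers)
    missing = []
    for number in range(1, total_parts + 1):
        if number in done:
            continue
        start = (number - 1) * part_size
        end = min(start + part_size, file_size)  # the last part may be shorter
        missing.append((number, start, end))
    return missing
```

For a 25-byte file with 10-byte parts where only part 1 was uploaded, this yields parts 2 and 3 with byte ranges 10 to 20 and 20 to 25.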
Another useful resource to help optimize multipart uploads: http://aws.typepad.com/aws/2010/11/amazon-s3-multipart-upload.html
I have a file I want to send to the eBay system to support the LMS.
All the samples I've found use the API, but the environment I'm working in doesn't have the ability to use it (the API).
So I'm forced to send the file with an HTTP POST. But the docs seem lacking.
Has anyone constructed or found an example of an HTTP POST that will send a given file?
EDIT:
Oh... what I see in the samples I have found is an area that seems like it's supposed to hold the data, but in the samples there's nothing I'd consider real data.
Are you talking about the file transfer service or the bulk upload service? Don't you just generate an XML document and post it to the URL, as in this example:
http://developer.ebay.com/DevZone/file-transfer/CallRef/uploadFile.html#Samples