Cannot upload files bigger than 8 GB to Amazon S3 with the multi-part upload Java API due to broken pipe

I implemented S3 multi-part upload in Java, both the high-level and the low-level version, based on the sample code from
http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?HLuploadFileJava.html and http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?llJavaUploadFile.html
When I uploaded files smaller than 4 GB, the uploads completed without any problem. When I uploaded a 13 GB file, the code started throwing IO exceptions (broken pipe). After multiple retries, it still failed.
Here is how to reproduce the scenario. Take the 1.1.7.1 release,
create a new bucket in the US Standard region
create a large EC2 instance as the client to upload the file
create a 13 GB file on the EC2 instance
run the sample code from either the high-level or the low-level API S3 documentation page on the EC2 instance
test with any of the three part sizes: the default part size (5 MB), or a part size of 100,000,000 or 200,000,000 bytes.
So far the problem shows up consistently. I did a tcpdump; it appeared that the HTTP server (S3 side) kept resetting the TCP stream, which caused the client side to throw an IO exception (broken pipe) once the uploaded byte count exceeded 8 GB. Has anyone had similar experiences when uploading large files to S3 using multi-part upload?
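For reference, the low-level upload loop from the linked AWS sample looks roughly like the sketch below (simplified; the bucket name, key, file path, credentials, and part size are placeholders):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.*;

public class LowLevelMultipartUpload {
    public static void main(String[] args) {
        String bucketName = "my-bucket";        // placeholder
        String keyName    = "large-file.bin";   // placeholder
        File file = new File("/data/large-file.bin");
        long partSize = 100000000L;             // one of the part sizes tested

        AmazonS3 s3 = new AmazonS3Client(new BasicAWSCredentials("accessKey", "secretKey"));

        // 1. Initiate the multipart upload and remember the upload id.
        InitiateMultipartUploadResult init =
                s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucketName, keyName));

        List<PartETag> partETags = new ArrayList<PartETag>();
        long contentLength = file.length();
        long filePosition = 0;

        // 2. Upload the file part by part; this is the loop where the
        //    broken-pipe IOExceptions appear once roughly 8 GB have been sent.
        for (int partNumber = 1; filePosition < contentLength; partNumber++) {
            long size = Math.min(partSize, contentLength - filePosition);
            UploadPartRequest uploadRequest = new UploadPartRequest()
                    .withBucketName(bucketName)
                    .withKey(keyName)
                    .withUploadId(init.getUploadId())
                    .withPartNumber(partNumber)
                    .withFileOffset(filePosition)
                    .withFile(file)
                    .withPartSize(size);
            partETags.add(s3.uploadPart(uploadRequest).getPartETag());
            filePosition += size;
        }

        // 3. Complete the upload with the collected part ETags.
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                bucketName, keyName, init.getUploadId(), partETags));
    }
}
```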

Related

AWS IoT file transfer

I am trying to use AWS IoT to communicate with my BeagleBone board. I got MQTT messages transferring from the board to the server. I was wondering whether there is a way to transfer files (text or binary) to the server, and from the server to the BeagleBone, using AWS IoT.
The payload of an MQTT message is just a byte stream, so it can carry just about anything (up to the maximum size of 268,435,456 bytes according to the spec; AWS may have other limits in its implementation).
You will have to implement your own code to publish files and to subscribe and save files. You will also have to implement a payload format that includes any metadata you might need (e.g. file names).
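As a rough illustration only (this is not an AWS-provided API), a sender could split the file into chunks and prepend a small metadata header to each message. The sketch below uses the Eclipse Paho Java client; the chunk size and JSON header format are made up for the example:

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MqttFilePublisher {

    private static final int CHUNK_SIZE = 100 * 1024; // stay well under the 128 KB payload limit

    // Splits a file into chunks and publishes each one with a small JSON header
    // so the subscriber can reassemble the file and recover its name.
    public static void publishFile(MqttClient client, String topic, Path file)
            throws IOException, MqttException {
        long totalSize = Files.size(file);
        int totalChunks = (int) ((totalSize + CHUNK_SIZE - 1) / CHUNK_SIZE);

        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[CHUNK_SIZE];
            int index = 0;
            int read;
            while ((read = in.read(buffer)) > 0) {
                // Illustrative header carrying the metadata needed for reassembly.
                String header = String.format(
                        "{\"name\":\"%s\",\"chunk\":%d,\"of\":%d,\"bytes\":%d}\n",
                        file.getFileName(), index, totalChunks, read);
                byte[] headerBytes = header.getBytes(StandardCharsets.UTF_8);

                byte[] payload = new byte[headerBytes.length + read];
                System.arraycopy(headerBytes, 0, payload, 0, headerBytes.length);
                System.arraycopy(buffer, 0, payload, headerBytes.length, read);

                client.publish(topic, payload, 1, false); // QoS 1 so chunks are not silently lost
                index++;
            }
        }
    }
}
```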
You can transfer a file using MQTT, but you should first divide it into smaller pieces and send them one by one, because the payload is limited to 128 KB. More information about AWS IoT and its limits is available here.
However, I would suggest not using MQTT to transfer files, because messaging also costs money; if the file is big and you send it periodically, it may cost you a lot. You can find AWS IoT Core prices here.
You can upload your file(s) to S3 bucket and then access the file(s) from there.

How to transfer files from iPhone to EC2 instance or EBS?

I am trying to create an iOS app that will transfer files from an iPhone to a server, process them there, and instantly return the result to the app.
I have noticed that AWS offers an SDK to transfer files from an iOS app to S3, but not to EC2 (or at least not to EBS, which can be attached to EC2). I wonder why I have to go through S3 when my business logic doesn't warrant storing the files. I have used file-system tools such as s3cmd and s3fs to connect to S3 from EC2, but they are very slow at transferring files to EC2. I am concerned that the route through S3 will waste time, especially when users expect a result in a split second.
Could you please guide me on how to bypass the S3 route and transfer files in real time from an iOS app to EC2 (or EBS)?
Allowing an app to write directly to an instance's file system is a non-starter, short of treating it as a network drive, which would be pretty convoluted, not to mention the security issues you'll almost certainly have. This really is what S3 is there for. You say you are seeing bad performance between EC2 and S3; that does not sound right at all, since it is an internal data-center connection and should be at least several orders of magnitude faster than a connection from a mobile device to the Amazon data center. Are you sure you created your bucket and instance in the same region? Alternatively, it might be the clients you're using: don't try to set up file-system access, just use the AWS CLI.
If you are really tied to the idea of going direct to the EC2 instance, you will need to do it via some network connection, either by running a web server or perhaps by using some variety of copy over SSH, if that is available on iOS. It does seem pointless to set this up when S3 has already done it for you. Finally, depending on how big the files are, you may be able to get away with SQS or some kind of database store.
It's okay being a newbie! I ran up against exactly the same processing problem and solved it by running a series of load-balanced web servers: the mobile app calls an upload utility, uploads the file, the server processes it, and the result is then deployed to S3 and exposed via a signed URL which the mobile app can display. It is fast, reliable, and secure. The results are cached using CloudFront, so once written they are blazingly fast to re-access on the mobile. Hope this helps.
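For the signed-URL step, here is a minimal sketch using the AWS SDK for Java; the answer above does not specify a server-side language, so this is purely illustrative, and the bucket, key, and expiry are placeholders:

```java
import java.util.Date;

import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class PresignedUrlExample {
    // Generates a time-limited GET URL for a result object so the mobile app
    // can download it directly from S3 (or via CloudFront) without credentials.
    public static String signedDownloadUrl(AmazonS3 s3, String bucket, String key) {
        Date expiration = new Date(System.currentTimeMillis() + 15 * 60 * 1000); // valid 15 minutes
        GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest(bucket, key)
                        .withMethod(HttpMethod.GET)
                        .withExpiration(expiration);
        return s3.generatePresignedUrl(request).toString();
    }
}
```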

Is it dangerous for performance to provide in MVC file download as Stream Forwarding from another Stream source

I want to provide a download link in an Azure MVC web site for files that are stored in Blob storage. I do not want users to see my blob storage URL, and I also want my own download link so I can control the file name that is served.
I think this can be done by passing (forwarding) the stream. I found many similar questions here on SO, e.g. here: Download/Stream file from URL - asp.net.
The problem I see is this: imagine 1000 users start downloading one file simultaneously. This will totally kill my server, since there is a limited number of threads in the pool, right?
I should say that the files I want to forward are about 100 MB, so one request can take about 10 minutes.
Am I right, or can I do this without risk? Would an async method in MVC 5 help? Thanks!
Update: my Azure example is here only to give some background. I am actually interested in the theoretical problem of long-running streaming methods in MVC.
In your situation, Lukas, I'd actually recommend you look at using the local temporary storage area for the blob and serving it up from there. This will result in a delay in delivering the file the first time, but all subsequent requests will be faster (in my experience) and will result in fewer Azure storage transaction calls. It also eliminates the risk of running into throttling on the Azure storage account or blob. Your throughput limits would then be based on the outbound bandwidth of the VM instance and the number of connections it can support. I have a sample of this type of approach at: http://brentdacodemonkey.wordpress.com/2012/08/02/local-file-cache-in-windows-azure/

Mobile Chunked File HTTP Upload

Is it worth the development effort to chunk file (image) uploads from mobile devices over cell connections? By chunking I mean making a series of independent HTTP POST requests, each carrying a different part of the file, and reassembling the file on the server.
Specifically, I've heard that mobile network connections may drop before a file is uploaded completely (which doesn't happen over Wi-Fi). My hypothesis is that chunking the upload makes a failure due to a dropped connection less likely. My concern is that in practice it isn't more reliable, and now I've introduced latency.
If the receiving end of the upload allows you to resume at a particular chunk, it could be very useful. This would let you minimize the impact of network errors by not having to restart the entire upload when an error is encountered.
For example, if you were using S3 as the backend, you could leverage their Multipart Upload, as sketched below.
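To illustrate the "resume at a particular chunk" idea with S3 Multipart Upload and the AWS SDK for Java: the client can ask S3 which parts of an in-progress upload it already has and re-send only the missing ones. A minimal sketch (bucket, key, and upload id are placeholders; pagination of the part listing is ignored for brevity):

```java
import java.util.HashSet;
import java.util.Set;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListPartsRequest;
import com.amazonaws.services.s3.model.PartListing;
import com.amazonaws.services.s3.model.PartSummary;

public class ResumeUploadSketch {
    // Returns the part numbers S3 has already stored for an in-progress upload,
    // so the client only re-sends the parts that are still missing.
    public static Set<Integer> alreadyUploadedParts(AmazonS3 s3, String bucket,
                                                    String key, String uploadId) {
        Set<Integer> done = new HashSet<Integer>();
        PartListing listing = s3.listParts(new ListPartsRequest(bucket, key, uploadId));
        for (PartSummary part : listing.getParts()) {
            done.add(part.getPartNumber());
        }
        return done;
    }
}
```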

Using EC2 to resize images stored on S3 on demand

We need to serve the same image in a number of possible sizes in our app. The library consists of tens of thousands of images that will be stored on S3, so storing every image in all of its possible sizes does not seem ideal. I have seen a few mentions on Google that EC2 could be used to resize S3 images on the fly, but I am struggling to find more information. Could anyone please point me in the direction of some more info or, ideally, some code samples?
Tip
It was not obvious to us at first, but never serve images to an app or website directly from S3; it is highly recommended to use CloudFront instead. There are three reasons:
Cost - CloudFront is cheaper
Performance - CloudFront is faster
Reliability - S3 will occasionally fail to serve a resource when it is queried frequently, i.e. more than 10-20 times a second. This took us ages to debug, as resources would randomly not be available.
The above are not necessarily failings of S3, as it is meant to be a storage service, not a content delivery service.
Why not store all the image sizes, assuming you aren't talking about hundreds of different possible sizes? Storage cost is minimal. You would also then be able to serve your images through CloudFront (or directly from S3), so you don't have to use your application server to resize images on the fly. If you serve a lot of these images, the processing cost you save (CPU cycles, memory requirements, etc.) by not having to dynamically resize images and handle image requests in your web server would likely easily offset the storage cost.
What you need is an image server. Yes, it can be hosted on EC2. These links should help you get started: https://github.com/adamdbradley/foresight.js/wiki/Server-Resizing-Images
http://en.wikipedia.org/wiki/Image_server
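If you do go the resize-on-EC2 route, the core operation is just: fetch the original from S3, scale it, and write the resized copy back (or stream it to the client). A hedged sketch in Java, not taken from either link above; the bucket, key naming scheme, and JPEG output format are assumptions:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import javax.imageio.ImageIO;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.S3Object;

public class S3ImageResizer {
    // Downloads an image from S3, scales it to the requested width while
    // preserving the aspect ratio, and stores the copy under a size-specific key.
    public static void resize(AmazonS3 s3, String bucket, String key, int targetWidth)
            throws IOException {
        S3Object original = s3.getObject(bucket, key);
        BufferedImage source = ImageIO.read(original.getObjectContent());

        int targetHeight = source.getHeight() * targetWidth / source.getWidth();
        BufferedImage scaled = new BufferedImage(targetWidth, targetHeight,
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(source, 0, 0, targetWidth, targetHeight, null);
        g.dispose();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(scaled, "jpg", out);
        byte[] bytes = out.toByteArray();

        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(bytes.length);
        metadata.setContentType("image/jpeg");
        s3.putObject(bucket, key + "_w" + targetWidth + ".jpg",
                new ByteArrayInputStream(bytes), metadata);
    }
}
```

Putting this behind CloudFront, as recommended above, means each size is only generated once and then served from the cache.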
