Using EC2 to resize images stored on S3 on demand

We need to serve the same image in a number of possible sizes in our app. The library consists of tens of thousands of images which will be stored on S3, so storing the same image in all its possible sizes does not seem ideal. I have seen a few mentions on Google that EC2 could be used to resize S3 images on the fly, but I am struggling to find more information. Could anyone please point me in the direction of some more info or, ideally, some code samples?
Tip
It was not obvious to us at first, but never serve images to an app or website directly from S3; it is highly recommended to use CloudFront. There are three reasons:
Cost - CloudFront is cheaper
Performance - CloudFront is faster
Reliability - S3 will occasionally fail to serve a resource when it is queried frequently, i.e. more than 10-20 times a second. This took us ages to debug, as resources would randomly become unavailable.
The above are not necessarily failings of S3, as it is meant to be a storage service rather than a content delivery service.

Why not store all the image sizes, assuming you aren't talking about hundreds of different possible sizes? The storage cost is minimal. You would also then be able to serve your images through CloudFront (or directly from S3), so you don't have to use your application server to resize images on the fly. If you serve a lot of these images, the processing cost you save (CPU cycles, memory, etc.) by not having to dynamically resize images and handle image requests in your web server would likely easily offset the storage cost.
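If you do pre-generate the sizes, it is a one-off batch step at upload time. A minimal sketch of that approach in Node.js/TypeScript, assuming the sharp library and the AWS SDK v3 (the bucket layout and target widths here are made up):

```typescript
import sharp from "sharp";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });
const WIDTHS = [100, 300, 800]; // hypothetical target sizes

// Resize the original once per target width and store each copy under a size prefix,
// e.g. 300/cat.jpg, so CloudFront/S3 can serve it without any runtime processing.
async function storeAllSizes(bucket: string, key: string, original: Buffer): Promise<void> {
  for (const width of WIDTHS) {
    const body = await sharp(original).resize({ width }).jpeg().toBuffer();
    await s3.send(new PutObjectCommand({
      Bucket: bucket,
      Key: `${width}/${key}`,
      Body: body,
      ContentType: "image/jpeg",
    }));
  }
}
```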

What you need is an image server. Yes, it can be hosted on EC2. These links should help you get started: https://github.com/adamdbradley/foresight.js/wiki/Server-Resizing-Images
http://en.wikipedia.org/wiki/Image_server
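If you do resize on the fly instead, the "image server" is essentially a small HTTP service sitting between S3 and your CDN. A rough sketch, assuming Express, sharp, and the AWS SDK v3 (the bucket name and query parameter are hypothetical); in practice you would put CloudFront in front so each size is only computed once:

```typescript
import express from "express";
import sharp from "sharp";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });
const app = express();

// GET /img/cat.jpg?w=300 -> fetch the original from S3, resize it, return a JPEG.
app.get("/img/:key", async (req, res) => {
  const width = Number(req.query.w) || 300;
  try {
    const obj = await s3.send(
      new GetObjectCommand({ Bucket: "my-image-library", Key: req.params.key })
    );
    const original = Buffer.from(await obj.Body!.transformToByteArray());
    const resized = await sharp(original).resize({ width }).jpeg().toBuffer();
    res.set("Cache-Control", "public, max-age=31536000").type("image/jpeg").send(resized);
  } catch {
    res.sendStatus(404);
  }
});

app.listen(3000);
```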

Related

Serverless machine learning: where should one store their models?

I'm deploying a serverless NLP app, made using BERT. I'm currently using the Serverless Framework and AWS ECR to overcome AWS Lambda's 250 MB deployment package limit (PyTorch alone already occupies more than that).
I'm quite happy with this solution as it allows me to simply dockerize my app, upload it to ECR and worry about nothing else.
One doubt I have is where should I store the models. My app uses 3 different saved models, each with a size of 422 MB. I have two options:
Copy my models into the Docker image itself.
Pros: If I retrain a model, it is automatically updated when I redeploy the app, and I don't have to use the AWS SDK to load objects from S3.
Cons: The Docker image becomes very large.
Store my models in S3:
Pros: The image is much smaller than in the other solution (1+ GB vs 3+ GB).
Cons: If I retrain my models I then need to update them on S3 manually, as they are decoupled from the app deployment pipeline. I also need to load them from S3 using the AWS SDK (probably adding some overhead?).
So my question ultimately is: of the two solutions, which is the best practice? Why or why not? Is there even a best practice at all, or does it come down to preference / need?
There is a third option that might be great for you: store your models on an EFS volume.
EFS volumes are like additional hard drives that you can attach to your Lambda. They can be pretty much as big as you want.
After you train your model, just copy it to your EFS volume. You configure your Lambda to mount that EFS volume when it boots and voilà, your model is available without any fuss. No copying from S3 or baking it into a Docker image. And the same EFS volume can be mounted by more than one Lambda at the same time.
To learn more read:
Announcement blog post
Documentation
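The question uses the Serverless Framework, so purely as an illustration of the same wiring, here is roughly what the mount looks like in AWS CDK (TypeScript), with all resource names made up. The key pieces are a VPC, an EFS access point, and the Lambda's filesystem property:

```typescript
import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as efs from "aws-cdk-lib/aws-efs";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new cdk.App();
const stack = new cdk.Stack(app, "NlpStack");

// EFS (and any Lambda that mounts it) has to live inside a VPC.
const vpc = new ec2.Vpc(stack, "Vpc");
const fileSystem = new efs.FileSystem(stack, "ModelFs", { vpc });

// An access point scopes the function to one directory on the volume.
const models = fileSystem.addAccessPoint("ModelsAp", {
  path: "/models",
  createAcl: { ownerUid: "1001", ownerGid: "1001", permissions: "750" },
  posixUser: { uid: "1001", gid: "1001" },
});

new lambda.DockerImageFunction(stack, "Classifier", {
  code: lambda.DockerImageCode.fromImageAsset("./app"), // the existing container image
  vpc,
  // Files copied to /models on EFS appear under /mnt/models inside the function.
  filesystem: lambda.FileSystem.fromEfsAccessPoint(models, "/mnt/models"),
});
```

At runtime the handler just opens /mnt/models/<file> like any local path, so no SDK calls are needed to fetch the model.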
Update 25.08.2021
User #wtfzamba tried this solution and came across a limitation that might be of interest to others:
I did indeed try the solution you suggested. It works well, but only to a point, and I'm referring to performance. In my situation, I need to be able to spin up ~100 Lambdas concurrently when I do batch classification, to speed up the process. The problem is that the EFS throughput cap is not per connection, but in total. So the 300 MB/s of burst throughput that I was allowed seemed to be shared across all the Lambda instances, which at that point timed out even before being able to load the models into memory.
Keep this in mind when you choose this option.

How to transfer files from iPhone to EC2 instance or EBS?

I am trying to create an iOS app, which will transfer the files from an iPhone to a server, process them there, and return the result to the app instantly.
I have noticed that AWS offers an SDK to transfer files from an iOS app to S3, but not to EC2 (or at least to EBS, which can be attached to EC2). I wonder why I have to go through S3 when my business logic doesn't warrant storing the files. I have used tools such as s3cmd and s3fs to connect to S3 from EC2, but they are very slow at transferring files to EC2. I am concerned that the route through S3 will add too much latency, especially when users expect a result in a split second.
Could you please guide me on how I can bypass the S3 route and transfer files in real time from the iOS app to EC2 (or EBS)?
Allowing an app to write directly to an instance's file system is a non-starter, short of treating it as a network drive, which would be pretty convoluted, not to mention the security issues you'll almost certainly have. This really is what S3 is there for. You say you are seeing bad performance between EC2 and S3; that does not sound right at all, as this is an internal data-center connection that should be far faster than a connection from a mobile device to the Amazon data center. Are you sure you created your bucket and instance in the same region? Alternatively, it might be the clients you're using; don't try to set up file-system access, just use the AWS CLI.
If you are really tied to the idea of going direct to the EC2 instance, you will need to do it via some network connection, either by running a web server or perhaps by using some variety of copy over SSH, if that is available on iOS. It does seem pointless to set this up when S3 has already done it for you. Finally, depending on how big the files are, you may be able to get away with SQS or some kind of database store.
It's okay being a newbie! I ran up against exactly the same processing problem and solved it by running a series of load-balanced web servers: the mobile calls an upload utility, uploads the file, the server processes it, and then deploys the result to S3 using a signed URL which the mobile can display. It is fast, reliable and secure. The results are cached using CloudFront, so once written they are blazing fast to re-access on the mobile. Hope this helps.
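The signed-URL part of that flow is simple to generate server-side. A minimal sketch with the AWS SDK for JavaScript v3 (the bucket and key are made up, and the expiry is arbitrary):

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

// After processing an upload, hand the app a short-lived link to the result on S3.
async function resultUrl(key: string): Promise<string> {
  const command = new GetObjectCommand({ Bucket: "processed-results", Key: key });
  return getSignedUrl(s3, command, { expiresIn: 300 }); // valid for 5 minutes
}
```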

ImageResizer subfolders how to choose

We are using the ImageResizer plugin with the DiskCache plugin for caching: http://imageresizing.net/plugins/diskcache.
Our websites use millions of images. The total size of all the images is close to 2 TB.
So to maintain disk caching, how many subfolders do we have to specify?
As per the documentation I have read here: http://imageresizing.net/plugins/diskcache
given a desired cache size of 100,000 items, this should be set to 256.
But in our case, the cache size will be much larger.
Can we use this plugin ?
If yes, how many subfolders do we have to use?
Thanks in advance.
Set the subfolders count to ((max images)/400), then round up to the nearest power of two. 16384 subfolders will support around 6 million images. You're going to have a hard time scaling NTFS past 4 million or so files; the Windows filesystem gets bogged down.
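As a quick sanity check of that formula (plain TypeScript; the helper name is made up):

```typescript
// subfolders = ceil(maxImages / 400), rounded up to the next power of two
function diskCacheSubfolders(maxImages: number): number {
  const needed = Math.ceil(maxImages / 400);
  return 2 ** Math.ceil(Math.log2(needed));
}

diskCacheSubfolders(6_000_000); // => 16384
diskCacheSubfolders(100_000);   // => 256, matching the documentation's example
```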
If you can, prefer proxying and caching the images instead.
A caching proxy will be faster (no ASP.NET server-side code runs per request) than creating, filling and managing a large folder of rarely accessed files.
IIS (as a reverse proxy), Varnish, NGINX, or any other caching proxy can maintain the cache, and it will work better than ImageResizer / ASP.NET MVC.

Cropping and resizing images on the fly with node.js

I run a Node.js server on Amazon EC2. I am getting a huge CSV file containing links to product images on a remote host. I want to crop the images and store them in different sizes on Amazon S3.
How could this be done, preferably just with streams, without saving anything to disk?
I don't think you can get around saving the full-size image to disk temporarily, since resizing/cropping/etc would normally require having the full image file. So, I say ImageMagick.
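If you accept that the full image has to be buffered somewhere (in memory rather than on disk), a rough stream-oriented sketch is possible with the gm wrapper around ImageMagick and the AWS SDK v3 multipart Upload helper. The CSV parsing is omitted and all names here are made up:

```typescript
import https from "node:https";
import gm from "gm";
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

const im = gm.subClass({ imageMagick: true }); // use the ImageMagick backend
const s3 = new S3Client({ region: "us-east-1" });

// Hypothetical helper: fetch a remote image, resize it, and stream the result to S3.
// (.crop() chains the same way as .resize() if you also need cropping.)
function resizeToS3(sourceUrl: string, bucket: string, key: string, width: number): Promise<void> {
  return new Promise((resolve, reject) => {
    https.get(sourceUrl, (res) => {
      im(res)                                    // ImageMagick reads the HTTP response stream
        .resize(width)                           // keep aspect ratio at the given width
        .stream("jpg", async (err, stdout) => {  // converted image comes back as a stream
          if (err) return reject(err);
          try {
            await new Upload({
              client: s3,
              params: { Bucket: bucket, Key: key, Body: stdout, ContentType: "image/jpeg" },
            }).done();
            resolve();
          } catch (e) {
            reject(e);
          }
        });
    }).on("error", reject);
  });
}
```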

Rails gallery app hosting

I'm building a Rails gallery app for a company that doesn't want to host it themselves. They estimate that it could potentially get 1,000 or so unique visits a day. What I'm pondering is... should the images be on a static file server such as S3 or Rackspace Cloud Files, or should they just be left on the same server as the app? There is plenty of room on the app server for them. But will caching be a big problem?
What are your thoughts?
Also, I haven't decided on a host... though I was leaning toward Webbynode... but should I be looking at something else instead?
(They want the hosting to be less than $35/month.)
Thanks.
The main variable you need to consider is latency. Since your traffic is relatively low, you can self-host for $0 extra, or host on S3 and pay for bandwidth. The benefit of S3 is better latency for users across disparate locations.
If it were me, I'd self-host to keep the complexity of the app low, then only move to S3 (behind a CDN such as CloudFront) if you really need the optimization.
