I run a Node.js server on Amazon EC2. I receive a huge CSV file whose rows contain links to product images on a remote host. I want to crop the images and store them in several sizes on Amazon S3.
How could this be done, preferably with streams only, without saving anything to disk?
I don't think you can get around saving the full-size image to disk temporarily, since resizing/cropping/etc would normally require having the full image file. So, I say ImageMagick.
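Whichever tool ends up doing the pixel work, the crop region for each target size can be computed up front from the image dimensions alone. The following is a minimal sketch in Python (the question uses Node.js, but the arithmetic carries over unchanged). It assumes a centered crop is wanted; the function name `center_crop_box` is hypothetical and not part of any library.

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Return (left, top, width, height) of the largest centered region
    of the source that has the same aspect ratio as the target size."""
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * dst_w // dst_h
    else:
        # Source is taller (or same ratio): keep full width, trim top/bottom.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, crop_w, crop_h
```

The returned tuple maps onto ImageMagick's crop geometry form WxH+X+Y, after which the cropped region can be resized to the target dimensions.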
Related
I want to upload a large number of images for processing in a Docker instance.
As far as I have observed, this is normally done with a download script (the images are downloaded to the instance).
I have several terabytes of images, so I do not want to download them each time. Is there a way to make my images available at a specific location in the Docker instance?
What is the standard way of doing this?
I am building a Python cloud video pipeline that will read video from a bucket, perform some computer vision analysis and write frames back to a bucket. As far as I can tell, there is no Beam read method, similar to TextIO.read(), for passing GCS paths to OpenCV. My options going forward seem to be: download the files locally (they are large), use GCS FUSE to mount the bucket on a local worker (is that possible?), or write a custom source. Does anyone have experience with which makes the most sense?
My main confusion was this question here
Can google cloud dataflow (apache beam) use ffmpeg to process video or image data
How would ffmpeg have access to the path? It's not just a question of uploading the binary? There needs to be a Beam method to pass the item, correct?
I think that you will need to download the files first and then pass them through.
However, instead of saving the files locally, is it possible to pass the bytes straight through to OpenCV? Does it accept any sort of byte stream or input stream?
You could have one ParDo that downloads the files using the GCS API and then passes them to OpenCV through a stream, a ByteChannel, a stdin pipe, etc.
If that is not available, you will need to save the files to local disk and then pass OpenCV the filename. This could be tricky because you may end up using too much disk space, so make sure to delete each file from local disk as soon as OpenCV has finished processing it.
I'm not sure, but depending on the size of your files you may also need to select a particular VM machine type to ensure you have enough disk space.
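The "save locally, process, then delete" fallback described above can be sketched with the Python standard library alone. This is an illustration under assumptions, not Beam or GCS API code: `local_copy` is a hypothetical helper, and `chunks` stands for whatever iterable of byte blocks your download call yields.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def local_copy(chunks, suffix=""):
    """Write an iterable of byte chunks to a temporary file, yield its
    path, and delete the file on exit, even if processing raises."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            # Write chunk by chunk so a large file is never held in memory.
            for chunk in chunks:
                f.write(chunk)
        yield path
    finally:
        # Remove the file so worker disk usage does not grow per element.
        if os.path.exists(path):
            os.remove(path)
```

Inside a processing step this would be used as `with local_copy(download_chunks, suffix=".mp4") as path:` followed by the call that hands `path` to OpenCV; the file is gone once the block exits, whether or not processing succeeded.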
Can somebody explain to me why I should possibly use ImageMagick (or the fork GraphicsMagick) in a CMS instead of simply displaying and sizing my images via CSS?
The browser methods cannot fail and are updated automatically on the client side, whereas the ImageMagick binary can fail on the server and I have to maintain it by hand so that it does not become obsolete.
ImageMagick offers lots of image manipulations not available via normal HTML/CSS operations. So for some tasks, your browser just can't do the modifications.
One very important task is simply reducing file size: if a user uploads a 20 MB image, you don't want to deliver that to clients on a 3G mobile connection and let them scale the image down. You want your server to do that task once and then serve images that are substantially smaller.
I am currently using Amazon S3, and I am able to upload images to it from an iPhone.
Is it possible to manipulate (cropping, transformations, effects, face detection) the images that I get from the Amazon server?
There are no services in Amazon Web Services that provide image manipulation.
If you wish to manipulate images, you will need to write your own code, either on a server (e.g. a web server using graphics libraries) or within your iPhone app.
We need to serve the same image in a number of possible sizes in our app. The library consists of tens of thousands of images which will be stored on S3, so storing the same image in all its possible sizes does not seem ideal. I have seen a few mentions on Google that EC2 could be used to resize S3 images on the fly, but I am struggling to find more information. Could anyone please point me in the direction of some more info or, ideally, some code samples?
Tip
It was not obvious to us at first, but you should never serve images to an app or website directly from S3; using CloudFront is highly recommended. There are three reasons:
Cost - CloudFront is cheaper
Performance - CloudFront is faster
Reliability - S3 will occasionally not serve a resource when it is queried frequently, i.e. more than 10-20 times a second. This took us ages to debug, as resources would randomly be unavailable.
The above are not necessarily failings of S3, as it is meant to be a storage service and not a content delivery service.
Why not store all image sizes, assuming you aren't talking about hundreds of different possible sizes? Storage cost is minimal. You would then also be able to serve your images through CloudFront (or directly from S3), so your application server does not have to resize images on the fly. If you serve a lot of these images, the processing cost you save (CPU cycles, memory, etc.) by not dynamically resizing images and handling image requests in your web server would likely offset the storage cost easily.
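If every size is generated once and stored, each variant needs a predictable object key so the app can request it without a lookup. Here is a minimal sketch of one possible naming scheme; the preset names, the dimensions and the function `variant_keys` are all assumptions made for illustration, not anything S3 or CloudFront prescribes.

```python
import posixpath

# Hypothetical size presets; names and dimensions are assumptions.
SIZES = {"thumb": (100, 100), "medium": (640, 480), "large": (1600, 1200)}


def variant_keys(original_key, sizes=SIZES):
    """Map each size name to an object key derived from the original key,
    e.g. 'products/42.jpg' -> 'products/42_thumb.jpg'."""
    directory, filename = posixpath.split(original_key)
    stem, ext = posixpath.splitext(filename)
    return {
        name: posixpath.join(directory, f"{stem}_{name}{ext}")
        for name in sizes
    }
```

With this scheme the client builds the URL for the size it needs from the original key and the preset name, and the application server is not involved in serving the image at all.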
What you need is an image server. Yes, it can be hosted on EC2. These links should help you get started: https://github.com/adamdbradley/foresight.js/wiki/Server-Resizing-Images
http://en.wikipedia.org/wiki/Image_server