We have 20 different websites hosted on two servers on AWS. All of the websites use a web application called DesignMaker (an MVC application using ImageMagick for image composition and alterations) to do heavy image processing on users' images. Users can upload images to that application and start designing with them. You can assume that all of the image processing is already optimized in the code.
My concern here is to take the load of heavy image processing off the CPUs of the main servers and put it on another server. So the first thing that comes to my mind is to separate that application and convert it into a web service that runs on other servers. That way we put the load of image processing on other machines. Please tell me if I have missed something.
Is calling an API to do some image processing a good approach?
What are other alternatives?
You're right to move image processing off your web thread; doing heavy work there is just bad practice.
If this were me (and I have done this in a few projects I've worked on), I would upload the image from the MVC app to an AWS S3 bucket, then fire off a message using SQS or some other queuing platform. Then have an Elastic Beanstalk worker instance listening for messages on the queue; when it gets a message, it picks up the image from S3 and processes it however you want.
(I'm an Azure guy so forgive me if I've picked the wrong AWS services, but the pattern is the same)
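For what it's worth, here is a rough sketch of that flow using the Ruby aws-sdk gems (the bucket name, queue URL, and region are made up, and your MVC app would use the equivalent SDK for its own platform):

    require "aws-sdk-s3"
    require "aws-sdk-sqs"
    require "json"
    require "securerandom"

    # Web side: store the upload in S3 and enqueue a small job message.
    s3  = Aws::S3::Resource.new(region: "us-east-1")
    sqs = Aws::SQS::Client.new(region: "us-east-1")

    key = "uploads/#{SecureRandom.uuid}.jpg"
    s3.bucket("designmaker-uploads").object(key).upload_file("/tmp/user_upload.jpg")

    sqs.send_message(
      queue_url:    "https://sqs.us-east-1.amazonaws.com/123456789012/image-jobs",
      message_body: { key: key, operation: "compose" }.to_json
    )

    # Worker side (on the processing instance): poll the queue and process each image.
    poller = Aws::SQS::QueuePoller.new("https://sqs.us-east-1.amazonaws.com/123456789012/image-jobs")
    poller.poll do |message|
      job        = JSON.parse(message.body)
      object_key = job["key"]
      s3.bucket("designmaker-uploads").object(object_key).download_file("/tmp/#{File.basename(object_key)}")
      # ... run the ImageMagick work here, then upload the result back to S3 ...
    end

The web request returns as soon as the message is queued, so the main servers never carry the CPU cost of the processing itself.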
Related
I am building a web app with ReactJS for my front-end and a Rails back-end API. I have to display only 4 images in total in the whole app; these 4 images are picked from a group of approximately 50 images, and that group isn't going to grow much (a maximum of 10 more images per year). The 4 images are supposed to change every 3 to 7 days.
So I was wondering, in terms of productivity, performance and price, which of the following is the best way to handle my images:
Create a local, static img folder in my React front-end with all the images, and import them in my components.
Use an image upload/storage service (e.g. Cloudinary, Imgx, AWS S3...) with my Rails back-end to serve my images.
Or maybe there is an even better solution than these two?
Given the nature of the software you're describing, I'd suggest you go with the local, static img folder in your React front-end app.
The main reasons for this are:
You've mentioned that it won't grow by more than 10 images per year, so it's easy to handle updates manually whenever you need to.
You won't depend on a third party for storage (unlike AWS S3 or any other provider, which would be an unnecessary dependency here).
The images will work independently of the backend API server, so even if there is some kind of failure in the backend, the platform stays more robust because it doesn't rely on the backend server to show these images.
This will also reduce the bandwidth used between server and client: every image request is served from the client-side bundle, which the browser caches automatically along with the JS and CSS files, so it's optimized for better scaling out of the box.
I have a simple setup going for an API I'm building in Rails.
A zip is uploaded via a POST, and I take the file, store it in Rails.root/tmp using CarrierWave, and then background an S3 upload with Sidekiq.
The reason I store the file temporarily is that I can't send a complex object to Sidekiq, so I store it and send the id, let Sidekiq find it and do the work, then delete the file once it's done.
The problem is that once it's time for my Sidekiq worker to find the file by its path, it can't, because it doesn't exist. I've read that Heroku's ephemeral filesystem deletes its files when things are reconfigured, servers are restarted, etc.
None of these things are happening, however, and the file doesn't exist. So my theory is that the Sidekiq worker is actually trying to open the path that gets passed to it on its own filesystem, since it's a separate worker, and that file doesn't exist there. Can someone confirm this? If that's the case, are there any alternate ways to do this?
If your worker is executed on another dyno than your web process, you are experiencing this issue because of dyno isolation. Read more about it here: https://devcenter.heroku.com/articles/dynos#isolation-and-security
Although it is possible to run Sidekiq workers and the web process on the same machine (maybe not on Heroku, I am not sure about that), it is not advisable to design your system architecture like that.
If your application grows or experiences temporarily high load, you may want to spread the load across multiple servers, and usually also run your workers on separate servers from your web process, so that busy workers don't block the web process.
In all those cases you can never share data on the local filesystem between the web process and the worker.
I would recommend considering uploading the file directly to S3 using https://github.com/waynehoover/s3_direct_upload
This also takes a lot of load off your web server.
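If you keep routing uploads through the web dyno for now, the key change is to hand Sidekiq something durable (an S3 key or a record id) rather than a local tmp path. A minimal sketch, assuming the aws-sdk-s3 gem and a made-up bucket name:

    require "aws-sdk-s3"
    require "sidekiq"
    require "securerandom"

    BUCKET = "myapp-uploads" # hypothetical bucket; region comes from your AWS credentials config

    # In the controller: push the uploaded zip to S3, then enqueue only the key.
    class UploadsController < ApplicationController
      def create
        key = "zips/#{SecureRandom.uuid}.zip"
        Aws::S3::Resource.new.bucket(BUCKET).object(key).upload_file(params[:file].tempfile.path)
        ZipWorker.perform_async(key)
        head :accepted
      end
    end

    # The worker (very likely on another dyno) downloads from S3 instead of reading local tmp.
    class ZipWorker
      include Sidekiq::Worker

      def perform(key)
        object = Aws::S3::Resource.new.bucket(BUCKET).object(key)
        path   = "/tmp/#{File.basename(key)}"
        object.download_file(path)
        # ... unpack and process the zip here ...
        object.delete
      ensure
        File.delete(path) if path && File.exist?(path)
      end
    end

The tmp file the worker writes is only ever read on that same dyno, within that same job, so the ephemeral filesystem is no longer a problem.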
I am currently using an AWS micro instance as a web server for a website that allows users to upload photos. Two questions:
1) When looking at my CloudWatch metrics, I have recently noticed CPU spikes. The website receives very little traffic at the moment, but it becomes utterly unusable during these spikes. They can last several hours, and restarting the server does not eliminate them.
2) Although seemingly unrelated, whenever I post a link to my website on Twitter, the server crashes (i.e., "Error Establishing a Database Connection"). Once I restart Apache and MySQL, the website returns to normal functionality.
My only guess is that the issue is somehow the result of the limitations of the micro instance. Unfortunately, when I upgraded to a small instance, the site was actually slower, due to the fact that micro instances can burst to two EC2 compute units.
Any suggestions?
If you want to stay in the free tier of AWS (micro instance), you should offload as much as possible from your EC2 instance.
I would suggest uploading the images directly to S3 instead of going through your web server (see an example of this here: http://aws.amazon.com/articles/1434).
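One common way to do that is to hand the browser a short-lived presigned URL so the file bytes go straight to S3 and never touch your instance. A quick sketch with the Ruby aws-sdk-s3 gem (bucket and key names are made up, and whatever language your site uses has an equivalent SDK call):

    require "aws-sdk-s3"
    require "securerandom"

    # Generate a presigned PUT URL; the browser uploads the photo directly to S3 with it,
    # so the micro instance never handles the file itself.
    object = Aws::S3::Resource.new(region: "us-east-1")
                              .bucket("my-photo-uploads")
                              .object("photos/#{SecureRandom.uuid}.jpg")
    upload_url = object.presigned_url(:put, expires_in: 900) # valid for 15 minutes
    # Hand upload_url to the client; it performs an HTTP PUT of the file body to that URL.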
S3 can also serve most of your static content (images, JS, CSS...) instead of your weak web server. You can also use these S3 files as the origin of an Amazon CloudFront (CDN) distribution to improve your application's performance.
Another service that can help you offload work is SQS (Simple Queue Service). Instead of handling everything in the online requests from users, you can send some requests (an "upload done" event, for example) as a message to SQS and have your reader process these messages at its own pace. This is a good way to handle momentary load caused by several users working with your service simultaneously.
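For example, with the Ruby SDK (the queue URL is made up), the web tier just sends a small message and a separate reader works through the backlog at whatever pace the hardware can handle:

    require "aws-sdk-sqs"

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/upload-events" # hypothetical queue

    # Web tier: record the event and return to the user immediately.
    Aws::SQS::Client.new(region: "us-east-1")
      .send_message(queue_url: QUEUE_URL, message_body: "upload done: photo 42")

    # Reader: a separate process that drains the queue at its own pace.
    Aws::SQS::QueuePoller.new(QUEUE_URL).poll do |message|
      puts "processing #{message.body}"
      # ... resize the photo, write thumbnails, update the database, etc. ...
    end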
Another service is DynamoDB (a managed NoSQL DB service). You can move most of your current MySQL data and queries onto DynamoDB. Amazon DynamoDB also has a free tier that you can enjoy.
With the combination of the above, you can have your micro instance handling the few remaining dynamic pages until you need to scale your service with your growing success.
Wait… I'm sorry. Did you say you were running both Apache and MySQL Server on a micro instance?
First of all, that's never a good idea. Secondly, as documented, micros have low I/O and can only burst to 2 ECUs.
If you want to continue using a resource-constrained micro instance, you need to (a) put MySQL somewhere else, and (b) use something like Nginx instead of Apache as it requires far fewer resources to run. Otherwise, you should seriously consider sizing up to something larger.
I had the same issue: as far as I understand it, the problem is that AWS throttles micro instances once you exceed a predefined amount of usage. They allow a small burst, but after that things become horribly slow.
You can test that by logging in and doing something. If you use the CPU for a couple of seconds then the whole box will become extremely slow. After that you'll have to wait without doing anything at all to get things back to "normal".
That was the main reason I went for VPS instead of AWS.
I am using PhantomJS to dynamically generate 10 large images of websites in each request. It is therefore important that I cache these images and check whether they are already cached so I can serve them directly. I've never cached images before, so I have no idea how to do this.
Some other information:
PhantomJS writes images to your local filesystem at the path you specify.
I want to cache these images but also need to balance that with updating the cache if the websites have updated.
I will be running these image generation processes in parallel.
I'm thinking of using Amazon's Elastic MapReduce to take advantage of Hadoop and to help with the load. I've never used it before, so any advice here would be appreciated.
I am pretty much a complete noob with this, so in-depth explanations with examples would be really helpful.
What's your front-end web server? Since PhantomJS can write images to your local filesystem at any path you specify, you should specify the document root of your web server so you're serving them statically. This way Rails doesn't even have to be involved.
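A minimal sketch of that idea from the Rails side, assuming PhantomJS's bundled rasterize.js example script and a screenshots/ directory under public/ (both of those names are assumptions), with a simple freshness check to handle the cache-invalidation part:

    require "digest"
    require "fileutils"

    # Render the screenshot into the web server's document root (Rails' public/),
    # and only re-render when the cached copy is older than max_age seconds.
    def cached_screenshot(url, max_age: 86_400)
      filename = "#{Digest::MD5.hexdigest(url)}.png"
      path     = Rails.root.join("public", "screenshots", filename)

      stale = !File.exist?(path) || (Time.now - File.mtime(path)) > max_age
      if stale
        FileUtils.mkdir_p(File.dirname(path))
        # rasterize.js ships with the PhantomJS examples; swap in your own render script.
        system("phantomjs", "rasterize.js", url, path.to_s)
      end

      "/screenshots/#{filename}" # a static URL the web server serves without touching Rails
    end

Since you run these generators in parallel, you may also want a per-URL lock (or to render to a temp file and rename it into place) so two processes don't write the same screenshot at once.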
I'm developing a Ruby on Rails application that needs to allow the user to simultaneously upload 16 high-quality images at once.
This often means somewhere around 10-20 megabytes (sometimes more), but it's the number of connections that is becoming the most pertinent issue.
The images are sent to Amazon S3 by Paperclip, which unfortunately opens and closes a new connection for each of the 16 files. Needless to say, I need to move this work into background processes to keep my web server from locking up, which it already does even with no traffic.
My question is: out of all the Rails-based systems for background jobs (Starling, BackgroundRb, Spawn, etc.), is there one that fits the bill for this scenario better than the others? (I'm new to building background systems anyway, so all of the available options are equally new to me.)
There's no shortage of Rails plugins to do async processing, and basically all of them work fine. Personally, I like Delayed Job's API best.
I wouldn't use Starling or other actual queue daemons, since for this task using the database to store any necessary state should be just fine.
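A minimal Delayed Job sketch for this case (the Photo model and its upload method are hypothetical): enqueue a tiny job object that carries only the record id, and do the S3 push inside its perform method:

    # Delayed Job serializes this small object into the database; a worker process
    # (script/delayed_job or rake jobs:work) picks it up and runs #perform later.
    S3UploadJob = Struct.new(:photo_id) do
      def perform
        photo = Photo.find(photo_id)   # Photo is an assumed model name
        photo.upload_original_to_s3!   # hypothetical method that pushes the locally stored file to S3
      end
    end

    # In the controller, after saving the 16 records with their local files:
    photos.each { |photo| Delayed::Job.enqueue(S3UploadJob.new(photo.id)) }

The web request only has to save the records and enqueue 16 rows, so it returns quickly; the worker then makes the 16 S3 connections on its own time.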
This might help!
http://aaronvb.com/blog/2009/7/19/paperclip-amazon-s3-background-upload-using-starling-and-workling
EDIT:
It's not possible, with a normal HTML multipart form, to send the files to the background; they have to be handled within that request. If you are looking for a way around that, you can try SWFUpload and then, once that's done, use a background process to handle the Amazon S3 uploads.
This is also a good survey blog post: http://4loc.wordpress.com/2010/03/10/background-jobs-in-ruby-on-rails/
I like SWFUpload; we use it on some S3 apps that we wrote. It has proven to be very fast and stable. You can have actions fired off via Ajax after the uploads, etc. We have had a ton of uploads go through it with zero failures.