Is it possible to host the application on one server and queue jobs on another server?
Possible examples:
Two different EC2 instances: one running the main server and the second running the queueing service.
Hosting the app on Heroku and using an EC2 instance for the queueing service.
Is that possible?
Thanks
Yes, definitely. We have delayed_job set up that way where I work.
There are a couple of requirements for it to work:
The servers have to have synced clocks. This is usually not a problem as long as all the server timezones are set to the same (and the clocks themselves are kept in sync, e.g. via NTP).
The servers all have to access the same database.
To do it, you simply deploy the same application to both (or all, if more than two) servers, and start workers on whichever servers you want to process jobs. Any server can still queue jobs, but only the one(s) with workers running will actually process them.
For example, we have one interface server, a db server and several worker servers. The interface server serves the application via Apache/Passenger, connecting the Rails application to the db server. The workers have the same application, though Apache isn't running and you can't access the application through http. They do, on the other hand, have delayed_jobs workers running. In a common scenario, the interface server queues up jobs in the db, and the worker servers process them.
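To make that concrete, here is a minimal sketch of the pattern (ResizeProfilePictureJob and the User lookup are hypothetical names for illustration; this assumes the delayed_job gem with its ActiveRecord backend):

    # app/jobs/resize_profile_picture_job.rb -- a hypothetical custom job
    class ResizeProfilePictureJob < Struct.new(:user_id)
      def perform
        user = User.find(user_id)
        # ... resize the user's profile picture ...
      end
    end

    # On any server (e.g. the interface server), enqueue into the shared database:
    Delayed::Job.enqueue(ResizeProfilePictureJob.new(user.id))

Workers are then started only on the machines that should process jobs, e.g. with RAILS_ENV=production bin/delayed_job -n 4 start; they poll the same delayed_jobs table and pick up whatever has been queued.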
One word of caution: if you're relying on physical files in your application (attachments, log files, downloaded XML or anything else), you'll most likely need a solution like S3 to keep those files, because any individual server might not have them locally. For example, if a user uploads their profile picture on your web-facing server, the file would likely be stored on that server; if another server then has to resize the profile picture, the image wouldn't exist on that worker server.
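If you go that route, one common Rails approach (an assumption for illustration, not necessarily what the setup above uses) is CarrierWave backed by fog-aws, so every server reads and writes the same S3 bucket:

    # config/initializers/carrierwave.rb -- bucket and region are placeholders
    CarrierWave.configure do |config|
      config.storage      = :fog
      config.fog_provider = "fog/aws"
      config.fog_credentials = {
        provider:              "AWS",
        aws_access_key_id:     ENV["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
        region:                "us-east-1"
      }
      config.fog_directory = "my-app-uploads"
    end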
Just to offer another viable option: you can use a new worker service like IronWorker that relies completely on an elastic farm of cloud servers inside EC2.
This way you can queue or schedule jobs to run, and they will parallelize across many threads spanning multiple servers, all without you worrying about the infrastructure.
Same deal with the database, though - it needs to be accessible from the outside.
Full disclosure: I helped build IW.
I am quite new to Quartz.NET, but was able to create a running solution for my problem.
There are remote server instances, which are executed as Windows services. The job store for these instances is an AdoJobStore with a SQLite backend.
The client application is able to run jobs remotely through remote scheduler proxies.
Now I have to combine the remote execution with clustering. Right here I am struggling with the instantiation of scheduler proxies for the remote servers. When a scheduler is created on the client side, addresses and ports are configured explicitly via the properties of the scheduler factory.
In an architecture with a cluster consisting of several remote services and one client, which has to start jobs on these servers using Quartz.NET's load balancing feature, explicitly directing each job to a specific server address makes no sense to me.
So, how should the client app hand jobs to the cluster, and how does the cluster have to be configured (for example, a list of server IP addresses and ports to be used)?
In addition: how do the Quartz.NET server instances have to share the database, and how will this work with serverless SQLite?
Thanks for any tips useful for the further reading I have to do,
Mario
Meanwhile I was able to get my system to work. The answer to my question “Combination of remoting & clustering” is: Do not combine these features, as it is not necessary.
For the implementation of a distributed cluster, don’t use remoting at all (hard to spot when your first development step was creating a client with a single remote server).
Distribution of jobs, and therefore all “connecting” of instances, is done by using the same database, which has to be centralized for that reason (I'm using SQL Express now).
Don’t start your local (client) scheduler instance.
Don’t worry about all the local worker threads that appear even though all the work should be carried out by the remote servers in the cluster. My expectation would have been to use a scheduler with 0 threads in the local application, since you do not want to start any job within this app.
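For reference, a minimal sketch of the clustering properties every node would share (the values are illustrative, and the exact delegate and provider names depend on your Quartz.NET version and database):

    # quartz.config on every node in the cluster
    quartz.scheduler.instanceName = MyClusteredScheduler
    quartz.scheduler.instanceId = AUTO
    quartz.jobStore.type = Quartz.Impl.AdoJobStore.JobStoreTX, Quartz
    quartz.jobStore.driverDelegateType = Quartz.Impl.AdoJobStore.SqlServerDelegate, Quartz
    quartz.jobStore.dataSource = default
    quartz.jobStore.clustered = true
    quartz.dataSource.default.connectionString = Server=central-db;Database=quartz;Integrated Security=True
    quartz.dataSource.default.provider = SqlServer

With instanceId = AUTO and clustered = true, the instances coordinate purely through the shared tables, which is why no remoting is needed.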
Unsolved problem: there seems to be no way to register a listener which will be called when a job is executed in the cluster. So I have to build my own feedback channel from the jobs back to the starting app in order to track the status of jobs (start time, finish time, node where execution has taken place, ...).
Unsolved problem: when the local (WPF) application is closed by the user, an endless loop in SimpleThreadPool

    while (runnable == null && run)
    {
        Monitor.Wait(lockObject, 500);
    }

prevents the process from exiting.
I have a platform (based on Rails 4/Postgres) running on an auto scaling Elastic Beanstalk web environment. I'm planning on offloading long running tasks (sync with 3rd parties, delivering email etc) to a Worker tier, which appears simple enough to get up and running.
However, I also want to run periodic batch processes. I've looked into using cron.yaml and the scheduling seems pretty simple; however, the batch process I'm trying to build needs access to the web application's data in order to work.
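For reference, this is the kind of file I mean (a minimal sketch; the name, URL and schedule are illustrative, and Elastic Beanstalk POSTs to the given URL on each tick):

    version: 1
    cron:
      - name: "nightly-batch"
        url: "/batch/nightly"
        schedule: "0 2 * * *"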
Does anybody have an opinion on the best way of doing this? Either a shared RDS database between the web and worker tiers, or perhaps a web service that the worker tier can access?
Thanks,
Dan
Note: I've added an extra question, which more broadly describes my requirements, as it struck me that this might not be the best approach:
What's the best way to implement this shared batch process with Elastic Beanstalk?
Unless you need a full relational database management system (RDBMS), consider using S3 for shared persistent data storage across your instances.
Also consider Amazon Simple Queue Service (SQS):
SQS is a fast, reliable, scalable, fully managed message queuing service. SQS makes it simple and cost-effective to decouple the components of a cloud application. You can use SQS to transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available.
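If you're posting work to SQS from Rails, a hedged sketch with the aws-sdk-sqs gem (the queue URL and message fields are illustrative):

    require "aws-sdk-sqs"
    require "json"

    sqs = Aws::SQS::Client.new(region: "us-east-1")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/batch-jobs"

    # Web tier: enqueue a task description instead of sharing files or sessions.
    sqs.send_message(queue_url: queue_url,
                     message_body: { task: "sync_contacts", account_id: 42 }.to_json)

    # Worker tier: poll and process (an Elastic Beanstalk worker environment can
    # instead POST each message to your app via its sqsd daemon).
    resp = sqs.receive_message(queue_url: queue_url, wait_time_seconds: 10)
    resp.messages.each do |msg|
      payload = JSON.parse(msg.body)
      # ... run the batch task described by payload ...
      sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
    end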
These three terms (dyno, worker, zero-downtime deploys) are given a lot of importance in Heroku app server tutorials, but I can't understand the purpose and definition of these three terms.
Does anybody have info about that? Kindly share.
Thanks
The Heroku reference guide has a lot of information on all of this, and lots more, but in answer to your question:
A dyno is effectively a small virtual server instance set up to run one app (it's behind an invisible load balancer, so you can have any number of them running side-by-side). You don't need to worry about the server admin side of things, as it just takes your source code from a Git push and runs it.
A worker is a type of dyno, usually designed to process tasks in the background (in contrast to a web dyno, which just serves web pages). For example, Rails has ActiveJob, which plugs into something like Resque or Sidekiq and completes tasks that would slow down the web interface if it had to handle them itself, like sending e-mails or geocoding addresses.
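For illustration, the split is usually declared in a Procfile; each line defines a dyno type that Heroku can scale independently (this sketch assumes a Rails app using Puma and Sidekiq):

    web: bundle exec puma -C config/puma.rb
    worker: bundle exec sidekiq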
Zero-downtime deploys is really marketing speak for "if you push your code, it will wait until the new version is up and running before swapping the web interface over to it". It means you don't need to do anything, and your web app won't go offline while the switch happens.
I have a big application. One part of it is high-load processing of user files. I've decided to provide a dedicated server for this. It will run nginx for distributing content and some (non-Rails) programs for processing the files.
I have two questions:
What is better to use on this server? (Rails or something else, maybe Sinatra?)
If I use Rails, how do I deploy? I can't find any instructions. If I have one app and two servers, how do I deploy it and delegate tasks between them?
PS: I need to authenticate users on both servers. In Rails I use Devise.
You can use Rails for this. If both servers will serve web content to the end user, then you'll need some sort of load balancer in front of the two servers. HAProxy does a great job at this.
As far as getting the two applications to communicate with each other, this will be less trivial than you may think. What you should do is use a locking mechanism when performing the tasks. Delayed_job by default will lock a job in the queue so that other workers will not try to work on the same job. You can use callbacks from ActiveJob to notify the user via web sockets whenever their job is completed.
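As a sketch of that callback idea (ProcessFileJob, the Upload model and the notification call are hypothetical; ActiveJob ships with Rails 4.2+):

    # app/jobs/process_file_job.rb
    class ProcessFileJob < ActiveJob::Base
      queue_as :default

      # Fires once perform has finished; a good place to push a websocket update.
      after_perform do |job|
        upload_id = job.arguments.first
        # e.g. broadcast "done" on a per-upload websocket channel here
      end

      def perform(upload_id)
        upload = Upload.find(upload_id)
        # ... heavy processing ...
      end
    end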
Anything that will take time or that calls an external API should usually be placed into a background processing queue so that you're not holding up the user.
If you cannot spin up more than the two servers, you should make one of them the master or at least have some clear roles of the two servers. For example, one server may be your background processing and memcache server while the other is storing your database and handles your web sockets.
There are a lot of different ways of configuring the services and anything including and beyond what I've mentioned is opinionated.
Having separate servers for handling tasks is my preference, as it makes them easier to manage from a sysadmin perspective. For example, if we find that our web sockets server is hammered, we can simply spin up a few more web socket servers and throw them into a load balancer pool. The end user would not be negatively impacted by your networking changes. Whereas, if your servers perform dual roles outside of your standard Rails installation, you may find yourself cloning and wasting resources. Each of my web servers usually also performs background tasks on low-to-intermediate priority queues, while a dedicated server is left for handling mission-critical jobs.
I have a simple setup going for an API I'm building in Rails.
A zip is uploaded via a POST, and I take the file, store it in Rails.root/tmp using CarrierWave, and then background an S3 upload with Sidekiq.
The reason I store the file temporarily is that I can't send a complex object to Sidekiq, so I store the file and send its ID, let Sidekiq find it and do the work with it, then delete the file once it's done.
The problem is that once it's time for my Sidekiq worker to find the file by its path, it can't, because the file doesn't exist. I've read that Heroku's ephemeral filesystem deletes its files when things are reconfigured, servers are restarted, etc.
None of these things are happening, however, and the file doesn't exist. So my theory is that the Sidekiq worker is actually trying to open the path that gets passed to it on its own filesystem, since it's a separate worker, and the file doesn't exist there. Can someone confirm this? If that's the case, are there any alternate ways to do this?
If your worker is executed on another dyno than your web process, you are experiencing this issue because of dyno isolation. Read more about this here: https://devcenter.heroku.com/articles/dynos#isolation-and-security
Although it is possible to run Sidekiq workers and the web process on the same machine (maybe not on Heroku, I am not sure about that), it is not advisable to design your system architecture like that.
If your application grows or experiences temporarily high loads, you may want to spread the load across multiple servers, and usually also run your workers on separate servers from your web process, so that busy workers cannot block the web process.
In all those cases you can never share data on the local filesystem between the web process and the worker.
I would recommend considering uploading the file directly to S3 using https://github.com/waynehoover/s3_direct_upload
This also takes a lot of load off your web server.
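Alternatively, if the upload must flow through your app, a sketch of the pattern (the worker, bucket and key names are illustrative; assumes the aws-sdk-s3 gem): upload to S3 in the web process and pass only the S3 key to Sidekiq, so the worker never depends on another dyno's filesystem.

    require "aws-sdk-s3"
    require "securerandom"
    require "tmpdir"

    class ZipImportWorker
      include Sidekiq::Worker

      def perform(s3_key)
        s3   = Aws::S3::Client.new
        # Download to this dyno's own tmp dir; never rely on another dyno's files.
        path = File.join(Dir.tmpdir, File.basename(s3_key))
        s3.get_object(bucket: "my-app-uploads", key: s3_key, response_target: path)
        # ... process the zip ...
      ensure
        File.delete(path) if path && File.exist?(path)
      end
    end

    # In the controller, after receiving the POST:
    key = "uploads/#{SecureRandom.uuid}.zip"
    Aws::S3::Client.new.put_object(bucket: "my-app-uploads", key: key,
                                   body: params[:file].tempfile)
    ZipImportWorker.perform_async(key)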