I use the ZF2 module SlmQueue to queue some processes in my application.
My problem is that I'm not able to install systems like supervisord on my server.
Is there a way to process the jobs of a queue via cron job? I read that it was possible in an earlier version of SlmQueue, but I don't know how it would work.
Those are two different aspects: where to store messages vs how to consume messages.
If you don't have a dedicated queue system, store messages in MySQL (or flat files, or whatever other storage is available).
Then install a cron job that consumes and handles messages from your job source every now and then (mutatis mutandis):
*/10 * * * * php public/index.php queue mysql default
I don't know whether this command handles concurrency. If not, you should write a wrapper script that creates a lock file to prevent the command from running multiple times in parallel.
I had a few questions about how to make imports available across the Dask workers.
1) I see that the upload_file functionality can make files available to workers.
Other than this, what are the other options to get this done?
2) If we use upload_file for all the imports, will every service call in the backend keep re-uploading them to the workers? Will they be removed after the task is executed?
Yes, there are lots of ways to do this, depending on how you are deploying dask.
A couple of examples:
all workers have access to NFS, so put your code files there and include it in the python path
workers are accessed via SSH, so use scp to copy your code to all worker machines
you are deploying via docker/kubernetes, so include the code in the image
you are deploying via dask-yarn: look up conda-pack
upload_file puts the code into a temporary location which is on the worker's python path. The file will persist there at least until the worker process ends; it will not be re-uploaded between tasks. It will be imported by your code as a normal python module (i.e., importing again will use the cached version). New workers that join after the upload_file command will not have a copy of the file.
Using simple server
I was using a simple node (CentOS or Ubuntu) to run my web application, and I had also configured some cron jobs there to run scheduled tasks. At that point everything worked.
Using Docker Swarm Cluster
I migrated my application to a Docker Swarm cluster. Now the crons are running in multiple containers at the same time, and that is critical for me. I know Docker is working on a new feature called jobs, but I need a solution for now. I would like to know if there is any way to run only one instance of each cron job.
Blocker
The crons are running tasks like:
create a report about a process.
send notifications to other services.
update data in the application.
The crons need to run on the server because they were configured to use interfaces and endpoints via the php command.
My Problem
I created multiple instances of the same Docker service to provide availability. All the instances run in a cluster of 3 nodes, and each of them runs its cron jobs at the same time in parallel, while I would like to run just one job per Docker service.
Maybe a solution would be to periodically create a Docker service with restart condition none and replicas 1, or to create a cron container with replicas 1 and restart condition any that acts as the scheduler, attaching a volume with the required cron scripts.
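That second idea could be sketched roughly like this (the service name, image name, and volume name are assumptions):

```shell
# Hypothetical single-replica scheduler service: only one container ever
# runs the crons, and Swarm restarts it on any node if it dies.
docker service create \
  --name cron-scheduler \
  --replicas 1 \
  --restart-condition any \
  --mount type=volume,source=cron-scripts,target=/etc/cron.d \
  myapp/cron-image
```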
There are multiple options.
Use a locking mechanism, locking over NFS or a database (MySQL, Redis, etc). You execute each job like this: /usr/local/bin/locker /path/to/script args. It may be good to give the locker options to either wait for the lock or fail immediately if it is not available (blocking vs. non-blocking). That way, if a job is long-running, only the first instance acquires the lock and the others fail. You may want to reuse existing software that simplifies the hard job of creating reliable locks.
Use leader selection. When running in a swarm, there must be a mechanism to query the list of containers. List only the cron containers and sort them alphabetically. If the current container's id is the first one, allow execution: first=$(get-containers cron | sort | head -n 1); if [[ "$current_id" == "$first" ]]; then ... fi
Run the cron outside the cluster but use it to trigger jobs within the cluster through the load balancer. The load balancer will pick exactly one container to execute the job. For example: curl -H 'security-key: xxx' http://the.cluster/my-job.
I'm sure there are also swarm-specific tools and methods available.
Here at my company we have our regular application in AWS EBS, with some background jobs. The problem is that these jobs are starting to get heavier, and we were thinking of separating them from the application. The question is: where should we do it?
We were thinking of doing it in AWS Lambda, but then we would have to port our Rails code to Python, Node, or Java, which seems like a lot of work. What are the other options? Should we just create another EC2 environment for the jobs? Thanks in advance.
Edit: I'm using the shoryuken gem (http://github.com/phstc/shoryuken) integrated with SQS. But it currently has a memory leak and my application sometimes goes down; I don't know if the memory leak is the cause, though. We have already separated the application into an API part in EBS and a front-end part in S3.
Normally, just another EC2 instance with a copy of your Rails app, where instead of rails s to start the web server, you run rake resque:work or whatever your job runner's start command is. Both would share the same Redis instance and database, so that your web server writes the jobs to the queue and the worker picks them up and runs them.
If you need more workers, just add more EC2 instances pointing to the same Redis instance. I would advise separating your jobs by queue name, so that one worker can just process fast stuff e.g. email sending, and others can do long running or slow jobs.
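For example, with Resque the split by queue name might look like this (the queue names are illustrative):

```shell
# Hypothetical worker start commands: one fast worker for mail, one slow
# worker for heavy jobs, each pinned to its own queues via QUEUE.
QUEUE=mailers rake resque:work &
QUEUE=reports,imports rake resque:work &
wait   # keep both workers in the foreground
```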
We had a similar requirement; for us it was the Sidekiq background jobs. They started to get very heavy, so we split them into a separate OpsWorks stack, with a simple recipe to build the machine dependencies (Ruby, MySQL, etc). Since we don't have to worry about load balancers and requests timing out, it's OK for all machines to deploy at the same time.
Another thing you could use in OpsWorks is scheduled machines (if the jobs are needed at certain times during the day): have the machine get provisioned a few minutes before the time of the task, and after the task is done have it shut down automatically. That would reduce your cost.
EB also has a different application type, the worker application; you could check that out too, but honestly I haven't looked into it, so I can't tell you its pros and cons.
We recently went down that route. I dockerized our Rails app and wrote a custom entrypoint for the Docker container. In summary, the entrypoint parses the commands you pass after docker run IMAGE_NAME.
For example, if you run docker run IMAGE_NAME sb rake do-something-magical, the entrypoint understands that it should run the rake job with the sandbox environment config. If you just run docker run IMAGE_NAME, it runs rails s -b 0.0.0.0.
PS: I wrote a custom entrypoint because we have 3 different environments; the entrypoint downloads environment-specific config from S3.
I also set up an ECS cluster and wrote a task-runner function on Lambda. The Lambda function schedules a task on the ECS cluster, and we trigger that Lambda from CloudWatch Events. You can send custom payloads to the Lambda when using CloudWatch Events.
It sounds complicated but implementation is really simple.
You may consider submitting your tasks to AWS SQS, then using an Elastic Beanstalk worker environment to process your background tasks.
Elastic Beanstalk supports Rails applications:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_Ruby_rails.html
Depending on what kind of work these background jobs perform, you might want to think about extracting those functions into microservices, if you are running your jobs on a different instance anyway.
Here is a good codeship blog post on how to approach this.
For simple mailer-type stuff this definitely feels a little heavy-handed, but if the functionality is more complex, e.g. general notification of different clients, it might well be worth the overhead.
So we use Sidekiq as our queue managing system in our Rails application.
We also use Sidetiq to manage scheduled and recurring tasks.
At the moment there are around 200-300 scheduled tasks that will run anywhere from a couple of minutes to 30 days from now.
I would just transfer the Redis database RDB file, but due to some configuration changes the Rails project path has changed (hence the tasks would no longer be able to run).
What would be the preferred way to transfer the whole scheduled task queue so that it works with the new project path? Doing it manually is not an option.
Ruby 2.1.6
Rails 3.2.22
Sidekiq 3.4.2
Redis 2.8.4
Use DUMP and RESTORE:
redis-cli -h source_host dump schedule | head -c-1 | redis-cli -h dest_host restore 0 schedule
http://redis.io/commands/restore
You can copy your Redis dump file, as you said. It's not clear to me why you are excluding that option.
Doing it manually (just create a Ruby script for it), moving the scheduled tasks should be pretty easy. The only thing you have to do is move the Redis sets retry and schedule.
My main concern was that I couldn't just copy the Redis database because the path of my project changes, but as it turned out, that wasn't an issue.
The fastest way to replicate the db for me was to first bind Redis on the old server to either its IP or 0.0.0.0,
and then on new server run
redis-cli slaveof OLD_SERVER_IP 6379
and then, when everything is copied (copying is done in a matter of seconds), run
redis-cli slaveof no one
Tada. Your Redis db is fully replicated.
I posted this question originally on the Docker forums, but didn't get any response there.
I'm wondering what the best way would be to model a set of services let's call them db, web, and batch. db is simply a running database server instance (think MySQL). web is a web application that needs to connect to the database. batch is a batch application that needs to connect to that same database (it can/will run in parallel with web). db needs to be running, for either web or batch to run. But web or batch can be run independently of each other (one or both can be running at the same time). If both are running at once, they need to be talking to the same database instance (so db is actually using volumes_from a separate data volume container). So if the use case was simpler (think just db and web, which always run together), then they would simply both be defined as services in the same compose file, with web having a link to db.
As far as I understand it, these can't all be defined in the same Docker compose configuration. Instead, I would need three different configurations. One for db, which is launched first, one for web (which uses external_links to find db), and a third for batch (which also uses external_links to find db). Is that correct, or is there some mechanism available that I'm not considering? Assuming a multi-configuration setup is needed, is there a way to "lazily" initialize the db composition if it's not running, when either the web or batch compositions are launched?
If web has a link defined to db in a docker-compose file, db will always start first.
As far as I know, Docker will never know when the database is up. It will be your web container's responsibility to properly start and retry until the db is up (with a timeout).
For your batch service, assuming that you don't want to start it every time you start your web and db containers (using docker-compose up or run), you can try extending your service. See the docs for more information on this.
Either your applications in the web and batch images know how to handle database downtime and are able to wait for the db service to come up and auto-reconnect, or you have to make a shell script that runs when the container is started and waits for the db to be available before starting the app.
Depending on the docker images you are using for the web and batch services, you would have to override CMD, ENTRYPOINT or both.
This question has examples of shell scripts that wait for a MySQL service to be up.
And here are other techniques for testing whether a network port is open.