Scheduler Design Help Needed - system-design

I am designing a system that can have thousands of cron jobs. I am creating a UI where one can create a cron job and specify its interval. What system design would you suggest to make it scalable?

I would recommend taking a closer look at existing systems first; maybe they already fulfill your needs. Have a look at Apache Airflow (https://airflow.apache.org/), Azkaban (https://azkaban.github.io/) or Oozie (http://oozie.apache.org/).
If you really want to build your own system, please tell us a little more about the requirements.
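As a rough illustration of the home-grown route, the usual shape is a jobs table populated by the UI plus a polling worker that fires whatever is due. A minimal sketch follows; the table layout, column names and the SQLite choice are all assumptions, not a recommendation:

    # Minimal polling-scheduler sketch (illustrative only).
    # Assumes a SQLite table jobs(id, command, interval_seconds, next_run_at)
    # populated by the UI; every name here is hypothetical.
    import sqlite3, subprocess, time

    def run_due_jobs(db_path="scheduler.db"):
        conn = sqlite3.connect(db_path)
        now = time.time()
        due = conn.execute(
            "SELECT id, command, interval_seconds FROM jobs WHERE next_run_at <= ?",
            (now,),
        ).fetchall()
        for job_id, command, interval in due:
            subprocess.Popen(command, shell=True)      # fire and forget
            conn.execute(                              # schedule the next run
                "UPDATE jobs SET next_run_at = ? WHERE id = ?",
                (now + interval, job_id),
            )
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        while True:        # a single poller; shard the jobs table to scale out
            run_due_jobs()
            time.sleep(5)

Scaling this to thousands of jobs is mostly a matter of sharding the table across workers and handing the actual execution off to a queue rather than running commands in the poller, which is essentially what the systems above already do for you.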

Related

ROS/ROS2 for multi-agent systems

Does anybody know how to use ROS/ROS2 for multi-agent systems? I know there is other software for multi-agent work, but I heard that ROS is suitable for this. Does anybody have specific ideas?
ROS is a middleware framework for creating a distributed system of nodes based on the publish/subscribe methodology. It can certainly be used for a multi-agent system. You should read through the ROS wiki; it has a lot of great info and is a very easy way to start learning the ideas.
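To give a feel for the publish/subscribe style the answer above describes, here is a minimal ROS2 node written with the rclpy client library; the node name, topic and message contents are made-up examples:

    # Minimal ROS2 publisher sketch using rclpy, just to show the
    # publish/subscribe style; node, topic and payload are arbitrary examples.
    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import String

    class AgentStatus(Node):
        def __init__(self):
            super().__init__('agent_status')
            self.pub = self.create_publisher(String, 'agent/status', 10)
            self.timer = self.create_timer(1.0, self.tick)   # publish once a second

        def tick(self):
            msg = String()
            msg.data = 'agent alive'
            self.pub.publish(msg)

    def main():
        rclpy.init()
        node = AgentStatus()
        rclpy.spin(node)
        rclpy.shutdown()

    if __name__ == '__main__':
        main()

Any other node can subscribe to the same topic, which is the basic mechanism you would build agent-to-agent communication on.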
We're currently working on a BDI framework for ROS2 targeting multi-agent systems (MAS), thus facilitating their development. The repository is here and the user documentation is here. The plans are dynamically computed via a PDDL 2.1-based planning system (PlanSys2). It's still under development, so there can be bugs here and there. We're currently trying to solve them, and then the idea is to move toward more flexible reasoning behaviour, while keeping in mind real-time constraints and/or the computational feasibility of plan execution.
If that might fulfill your needs, give it a look and share your feedback!

Advice on using the .NET Workflow state machine. What would you do?

So I've been tasked at work with writing Windows services to replace some old legacy VB6 WinForms apps currently running as services, consistently repeating tasks day to day. To give some general background, they have their own state machines built in to handle decision making, and they do not utilize threading.
A lot of the senior developers here thought it would be worth a try to look into Workflow (WF) to replace the state machines, rather than writing my own business logic and trying to thread it programmatically. So it's WF vs. the "old college try", I suppose.
My concern is that there aren't many books on the topic, and since it was implemented in .NET I've heard very little about it being used. I brought this up at work and another developer mentioned that it's because BizTalk never really caught on and WF was designed for that.
So is it broken? Do you think it will be supported long enough not to worry so much? I don't want an ill-functioning process injected into my services (my new babies at work) only to have WF keel over, leaving me to replace them with my own code in an emergency, which does not seem like much of a grand scenario to me.
Any suggestions or recommendations would be super.
Workflow Foundation is used in Microsoft SharePoint, so I think they will continue supporting it.
There is an open source project called Stateless by Nicholas Blumhardt. It is quite flexible and very lightweight. See my SO answer for details.
I chose this over Windows Workflow simply because I could define a state as a State and thereby persist the state of my workflows back to the database using SubSonic. Configuration consists of one XML file. If I need to add tasks, I simply add nodes to the XML.
Each state can have a series of triggers that, once satisfied, will advance it to the appropriate state. The framework is a single assembly and fits nicely into your domain logic.
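Stateless itself is a C# library, so the snippet below is not its API; it is just a rough Python sketch of the state-plus-triggers idea described above, with hypothetical states and triggers:

    # Rough sketch of the "state + triggers" pattern (not the Stateless API);
    # the states, triggers and transition table are hypothetical.
    TRANSITIONS = {
        ("Queued",     "start"):    "Processing",
        ("Processing", "complete"): "Done",
        ("Processing", "fail"):     "Failed",
        ("Failed",     "retry"):    "Queued",
    }

    class Task:
        def __init__(self):
            self.state = "Queued"

        def fire(self, trigger):
            try:
                self.state = TRANSITIONS[(self.state, trigger)]
            except KeyError:
                raise ValueError(f"{trigger!r} not permitted in state {self.state!r}")

    task = Task()
    task.fire("start")
    task.fire("complete")
    print(task.state)   # Done

Because the current state is just a value, persisting it back to a database row (as the answer describes with SubSonic) is straightforward.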

Cloud-aware programming and help choosing a good framework

How can I write a cloud-aware application, i.e. an application that takes advantage of being deployed on the cloud? Is it the same as an application that runs on a VPS/dedicated server? If not, what are the differences? Are there any design changes? What steps do I need to take to migrate an existing application to being cloud-aware?
Also, I am about to implement a web application idea which would need features like security, performance, caching and, more importantly, be free. I have been comparing some frameworks and found that Django has the lowest RAM/CPU usage and works great in prefork+threaded mode, but I have also read that Django-based sites stop responding under a heavy load of connections. Other frameworks that I have seen or know of are Zend, CakePHP, Lithium/Cake3, CodeIgniter, Symfony, Ruby on Rails...
So I would leave this to your opinion as well: suggest a good free framework based on my needs.
Finally thanks for reading the essay ;)
I feel a Matrix moment coming on... "What is the cloud? The cloud is all around us, a prison for your program..." (What? The FAQ said bring your sense of humour...)
Ok, so seriously, what is the cloud? It depends on the implementation, but the usual features include scalable computing resources and charging per CPU-hour, per storage area, etc. So yes, it is a bit like developing on your VPS or a normal server.
As I understand it, Google App Engine allows you to consume as much as you want. The back-end resource management is done by Google and billed to you and you pay for what you use. I believe there's even a free threshold.
Amazon EC2 exposes an API that actually allows you to add virtual machine instances (someone correct me please if I'm wrong), having pre-configured them, deploy another instance of your web app, and talk between private IP ranges if you wish (Slicehost definitely allows this). As such, EC2 can let you run a giant load balancer on the front end passing work off to a whole number of VMs on the back end, or expose all of that publicly, take your pick. I'm not sure of the exact detail because I didn't build the system, but that's how I understand it.
I have a feeling (but I know least about Azure) that on Azure, resource management is done automatically, for you, by Microsoft, based on what your app uses.
So, in summary, the cloud is different things depending on which particular cloud you choose. EC2 seems to expose an API for managing resources; GAE and Azure appear to be environments which grow and shrink in the background based on your use.
Note: I am aware there are certain constraints when developing on GAE, particularly with Java. In a minute, I'll edit in another thread where someone made an excellent comment on one of my posts to this effect.
Edit as promised, see this thread: Cloud Agnostic Architecture?
As for the choice of framework, it really doesn't matter as far as I'm concerned. If you are planning on deploying to one of these platforms, you might want to check framework/language availability. I personally have just started Django and love it, having learnt Python a while ago, so, in my totally unbiased opinion, use Django. Other developers will probably recommend other things based on their preferences. What do you know? What are you most comfortable with? What do you like the most? I'd go with that. I chose Django purely because I'm not such a big fan of PHP, I like Python, and I was comfortable with the framework when I initially played around with it.
Edit: So how do you write cloud-aware code? You design your software in such a way that it fits on one of these architectures. Again, see the cloud-agnostic thread for some really good discussion on ways of doing this. For example, you might talk to some services on GAE which scale. The fact that they are on GAE (for example) doesn't really matter; you use loose-coupling ideas. In essence, this is just a step up from the web service idea.
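As a tiny illustration of the loose-coupling point, the usual trick is to hide the platform service behind your own interface so the application doesn't care where it runs; every name below is made up:

    # Toy illustration of hiding storage behind an interface so the
    # application doesn't care whether it runs locally or on a cloud
    # platform; the classes and method names are hypothetical.
    import os

    class BlobStore:
        def put(self, key: str, data: bytes) -> None:
            raise NotImplementedError
        def get(self, key: str) -> bytes:
            raise NotImplementedError

    class LocalDiskStore(BlobStore):
        def __init__(self, root="/tmp/blobs"):
            self.root = root
            os.makedirs(root, exist_ok=True)
        def put(self, key, data):
            with open(os.path.join(self.root, key), "wb") as f:
                f.write(data)
        def get(self, key):
            with open(os.path.join(self.root, key), "rb") as f:
                return f.read()

    # A cloud deployment would swap in an S3- or datastore-backed
    # implementation of BlobStore; code calling put()/get() stays unchanged.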
Also, another feature of the cloud I forgot to mention is the idea of CDNs being provided for you - some cloud implementations might move your data around the globe to make it more efficient to serve, or just because that's where they've got space. If that's an issue, don't use the cloud.
I cannot answer your question - I'm not experienced in such projects - but I can tell you one thing: both CakePHP and CodeIgniter are designed for PHP4, in other words for really old technology, and it seems nothing is going to change in their case. Symfony (especially the 2.0 version, which is still in heavy beta) is worth considering, but as I said at the very beginning, I cannot support this with my own experience.
For designing applications for deployment to the cloud, the main thing to consider is recoverability. If your server is terminated, you may lose all of your data. If you're deploying on Amazon, I'd recommend putting all data that you need persisted onto an Elastic Block Storage (EBS) device. This would be data like user-generated content/files, the database files and logs. I also use EBS snapshots on a 5-day rotation, so the data is backed up as well. That said, I've had a cloud server up on AWS for over a year without any issues.
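To make the EBS part concrete, here is roughly what a snapshot-and-prune job can look like with the boto3 client; the volume ID and the 5-day retention are illustrative, and the answer above does not say how its own rotation is automated:

    # Hedged sketch: snapshotting an EBS volume and pruning old snapshots
    # with boto3; the volume ID and retention period are hypothetical.
    import datetime
    import boto3

    ec2 = boto3.client("ec2")
    VOLUME_ID = "vol-0123456789abcdef0"   # hypothetical volume

    def snapshot_and_prune(retention_days=5):
        ec2.create_snapshot(VolumeId=VOLUME_ID, Description="nightly backup")
        cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=retention_days)
        snaps = ec2.describe_snapshots(
            Filters=[{"Name": "volume-id", "Values": [VOLUME_ID]}],
            OwnerIds=["self"],
        )["Snapshots"]
        for snap in snaps:
            if snap["StartTime"] < cutoff:        # drop anything older than the window
                ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])

Run from cron (or a scheduled Lambda), this gives the rolling backup window the answer describes.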
As for frameworks, I'm giving Grails a try at the minute and I'm quite enjoying it. Built to be syntactically similar to Rails but runs on the JVM. It means you can take advantage of all the Java goodness, like threading, concurrency and all the great libraries out there to build your web application.

Message Queues in Ruby on Rails

What message queues are people using for their Rails apps, and what was the driving force behind the decision to choose it? Does the latest Twitter publicity over their in-house queue Starling falling down affect any existing design decisions?
I am working on an app that will need a message queue to process some background tasks. I haven't done much of this, and most of the stuff I have seen in the past has been about Starling and Workling. To be honest, the application is not very big and that solution would probably suffice, but I'd love to get experience integrating the best solution possible, as I'm sure I will integrate one into a bigger app at some point.
What message queues would you suggest for a Rails app?
EDIT: Thanks for the suggestions, I'm going to look at a few of them this weekend.
EDIT Again: I've had a look around and I'm a little overwhelmed by the choice. I am, however, going to go about integrating RabbitMQ with Workling into the app I am building; then, if I ever need some knowledge about a fast queue, I will have this and will know whether or not it fits my needs.
EDIT: Finding more and more that DJ suits me just fine, if I ever "outgrow" it on a site I'd say that Resque is where I would head.
EDIT: (Dec 2014) So it's been a long time since I asked this, but I see it still gets some views or some votes, so I figured I'd update it on my approach now when it comes to my choice of background workers.
In my opinion, currently the best way to run background jobs in Ruby is Sidekiq. A lot of people have lauded Sidekiq for its threaded workers rather than a process per worker, which can use significantly less memory than the likes of Resque, which I was using before Sidekiq. That is good, but for me it wasn't the killer feature. By using Sidetiq with Sidekiq, the scheduling of jobs is so trivial that I switched over and have never looked back; it is by far the easiest job scheduling I have used and has made Sidekiq a breeze to use.
As an update: GitHub has moved to Resque on Redis instead of delayed_job. However, they still recommend delayed_job for smaller setups:
https://github.com/resque/resque
Chris Wanstrath from GitHub was at the SF Ruby meetup recently, talking about their queue. They tried Starling, beanstalk, and some other variants before settling on Shopify's delayed_job. They are pretty aggressive with their use of backgrounding.
Here's a blog post from last year that talks about their move to DJ.
Where I work now, we rolled our own several years ago, but I'm taking some ideas from DJ to improve the handling.
I would recommend delayed_job as a dead simple solution if you don't expect any heavy load. Pros: easy to set up, easy to monitor, simple code, and it doesn't have any external dependencies. Previously we used ActiveMessaging (with ActiveMQ and Stomp), but it was overkill for our project, so we switched to delayed_job for its simplicity.
Anyway, if you need a very mature and fast solution, ActiveMQ is a very good choice. If you don't want to spend too much time maintaining a full-scale message queueing solution you don't really need, delayed_job is the way to go. Here's a good article about Scribd's experience with ActiveMQ.
Here are a few Ruby/Rails solutions, one or more of these may be a good fit depending on your needs:
http://xph.us/software/beanstalkd
http://rubyforge.org/forum/forum.php?forum_id=19781
http://backgroundrb.rubyforge.org
And, a hosted solution from Amazon which would make a great queue for sharing between Ruby/Rails and other components of a larger system:
http://aws.amazon.com/sqs
Hope this helps!
The Messaging Server you might want to go for is RabbitMQ. Erlang coolness, AMQP, good Ruby libs.
http://www.bestechvideos.com/2008/12/09/rabbitmq-an-open-source-messaging-broker-that-just-works
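The question is Rails-centric, but AMQP is language-neutral, so for a feel of the broker-based approach here is a minimal producer/consumer sketch using Python's pika client, purely as an illustration; the queue name, host and payload are made up:

    # Minimal AMQP producer/consumer sketch using the pika client,
    # only to illustrate the broker-based approach; queue name, host
    # and payload are hypothetical.
    import pika

    def enqueue(payload: bytes, queue="background_jobs"):
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        ch = conn.channel()
        ch.queue_declare(queue=queue, durable=True)
        ch.basic_publish(exchange="", routing_key=queue, body=payload)
        conn.close()

    def work(queue="background_jobs"):
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        ch = conn.channel()
        ch.queue_declare(queue=queue, durable=True)

        def handle(channel, method, properties, body):
            print("processing", body)                      # application work goes here
            channel.basic_ack(delivery_tag=method.delivery_tag)

        ch.basic_consume(queue=queue, on_message_callback=handle)
        ch.start_consuming()

In a Rails setup, a Ruby AMQP client plays the same roles: the web app publishes, a separate worker process consumes.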
Rany Keddo gave a useful presentation about Starling + Workling at RailsConf Europe last year. He compared the different solutions available at the time.
Twitter's latest move away from Starling + Workling probably doesn't mean much for the regular Rails app. They have far greater issues of scale and probably legacy issues with their datastore that prevent them from scaling past their current implementation.
Beanstalkd is a good alternative, simply because it runs as a daemon and has wrappers in other scripting languages (if you happen to change direction in the future or have different components written in other languages).
This link also has a good comparison of pros-cons of the various rails solutions available.
I use background_job which like delayed_job is a database-based queue.
A database makes an OK queue as long as you're not doing too much traffic in and out.
The reason I like background_job (and delayed_job) is that they do not require a separate process. They can run through cron. For me, this is of key importance because my messaging needs are even simpler than my meager sysadmin skills.
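For anyone curious, the database-as-queue pattern those gems use boils down to something like the sketch below (Python purely for illustration; the jobs table and its columns are hypothetical):

    # Sketch of the database-as-queue pattern behind delayed_job/background_job
    # (illustrative only; assumes a table jobs(id, handler, locked_at, completed_at)).
    import sqlite3, time

    def run_handler(handler):
        print("running", handler)                 # application-specific work

    def work_one_batch(db_path="app.db", batch=10):
        conn = sqlite3.connect(db_path)
        rows = conn.execute(
            "SELECT id, handler FROM jobs "
            "WHERE completed_at IS NULL AND locked_at IS NULL LIMIT ?",
            (batch,),
        ).fetchall()
        for job_id, handler in rows:
            conn.execute("UPDATE jobs SET locked_at = ? WHERE id = ?", (time.time(), job_id))
            conn.commit()
            run_handler(handler)
            conn.execute("UPDATE jobs SET completed_at = ? WHERE id = ?", (time.time(), job_id))
            conn.commit()
        conn.close()

    # Invoked from cron, e.g. every minute, instead of a long-lived worker process.
    if __name__ == "__main__":
        work_one_batch()

That cron-driven batch is exactly why no separate daemon is needed, which is the appeal described above.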

Make recommendations on building (or setting up) an RRDtool-based web app for website monitoring that is simpler than Cacti?

I think Cacti is great except for the fact that it takes hours to configure. There is a lot that you can do with it, but I find it a little overly complicated. A script collecting disk utilization recently broke on me (for no apparent reason); I spent 3 hours on it and got nowhere.
I would like a tool like Cacti but super easy to set up. I have some familiarity with RRD, so a little bit of manual work is okay.
To make this more programming-related: an alternative to a different software package would be to develop something custom built. Has anybody attempted this? What pieces do you use to build which parts?
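For concreteness, a custom build largely reduces to three RRDtool operations: create, update and graph. A minimal sketch using the python-rrdtool bindings follows; the bindings themselves are an assumption here, and the file name, data source and archive definitions are only examples:

    # Minimal sketch of a hand-rolled collector using the python-rrdtool
    # bindings (an assumption); file name, DS and RRA definitions are illustrative.
    import rrdtool

    def create_rrd(path="disk.rrd"):
        rrdtool.create(
            path,
            "--step", "300",                  # one sample every 5 minutes
            "DS:used:GAUGE:600:0:U",          # a single gauge data source
            "RRA:AVERAGE:0.5:1:288",          # one day of 5-minute averages
            "RRA:AVERAGE:0.5:12:168",         # one week of hourly averages
        )

    def record(value, path="disk.rrd"):
        rrdtool.update(path, "N:%s" % value)  # N = "now"

    def graph(path="disk.rrd", out="disk.png"):
        rrdtool.graph(
            out,
            "--start", "-1d",
            "DEF:used=%s:used:AVERAGE" % path,
            "LINE1:used#0000FF:disk used",
        )

The remaining work in a custom tool is mostly the collection scripts and a thin web layer around the generated graphs.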
There are a slew of tools out there:
Cacti
Ganglia
Zabbix
Hyperic
Monit
Reconnoiter
Graphite
Each focuses on different aspects of usability. Reconnoiter and Graphite were born out of specialized needs and a desire for greater resolution than RRD can provide.
I suggest you take another look at Cacti before building your own. Create templates in Cacti for your hosts and your graphs, then use the built-in CLI tools to automate. That way you do not need to click through the GUI for each host.
I think this is what I want:
http://collectd.org
Collectd in combination with drraw looks like it will fit my needs.
I don't know if this will meet your needs, but you might also want to look at RRDUtil:
http://www.tnpi.biz/internet/manage/rrdutil/
Unfortunately they are all very time-consuming to learn and configure. You have to spend time to understand all the principles used and read sample configurations.
No short-cuts on this chore :-D
We use gmond and ganglia.
