Any additional considerations when using Faye in a Node.js cluster?

We're planning to run an Express-based server on Node.js in "cluster mode" using Node.js' cluster support. So there will be 1 master process and 'n' (where 'n' is calculated based on the number of CPUs) child processes running on a single machine. We already have a testbed set up using Faye for pubsub in non-cluster mode and it works great.
Are there any additional considerations we need to be aware of when using Faye on top of a Node cluster? For example, since there will be 'n' HTTP server instances, will it be a problem creating a Faye NodeAdapter in each Node process and attaching it to the HTTP server instance in that process?
Thanks.
-brian

I just realized that the answer to my question is fairly obvious. One thing to be aware of is that Faye will need to access shared state across multiple server instances (processes). In a single-server config, you could probably get away with using Faye's memory engine. In a clustered config, you'd need to use Faye's redis engine or some other engine that allows state to be shared by different processes. I'd prefer not to introduce another persistence component just for this purpose so I may look into implementing my own on top of my current persistent store (Neo4j).
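To illustrate the shared-state point, here is a toy model in Python (not Faye's actual API; the `Worker` class and the channel and client names are made up for this sketch). A plain dict stands in for the engine's subscription state. With private state per process, a subscription registered on one worker is invisible to the others; with a shared store (Redis, in the real setup) every worker sees it.

```python
class Worker:
    """One server process; `subscriptions` stands in for the engine's state."""

    def __init__(self, subscriptions):
        # Maps channel name -> set of subscribed client ids.
        self.subscriptions = subscriptions

    def subscribe(self, client_id, channel):
        self.subscriptions.setdefault(channel, set()).add(client_id)

    def publish(self, channel):
        """Return the clients this worker knows it must deliver to."""
        return sorted(self.subscriptions.get(channel, set()))


# Memory-engine style: every worker keeps its own private state.
a, b = Worker({}), Worker({})
a.subscribe("client-1", "/news")
isolated = b.publish("/news")   # worker b has never heard of client-1

# Shared-engine style: all workers read and write the same store.
shared_store = {}
c, d = Worker(shared_store), Worker(shared_store)
c.subscribe("client-1", "/news")
visible = d.publish("/news")    # worker d sees the subscription made via c
```

In a cluster the subscribe request and a later publish can land on different processes, which is exactly the `a`/`b` case above.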

Related

Upgrade micro-service without breaking current execution

Suppose you have a micro-service architecture with two services, A and B, each running 3 instances.
A is a web service receiving web requests, and B is a CLI-based application listening for events from a queue.
Now you want to deploy a new version of B, but its instances may be processing events at that moment.
How can the old instances be replaced with new ones without breaking work in progress?
Is there any tool, pattern or strategy that handles this scenario?

You need a simple strategy in which the instance of B that is about to be redeployed stops taking on new work.
If it consumes events over REST, you can put it behind a load balancer and use Consul with consul-template to detach that instance from the load balancer. Wait for some approximate time, say 5 minutes (you need to evaluate the right value), and then start the deployment.
This wait is necessary if you have no reliable way to tell whether the instance has finished processing its existing events.
If the events are consumed from a message queue, you can expose an endpoint that, when called, disables consumption of new events. Then apply the same wait-and-deploy strategy.
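A minimal sketch of the "disable new event consumption" idea, in Python and with an in-process `queue.Queue` standing in for the real message queue. The class and method names (`DrainableConsumer`, `stop_accepting`) are hypothetical; in practice `stop_accepting` would be wired to the endpoint or to a signal handler.

```python
import queue
import threading


class DrainableConsumer:
    """Consumes events until told to stop; never abandons an event mid-flight."""

    def __init__(self, source, handler):
        self.source = source            # queue.Queue standing in for the MQ
        self.handler = handler          # callable that processes one event
        self.processed = 0
        self._accepting = threading.Event()
        self._accepting.set()

    def stop_accepting(self):
        """Call this (e.g. from an admin endpoint) before deploying."""
        self._accepting.clear()

    def run(self):
        # The flag is only checked between events, so the event being
        # handled when stop_accepting() is called still runs to completion.
        while self._accepting.is_set():
            try:
                event = self.source.get(timeout=0.1)
            except queue.Empty:
                continue
            self.handler(event)
            self.processed += 1
```

Once `run()` returns, the instance holds no in-flight work and can be replaced; events left in the queue are picked up by the remaining or new instances.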

Combination of remoting and clustering

I am quite new to Quartz.NET, but was able to create a running solution for my problem.
There are remote server instances, which run as Windows services. The job store for these instances is an AdoJobStore with a SQLite backend.
The client application is able to run jobs remotely through remote scheduler proxies.
Now I have to combine remote execution with clustering. This is where I am struggling with instantiating scheduler proxies for the remote servers. When a scheduler is created on the client side, addresses and ports are configured explicitly through the properties of the scheduler factory.
In an architecture with a cluster of several remote services and one client that has to start jobs on these servers using Quartz.NET's load-balancing feature, explicitly assigning each job to a specific server address makes no sense to me.
So, how should the client app hand jobs to the cluster, and how does the cluster have to be configured (for example, with a list of server IP addresses and ports to be used)?
In addition: how should the Quartz.NET server instances share the database, and how will this work with serverless SQLite?
Thanks for any tips or pointers for further reading,
Mario

Meanwhile I was able to get my system to work. The answer to my question “Combination of remoting & clustering” is: Do not combine these features, as it is not necessary.
For implementation of a distributed cluster, don’t use remoting at all (hard to find when your first development step was creating a client with a single remote server).
Distribution of jobs and therefore all “connecting” of instances is done by using the same database, which has to be centralized for that reason (using SQL Express now).
Don’t start your local (client) scheduler instance.
Don’t worry about the local worker threads that appear even though all the work should be carried out by the remote servers in the cluster. I would have expected to use a scheduler with 0 threads in the local application, as you do not want to start any job within this app.
Unsolved problem: There seems to be no way to register a listener that will be called when a job is executed in the cluster. So I have to build my own feedback channel job --> starting app in order to track the status of jobs (start time, finish time, node where execution has taken place, ...).
Unsolved problem: When the user closes the local (WPF) application, an endless loop in SimpleThreadPool
    while (runnable == null && run)
    {
        Monitor.Wait(lockObject, 500);
    }
prevents the process from exiting.
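One possible shape for the feedback channel mentioned above is a status table in the same centralized database that the cluster already shares. The sketch below is Python with an in-memory SQLite database purely so that it is self-contained; it is not Quartz.NET code, and the table and function names (`job_status`, `record_start`, `record_finish`, `unfinished_jobs`) are made up for illustration. Each job would call `record_start` and `record_finish` itself, and the starting app would poll the table.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS job_status (
    job_id      TEXT PRIMARY KEY,
    node        TEXT,
    started_at  TEXT,
    finished_at TEXT
)
"""


def _now():
    return datetime.now(timezone.utc).isoformat()


def record_start(conn, job_id, node):
    """Called by the job on the node that picked it up."""
    conn.execute(
        "INSERT OR REPLACE INTO job_status "
        "(job_id, node, started_at, finished_at) VALUES (?, ?, ?, NULL)",
        (job_id, node, _now()),
    )
    conn.commit()


def record_finish(conn, job_id):
    """Called by the job when it completes."""
    conn.execute(
        "UPDATE job_status SET finished_at = ? WHERE job_id = ?",
        (_now(), job_id),
    )
    conn.commit()


def unfinished_jobs(conn):
    """Polled by the starting app: jobs that started but have not finished."""
    rows = conn.execute(
        "SELECT job_id, node FROM job_status "
        "WHERE finished_at IS NULL ORDER BY job_id"
    )
    return rows.fetchall()
```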

How to scale your 1 server Rails application

I have a Rails application running on a single VPS that uses Passenger, Apache and MySQL. I am moving this to Amazon AWS with the following simple setup:
ELB > Web Server > MySQL
Let's say I am expecting a huge spike in daily users and want to start scaling this out on Amazon AWS using multiple instances. Where does a newbie start on this journey? Do I simply create an AMI from my production-configured web server and get the ASG to launch these when required?
I understand that AWS increases the number of instances using auto scaling groups as the load demands it, but do I need to architect anything differently in my Rails application for it to run at scale across multiple instances?

The problem with scaling horizontally is that it really depends on the application. There's no "just-add-water" way to do it.
But there are some generic recipes you can follow in the beginning:
1. Extract the MySQL server into a separate instance that is capable of handling a higher load. Then create as many worker (i.e. app) instances connecting to the MySQL database as you need. You can keep doing so until your MySQL server gets saturated with requests and can no longer keep up with the load.
2. When you're done with step 1, you can add MySQL replicas and set up master-slave replication. This will leave you with a MySQL cluster in which one server accepts writes and all the others are read-only. After you set it up, change your application to send SELECTs to the read-only replicas and INSERTs/DELETEs/UPDATEs to the writable master server. This approach relies on the fact that most applications read far more often than they write. That may not be the case for you, but if it is, it will keep you afloat for quite a while, right up until you saturate the write performance of the MySQL master server.
3. Once you've squeezed everything out of step 2, you can go ahead and shard the data. This depends more and more on your application, but I will give a made-up example to convey the idea. Say you have a user-centric application (e.g. a private photo album with no sharing capabilities), and each user has a name. In this case you can build two completely independent clusters, where the first serves users whose names start with A-M and the second serves those with N-Z. This roughly halves the load on each cluster, but complicates the whole architecture.
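The routing decisions in steps 2 and 3 can be sketched as follows. This is Python rather than Rails, and every name in it (`QueryRouter`, `shard_for`, the server and cluster labels) is invented for illustration; a real application would normally rely on its database adapter or a proxy for this. Note the assumption that names not starting with A-M, including non-letters, all go to the second cluster.

```python
import itertools


class QueryRouter:
    """Step 2: send reads to replicas (round-robin) and writes to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # INSERT/UPDATE/DELETE and anything unrecognised go to the master.
        return self.master


def shard_for(username):
    """Step 3: pick a cluster from the first letter of the user's name."""
    first = username.strip()[:1].upper()
    return "cluster_a_m" if "A" <= first <= "M" else "cluster_n_z"
```

For example, `QueryRouter("master", ["replica1", "replica2"])` alternates SELECTs between the two replicas, while `shard_for("alice")` and `shard_for("Zed")` land on different clusters.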
Though generic, these recipes can help you build a pretty solid application capable of serving millions of users daily before you're forced to bring up more exotic ways of scaling.
Hope this helps!

Ruby on Rails on few servers

I have a big application. One part of it is high-load processing of user files. I have decided to dedicate a server to this. It will run nginx for serving content and some (non-Rails) programs for processing files.
I have two questions:
What is better to use on this server? (Rails or something else, maybe Sinatra)
If I use Rails, how do I deploy it? I can't find any instructions. If I have one app and two servers, how do I deploy it and delegate tasks between them?
P.S. I need to authorize users on both servers. In Rails I use Devise.

You can use Rails for this. If both servers will serve web requests from end users, then you'll need some sort of load balancer in front of the two servers. HAProxy does a great job of this.
As far as getting the two applications to communicate with each other, this will be less trivial than you may think. What you should do is use a locking mechanism when performing the tasks. Delayed_job by default locks a job in the queue so that no other worker will try to work on the same job. You can use callbacks from ActiveJob to notify the user via web sockets whenever their job is completed.
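The locking idea can be illustrated with a conditional UPDATE: a worker only gets the job if nobody else has claimed it in the meantime. The sketch below is Python with SQLite so it is self-contained; it is not Delayed_job's actual implementation, and the `jobs` table, `locked_by` column and `claim_next_job` function are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id        INTEGER PRIMARY KEY,
    payload   TEXT,
    locked_by TEXT
)
"""


def claim_next_job(conn, worker_name):
    """Try to lock the oldest unclaimed job; return its id, or None."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE locked_by IS NULL ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The extra "locked_by IS NULL" condition is the lock: if another
    # worker claimed the row between our SELECT and this UPDATE, no row
    # matches, rowcount is 0, and we back off.
    cur = conn.execute(
        "UPDATE jobs SET locked_by = ? WHERE id = ? AND locked_by IS NULL",
        (worker_name, row[0]),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None
```

With two servers pointed at the same database, each can call `claim_next_job` in a loop without ever processing the same job twice.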
Anything that takes time or calls an external API should usually be placed into a background processing queue so that you're not holding up the user.
If you cannot spin up more than the two servers, you should make one of them the master or at least assign clear roles to the two servers. For example, one server may be your background processing and memcache server while the other stores your database and handles your web sockets.
There are a lot of different ways of configuring the services and anything including and beyond what I've mentioned is opinionated.
Having separate servers for handling tasks is my preference, as it makes them easier to manage from a sysadmin perspective. For example, if we find that our web sockets server is hammered, we can simply spin up a few more web socket servers and throw them into a load balancer pool. The end user would not be negatively impacted by your networking changes. Whereas, if you have your servers performing dual roles outside of your standard Rails installation, you may find yourself cloning and wasting resources. Each of my web servers usually also performs background tasks on low-to-intermediate priority queues, while a dedicated server is left for handling mission-critical jobs.

RabbitMQ with EventMachine and Rails

We are currently planning a Rails 3.2.2 application where we use RabbitMQ. We would like to run several kinds of workers (and several instances of each worker) to process messages from different queues. The workers are written in Ruby and live in the lib directory of the Rails app.
Some of the workers need the Rails framework (Active Record, Active Model...) and some of them don't. The first worker should be called every minute to check if updates are available. The other workers should process the messages from their queues when messages (which are sent by the first worker) are present and do some (time-consuming) work with them.
So far, so good. My problem is that I have only a little experience with messaging systems like RabbitMQ and no experience with how Rails interacts with them. So I'm wondering what the best practices are for getting the two to play with each other. Here are my requirements again:
Rails 3.2.2 app
RabbitMQ
Several kinds of workers
Several instances of one worker
Control the number of workers from within Rails
Workers do time-consuming tasks, so they have to be async
Only a few workers need the Rails framework. The others are just Ruby files with some dependencies like Net or File
I was looking for some solution and came up with two possibilities:
Using amqp with EventMachine in a new thread
Of course, I don't want my rails app to be blocked when a new worker is created. The worker should run in another thread and do its work asynchronously. And furthermore, it should not start a new instance of my rails application. It should only require the things the worker needs.
But some articles say that there are issues with Passenger. Another thing I don't like is that we are using WEBrick for development, so we would have to include workarounds for that too. It would be possible to switch to another web server like Thin, but I don't have any experience with that either.
Using some kind of daemonizing
Maybe it's possible to run the workers as daemons, but I don't know how much overhead this would add, or how I could control the number of workers.
Hope someone can advise a good solution for that (and I hope I made myself clear ;)

It seems to me that AMQP is overkill for your problem. Have you tried Resque? The backing Redis database has some neat features (like publish/subscribe and blocking list pop) which make it very interesting as a message queue, and Resque is very easy to use in any Rails app.
The workers are daemonized, and you decide which worker of your pool listens to which queue, so you can scale each type of job as needed.
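The "one pool per queue, sized independently" idea looks roughly like this. The sketch is Python with in-process threads and `queue.Queue` standing in for Resque workers and Redis-backed queues; it is not Resque's API, and `start_workers` / `stop_workers` are hypothetical helper names.

```python
import queue
import threading


def start_workers(jobs, handler, count):
    """Start `count` threads that all consume from the same queue."""

    def loop():
        while True:
            job = jobs.get()
            try:
                if job is None:      # sentinel: shut this worker down
                    return
                handler(job)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=loop) for _ in range(count)]
    for t in threads:
        t.start()
    return threads


def stop_workers(jobs, threads):
    """Send one sentinel per worker, then wait for all of them to exit."""
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
```

A slow job type then simply gets a larger `count` on its own queue, for instance `start_workers(resize_jobs, resize, 3)` next to `start_workers(mail_jobs, send_mail, 1)`, where the queue and handler names are placeholders.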
Using the EM reactor inside a request/response cycle is not recommended, because it may conflict with an existing event loop (for instance if your app is served by Thin). In any case you have to configure it specifically for your web server. On the other hand, an evented queue consumer may be interesting if your jobs do blocking IO and are not processor-bound.
If you still want to do it with AMQP, see "Starting the event loop and connecting in Web applications" and configure it for your web server accordingly. Or use bunny to push to the queue synchronously (with whichever job consumer you deem useful, like workling for instance).

We are running a slightly different but similar technology stack.
daemon-kit is used for the EventMachine side of the system: no Rails, but shared models (MongoMapper and MongoDB). EM pulls messages off the queues and does whatever logic is required (we have ruleby in the mix, but if-then-else works too).
MuleSoft ESB is our outward-facing message receiver and sender that helps us deal with the HL7/MLLP world. But in v1 of the app, we used some Java code in ActiveMQ to manage HL7 messages.
The Rails app then just serves up content for the user to see, again using the shared models.
