Is this a good reason to use a service bus? Alternatives please - asp.net-mvc

I'm in the planning phase of our new site - it's an extension of some mobile apps we've built. We want to provide our users with a central point for communication and also provide features for users who don't want to/can't use the mobile apps. One of the features we're looking at adding is a reputation system similar in nature to the SO badge system. We're designing the system to use SOA.
I don't want to have to code all of this logic into the main app as discrete chunks. I'm thinking of creating a mechanism that will allow us to define new thresholds and rules for gaining reputation and have them injected into some service. The two ways I've thought of doing this so far are:
To look for certain traits in a user's actions and respond: this would mean running a service that works through the 'plugged-in' award definitions, checks for thresholds that have been met, and responds appropriately.
To fire events when the user performs actions, then listen for those events and respond appropriately. Because the services carrying out these actions run in separate app domains, potentially on separate servers, the only way I can see to have a central message bus that listens and responds to these events is to use something like MassTransit, NServiceBus or Rhino.Esb.
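To make the first option concrete, here is a minimal Python sketch of "plugged-in" award definitions evaluated by a polling pass. It assumes each user's counters have already been loaded (in practice, from the database on every pass); `UserStats`, `AwardDefinition`, `evaluate_awards` and the badge thresholds are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class UserStats:
    """Hypothetical per-user counters the background service reads on each pass."""
    user_id: int
    answers: int = 0
    accepted_answers: int = 0
    badges: Set[str] = field(default_factory=set)


@dataclass
class AwardDefinition:
    """A 'plugged-in' rule: a badge name plus a threshold predicate."""
    name: str
    is_met: Callable[[UserStats], bool]


def evaluate_awards(users: List[UserStats],
                    rules: List[AwardDefinition]) -> Dict[int, List[str]]:
    """One polling pass: grant each rule whose threshold is met and that the
    user does not already hold. Returns only the newly granted badges."""
    granted: Dict[int, List[str]] = {}
    for user in users:
        for rule in rules:
            if rule.name not in user.badges and rule.is_met(user):
                user.badges.add(rule.name)
                granted.setdefault(user.user_id, []).append(rule.name)
    return granted


# New rules are appended to this list; the main app's code is untouched.
RULES = [
    AwardDefinition("first-answer", lambda u: u.answers >= 1),
    AwardDefinition("helpful", lambda u: u.accepted_answers >= 10),
]
```

A second pass over the same users grants nothing new because held badges are skipped. The catch is that every pass rereads all the stats, which is exactly the background load on the DB that the question is wary of.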
I know that a service bus can very easily be designed into an application that simply doesn't need it, and that most of the time, unless you're integrating disparate, heterogeneous systems, you most likely won't need one when designing a new system. But I'm a bit lost for options as to the best way to do this. I don't like the idea of having a service hammer the DB all the time in the background. It does sound like it might be a lot simpler early on, though; later on, I dread to think!
Has anyone here designed a system like this? How did you accomplish this? We're designing for high throughput as we expect there will be times when the system will need to be able to cope with bursts of users.

I've designed a system that had similar requirements. To achieve this the key elements were:
Plugins
Event messaging - using Emesary
The basic concept is that the core is not aware of exactly which module will perform any given task.
Messages are defined, and they are dispatched at various points within the system. The sender does not know whether any recipient needs the message. This effectively decouples vast chunks of the system.
So, to perform a job, some code is plugged in that registers with the event messaging bus and receives messages. When it receives a message that it needs to process, it processes it.
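The register-and-dispatch idea described above can be sketched in a few lines of Python. This is not Emesary's actual API; `Notifier`, `Recipient`, `AnswerPosted` and `ReputationPlugin` are hypothetical names. Every recipient is offered every message and decides for itself whether to act, so the sender never knows who (if anyone) handled it.

```python
from typing import Any, Dict, List, Protocol


class Recipient(Protocol):
    def receive(self, message: Any) -> bool:
        """Return True if this recipient handled the message."""
        ...


class Notifier:
    """Minimal in-process bus: senders dispatch without knowing who listens."""

    def __init__(self) -> None:
        self._recipients: List[Recipient] = []

    def register(self, recipient: Recipient) -> None:
        self._recipients.append(recipient)

    def dispatch(self, message: Any) -> int:
        """Offer the message to every recipient; return how many handled it."""
        handled = 0
        for recipient in list(self._recipients):
            if recipient.receive(message):
                handled += 1
        return handled


class AnswerPosted:
    """Hypothetical message type raised when a user posts an answer."""

    def __init__(self, user_id: int) -> None:
        self.user_id = user_id


class ReputationPlugin:
    """A plugged-in module: acts on AnswerPosted and ignores everything else."""

    def __init__(self) -> None:
        self.points: Dict[int, int] = {}

    def receive(self, message: Any) -> bool:
        if isinstance(message, AnswerPosted):
            self.points[message.user_id] = self.points.get(message.user_id, 0) + 10
            return True
        return False
```

Adding a new reputation rule then means registering another recipient; the code that dispatches `AnswerPosted` does not change.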
The Emesary code is extremely small and efficient. I've called it Emesary and you're free to use it; it's available from Emesary on CodePlex.
As the system becomes more complex, there may be lots of events flying about. If you get more than 20k a second, my design always allowed for adding filtering and routing (implemented by extending the recipient interface so that a recipient can specify, during registration, the messages it wants to receive). I've never needed to add this filtering, because Emesary is efficient enough that it is the processing of the messages, not their dispatch, that takes the time.
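The registration-time filtering mentioned above might look like this minimal Python sketch, assuming recipients subscribe by exact message type. `FilteringNotifier` is a hypothetical name, not part of Emesary; the point is that dispatch only visits handlers that asked for that type instead of offering every message to everyone.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class FilteringNotifier:
    """Recipients declare, at registration, which message types they want."""

    def __init__(self) -> None:
        self._by_type: DefaultDict[type, List[Handler]] = defaultdict(list)

    def register(self, message_type: type, handler: Handler) -> None:
        self._by_type[message_type].append(handler)

    def dispatch(self, message: Any) -> int:
        """Deliver only to handlers registered for this exact type (subclasses
        are not matched in this sketch); return how many were called."""
        handlers = self._by_type.get(type(message), [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

Exact-type lookup keeps dispatch to a single dictionary lookup; matching subclasses as well would require walking `type(message).__mro__`.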
I've built a version of Emesary that bridges two Notifiers across disparate systems using WCF, CORBA and TCP/IP. I investigated using RabbitMQ and decided it could be used underneath Emesary if needed.
[Figure: Base class diagram]
[Figure: Scalable server]
This is a fairly complex example; however, it shows where Emesary fits in. In this diagram, anything with a drop shadow can have multiple instances, and this is managed outside of what I'm trying to explain here.

Related

Distributing an Erlang Chat system

I just finished the Erlang in Practice screencasts (code here) and have some questions about distribution.
Here's the overall architecture:
Here is what the supervision tree looks like:
Reading Distributed Applications leads me to believe that one of its primary motivations is failover/takeover.
However, is it possible, for example, for the Message Router supervisor and its workers to be on one node and the rest of the system on another, without many changes to the code?
Or should there be 3 different OTP applications?
Also, how can this system be made to scale horizontally? For example if I realize now that my system can handle 100 users, and that I've identified the Message Router as the main bottleneck, how can I 'just add another node' where now it can handle 200 users?
I've developed Erlang apps only during my studies, but generally we had many small processes, each doing only one thing and sending messages to other processes. The beauty of Erlang is that it doesn't matter whether you send a message within the same Erlang VM, on the same computer, on the same LAN or over the Internet: the call and the reference to the other process always look the same to the developer.
So you really want to have one application for every small part of the system.
That being said, it doesn't make it any simpler to construct an application that can scale out. A rule of thumb says that if you want an application to run on ten times as many nodes, you need to rewrite it, since otherwise the messaging overhead becomes too large. And obviously, going from 1 node to 2 also needs to be planned for.
So if you find a bottleneck, i.e. an application that is particularly slow when handling too many clients, and you want to run a second instance of it, you need to implement some additional load balancing before you start that second instance.
Let's assume the supervisor checks messages for inappropriate content and is therefore slow. In this case, the node everyone talks to would be a simple router application that forwards the messages to different instances of the supervisor application in round-robin fashion. If those one or two instances are not enough, you could write the router so that you can change the number of instances by sending it control messages.
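In Erlang the router would naturally be a process receiving messages, but to keep all examples here in one language, this is a Python sketch of just the routing logic. Workers are plain callables standing in for the supervisor instances, and the `"add"`/`"remove"` control commands are a hypothetical protocol, not anything from OTP.

```python
from typing import Any, Callable, List, Optional

Worker = Callable[[Any], Any]


class RoundRobinRouter:
    """Forwards each chat message to the next worker instance in turn.
    Control messages change the number of instances at runtime."""

    def __init__(self, workers: List[Worker]) -> None:
        if not workers:
            raise ValueError("need at least one worker")
        self._workers = list(workers)
        self._next = 0

    def control(self, command: str, worker: Optional[Worker] = None) -> int:
        """Handle 'add' (with a worker) or 'remove'; return the instance count."""
        if command == "add" and worker is not None:
            self._workers.append(worker)
        elif command == "remove" and len(self._workers) > 1:
            self._workers.pop()
            self._next %= len(self._workers)  # keep the index in range
        return len(self._workers)

    def route(self, message: Any) -> Any:
        worker = self._workers[self._next]
        self._next = (self._next + 1) % len(self._workers)
        return worker(message)
```

With two workers `a` and `b`, three messages go to `a`, `b`, `a`; after an `"add"`, the rotation simply continues over three instances.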
However, for this to work automatically, you would need another process that monitors the servers and detects when they are overloaded or underutilized.
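The monitoring process's decision could be as small as the Python sketch below, assuming it can sample each instance's message backlog. The function name and the `high`/`low` thresholds are hypothetical; the monitor would send the router an add or remove control message whenever the result differs from the current count.

```python
from typing import Sequence


def desired_instances(queue_lengths: Sequence[int], current: int,
                      high: float = 100, low: float = 10,
                      min_instances: int = 1, max_instances: int = 8) -> int:
    """Decide how many instances to run from each instance's backlog.
    Scale up by one above `high`, down by one below `low`; the gap between
    the two thresholds keeps the monitor from flapping on every sample."""
    if not queue_lengths:
        return current
    average = sum(queue_lengths) / len(queue_lengths)
    if average > high and current < max_instances:
        return current + 1
    if average < low and current > min_instances:
        return current - 1
    return current
```

Changing by one step per sample is a deliberately conservative choice; a real monitor would also wait for new instances to warm up before sampling again.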
I know that dynamically adding and removing resources always sounds great when you hear about it, but as you can see it is a lot of work: you need a messaging system that allows it, as well as a monitoring system that can detect the need.
I hope this gives you some idea of how it could be done. Unfortunately, it's been over a year since I wrote my last Erlang application, and I didn't want to provide code that might be wrong.

Why or when should I use message queues such as RabbitMQ, ZeroMQ in Erlang?

Hello awesome Erlang community!
I'm making a little project that contains a Client and a Backend. (Complicated... right?) :)
I'm making it in Erlang.
The client and backend will be two separate processes, and I'm wondering whether I need to (or should) use some sort of message queue to get them to interact.
I know I can get them to interact using their PIDs and send messages using the "!" operator.
I guess what I'm trying to say is I'm struggling with finding an answer for this question:
"Why or when should I use message queues such as RabbitMQ, ZeroMQ in Erlang"?
You want to use a messaging library when you need something that the native message passing facility won't provide.
These include:
If you need to guarantee that your messages are processed at least once, exactly once, etc. (i.e. transactions)
If your system load is such that it would be convenient to hold your messages on disk instead of in memory (persistence)
If you need other bells and whistles like security, interop with other systems, complex messaging patterns (routing), etc.
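The "at least once" guarantee in the first point usually rests on acknowledgements: a message stays in flight until the consumer acks it, and anything unacked is redelivered. A minimal in-memory Python sketch, assuming a hypothetical `AtLeastOnceQueue` (no persistence, so it says nothing about the second point):

```python
import itertools
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple


class AtLeastOnceQueue:
    """Messages stay 'in flight' until acknowledged; unacked ones are requeued.
    A consumer crash mid-processing therefore causes redelivery, which can
    produce duplicates: this is at-least-once, not exactly-once."""

    def __init__(self) -> None:
        self._ready: Deque[Tuple[int, Any]] = deque()
        self._in_flight: Dict[int, Any] = {}
        self._tags = itertools.count()

    def publish(self, body: Any) -> None:
        self._ready.append((next(self._tags), body))

    def fetch(self) -> Tuple[int, Any]:
        tag, body = self._ready.popleft()
        self._in_flight[tag] = body
        return tag, body

    def ack(self, tag: int) -> None:
        del self._in_flight[tag]

    def requeue_unacked(self) -> int:
        """E.g. called when a consumer's connection drops; returns the count."""
        for tag, body in sorted(self._in_flight.items()):
            self._ready.append((tag, body))
        count = len(self._in_flight)
        self._in_flight.clear()
        return count


def consume_one(q: AtLeastOnceQueue, handler: Callable[[Any], None]) -> None:
    tag, body = q.fetch()
    handler(body)  # if this raises, the ack below never happens
    q.ack(tag)
```

Getting from here to "exactly once" typically means making handlers idempotent or deduplicating by message ID, which is part of what a real broker or transaction manager takes off your hands.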
I would go for a messaging component when I need to decouple the different layers of my system. A messaging component also lets you apply different integration patterns to your messages/requests, such as topic, fanout, or routing based on headers.
A messaging system is also used for scalability, so you can have multiple instances of the same process running simultaneously and consuming from the same queue.
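The "competing consumers" arrangement just described can be illustrated with Python's standard `queue` and `threading` modules. Threads stand in for what would normally be separate processes or machines attached to a broker queue; `run_competing_consumers` is a hypothetical helper for illustration.

```python
import queue
import threading
from typing import Dict, List


def run_competing_consumers(messages: List[int], n_workers: int) -> Dict[str, List[int]]:
    """Start n_workers threads that all take from ONE shared queue.
    Each message is removed by exactly one worker; returns who handled what."""
    work: queue.Queue = queue.Queue()
    handled: Dict[str, List[int]] = {f"worker-{i}": [] for i in range(n_workers)}

    def worker(name: str) -> None:
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                return
            handled[name].append(item)  # each list is only touched by its owner

    threads = [threading.Thread(target=worker, args=(name,)) for name in handled]
    for t in threads:
        t.start()
    for message in messages:
        work.put(message)
    for _ in threads:
        work.put(None)  # one stop sentinel per worker, queued after all messages
    for t in threads:
        t.join()
    return handled
```

How the messages are split between workers varies from run to run, but every message is handled exactly once. Adding capacity means starting another consumer on the same queue.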
The last thing I want to mention is that RabbitMQ is a message broker, but ZeroMQ is not; it is a messaging library.
If you can sacrifice reliability for performance, use ZeroMQ.
If you need reliability (message persistence, etc.) and can give up some performance, use a brokered solution like RabbitMQ.

What is the performance overhead of Apache ActiveMQ vs. raw sockets?

We're looking to implement ActiveMQ to handle messaging between two of our servers, over a geographically diverse environment (Australia to the UK and back, via the internet).
I've been looking around the net for some rough indicators of performance but so far have had no luck.
My question: compared to a DIY TCP/SSL implementation of basic messaging, how would ActiveMQ perform? Similar systems of our own can send and receive messages across Australia in 100-150 ms over an SSL layer with an already established connection.
Also, does ActiveMQ keep its TLS/SSL connections open, thus saving the substantial amount of time that would otherwise be spent on connection creation/teardown?
What I am hoping is that it will at least perform better than HTTPS, at a per-request level.
I am aware that performance can vary remarkably, depending on hardware, networks, code and so on. I'm just after something to start with.
I know the above is a little fuzzy - if you need any clarification please let me know and I will only be too happy to oblige.
Thank you.
What Tim means is that this is not an apples to apples comparison. If you are solely concerned with the performance of a single point to point connection to transfer data, a direct link will give you a good result (although DIY is still a dubious design decision). If you are building a system that requires the transfer of data and you have more complex functional requirements, then a broker-based messaging platform like ActiveMQ will come into play.
You should consider broker-based messaging if you want:
a post-office style system where a producer sends a message, and knows that it will be consumed at some point, even if there is no consumer there at that time
to not care where the consumer of a message is, or how many of them there are
a guarantee that a message will be consumed, even if the consumer that first handles it dies midway through the process (transactions, redelivery)
many consumers, with a guarantee that a message will only be consumed once - queues
many consumers that will each react to a single message - topics
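The difference between the last two patterns (queues and topics), plus the post-office behaviour of the first, fits in a tiny in-memory Python sketch. `TinyBroker` is hypothetical and deliberately ignores persistence, transactions and networking, which are exactly the hard parts a product like ActiveMQ provides.

```python
from collections import defaultdict, deque
from typing import Any, Callable, DefaultDict, Deque, List, Optional

Callback = Callable[[Any], None]


class TinyBroker:
    """Queue: each message goes to ONE consumer, and waits if none is present.
    Topic: each message goes to EVERY current subscriber."""

    def __init__(self) -> None:
        self._queues: DefaultDict[str, Deque[Any]] = defaultdict(deque)
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def send(self, queue_name: str, message: Any) -> None:
        self._queues[queue_name].append(message)  # held until someone receives

    def receive(self, queue_name: str) -> Optional[Any]:
        pending = self._queues[queue_name]
        return pending.popleft() if pending else None

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver a copy to every subscriber; return how many received it."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])
```

Note that in this sketch a topic message published with no subscribers is simply dropped, while a queued message waits; durable subscriptions are one of the features real brokers add on top.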
These patterns are pretty standard and apply to all off-the-shelf messaging products. As a general rule, DIY in this domain is a bad idea: messaging is complex (see http://www.ohloh.net/p/activemq/estimated_cost for an estimate of how long it would take you to do the same), and there are many existing implementations of various flavours (some without a broker) that are well used, commercially supported and don't require you to maintain them. I would think very hard before going down to the TCP level for any sort of data transfer, as there is so much prior art.

Passing messages between remote MailboxProcessors?

I'm using MailboxProcessor classes in order to keep separate agents that do their own thing. Normally agents can communicate with one another in the same process, but I want agents to talk to one another when they are on separate processes or even different machines. What kind of mechanism is best for implementing communication between them? Is there some standard solution?
Please note that I'm using Ubuntu instances to run the agents.
I think you're going to have to write your own routines to serialize messages, pass them across the process boundaries and then dispatch them on the other side. This will also require an implementation of an ID system where each mailbox has an ID and processes can send messages to IDs instead of just using Mailbox.Send. This is not easy, as local mailboxes will be able to access local memory, but remote mailboxes will not.
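The serialize/address/dispatch scheme described above is sketched below in Python (not F#, and not the MailboxProcessor API). It assumes mailboxes are addressed by a hypothetical `(node, mailbox)` pair and that a `transport` callable delivers bytes to another node; in a real deployment that would be a socket between the Ubuntu instances.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

Address = Tuple[str, str]  # (node name, mailbox id): valid from any node


class Node:
    """Hosts local mailboxes and routes messages addressed by ID.
    Messages for other nodes are serialized to JSON and handed to a transport."""

    def __init__(self, name: str, transport: Callable[[str, bytes], None]) -> None:
        self.name = name
        self._transport = transport
        self._mailboxes: Dict[str, List[Any]] = {}

    def create_mailbox(self, box: str) -> Address:
        self._mailboxes[box] = []
        return (self.name, box)

    def send(self, address: Address, message: Any) -> None:
        node, box = address
        if node == self.name:
            self._mailboxes[box].append(message)  # local: same object, shared memory
        else:
            # Remote: only serializable data crosses the boundary.
            payload = json.dumps({"box": box, "msg": message}).encode("utf-8")
            self._transport(node, payload)

    def on_bytes(self, payload: bytes) -> None:
        """Called by the transport when bytes arrive from another node."""
        envelope = json.loads(payload.decode("utf-8"))
        self._mailboxes[envelope["box"]].append(envelope["msg"])

    def drain(self, box: str) -> List[Any]:
        """Return and clear the messages waiting in a local mailbox."""
        messages, self._mailboxes[box] = self._mailboxes[box], []
        return messages
```

The asymmetry the answer warns about shows up immediately: a tuple sent locally arrives as the same tuple, but the same tuple sent to another node arrives as a JSON list copy.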
I would look at something like RPyC (http://rpyc.wikidot.com/) as it provides a protocol somewhat like you are looking for.
Basically the answer is 'no' there isn't really a good way to do this.

What are the requirements for an application health monitoring system?

What, at a minimum, should an application health-monitoring system do for you (the developer) and/or your boss (the IT Manager) and/or the operations (on-call) staff?
What else should it do above the minimum requirements?
Is monitoring the 'infrastructure' applications (MS Exchange, Apache, etc.) sufficient, or do individual user applications, web sites, and databases also need to be monitored?
If the latter, what do you need to know about them?
ADDENDUM: thanks for the input. I was really looking for application-level monitoring, not infrastructure monitoring, but it is good to know about both.
Whether the application is running.
Unusual CPU/memory/network usage.
Report any unhandled exceptions.
Status of various modules (if applicable).
Status of external components (databases, webservices, fileservers, etc.)
Number of pending background tasks (if applicable).
Maybe track usage of the application and report statistics on the most/least used features, so you know where optimizations are most beneficial.
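Several items on that list (module status, external components, pending background tasks) boil down to running named checks and reporting each one separately. A minimal Python sketch, assuming each check is a zero-argument callable that raises on failure; `run_checks`, `CheckResult` and the example checks are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    healthy: bool
    detail: str = ""


def run_checks(checks: Dict[str, Callable[[], str]]) -> List[CheckResult]:
    """Run each named check. A check signals failure by raising; the exception
    is caught and reported so one broken check cannot hide the others."""
    results: List[CheckResult] = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, True, check()))
        except Exception as exc:
            results.append(CheckResult(name, False, f"{type(exc).__name__}: {exc}"))
    return results


def overall_status(results: List[CheckResult]) -> str:
    return "OK" if all(r.healthy for r in results) else "FAIL"
```

A check for "number of pending background tasks" might compare a count against a limit and raise when it is exceeded; a database check might run a trivial query. The per-check detail is what on-call staff actually need, while `overall_status` is the one-line answer for the boss.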
The answer is 'it depends'. Why do you need to monitor? How large is your operations staff? Do you need reporting? What is the application environment? Who cares if the application fails? Who cares if an exception happens? Are any of the errors recoverable? I could ask questions like these for a long time.
Great question.
Some time ago we looked for an application-level monitoring solution for our needs, without any luck. Popular monitoring solutions are mostly aimed at monitoring infrastructure and, in my opinion, they are too complicated for the requirements of most small and mid-sized companies.
We mainly required the following features:
alerts: we wanted to know about incidents as fast as possible
painless management: a hosted service would be best
visualizations: it's good to know what is going on and to gain insight from the data
Because we didn't find a suitable solution, we started to write our own. We ended up with an up-and-running service called AlertGrid (you can try it for free, of course).
The idea behind it is to provide an easy way to handle custom monitoring scenarios. The integration API is very simple (one function with two required parameters). At the moment, we and others are using it to:
monitor scheduled tasks (cron jobs)
monitor entire application logic execution
alert on errors in applications
we are also working on examples of basic infrastructure monitoring using AlertGrid
This is such an open ended question, but I would start with physical measurements.
1. Are all the machines I think are hosting this site pingable?
2. Are all the machines which should be serving content actually serving some content? (Ideally this would be hit from an external network.)
3. Is each expected service on each machine running?
3a. Have those services run recently?
4. Does each machine have hard drive space left? (Don't forget the db)
5. Have these machines been backed up? When was the last time?
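Checks 4 and 5 are easy to automate locally with Python's standard library. This sketch assumes the newest backup is a single file whose modification time reflects when it was written; the function names and the 10% / 24-hour limits are hypothetical defaults, not recommendations.

```python
import os
import shutil
import time
from typing import Dict, Optional


def free_disk_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is still free (check 4)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def hours_since_backup(backup_file: str, now: Optional[float] = None) -> float:
    """Age of the newest backup, judged by the file's modification time (check 5)."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(backup_file)) / 3600.0


def physical_checks(data_dir: str, backup_file: str,
                    min_free: float = 0.10,
                    max_backup_age_h: float = 24.0) -> Dict[str, bool]:
    return {
        "disk_ok": free_disk_fraction(data_dir) >= min_free,
        "backup_ok": hours_since_backup(backup_file) <= max_backup_age_h,
    }
```

Pointing `data_dir` at the database's data directory covers the "don't forget the db" reminder, since `shutil.disk_usage` reports on whichever filesystem holds that path.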
Once the physical monitoring of the systems is laid out, one can address the checks specific to a system:
1. Can an automated script log in? How long did it take?
2. How many users are live? Have there been a million fake accounts added?
...
These sorts of questions get more nebulous and can be very system-specific. They can also usually be derived reactively when responding to physical measurements: if a hard drive fills up, maybe the web server logs filled up because a bunch of agents created too many fake users. That kind of thing.
While plan A shouldn't necessarily be reactive, that is how many a site sets up its monitoring system.
Minimum: make sure it is running :)
However, some other things would be very useful. For example, the CPU load, RAM usage and (in multiuser systems) which user is running what. Also, for applications that access the network, a list of network connections for each app. And (if you have access to the client computer(s)) it would be cool to be able to see the app's window title, perhaps checking every 2-3 minutes whether it has changed and saving it. A list of files opened by the application could also be very useful, but it is not a must.
I think this is fairly simple - monitor so that you can be warned early enough before something goes wrong. That means monitor dependencies and the application itself.
It's really hard to provide specifics if you're not going to give details on the application you're monitoring, so I'd say use that as a general rule.
At a minimum you want to know that the system is healthy. What counts as healthy is subjective: are the computers up, do the needed resources exist, is data flowing through the system, is the data producing correct results, etc.
In my project we monitor most of this and then some. It really comes down to the highest level at which you can verify that everything is working. In our case we need to know down to the data output. If you only need to know whether the machines are up, it saves you from trying to show an inexperienced end user what is wrong.
There are also off-the-shelf tools that will do a lot of the hard work for you if you aren't looking too deeply into data results. I particularly liked Nagios when I was looking around, but we needed more than it could easily show, so I wrote our own monitoring system. Basically, we also watch for "peculiarities" in the system: memory/CPU spikes, etc.
thanks everyone for the input; I was really looking for application-level monitoring, not infrastructure monitoring, but it is good to know about both
the difference is:
infrastructure monitoring would be servers plus MS Exchange Server, Apache, IIS, and so forth
application monitoring would be user machines and the specific programs that they use to do their jobs, and/or servers plus the data-moving/backend applications that they run to keep the data flowing
sometimes it's hard to draw the line - an oversimplified definition might be "if your team wrote it, it's an application; if you bought it, it's infrastructure"
I think in practice it is best to monitor both
What you need to do is break down the business process of the application and then have the software emit events at major business components. In addition, you'll need to create end-to-end synthetic transactions (e.g. emulating end users clicking on a website). All that data would be fed into a monitoring tool. In the past, I've used JMX in applications, feeding into Tivoli Monitoring's JMX Adapter, and I've written scripts that implement a "fake user" and pipe the results into Tivoli Monitoring's Script Adapter. Tivoli Monitoring takes the data and then creates application health and performance charts from that raw data.
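A "fake user" synthetic transaction, as described above, is essentially an ordered list of steps that are timed and stop at the first failure. Here is a minimal Python sketch, assuming each step is a callable that raises on failure; `run_synthetic_transaction` and its result fields are hypothetical, and feeding the result into Tivoli or any other tool is left out.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_synthetic_transaction(steps: List[Step],
                              clock: Callable[[], float] = time.perf_counter) -> Dict:
    """Execute a scripted 'fake user' flow in order, timing each step.
    Stops at the first failing step; returns a record a monitoring tool
    could ingest (overall status, failed step, error, per-step seconds)."""
    timings: Dict[str, float] = {}
    for name, action in steps:
        start = clock()
        try:
            action()
        except Exception as exc:
            timings[name] = clock() - start
            return {"ok": False, "failed_step": name, "error": str(exc), "timings": timings}
        timings[name] = clock() - start
    failed: Optional[str] = None
    return {"ok": True, "failed_step": failed, "error": None, "timings": timings}
```

Recording per-step timings, not just pass/fail, is what makes the performance charts possible: a login that still succeeds but has slowed from 0.5 s to 5 s is an early warning.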
