I'd like to decouple a number of business objects that my website is using to support actions of the users.
My website is a SaaS/B2B site and I do not anticiapte to have a need for "mega scale". My primary issue is a need to decouple business objects from each other, and perform occasional longer-running operations asynchronously - outside of execution of threads that handle user traffic.
Having said that, I really do not want to have a separate set of servers that process my messages, and would prefer for web servers to just host MassTransit or other Bus software) internaly in memory. Assured message delivery (at this point) is also not yet my most important concenrn. I plan to "outsorce" a number of supporting business actions to the bus so that they do not pollute my main business services/objects.
Is this possible? Do I need Loopback for now as a transport or do I need full RabbitMq? Will RabbitMQ require me to install yet another set of servers to host it?
TIA
Loopback is just for testing. Installing RMQ is the right path. You don't NEED different servers for it, but would suggest it. If you off load work to a bus, you don't really want that contending with resources for the website. Given that, you can run RMQ locally without any issue. It message volume is low, so is resource usage in RMQ. When you reacher higher volumes, IO can be a problem with RabbitMQ (or any MQ).
Related
What are the core differences between service worker and AppCache. What are the pros and cons of each and when to prefer one over another .
The primary difference is that AppCache is a high-level, declarative API, with which you specify the set of resources you'd like the browser to cache; whereas Service Worker is a low-level, imperative, event-driven API with which you write a script that can intercept fetch events and cache their responses along with doing other things (like displaying push notifications).
The pros and cons are largely a function of API design: theoretically, AppCache is easier to use, while having more limited use cases; whereas Service Worker is harder to use, but is more flexible.
Nevertheless, AppCache is considered hard to use in practice due to poor design (see Application Cache Is A Douchebag for a list of design issues). And it has been deprecated, so it is being removed from browsers (per Using the application cache).
Thus the only reason to prefer AppCache is to offline an app on browsers that don't yet support Service Worker, as Kenneth Ormandy recommends in Don’t Wait for ServiceWorker: Adding Offline Support with One-Line.
Compare Can I use Service Workers? to Can I use Offline web applications? to see the differences in browser support. But note that browsers that support Service Worker, like Chrome and Firefox, are removing support for AppCache, so you'll need to implement both to offline your app across all browsers that support either standard.
In addition of what Myk Melez said, One of the main benefits of Service Workers against Application Cache is that Application Cache only works when user is disconnected from the network, so you can not manage situations of:
1- "slow network" - Your connection signal is strong, however some external entities (server, routes, etc) are delaying the transmission to your specific application.
2- "Lie-fi" (your phone shows is connected to a wi-fi or a cell network with low signal) so it seems to be connected when actually is not.
Service Workers is like a middle ware giving you control over the requests the browser is making, you can actually intercept the request and respond wherever you want, no matter you are connected or not. So you can implement "offline first" principle.
Recently I was reading about concept of layers in FoundationDB. I like their idea, the decomposition of storage from one side and access to it from other.
There are some unclear points regarding implementation of the layers. Especially how they communicate with the storage engine. There are two possible answers: they are parts of server nodes and communicate with the storage by fast native API calls (e.g. as linked modules hosted in the server process) -OR- hosted inside client application and communicate through network protocol. For example, the SQL layer of many RDBMS is hosted on the server. And how are things with FoundationDB?
PS: These two cases are different from the performance view, especially when the clinent-server communication is high-latency.
To expand on what Eonil said: the answer rests on the distinction between two different sense of "client" and "server".
Layers are not run within the database server processes. They use the FDB client API to make requests of the database, and do not (with one exception*) get to pierce the transactional key-value abstraction.
However, there is nothing stopping your from running the layers on the same physical (or virtual) server machines as the database server processes. And, as that post from the community site mentions, there are use cases where you might very much wish to do this in order to minimize latencies.
*The exception is the Locality API, which is mostly useful in exactly those cases where you want to co-locate client-side layers with the data on which they operate.
Layers are on top of client-side library feature.
Cited from http://community.foundationdb.com/questions/153/what-layers-do-you-want-to-see-first
That's a good question. One reason that it doesn't always make sense
to run layers on the server is that in a distributed database, that
data is scattered--the servers themselves are a network hop away from
a random piece of data, just like the client.
Of course, for something like an analytics layer which is aware of
what data each server contains, it makes sense to run a distributed
version co-located with each of the machines in the FDB cluster.
I have a grails app running with a single rabbit node. It is great. I want to fire up the same app a second time on the same machine on a different port. Currently, both apps answer jobs from both apps. I want their rabbits to be independent. What is the easiest way to ensure that each app only responds to the messages it sends? Multiple rabbit queues?
You can provide a virtualhost entry in the grails configuration:
rabbitmq.connectionfactory.virtualHost The name of the virtual host to connect to
Define two different vhosts in RabbitMQ, and each grails app will have their very own configured area to use. Messages sent through one vhost will only be available on that vhost, effectively separating the two grails apps without having to change queue setup or other internal parts of each app - just the configuration of the connection.
Remember that access control is performed on a per vhost basis, so you'll have to give your user access to each vhost in rabbitmq.
As #fiskfisk said, multiple vhosts is an option, and would work particularly well if you have a complex set of queues, exchanges, and bindings. There are some downsides to using a new vhost for the second application, including duplication of access control management, as well as some minor performance overhead.
If you have a fairly simple queue/exchange/binding setup, I would suggest pointing the second app at a queue with a different name, or giving your app the ability to be runtime-configured to either use a different queue, or to leverage the topic-based routing within RabbitMQ and have each app flag their messages with an app-specific prefix (or something similar).
One advantage of using topic routing to differentiate apps is that you can easily dip into the full stream of messages and do other things with that stream that you didn't foresee initially, including things like archival logging or audit logging, as well as other metrics collection or analysis.
tl;dr;
For long-term flexibility, have each instance of your application send messages to queues based on topic-routing.
For quick-and-dirty / get-it-working-yesterday, use a separate vhost for each instance of your application.
I am putting together a REST API and as I'm unsure how it will scale or what the demand for it will be, I'd like to be able to rate limit uses of it as well as to be able to temporarily refuse requests when the box is over capacity or if there is some kind of slashdotted scenario.
I'd also like to be able to gracefully bring the service down temporarily (while giving clients results that indicate the main service is offline for a bit) when/if I need to scale the service by adding more capacity.
Are there any best practices for this kind of thing? Implementation is Rails with mysql.
This is all done with outer webserver, which listens to the world (i recommend nginx or lighttpd).
Regarding rate limits, nginx is able to limit, i.e. 50 req/minute per each IP, all over get 503 page, which you can customize.
Regarding expected temporary down, in rails world this is done via special maintainance.html page. There is some kind of automation that creates or symlinks that file when rails app servers go down. I'd recommend relying not on file presence, but on actual availability of app server.
But really you are able to start/stop services without losing any connections at all. I.e. you can run separate instance of app server on different UNIX socket/IP port and have balancer (nginx/lighty/haproxy) use that new instance too. Then you shut down old instance and all clients are served with only new one. No connection lost. Of course this scenario is not always possible, depends on type of change you introduced in new version.
haproxy is a balancer-only solution. It can extremely efficiently balance requests to app servers in your farm.
For quite big service you end-up with something like:
api.domain resolving to round-robin N balancers
each balancer proxies requests to M webservers for static and P app servers for dynamic content. Oh well your REST API don't have static files, does it?
For quite small service (under 2K rps) all balancing is done inside one-two webservers.
Good answers already - if you don't want to implement the limiter yourself, there are also solutions like 3scale (http://www.3scale.net) which does rate limiting, analytics etc. for APIs. It works using a plugin (see here for the ruby api plugin) which hooks into the 3scale architecture. You can also use it via varnish and have varnish act as a rate limiting proxy.
I'd recommend implementing the rate limits outside of your application since otherwise the high traffic will still have the effect of killing your app. One good solution is to implement it as part of your apache proxy, with something like mod_evasive
What, at a minimum, should an application health-monitoring system do for you (the developer) and/or your boss (the IT Manager) and/or the operations (on-call) staff?
What else should it do above the minimum requirements?
Is monitoring the 'infrastructure' applications (ms-exchange, apache, etc.) sufficient or do individual user applications, web sites, and databases also need to be monitored?
if the latter, what do you need to know about them?
ADDENDUM: thanks for the input, i was really looking for application-level monitoring not infrastructure monitoring, but it is good to know about both
Whether the application is running.
Unusual cpu/memory/network usage.
Report any unhandled exceptions.
Status of various modules (if applicable).
Status of external components (databases, webservices, fileservers, etc.)
Number of pending background tasks (if applicable).
Maybe track usage of the application and report statistics on most/less used functionalities so you know where optimizations are most beneficial.
The answer is 'it depends'. Why do you need to monitor? How large is your operations staff? Do you need reporting? What is the application environment? Who cares if the application fails? Who cares if an exception happens? Are any of the errors recoverable? I could ask questions like these for a long time.
Great question.
We've been looking for some application-level monitoring solution for our needs some time ago without any luck. Popular monitoring solution are mostly addressed to monitor infrastrcture and - in my opinion - they are too complicated for a requirements of most of small and mid-sized companies.
We required (mainly) following features:
alerts - we wanted to know about
incident as fast as possible
painless management - hosted service wouldbe
the best
visualizations - it's good to know what is going on and take some knowledge from the data
Because we didn't find suitable solution we started to write our own. Finally we've ended with up-and-running service called AlertGrid. (You can check it for free of course.)
The idea behind it is to provide an easy way to handle custom monitoring scenarios. Integration API is very simple (one function with two required parameters). At the momment we and others are using it for:
monitor scheduled tasks (cron jobs)
monitor entire application logic execution
alert on errors in applications
we are also working on examples of basic infrastructure monitoring using AlertGrid
This is such an open ended question, but I would start with physical measurements.
1. Are all the machines I think are hosting this site pingable?
2. Are all the machines which should be serving content actually serving some content? (Ideally this would be hit from an external network.)
3. Is each expected service on each machine running?
3a. Have those services run recently?
4. Does each machine have hard drive space left? (Don't forget the db)
5. Have these machines been backed up? When was the last time?
Once one lays out the physical monitoring of the systems, one can address those specific to a system?
1. Can an automated script log in? How long did it take?
2. How many users are live? Have there been a million fake accounts added?
...
These sorts of questions get more nebulous, and can be very system specific. They also usually can be derived reactively when responding to phsyical measurements. Hard drive fill up, maybe the web server logs got filled up because a bunch of agents created too many fake users. That kind of thing.
While plan A shouldn't necessarily be reactive, it is the way many a site setup a monitoring system.
Minimum: make sure it is running :)
However, some other stuff would be very useful. For example, the CPU load, RAM usage and (in multiuser systems) which user is running what. Also, for applications that access network, a list of network connections for each app. And (if you have access to client computer(s)) it would be cool to be able to see the 'window title' of the app - maybe check each 2-3 minutes if it changed and save it. Also, a list of files open by the application could be very useful, but it is not a must.
I think this is fairly simple - monitor so that you can be warned early enough before something goes wrong. That means monitor dependencies and the application itself.
It's really hard to provide specifics if you're not going to give details on the application you're monitoring, so I'd say use that as a general rule.
At a minimum you want to know that the system is healthy. This is subjective in what defines your system is healthy. Is it computers are up, the needed resources exist, the data is flowing through the system, the data is properly producing results, etc, etc.
In my project we do monitoring of most of this and then some. It really comes down to what is the highest level that you can use to analyze that everything is working. In our case we need to know down to the data output. If you just need to know down to the are these machines up it saves you on trying to show an inexperienced end user what is wrong.
There are also "off the shelf" tools that will do a lot of the hard work for you if you are just looking too hard into data results. I particularly liked Nagios when I was looking around but we needed more than it could easily show so I wrote our own monitoring system. Basically we also watch for "peculiarities" in the system, memory / cpu spikes, etc...
thanks everyone for the input, i was really looking for application-level monitoring not infrastructure monitoring, but it is good to know about both
the difference is:
infrastructure monitoring would be servers plus MS Exchange Server, Apache, IIS, and so forth
application monitoring would be user machines and the specific programs that they use to do their jobs, and/or servers plus the data-moving/backend applications that they run to keep the data flowing
sometimes it's hard to draw the line - an oversimplified definition might be "if your team wrote it, it's an application; if you bought it, it's infrastructure"
i think in practice it is best to monitor both
What you need to do is to break down the business process of the application and then have the software emit events at major business components. In addition, you'll need to create end to end synthetic transactions (eg. emulating end users clicking on a website). All that data would be fed into an monitoring tool. In the past, I've done JMX for applications of which flowed into Tivoli Monitoring's JMX Adapter and then I've done scripts that implement a "fake user" and then pipe in the results into Tivoli Monitoring's Script Adapter. Tivoli Monitoring takes the data and then creates application health and performance charts from that raw data.