FoundationDB layers: are they hosted in the client application or on server nodes?

Recently I was reading about the concept of layers in FoundationDB. I like the idea: the decomposition of storage on one side from access to it on the other.
There are some unclear points regarding the implementation of layers, especially how they communicate with the storage engine. There are two possible answers: they are part of the server nodes and communicate with the storage through fast native API calls (e.g. as linked modules hosted in the server process) -OR- they are hosted inside the client application and communicate through a network protocol. For example, the SQL layer of many RDBMSs is hosted on the server. How are things with FoundationDB?
PS: These two cases are different from a performance point of view, especially when the client-server communication is high-latency.

To expand on what Eonil said: the answer rests on the distinction between two different senses of "client" and "server".
Layers are not run within the database server processes. They use the FDB client API to make requests of the database, and do not (with one exception*) get to pierce the transactional key-value abstraction.
However, there is nothing stopping you from running the layers on the same physical (or virtual) machines as the database server processes. And, as the post from the community site mentions, there are use cases where you might very much wish to do this in order to minimize latencies.
*The exception is the Locality API, which is mostly useful in exactly those cases where you want to co-locate client-side layers with the data on which they operate.
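To make this concrete, here is a minimal sketch of a layer in Python, assuming the official fdb client binding (the API version, key schema, and function names are illustrative, not from the question). Everything goes through the transactional key-value API, regardless of where the process runs; the Locality call at the end is the one exception mentioned above:

    import fdb

    fdb.api_version(630)  # assumed binding version
    db = fdb.open()       # connects as a client, wherever this process runs

    @fdb.transactional
    def set_user(tr, user_id, name):
        # A "layer" encodes application-level structure onto ordered key-value pairs.
        tr[fdb.tuple.pack(('user', user_id))] = name.encode('utf-8')

    @fdb.transactional
    def get_user(tr, user_id):
        value = tr[fdb.tuple.pack(('user', user_id))]
        return value.decode('utf-8') if value.present() else None

    set_user(db, 42, 'alice')

    # The exception: the Locality API exposes data placement, so a layer
    # co-located with the cluster can route work to where the data lives.
    shard_boundaries = list(fdb.locality.get_boundary_keys(db, b'', b'\xff'))

Whether this code runs on your laptop or on the same machine as a storage server, it remains a client of the cluster.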

Layers are a feature built on top of the client-side library.
Quoted from http://community.foundationdb.com/questions/153/what-layers-do-you-want-to-see-first:
That's a good question. One reason that it doesn't always make sense
to run layers on the server is that in a distributed database, that
data is scattered--the servers themselves are a network hop away from
a random piece of data, just like the client.
Of course, for something like an analytics layer which is aware of
what data each server contains, it makes sense to run a distributed
version co-located with each of the machines in the FDB cluster.

Related

Question regarding Monolithic vs. Microservice Architecture

I'm currently rethinking an architecture I was planning.
So suppose I have a system where there are about 8 different services interacting with a single database. Some services listen and react to database events and do stuff like sending SMS.
Then there's an API layer sitting on top of the database and a frontend connected to this API. So in my understanding this is rather monolithic.
In fact I don't see any advantage of using containers in this scenario. Their real advantage is that they can be swapped out, right? My intuition tells me that there is often no purpose in doing that except maybe some load balancing on API level. Instead many companies just seem to blindly jump on the hype train of containerizing everything.
Now the question arises: is Docker the right tool for this context? In every forum, people advise against using Docker solely as a more resource-efficient "VM" aggregating all services within a single container. However, this is the only real scenario where I'd see any advantage in using Docker (the environment, e.g. Alpine Linux, is the same on all customers' computers when rolling out the system).
Even docker-compose does not "group" containers together into a complete system that exposes only port 443; instead, it starts an infrastructure of multiple interacting containers. Oftentimes services like Kubernetes are then used to deploy these infrastructures onto "nodes", i.e. VMs.
However, in my opinion it would be great to have a single self-contained container without putting everything into a VM. This container would include every necessary service and expose only one port, e.g. 443.
Since I'm rather confused now, I'd really appreciate your help here.
Thanks in advance!
Kubernetes does many things and has many useful features. But Kubernetes also requires that you architect your apps to follow The Twelve-Factor App principles. An important one here is that your apps are stateless.
When the app is stateless, it is easy to scale out horizontally - this can also be done automatically when the load increases.
When the app is stateless, it is easy to do Rolling Deployments that upgrade the app to a new version without downtime.
You can run containers on bare-metal Linux servers, but this mostly makes sense for very big servers. If you use a cloud, you probably want several VM instances instead, distributed across three Availability Zones for increased availability.
"Self-contained container - exposing one port". With Kubernetes, you typically use a private network and you only expose services via a single load balancer - typically on a port, but different URLs send traffic to different services.
Some services listen and react to database events and do stuff like sending SMS.
As I said, many things are easier when an app is horizontally scalable, but this kind of app - one that listens for events and reacts - is one of the few cases where you cannot scale horizontally. It is, however, a good fit for a serverless architecture instead, possibly on Kubernetes using Knative.
Now the question arises, is docker the right tool for this context?
My opinion is that most workloads will run in containers. It is more a question of how they should be run in Kubernetes: with one or multiple replicas, as stateless Deployments, as stateful StatefulSets, or in some other way.
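To illustrate the stateless-Deployment option, here is a minimal sketch using the official Kubernetes Python client (the image name, labels, and replica count are made-up values, not taken from the question):

    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() when running in-cluster
    apps = client.AppsV1Api()

    # Three identical, stateless replicas: Kubernetes can scale this out
    # automatically and roll out new versions without downtime.
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="api"),
        spec=client.V1DeploymentSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "api"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "api"}),
                spec=client.V1PodSpec(containers=[
                    client.V1Container(
                        name="api",
                        image="example/api:1.0",  # hypothetical image
                        ports=[client.V1ContainerPort(container_port=443)],
                    ),
                ]),
            ),
        ),
    )
    apps.create_namespaced_deployment(namespace="default", body=deployment)

The event-listening SMS service from the question would not fit this pattern with multiple replicas (each event would be handled several times); it would run as a single replica or behind a work queue.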

How do I call Web API from MVC without latency?

I'm thinking about moving my DAL which uses DocumentDb and Azure Table Storage to a separate Web API and host it as a cloud service on Azure.
The primary purpose of doing this is to make sure that I keep a high-performance DAL that can scale up easily and independently of my front-end application (currently ASP.NET MVC 5 running as a cloud service on Azure, though I'll definitely add mobile apps as well). With DocumentDb and Azure Table Storage, I'm finding myself doing a lot of data handling in my C# code, so I think it would be a good idea to keep that separate from my front-end application.
However, I'm very concerned about latency issues introduced by HTTP calls from one cloud service to another which would defeat the purpose of separating DAL into its own application/cloud service.
What is the best way to separate my DAL from my front-end application without introducing any latency issues?
I think the trade-off between scaling out/partitioning resources and network latency is unavoidable. That being said, you may find the trade-off well worth it for many reasons (e.g. enabling parallel execution of application tasks, increased reliability, etc.) when working w/ large-scale systems.
Here are some general tips to help you minimize the hit on network latency:
Use caching to avoid cross-service calls whenever possible (see the sketch below this list).
Batch cross-service calls and re-use connections whenever possible to minimize the cost associated w/ traversing the NAT out of one cloud service and through the load balancer into another. Note - your application must also be able to handle dropped connections (inevitable in cloud architecture).
Monitor performance metrics as much as possible to take measurements and identify bottlenecks.
Co-locate your application layers within the same datacenter to keep cross-service latency to a minimum.
You may also find the following literature useful: http://azure.microsoft.com/en-us/documentation/articles/best-practices-performance/
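To make the caching and connection re-use tips concrete, here is a minimal Python sketch (the service URL, route, and TTL are hypothetical): one shared requests.Session keeps connections alive across calls, and a small TTL cache skips the network hop entirely on repeated reads.

    import time
    import requests

    DAL_BASE_URL = "https://my-dal-service.example.net"  # hypothetical endpoint
    session = requests.Session()  # keep-alive: re-uses TCP/TLS connections

    _cache = {}               # doc_id -> (expires_at, value)
    CACHE_TTL_SECONDS = 30    # illustrative value; tune per data volatility

    def get_document(doc_id):
        """Fetch a document from the DAL service, with a simple TTL cache."""
        now = time.time()
        hit = _cache.get(doc_id)
        if hit and hit[0] > now:
            return hit[1]  # served from cache: no network hop at all
        # Dropped connections surface here as exceptions; retry upstream.
        response = session.get(f"{DAL_BASE_URL}/documents/{doc_id}", timeout=5)
        response.raise_for_status()
        value = response.json()
        _cache[doc_id] = (now + CACHE_TTL_SECONDS, value)
        return value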
I recently split out my DAL to a WebAPI that serves data from DocumentDB for both the MVC website and mobile applications for the same reasons stated by the questioner.
The statements from aliuy are valid performance considerations generally accepted as good practice.
But more specifically: in order to call a Web API from MVC without latency using Azure cloud services, one should specify the same affinity group for each resource (websites, cloud services, etc.).
Affinity groups are a way you can group your cloud services by
proximity to each other in the Azure datacenter in order to achieve
optimal performance. When you create an affinity group, it lets Azure
know to keep all of the services that belong to your affinity group as
physically close to each other as possible.
https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-migrate-to-regional-vnet/

Rails app with multiple subdomains on separate machines

I've got a Rails application that has 3 main parts:
www.example.com: This is the main website
api.example.com: The API
dashboard.example.com: The dashboard interface for signed up users
I've currently got them set up in a single Rails app with namespaces, and they share models. I recently got accepted on the RackSpace startup program, which gives me $2000 (!) worth of FREE cloud storage each month, so I thought I'd distribute the app into smaller pieces so that the subdomains are hosted on separate servers.
How can I do that without deploying the same code to 3 different servers? I reviewed this related question, but it seemed to imply using a single Git repo for all three "projects", and I wasn't sure how that would get deployed.
There are two very different goals intertwined here:
One is distributing the load over multiple machines. You don't need to break your application into multiple apps for that: you could use a load balancer (incoming requests are spread across multiple machines running the same app), or split your app up.
The other is more about the logical architecture of your app. In general, the bigger an app is, the more complicated it is, and it's easier to work on simple apps. Therefore you might want to split your app up into smaller apps. This is orthogonal to the hardware issue: you could choose to split your app up three ways but still run them all on the same machine (though it does, of course, give you the flexibility to allocate resources differently to each app).
Coming back to your question, one aspect worth exploring is whether there should be any shared models at all. To what extent can (for example) your dashboard query your API (possibly private APIs) rather than querying the database directly? Splitting your application into multiple services makes some things easier and some things harder, but it does help you keep individual applications smaller and the boundaries between them clean, without any hidden dependencies. This logical split doesn't have to correspond to the physical layout: one server may host several applications.
Lastly, the most direct answer to your question is to write a Rails engine. An engine can contain pretty much anything an app can (applications are special cases of engines), for example models, views, controllers, routes, etc. You'd create one with your common code, package it up as a gem (it doesn't have to be a public one), and add it to each app's Gemfile.
Load Balancer
For lack of more experience, I would say what my gut is telling me: simply, you don't need any more servers (unless of course you have massive throughput).
You may be better off putting this on the SuperUser community, but I personally believe you'll be better served by a load balancer (although I've never used one before), to give you the capacity to spread the load of your apps across multiple server instances.
Quite how this works in detail is beyond me, but I would certainly look at that far sooner than trying to split your app between servers:
Load balancing is a computer networking method for distributing
workloads across multiple computing resources, such as computers, a
computer cluster, network links, central processing units or disk
drives. Load balancing aims to optimize resource use, maximize
throughput, minimize response time, and avoid overload of any one of
the resources. Using multiple components with load balancing instead
of a single component may increase reliability through redundancy.
Load balancing is usually provided by dedicated software or hardware,
such as a multilayer switch or a Domain Name System server process.
Load balancing is differentiated from channel bonding in that load
balancing divides traffic between network interfaces on a per network
socket (OSI model layer 4) basis, while channel bonding implies a
division of traffic between physical interfaces at a lower level,
either per packet (OSI model layer 3) or on a per data link (OSI
model layer 2) basis.
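Stripped to its core, round-robin load balancing is nothing more than the following toy Python sketch (the backend addresses are made up, and in practice you would use nginx, HAProxy, or your host's balancer rather than rolling your own):

    import itertools
    import requests

    # Hypothetical backends, each running the same Rails app.
    BACKENDS = [
        "http://10.0.0.1:3000",
        "http://10.0.0.2:3000",
        "http://10.0.0.3:3000",
    ]
    _next_backend = itertools.cycle(BACKENDS)

    def forward(path):
        """Send each incoming request to the next backend in turn."""
        backend = next(_next_backend)
        return requests.get(f"{backend}{path}", timeout=5)

Because every backend runs the same code, you deploy once per machine from the same repo; there is no need to split the app to get this kind of scaling.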

MassTransit in ASP.NET MVC site?

I'd like to decouple a number of business objects that my website uses to support user actions.
My website is a SaaS/B2B site and I do not anticipate a need for "mega scale". My primary issue is a need to decouple business objects from each other, and to perform occasional longer-running operations asynchronously, outside of the threads that handle user traffic.
Having said that, I really do not want a separate set of servers that process my messages, and would prefer for the web servers to just host MassTransit (or other bus software) internally, in memory. Assured message delivery is also not yet my most important concern at this point. I plan to "outsource" a number of supporting business actions to the bus so that they do not pollute my main business services/objects.
Is this possible? Do I need the loopback transport for now, or do I need full RabbitMQ? Will RabbitMQ require me to install yet another set of servers to host it?
TIA
Loopback is just for testing. Installing RabbitMQ is the right path. You don't NEED different servers for it, but I would suggest them: if you offload work to a bus, you don't really want that work contending for resources with the website. That said, you can run RabbitMQ locally without any issue. If message volume is low, so is resource usage in RabbitMQ. When you reach higher volumes, IO can be a problem with RabbitMQ (or any MQ).
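MassTransit itself is .NET, so purely to illustrate the shape of the interaction, here is a minimal Python sketch using the pika client to publish a background job to a RabbitMQ broker running on the same box (the queue name and payload are made up):

    import pika

    # Connect to a RabbitMQ broker hosted locally on the web server itself.
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="background-work", durable=True)

    # Hand a longer-running operation to the bus instead of the request thread.
    channel.basic_publish(
        exchange="",
        routing_key="background-work",
        body=b'{"action": "send-welcome-email", "user_id": 42}',
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )
    connection.close()

A separate consumer process (or the same web server, if resources allow) would read from the queue and do the actual work.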

Should an iPhone app communicate directly with a Cassandra backend?

Obviously there are multiple steps and phases of implementing such a thing.
I was thinking I would eventually have a web server that takes HTTP JSON requests from the iOS app, then queries the Cassandra backend and sends results back. I could still load balance and all that fancy stuff, provide a logical layer on the server side, and keep the client app lightweight.
I'm not sure I understand how Cassandra clients fit in, though. It seems like the Cassandra Objective-C client could eliminate the need for the above approach.
I saw another question and answer, but it wasn't clear, perhaps because the answer varies with the need.
An iPhone app should not directly connect to a Cassandra backend or any other DB store.
First of all, talking to a database often requires speaking a very specific binary protocol (for Cassandra in particular, binary CQL or Thrift). Writing an adapter that would let your Objective-C app communicate in this binary protocol is a major piece of work, and could easily cost more effort than the rest of your app. If you hide the DB behind a web server, however, you will be able to select from a variety of existing adapters available in different server-side languages, meaning that you don't need to redo all that low-level work. You'll only be responsible for a relatively small piece of server-side code that translates your REST queries and forwards them to one of the Cassandra adapters (which expose easy-to-use interfaces).
Secondly, if you wanted to connect to a remote database from the phone, your database server would have to open its ports to the internet at large, which is a very bad security practice, even if you use SSL and user credentials. Again, if you hide behind a web server, you will be putting in a layer of technology that has evolved for decades to remain secure on the public internet.
Finally, having your phone talk to Cassandra directly is a poor architectural pattern. When you write apps that communicate on the internet, you want them to know as little as possible about each other, only how to talk to each other (preferably in a standard protocol). That way you can replace or upgrade individual components while keeping everything else the same. This may not sound like a lot, but is actually the main reason why phones, or web browsers, don't directly talk to databases. (If this setup were a good idea in principle, the first two problems could be easily solved given enough engineering effort.)
The approach you first suggested with JSON and the web server is the only correct way to go.
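As a sketch of how thin that server-side translation layer can be, here is a hypothetical Python example using Flask and the DataStax Cassandra driver (the keyspace, table, column names, and address are made up): the phone speaks JSON over HTTP, and only this server speaks CQL to Cassandra.

    from flask import Flask, jsonify
    from cassandra.cluster import Cluster

    app = Flask(__name__)

    # Only this web server can reach Cassandra; the DB port is never
    # exposed to the internet at large.
    cluster = Cluster(["10.0.0.5"])           # internal address (made up)
    session = cluster.connect("my_keyspace")  # hypothetical keyspace

    @app.route("/users/<user_id>")
    def get_user(user_id):
        # Translate the REST query into CQL and forward it to the adapter.
        row = session.execute(
            "SELECT user_id, name FROM users WHERE user_id = %s", (user_id,)
        ).one()
        if row is None:
            return jsonify(error="not found"), 404
        return jsonify(user_id=row.user_id, name=row.name)

The iOS app only ever sees the /users/<id> JSON contract, so you can swap or upgrade Cassandra behind it without touching the client.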
Use something like a RESTful API; there are many reasons for that:
If your servers' IP addresses change, you have to update all clients. If you add more nodes, you will need to update all clients. If you decide to upgrade your Cassandra and some functions change, your clients will break and you will need to update all clients.
