In which mode should the Neo4j database be used: embedded or REST server?
My main concerns are:
Performance
Horizontal scaling (HA, clustering) - essential, as the application is very big
Transactional support (in frameworks like SDN, the Grails plugin, structr, etc.)
Deployment/hosting support (Amazon, GrapheneDB, etc.)
Ease of switching from one mode to the other
Scaling (size of the database)
Disclaimer: I'm one of the founders of GrapheneDB.
I'm not an expert in embedded mode, so my answer might be biased, but I will try my best:
Embedded mode is currently more performant than server mode.
Clustering is supported in embedded mode as well as in server mode.
Transactional support is available in both modes, AFAIK. Spring Data, however, currently has poor performance over REST/server.
From my POV, embedded mode has the disadvantage of being coupled to your app/server deployment.
There is one more option which you haven't brought up: unmanaged server extensions.
Using extensions you can get the best of both modes:
You write your code on top of the Java API and it's executed locally, so you get extremely good performance.
You can run the database in server mode, which makes operations easier and also lets you host it on a separate remote host, in any cloud environment.
GrapheneDB supports unmanaged extensions, and it's the option we currently recommend for scenarios where extra performance is needed.
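To make that concrete, here is a minimal sketch of an unmanaged extension as a JAX-RS resource, written against the Neo4j 2.x Java API; the /friends path, the Person/FRIEND data model, and the JSON shape are illustrative, not from the question. The class is packaged as a JAR on the server's classpath and registered through the org.neo4j.server.thirdparty_jaxrs_classes setting.

import java.util.Collections;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.Response;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Result;
import org.neo4j.graphdb.Transaction;

@Path("/friends")
public class FriendCountResource {

    private final GraphDatabaseService db;

    // The server injects its own embedded database instance.
    public FriendCountResource(@Context GraphDatabaseService db) {
        this.db = db;
    }

    @GET
    @Path("/{name}")
    public Response countFriends(@PathParam("name") String name) {
        // Runs inside the server process, next to the data: one HTTP
        // round trip per request instead of one per traversed node.
        try (Transaction tx = db.beginTx()) {
            Result result = db.execute(
                    "MATCH (p:Person {name: {name}})-[:FRIEND]-() RETURN count(*) AS c",
                    Collections.<String, Object>singletonMap("name", name));
            long count = (Long) result.columnAs("c").next();
            tx.success();
            return Response.ok("{\"friends\":" + count + "}").build();
        }
    }
}

A GET to the mounted path then executes next to the data and returns one small JSON document, instead of the client walking the graph over REST one node at a time.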
Are there major disadvantages to using embedded Firebird 3 in a multi-user application server (Delphi WebBroker) instead of the full-blown server install?
The application usually has very short transactions with low data volume.
As far as I am informed, accessing one database file from multiple threads through the embedded server is not problematic, but user security is not available. Since the application server handles the access rights itself, I do not need Firebird security.
But will I lose performance or things like garbage collection?
Firebird Embedded provides all the features (except network access and authentication) that a normal Firebird server provides. However, because it is in-process, any problems that cause your application to crash will take Firebird with it, and vice versa.
Other possible downsides:
Garbage collection will - as far as I know - always use the 'cooperative' model (where the connection that finds old record versions is the one that cleans them up),
You can't use other tools to access your database remotely, which may make administration harder,
You can't put your database on a separate server from your web application (think of security requirements).
Personally, I would only choose Firebird Embedded if the situation calls for it. In all other situations, I will use Firebird Server.
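For concreteness, here is a minimal sketch of how the two deployment modes look from application code. The question is about Delphi, but the same contrast is visible from any driver, so this uses the Jaybird JDBC driver, with placeholder paths and credentials. Note that the SQL is identical in both modes, which keeps a later switch cheap.

import java.sql.Connection;
import java.sql.DriverManager;

public class FirebirdModes {
    public static void main(String[] args) throws Exception {
        // Embedded: Jaybird loads the Firebird engine into this process.
        // No network hop, but an engine crash takes the app with it.
        try (Connection embedded = DriverManager.getConnection(
                "jdbc:firebirdsql:embedded:C:/data/app.fdb",
                "SYSDBA", "")) {
            System.out.println("embedded: "
                    + embedded.getMetaData().getDatabaseProductVersion());
        }

        // Server: the same database served by a separate Firebird server
        // process, which may live on another host behind a firewall.
        try (Connection remote = DriverManager.getConnection(
                "jdbc:firebirdsql://dbhost:3050/C:/data/app.fdb",
                "SYSDBA", "masterkey")) {
            System.out.println("server: "
                    + remote.getMetaData().getDatabaseProductVersion());
        }
    }
}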
Obviously there are multiple steps and phases to implementing such a thing.
I was thinking I would eventually have a web server that takes HTTP/JSON requests from the iOS app, then queries the Cassandra backend and sends results back. I could still do load balancing and all that fancy stuff, provide a logic layer on the server side, and keep the client app lightweight.
I'm not sure I understand how Cassandra clients fit in, though. It seems like the Cassandra Objective-C client could eliminate the need for the above approach.
I saw another question and answer, but it wasn't clear, perhaps because the answer varies with the need.
An iPhone app should not directly connect to a Cassandra backend or any other DB store.
First of all, talking to a database often requires speaking a very specific binary protocol (for Cassandra in particular, binary CQL or Thrift). Writing an adapter that would let your Objective-C app communicate in this binary protocol is a major piece of work, and could easily cost more effort than the rest of your app. If you hide the DB behind a web server, however, you will be able to select from a variety of existing adapters available in different server-side languages, meaning that you don't need to redo all that low-level work. You'll only be responsible for a relatively small piece of server-side code that translates your REST queries and forwards them to one of the Cassandra adapters (which expose easy-to-use interfaces); see the sketch at the end of this answer.
Secondly, if you wanted to connect to a remote database from the phone, your database server would have to open its ports to the internet at large, which is a very bad security practice, even if you use SSL and user credentials. Again, if you hide behind a web server, you will be putting in a layer of technology that has evolved for decades to remain secure on the public internet.
Finally, having your phone talk to Cassandra directly is a poor architectural pattern. When you write apps that communicate on the internet, you want them to know as little as possible about each other, only how to talk to each other (preferably in a standard protocol). That way you can replace or upgrade individual components while keeping everything else the same. This may not sound like a lot, but is actually the main reason why phones, or web browsers, don't directly talk to databases. (If this setup were a good idea in principle, the first two problems could be easily solved given enough engineering effort.)
The approach you first suggested with JSON and the web server is the only correct way to go.
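To make that thin translation layer concrete, here is a minimal sketch assuming the DataStax Java driver and the JDK's built-in HTTP server; the app keyspace, users table, and /users endpoint are invented for illustration.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class UserService {
    public static void main(String[] args) throws Exception {
        // The driver handles Cassandra's binary protocol for us.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("app");

        HttpServer http = HttpServer.create(new InetSocketAddress(8080), 0);
        // GET /users?id=42 -> {"name":"..."} (naive query parsing, for brevity)
        http.createContext("/users", exchange -> {
            String id = exchange.getRequestURI().getQuery().split("=")[1];
            ResultSet rs = session.execute(
                    "SELECT name FROM users WHERE id = ?", Long.parseLong(id));
            Row row = rs.one();
            String json = (row == null) ? "{}"
                    : "{\"name\":\"" + row.getString("name") + "\"}";
            byte[] body = json.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        http.start();
    }
}

The phone only ever sees the /users URL and a stable JSON shape; the Cassandra contact points, protocol, and schema can all change behind it.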
Use something like a RESTful API; there are many reasons for that.
If your servers' IP addresses change, you have to update all clients; if you add more nodes, you will need to update all clients; and if you upgrade Cassandra and some functions change, your clients will break and you will need to update all of them.
Recently I was reading about the concept of layers in FoundationDB. I like the idea: the decomposition of storage on one side from access to it on the other.
There are some unclear points regarding the implementation of the layers, especially how they communicate with the storage engine. There are two possible answers: they are part of the server nodes and communicate with the storage through fast native API calls (e.g. as linked modules hosted in the server process), or they are hosted inside the client application and communicate over a network protocol. For example, the SQL layer of many RDBMSs is hosted on the server. How do things stand with FoundationDB?
PS: These two cases differ from a performance point of view, especially when the client-server communication is high-latency.
To expand on what Eonil said: the answer rests on the distinction between two different senses of "client" and "server".
Layers are not run within the database server processes. They use the FDB client API to make requests of the database, and do not (with one exception*) get to pierce the transactional key-value abstraction.
However, there is nothing stopping you from running the layers on the same physical (or virtual) machines as the database server processes. And, as that post from the community site mentions, there are use cases where you might very much wish to do this in order to minimize latencies.
*The exception is the Locality API, which is mostly useful in exactly those cases where you want to co-locate client-side layers with the data on which they operate.
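To make "layers use the FDB client API" concrete, here is a trivial counter layer sketched against the FoundationDB Java bindings; the package name follows the current open-source bindings, and the API version and key names are illustrative.

import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.tuple.Tuple;

public class CounterLayer {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(630);
        try (Database db = fdb.open()) {
            // All of the layer's logic runs here, in the client process;
            // the cluster only ever sees transactional gets and sets.
            long next = db.run(tr -> {
                byte[] key = Tuple.from("counters", "page-views").pack();
                byte[] raw = tr.get(key).join();
                long current = (raw == null) ? 0 : Tuple.fromBytes(raw).getLong(0);
                tr.set(key, Tuple.from(current + 1).pack());
                return current + 1;
            });
            System.out.println("counter = " + next);
        }
    }
}

Nothing here is installed on the database servers: the transactional gets and sets inside db.run are the layer's entire interface to the cluster, which is exactly why it can run either far from the data or co-located with it.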
Layers are built on top of the client-side library.
Quoted from http://community.foundationdb.com/questions/153/what-layers-do-you-want-to-see-first:
That's a good question. One reason that it doesn't always make sense to run layers on the server is that in a distributed database, the data is scattered: the servers themselves are a network hop away from a random piece of data, just like the client.
Of course, for something like an analytics layer which is aware of what data each server contains, it makes sense to run a distributed version co-located with each of the machines in the FDB cluster.
Let's assume we have a web project in which we want to have ~10000 web clients connected to the server simultaneously. Let's also assume that one client session lasts about 25 minutes.
If we compare LAMP stack or any other popular web stack/framework (Ruby on Rails with Apache on Linux, etc.) to a web project built in Erlang/OTP - what does Erlang/OTP have in terms of fault tolerance that other frameworks don't?
What event can happen to a client that will cause the whole LAMP stack to crash, while Erlang/OTP will stand its ground?
Note that a typical LAMP stack does employ some fault tolerance. In particular, if a request in the LAMP stack fails, only that request fails, while the rest of the system runs on. This kind of protection allows you to have faults in a single request without them hurting other requests.
Erlang provides this idea of being able to cope with smaller unforeseen errors at a much finer-grained scale. You may have other subsystems in an application, and the same kind of tolerance to errors can be extended to those. You won't get it "for free", but the tooling is there to build a system which is robust. Imagine a client error in the LAMP stack: it will often lead to a disconnect of that client. In Erlang it need not be so; the client can keep on running.
For a system of 10000 clients, Erlang provides the advantage that you can have a process per client, or perhaps 10 processes per client. That is much harder to pull off in many languages, since a process/thread is rather heavy and expensive. Note that interprocess communication between the clients is easy, even if some clients are on another machine (imagine extending to a distributed cluster some day).
If you write your code in a certain way, you can make sure that if a client crashes, for one reason or another, its state is properly cleaned up by other processes. That can avoid lots and lots of small nasty leaks of state as well.
what does Erlang/OTP have in terms of fault tolerance that other frameworks don't?
Now, Erlang/OTP has very minimal, if not zero, side effects. Because of its concurrency model, a web server like Yaws literally spawns a small web server for every connection. If one user is affected by a fault in your web service, no other user will ever notice; only that user's process may exit.
With OTP, you can build applications with a number of supervisors, so that if a server goes down it is restarted, along with many other functions and options you may need.
Erlang's distribution allows us to write distributed applications. I have personally used the Yaws web server in building a web application.
My experience is that whichever web server you pick, say Mochiweb (a tutorial can be found here: http://alexmarandon.com/articles/mochiweb_tutorial/), the performance is quite impressive. In fact, a few years back, an old version of the Yaws web server was benchmarked against the (then) newest version of Apache, and the results of the benchmark were eye-opening. When I went through the million-user comet application series with Mochiweb (which has Parts 2 and 3), I was impressed. These are a few examples of powerful web systems built in Erlang.
And by the way, have you heard of the newer NoSQL databases with REST (HTTP) interfaces developed in Erlang/OTP, e.g. Membase Server (home page: http://www.couchbase.com/products-and-services/membase-server), CouchDB, and Riak? Their performance is very impressive, meaning that their web/REST interface is very stable, and with their impressive, documented write throughput they have proved that their underlying technology (Erlang/OTP) was built not only for high-availability, fault-tolerant systems, but for the web too. Just read through this document: http://blog.couchbase.com/why-membase-uses-erlang
Many more impressive web frameworks built in Erlang can be found listed on the wiki page here: http://en.wikipedia.org/wiki/Erlang_(programming_language). A probably better summary of the features that make it powerful on the web can be found here: http://cs.nyu.edu/~lerner/spring10/projects/Erlang.pdf
I'm really impressed with the power of cloud computing when it comes to scaling your facilities up and down depending on your load.
How can I shift my paradigm and learn to write my applications in that way? Write it once and forget about it (no matter the future load) would be the best solution.
How can I practice my skills in that area?
Should I set up a virtualization environment where I can add more VMs to a private cloud (via the command line?), driven by some smart algorithm that forecasts the load for some period of time?
Ideally I want to practice this without buying actual cloud computing services, just on my own hardware.
The only thing I want to practice here is scaling of app/web roles and/or message-queue systems when the current workers have too many jobs in the queue. So let's rule database scaling out of the question's goal, as it is too big a topic.
One option I will throw out is to use a native cloud execution framework. You might look at the CloudIQ Platform; one component is the CloudIQ Engine. It allows you to develop cloud-native apps in C/C++, Java, and .NET. You get scale-up capability by simply adding workers to your cloud: the framework automatically distributes your applications to the new machine(s) and, once they are installed, begins sending work to them as requests come in. So, in effect, the cloud handles your queueing issue for you.
Check out the Download and Community links for more information.
You should try AWS: Amazon offers a free tier that gives you storage, messaging, and micro instances (Linux only), so you can start developing small try-outs without paying. Writing an application that scales isn't that hard: try to break your flow into small, concurrent tasks. Client-server applications are even easier: use a load balancer to launch/terminate servers on demand.
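Since the stated goal is scaling workers when the queue backs up, here is a minimal sketch of that policy, assuming the AWS SDK for Java (v1), an SQS job queue, and an Auto Scaling group of workers; the queue URL, group name, and thresholds are placeholders, and a real version would need error handling and scale-down hysteresis.

import com.amazonaws.services.autoscaling.AmazonAutoScaling;
import com.amazonaws.services.autoscaling.AmazonAutoScalingClientBuilder;
import com.amazonaws.services.autoscaling.model.SetDesiredCapacityRequest;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.GetQueueAttributesRequest;

public class QueueScaler {
    public static void main(String[] args) throws InterruptedException {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        AmazonAutoScaling asg = AmazonAutoScalingClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs";

        while (true) {
            // How many jobs are waiting for a worker right now?
            int backlog = Integer.parseInt(sqs.getQueueAttributes(
                    new GetQueueAttributesRequest(queueUrl)
                            .withAttributeNames("ApproximateNumberOfMessages"))
                    .getAttributes().get("ApproximateNumberOfMessages"));

            // Naive policy: one worker per 100 queued jobs, capped at 20.
            int desired = Math.min(20, Math.max(1, backlog / 100));
            asg.setDesiredCapacity(new SetDesiredCapacityRequest()
                    .withAutoScalingGroupName("worker-group")
                    .withDesiredCapacity(desired));

            Thread.sleep(60_000); // re-evaluate every minute
        }
    }
}

The same loop works against any private-cloud API (libvirt, a local OpenStack, etc.) if you want to practice on your own hardware first; only the two client calls change.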