I am asking this because any user could decide to upload a large volume of large files and degrade the availability of the system for everyone else.
Is it advisable from a performance perspective to allow WebDAV and IMAP traffic in Alfresco?
From a performance perspective, WebDAV is not a very expensive protocol for Alfresco as long as your users don't use sync tools.
IMAP, by contrast, should be used only for very limited use cases or turned off, because the implementation traverses the whole mount point again and again, since Alfresco doesn't support IDLE mode. Traversing the tree is like a stress test for your database.
In which mode should the Neo4j database be used: embedded or REST server?
My main concerns are:
Performance
Horizontal scaling (HA, clustering) - essential, as the application is very big.
Transactional support (in frameworks like SDN, the Grails plugin, structr, etc.)
Deployment server support, like Amazon, GrapheneDB, etc.
Ease of switching from one to another
Scaling (size of database)
Disclaimer: I'm one of the founders of GrapheneDB.
I'm not an expert in embedded mode, so my answer might be biased, but I will try my best:
Embedded is currently more performant than server.
Clustering is supported in embedded as well as in server.
Transactional support is available in both modes AFAIK. Spring Data, however, currently performs badly over REST/server.
From my POV embedded has the disadvantage of being coupled to your app/server deployment.
There is one more option which you haven't brought up, which is using unmanaged server extensions.
Using extensions you can get the best of both modes:
You write your code on top of the Java API and it's executed locally, so you get extremely good performance.
You can run the server in server mode, making operations easier and also enabling you to host on a separate remote host, on any cloud environment.
GrapheneDB supports unmanaged extensions and it's the option we currently recommend for scenarios where extra performance is needed.
Recently I was reading about the concept of layers in FoundationDB. I like their idea: decoupling storage on one side from access to it on the other.
There are some unclear points regarding the implementation of the layers, especially how they communicate with the storage engine. There are two possible answers: either they are part of the server nodes and communicate with the storage through fast native API calls (e.g. as linked modules hosted in the server process), or they are hosted inside the client application and communicate through a network protocol. For example, the SQL layer of many RDBMSs is hosted on the server. How do things stand with FoundationDB?
PS: These two cases differ from a performance point of view, especially when the client-server communication is high-latency.
To expand on what Eonil said: the answer rests on the distinction between two different senses of "client" and "server".
Layers are not run within the database server processes. They use the FDB client API to make requests of the database, and do not (with one exception*) get to pierce the transactional key-value abstraction.
However, there is nothing stopping you from running the layers on the same physical (or virtual) server machines as the database server processes. And, as that post from the community site mentions, there are use cases where you might very much wish to do this in order to minimize latencies.
*The exception is the Locality API, which is mostly useful in exactly those cases where you want to co-locate client-side layers with the data on which they operate.
Layers are a feature built on top of the client-side library.
Cited from http://community.foundationdb.com/questions/153/what-layers-do-you-want-to-see-first
That's a good question. One reason that it doesn't always make sense
to run layers on the server is that in a distributed database, that
data is scattered--the servers themselves are a network hop away from
a random piece of data, just like the client.
Of course, for something like an analytics layer which is aware of
what data each server contains, it makes sense to run a distributed
version co-located with each of the machines in the FDB cluster.
We intend to design a system with three "tiers".
HQ, with a single server
lots of "nodes" on a regional basis
users, with iPads.
HQ communicates 2-way with the nodes, which communicate 2-way with the users. Users never communicate with HQ, nor vice versa.
The powers that be decree a Windows app from HQ (using Delphi) and a native desktop app for the users' iPads. They have no opinion on the nodes.
If there are compelling technical arguments, I might be able to beat them down from "decree" to "prefer" on the Windows program (and, for instance, make it browser-based). The nodes have no GUI; they just sit there playing middle-man.
What's the best way for these things to communicate (SOAP/HTTP/AJAX/jQuery/home-brewed-protocol-on-top-of-TCP/something-else?) Is it best to use the same protocol end to end, or different protocols for hq<-->node and node<-->iPad?
Both ends of each of those two interfaces might wish to initiate a transaction (which I can easily do if I roll my own protocol), so should I use push/pull/long-poll or what?
I hope that this description makes sense. Please ask questions if it does not. Thanks.
Update:
File size is typically below 1MB, with nothing likely to be above 10MB or even 5MB. No second file will be sent before a first file is acknowledged.
Files flow "downhill" from HQ to node to iPad. Files will never flow "uphill", but there will be some small packets of data (in addition to acks) which are initiated by user action on the iPad. These will go to the local node and then to the HQ. We are probably talking <128 bytes.
I suppose there will also be general control & maintenance traffic at a low rate, in all directions.
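The flow-control rule from the update (no second file is sent before the first one is acknowledged) amounts to a stop-and-wait scheme. Below is a minimal in-memory sketch of that rule only. The names `Receiver`, `deliver`, `send_all` and `max_attempts` are hypothetical, and the transport (sockets, HTTP, a message broker) is deliberately left out.

```python
from collections import deque


class Receiver:
    """Hypothetical receiving end (a node or an iPad): stores a file and acks it."""

    def __init__(self):
        self.received = []

    def deliver(self, file_id, payload):
        self.received.append((file_id, payload))
        return file_id  # the ack simply echoes the file id


def send_all(files, receiver, max_attempts=3):
    """Send files strictly one at a time, waiting for each ack before the next."""
    pending = deque(files)
    acked = []
    while pending:
        file_id, payload = pending[0]
        for _ in range(max_attempts):
            if receiver.deliver(file_id, payload) == file_id:
                break  # ack matches, safe to move on
        else:
            raise RuntimeError(f"no ack for {file_id!r}")
        pending.popleft()
        acked.append(file_id)
    return acked
```

Whatever protocol is chosen in the end, the same loop applies on both hops (HQ to node, node to iPad); only `deliver` would change.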
For push / pull (publish / subscribe or peer-to-peer communication), cross-platform message brokers could be used. I am not sure if there are (iOS) client libraries for Microsoft Message Queue (MSMQ), but I would also evaluate open source solutions like HornetQ, Apache ActiveMQ, Apollo, OpenMQ, Apache Qpid or RabbitMQ.
All these solutions provide a reliable foundation for distributed messaging, with features like failover, clustering and persistence, high performance, and support for many attached clients. On this infrastructure, messages with any content type (JSON, binary, plain text) can be exchanged, and on top of that, messages can carry routing and priority information. They also support transacted messaging.
There are Delphi and Free Pascal client libraries available for many enterprise-quality open source messaging products. (I am the author of some of them, supporting ActiveMQ, Apollo, HornetQ, OpenMQ and RabbitMQ.)
Check out MessagePack: http://msgpack.org/
Also, here's more RPC discussion on SO:
RPC frameworks available?
MessagePack: fast cross-platform serializer and RPC - please share experience
ICE might be of interest to you: http://zeroc.com/index.html
They have an iOS layer: http://zeroc.com/icetouch/index.html
IMHO there are too few requirements stated to decide which technology to use. What data is exchanged, how often, and at what size? Are there request/response time constraints? Never start selecting a technology before you understand your needs deeply.
I am looking for a dedicated server because shared webhosting solutions have some limitations.
I am going to start with one application (web server + db), but in the future I will need more resources for more applications. I am starting small, so the price is very important right now, though the quality is more important.
The requirements are like (not sure what I forgot)
scalable hw resources (memory, hdd, bandwidth)
linux/unix based
able to install programs
ssh
ssl/https
backup solution?
unlimited number of outgoing emails
'simple scripts' ?
server user management
Update
Does the location of the server matter, as I want to target my 'visitors' worldwide?
Well, I don't know where you are from or whether it matters to you where the server is. But I am very happy with the Swiss-based hostfactory (I host some ecommerce solutions there). The support team reacts very fast and you get full control of the server (RDP access on Windows, shell access on Linux).
Check it out here: hostfactory
Hardware resources are scalable via the web interface.
Yes - location matters. If you are going with just one server location, you need to make your best guess as to where most of your visitors are going to come from.
The plumbing of the internet tends to be US centric, so if you are not sure, and have no legal restrictions on where your data can live, that may be your best (and often cheapest) option.
I went for Linode.
I'm really impressed with the power of cloud computing when it comes to the possibility to scale up and down your facilities depending on your load.
How can I shift my paradigm and learn to write my applications in that way? Writing it once and forgetting about it (no matter the future load) would be the best solution.
How can I practice my skills in that area?
Should I set up a virtualization environment where I can add more VMs to a private cloud (via the command line?), driven by some smart algorithm that forecasts the load for a period of time?
Ideally I want to practice it without buying actual Cloud computing services and just on my hardware.
The only thing I want to practice here is scaling app/web roles and/or message queue systems when the current workers have too many jobs in the queue. So let's rule out database scaling from the question's goal as too big a topic.
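To make "too many jobs in queue" concrete, the scaling decision can be reduced to a small rule that is easy to practice on local hardware. This is only a sketch under assumed names: `desired_workers`, `jobs_per_worker` and the min/max bounds are hypothetical and not tied to any particular cloud or queue product.

```python
import math


def desired_workers(queue_length, jobs_per_worker, min_workers=1, max_workers=10):
    """Return how many workers should run for the current queue depth.

    jobs_per_worker is the backlog one worker is assumed to handle comfortably.
    The result is clamped to [min_workers, max_workers].
    """
    needed = math.ceil(queue_length / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

A control loop would call this periodically with the current queue length, then start or stop VMs or worker processes until the running count matches the returned value.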
One option I will throw out is to use a native Cloud execution framework. You might look at CloudIQ Platform. One component is CloudIQ Engine. It allows you to develop cloud native apps in C/C++, Java and .NET. You get the capabilities of scale up by simply adding workers to your cloud. The framework automatically distributes your applications to the new machine(s), and once installed, will begin sending work to them as requests come in. So in effect the cloud handles your queueing issue for you.
Check out the Download and Community links for more information.
You should try AWS. Amazon is offering a free tier that gives you storage, messaging and micro instances (Linux only), so you can start developing small try-outs without paying. Writing an application that scales isn't that hard: try to break your flow into small, concurrent tasks. Client-server applications are even easier: use a load balancer to launch or terminate servers on demand.
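As an illustration of breaking a flow into small, concurrent tasks, here is a minimal sketch using only the Python standard library. The functions `chunked`, `process_chunk` and `process_all` are made up for this example, and the squaring workload is a stand-in for real work. In a cloud setup the thread pool would be replaced by workers pulling from a message queue.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_chunk(chunk):
    """Stand-in for one small, independent task."""
    return sum(x * x for x in chunk)


def process_all(items, chunk_size=100, workers=4):
    """Split the work into chunks, run them concurrently, combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(process_chunk, chunked(items, chunk_size))
        return sum(partials)
```

Because each chunk is independent, adding workers (or machines) changes only how fast the chunks are consumed, not the code that defines them.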