Is Erlang a bad language for this app?

I am building a framework for realtime web applications. I started it in Elixir, because
that is the modern way to develop applications for the Erlang VM. Erlang is supposed to be good if you need concurrent, fault-tolerant, scalable apps (something like a web server, etc.). That is exactly what I need.
Question: a realtime framework always needs, for instance, to keep track of who is interested in what. This will be accomplished using the publish/subscribe pattern. So I will have 1000 clients subscribing to the topic "newest-message". I need to store those clients (the pid of the process representing each client) somewhere, so I can reach them later when content for the topic "newest-message" appears.
This is where I am unsure whether Erlang is really a good fit for my framework.
ETS is probably the only option for storing shared data, but ETS always copies records when you write or read them. That means copying 1000 pids every time I need to access them (instead of just iterating over a list, as I would in C/Java/Python).
Won't this become a serious bottleneck if I keep copying that many records out of ETS (many clients, many subscriptions, etc.)?

Sharing state may be a sign of bad design. You can, for example, have one process per queue/topic that stores its own list of subscribers. You send a message to that topic process, and it in turn sends the message to the clients. This way, you never copy the whole subscriber list.
If you need to process the subscribers in parallel, you can split the list between several processes.
Erlang achieves its fault tolerance precisely because it doesn't let you share state; you have to put more thought into a design that avoids shared state yet stays efficient. That pays off in the long run, so Erlang/Elixir is definitely a good fit for this kind of app. Just look at RabbitMQ.
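A minimal sketch of such a topic process (module, function and message names here are only illustrative, not from any particular framework):

    %% One process per topic: it owns its subscriber list and fans
    %% messages out to subscribers. The list never leaves this process,
    %% so nothing is copied except the published message itself.
    -module(topic).
    -export([start/0, subscribe/2, publish/2]).

    start() ->
        spawn(fun() -> loop([]) end).

    subscribe(Topic, Pid) ->
        Topic ! {subscribe, Pid},
        ok.

    publish(Topic, Msg) ->
        Topic ! {publish, Msg},
        ok.

    loop(Subscribers) ->
        receive
            {subscribe, Pid} ->
                loop([Pid | Subscribers]);
            {publish, Msg} ->
                [Pid ! {self(), Msg} || Pid <- Subscribers],
                loop(Subscribers)
        end.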

In my opinion, if you plan to store state like "who is interested in what", Erlang alone may not be a good idea. Of course, it is sometimes very convenient to pass everything around in messages (as you do in Erlang), but when there is a lot of content to store, the lack of shared state in Erlang starts to hinder you rather than help.
On the other hand, you can keep much of the convenience of Erlang and combine it with a Java application, for example. Erlang's interface for Java (Jinterface) lets you connect the two technologies quite easily, so you can use a Java app to store information for you (and persist it somewhere when necessary) and Erlang for the whole concurrent, realtime signaling part. Even better: you can still build an OTP architecture on top of that, so you end up with quite a lightweight application (because Erlang does the realtime logic for you) that can access stored data easily (because Java helps you there).
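From the Erlang side, a Jinterface-based Java node looks like any other distributed node: you send to a registered mailbox and wait for the reply. A rough sketch, where the node name, mailbox name and message shapes are all invented for the example:

    %% Sketch only: assumes a Java node started with Jinterface whose
    %% OtpMbox is registered under the name 'storage'. The reply format
    %% {storage_reply, Value} is made up for this example.
    -module(store_client).
    -export([store/3, fetch/2]).

    store(Node, Key, Value) ->
        {storage, Node} ! {self(), {put, Key, Value}},
        ok.

    fetch(Node, Key) ->
        {storage, Node} ! {self(), {get, Key}},
        receive
            {storage_reply, Value} -> {ok, Value}
        after 5000 ->
            {error, timeout}
        end.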

Related

Distributing an Erlang Chat system

I just finished the Erlang in Practice screencasts (code here), and I have some questions about distribution.
(The original post included diagrams of the overall architecture and of the supervision tree.)
Reading Distributed Applications leads me to believe that one of the primary motivations is for failover/takeover.
However, is it possible, for example, for the Message Router supervisor and its workers to live on one node, and the rest of the system on another, without many changes to the code?
Or should there be 3 different OTP applications?
Also, how can this system be made to scale horizontally? For example, if I realize that my system can handle 100 users and I've identified the Message Router as the main bottleneck, how can I 'just add another node' so that it can handle 200 users?
I've developed Erlang apps only during my studies, but generally we had many small processes, each doing only one thing and sending messages to other processes. And the beauty of Erlang is that it doesn't matter whether you send a message within the same Erlang VM, within the same computer, on the same LAN, or over the Internet: the call and the reference to the other process always look the same to the developer.
So you really want to have one application for every small part of the system.
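A small illustration of that transparency (the registered name and the node name below are made up):

    %% Sending a message looks the same whether the receiver runs in the
    %% same VM or on another node; only the address differs.
    -module(routing_example).
    -export([send_examples/2]).

    send_examples(Pid, Msg) ->
        router ! Msg,                        %% registered process on this node
        Pid ! Msg,                           %% any pid, local or remote
        {router, 'chat@otherhost'} ! Msg.    %% registered process on another node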
That being said, it doesn't make it any simpler to construct an application that can scale out. A rule of thumb says that if you want an application to work on 10 times more nodes, you need to rewrite it, since otherwise the messaging overhead would be too large. And obviously when you go from 1 node to 2 you also need to think about it.
So if you have found a bottleneck, i.e. the application that is particularly slow when handling too many clients, you will want to run it a second time, and you need some additional load balancing in place before you even start the second instance.
Let's assume the supervisor checks the message content for inappropriate material and is therefore slow. In that case the node everyone talks to would be a simple router application that forwards the messages to different instances of the supervisor application in a round-robin manner. And if 1 or 2 instances are not enough, you can write the router in a way that lets you change the number of instances by sending it control messages.
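An untested sketch of such a router, with all names invented for the example:

    %% Round-robin router: forwards each incoming message to the next
    %% worker instance; {add, Pid} and {remove, Pid} control messages
    %% change the set of instances at runtime.
    -module(rr_router).
    -export([start/1]).

    start(Workers) ->
        spawn(fun() -> loop(Workers) end).

    loop(Workers) ->
        receive
            {add, Pid} ->
                loop(Workers ++ [Pid]);
            {remove, Pid} ->
                loop(lists:delete(Pid, Workers));
            Msg when Workers =/= [] ->
                [Next | Rest] = Workers,
                Next ! Msg,
                loop(Rest ++ [Next])
        end.

Messages that arrive while no worker is registered simply wait in the router's mailbox until one is added.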
However, for this to work automatically, you would need another process monitoring the servers and detecting that they are overloaded or underutilized.
I know that dynamically adding and removing resources always sounds great when you hear about it, but as you can see it is a lot of work: you need a messaging scheme that allows it, as well as a monitoring system that can detect the need.
Hope this gives you some idea of how it could be done. Unfortunately it's been over a year since I wrote my last Erlang application, and I didn't want to provide code that could possibly be wrong.

Why or when should I use messages queues such as RabbitMQ, ZeroMQ in Erlang?

Hello awesome Erlang community!
I'm making a little project that contains a Client and a Backend. (Complicated, right?) :)
I'm writing it in Erlang.
The client and backend will be two separate processes, and I'm wondering whether I need to (or should) use some sort of message queue to get them to interact.
I know I can get them to interact using their PIDs and send messages using the "!" operator.
I guess what I'm trying to say is I'm struggling with finding an answer for this question:
"Why or when should I use message queues such as RabbitMQ, ZeroMQ in Erlang"?
You want to use a messaging library when you need something that the native message passing facility won't provide.
These include:
If you need to guarantee that your messages are processed at least once, exactly once, etc. (i.e. transactions).
If your system load is such that it would be convenient to hold your messages on disk instead of in memory (persistence).
If you need other bells and whistles like security, interop with other systems, or complex messaging patterns (routing).
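For contrast, plain Erlang message passing between your client and backend needs none of that machinery. A minimal sketch (module name and message shapes are made up):

    %% Plain Erlang message passing between a client and a backend:
    %% no broker, no persistence, no delivery guarantees beyond what
    %% the VM already gives you.
    -module(backend).
    -export([start/0]).

    start() ->
        spawn(fun() -> loop() end).

    loop() ->
        receive
            {From, {request, Payload}} ->
                From ! {self(), {reply, handle(Payload)}},
                loop();
            stop ->
                ok
        end.

    handle(Payload) ->
        %% placeholder for the real work
        {processed, Payload}.

A client then just sends Backend ! {self(), {request, Data}} and waits in a receive for the reply; the items above are what a broker or messaging library adds on top of this.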
I would go for a messaging component when I need to decouple the different layers of my system. A messaging component also lets you apply different integration patterns to your messages/requests, like topic, fanout, or routing based on headers.
A messaging system is also used for scalability, so you can have multiple instances of the same process running simultaneously, consuming from the same queue.
The last thing I want to mention is that RabbitMQ is a message broker, whereas ZeroMQ is not; it is a messaging library.
If you can sacrifice reliability for performance, use ZeroMQ.
If you need reliability (message persistence, etc.) and can give up some performance, use a brokered solution like RabbitMQ.

Sandboxing user code with Erlang

As far as I know Erlang provides advanced features for error handling and isolation of processes.
I'm building a system that allows users to submit their own code to be executed in a shared server environment, and I need to make it safe.
Requirements are:
limit CPU and memory usage individually for each user process.
forbid user processes from communicating with other processes (except some processes specially designed for that purpose).
forbid access to all system resources (shell, file system, ...).
terminate user processes in case of errors or high resource consumption.
Is it possible to do all this with Erlang and keep it performance efficient?
In general, Erlang doesn't provide a means to sandbox code that a user can inject. You can try writing your own protection layer, but it is rather hard.
A better choice would probably be a language like "safe haskell":
http://www.haskell.org/ghc/docs/7.4.2/html/users_guide/safe-haskell.html
which is specifically built to do this kind of thing.
The isolation provided by Erlang is not intended to protect against malicious modules being injected. In fact, there is no such protection in the distributed case either. As soon as two machines are connected, there is no limit to what you can do to the other machine.
There has been work done on Safe Erlang in the past and you can find several papers about it.
The ErlHive project addresses the problem in an interesting way.

Erlang: When is it logical to spawn a new process? When not?

If we have a really heavy-process system where processes are spawned for some kind of load distribution, that's clear.
If we are talking about a web server: it's a good idea to spawn a new process for each connection, because the connections can then be distributed. But what else? A separate process each for Model, View and Controller? That sounds strange, because they run in a "linear" way, so they cannot be parallelized well and we only get overhead from the extra switching. Also, those Model, View and Controller parts are so lightweight that they could live in a single process, couldn't they?
So, where is it good to spawn a new process, apart from the "new connection" situation?
Thanks in advance.
In general, it's anywhere you have a shared resource to manage. It may be a socket, or a database connection, but it may also be some shared in-memory data, or a state machine of some kind.
You may also want to do parallel processing of a list of values (see pmap).
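A naive pmap along those lines (one process per element, no limit on the number of workers):

    %% Spawn one process per list element, then collect the results in
    %% order. Real implementations usually cap the number of workers.
    -module(pmap_example).
    -export([pmap/2]).

    pmap(F, List) ->
        Parent = self(),
        Pids = [spawn(fun() -> Parent ! {self(), F(X)} end) || X <- List],
        [receive {Pid, Result} -> Result end || Pid <- Pids].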
To your "swapping" point you should know that Erlang processes do no use op-sys facilities for scheduling, and scheduling is all but free.
In the specific case of a web application server, I understand your question. If you are writing a conventional web application, there is very little shared state, and your web framework probably already handles caching, session state and the like (those facilities will spawn processes).
We are all highly indoctrinated into this stateless web application model. We have all been told since we were pups that stateful systems are hard to develop and don't scale. I think you will find that there are those who are challenging that. As browser support for WebSockets improves, and with server-side languages like Erlang and Clojure providing scalable platforms with safe state management, there will be those who can build much more interactive web applications. As an extreme example, could you imagine WoW as a web application?
One reason to spawn a new process for each connection is that it makes programming the connections much simpler. Because a process handles only one connection, things like blocking database access, long polling or streaming become much easier: if that process blocks, no other connection is affected.
In Erlang the general "rule" is that you use processes to model concurrent activity and to manage shared resources. Processes are the fundamental way of structuring your system.
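A bare-bones sketch of that process-per-connection idea, using gen_tcp in passive mode with a placeholder echo handler:

    %% One process accepts connections; each accepted socket gets its own
    %% process, so a blocking handler never affects other connections.
    -module(conn_server).
    -export([start/1]).

    start(Port) ->
        {ok, Listen} = gen_tcp:listen(Port, [binary, {active, false}, {reuseaddr, true}]),
        accept_loop(Listen).

    accept_loop(Listen) ->
        {ok, Socket} = gen_tcp:accept(Listen),
        Pid = spawn(fun() -> receive go -> handle(Socket) end end),
        ok = gen_tcp:controlling_process(Socket, Pid),
        Pid ! go,
        accept_loop(Listen).

    handle(Socket) ->
        case gen_tcp:recv(Socket, 0) of
            {ok, Data} ->
                gen_tcp:send(Socket, Data),   %% blocking here is fine
                handle(Socket);
            {error, closed} ->
                ok
        end.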

Sharing data system wide

Good evening.
I'm looking for a method to share data from my application system-wide, so that other applications could read that data and then do whatever they want with it (e.g. format it for display, use it for logging, etc). The data needs to be updated dynamically in the method itself.
WMI came to mind first, but then you've got the issue of applications pausing while reading from WMI. Additionally, I've no real idea how to set up my own namespace or classes, if that's even possible in Delphi.
Using files is another idea, but that could get disk heavy, and it's an awful method for realtime data.
Using a driver would probably be the best option, but that's a little too intrusive on the user's end for my liking, and I've no idea where to even start with it.
WM_COPYDATA would be great, but I'm not sure whether it's dynamic enough, and whether it'll be heavy on resources or not.
Using TCP/IP would be the best choice over a network, but it's obviously of little use when run on a single system with no networking requirement.
As you can see, I'm struggling to figure out where to go with this. I don't want to commit to one method only to find that it's not going to work out in the end. Essentially, I want something like a service or background process to record data and then allow other applications to read that data; I'm just unsure about the method. I'd prefer NOT to need elevation/UAC for this, but if need be, I'll settle for it.
I'm using Delphi 2010 for this exercise.
Any ideas?
You want to create some client-server architecture, which is also called IPC (inter-process communication).
Using WM_COPYDATA is a very good idea. I found it to be very fast, lightweight, and efficient on a local machine. And it can be broadcast across the system, to all applications at once (to be used with care if some application does not handle it correctly).
You can also share memory using memory-mapped files. This is perhaps the fastest IPC option for huge amounts of data, but synchronization is a bit complex (if you want to share more than one buffer at once).
Named pipes are a good candidate locally. They tend to be difficult to implement/configure over a network, due to security issues on modern Windows versions (and they use TCP/IP for network communication anyway, so you might as well use TCP/IP directly).
My personal advice is to implement your data sharing behind abstract classes, able to provide several implementations. You may use WM_COPYDATA first, then switch to named pipes, TCP/IP or HTTP in order to spread your application over a network.
For our open source client-server ORM, we implemented several protocols, including WM_COPYDATA, named pipes, HTTP, and direct in-process access. You can take a look at the source code provided for the implementation patterns. Here are some benchmarks, to give you numbers from real implementations:
Client server access:
- Http client keep alive: 3001 assertions passed
first in 7.87ms, done in 153.37ms i.e. 6520/s, average 153us
- Http client multi connect: 3001 assertions passed
first in 151us, done in 305.98ms i.e. 3268/s, average 305us
- Named pipe access: 3003 assertions passed
first in 78.67ms, done in 187.15ms i.e. 5343/s, average 187us
- Local window messages: 3002 assertions passed
first in 148us, done in 112.90ms i.e. 8857/s, average 112us
- Direct in process access: 3001 assertions passed
first in 44us, done in 41.69ms i.e. 23981/s, average 41us
Total failed: 0 / 15014 - Client server access PASSED
As you can see, the fastest is direct access, then WM_COPYDATA, then named pipes, then HTTP (i.e. TCP/IP). The message was around 5 KB of JSON data containing 113 rows, retrieved from the server and then parsed on the client, 100 times (yes, our framework is fast :) ). For huge blocks of data (like 4 MB), WM_COPYDATA is slower than named pipes or HTTP over TCP/IP.
There are several IPC (inter-process communication) methods in Windows. Your question is rather general; I can suggest memory-mapped files to store your shared data, plus message broadcasting via PostMessage to inform the other applications that the shared data has changed.
If you don't mind running another process, you could use one of the NoSQL databases.
I'm pretty sure that a lot of them won't have Delphi drivers, but some of them have REST drivers and hence can be driven from pretty much anything.
Memcached is an easy way to share data between applications. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects).
A Delphi 2010 client for Memcached can be found on google code:
http://code.google.com/p/delphimemcache/
related question:
Are there any Caching Frameworks for Delphi?
Googling for 'delphi interprocess communication' will give you lots of pointers.
I suggest you take a look at http://madshi.net/, especially MadCodeHook (http://help.madshi.net/madCodeHook.htm)
I have good experience with the product.
