Full disclosure: I've only just started learning Erlang, so forgive me if I'm being naive. In the Erlang manual, the signature for the monitor function is:
monitor(Type, Item) -> MonitorRef
According to the rest of the documentation:
Currently only processes can be monitored, i.e. the only allowed Type
is process, but other types may be allowed in the future.
Monitor semantics seem inherently tied to processes, i.e. it doesn't make sense to monitor something other than a process. Having this extra parameter seems to border on paranoia rather than planning for the future. What are these other things that might be allowed to be monitored in the future?
I don't know what the designers may have had in mind, but I'd guess remote nodes.
It may also make sense that a process group (http://www.erlang.org/doc/man/pg2.html) could be monitored.
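A minimal sketch of what the Type argument looks like in use, assuming a throwaway module name monitor_demo (not from the question). The atom process appears both in the call and in the 'DOWN' message that comes back, which is where a second type would show up. For what it's worth, nodes already have a separate call, erlang:monitor_node/2, and later OTP releases did add other Type values such as port and time_offset.

```erlang
-module(monitor_demo).
-export([run/0]).

%% Spawn a process that waits for 'stop', monitor it, then tell it to stop
%% and wait for the resulting 'DOWN' message.
run() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    receive
        %% The third element repeats the Type that was passed to monitor/2.
        {'DOWN', Ref, process, Pid, Reason} ->
            {down, Reason}
    after 1000 ->
        timeout
    end.
```

Calling monitor_demo:run() should return {down, normal}, since the monitored process exits normally after receiving stop.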
I am building a framework for realtime web applications. I started writing it in Elixir because it is the modern way to develop applications for the Erlang VM. Erlang should be a good fit if you need concurrent, fault-tolerant, scalable apps (something like a web server). That is exactly what I need.
Question: a realtime framework always needs to keep track of, for instance, who is interested in what. I will accomplish this with the publish/subscribe pattern. Say I have 1000 clients subscribing to the topic "newest-message". I need to store those clients (the pid of the process representing each client) somewhere so I can reach them later when content for the topic "newest-message" appears.
This is where I am unsure whether Erlang is really a good fit for my framework.
ETS is probably the only option for storing shared data, but ETS copies everything whenever you save or access records. That means copying 1000 pids every time I need to access them (instead of just iterating over a list, as I would in C/Java/Python).
This will probably be a major bottleneck if I keep copying many records out of ETS (many clients, many subscriptions, etc.), am I right?
Sharing the state may be a sign of bad design. You can, for example, have a process for each queue/topic that stores its own list of subscribers. You send a message to that topic process, and it in turn sends the message to the clients. This way, you don't copy the entire subscriber list.
If you need to process them in parallel, you can split the subscriber list between more processes.
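The process-per-topic idea can be sketched roughly as follows. The module name topic, the message shapes and the {topic_message, Msg} tag are all assumptions made for illustration. A real implementation would more likely be a gen_server that also monitors subscribers and drops them when they die.

```erlang
-module(topic).
-export([start/0, subscribe/2, publish/2]).

%% Start a topic process that owns its subscriber list.
start() ->
    spawn(fun() -> loop([]) end).

%% Register ClientPid as a subscriber of the given topic process.
subscribe(Topic, ClientPid) ->
    Topic ! {subscribe, ClientPid},
    ok.

%% Ask the topic process to forward Msg to every subscriber.
publish(Topic, Msg) ->
    Topic ! {publish, Msg},
    ok.

%% The subscriber list lives only in this process's state, so publishing
%% iterates over it in place; only Msg is copied to each subscriber.
loop(Subscribers) ->
    receive
        {subscribe, Pid} ->
            loop([Pid | Subscribers]);
        {publish, Msg} ->
            lists:foreach(fun(Pid) -> Pid ! {topic_message, Msg} end,
                          Subscribers),
            loop(Subscribers)
    end.
```

A client would call topic:subscribe(Topic, self()) and then wait for {topic_message, Msg} in its own receive loop.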
Erlang achieves its fault tolerance because it doesn't let you share state, so you have to put more thought into a design that avoids state sharing but is still efficient. This will pay off in the long run, so Erlang/Elixir is definitely a good language for this kind of app. Just look at RabbitMQ.
In my opinion, if you plan to store state like "who is interested in what", Erlang alone may not be a good idea. Of course, sometimes it is very convenient to pass everything in signals (as you would in Erlang), but when there is a lot of content to store, the lack of state in Erlang starts to hinder you rather than help.
On the other hand, you can keep much of the convenience of Erlang and use it with a Java application, for example. Erlang's interface for Java lets you connect both technologies quite easily, and at the same time you can use a Java app to store information for you (and save it somewhere, when necessary) and Erlang for the whole concurrent, real-time signaling part. Even better: you can still use OTP with an architecture like that, so you can create quite a lightweight application (because the real-time logic is done by Erlang for you) that can access stored data easily (because Java helps you here).
I just finished Erlang in Practice screencasts (code here), and have some questions about distribution.
Here's the overall architecture:
Here is what the supervision tree looks like:
Reading Distributed Applications leads me to believe that one of the primary motivations is for failover/takeover.
However, is it possible, for example, for the Message Router supervisor and its workers to be on one node and the rest of the system on another, without many changes to the code?
Or should there be 3 different OTP applications?
Also, how can this system be made to scale horizontally? For example if I realize now that my system can handle 100 users, and that I've identified the Message Router as the main bottleneck, how can I 'just add another node' where now it can handle 200 users?
I've developed Erlang apps only during my studies, but generally we had many small processes, each doing only one thing and sending messages to other processes. And the beauty of Erlang is that it doesn't matter whether you send a message within the same Erlang VM, within the same computer, on the same LAN or over the Internet: the call and the reference to the other process always look the same to the developer.
So you really want to have one application for every small part of the system.
That being said, this doesn't make it any simpler to build an application that can scale out. A rule of thumb says that if you want an application to run on 10 times more nodes, you need to rewrite it, since otherwise the messaging overhead would be too large. And obviously you also need to consider this when you go from 1 node to 2.
So if you have found a bottleneck, an application that is particularly slow when handling too many clients, you want to run a second instance of it, and then you need some additional load balancing in place before you start that second instance.
Let's assume the supervisor checks the message content for inappropriate content and is therefore slow. In this case the node everyone is talking to would be a simple router application that forwards the messages to different instances of the supervisor application in a round-robin manner. In case 1 or 2 instances are not enough, you could write the router so that you can change the number of instances by sending it control messages.
However, for this to work automatically, you would need another process monitoring the servers and detecting when they are overloaded or underutilized.
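As a rough illustration only, a round-robin router of this kind could look like the sketch below. The module name rr_router and the {route, Msg} and {add_worker, Pid} message shapes are hypothetical, and the sketch ignores worker failure and removal.

```erlang
-module(rr_router).
-export([start/1, route/2]).

%% Start a router over a non-empty list of worker pids.
start(Workers) when is_list(Workers), Workers =/= [] ->
    spawn(fun() -> loop(Workers, []) end).

%% Hand Msg to the router, which forwards it to the next worker in turn.
route(Router, Msg) ->
    Router ! {route, Msg},
    ok.

%% Pending holds workers not yet used in this round, Used holds the rest
%% in reverse order. When Pending runs out, a new round starts.
loop([], Used) ->
    loop(lists:reverse(Used), []);
loop([Next | Rest] = Pending, Used) ->
    receive
        {route, Msg} ->
            Next ! Msg,
            loop(Rest, [Next | Used]);
        %% Control message: grow the pool at runtime.
        {add_worker, Pid} ->
            loop(Pending ++ [Pid], Used)
    end.
```

Because pids are location transparent, the workers passed to rr_router:start/1 could live on other nodes without changing this code.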
I know that dynamically adding and removing resources always sounds great when you hear about it, but as you can see it is a lot of work and you need to have some messaging system built which allows it, as well as a monitoring system which can monitor the need.
Hope this gives you some idea of how it could be done. Unfortunately it's been over a year since I wrote my last Erlang application, and I didn't want to provide code that might be wrong.
This might be a trivial question for some Erlang veterans, but it would be nice to know since it wasn't clear in the documentation. Many distributed systems algorithms make use of the comparability of unique pids to make decisions. Erlang is kind enough to offer built-in comparison of pids. However, I was wondering whether comparisons stay consistent across multiple machines when they refer to both local and external pids. My guess is there are no comparison guarantees, but I might be wrong. Am I?
Erlang stores more than just a simple process ID in its PID structures; the data includes a unique identifier for the remote node (whether it be another local or a remote VM).
See Can someone explain the structure of a Pid in Erlang? for details.
Thus, you're guaranteed to not send a message to the wrong PID on the wrong VM (or misinterpret the source of a received message), at least not without making an error somewhere in your code.
Update: It occurs to me that I may well have been answering the wrong question. If you're asking how the comparisons would work (e.g., if Pid1 < Pid2, whether Pid1 is local or remote), all I can state with some confidence is that the ordering will be constant, based on http://learnyousomeerlang.com/starting-out-for-real#bool-and-compare.
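A small sketch of how pid comparison is typically used, under the assumption of a hypothetical module pid_order. It only shows that, on one node, the built-in ordering is independent of the order in which the pids are listed, and that each pid carries its node (via node/1). It does not by itself settle whether two different nodes order the same pids identically.

```erlang
-module(pid_order).
-export([demo/0]).

demo() ->
    %% Five short-lived processes; their pids stay valid terms after exit.
    Pids = [spawn(fun() -> ok end) || _ <- lists:seq(1, 5)],
    %% Pick a "leader" as the smallest pid, as a bully-style algorithm might.
    Leader = lists:min(Pids),
    %% The choice does not depend on the order of the input list.
    SameLeader = (Leader =:= lists:min(lists:reverse(Pids))),
    %% Every pid knows which node it belongs to.
    AllLocal = lists:all(fun(P) -> node(P) =:= node() end, Pids),
    {SameLeader, AllLocal}.
```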
If we have a system with really heavy processes, where processes are spawned to distribute load in some way, that's clear.
If we are talking about a web server, it's a good idea to spawn a new process for each connection, because the connections can then be distributed. But what else? A separate process each for the Model, View and Controller? That sounds strange, because they all run in a "linear" way, so they cannot be parallelized well and we only get swapping overhead. Also, the Model, View and Controller are so light that they can stay in a single process, can't they?
So, where is it good to spawn a new process, apart from the "new connection" situation?
Thanks in advance.
In general, it's anywhere you have a shared resource to manage. It may be a socket, or a database connection, but it may also be some shared in-memory data, or a state machine of some kind.
You may also want to do parallel processing of a list of values (see pmap).
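For readers who have not seen it, a naive pmap can be sketched as below. This is an illustrative version under the assumed module name pmap_demo, not a library function. It spawns one process per element and collects the results in the original order.

```erlang
-module(pmap_demo).
-export([pmap/2]).

%% Apply F to every element of List in its own process.
%% Note: if F crashes for some element, this naive version waits forever.
pmap(F, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Ref is already bound here, so each receive selectively waits for
    %% the matching result and the output keeps the input order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

For example, pmap_demo:pmap(fun(X) -> X * X end, [1, 2, 3]) should return [1, 4, 9].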
To your "swapping" point, you should know that Erlang processes do not use operating-system facilities for scheduling, and scheduling is all but free.
In the specific case of a web application server, I understand your question. If you are writing a conventional web application with very little shared state, your web framework probably already handles caching, session state and the like (these facilities will spawn processes).
We are all highly indoctrinated into this stateless web application model. We have all been told since we were pups that stateful systems are hard to develop and that they don't scale. I think you will find that there are those who are challenging that. As browser support for WebSockets improves, and with server-side languages like Erlang and Clojure providing scalable platforms with safe state management, there will be those who are able to make more interactive web applications. As an extreme example, could you imagine WoW as a web application?
One reason to spawn a new process for each connection is that it makes programming the connections much simpler. Because a process handles only one connection, things like blocking access to databases, long polling or streaming become much easier. If this process blocks, it will not affect any other connections.
In Erlang the general "rule" is that you use processes to model concurrent activity and to manage shared resources. Processes are the fundamental way for structuring your system.
I am implementing some networking stuff in our project. It has been decided that the communication is very important and we want to do it synchronously. So the client sends something and the server acknowledges it.
Are there some general best practices for the interaction between the client and the server? For instance, if there isn't an answer from the server, should the client automatically retry? Should there be a timeout period before it retries? What happens if the acknowledgement fails? At what point do we break the connection and reconnect? Is there some material? I have done searches but nothing is really coming up.
I am looking for best practices in general. I am implementing this in C# (probably with sockets), so if there is anything .NET specific then please let me know too.
First rule of networking - you are sending messages, you are not calling functions.
If you approach networking that way, and don't pretend that you can call functions remotely or have "remote objects", you'll be fine. You never have an actual "thing" on the other side of the network connection - what you have is basically a picture of that thing.
Everything you get from the network is old data. You are never up to date. Because of this, you need to make sure that your messages carry the correct semantics - for instance, you may increment or decrement something by a value, you should not set its value to the current value plus or minus another (as the current value may change by the time your message gets there).
If both the client and the server are written in .NET/C#, I would recommend WCF instead of raw sockets, as it saves you from a lot of plumbing code for serialization and deserialization, synchronization of messages, etc.
That maybe doesn't really answer your question about best practices though ;-)
The first thing to do is to characterize your specific network in terms of speed, probability of lost messages, nominal and peak traffic, bottlenecks, client and server MTBF, ...
Then and only then you decide what you need for your protocol. In many cases you don't need sophisticated error-handling mechanisms and can reliably implement a service with plain UDP.
In a few cases, you will need to build something much more robust in order to maintain a consistent global state among several machines connected through a network that you cannot trust.
The most important thing I found is that messages should always be stateless (read up on REST if this means nothing to you).
For example if your application monitors the number of shipments over a network do not send incremental updates (+x) but always the new total.
For network programming in general, I think you should learn about:
1. Sockets (of course).
2. Forking and threading.
3. Locking (using a mutex, a semaphore or similar).
Hope this helps.