How mature is the SD Erlang project?

Do you have any experience with the SD Erlang project?
It seems to implement many interesting concepts for optimizing the communication mesh, and I'm curious whether any of you have already used them in production, or at least in a real project.
SD erlang repo
Thanks!

The project finished a week ago. The main ideas behind SD Erlang are to reduce the number of connections that Erlang nodes maintain while keeping transitivity and a common namespace for groups of nodes. The benchmarks we used (Orbit, Ant Colony Optimization (ACO), and Instant Messenger) showed very promising results. Unfortunately, we didn't have enough human resources to refactor the Sim-Diasca simulation engine. So, no, SD Erlang hasn't been used in a real application yet.
At the moment we are writing up the last deliverable, which will provide an overview of what has been achieved. It will appear here in a few weeks (D6.2). In general we are happy with the results we get using SD Erlang, and there are plans for a follow-up project to continue working on it, but that is currently a work in progress.
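To make the connection-count argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model that is not taken from the SD Erlang deliverables: nodes are split into equal, disjoint groups, each group is fully connected internally, and one gateway node per group is connected to every other gateway. The function names are made up for illustration.

```python
def full_mesh_connections(n):
    """Connections when every node is connected to every other node."""
    return n * (n - 1) // 2


def grouped_connections(n, group_size):
    """Connections under the simplified grouping model (an assumption)."""
    if n % group_size != 0:
        raise ValueError("n must be a multiple of group_size in this sketch")
    groups = n // group_size
    # Each group is a small full mesh of its own.
    inside_groups = groups * full_mesh_connections(group_size)
    # One gateway per group, and the gateways form a full mesh too.
    between_gateways = full_mesh_connections(groups)
    return inside_groups + between_gateways


if __name__ == "__main__":
    for n in (20, 100, 1000):
        print(n, full_mesh_connections(n), grouped_connections(n, 10))
```

Under these assumptions, 100 nodes need 4950 connections as a full mesh but 495 when split into groups of 10. Real deployments would differ, since groups can overlap and be of unequal size.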

This is not a direct answer, but I will use SD-Erlang in an embedded application that needs to scale to hundreds of nodes (small embedded CPUs). From what I have seen, it's ready to be tried out in a real application. To evaluate it further, let's consider the alternatives:
You have only a few distributed nodes: then you probably don't need it. You can just connect all the nodes and, for the name registry, use either the global module (slow but sturdy) or gproc with the new locks_leader branch, which avoids the quite broken gen_leader that has so far prevented using gproc in distributed mode in production.
You need many nodes (how many depends on your hardware and requirements, but you start to get into interesting territory with more than 70 nodes). Then your options are:
Use SD-Erlang and fix whatever problems you encounter in production, or at least report them. It certainly solves a lot of the problems you get with normal Erlang distribution.
Roll your own solution, either by playing with different cookie values or with hidden nodes. Hint: you can set different cookie values for different peer nodes. But then you need to write your own global name registry and management code. This looks like a variant of Greenspun's tenth rule or, closer to Erlang, Virding's first rule: you will probably end up implementing half of SD Erlang yourself.
Don't use Erlang distribution at all. The industry standard seems to be that, for anything involving more nodes or crossing data centers, you shouldn't use Erlang distribution but run your own protocols instead. My personal opinion is that it is better to fix Erlang distribution's problems than to ditch it. It's much too useful and time-saving, when it works for a use case, to just give up on it. I see SD-Erlang as the fix for the "too many nodes" problem; it's at least the right starting point.

Related

What is the maximum (practical) number of nodes in an Erlang system

I wish to create a platform as a service in the financial markets using Erlang/Elixir. I will provide AWS lambda-style functions in financial markets, but rather than being accessible via web/rest/http, I plan to distribute my own ARM-based hardware terminals to clients (Nvidia Jetson TX2-based or similar, so decent hardware). They will access the functions from these terminals. I want said terminals to be full nodes in the system. So they will use the actor model to message pass to my central servers, and indeed, the terminals might message pass amongst each other if terminal users decide to put their own functions online.
Is this a viable model? Could I run 1000 terminals like this? 100 000? What kinds of limitations might I start bumping into? Is Erlang message routing scalable enough to imagine such a network still being performant if we had soft real-time financial markets streaming data flowing around (mostly from central servers to terminals, but with a good proportion possibly moving directly from terminal to terminal)? We could have a system where 100k or more different "subscription" data channel processes were available, many of them taking input and producing output every second.
Basically I'd like a canonical guide to the scalability capabilities of an Erlang system something like the above. Ideally I'd also like some guide to the security implications of such a system, i.e. would global routing tables or any other part of the system be compromisable by a rogue terminal user, or can edge nodes be partly "sealed off" from sensitive parts of the rest of the Erlang network?
Note that I'd want to make heavy use of ports/NIFs for high-compute processes.
I would not pursue this avenue for various reasons, all of which hark back to the sort of systems that Erlang's distribution mechanism was developed for - a set of boards on a passive backplane: "free" local bandwidth and the whole machine sits in the same security domain. The Erlang distribution protocol is probably too chatty to work well on widely spread and large networks, and it is certainly too insecure. Unless you want nodes to be able to execute :os.cmd("rm -rf /") on each other, of course.
Use the Erlang distribution protocol in your central system to your heart's content, and have these terminals talk something that's data-only-over-SSL to that system and each other. On top of that, you can quite simply build a sort of overlay network to do whatever you want.
I recommend reading this carefully, and I also recommend dividing your service into small microservices.
Another benchmark is Investigating the Scalability Limits of Distributed Erlang.
In his book Programming Erlang, Joe Armstrong said:
"A few years ago, when I had my research hat on, I was working with PlanetLab. I had access to the PlanetLab network, so I installed empty Erlang servers on all the PlanetLab machines (about 450 of them).
I didn’t really know what I would do with the machines, so I just set up the server infrastructure to do something later."
Do not use external ports; use internal drivers written in C or C++ instead.
You will find a lot of information about Erlang architectures in this answer: How scalable is distributed Erlang?
The short answer is that there is a practical limit on the number of nodes in a cluster, but this limit can be overcome fairly easily with federations.
EDIT 1: Furthermore, I would recommend reading this book: Designing for Scalability with Erlang/OTP

Are TIPC and distributed erlang related?

I was browsing the wikipedia page on TIPC (http://en.wikipedia.org/wiki/TIPC) and noticed that the addressing scheme used is similar to Distributed Erlang's pid addresses, when you display it on a different node. Just wondering, since both efforts were developed at Ericsson. There is also some form of periodic ping (heartbeat) mechanism to tell whether the other process is up or not, similar to TIPC.
There is no direct relation; at least, no such relation was ever brought to my attention by the Erlang developers.
But knowing that they once sat on the floor adjacent to mine in the same building, and that there was some information sharing between us, I would be surprised if this were purely coincidental ;) Just for the record, the TIPC addressing scheme is several years older than Distributed Erlang.

What distributed process registries are available for Erlang?

I'd like to compile a reasonably complete list of distributed process registry libraries for Erlang.
Such libraries need to support basic operations like register_name(Pid, Name) and whereis_name(Name) (and ideally registered_names/0). Names shouldn't be restricted to atoms only, and these registration/lookup operations need to work reasonably reliably with multiple nodes participating in the registry (ignoring partitions for now).
So far I've come up with global, gproc and nprocreg. What others are available?
I would argue that riak_core is such. I use its partition ring + consistent hashing to find the node, together with a local gproc instance to find the exact process. Thus I get very fine balance between fault tolerance, availability and speed.
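The two-step lookup described above (hash the name to pick a node, then consult that node's local registry) can be sketched in Python as follows. This is a generic consistent-hashing illustration, not riak_core's actual partitioning scheme or API; the class names, the `vnodes` parameter and the in-memory dictionaries standing in for per-node gproc instances are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node owns several points ("virtual nodes") on the ring.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, name):
        """Return the node owning the first point at or after hash(name)."""
        idx = bisect.bisect_left(self._keys, _hash(name)) % len(self._points)
        return self._points[idx][1]


class Registry:
    """Toy registry: the ring picks the node, a per-node dict holds the names."""

    def __init__(self, nodes):
        self.ring = HashRing(nodes)
        self.local = {node: {} for node in nodes}  # stand-in for local registries

    def register_name(self, name, pid):
        self.local[self.ring.node_for(name)][name] = pid

    def whereis_name(self, name):
        return self.local[self.ring.node_for(name)].get(name)


if __name__ == "__main__":
    registry = Registry(["a@host1", "b@host2", "c@host3"])
    registry.register_name("worker-1", "pid-1")
    print(registry.ring.node_for("worker-1"), registry.whereis_name("worker-1"))
```

Because every participant computes the same hash, any node can work out where a name lives without a global table, which is the property the answer relies on.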
Locker is used in several projects at wooga for registering processes.
Riak PG is a "work in progress" alternative to pg2. The source code also serves as a nice example of how to use riak_core.
Regarding Riak PG:
It's mainly considered a "work in progress," because it was the result of some research that wasn't tested in production. Here's a link to the Erlang Workshop 13 paper.
If anyone has any questions regarding Core, or using PG in particular, I'd be happy to help out any way that I can.

How to push Erlang to my workplace

I think Erlang is very well suited to the server systems developed at my workplace (currently developed in Java). I am a bit skeptical about how this would be accepted, both by developers (who have no idea about functional programming or Erlang) and by managers.
Any ideas on how to approach the issue? I am thinking about some hybrid system, where the hardcore, highly reliable infrastructure uses Erlang and the app-specific stuff is developed in Java (as nodes?)
There are a few approaches, and none of them is guaranteed to work:
Implement something substantial in a short time frame, perhaps in your own time. Don't tell anyone, except perhaps a colleague who is in on it, until you have something working to show.
Pull up lots of Erlang projects that are good demonstrations of the features you want. Present them to your managers and frame the discussion around the risk of continuing to use Java when this kind of technology is available.
If the company you work for already has a working code base in Java, they're not likely to take you seriously when you suggest rewriting it in another language.
The true test of whether you believe Erlang is a much better choice: quit, start a competing company, and bring along the technology insight you have into your current industry. Your managers are weighing a risk scenario similar to the one you would face if you were to quit your job, and they are looking for the same reassuring evidence of success that you would want before leaving a "safe" paycheck.
As for how to integrate, check out the jinterface application in Erlang. It allows Java code to send messages to Erlang nodes, and it allows Java to expose mailboxes to the Erlang nodes as if they were Erlang processes.
It's all about ROI (Return On Investment) to a manager: a manager will be concerned about performance (of the company). In order to appeal to his business nature, you'll have to make a case for it using dollar$ (or whatever appropriate currency).
Beware that undertaking a "skunkwork" project on the side to "prove" your solution based on Erlang might backfire: "so you had time to play with Erlang, why didn't you spend the time on the project then?" (Of course, not all managers/companies would think this way).
You have to take into account the whole proposal e.g. impact on the team, skills to be developed etc. It's all about money.
If I have one piece of advice for you: start small, plant a seed, nurture it and watch it grow.
A wise man once said to me:
"It's not about technology, it's about the product & market".
Start by targeting not a rewrite but a new feature or project in Erlang. Rewrites can be expensive, and taking a chance on Erlang for something that is already a time-consuming and costly undertaking is a hard sell. But if there is a new piece that could be done in either Erlang or Java, you stand a better chance. The project will hopefully be small enough that you can discover early whether Erlang is a good fit and adapt accordingly. And when Erlang proves itself in that project, you will have better data to make your case with.
We're introducing RabbitMQ into our infrastructure, which currently runs a combination of C++, Java and Python applications. I'm not specifically intending to move the team towards Erlang, but if I were, introducing a well-written third-party tool that just happens to use Erlang is a very good way to get the foot in the door.
One major caveat is that while Erlang is a wonderful language to learn, the surrounding technology (OTP in particular) has a steep learning curve and is extremely primitive in many ways (debugging, IDEs, etc.). It is getting better all the time, but reluctant converts will crucify you if you don't warn them about the pain of learning to program in a radically different environment. Even simple things like the lack of code-sense technology (e.g., type 'foo.' and the IDE tells you what methods you can call on foo) can leave a really bad taste in the mouth.

What is bootstrapping?

I keep seeing "bootstrapping" mentioned in discussions of application development. It seems both widespread and important, but I've yet to come across even a poor explanation of what bootstrapping actually is; rather, it seems as though everyone is just supposed to know what it means. I don't, though. Near as I can figure, it has something to do with initialization tasks required of an application upon launch, but I could be completely wrong about that. Can anyone help me to understand this idea?
"Bootstrapping" comes from the term "pulling yourself up by your own bootstraps." That much you can get from Wikipedia.
In computing, a bootstrap loader is the first piece of code that runs when a machine starts, and is responsible for loading the rest of the operating system. In modern computers it's stored in ROM, but I recall the bootstrap process on the PDP-11, where you would poke bits via the front-panel switches to load a particular disk segment into memory, and then run it. Needless to say, the bootstrap loader is normally pretty small.
"Bootstrapping" is also used as a term for building a system using itself -- or more correctly, a predecessor version. For example, ANTLR version 3 is written using a parser developed in ANTLR version 2.
An example of bootstrapping can be found in some web frameworks. You call index.php (the bootstrapper), which loads the framework's helpers, models and configuration, and then loads the controller and passes control to it.
As you can see, it's a simple file that starts a large process.
The term "bootstrapping" usually applies to a situation where a system depends on itself to start, sort of a chicken and egg problem.
For instance:
How do you compile a C compiler written in C?
How do you start an OS initialization process if you don't have the OS running yet?
How do you start a distributed (peer-to-peer) system where the clients depend on their currently known peers to find out about new peers in the system?
In that case, bootstrapping refers to a way of breaking the circular dependency, usually with the help of an external entity, e.g.
You can use another C compiler to compile (bootstrap) your own compiler, and then you can use it to recompile itself
You use a separate piece of code that sets up the initial process without depending on any functions provided by the OS
You use a hard-coded list of initial peers or a hard-coded tracker URL that supplies the peer list
etc.
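The peer-to-peer case from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seed names, the `ask_peer` callback and the fake in-memory network are hypothetical stand-ins for real network calls.

```python
from collections import deque

SEED_PEERS = ["seed-a", "seed-b"]  # hypothetical hard-coded bootstrap list


def discover_peers(seeds, ask_peer):
    """Breadth-first discovery: ask each known peer for the peers it knows."""
    known = set(seeds)
    queue = deque(seeds)
    while queue:
        peer = queue.popleft()
        for other in ask_peer(peer):
            if other not in known:
                known.add(other)
                queue.append(other)
    return known


if __name__ == "__main__":
    # A fake network standing in for real "who do you know?" requests.
    fake_network = {
        "seed-a": ["p1", "p2"],
        "seed-b": ["p2"],
        "p1": ["p3"],
        "p2": [],
        "p3": ["seed-a"],
    }
    peers = discover_peers(SEED_PEERS, lambda p: fake_network.get(p, []))
    print(sorted(peers))
```

The hard-coded seed list is the external entity that breaks the circular dependency: without it, a new client would have no peer to ask about other peers.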
See the Wikipedia article on bootstrapping.
There is a section and links explaining what it means in Computing. It has four different uses in the field.
Here are some quotes, but for a more in depth explanation, and alternative meanings, consult the links above.
"...is a technique by which a simple computer program activates a more complicated system of programs."
"A different use of the term bootstrapping is to use a compiler to compile itself, by first writing a small part of a compiler of a new programming language in an existing language to compile more programs of the new compiler written in the new language."
In the context of application development, "bootstrapping" usually comes up when talking about modular and/or auto-updatable software.
Rather than the user downloading the entire app, including features he does not need, and re-downloading and manually updating it whenever there is an update, the user only downloads and starts a small "bootstrap" executable, which in turn downloads and installs those parts of the application that the user needs. Additionally, the bootstrap component is able to look for updates and install them each time it is started.
Alex, it's pretty much what your computer does when it boots up. ('Booting' a computer actually comes from the word bootstrapping)
Initially, the small program in your BIOS runs. That contains enough machine code to load and run a larger, more complex program.
That second program is probably something like NTLDR (in Windows) or LILO (in Linux), which then executes and is able to load, then run, the rest of the operating system.
For completeness, it is also a rather important (and relatively new) method in statistics that uses resampling / simulation to infer population properties from a sample. It has its own lengthy Wikipedia article on bootstrapping (statistics).
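As a minimal sketch of the statistical meaning, the Python snippet below estimates a confidence interval for a mean by resampling the data with replacement. It uses the simple percentile method with approximate cut-off indices; the function name, the default number of resamples and the fixed seed are choices made for this example only.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        # Each resample draws len(sample) values with replacement.
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(bootstrap_mean_ci(data))
```

The only input is the sample itself, which is where the "pulling yourself up by your own bootstraps" image comes from.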
The dictionary meaning of bootstrapping is to start up with minimal resources. In the context of an OS, the OS should be able to load swiftly once the Power-On Self-Test (POST) determines that it's safe to wake up the CPU. The bootstrap code is run from the BIOS, which is a small ROM. Generally it is a jump instruction to the set of instructions that will load the operating system into RAM. The destination of the jump is the boot sector on the hard disk. The BIOS program checks whether it is a valid boot sector containing the starting address of the stored OS, i.e. whether it is a valid MBR (Master Boot Record) or not. If it is a valid MBR, the OS is copied into memory (RAM), and from there on the OS takes care of memory and process management.
Since the question has already been answered in general, this is for web development.
I got this far and found a good explanation of bootstrapping in the Laravel docs. Here is the link
In general, we mean registering things, including registering service container bindings, event listeners, middleware, and even routes.
I hope it will help someone who is learning web application development.
Bootstrapping has yet another meaning in the context of reinforcement learning that may be useful to know for developers, in addition to its use in software development (most answers here, e.g. by kdgregory) and its use in statistics as discussed by Dirk Eddelbuettel.
From Sutton and Barto:
Widrow, Gupta, and Maitra (1973) modified the Least-Mean-Square (LMS) algorithm of Widrow and Hoff (1960) to produce a reinforcement learning rule that could learn from success and failure signals instead of from training examples. They called this form of learning “selective bootstrap adaptation” and described it as “learning with a critic” instead of “learning with a teacher.” They analyzed this rule and showed how it could learn to play blackjack. This was an isolated foray into reinforcement learning by Widrow, whose contributions to supervised learning were much more influential.
The book describes various reinforcement algorithms where the target value is based on a previous approximation as bootstrap methods:
Finally, we note one last special property of DP [Dynamic Programming] methods. All of them update estimates of the values of states based on estimates of the values of successor states. That is, they update estimates on the basis of other estimates. We call this general idea bootstrapping. Many reinforcement learning methods perform bootstrapping, even those that do not require, as DP requires, a complete and accurate model of the environment.
Note that this differs from bootstrap aggregating and intelligence explosion that is mentioned on the wikipedia page on bootstrapping.
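A small Python sketch of the idea in the quoted passage: the tabular TD(0) update moves the estimate for one state toward a target that is itself built from the current estimate of the successor state. The state names and the parameter values in the demo are hypothetical.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    # The target is built from another estimate, which is the bootstrapping.
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]


if __name__ == "__main__":
    values = {"A": 0.0, "B": 2.0}  # current value estimates (made up)
    td0_update(values, "A", reward=1.0, next_state="B", alpha=0.5, gamma=1.0)
    print(values["A"])
```

Nothing in the update needs the true value of the successor state; the current estimate of it is enough.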
I belong to the generation who flipped switches to enter a boot program. In the early 1980s, I worked on a microcomputer called Micro-78, developed by Electronics Corporation of India Ltd (ECIL). It was a sort of clone of the Altair 8800. I distinctly remember what happened when a small boot program was entered using the toggle switches and executed by pressing a button. The program read a second boot program contained in the first track of the floppy disk and overwrote itself with it in such a way that the second boot program started executing and loaded a disk operating system. I think the term "bootstrap" refers to this process of the first boot program reading the second boot program and overwriting itself with it, in a way "pulling itself up" with the additional functionality of the second boot program. That may be the origin of the original meaning of "the bootstrap program".
IMHO there is no better explanation than the answer to the question: How was the first compiler written?
These days, loading the operating system is the process most commonly referred to as bootstrapping.
With regard to the popular Twitter Bootstrap, I feel that this type of bootstrapping is the act of integrating a modular component into a web application without the web application even having to acknowledge that the component exists until it needs or references it.
The developer can seamlessly integrate a default copy of the Twitter Bootstrap CSS theme by simply loading (referencing) it in the web application. Voilà! You may then need to override some of these defaults, but you can do so in such a way that the resource/component is left untouched and completely reusable.
This same concept is how web developers use jQuery APIs and so on, although developers don't really describe that as bootstrapping per se. It improves flexibility and reusability while keeping the different components/resources of an app isolated, so that they can reside either on the same server(s) or on a CDN.
NOTE: In computing, bootstrapping deals with the MBR, and in UNIX it requires a special bootloader or manager, which is a small program in ROM that loads the OS into RAM. If you think about it, the same concept applies when the bootstrap loader checks the MBR and loads the OS based on this table, which happens without the OS having any idea that it takes place.
A bootstrap file is responsible for loading the contents of the main file. It is a wrapper around the main file. This way we can catch errors if loading the file fails for some reason.
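A minimal Python sketch of such a wrapper, assuming the "main file" is an importable module. The function name `load_main` and the module names used in the demo are illustrative only.

```python
import importlib


def load_main(module_name):
    """Try to import the main module; return (module, None) or (None, error)."""
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        # Loading failed: report it instead of crashing the whole program.
        return None, f"could not load {module_name!r}: {exc}"


if __name__ == "__main__":
    module, error = load_main("json")  # stands in for the real main module
    print(module is not None, error)
    module, error = load_main("no_such_main_module")
    print(module, error)
```

Keeping the wrapper this small means there is very little in it that can fail before the error handling is in place.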
As a humble beginner in the world of programming, and flicking through all the answers here after seeing this word used a lot in apparently slightly different ways in different places, I found reading the Wikipedia page on Bootstrapping (duh! I didn't think of it either at first) is very informative to understand differences in use of this word. Could it be......on extremely rare occasions......Wikipedia might even have better explanations of certain terms than....(redacted)? Will they bring in rep points on Wikipedia though?
To me, it seems all the meanings have something to do with: start with something as simple as possible, Thing1; make something slightly more complex with it, Thing2; and now you can use Thing2 to do some kinds of tasks more efficiently and quickly than you originally could with Thing1. Then repeat from Thing2 to Thing3 ad infinitum...
I see it as closely connected to both biological evolution and 'Layers of Abstraction' (newbies like me see, ahem, Wikipedia, cough) - the evolution from 1940's computers with switches, machine code, Assembly, C, Python, AIs you can give all kinds of complex instructions to like "make the %4^% dinner to my default &^$% requirements and clean the floor you %$£"#:~" in drunken slang English or Amazon tribal dialect without them 'raising an exception' (for newbies again...you guessed it) - missed out lot of links there due to simple ignorance.
Then in certain specific software meanings:
Meaning1: Thing1 is used to load the latest version of Thing2 (because of course Thing2 will be bigger than Thing1, just as Thing3 will be bigger than Thing2).
Meaning2: Thing1 is a lower level language (closer to 1001011100....011001 than print("Hello, ", user.name)) used to write a little bit of the higher language of Thing2, then this little bit of Thing2 is used to expand Thing2 itself from baby vocabulary level towards adult vocabulary level (Thing2 starts to be processed, or to use correct technical term 'compiled', by the baby version of itself (it's a clever baby!), whereas the baby version of Thing2 itself could of course only be compiled by Thing1, cause it can't exist before it exists, right duh!), then child version of Thing2 compiles Surly Teenager version of Thing2, at which point programming community decides whether Surly Teenager's 'issues' (software term and metaphor term!) are worth spending enough time resolving to be accepted long term, or to abandon them to (not sure where to take the analogy here).
If yes, then Thing2 has 'Bootstrapped' itself (possibly a few times) from babyhood to adulthood: "the child is the father of the man" (Wordsworth, suggest don't try looking up the quote or the author on Stack Overflow).

Resources