For distributed applications, which to use, ASIO vs. MPI? - network-programming

I am a bit confused about this. If you're building a distributed application, which in some cases may perform parallel operations (although not necessarily mathematical ones), should you use ASIO or something like MPI? I take it MPI sits at a higher level than ASIO, but it's not clear where in the stack one would begin.

I know nothing about ASIO, but from a quick Google it looks to be a lot lower level than MPI. For me, the whole point of MPI is that I can program against a higher level of messaging abstraction than, it seems, ASIO provides. Where you begin depends on your needs. For mine, parallelising scientific codes for high performance, the obvious answer is MPI. I'm not sure I'd use it, or at least not sure it would be my default choice, if I were writing more general-purpose distributed, as opposed to parallel, applications. Well, actually, it probably would be my default choice, to avoid learning another approach (most of which are less portable and less long-lived than MPI), but I'll admit it might not be the best choice if starting from an equal footing.
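To illustrate the difference in level: even a minimal blocking exchange in Boost.Asio has you dealing with resolvers, sockets and raw buffers yourself. A hedged sketch, assuming the modern io_context API (the host and request are placeholders):

    #include <boost/asio.hpp>
    #include <iostream>
    #include <string>

    int main() {
        boost::asio::io_context io;
        boost::asio::ip::tcp::resolver resolver(io);
        boost::asio::ip::tcp::socket socket(io);

        // You resolve, connect and frame the bytes yourself.
        boost::asio::connect(socket, resolver.resolve("example.com", "80"));
        std::string request = "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n";
        boost::asio::write(socket, boost::asio::buffer(request));

        char reply[1024];
        std::size_t n = socket.read_some(boost::asio::buffer(reply));
        std::cout.write(reply, n);
    }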

As far as I know, MPI is currently incapable of handling the situation where new nodes want to join an already started group. Problems may also occur if one of the nodes goes offline.
MPI does not reveal any of the network-related machinery that is underneath it. Thus, if you ever need something at a lower level, you're in trouble. If, on the other hand, you do not anticipate such a need, then you'll save yourself a lot of time by using MPI.
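By contrast, a hedged MPI sketch of a comparable exchange: peers are addressed by rank, and the transport underneath is entirely hidden:

    #include <mpi.h>
    #include <iostream>

    // Run with e.g.: mpirun -np 2 ./a.out
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int payload = 42;
            // No sockets, hosts or byte framing: just "send to rank 1".
            MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int payload = 0;
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            std::cout << "rank 1 received " << payload << "\n";
        }

        MPI_Finalize();
    }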


Care to expound on this statement made on Erlang performance?

There's something else to keep in mind: while Erlang does some things very well, it's technically still possible to get the same results from other languages. The opposite is also true; evaluate each problem as it needs to be, and choose the right tool according to the problem being addressed. Erlang is no silver bullet and will be particularly bad at things like image and signal processing, operating system device drivers, etc., and will shine at things like large software for server use (e.g. queues, map-reduce), doing some lifting coupled with other languages, and higher-level protocol implementation.
I'm learning Erlang, and this link (http://learnyousomeerlang.com/introduction#kool-aid) got me curious about the reasoning behind good vs. bad applications for Erlang. Can anyone expound on this statement?
Why does Erlang excel at some of the aforementioned fields and not at others?
while Erlang does some things very well, it's technically still possible to get the same results from other languages
Let's face it: really, all programming languages can do more or less everything, and they have ways to interface with C libraries to access anything they don't have a native library for.
The most obvious thing to point out is that all of Erlang boils down to C (and a little bit of assembler) at the end of the day, but that's not really relevant to the point.
Thus it should be clear enough that anything you can write in Erlang could be written in C, and because you are eliminating a layer of abstraction and interpretation, if you do a reasonable job of it, it should be faster. Sometimes a little faster. Sometimes a lot faster.
Erlang is no silver bullet and will be particularly bad at things like image and signal processing, operating system device drivers, etc.
This is the arena of nitty-gritty byte- and bit-shifting magic, and if you introduce an abstraction layer for every bit you shift... you can easily end up degrading the best achievable performance by multiple orders of magnitude.
and will shine at things like large software for server use (e.g. queues, map-reduce), doing some lifting coupled with other languages, and higher-level protocol implementation
This is the interesting bit. We've already established that if you write it in C, unless you do a sufficiently poor job of it, the result can only be better in terms of performance.
BUT performance isn't everything. In today's world CPU and memory are cheap, but time to market is hugely important. A company might spend thousands on the extra hardware required to run your application because it's written in Erlang instead of C, but save (or make) millions because the product is first to market.
The fact is, if you match a given software problem to a high level language with the right paradigm, the average software engineer can often produce a given product many MANY times faster than if they had to write it in C.
Also, writing C is error prone and provides vastly more scope for making mistakes and poor choices. That means a software engineer might write something in C badly enough that the equivalent Erlang (which itself sits on top of some very finely tuned, mature, clever C) might actually perform better, provided the Erlang itself is well thought out!
evaluate each problem as it needs to be, and choose the right tool according to the problem being addressed
Erlang is a really great tool, generally, but it does suit some problem domains more than others. There are some problems which might just be better solved with Perl, for example, or C, Python, etc. When it fits the problem domain, Erlang can be unbeatable, but if it's a bad fit, it's definitely best to consider something else.
Both Erlang and C are Turing complete (except for the lack of infinite memory) and thus both can be used to compute anything if you don't care about absolute performance or the amount of memory or other system resources used.
In systems with constrained memory (tinyDuino, et al.), the language runtime footprint (and the OS resources required to support that runtime) may be a differentiator. For problems where every multiply-accumulate per second counts (affecting total cost in megawatt-days of power or microseconds of latency), any extra type or value checks, copies, or conversions, which might be implicit in the formal language definition, can incur an added performance cost in processor cycles, cache misses, or run-time memory management. A C program might be specified without much of the above overhead for certain types of applications. However, in applications which require such overhead for a robust solution, that performance advantage disappears when weighed against the expected human cost of coding an equally (or more) robust solution in C.
Erlang is a good solution when you want to create:
Realtime Systems: they need predictable response times, and Erlang's preemptive scheduling and per-process garbage collection shine here.
Distributed Systems: Erlang has out-of-the-box mechanisms for distribution and a standard protocol called the Erlang Distribution Protocol.
Fault Tolerant Systems: the lightweight processes of Erlang, which let a process crash without making other processes crash, and its mechanisms for processes to supervise and monitor each other, are well suited to fault-tolerant systems.
Concurrent Systems: although writing a concurrent system in languages like C and Java is possible, it can be hard and error prone. Erlang has built-in primitives that make it easy to write a concurrent program.
Erlang is not a good choice when you need to write a program that does number crunching, image processing and such things, because your Erlang code runs above several layers of abstraction. However, there are official mechanisms in Erlang for taking advantage of C's performance. The HiPE (High Performance Erlang) project is also worth considering.
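To illustrate one of those official mechanisms, here is a minimal NIF (Native Implemented Function) sketch; the module name my_math and the add function are made up for this example, and note that a long-running NIF blocks a scheduler thread:

    #include <erl_nif.h>

    // add/2: native-speed arithmetic, callable from Erlang as my_math:add/2.
    static ERL_NIF_TERM add_nif(ErlNifEnv *env, int argc,
                                const ERL_NIF_TERM argv[]) {
        int a, b;
        if (!enif_get_int(env, argv[0], &a) || !enif_get_int(env, argv[1], &b))
            return enif_make_badarg(env);
        return enif_make_int(env, a + b);
    }

    static ErlNifFunc nif_funcs[] = {
        {"add", 2, add_nif},
    };

    // Binds the function table to the (hypothetical) Erlang module my_math.
    ERL_NIF_INIT(my_math, nif_funcs, NULL, NULL, NULL, NULL)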

llvm based code mutation for genetic programming?

For a study on genetic programming, I would like to implement an evolutionary system on the basis of LLVM and apply code mutations (possibly at the IR level).
I found llvm-mutate, which is quite useful for executing point mutations.
As far as I have understood it, the instructions get counted/numbered, and one can then, for example, delete a numbered instruction.
However, introducing new instructions seems to be possible only using statements already available in the code.
Real mutation, however, would allow inserting any of the allowed IR instructions, irrespective of whether they are used in the code to be mutated.
In addition, it should be possible to insert calls to library functions from linked libraries (not used in the current code, but possibly available, because the lib has been linked in by clang).
Did I overlook this in llvm-mutate, or is it really not possible so far?
Are there any projects trying to implement, or that have already implemented, such mutations for LLVM?
LLVM has lots of code analysis tools which should allow implementing the aforementioned approach. LLVM is huge, so I'm a bit disoriented. Any hints as to which tools could be helpful (e.g. for getting a list of available library functions)?
Thanks
Alex
Very interesting question. I have been intrigued by the possibility of doing binary-level genetic programming for a while. With respect to what you ask:
It is apparent from their documentation that llvm-mutate can't do what you are asking. However, I think it is wise for it not to. My reasoning is that any machine-language genetic program would inevitably face the halting problem, i.e. it would be impossible to know if a randomly generated instruction would completely crash the whole computer (for example, by assigning a value to an OS-reserved pointer), or whether it might run forever and take all of your CPU cycles. Turing's theorem tells us that it is impossible to know in advance whether a given program would do that. Mind you, llvm-mutate can still cause a perfectly harmless program to crash or run forever, but I think their approach makes it less likely by only using existing instructions.
However, such a thing as "impossibility" only deters scientists, not engineers :-)...
What I have been thinking is this: in nature, real mutations work a lot more like llvm-mutate than like what we do in normal genetic programming. In other words, they simply swap letters out of a very limited set (A, T, C, G), and every possible variation comes out of this. We could have a program, or set of programs, with an initial set of instructions, plus a set of "possible functions" either linked or defined in the program. Most of these functions would not actually be used, but they would be there to provide "raw DNA" for mutations, just like in our DNA. This set of functions would cover the complete (or semi-complete) set of possible functions for a problem space. Then we simply use basic operations like the ones in llvm-mutate.
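A hedged sketch of what such an insertion mutation might look like against the LLVM C++ API; the "DNA pool" of candidate functions and the i32-only signature are simplifying assumptions of mine, not llvm-mutate functionality:

    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Instructions.h"
    #include <random>

    using namespace llvm;

    // Insert a call to a randomly chosen "raw DNA" function before a
    // randomly chosen instruction in F. Assumes every candidate takes
    // one i32 and returns an i32, so any insertion is type-correct.
    void insertRandomCall(Function &F, ArrayRef<Function *> dnaPool,
                          std::mt19937 &rng) {
        SmallVector<Instruction *, 64> points;
        for (BasicBlock &BB : F)
            for (Instruction &I : BB)
                if (!isa<PHINode>(I))  // PHIs must stay first in their block
                    points.push_back(&I);
        if (points.empty() || dnaPool.empty())
            return;

        Instruction *where = points[rng() % points.size()];
        Function *callee = dnaPool[rng() % dnaPool.size()];

        IRBuilder<> builder(where);  // new code goes right before `where`
        Value *arg = ConstantInt::get(Type::getInt32Ty(F.getContext()), 42);
        builder.CreateCall(callee, {arg});
    }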
Some possible problems though:
Given the amount of possible variability, the only way to have acceptable execution times would be to have massive amounts of computing power, possibly achievable in the cloud or with GPUs.
You would still have to contend with Mr. Turing's halting problem. However, I think this could be resolved by running the solutions in a "sandbox" that doesn't take you down if a solution blows up: something like a single-use virtual machine or a Docker-like container, with a time limit (to get out of infinite loops). A solution that crashes or times out would get the worst possible fitness, so that the programs would tend to diverge away from those paths.
As to why do this at all, I can see a number of interesting applications: self-healing programs, programs that self-optimize for a specific environment, program "vaccination" against vulnerabilities, mutating viruses, quality assurance, etc.
I think there's a potential open source project here. It would be insane, dangerous and a time-sucking vortex: just my kind of project. Count me in if someone starts doing it.

How to push Erlang to my workplace

I think Erlang is very well suited to the server systems developed in my workplace (currently developed in Java). I am a bit skeptical about how this would be accepted, both by developers (who have no idea about functional programming or Erlang) and by managers.
Any ideas on how to approach the issue? I am thinking about some hybrid system, where the hardcore, highly reliable infrastructure uses Erlang and the app-specific stuff is developed in Java (as nodes?).
There are a few approaches, and none of them is guaranteed to actually work:
Implement something substantial in a short time frame, perhaps using your own time. Don't tell anyone until you have something working to display, unless you have a colleague in on it.
Pull up lots of Erlang projects that are good demonstrations of the features you want. Present them to your managers and frame the discussion around the risk of sticking with Java when this kind of technology is available.
If the company you work for actually has a working code base in Java already, they're not likely to take you seriously when you suggest rewriting it in another language.
The true test of whether you believe Erlang is a much better choice: quit and start up a competing company, bringing the industry insight you have with you. Your managers are really weighing a risk scenario similar to the one you would face if you were to quit your job, and they are looking for the same assurances of success that you would want before leaving a "safe" paycheck.
As for how to integrate, check out the jinterface application in Erlang. It allows Java code to send messages to Erlang nodes, and it allows Java to expose mailboxes to the Erlang nodes as if they were Erlang processes.
It's all about ROI (Return On Investment) to a manager: a manager will be concerned about performance (of the company). In order to appeal to his business nature, you'll have to make a case for it using dollar$ (or whatever appropriate currency).
Beware that undertaking a "skunkwork" project on the side to "prove" your solution based on Erlang might backfire: "so you had time to play with Erlang, why didn't you spend the time on the project then?" (Of course, not all managers/companies would think this way).
You have to take into account the whole proposal e.g. impact on the team, skills to be developed etc. It's all about money.
If I have one piece of advice for you: start small, plant a seed, nurture it and watch it grow.
A wise man once said to me:
"It's not about technology, it's about
the product & market".
Start by targeting not a rewrite but a new feature or project. Rewrites can be expensive, and taking a chance on Erlang for something that is already a time-consuming and costly undertaking is a hard sell. But if there is a new piece that could be done in either Erlang or Java, you stand a better chance. The project will hopefully be small enough that you can discover early whether Erlang is a good fit and adapt accordingly. And when Erlang proves itself in that project, you will have better data to make your case with.
We're introducing RabbitMQ into our infrastructure, which currently runs a combination of C++, Java and Python applications. I'm not specifically intending to move the team towards Erlang, but if I were, introducing a well-written third-party tool that just happens to use Erlang is a very good way to get a foot in the door.
One major caveat is that while Erlang is a wonderful language to learn, the surrounding technology (OTP in particular) has a huge learning curve and is extremely primitive in many ways (debugging, IDEs, etc.). It is getting better all the time, but reluctant converts will crucify you if you don't warn them about the pain of learning to program in a radically different environment. Even simple things like the lack of code-sense technology (e.g., type 'foo.' and the IDE tells you what methods you can call on foo) can leave a really bad taste in the mouth.

What's the quickest way to parallelize code?

I have an image processing routine that I believe could be made very parallel very quickly. Each pixel needs to have roughly 2k operations done on it in a way that doesn't depend on the operations done on neighbors, so splitting the work up into different units is fairly straightforward.
My question is, what's the best way to approach this change such that I get the quickest speedup bang-for-the-buck?
Ideally, the library/approach I'm looking for should meet these criteria:
Still be around in 5 years. Something like CUDA or ATI's variant may get replaced with a less hardware-specific solution in the not-too-distant future, so I'd like something a bit more robust to time. If my impression of CUDA is wrong, I welcome the correction.
Be fast to implement. I've already written this code and it works in a serial mode, albeit very slowly. Ideally, I'd just take my code and recompile it to be parallel, but I think that might be a fantasy. If I just rewrite it using a different paradigm (i.e., as shaders or something), then that would be fine too.
Not require too much knowledge of the hardware. I'd like to be able to not have to specify the number of threads or operational units, but rather to have something automatically figure all of that out for me based on the machine being used.
Be runnable on cheap hardware. That may mean a $150 graphics card, or whatever.
Be runnable on Windows. Something like GCD might be the right call, but the customer base I'm targeting won't switch to Mac or Linux any time soon. Note that this does make the response to the question a bit different than to this other question.
What libraries/approaches/languages should I be looking at? I've looked at things like OpenMP, CUDA, GCD, and so forth, but I'm wondering if there are other things I'm missing.
I'm leaning right now toward something like shaders and OpenGL 2.0, but that may not be the right call, since I'm not sure how many memory accesses I can get that way; those 2k operations require accessing all the neighboring pixels in a lot of ways.
The easiest way is probably to divide your picture into the number of parts that you can process in parallel (4, 8, 16, depending on cores), then just run a different process for each part.
In terms of doing this specifically, take a look at OpenCL. It will hopefully be around for longer, since it's not vendor-specific and both NVidia and ATI want to support it.
In general, since you don't need to share too much data, the process is really pretty straightforward.
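A hedged sketch of that row-splitting idea with plain std::thread; the per-row work here is a made-up stand-in for your real kernel:

    #include <algorithm>
    #include <thread>
    #include <vector>

    // Hypothetical stand-in for the ~2k operations per pixel.
    static void processRow(unsigned char *image, int width, int y) {
        for (int x = 0; x < width; ++x)
            image[y * width + x] = 255 - image[y * width + x];
    }

    static void processImage(unsigned char *image, int width, int height) {
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;

        // Hand each worker a contiguous band of rows; rows are
        // independent, so no locking is needed.
        for (unsigned t = 0; t < n; ++t) {
            int begin = height * t / n;
            int end = height * (t + 1) / n;
            workers.emplace_back([=] {
                for (int y = begin; y < end; ++y)
                    processRow(image, width, y);
            });
        }
        for (auto &w : workers) w.join();
    }

    int main() {
        std::vector<unsigned char> image(640 * 480);
        processImage(image.data(), 640, 480);
    }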
I would also recommend Threading Building Blocks. We use it with the Intel® Integrated Performance Primitives for image analysis at the company I work for.
Threading Building Blocks (TBB) is similar in spirit to both OpenMP and Cilk, but it does the multithreading itself behind a simpler, task-based interface. With it you don't have to worry about how many threads to make; you just define tasks. It will split the tasks, if it can, to keep everything busy, and it does the load balancing for you.
Intel Integrated Performance Primitives (IPP) has optimized libraries for vision, most of which are multithreaded. For the functions we need that aren't in IPP, we thread them ourselves using TBB.
Using these, we obtain the best results when we use the IPP method for creating the images: it pads each row so that any given cache line is entirely contained in one row. Then we don't divvy up a row of the image across threads, and that way we avoid false sharing from two threads trying to write to the same cache line.
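For flavour, a hedged TBB sketch (the per-pixel work is a made-up stand-in). Note that you only describe the work as a range; TBB picks the thread count and chunks the range itself:

    #include <tbb/blocked_range.h>
    #include <tbb/parallel_for.h>
    #include <vector>

    int main() {
        std::vector<unsigned char> image(1024 * 1024);

        // You define the tasks; TBB splits the range, schedules the
        // pieces across threads, and load-balances by work stealing.
        tbb::parallel_for(
            tbb::blocked_range<std::size_t>(0, image.size()),
            [&](const tbb::blocked_range<std::size_t> &r) {
                for (std::size_t i = r.begin(); i != r.end(); ++i)
                    image[i] = 255 - image[i];  // stand-in per-pixel work
            });
    }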
Have you seen Intel's (Open Source) Threading Building Blocks?
I haven't used it, but take a look at Cilk. One of the big wigs on their team is Charles E. Leiserson; he is the "L" in CLRS, the most widely used and respected algorithms book on the planet.
I think it caters well to your requirements.
From my brief reading, all you have to do is "tag" your existing code and then run it through their compiler, which will automatically/seamlessly parallelize it. This is their big selling point: you don't need to start from scratch with parallelism in mind, unlike with some other options.
If you already have working serial code in one of C, C++ or Fortran, you should give serious consideration to OpenMP. One of its big advantages over a lot of other parallelisation libraries / languages / systems / whatever is that you can parallelise one loop at a time, which means you can get useful speed-up without having to re-write or, worse, re-design your program.
In terms of your requirements:
OpenMP is much used in high-performance computing, there's a lot of 'weight' behind it and an active development community -- www.openmp.org.
Fast enough to implement if you're lucky enough to have chosen C, C++ or Fortran.
OpenMP implements a shared-memory approach to parallel computing, so a big plus in the 'don't need to understand hardware' argument. You can leave the program to figure out how many processors it has at run time, then distribute the computation across whatever is available, another plus.
Runs on the hardware you already have, no need for expensive, or cheap, additional graphics cards.
Yep, there are implementations for Windows systems.
Of course, if you were unwise enough to have not chosen C, C++ or Fortran in the beginning a lot of this advice will only apply after you have re-written it into one of those languages !
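To make the loop-at-a-time point concrete, a hedged sketch (the pixel kernel is a made-up stand-in); the only change from the serial version is the pragma:

    #include <vector>

    int main() {
        std::vector<unsigned char> image(1024 * 1024);

        // Compile with e.g. -fopenmp; without it, the pragma is ignored
        // and the loop simply runs serially.
        #pragma omp parallel for
        for (long i = 0; i < (long)image.size(); ++i)
            image[i] = 255 - image[i];  // stand-in for the ~2k ops per pixel
    }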
Regards
Mark

What is Erlang's secret to scalability?

Erlang is getting a reputation for being untouchable at handling a large volume of messages and requests. I haven't had time to download it and try to get inside Mr. Erlang's understanding of switching theory... so I'm wondering if someone can teach me (or point to a good instructional site).
Say as a thought-experiment I wanted to port the Erlang ejabberd to a combination of Python and C, in a way that gave me the same speed and scalability. What structures or patterns would I have to understand and implement? (Does Python's Twisted already do this?)
How/why do functional languages (specifically Erlang) scale well? (for discussion of why)
http://erlang.org/course/course.html (for a tutorial chain)
As far as porting to other languages goes, a message-passing system would be easy to build in most modern languages. The functional style can be achieved in Python easily enough, although you wouldn't get the internal dispatching features of Erlang "for free". Stackless Python can replicate much of Erlang's concurrency features, although I can't speak to details as I haven't used it much. It does appear to be much more "explicit" (in that it requires you to define the concurrency in code in places where Erlang's design allows concurrency to happen internally).
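As a sketch of how small the core primitive is, here is a toy Erlang-style mailbox in C++ (the names are mine); what Erlang adds on top is selective receive, links, monitors, per-process heaps and a preemptive scheduler:

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    // A toy mailbox: each "process" owns one, and anyone may send to it.
    class Mailbox {
        std::queue<std::string> queue_;
        std::mutex mutex_;
        std::condition_variable ready_;
    public:
        void send(std::string msg) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                queue_.push(std::move(msg));
            }
            ready_.notify_one();
        }
        std::string receive() {  // blocks until a message arrives
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return !queue_.empty(); });
            std::string msg = std::move(queue_.front());
            queue_.pop();
            return msg;
        }
    };

    int main() {
        Mailbox box;
        std::thread worker([&] { std::cout << box.receive() << "\n"; });
        box.send("hello");
        worker.join();
    }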
Erlang is not only about scalability but mostly about
reliability
soft real-time characteristics (enabled by a soft real-time GC, which is possible because of immutability [no cycles], share-nothing semantics, and so on)
performance in concurrent tasks (cheap task switching, cheap process spawning, the actor model, ...)
scalability -- debatable in its current state, but rapidly evolving (it does well up to about 32 cores, which is better than most competitors, and it should get better in the near future).
Another of the features of Erlang that has an impact on scalability is its lightweight, cheap processes. Since processes have so little overhead, Erlang can spawn far more of them than most other languages. You get more bang for your buck with Erlang processes than many other languages give you.
I think the best fit for Erlang is network-bound applications: it makes communication between nodes much simpler, and things like heartbeat monitoring and automatic restart using supervisors are built into OTP.

Resources