Python for New Distributed Computing Project? - erlang

I need to write a compute-intensive simulation program. I tried writing a multi-threaded version of this program, but it's taking too much time. Now I plan to expand to multiple nodes (probably via Amazon EC2 nodes).
I'm already familiar with Python. Is Python outfitted with some parallel module a viable option if I care about speed, or would I be better off going to some other framework/language like Erlang?
Can you even write a simulation program in Erlang?
The project is more about dividing up computation rather than dividing up a dataset, so I didn't consider frameworks based on map reduce

dispy is a framework for distributed computing with Python. It uses asyncoro, a framework for asynchronous, concurrent programming using coroutines, with some features of erlang (broadly speaking). Disclaimer: I am the author of both these frameworks.

If you are familiar with python already I would recommend you keep simulation in python (and speed up critical parts in C) and use Erlang for managing it. Writing simulation in Erlang would be a far away out of your comfort zone (even personally I would do it). You probably can reuse parts of Erlang projects as Disco project or Riak core. Start your project with some sub-optimal POC and tune it in iterations. It means start with python, embed it to Erlang (probably Disco) and then move bits around until you are not happy with performance and features. You can end up with anything including pure Erlang solution or emended Python in BEAM using NIF or anything else which satisfy your needs.

Does your problem parallelize trivially? Then you may want to take a look at Elastic Map Reduce instead of EC2.

Related

Choosing an AST to develop a static code analyzer for Elixir? Core Erlang or Expanded Elixir AST?

We are hoping to develop a static code analyser for Elixir in order to detect concurrency issues (mainly deadlock and race conditions). We have some basic idea about the structure for the analyser, but our problem is which AST would be better suited for this task. As we have understood the Elixir compilation process creates an Expanded Elixir AST, Abstract Erlang Format and Core Erlang.
My question is out of these whether the Expanded Elixir AST or Core Erlang would be better for creating the Call Graph and Control flow graph. And if we use Core Erlang is it possible to work our way backwards from Core Erlang to find the source within the Elixir code for the issues identified by the analyser?
If anyone has some idea about this, your help would be really appreciated. :)
If the purpose is educational I would probably go with erlang. The compiling of erlang to beam will probably be a little more straightforward (if not a lot more) and as you develop your tool, you will probably find more resources/documentation on erlang ASTs. It's lower level than what most people do and you'll find more answers in the erlang community (probably won't be true in a few years). Overall, your tool would be simpler in erlang, with less moving parts.
I also found that project : https://github.com/rrrene/credo
More specifically, I think it will be practically difficult to work you way back to elixir code. From your perspective, after having implemented your tool, it will make sense. But that probably would not be the case for your average elixir developer. The more elixir matures, the more it tends to depart from core erlang concepts. It builds a lot of features on top of erlang, to the point where you can build entire web applications without knowing a single bit of erlang. Probably not the best approach but the fact that it's possible tells you how large the gap between the two languages can become.

Are there any implementations\prototypes of Erlang alike VM that could run not only on CPU but also on gpu?

I was creating distributed systems in OOP languages using message passing libraries like MPI, ZepoMQ, RabbitMQ and so on. Now I found myself watching some erlang promotional material and understood that lots of things we emulate in OOP languages like C++ and C# using libraries (1 000 000 socket connections per process, distributed messaging and distributed process monitoring visualization) was there in Erlang for many years now. And it seemed reasonable to get to know the language better. I found myself asking one last question: are there any implementations\prototypes of Erlang alike VM that could run/spawn some processes not only on CPU but also on GPU?
Because that would definitely make Erlang (and its more readable for my OOP background dialects like Elixir) language of choice for most future projects.
GPU is fast only with sequential memory access. I hardly imagine garbage collection on GPU RAM. GPU is NOT a cool and parallel CPU. It requires more effort to write to. So most probably there is no Erlang compiler for GPU.
I doubt there's any implementation that can run Erlang processes on GPU but you can use two techniques to run computations on GPU under Erlang:
use C library through NIFs (native implemented functions) - see http://www.erlang.org/doc/man/erl_nif.html and an example of such an implementation: msantos/procket on Github (I'm sorry, I can't post the link due to low reputation :)
use native OS process and communicate with it through erlang "port" - see http://www.erlang.org/doc/reference_manual/ports.html
The first one is faster and the later is safer (NIFs can crash the whole VM).
This is not specific to GPU coputations. Erlang is not well suited for high performance number crunching - it's better to do it in C and manipulate the results in Erlang anyway. The communication between the C and Erlang should be implemented in the one of the two described manners.

"Erlang" vs "zeromq+any language" for Embedded Applications

I want to write actor style code for embedded processors and I am trying to decide between writing everything in Erlang vs writing everything in zeromq+any language. Using zeromq looks to be very powerful in the sense that I can use any programming language and make my development a lot easier(many available libs) but then I am not sure if there is any gotcha in this power? I understand that Erlang represent actor model much better especially with OTP concepts but then it seems easy to represent similar actor model with zeromq? Am I looking at this correctly?
1.What do I really lose not using Erlang for embedded applications (where distributed processing, a power point of Erlang, is NOT required) and just build things on top a generic messaging framework like zeromq?
2.Is Erlang offering more than a coordinated messaging framework for a non-distributed embedded application?
3.What specific capabilities of Erlang could took too long to implement with zeromq?
You're comparing apples and oranges. Part of the advantage of using Erlang is the language; if you're going to put it up against zmq + some other language, the other language in that comparison really matters. zmq + ARM assembly? Erlang brings all the wonderful advantages of not hand-coding ASM.
As for what else Erlang brings to the table, Embedded Erlang? Absolutely argues that Erlang has advantages in fault tolerance, hot code loading, rapid development by leveraging Erlang and OTP, easy interaction with C libraries, and simple debugging by live REPL and copy-paste of terms.
Some of those things, such as hot reload, on-device REPL, and established libraries, will definitely take some real hacking to reproduce from the ground up.
My point would be that you will have to work very hard to get the same kind of error handling in Zmq. Erlang has some really nice built-in error handling when things begins to go bad. There has been considerable time spent in Erlang optimizing that part and making it robust.
Zmq on the other hand, is probably faster in some combinations with some languages when you make simple benchmarks. There is less overhead, so it may process messages faster than what Erlang can provide.
But chances are that you will end up re-implementing large parts of Erlang in the language of your choice. And you will probably not do a job as good as 6-10 developers working on Erlang/OTP for 15 years.
On the other hand, Erlang is not a simple language to learn. There is way more to it than just learning how to program in a functional style. Especially the concurrency patterns and failure handling can take some time getting used to.
ZeroMQ =/= Erlang covers many differences. The claim there is that ZeroMQ only provides the messaging aspect, not the light-weight processes, process monitoring and other aspects.

How to push Erlang to my workplace

I think Erlang is very well suited for server systems developed in my workplace (currently developed in Java). I am a bit skeptical how this would be accepted both by developers (who have no idea about functional or Erlang) and by managers.
Any ideas on how to approach the issue? I am thinking about some hybrid system, where the hardcore highly reliable infra uses Elrang, and app specific stuff developed in Java (as nodes?)
There are a few approaches, and neither have any guarantees to actually work
Implement something substantial in a short time frame, perhaps using your own time. Don't tell anyone until you have something to display that works. Unless you have a colleague in on it.
Pull up lots of Erlang projects that are good demonstrations of the features you want. Present it to your managers and try to frame them about the risk in keeping using Java with this kind of technology available.
If the company you work for actually have a working code base in Java already, they're not likely to take you seriously when you suggest to rewrite it in another language.
The true test that you believe in Erlang being a much better choice: Quit and start up a competing company and bring the technology insight you have in your current industry. Your managers are really comparing a similar risk-scenario as you would do if you were to quit your job, and they are looking for the same assuring facts for success as you would do, to consider leaving a "safe" paycheck.
As for how to integrate, check out the jinterface application in Erlang. It allows Java code to send messages to Erlang nodes, and it allows Java to expose mailboxes to the Erlang nodes as if there were Erlang processes.
It's all about ROI (Return On Investment) to a manager: a manager will be concerned about performance (of the company). In order to appeal to his business nature, you'll have to make a case for it using dollar$ (or whatever appropriate currency).
Beware that undertaking a "skunkwork" project on the side to "prove" your solution based on Erlang might backfire: "so you had time to play with Erlang, why didn't you spend the time on the project then?" (Of course, not all managers/companies would think this way).
You have to take into account the whole proposal e.g. impact on the team, skills to be developed etc. It's all about money.
If I have an advice for you: start small, plant a seed, nurture it and watch it grow.
A wise man once said to me:
"It's not about technology, it's about
the product & market".
Start by not targetting a rewrite but using erlang for a new feature/project. Rewrites can be expensive and taking a chance on erlang for something that is already a time consuming and costly undertaking is a hard sell. But if there is a new piece that could be done in erlang and java, you stand a better chance. The project will be small enough hopefully that you can discover early if erlang is a good fit and adapt accordingly. And when erlang proves itself in that project you will have better data to make your case with.
We're introducing RabbitMQ into our infrastructure, which currently runs a combination of C++, Java and Python applications. I'm not specifically intending to move the team towards Erlang, but if I were, introducing a well-written third-party tool that just happens to use Erlang is a very good way to get the foot in the door.
One major caveat is that while Erlang is a wonderful language to learn, the surrounding technology (OTP in particular) has a huge learning curve and is extremely primitive in many ways (debugging, IDE's, etc.). It is getting better all the time, but reluctant converts will crucify you if you don't warn them about the pain of learning to program in a radically different environment. Even simple things like the lack of code-sense technology (E.g., type 'foo.' and the IDE tells you what methods you can call on foo) can leave a really bad taste in the mouth.

Why do so many insist on dragging the JVM into new applications?

For example, I'm running into developers and architects who are scared to death of Rails apps, but love the idea of writing new Grails apps.
From what I've seen, there is a LOT of resource overhead that goes into using the JVM to support languages such as Groovy, JRuby and Jython instead of straight Ruby or Python.
Ruby and Python can both be interpreted on just about any OS, so I don't see any "write once run anywhere" advantage... why bring the hulking JVM along with you?
Java is a much, much more mature platform, with a lot of existing class libraries that could be "dropped in" and used, than, say, Ruby or Python (or even Perl, for that matter). So for people who like using existing code, rather than writing everything themselves, Java is a huge win.
For example, recently I've been looking for something like JAXB for Python or Ruby. In the end, I ended up using JRuby just because I haven't found any mature, widely-used XML-binding libraries.
The huge advantage of writing code (in any language) for the JVM is that it's usually very easy to tap into the enormous wealth of mature Java libraries out there, if necessary.
And I don't know where you got this idea of a "hulking" JVM with a huge resource overhead. The JIT tends to produce code that is quite fast, and the core JVM is anything but huge by today's standards. It does tend to have a huge memory footprint when running, but that's because modern machines have a lot of RAM and the GC works best when it has a lot of RAM to play with. If desired, the GC can be fine-tuned to hell and back to be more conservative.
As someone else put it: "The best thing about Groovy is that I don't have to use Java. The second best thing about Groovy is that I can use Java".
An assumption that seems to be built into the question is that new projects are greenfield projects. Many organizations have made a huge investment in Java over the last decade+ and require any new project to work within the existing (internal) code ecosystem. As pointed out, there's a huge bonus in all the publicly available Java libraries (whether free/OSS or commercial), but the need to work with existing code and even as a component within an existing system is at least as important (if not more so) to large organizations.
A lot also comes down to the maturity and capability of the platform, which is to say the JVM and everything that comes with it (the entire Java ecosystem). A few examples off the top of my head:
You can plug a remote debugger into a running JVM and get all kinds of information about a running application that is simply impossible with Python, Ruby, etc. Going a step further, there's JMX, a standard way to write code so that objects can be monitored and even tweaked in a live application. Take a look at JConsole and see if you don't drool just a little (despite the ugliness of the interface).
Going even further in this direction, there's OSGi, a standard for writing highly modular code that can be deployed, started, stopped, and even upgraded in a live application. With OSGi you break a large application into many smaller "bundles" which can then be maintained (deployed, started/stopped, upgraded) separately. This is a really big deal in large applications, or any applications that need to remain running at all times.
The platform has very good support for asynchronous, reliable messaging. You get JMS as a baseline, and many excellent and powerful libraries built on it for doing complicated things with very little code (cf. Apache Camel, ServiceMix, Mule, and many others). This is another feature that's extremely valuable in larger applications or those which must run within a larger code universe.
The JVM has real (OS-level) threading, while Python et al. are very limited in this regard (notoriously so). (That being said, shared state concurrency -- threading -- is the wrong approach; cf. Erlang, Alice, Mozart/Oz, etc.)
There are numerous JVM choices beyond the standard Sun implementations, like JRockit, IBM's JVM, etc. This is a developing area with other languages -- Python has Jython, Iron Python, even PyPy and Stackless; Ruby has JRuby, Rubinius, and others -- but as good as these are they can't match the maturity found in the various JVM offerings.
All that being said, I really don't like Java the language and avoid it as much as possible. These days with all the excellent alternative languages for the JVM I don't have to. Groovy gets my vote for its accessibility and tight integration with the platform (and even the language), and because of Grails, which I sometimes like to call "Rails for grownups". I like other JVM languages better, particularly Clojure and Scala, but these aren't as accessible to the average programmer. Scala is popping up a lot lately, though, especially thanks to its high profile use at Twitter, so there's hope for interesting and truly excellent languages making it in larger environments. But that's another topic.
why bring to hulking JVM along with you?
JVM isn't bloated, nor is it slow. on the contrary, it's a lean, fast, deeply optimized VM. Unfortunately, it's optimized for static OOP languages.
Still, good compilers targeting JVM do create good performing programs. I don't know about JRuby; but Jython's goal is to be all-around faster than regular C Python, and they're getting close (it's already faster at several important use cases).
Remember that a good JIT (like those for JVM) can apply some optimizations unavailable on static C compilers, getting faster code from them isn't a pipe dream. Of course, a VM optimized for your language should be faster than a 'not-really-generic' VM like JVM; but there's the maturity issue: JVM has a lot of work done there, while JITs for Ruby and Python aren't anywhere near.
Unfortunately, there doesn't seem to be any better generic bytecode VM. Microsoft's CLI suffers from similar limitations as JVM (ironPython is much slower and heavier than JPython). The best candidate seems to be LLVM. Does anybody know why isn't there more dynamic languages over LLVM? I've seen a couple of Scheme compilers, but seem to have several problems.
Groovy is NOT an interpreted language, it is a dynamic language. The groovy compiler produces JVM bytecode that runs inside the JVM just like any other java class. In this sense, groovy is just like java, simply adding syntax to the java language that is meaningful only to developers and not to the JVM.
Developer productivity, ease and flexibility of syntax make groovy attractive to the java ecosystem - ruby or python would be as attractive if they resulted in java bytecode (see jython).
Java developers are not really scared of ruby; as a matter of fact many quickly embrace groovy or jython both close to ruby and python. What they don't care about is leaving such an amazing platform (java) for a less performant, less scalable even less used language such as ruby (for all its merits).
The big knock on RoR is that it isn't scalable and hard to deploy. By using the Java platform, you can leverage your existing infrastructure.
grails war
Produces a war file that is easily deployed on Glassfish, Jboss, etc.
Ruby and Python can both be
interpreted on just about any OS, so I
don't see any "write once run
anywhere" advantage... why bring the
hulking JVM along with you?
Mostly because you want to take advantage of the HUGE existing ecosystem of Java libraries, APIs and products, which dwarfs anything available for Ruby or Python, especially in the enterprise domain.
Also, keep in mind that JRuby and Jython are faster in a lot of benchmarks than the regular (C implementations) of the languages, especially Ruby (even Ruby 1.9).
Having multiple languages targeting the same virtual machine has a lot of benefits, such as leveraging a common infrastructure, code reuse, shared APIs, the ability to use whatever language is conceptually best for you, or for a specific problem domain, etc.
The same things happens in the .NET space, with multiple languages targeting the CLR. The Parrot (vaporware) VM project also aims to the same thing, and it's a stated goal of the LLVM project too.
The reason is Hotspot.
It is an engineering tour de force.
the other reason not many mentioned is existing infrastructure related to jvm - if you already have a server running java stuff, why not use it instead of bringing in yet another platform (like rails)?
I've encountered this and also been baffled by it, and here's my theory.
Enterprise software is full of Java programmers. Like programmers of all stripes, many Java programmers are convinced that their language is the fastest, the most flexible and the easiest to use--they're not too familiar with other languages but are convinced that those who practice them must be savages and barbarians, because any enlightened person would, of course, use Java.
These people have built vast, complicated Java infrastructures: rube-goldberg machines of frameworks and auto-generated code full of byzantine inheritance structures and very, very large XML files.
So, when someone comes along and says "Hey! Let's use a C interpreted language! It's fast and has neat libraries and is much quicker for scripting and prototyping!" The Java guy is firstly like "I have to run a make file to configure this? QUEL HORREUR!" Then the reality of having to deploy and host this on servers that are running dated OSes and dated versions of Tomcat and nothing else starts to set in.
"Hey, I know! There's a java version of this interpreted language! It may break down in the fast lane on the bridge in rush-hour, and it sometimes catches on fire, but I can get Tomcat to run it. I don't have to dirty my hands with learning non-java stuff, and I can shoehorn it into the existing infrastructure! Win!"
So, is this the "right" reason for choosing a java implementation of a scripting language? Probably not. Depends on your definition of "right". But, I suspect that it's the reason they're chosen more often than snobs like me would like to believe.

Resources