I'm interested in porting my database engine from Java to Erlang.
Currently, the Java implementation depends on memory mapping for efficiency. For memory mapping in Erlang, the only thing I've found so far is emmap.
As far as I know, CouchDB does not depend on memory mapping. How does it keep up with efficiency? Does it store as much in memory as possible and flush it to disk as necessary?
One way is to use an LSM B-tree, as in
https://github.com/krestenkrab/hanoidb
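To make the "keep recent writes in memory, flush to disk as necessary" idea concrete, here is a toy sketch of the general log-structured merge pattern in Python. It is not how hanoidb or CouchDB is implemented; the class name `ToyLSM`, the `memtable_limit` parameter, and the use of in-memory lists in place of on-disk sorted files are all assumptions made for illustration.

```python
import bisect


class ToyLSM:
    """Toy log-structured merge store: a memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}   # recent writes, held in memory
        self.runs = []       # flushed runs, oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # A real engine would write this run to disk as a sorted file.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key, default=None):
        if key in self.memtable:
            return self.memtable[key]
        # Search the newest run first so later writes shadow earlier ones.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return default

    def compact(self):
        # Merge all runs into one; newer runs overwrite older values.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Writes only ever touch the memtable and append whole sorted runs, so the disk sees sequential writes; reads pay for that by possibly checking several runs, which is why real engines compact in the background.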
I was going through one of the presentations on Spark memory management and wanted to know how to get a good graphical picture of executor memory usage (something similar to what was shown in the presentation), to understand out-of-memory errors better. Also, what is the best way to analyze off-heap memory usage in Spark executors? How can I find the amount of off-heap memory usage as a function of time?
I looked into Ganglia, but it gives node-level metrics. I found it hard to understand executor-level memory usage from node-level metrics.
I've been thinking about a similar tool!
I think org.apache.spark.scheduler.SparkListener is the interface to all the low-level metrics in Apache Spark with onExecutorMetricsUpdate being the method to look at when developing a higher-level monitoring tool.
You could also monitor the JVM through the JMX interface, but that might be too low-level, and it would lack the contextual information on how Spark uses the resources.
I have been building distributed systems in OOP languages using message-passing libraries like MPI, ZeroMQ, RabbitMQ and so on. After watching some Erlang promotional material, I realized that many things we emulate in OOP languages like C++ and C# using libraries (1,000,000 socket connections per process, distributed messaging, and distributed process monitoring visualization) have been available in Erlang for many years. So it seemed reasonable to get to know the language better. I am left with one last question: are there any implementations or prototypes of an Erlang-like VM that could run/spawn some processes not only on the CPU but also on the GPU?
That would definitely make Erlang (and dialects like Elixir, which are more readable given my OOP background) the language of choice for most future projects.
A GPU is fast only with sequential memory access. I can hardly imagine garbage collection on GPU RAM. A GPU is NOT just a cool, parallel CPU. It requires more effort to write code for. So most probably there is no Erlang compiler for GPUs.
I doubt there's any implementation that can run Erlang processes on GPU but you can use two techniques to run computations on GPU under Erlang:
use a C library through NIFs (native implemented functions) - see http://www.erlang.org/doc/man/erl_nif.html and an example of such an implementation: msantos/procket on GitHub (I'm sorry, I can't post the link due to low reputation :)
use a native OS process and communicate with it through an Erlang "port" - see http://www.erlang.org/doc/reference_manual/ports.html
The first one is faster and the latter is safer (NIFs can crash the whole VM).
This is not specific to GPU computations. Erlang is not well suited to high-performance number crunching - it's better to do it in C and manipulate the results in Erlang anyway. The communication between C and Erlang should be implemented in one of the two ways described.
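As a rough illustration of the port approach, here is a sketch of the external program's side, written in Python rather than C for brevity. It assumes the Erlang side opens the port with the `{packet, 2}` option, in which case every message in either direction is preceded by a 2-byte big-endian length. The function names `read_packet`, `write_packet` and `serve` are made up for this example.

```python
import io
import struct


def read_packet(stream):
    """Read one length-prefixed message; return None when the port is closed."""
    header = stream.read(2)
    if len(header) < 2:
        return None
    (length,) = struct.unpack(">H", header)  # 2-byte big-endian length
    payload = stream.read(length)
    if len(payload) < length:
        return None
    return payload


def write_packet(stream, payload):
    """Write one message using the same 2-byte length framing."""
    stream.write(struct.pack(">H", len(payload)) + payload)
    stream.flush()


def serve(inp, out, handler):
    """Answer each request from the Erlang side until the port closes."""
    while True:
        request = read_packet(inp)
        if request is None:
            break
        write_packet(out, handler(request))
```

In a real external program the streams would be `sys.stdin.buffer` and `sys.stdout.buffer`, and `handler` would hand the bytes to the native (for example GPU) code. If that code crashes, only the OS process dies and the Erlang VM keeps running, which is the safety advantage mentioned above.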
I've just started reading Joe Armstrong's book on Erlang and listened to his excellent talk on Software Engineering Radio.
It's an interesting language/system and one whose time seems to have come around with the advent of multi-core machines.
My question is: what is there to stop it being ported to the JVM or CLR? I realise that neither virtual machine is set up to run the lightweight processes that Erlang calls for - but couldn't these be simulated by threads? Could we see a lightweight or cut-down version of Erlang on a non-Erlang VM?
You could not use JVM/CLR libraries, given their reliance on mutable objects.
Erlang exception handling is quite different from JVM and CLR exceptions; you would need to handle this somehow.
Implementing processes as threads would mean that any sizable Erlang system runs out of memory pretty fast (process size on my machine on creation: 1268 bytes, thread stack size in CLR: 1 MB) and communication between processes is much slower than in Erlang.
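The two figures quoted above can be turned into a quick back-of-the-envelope calculation. This sketch assumes that "1 MB" means 1024 * 1024 bytes and treats the thread stack as fully charged to each thread, which is an upper bound since a stack is reserved rather than necessarily used. The process counts are arbitrary examples.

```python
ERLANG_PROCESS_BYTES = 1268            # per-process size quoted in the answer
CLR_THREAD_STACK_BYTES = 1024 * 1024   # "1 MB" stack, assumed to mean 1 MiB


def footprint_bytes(count, bytes_each):
    """Total memory needed for `count` units of `bytes_each` bytes."""
    return count * bytes_each


def as_mib(num_bytes):
    """Convert bytes to mebibytes for display."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for count in (1_000, 100_000, 1_000_000):   # arbitrary example sizes
        processes = footprint_bytes(count, ERLANG_PROCESS_BYTES)
        threads = footprint_bytes(count, CLR_THREAD_STACK_BYTES)
        print(f"{count:>9} units: "
              f"processes {as_mib(processes):10.1f} MiB, "
              f"threads {as_mib(threads):12.1f} MiB")
```

Under these assumptions the per-unit cost differs by a factor on the order of 800, so a system with hundreds of thousands of concurrent activities fits comfortably in memory as Erlang processes but not as one thread each.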
What you probably want is an Actor Model implementation on JVM or CLR.
Scala and Clojure have already been mentioned. In addition, there are many Actor implementations for JVM:
Kilim, Functional Java, Jetlang, Actors Guild, ActorFoundry, and at least one for CLR: Retlang, which can be used from any JVM/CLR language.
For educational reasons, we are implementing a subset of the Erlang VM for the CLR. We were highly inspired by Kresten Krab Thorup and his project Erjang, a JVM-based Erlang VM. Erjang uses the Kilim framework to represent lightweight processes, and it is starting to attract attention.
Javalimit - Erjang's author blog.
Erjang repository
This is a well-trodden discussion. Some context might be useful.
From the Erlang mailing list last November:
The start of a long discussion thread
continuing here
and going a bit mental here
and ending on Joe's contribution.
My contribution to the debate about Erlang on the JVM? No, not a good idea :(
Nothing at all, actually. You might have a look at Clojure, which is an interesting functional language built on the JVM.
Axum -- an incubation project on the CLR -- was clearly inspired by Erlang.
Erjang is a virtual machine for Erlang, which runs on Java™.
I don't know of any technical problem inhibiting this.
Actually Scala (a JVM functional language) uses what is called an Actor Model that is very similar to, and as I understand it borrows heavily from, the Erlang model of shared-nothing concurrency.
Threads could not simulate Erlang processes. They're much too heavy-weight.
Just for completeness, an additional source on the topic.
Possible? Yes. Practical? Well, probably not; they solve different problems in very different ways, and thus have lots of major differences in the way they do things. This would make porting hard, and performance would likely suffer severely. That doesn't mean it can't be done, just that there are better ways to accomplish what such a port would bring to the table.
Erlang is getting a reputation for being untouchable at handling a large volume of messages and requests. I haven't had time to download and try to get inside Mr. Erlang's understanding of switching theory... so I'm wondering if someone can teach me (or point to a good instructional site.)
Say as a thought-experiment I wanted to port the Erlang ejabberd to a combination of Python and C, in a way that gave me the same speed and scalability. What structures or patterns would I have to understand and implement? (Does Python's Twisted already do this?)
How/why do functional languages (specifically Erlang) scale well? (for discussion of why)
http://erlang.org/course/course.html (for a tutorial chain)
As far as porting to other languages goes, a message-passing system would be easy to do in most modern languages. Getting the functional style can be done in Python easily enough, although you wouldn't get the internal dispatching features of Erlang "for free". Stackless Python can replicate many of Erlang's concurrency features, although I can't speak to details as I haven't used it much. It does appear to be much more "explicit" (in that it requires you to define the concurrency in code in places where Erlang's design allows concurrency to happen internally).
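To show how little code the basic message-passing pattern needs, here is a minimal sketch in standard Python. The `Actor` class and its methods are hypothetical names for this example. It uses one OS thread per actor, so it reproduces only the mailbox pattern, not Erlang's cheap processes, scheduling, or supervision.

```python
import queue
import threading

_STOP = object()  # private sentinel that tells an actor to shut down


class Actor:
    """An actor owns a mailbox and handles one message at a time."""

    def __init__(self, handler):
        self._mailbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Asynchronous: the sender never waits for the message to be handled.
        self._mailbox.put(message)

    def stop(self):
        # Messages queued before the sentinel are still processed first.
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self._handler(message)


if __name__ == "__main__":
    replies = queue.Queue()
    # Each message is (reply_queue, number); the actor replies with the double.
    doubler = Actor(lambda msg: msg[0].put(msg[1] * 2))
    doubler.send((replies, 21))
    print(replies.get(timeout=2))
    doubler.stop()
```

State lives only inside the handler and is touched by a single thread, which is the share-nothing part of the model; what a port like this does not give you is the ability to run hundreds of thousands of such actors at once.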
Erlang is not only about scalability but mostly about
reliability
soft real-time characteristics (enabled by a soft real-time GC, which is possible because of immutability [no cycles], share-nothing processes and so on)
performance in concurrent tasks (cheap task switching, cheap process spawning, the actor model, ...)
scalability - debatable in its current state, but rapidly evolving (it scales well to about 32 cores, which is better than most competitors, and should improve in the near future).
Another feature of Erlang that has an impact on scalability is its lightweight, cheap processes. Since processes have so little overhead, Erlang can spawn far more of them than most other languages. You get more bang for your buck with Erlang processes than many other languages give you.
I think the best fit for Erlang is network-bound applications: it makes communication between nodes much simpler, and things like heartbeat monitoring and automatic restart via supervisors are built into OTP.
Hypothetically, if I were to build the same app using a few popular/similar frameworks, say PHP(cakePHP|Zend), Django, and Rails, should the memory consumption of each be roughly the same?
Also, I'm sure many of you have evaluated or used each of them, and I would be interested in which one you settled on and why.
Code with whatever framework you like best. Then pray your app is popular enough to cause memory problems. We should all be so lucky.
No, it will absolutely vary wildly from one framework to another.
That said, in most cases the memory footprint of the framework is not the determining factor in site performance nor in selection of a framework. It's usually more a matter of using the right tool for the job, since each framework has its own strengths and weaknesses.
It is hard to say definitively. I would say that PHP frameworks mostly have a similar footprint, which is typically smaller than that of other frameworks such as Rails and Django. But it depends on what you count as part of Rails, such as Mongrel (rails server proxy). Overall it depends on your code as well; however, PHP will most of the time be easier on the server. (Without any language bias: I use both PHP and Rails.)
Just to give some perspective, let me report a real-world memory consumption figure using the Smalltalk web framework AIDA/Web.
For running 40+ websites on a single Smalltalk image on a single server it currently consumes 330MB of memory.
The only one of those frameworks I have used is CakePHP. I found that it's not too bad footprint-wise. It is obviously a lot heavier than plain PHP without a framework, but that can be a good trade-off.
A good comparison of some of the most popular PHP frameworks can be found at http://www.avnetlabs.com/php/php-framework-comparison-benchmarks.
Memory is cheap these days. Go with what will make your development easiest (which is usually what your team knows best).
But... In my experience, Django isn't terribly memory hungry. I've run it on my shared host with less than 100 MB of RAM. But my experience is purely anecdotal. YMMV. If you go with Django, here are some tips to keep memory usage down.
EDIT: And don't go with Zope if memory footprint is important to you.