0MQ with green threads? - erlang

I've grown to like Erlang, and it's a great (cough) architectural fit for my problem. Meanwhile, I still like to imagine that I can kludge Erlang processes & asynchronous message passing in Python (I am currently in therapy to rid myself of this obsession).
During a recent binge I came across 0MQ, and I like its messaging features. These may be self-evident to an Erlang/OTP expert, but I'm just a humble Python programmer (my shrink will no doubt get to read this clever argument). The 0MQ user guide states that it uses native OS threads, not virtual "green" threads.
Is there a way to make 0MQ work with, say, eventlet/gevent?
Or, should I avoid the green-eyed monster and stick to a single Python app thread, with non-blocking I/O handled by 0MQ's message queuing & its own (skilled) use of native threads?
Or, check out of rehab & go back to erlang?

Responding to a stale thread because I am kind of in the same boat. Thought I would share my thoughts.
1: It looks like all the heavy lifting has already been done: https://github.com/traviscline/gevent-zeromq has integrated the gevent loop with a non-blocking zmq socket, and even includes some CPython speedups. It also seems to be reasonably well maintained (at the time of this writing).
2: It depends. If you are writing something that can use zmq without a ton of external event logic, then you should just use zmq. If, on the other hand, you need to integrate with other protocols, you may want to use gevent (or perhaps twisted, although it currently has no workable zmq support at all). My projects generally require multiple protocols (e.g. a private queue manager, public HTTP, public HTTPS, private memcache), so I am investigating switching to gevent for quicker project turnaround than my current favorite, twisted.
3: You may want to skip zmq entirely and integrate with an existing Erlang-based solution like RabbitMQ; the performance advantages of zmq may not be as important as you think, and then you would have an Erlang message queue that integrates easily with Python via existing libraries.
Also see: the Message Queue comparison on the Second Life wiki.

ZeroMQ now works with Eventlet:
https://lists.secondlife.com/pipermail/eventletdev/2010-October/000907.html

What is the best way of doing computationally intensive tasks in Erlang w/o scalability sacrifices?

The Erlang Interoperability guide discusses different interoperability mechanisms. Here are my conclusions:
Ports and Erl_Interface programs: OS scheduled, limit scalability.
Port Drivers: dangerous, because a crash in the port driver brings the emulator down too.
C Nodes: the node server needs to scale as well as the Erlang app to avoid scalability sacrifices.
NIFs: Loic sums them up well.
Some advocate the use of OpenCL, basically delegating resource-hungry computations to the GPU while letting the Erlang emulator own the CPU. This sounds fantastic, but then you have the requirement that your servers have a suitable GPU.
Using JInterface and communicating with a Java process that spawns a thread for every request might be an option.
So, has anyone come across a solution that has been tested in practice and turned out to work well?
Actually, all of these solutions are used in practice. Having worked closely with some of them, I can say the following:
Ports are safe, but port communication is slow. If the port crashes, the VM continues working. If you do not communicate with your port extensively, or you do not trust the port, this is your choice.
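For reference, the C side of such a port is just a loop over stdin/stdout. Here is a minimal sketch using the 2-byte length framing from the Interoperability guide (the echo "work" is a placeholder; error handling is minimal):

    /* Minimal C port program: speaks {packet, 2} framing on stdin/stdout
       (2-byte big-endian length header, then payload) and echoes the
       payload back. Replace the echo with real work. */
    #include <unistd.h>

    static int read_exact(unsigned char *buf, int len)
    {
        int i, got = 0;
        do {
            if ((i = read(0, buf + got, len - got)) <= 0)
                return i;            /* 0 or -1: Erlang closed the port */
            got += i;
        } while (got < len);
        return len;
    }

    static int write_exact(unsigned char *buf, int len)
    {
        int i, wrote = 0;
        do {
            if ((i = write(1, buf + wrote, len - wrote)) <= 0)
                return i;
            wrote += i;
        } while (wrote < len);
        return len;
    }

    int main(void)
    {
        unsigned char hdr[2], buf[65536];

        while (read_exact(hdr, 2) == 2) {
            int len = (hdr[0] << 8) | hdr[1];
            if (read_exact(buf, len) != len)
                break;
            /* ... real work on buf goes here; we simply echo it ... */
            write_exact(hdr, 2);
            write_exact(buf, len);
        }
        return 0;                    /* port closed; the VM keeps running */
    }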
NIFs are extremely fast. If your data flow is heavy, you should use them. Of course, they are unsafe, so you have to program the NIF library carefully, and you had better learn some C (a point that most NIF creators skip). Scheduling problems are actually easily overcome with a specific pattern: start a new C thread that does the actual job right after receiving data from Erlang, detaching the processing from the Erlang scheduler thread. That way you leave the NIF function very quickly, returning to Erlang and waiting for a message from the C code.
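Here is a minimal sketch of that pattern (the module name my_module, the doubling "computation", and the {result, N} message shape are illustrative assumptions, and most error handling is omitted):

    /* "Detach to a C thread" NIF pattern: the NIF returns immediately,
       and the worker thread later sends {result, N} back to the caller. */
    #include "erl_nif.h"

    typedef struct {
        ErlNifPid caller;   /* Erlang process to notify when done */
        int       input;    /* work payload */
    } task_t;

    static void *worker(void *arg)
    {
        task_t *task = (task_t *)arg;

        int result = task->input * 2;   /* the actual heavy job goes here */

        /* Build and send {result, N} in a process-independent env. */
        ErlNifEnv *msg_env = enif_alloc_env();
        ERL_NIF_TERM msg = enif_make_tuple2(msg_env,
                                            enif_make_atom(msg_env, "result"),
                                            enif_make_int(msg_env, result));
        enif_send(NULL, &task->caller, msg_env, msg);
        enif_free_env(msg_env);
        enif_free(task);
        return NULL;
    }

    static ERL_NIF_TERM compute(ErlNifEnv *env, int argc,
                                const ERL_NIF_TERM argv[])
    {
        int input;
        if (argc != 1 || !enif_get_int(env, argv[0], &input))
            return enif_make_badarg(env);

        task_t *task = enif_alloc(sizeof(task_t));
        task->input = input;
        enif_self(env, &task->caller);

        /* Hand off to a plain OS thread and return to the scheduler. */
        ErlNifTid tid;
        enif_thread_create("worker", &tid, worker, task, NULL);
        return enif_make_atom(env, "ok");
    }

    static ErlNifFunc nif_funcs[] = {{"compute", 1, compute}};
    ERL_NIF_INIT(my_module, nif_funcs, NULL, NULL, NULL, NULL)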
Java nodes or C nodes are for tasks that can be moved to the node completely: long and heavy jobs.
Bearing these considerations in mind, choose the approach that fits your task best.

NIF to wrap my multi-threaded C++ code

I have C++ code that implements a special protocol over the serial port. The code is multi-threaded: internally it polls the serial port and does its own cyclic processing. I would like to call this driver from Erlang and also receive events from it. My concern is that the C++ code is multi-threaded and also stateful: when I call a certain function on the driver, it caches things internally that will be used/required on subsequent calls. My questions are:
1. Does a NIF run in the same OS process as the rest of my Erlang processes, or is it launched in a separate OS process?
2. Does it make sense to wrap this multi-threaded, stateful C++ code in a NIF?
3. If a NIF is not the right approach, what is a better way to make Erlang talk back and forth with this C++ code? I would prefer my C++ code to live in the same OS process as the rest of my Erlang processes, and linked-in drivers look like an option, but I am not sure the multi-threaded nature of my C++ code fits that model. Plus, I hear they can mess up the Erlang scheduler?
Unlike ports, NIFs run within the Erlang VM process, similar to drivers. Because of that, any NIF crash will bring the VM down as well. And, answering your last question in advance: NIFs, like drivers, may block your scheduler.
That depends on the functionality you are implementing in this C++ code. Given answer 1), you probably want to avoid concurrency in the C++ part, since it is a potential source of errors. That is not always possible, of course. But if you are implementing, say, a worker pool, go ahead and write single-threaded code, spawning it as many times as you need.
Drivers can be multi-threaded too, with the same potential problems and quite similar performance (well, still slightly faster than NIFs). If you are not completely sure about your C++ code's stability, use it as an Erlang port.
Speaking of the difference between NIFs and drivers: the former are natively synchronous, while the latter can be asynchronous (which can be a huge advantage if you do not want to receive answers to most of the commands). Drivers are easier to mess up and harder to implement (but once you grasp the main patterns and problems, they actually seem okay).
Here's a good start for drivers:
http://www.erlang.org/doc/apps/erts/driver.html
And something similar (behold the difference in complexity) for NIFs:
http://www.erlang.org/doc/tutorial/nif.html
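To make the complexity difference a bit more concrete, here is a hedged skeleton of a linked-in driver that merely echoes port data back to Erlang; the name example_drv is made up, and a real driver (especially a multi-threaded one) needs far more care:

    /* Minimal linked-in driver skeleton: echoes whatever Erlang writes
       to the port straight back. Illustrative only. */
    #include "erl_driver.h"

    typedef struct {
        ErlDrvPort port;
    } example_data;

    static ErlDrvData example_start(ErlDrvPort port, char *command)
    {
        example_data *d = driver_alloc(sizeof(example_data));
        d->port = port;
        return (ErlDrvData)d;
    }

    static void example_stop(ErlDrvData handle)
    {
        driver_free((void *)handle);
    }

    static void example_output(ErlDrvData handle, char *buf, ErlDrvSizeT len)
    {
        example_data *d = (example_data *)handle;
        driver_output(d->port, buf, len);   /* echo back to Erlang */
    }

    static ErlDrvEntry example_driver_entry = {
        NULL,                          /* init */
        example_start,                 /* start */
        example_stop,                  /* stop */
        example_output,                /* output */
        NULL, NULL,                    /* ready_input, ready_output */
        "example_drv",                 /* driver_name */
        NULL, NULL, NULL, NULL, NULL,  /* finish .. outputv */
        NULL, NULL, NULL, NULL,        /* ready_async .. event */
        ERL_DRV_EXTENDED_MARKER,
        ERL_DRV_EXTENDED_MAJOR_VERSION,
        ERL_DRV_EXTENDED_MINOR_VERSION,
        0,                             /* driver_flags */
        NULL, NULL, NULL               /* handle2, process_exit, stop_select */
    };

    DRIVER_INIT(example_drv)           /* must match driver_name */
    {
        return &example_driver_entry;
    }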

Best approach for Comet? (Non-Blocking IO vs Erlang)

Perhaps the question isn't that simple to answer... but what is your opinion? Should I use non-blocking approaches (libevent, for example) or Erlang lightweight processes to:
Achieve as many connections as possible at a given amount of RAM
Achieve as much throughput as possible at a given amount of CPU
The background is that I am planning to write a pub/sub server, and I cannot decide which approach I should use.
There is an article about building A Million-User Comet Application with Mochiweb that you can read. But I think stability, flexibility, and maintainability will be more important most of the time. Keeping this in mind, I would not consider anything other than Erlang, even if there were some better-performing solution.
Under the hood, the Erlang VM uses non-blocking I/O. If your Erlang lightweight process blocks, the VM does not really do a kernel-level thread context switch. Most of the time, it will just wake up another lightweight process on the same OS thread (thus, it is not "blocking" in the usual sense of the word).
You can even start the VM with the +A argument to specify how many threads to allocate for the async I/O pool, e.g. erl +A 16. (AFAIK, Node.js is still single-threaded, and if a callback function hangs, your VM is done for.)

What asynchronous or incremental IMAP clients or parsers exist?

I'm looking for an IMAP client library or parser that can support asynchronous I/O. The end goal being I could have dedicated thread(s) do socket I/O (via a poll() loop or similar) and could send data to waiting clients/parsers, as it becomes available. All of the code/libraries I've seen to date (java.mail, Python's imaplib, Thunderbird's C++ IMAP client, many random ones in C, C++) seem to follow the traditional blocking, one-thread-per-socket approach, which won't work for me.
My ideal client or library would behave much like https://github.com/ry/http-parser in that I/O behavior would not be dictated by the IMAP bits. Instead, the IMAP library would deal with buffers/strings and the caller would manage I/O.
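To illustrate, something with roughly this shape is what I'm after. This is a purely hypothetical sketch; none of these names come from a real library:

    /* Hypothetical IMAP push-parser API, modeled on http-parser: the
       caller owns all I/O and feeds bytes in; the parser only fires
       callbacks. Every name here is invented for illustration. */
    #include <stddef.h>

    typedef struct imap_parser imap_parser;

    typedef struct {
        /* Fired once per complete tagged or untagged response line. */
        int (*on_response)(imap_parser *p, const char *line, size_t len);
        /* Fired with successive chunks of a {N} literal as they arrive;
           may be called many times for one literal. */
        int (*on_literal_chunk)(imap_parser *p, const char *data, size_t len);
    } imap_parser_callbacks;

    struct imap_parser {
        const imap_parser_callbacks *cb;
        void *user_data;   /* caller's per-connection state */
        /* ... internal incremental-parse state would live here ... */
    };

    /* Feed whatever bytes poll()/read() produced. Returns the number of
       bytes consumed, or a negative error code. Never blocks and never
       touches a socket itself. */
    long imap_parser_execute(imap_parser *p, const char *buf, size_t len);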
The only possibility I've seen so far is libcurl. But, I'm not sure if the API will work and want to look at other possibilities before going too far down that road or inventing my own solution.
I'm open to considering libraries in any programming language.
Twisted (http://twistedmatrix.com/) has an asynchronous IMAP4 client: twisted.mail.imap4.IMAP4Client
People sometimes say that this protocol is difficult to implement, so implementation quality may be an issue. The defunct Chandler project used the twisted IMAP4 client, and its source code contains the comment "This functionality will be enhanced to be a more robust IMAP client in the near future".
I've had great results with node.js for this kind of thing. If you are listening on a lot of open sockets, you'll need to tweak some Linux settings to increase the limit on the number of open file descriptors (e.g. ulimit -n), but it works great.

How does Grand Central Dispatch really use the operating system?

I have a solid idea how GCD works, but I want to know more about the touted "operating system management" internals. It seems almost every technical explanation of how Grand Central Dispatch works with the "Operating System" is totally different. I'll paraphrase some of my findings.
"It's a daemon that's global to the OS
that distributes tasks over many
cores."
I'm not stupid enough to believe that.
"Support is built into the kernel to
be aware of all GCD applications. GCD
applications work in concert with the
kernel to make logical decisions on
how to manage threads within the
application."
Sounds like this synchronization scheme would be much slower than just managing the logic within the application.
"GCD is exists solely in the
application and uses current system
load as a metric to how it behaves."
This sounds more realistic to me, but I only saw a statement like this in one place.
What's really going on here? Is it just a library, or is it an entire "system"?
It is a library, but there are some kernel optimizations that allow for system-level control. In particular, there is an additional interface, pthread_workqueue, that lets GCD tell the kernel it wants a thread to run some particular function, without actually starting a thread (it is basically a continuation). At that point the kernel can choose to start that continuation or not, depending on system load.
So yes, there is global, system-wide infrastructure that manages GCD threads in the kernel, and the second answer is the correct one. The mistake you are making is thinking that the synchronization going on there is going to cost something. The scheduler is going to run no matter what; what GCD has done is use a new interface that lets the scheduler decide not only whether or not to run the threads based on their relative priority, but whether or not to create or destroy the threads as well.
It is a (significant) optimization, but it is not strictly necessary, and the FreeBSD port does not actually support the system-wide parts. If you want to look at the actual interfaces, here is pthread_workqueue.h; the implementation is in Apple's pthread.c, and you can see the stub entry point the kernel uses for starting up the workqueues in the asm stubs in start_wqthread.s. You can also go crawling through xnu to see how it upcalls into the stub if you really want.
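For a flavor of that interface, here is a hedged sketch following the portable libpwq-style prototypes (pthread_workqueue_create_np and friends); this is private, platform-dependent API, so the exact names and signatures here are assumptions, not a stable contract:

    /* Sketch of queueing continuations on a pthread workqueue: additem
       starts no thread; the kernel/library starts or reuses worker
       threads depending on system load. Signatures follow the portable
       libpwq headers and may differ on a given platform. */
    #include <pthread_workqueue.h>
    #include <stdio.h>
    #include <unistd.h>

    static void work_item(void *arg)
    {
        /* Runs on a worker thread the system decided it could afford. */
        printf("running item %ld\n", (long)arg);
    }

    int main(void)
    {
        pthread_workqueue_t wq;
        pthread_workitem_handle_t handle;
        unsigned int gencount;
        long i;

        pthread_workqueue_create_np(&wq, NULL);  /* default attributes */

        for (i = 0; i < 4; i++)
            pthread_workqueue_additem_np(wq, work_item, (void *)i,
                                         &handle, &gencount);

        sleep(1);   /* crude: give the items a chance to run */
        return 0;
    }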
