Dask has a very powerful distributed API. As far as I can understand, though, it only supports native Python code and modules.
Does anyone know if distributed Dask can support C++ workers?
I could not find anything in the docs.
Is there any other approach, apart from adding Python bindings to the C++ code, to use that functionality?
You are correct: if you want to call into C++ code using Dask, you do it by calling from Python, which usually means writing some form of binding layer to make the calls convenient. If there is also a C API, you can use ctypes or cffi.
In theory, the scheduler is agnostic to the language of the client and workers, so long as they agree with each other, but no one has implemented a C++ client/worker. This has been done, at least as a proof of concept, for Julia.
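As a minimal illustration of the ctypes route (calling a C API from Python, which a Dask worker could then execute like any other Python function), here is a sketch that loads the system math library; the library lookup is platform-dependent:

```python
import ctypes
import ctypes.util

# Locate and load the C math library (libm); find_library handles
# platform-specific naming (e.g. libm.so.6 on Linux).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature of cos(double) -> double so ctypes
# marshals the argument and return value correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # → 1.0, computed by the C function
```

A thin Python wrapper like this around your C/C++ entry points is then just an ordinary callable you can submit to Dask workers, provided the shared library is installed on each worker node.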
Related
We are currently evaluating OPA as our main fine-grained access control engine. Our data path is written in C++ for high-performance requirements. I see that it is possible to embed OPA in a Go process, but I am not sure whether this has been evaluated in a C++ container.
Are there any existing deployments where OPA was embedded as a library in C++ container?
If we embed OPA as a library, will there be any communication through the network (to other processes or data bases) when policies are evaluated?
For using OPA from C++, there are a few options, ordered roughly by complexity and increasingly uncharted territory:
1. Use the HTTP API, in a sidecar process or some standalone service. (Obviously not what you're looking for; included for completeness' sake.)
2. Use Wasm: there is no SDK for C++, but the ABI hopefully isn't too complicated, see the docs.
3. Embed OPA as a Cgo library: the amount of work is considerable; you'd have to define the surface API, i.e., do the work necessary to re-wrap OPA's core into a library you could link in.
I'd go with trying (1.) first, seeing if it really isn't feasible for your performance requirements (using a Unix socket, profiling the evaluation, having a good look at your policy code...); then I'd reach for Wasm (2.). OPA's Wasm modules contain the compiled recipe for evaluating your policy's logic; there is no interpreter overhead. With (3.), you'd have to do more work than for (2.), and (in my opinion) get less for it.
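For option (1.), querying OPA's REST Data API is a plain HTTP POST from any language. A rough sketch in Python (the server address and the policy path `example/allow` are assumptions; your policy package determines the real path):

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"  # default OPA server address (assumption)

def build_query(policy_path, input_doc):
    """Build the URL and JSON body for OPA's Data API: POST /v1/data/<path>."""
    url = f"{OPA_URL}/v1/data/{policy_path}"
    body = json.dumps({"input": input_doc}).encode()
    return url, body

def is_allowed(policy_path, input_doc):
    """Send the query and read the decision from the 'result' field."""
    url, body = build_query(policy_path, input_doc)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("result", False)

url, body = build_query("example/allow", {"user": "alice", "action": "read"})
print(url)  # → http://localhost:8181/v1/data/example/allow
```

The same two-line request/response exchange translates directly to a C++ HTTP client, which is why the sidecar option is usually the cheapest to prototype.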
I need to write a compute-intensive simulation program. I tried writing a multi-threaded version of this program, but it's taking too much time. Now I plan to expand to multiple nodes (probably via Amazon EC2 nodes).
I'm already familiar with Python. Is Python, outfitted with some parallel module, a viable option if I care about speed, or would I be better off going with some other framework/language like Erlang?
Can you even write a simulation program in Erlang?
The project is more about dividing up computation than dividing up a dataset, so I didn't consider frameworks based on MapReduce.
dispy is a framework for distributed computing with Python. It uses asyncoro, a framework for asynchronous, concurrent programming using coroutines, with some features of Erlang (broadly speaking). Disclaimer: I am the author of both of these frameworks.
If you are already familiar with Python, I would recommend you keep the simulation in Python (and speed up critical parts in C) and use Erlang for managing it. Writing the simulation in Erlang would take you far out of your comfort zone (even I personally wouldn't do it). You can probably reuse parts of Erlang projects such as Disco or Riak Core. Start your project with a sub-optimal proof of concept and tune it in iterations: start with Python, embed it in Erlang (probably via Disco), and then move bits around until you are happy with the performance and features. You could end up with anything, including a pure Erlang solution, Python embedded in the BEAM using a NIF, or anything else that satisfies your needs.
Does your problem parallelize trivially? Then you may want to take a look at Elastic Map Reduce instead of EC2.
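If the runs really are independent, dividing up the computation can start with the standard library on a single node and keep the same shape when you move to a cluster framework like dispy. A sketch with a stand-in Monte Carlo trial (the workload is illustrative, not from the question):

```python
import random
from concurrent.futures import ProcessPoolExecutor

def run_trial(seed):
    """One independent simulation run; here a toy Monte Carlo pi estimate."""
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                 for _ in range(10_000))
    return 4.0 * inside / 10_000

def run_trials(n_trials):
    # Each trial runs in its own process, sidestepping the GIL for
    # CPU-bound work; results come back in submission order.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(run_trial, range(n_trials)))

if __name__ == "__main__":
    estimates = run_trials(8)
    print(sum(estimates) / len(estimates))  # rough estimate of pi
```

Scaling out then means swapping the executor for a cluster scheduler while `run_trial` stays untouched; the per-trial function is the unit you'd ship to EC2 nodes.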
The implementation of ScrumJet on GitHub (as of this writing) shares essentially identical functions between the storage modules for tasks, categories and boards. This was achieved by moving the identical code which makes heavy use of the ?MODULE macro into scrumjet_datastore.hrl. Each of scrumjet_task.erl, scrumjet_category.erl and scrumjet_board.erl include scrumjet_datastore.hrl and have no functions defined locally.
This works very well when there is nothing wrong. However, if I need to debug, then the debugger brings up the empty module instead of the header file where the functions are defined.
Does anyone know how to make the Erlang debugger work for functions in includes?
Using includes in Erlang to share implementations of functions is not generally a good idea. It has some uses, but it should be avoided in regular application code.
As I mentioned back in 2009 I followed Zed and Adam Lindberg's advice and used a datastore module with parameterized methods instead.
I just learned that Erlang can remotely load code and modules onto all nodes of a cluster using the "nl" command. Can any other languages do this?
Technically any of the Lisp dialects could do it. Since 'code is data' in Lisp, passing some code to a different box and 'eval'-ing it would do the job. SLIME does this to some extent via a remote REPL using sockets.
You can write a ClassLoader in Java similar to the code loader in Erlang. Java ClassLoaders have a lot of isolation, so it can be a bit more complicated (but you could do some nice things with this if you use it to your advantage rather than thinking of it as the enemy).
ClassLoaders are easy to write, but Java doesn't ship with one that does the same kinds of things Erlang does. Java also doesn't have the clustering tools Erlang does, so that's not particularly surprising.
In theory, pure functional languages should offer this possibility, but so far I have only heard of Erlang doing it.
None that I know of, but it should be possible to implement in dynamic languages such as Python, Perl, or Lisp.
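In Python, for instance, the core mechanism is simple: receive source text (say, over a socket) and materialize it as a module, which is roughly what a hand-rolled equivalent of Erlang's "nl" would do on each node. A minimal local sketch, with the networking and trust questions left out:

```python
import types

def load_module_from_source(name, source):
    """Create a module object from source text, as a remote code loader
    might after receiving code over the network. Note: exec() of
    untrusted code is dangerous; a real system must authenticate the
    sender before loading anything."""
    module = types.ModuleType(name)
    exec(compile(source, f"<remote:{name}>", "exec"), module.__dict__)
    return module

# Pretend this string just arrived from another node in the cluster.
received = "def greet(who):\n    return 'hello ' + who\n"
mod = load_module_from_source("remote_mod", received)
print(mod.greet("cluster"))  # → hello cluster
```

What Erlang adds on top of this primitive is the hard part: versioned hot code swapping, distribution to every connected node, and a runtime designed around it.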
I'd like to start a new network server project in a language that supports concurrency through fibers aka coroutines aka user-mode threads. Determining what exactly are my options has been exceedingly difficult as the term "coroutine" seems to be used quite loosely to mean a variety of things, and "fiber" is used almost exclusively in reference to the Win32 API.
For the purposes of this question, coroutines/fibers:
support methods that pause execution by yielding a result to the calling function from within a nested function (i.e. arbitrarily deep in the call stack from where the coroutine/fiber was invoked)
support transferring control to another arbitrary coroutine at its current point of execution (i.e. yield to a coroutine that did not call your coroutine)
What are my language options? I know Ruby 1.9 and Perl (Coro) both have support; what else? Anything with a mature GC and dynamic method invocation is sufficient.
greenlet extension meets your requirements in Python (regular one, not Stackless).
Greenlet API is a bit low-level, so I recommend using gevent that gives you API suitable for an application. (Disclaimer: I wrote gevent)
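As a standard-library aside: requirement (1) alone, yielding a value from deep in a call stack, can be sketched with plain generators via `yield from` (Python 3.3+); the fully symmetric transfer of control in requirement (2) is where greenlet/gevent are needed. A small illustration:

```python
def inner():
    # Pauses execution two frames deep; the value surfaces at the
    # caller of outer(), not at middle().
    yield "from inner"

def middle():
    # yield from forwards both values and resumptions transparently.
    yield from inner()

def outer():
    yield "from outer"
    yield from middle()

print(list(outer()))  # → ['from outer', 'from inner']
```

Each `yield from` link must be written explicitly at every level, which is exactly the restriction greenlets remove: a greenlet can switch away from any frame without its callers cooperating.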
Lua supports coroutines, see http://lua-users.org/wiki/CoroutinesTutorial , give it a try!
Tcl 8.6, currently in beta, will support coroutines. For more info, see the Tcl Wiki coroutine page.
Stackless Python is another option that meets your requirements. If Python, Ruby and Perl are all unsuitable for your purposes (despite all meeting your stated requirements), you presumably have other unstated requirements or preferences -- care to spell them out?-)
Scheme has call-with-current-continuation which is a building block on which all kinds of flow control can be built. It definitely can support the two uses you mentioned.
There are many robust, widely available implementations of Scheme such as PLT Scheme and Chicken Scheme.