My service modules not recognised by dask - dask

I have a service that contains several modules, and in the main file I import most of them like below:
from base_client import BaseClient
import request_dispatcher as rd
import utils as util
In one of the functions in main I call the Dask client's submit. When I try to get the result back from the future object, it gives me a ModuleNotFoundError as below:
ModuleNotFoundError: No module named 'base_client'
This is how I define my client and call the function
def mytask(url, dest):
    .....

client = Client(<scheduler ip>)
f_obj = client.submit(mytask, data_url, destination)
How exactly can I make these modules available to scheduler and workers?

When you call submit, Dask wraps up your task and sends it to the worker(s). As part of this packaging, any variables that the task requires are serialised and sent too. For functions defined inline, this includes the whole function, but for functions defined in a module, it is only the module and function names. This is done to save CPU and bandwidth (imagine trying to send all the source of all of the modules you happen to have imported).
On the worker side, the task is unwrapped, and for the function this means importing the module, i.e. import base_client. This follows the normal Python logic of looking in the locations defined by sys.path. If the file defining the module isn't there, you get the error above.
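To make the asymmetry concrete, here is a minimal sketch (the scheduler address and mymodule are assumptions, not from the question): a function defined inline in the submitting script ships with its code, while a function imported from a module ships only by name and must be importable on the workers.

from dask.distributed import Client
from mymodule import helper             # hypothetical module; only the names
                                         # "mymodule" and "helper" travel with the task

client = Client("tcp://10.0.0.1:8786")   # assumed scheduler address

def inline_task(x):
    # Defined in the submitting script, so the whole function is serialised.
    return x + 1

client.submit(inline_task, 1).result()   # works even if the workers lack the source file
client.submit(helper, 1).result()        # ModuleNotFoundError unless mymodule is
                                         # importable on the workers' sys.path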
To solve this, copy the file to a place where the worker can see it. You can do this with upload_file on a temporary basis (which uses a temporary directory), but you would be better off installing the module using the usual pip or conda methods. Importing from the "current directory" is likely to fail even with a local cluster.
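As a rough sketch of the temporary route (the file name and scheduler address are assumptions), upload_file ships a single source file to every worker; for anything longer-lived, install the code on the worker machines instead.

from dask.distributed import Client

client = Client("tcp://10.0.0.1:8786")   # assumed scheduler address
client.upload_file("base_client.py")     # copies the file to each worker and
                                         # makes it importable there
# Durable alternative: package the modules and pip/conda install them on every
# worker machine so the normal sys.path lookup finds them.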
To get more information, you would need to post a complete example that shows the problem. In practice, functions that use imports from modules are used with submit all the time without problems.

Related

How to Iterate over source objects of incoming links in DXL, in Modules not previously loaded

My question is much the same as this one:
How to Iterate over target objects of outgoing links in DXL, in Modules not previously loaded
but regarding incoming links.
I would like to use the source objects of incoming links, but they are located in modules that are not previously loaded.
I don't want to open and close modules each time because it would cost too much time. I would like to open them once and close them at the end.
Two solutions for this:
be able to know whether the module is already open so that I don't open it again (is there an "is_open" function in DXL?), plus store the list of open modules in a table and close them all at the end.
or, better:
before the start of the loop, use a loop over the link module and the target module to find all modules in the database that could be linked to the target module, and load them all (even if there are no links between them, my script would be simpler this way). How can I do this? I tried something like:
ModName_ src_mod_linkset
for src_mod_linkset in "target_module"<-"linkmodulename" do
{
print "test"
}
but in this kind of loop, it doesn't work because "target_module" should be an object and not the complete module.
https://www.ibm.com/mysupport/s/forumshome has a lot of information on this. A query for "Engineering Requirements Management DOORS" incoming links brings up some example scripts. I prefer this approach (load the ModuleVersion if its data is null): https://www.ibm.com/mysupport/s/forumsquestion?language=de&id=0D50z00006HIDztCAH
About "is_open": there is a loop for module in database, which gives you a list of all open modules. You might want to store all open modules at the star of your script in a Skip list and when iterating over the incoming modules check to see whether you have to close the module at the end of your script.
I would not use your second approach if you plan to run your script on baselines: it might happen that the link set in the link module has been deleted in the meantime, so you would not get all possible in-links. Anyway, the link modules could be anywhere in your database, not necessarily near your incoming module.

Beam python sdk - save_main_session - DoFn imports - what are the best practices?

I have a question about save_main_session and best practices, and please let me know if there is a doc somewhere that covers this question. With save_main_session set to False, if my DoFn's process method uses, for example, the standard-library copy module, Beam's FileSystems API, or my custom module, and I import those at the module level (top of the file in which the DoFn is defined), this fails in the Dataflow service with an error saying the copy (etc.) module was not found from the process method (which all makes sense). I could fix this by either:
importing copy inside the process method
"saving" copy reference/object as a field/provider/etc in the DoFn instance
setting save_main_session to True
I don't want to set save_main_session to True because afaiu it captures the whole main session and I have a bunch of objects in there that are not serializable, and overall I find save_main_session to be smelly and hacky. The 1st option is kinda smelly as well and doesn't always work - though imports are cached, so performance-wise it should be OK-ish - but it would not work for my custom modules afaiu (unless I explicitly install/send them over to the workers). And lastly, the 2nd is kinda hacky - working around the Beam framework.
I'm leaning mostly towards the 2nd option, but it just doesn't feel right to not be able to just use the global imports, and to work around it by adding and using instance field(s).
What is the best practice for this problem? I know the examples are suggesting to set save_main_session to True, but that again has consequences and just smells. Are there better options?
According to the documentation, if you have objects in your global namespace that cannot be pickled, you will get a pickling error. If the error is regarding a module that should be available in the Python distribution, you can solve this by importing the module locally, where it is used.
The DoFn class comes with a setup method that is called once per DoFn instance. You can override this method, and perform your imports there.
As a note, this method is available as of Beam's Python SDK release 2.13.0. If you're using an earlier version, you can override start_bundle in your DoFn and perform the import there.
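A minimal sketch of that pattern (my_custom_module and its transform function are placeholders, not part of Beam): the imports happen in setup on the worker, get stored on the instance, and process only uses those stored references, so nothing extra has to be captured from the main session.

import apache_beam as beam

class ImportInSetupDoFn(beam.DoFn):
    def setup(self):
        # Runs once per DoFn instance on the worker, after deserialisation.
        import copy
        import my_custom_module          # hypothetical module; it still has to
                                         # be installed/available on the workers
        self._copy = copy
        self._helper = my_custom_module

    def process(self, element):
        # Use the references stored in setup instead of module-level imports.
        yield self._helper.transform(self._copy.deepcopy(element))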

Dart: how to specify an Isolate URI in an imported package?

I have written some code and I want to provide it in a package, but I also want to expose it to package consumers as a worker. For this purpose I have created a wrapper class that runs an isolate internally and, using the send command and listeners, communicates with the Isolate to provide the functionality.
The problem arises when I want to use this wrapper class from the bin or web directory: the Uri provided is resolved relative to the directory of the running/main Isolate instead of the package root. For bin it is packagename|bin/ and for web it is packagename|web.
I would like to export this class to the consumers so they can choose an easier approach than constructing their own Isolate, but I am not sure how to specify the main file that will be used in spawnUri.
Is there a way to specify the file so it will always resolve to the correct file regardless of where the main Isolate is run from?
Structure:
// Exports the next file so the class in it will be package visible
packageroot -> lib/package_exports_code_that_spawns_isolate.dart
// This file should contain URI that always resolve to the next file
packageroot -> lib/code_that_spawns_isolate.dart
// The main worker/Isolate file
packageroot -> lib/src/worker/worker.dart
Thanks.
To refer to a library in your package, you should use a package: URI.
Something like:
var workerUri = Uri.parse("package:myPackage/src/worker/worker.dart");
var isolate = await Isolate.spawnUri(workerUri,...);
It's not perfect because it requires you to hard-wire your package name into the code, but I believe it's the best option currently available.
The Isolate.spawnUri function doesn't (and can't) resolve a relative URI reference wrt. the source file that called it - nothing in the Dart libraries depends on where it's called from, that's simply too fragile - so a relative URI isn't going to work. The only absolute URI referencing your worker is a package: URI, so that's what you have to use.

function 'Func/Arity' already imported from 'Module'

I defined both area/1 and perim/1 in modules square and circle.
I want to import and use them in another module. Here are my import statements:
-import(square, [area/1, perim/1]).
-import(circle, [area/1, perim/1]).
I got these error messages.
~/test.erl:4: function area/1 already imported from square
~/test.erl:4: function perim/1 already imported from square
I know Erlang does not support namespaces. But since we can qualify a function call by specifying the module (i.e. square:area vs circle:area), I fail to see how the lack of namespaces is the source of the error here.
So, what exactly caused the above error and how can I fix it?
In Erlang, "importing" a function from another module means being able to call it as if it were a local function, without the module prefix. So with this directive:
-import(square, [area/1, perim/1]).
you could write area(42) and it would mean the same as square:area(42).
However, if you import area and perim functions from two modules, it would be ambiguous which one you'd actually call when writing area(42).
As you correctly note, you can always qualify the function call with the name of the module, i.e. square:area(42) and circle:area(42) - so I would suggest doing so consistently and removing both import directives. This is also recommended by rule 6.6 of the Erlang Programming Rules - "Don't use import".

Erlang: "extending" an existing module with new functions

I'm currently writing some functions related to lists that could possibly be reused.
My question is:
Are there any conventions or best practices for organizing such functions?
To frame this question, I would ideally like to "extend" the existing lists module such that I'm calling my new function the following way: lists:my_function(). At the moment I have lists_extensions:my_function(). Is there any way to do this?
I read about erlang packages and that they are essentially namespaces in Erlang. Is it possible to define a new namespace for Lists with new Lists functions?
Note that I'm not looking to fork and change the standard lists module, but to find a way to define new functions in a new module also called Lists, while avoiding the consequent naming collisions by using some kind of namespacing scheme.
Any advice or references would be appreciated.
Cheers.
To frame this question, I would ideally like to "extend" the existing lists module such that I'm calling my new function the following way: lists:my_function(). At the moment I have lists_extensions:my_function(). Is there any way to do this?
No, so far as I know.
I read about erlang packages and that they are essentially namespaces in Erlang. Is it possible to define a new namespace for Lists with new Lists functions?
They are experimental and not generally used. You could have a module called lists in a different namespace, but you would have trouble calling functions from the standard module in this namespace.
Here are some reasons not to use lists:your_function() and to use lists_extension:your_function() instead:
Generally, the Erlang/OTP Design Guidelines state that each "application" -- libraries are also applications -- contains modules. Now you can ask the system which application introduced a specific module. This system would break if modules were fragmented across applications.
However, I do understand why you would want a lists:your_function/N:
It's easier to use for the author of your_function, because he needs your_function(...) a lot when working with []. But when another Erlang programmer -- who knows the stdlib -- reads this code, he will not know what it does. This is confusing.
It looks more concise than lists_extension:your_function/N. That's a matter of taste.
I think this method would work on any distro:
You can make an application that automatically rewrites the core Erlang modules of whichever distribution is running. Append your custom functions to the core modules and recompile them before compiling and running your own application that calls the custom functions. This doesn't require a custom distribution, just some careful planning and use of the file tools and BIFs for compiling and loading.
* You want to make sure you don't append your functions every time. Once you rewrite the file, it will be permanent unless the user replaces the file later. You could use a check with module_info to confirm whether your custom functions exist, to decide if you need to run the extension writer.
Pseudo Example:
lists_funs() -> ["myFun() -> <<\"things to do\">>."].
extend_lists() ->
    {ok, Io} = file:open(?LISTS_MODULE_PATH, [append]),
    lists:foreach(fun(Fun) -> io:format(Io, "~s~n", [Fun]) end, lists_funs()),
    file:close(Io),
    compile:file(?LISTS_MODULE_PATH).
* You may want to keep copies of the original modules to restore if the compilation fails; that way you don't have to do anything heavy if you make a mistake in your list of functions, and you can also use them as the source any time you want to rewrite the module to extend it with more functions.
* You could use a lists_extension module to keep all of the logic for your functions and just forward the calls from lists with stubs like funName(Args) -> lists_extension:funName(Args).
* You could also make an override system that searches for existing functions and rewrites them in a similar way but it is more complicated.
I'm sure there are plenty of ways to improve and optimize this method. I use something similar to update some of my own modules at runtime, so I don't see any reason it wouldn't work on core modules also.
I guess what you want to do is to have some of your functions accessible from the lists module. It is good that you would want to convert commonly used code into a library.
One way to do this is to test your functions well, and if they are fine, copy the functions and paste them into the lists.erl module (WARNING: ensure you do not overwrite existing functions, just paste at the end of the file). This file can be found at the path $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/src/lists.erl. Make sure that you add your functions to those exported by the lists module (in the -export([your_function/1,.....])) to make them accessible from other modules. Save the file.
Once you have done this, we need to recompile the lists module. You could use an EmakeFile. The contents of this file would be as follows:
{"src/*", [verbose,report,strict_record_tests,warn_obsolete_guard,{outdir, "ebin"}]}.
Copy that text into a file called EmakeFile. Put this file in the path: $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/EmakeFile.
Once this is done, open an Erlang shell whose current working directory (pwd()) is the path in which the EmakeFile is, i.e. $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/.
Call the function: make:all() in the shell and you will see that the module lists is recompiled. Close the shell.
Once you open a new Erlang shell, and assuming you exported your functions in the lists module, they will work the way you want, right in the lists module.
Erlang being open source allows us to add functionality, recompile and reload the libraries. This should do what you want, success.