Multiprocessing: sharing config.py in memory

I'm writing a multiprocessing program where each process modifies a config.py file, and each process also reads that config.py file.
PROBLEM: By the time process 1 uses the config file, process 2 has already overwritten it, so the result of process 1 is wrong.
What could I do?
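One common fix is to stop sharing a mutable config module and instead give each worker its own copy of the configuration, for example by passing it as an argument. A minimal sketch, assuming the settings can be represented as a plain dict (all names here are hypothetical):
import multiprocessing

def worker(task_id, config):
    # Each process receives an independent copy of the config, so no
    # other process can overwrite it mid-run.
    config["task_id"] = task_id
    return "task %d ran with %s" % (task_id, config)

if __name__ == "__main__":
    base_config = {"threshold": 0.5}
    jobs = [(i, dict(base_config)) for i in range(2)]
    with multiprocessing.Pool(processes=2) as pool:
        # Arguments are pickled per task, so each worker gets its own dict.
        print(pool.starmap(worker, jobs))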

Related

Error running Beam job with Dataflow runner (using Bazel): no module found error

I am trying to run a Beam job on Dataflow using the Python SDK.
My directory structure is:
beamjobs/
    setup.py
    main.py
    beamjobs/
        pipeline.py
When I run the job directly using python main.py, the job launches correctly. I use setup.py to package my code, and I provide it to Beam with the runtime option setup_file.
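For reference, a minimal sketch of passing setup_file at submission time; the project and bucket values are hypothetical placeholders:
from apache_beam.options.pipeline_options import PipelineOptions

# setup_file makes Dataflow workers build and install the local package.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                 # hypothetical
    temp_location="gs://my-bucket/tmp",   # hypothetical
    setup_file="./setup.py",
)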
However, if I run the same job using Bazel (with a py_binary rule that includes setup.py as a data dependency), I end up getting an error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 804, in run
    work, execution_context, env=self.environment)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/workitem.py", line 131, in get_work_items
    work_item_proto.sourceOperationTask.split)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/workercustomsources.py", line 144, in __init__
    source_spec[names.SERIALIZED_SOURCE_KEY]['value'])
  File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 290, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
ModuleNotFoundError: No module named 'beamjobs'
This is surprising to me because the logs above show:
Successfully installed beamjobs-0.0.1 pyyaml-5.4.1
So my package is installed successfully.
I don't understand this discrepancy between running with python directly and running through Bazel.
In both cases, the logs seem to show that Dataflow uses the image gcr.io/cloud-dataflow/v1beta3/python37:2.29.0.
Any ideas?
OK, so the problem was that I was sending the file setup.py as a dependency in Bazel, and I could see in the logs that my package beamjobs was being installed correctly.
The issue is that the package was actually empty, because the only dependency I included in the py_binary rule was that setup.py file.
The fix was to also include all the other Python files as part of the binary. I did that by creating py_library rules to add all those other files as dependencies, as sketched below.
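A minimal BUILD sketch of that fix (target and file names are hypothetical); the key point is that the py_binary depends on the package sources instead of only shipping setup.py as data:
py_library(
    name = "beamjobs_lib",
    # The actual package sources, so they are present in the built binary.
    srcs = [
        "beamjobs/__init__.py",
        "beamjobs/pipeline.py",
    ],
)

py_binary(
    name = "main",
    srcs = ["main.py"],
    data = ["setup.py"],  # still shipped so Beam can package via setup_file
    deps = [":beamjobs_lib"],
)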
Probably the wrapper-runner script generated by Bazel (you can find its path by running bazel build on a target) restricts the set of modules available to your script. The proper approach is to fetch PyPI dependencies through Bazel (for example, via the rules_python pip integration).

Where are the correct locations for the Dask Worker and Dask Scheduler configuration files?

I am attempting to find the correct location of Dask configuration files. I have a number of questions related to configuring Dask.
$ dask-worker --version
dask-worker, version 2.3.2
Do the Dask Worker and Dask Scheduler share the same configuration file or do they use different configuration files?
I am unclear if there are configuration variables that are specific to Dask Worker and Dask Scheduler. Is there a list of the valid configuration variables for Dask Worker and Dask Scheduler?
Where are the correct locations of the Dask Worker and Dask Scheduler configuration files?
I have found three different configuration files across my system and the Dask documentation:
~/.config/dask/distributed.yaml
~/.config/dask/dask.yaml
~/.dask/config.yaml
On my Dask Worker and Dask Scheduler machines, I find a file located at ~/.config/dask/dask.yaml which does not contain much information. I am not sure what should go into this file or if/where it is ever called by the Dask library.
I also see a file at ~/.config/dask/distributed.yaml that contains much more information. This looks more like the configuration I was expecting. I can see that these configurations are also loaded by Dask in distributed/config.py.
A third file (~/.dask/config.yaml) makes an appearance in the documentation. To quote the documentation:
Dask accepts some configuration options in a configuration file, which by default is a .dask/config.yaml file located in your home directory.
I do not see this file on my system. Am I responsible for creating this configuration file? I never see this file referenced in the repository. Why does the documentation differ from the source code?
Can I print a list of all active configuration variables for both the Worker and the Scheduler?
Is there a way, either on the command line or in Python, where I can inspect the active configurations?
For documentation on Dask's configuration system, please see https://docs.dask.org/en/latest/configuration.html
That page says:
Configuration is specified in one of the following ways:
YAML files in ~/.config/dask/ or /etc/dask/
Environment variables like DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING=True
Default settings within sub-libraries
I've removed the page that you were looking at in this PR: https://github.com/dask/distributed/pull/3038
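To answer the inspection question, a short sketch using the dask.config module described on that documentation page:
import dask

# The merged result of YAML files, environment variables, and library
# defaults is held in a single dictionary.
print(dask.config.config)

# Look up one value; the dotted path mirrors the YAML nesting.
print(dask.config.get("distributed.scheduler.work-stealing"))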

Precompile main.py into the MicroPython binary image for the ESP8266

There is a boot.py available by default in the MicroPython image.
I have tested my code in the Python module main.py. I would like to do the following:
I would like to compile an image, so that it is easier to flash more than 10 devices and I do not have to start the WebREPL.
Is there a way to stop the boot messages that print the MicroPython version number, etc.?
I tried the following; apparently they are already enabled:
https://forum.micropython.org/viewtopic.php?t=2334
I successfully compiled an image using the following:
https://cdn-learn.adafruit.com/downloads/pdf/building-and-running-micropython-on-the-esp8266.pdf
Question:
How do I create an image that includes main.py? Where should this file go under /home/vagrant/micropython/esp8266?
You need to change micropython/esp8266/modules/inisetup.py.
In this file, a block of code writes the boot.py file at MicroPython start-up, like below:
with open("boot.py", "w") as f:
    f.write("""\
# This file is executed on every boot (including wake-boot from deepsleep)
#import esp
#esp.osdebug(None)
import gc
#import webrepl
#webrepl.start()
gc.collect()
import mymain
""")
Notice the last line, import mymain. Copy your mymain.py file to the micropython/esp8266/modules directory.
The mymain.py file should not wrap its code in an if __name__ == '__main__' block, so that it is executed at import time. All other files that mymain imports should also be in the modules directory. After building the code, all required files get included in the binary.
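A hypothetical mymain.py to illustrate the point; everything sits at module level, so the plain import in boot.py starts it:
# mymain.py -- hypothetical example; runs as soon as boot.py imports it
import time
import machine

led = machine.Pin(2, machine.Pin.OUT)  # on-board LED on many ESP8266 boards

# No __name__ == '__main__' guard: this loop starts at import time.
while True:
    led.value(not led.value())
    time.sleep(1)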
1) boot.py is generated by the following script:
/home/vagrant/micropython/esp8266/script/inisetup.py
The function setup() writes boot.py to the filesystem at every start-up.
This would be the place to also write main.py to the filesystem,
or to add it under scripts and start it from boot.py (see the sketch after these points).
2) Stopping boot messages: "performing initial checks" is in inisetup.py; some others are in port_diag.py in the scripts folder.
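As a sketch of the first approach, a hypothetical addition to setup() that writes a main.py next to boot.py (MicroPython runs main.py automatically after boot.py on every start-up):
with open("main.py", "w") as f:
    f.write("""\
# main.py -- hypothetical placeholder contents
print("hello from main.py")
""")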

Yaws - starting the server from the current directory

I know that I can set docroot in yaws.conf, but that is not always convenient. Is there a way to start the server from the current directory with Yaws without modifying the configuration file?
Two options:
Keep a template of your yaws.conf file somewhere and invoke yaws through a script that first uses the template to create a conf file with the current working directory filled in as docroot, and then runs yaws with its --conf command-line option pointing at the newly created conf file (see the sketch after these options).
Run yaws in embedded mode, which allows you to programmatically specify the configuration. You can use file:get_cwd/0 to obtain the pathname of the current working directory, then use it as the value for docroot in the configuration details you pass to yaws_api:embedded_start_conf/1,2,3,4.
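A minimal sketch of the first option as a wrapper script; the template filename and the {{docroot}} placeholder are hypothetical choices:
#!/usr/bin/env python3
import os
import subprocess

# Fill the current working directory into the template's docroot placeholder.
template = open("yaws.conf.template").read()
conf = template.replace("{{docroot}}", os.getcwd())

with open("yaws.generated.conf", "w") as f:
    f.write(conf)

# --conf points yaws at the generated configuration file.
subprocess.run(["yaws", "--conf", "yaws.generated.conf"])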

Config files and Reltool

I'm building a release with Reltool. The app needs config files to start. It reads a config file using the following function:
read_config(Filename) ->
    {ok, [Config]} = file:consult(filename:join(
                       [filename:dirname(code:which(?MODULE)),
                        "..", "config", Filename])),
    Config.
What's a good way to use config files so that Reltool builds a working release?
In case you need more specialized config files, rebar allows you to copy files into your release, e.g. into an etc folder under your app (rebar creates etc by default), using the overlay option in your reltool.config file (overlay is not a standard Reltool config option):
%% reltool.config
{overlay, [{copy, "../path/foo.config", "etc/foo.config"}, ...
You can pass the config file as an argument to the VM using the vm.args file:
%% vm.args
-config etc/foo.config
Your start script should pass the vm.args file as an argument to the VM (again, rebar generates a script that does this automatically).
The function init:get_argument allows you to read more specialized VM arguments, e.g.:
%% vm.args
-very_special_config etc/foo.config
and
case init:get_argument(very_special_config) of
    {ok, Arg} -> Arg;
    _ -> fail
end
You do not need to have your own config file unless it is for a very special purpose.
If your configuration differs from version to version, you can keep those per-version settings in <application>/ebin/<application>.app.
You can set up your default config variables in <application>/ebin/<application>.app.
For more details about this, please refer to http://www.erlang.org/doc/man/app.html
Then you are ready to read the config variables using
application:get_env(<application_name>, <key>, <default_value>).
If a variable is not defined, you can also set it with application:set_env/3.
For more, please look at this http://www.erlang.org/doc/man/application.html
Then you can also override those application variables by defining <any_name_or_system_name>.config and passing it when you start the erl command with -config <file_name>.config. You can take a look at this for the start-up command options: http://www.erlang.org/doc/man/erl.html
When you start erl, you can also override the config variables by using -<application> <key> <value>.
You may also take a look at this for the config file syntax for your application:
http://www.erlang.org/doc/man/config.html
Once you have successfully built an OTP application, it will seem very easy to you.
