Frequent KilledWorker: pandas_read_text-read-block-from-delayed - dask

I have a standard cluster set up on Kubernetes using the Dask Docker images, but not the Dask Helm charts. I tried running an existing script on the cluster, but it doesn't seem to run and keeps throwing errors.
The cluster details: 1 notebook, 1 scheduler, 1 worker & 1 shared volume.
I read some of the existing threads on KilledWorker errors and looked into the logs, but couldn't figure it out.
distributed.worker - ERROR - None
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/worker.py", line 814, in handle_scheduler
    comm, every_cycle=[self.ensure_communicating, self.ensure_computing]
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 748, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 457, in handle_stream
    msgs = yield comm.read()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/tcp.py", line 218, in read
    frames, deserialize=self.deserialize, deserializers=deserializers
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/utils.py", line 85, in from_frames
    res = _from_frames()
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/utils.py", line 71, in _from_frames
    frames, deserialize=deserialize, deserializers=deserializers
  File "/opt/conda/lib/python3.7/site-packages/distributed/protocol/core.py", line 126, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/opt/conda/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 189, in deserialize
    dumps, loads, wants_context = families[name]
KeyError: None

I had the same problem and found a solution.
In Dask 2.3 the distributed serialization protocol changed slightly. Your client is probably at version 2.3.0 or higher while the scheduler and workers are not. Upgrade your cluster so that either everything is at 2.3.0 or above, or everything is below it.
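As a quick sanity check, you can ask the client to compare package versions across the whole cluster before re-running anything (a minimal sketch; the scheduler address is a placeholder):

from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address

# Collects dask/distributed package versions from the client, the scheduler,
# and every worker; with check=True a mismatch raises immediately.
versions = client.get_versions(check=True)
print(versions["scheduler"]["packages"])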

Related

PartitionedDataSet not found when Kedro pipeline is run in Docker

I have multiple text files in an S3 bucket which I read and process, so I defined a PartitionedDataSet in the Kedro data catalog, which looks like this:
raw_data:
  type: PartitionedDataSet
  path: s3://reads/raw
  dataset: pandas.CSVDataSet
  load_args:
    sep: "\t"
    comment: "#"
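(For context, a node consuming a PartitionedDataSet receives a dict mapping partition ids to load callables, so the processing code typically looks like the sketch below; concat_partitions is a hypothetical node function:)

import pandas as pd

def concat_partitions(partitioned_input):
    # partitioned_input maps each partition id to a zero-argument
    # callable that loads that partition as a DataFrame.
    return pd.concat([load() for load in partitioned_input.values()])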
In addition, I implemented this solution to load all secrets, including the AWS secret keys, from the credentials file via environment variables.
When I run things locally using kedro run, everything works just fine. But when I build a Docker image (using kedro-docker) and run the pipeline in the Docker environment with kedro docker run, providing all environment variables via the --docker-args option, I get the following error:
Traceback (most recent call last):
File "/usr/local/bin/kedro", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 724, in main
cli_collection()
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/kedro/kedro_cli.py", line 230, in run
pipeline_name=pipeline,
File "/usr/local/lib/python3.7/site-packages/kedro/framework/context/context.py", line 767, in run
raise exc
File "/usr/local/lib/python3.7/site-packages/kedro/framework/context/context.py", line 759, in run
run_result = runner.run(filtered_pipeline, catalog, run_id)
File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 101, in run
self._run(pipeline, catalog, run_id)
File "/usr/local/lib/python3.7/site-packages/kedro/runner/sequential_runner.py", line 90, in _run
run_node(node, catalog, self._is_async, run_id)
File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 213, in run_node
node = _run_node_sequential(node, catalog, run_id)
File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 221, in _run_node_sequential
inputs = {name: catalog.load(name) for name in node.inputs}
File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 221, in <dictcomp>
inputs = {name: catalog.load(name) for name in node.inputs}
File "/usr/local/lib/python3.7/site-packages/kedro/io/data_catalog.py", line 392, in load
result = func()
File "/usr/local/lib/python3.7/site-packages/kedro/io/core.py", line 213, in load
return self._load()
File "/usr/local/lib/python3.7/site-packages/kedro/io/partitioned_data_set.py", line 240, in _load
raise DataSetError("No partitions found in `{}`".format(self._path))
kedro.io.core.DataSetError: No partitions found in `s3://reads/raw`
Note: the pipeline works just fine in the Docker environment if I move the files to a local directory, define the PartitionedDataSet over that, rebuild the Docker image, and provide the environment variables through --docker-args.
The solution (at least in my case) was to also provide the AWS_DEFAULT_REGION environment variable in the kedro docker run command.
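For example (a sketch; the region value is a placeholder, and the bare -e flags forward those variables from the host environment into the container):

kedro docker run --docker-args="-e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION=eu-west-1"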

ESP8266 MicroPython: [Errno 103] ECONNABORTED after some time

I am having issues running the following script for an extended period of time:
I use ampy to execute the script on the ESP:
sudo ampy --port /dev/ttyUSB0 run photoresistor.py
photoresistor.py:
#!/usr/bin/env python3
import gc
import json
import machine
import network
from time import sleep
from urllib.urequest import urlopen

wifiap = network.WLAN(network.AP_IF)
wifiap.active(False)

routercon = network.WLAN(network.STA_IF)
routercon.active(True)
routercon.ifconfig(('10.0.0.128', '255.255.255.0', '10.0.0.138', '10.0.0.138'))
routercon.connect('mywifi', '123')
while not routercon.isconnected():
    pass

posturl = 'http://10.0.0.156:23102/rest/v2/send'
adc = machine.ADC(0)
gc.enable()

while True:
    value = adc.read()
    if value < 200:
        message = {'username': 'test', 'message': value, 'chatid': 'test', 'password': 'test', 'notifyself': 'false'}
        r = urlopen(posturl, data=json.dumps(message).encode())
        r.close()
        gc.collect()
    sleep(1)
It works as expected at first, but after some time I get the following stack trace:
Traceback (most recent call last):
File "/usr/local/bin/ampy", line 11, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ampy/cli.py", line 337, in run
output = board_files.run(local_file, not no_output)
File "/usr/local/lib/python3.6/dist-packages/ampy/files.py", line 303, in run
out = self._pyboard.execfile(filename)
File "/usr/local/lib/python3.6/dist-packages/ampy/pyboard.py", line 273, in execfile
return self.exec_(pyfile)
File "/usr/local/lib/python3.6/dist-packages/ampy/pyboard.py", line 267, in exec_
raise PyboardError('exception', ret, ret_err)
ampy.pyboard.PyboardError: ('exception', b'', b'Traceback (most recent call last):\r\n File "<stdin>", line 28, in <module>\r\n File "urequests.py", line 152, in post\r\n File "urequests.py", line 89, in request\r\nOSError: [Errno 103] ECONNABORTED\r\n')
I have no idea what to do. I tried playing around with garbage collection, but it didn't help.
I suspect that the board doesn't clean up its sockets properly.
If the board sends POST requests quickly in the loop (every second for a minute) and then sits idle for a short period afterwards, it fails quickly with the ECONNABORTED above.
If the board sends POST requests more slowly (say, two per minute), it takes much longer to fail. To conclude: I suspect the OS does not properly clean up resources and still has active connections after r.close(), or I am overlooking something in the code.
I am not sure what else I can do to make sure these sockets are closed.
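One mitigation to consider (a minimal sketch, not part of the original script; post_with_retry is a hypothetical helper reusing the imports above) is to catch the OSError and retry after a delay rather than letting the loop die:

def post_with_retry(url, payload, retries=3, delay=2):
    # Retry a few times with a pause between attempts, since the
    # errors seem to clear after the board sits idle for a while.
    for _ in range(retries):
        try:
            r = urlopen(url, data=payload)
            r.close()
            return True
        except OSError:
            gc.collect()
            sleep(delay)
    return False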
EDIT:
I found out it fails on connect (https://github.com/micropython/micropython-lib/blob/master/urllib.urequest/urllib/urequest.py):
line 28:
s.connect(ai[-1])
However, routercon.isconnected() returns True:
>>> routercon.isconnected()
True
>>>
How can it be that, although there is an active connection, I am unable to send an HTTP POST request?
EDIT2:
When this happens, sometimes I also can't post to another endpoint, e.g. the test server running the same web service:
>>> r = urlopen(posturl, data=json.dumps(message).encode())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "urllib/urequest.py", line 28, in urlopen
OSError: [Errno 103] ECONNABORTED
>>> r = urlopen("http://10.0.0.8:23102/rest/v2/send", data=json.dumps(message).encode())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "urllib/urequest.py", line 28, in urlopen
OSError: [Errno 103] ECONNABORTED
>>>
Interestingly, an HTTP GET to Google works:
>>> r = urlopen("http://www.google.com")
>>>
If I let it sit idle for some time, HTTP POSTs start to work again.
Could it be that the OS is performing a cleanup in the background?
I faced the same problem. Restarting the API endpoint device solved the issue.

Linux dask-worker cannot connect to windows dask-scheduler

I've started dask-scheduler on Windows.
Now I attempt to run dask-worker <ip>:<port> on an EC2 instance.
I get the following error:
distributed.nanny - INFO - Start Nanny at: 'tcp://10.34.33.12:36525'
distributed.diskutils - INFO - Found stale lock file and directory '/dask-worker-space/worker-v_5Vmm', purging
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/distributed/nanny.py", line 541, in run
yield worker._start(*worker_start_args)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 260, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 315, in wrapper
yielded = next(result)
File "/usr/lib/python2.7/site-packages/distributed/worker.py", line 425, in _start
self.start_services(listen_host)
File "/usr/lib/python2.7/site-packages/distributed/worker.py", line 368, in start_services
self.services[k] = v(self, io_loop=self.loop, **kwargs)
File "/usr/lib/python2.7/site-packages/distributed/bokeh/worker.py", line 634, in __init__
main = Application(FunctionHandler(partial(main_doc, worker, extra)))
File "/usr/lib/python2.7/site-packages/bokeh/application/handlers/function.py", line 11, in __init__
_check_callback(func, ('doc',))
File "/usr/lib/python2.7/site-packages/bokeh/util/callback_manager.py", line 12, in _check_callback
sig = signature(callback)
File "/usr/lib/python2.7/site-packages/bokeh/util/future.py", line 85, in signature
for name in func.keywords.keys():
AttributeError: 'NoneType' object has no attribute 'keys'
distributed.nanny - INFO - Closing Nanny at 'tcp://10.34.33.12:36525'
distributed.dask_worker - INFO - End worker
Can you tell me what is happening?
Is it even possible for Dask to form a cluster across machines running different operating systems?

JupyterHub - oauth_client_id not found

I am using Azure to run Python notebooks with JupyterHub. After spinning up the VM, I was able to access the notebooks just by using my username and password (just like SSH). However, one day later, after I had switched to another network (I am not claiming the network was the problem), I was unable to access the link. It gives me a "This site can't be reached" error.
So I tried rerunning the process, and since then I have been struggling to make it run again. I have searched for similar issues on GitHub, but they weren't helpful either.
After killing the process using the kill <pid> command, I tried running JupyterHub through this command:
/anaconda/envs/py35/bin/python /anaconda/envs/py35/bin/jupyterhub-singleuser --port=50387 --notebook-dir="~/notebooks" --config=/etc/jupyterhub/jupyterhub_config.py
And it gives me the error:
JUPYTERHUB_API_TOKEN env is required to run jupyterhub-singleuser. Did you launch it manually?
So I searched through GitHub issues similar to this one. I tried generating a token manually using:
jupyterhub token username
And I added that token to JUPYTERHUB_API_TOKEN via export JUPYTERHUB_API_TOKEN=token. I also added token:username to c.Authenticator.tokens in jupyterhub_config.py. Now I get this error:
Traceback (most recent call last):
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 528, in get
value = obj._trait_values[self.name]
KeyError: 'oauth_client_id'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda/envs/py35/bin/jupyterhub-singleuser", line 6, in <module>
main()
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyterhub/singleuser.py", line 455, in main
return SingleUserNotebookApp.launch_instance(argv)
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyter_core/application.py", line 267, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/config/application.py", line 657, in launch_instance
app.initialize(argv)
File "<decorator-gen-7>", line 2, in initialize
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/notebook/notebookapp.py", line 1296, in initialize
self.init_webapp()
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyterhub/singleuser.py", line 393, in init_webapp
self.init_hub_auth()
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyterhub/singleuser.py", line 388, in init_hub_auth
if not self.hub_auth.oauth_client_id:
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 556, in __get__
return self.get(obj, cls)
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 535, in get
value = self._validate(obj, dynamic_default())
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 593, in _validate
value = self._cross_validate(obj, value)
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 599, in _cross_validate
value = obj._trait_validators[self.name](obj, proposal)
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 907, in __call__
return self.func(*args, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyterhub/services/auth.py", line 439, in _ensure_not_empty
raise ValueError("%s cannot be empty." % proposal.trait.name)
ValueError: oauth_client_id cannot be empty.
I am not sure where I went wrong in this process. Anybody familiar with this issue?
Try running jupyterhub instead of jupyterhub-singleuser.
For your specific use case, the command would be as follows:
sudo /anaconda/envs/py35/bin/python /anaconda/envs/py35/bin/jupyterhub --port=50387 --notebook-dir="~/notebooks" --config=/etc/jupyterhub/jupyterhub_config.py
Make sure that jupyterhub is installed correctly at the path you mentioned.
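The distinction matters because the hub provisions JUPYTERHUB_API_TOKEN for every single-user server it spawns, which is why launching jupyterhub-singleuser by hand complains about the missing token. If you prefer to keep the options in configuration rather than on the command line, a minimal jupyterhub_config.py sketch with the equivalent settings (values taken from the command above) would be:

# jupyterhub_config.py
c.JupyterHub.port = 50387
c.Spawner.notebook_dir = '~/notebooks'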

Ipython notebook, how to set the correct path to kernel

When running IPython Notebook on Windows 7 64-bit and launching a notebook with the Python 2 kernel, I get an error:
Traceback (most recent call last):
File "C:\Users\USER1\Anaconda2\lib\site-packages\notebook\base\handlers.py", line 436, in wrapper
result = yield gen.maybe_future(method(self, *args, **kwargs))
File "C:\Users\USER1\Anaconda2\lib\site-packages\notebook\services\sessions\handlers.py", line 56, in post
model = sm.create_session(path=path, kernel_name=kernel_name)
File "C:\Users\USER1\Anaconda2\lib\site-packages\notebook\services\sessions\sessionmanager.py", line 66, in create_session
kernel_name=kernel_name)
File "C:\Users\USER1\Anaconda2\lib\site-packages\notebook\services\kernels\kernelmanager.py", line 84, in start_kernel
**kwargs)
File "C:\Users\USER1\Anaconda2\lib\site-packages\jupyter_client\multikernelmanager.py", line 109, in start_kernel
km.start_kernel(**kwargs)
File "C:\Users\USER1\Anaconda2\lib\site-packages\jupyter_client\manager.py", line 244, in start_kernel
**kw)
File "C:\Users\USER1\Anaconda2\lib\site-packages\jupyter_client\manager.py", line 190, in _launch_kernel
return launch_kernel(kernel_cmd, **kw)
File "C:\Users\USER1\Anaconda2\lib\site-packages\jupyter_client\launcher.py", line 115, in launch_kernel
proc = Popen(cmd, **kwargs)
File "C:\Users\USER1\Anaconda2\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "C:\Users\USER1\Anaconda2\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
I have investigated further, and I added the following print lines before proc = Popen(cmd, **kwargs) inside the launcher.py file:
print cmd
print kwargs
Now I see that proc = Popen(cmd, **kwargs) is called with cmd=
['C:\\Users\\USER1\\Anaconda2_32bit\\python.exe', '-m', 'ipykernel', '-f', 'C:\\Users\\USER1\\AppData\\Roaming\\jupyter\\runtime\\kernel-a3f46334-4491-4fef-aeb1-6772b8392954.json']
This is a problem because my python.exe is not in
C:\\Users\\USER1\\Anaconda2_32bit\\python.exe
but in
C:\\Users\\USER1\\Anaconda2\\python.exe
However, I have checked the paths in Computer / Advanced system settings / Advanced / Environment Variables, and \\Anaconda2_32bit\\ is never specified there.
Thus I suspect the wrong path is specified somewhere else. Where could it be, and how can I fix it?
Also, I previously had an installation of Anaconda in \\Anaconda2_32bit\\, but I have uninstalled it.
IPython registers kernels in special configuration files (kernelspecs).
I ran the command:
ipython kernelspec list
The output was:
Available kernels:
  python2    C:\ProgramData\jupyter\kernels\python2
I looked into the C:\ProgramData\jupyter\kernels\python2\kernel.json file, and there was a wrong path set for Python 2. I fixed the path, and it works now.
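For reference, the corrected kernel.json follows the standard kernelspec format, roughly like this (a sketch using the interpreter path from the question; display_name and language are typical values):

{
 "argv": [
  "C:\\Users\\USER1\\Anaconda2\\python.exe",
  "-m",
  "ipykernel",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python 2",
 "language": "python"
}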
