I have a standard cluster setup on Kubernetes using the Dask Docker images, but not the Dask Helm charts. I tried running an existing script on the cluster, but it doesn't seem to run and keeps throwing errors.
The cluster details: 1 notebook, 1 scheduler, 1 worker, and 1 shared volume.
I read some of the threads on KilledWorker, so I looked into the logs, but couldn't figure it out.
distributed.worker - ERROR - None
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/worker.py", line 814, in handle_scheduler
    comm, every_cycle=[self.ensure_communicating, self.ensure_computing]
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 748, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 457, in handle_stream
    msgs = yield comm.read()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/tcp.py", line 218, in read
    frames, deserialize=self.deserialize, deserializers=deserializers
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/utils.py", line 85, in from_frames
    res = _from_frames()
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/utils.py", line 71, in _from_frames
    frames, deserialize=deserialize, deserializers=deserializers
  File "/opt/conda/lib/python3.7/site-packages/distributed/protocol/core.py", line 126, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/opt/conda/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 189, in deserialize
    dumps, loads, wants_context = families[name]
KeyError: None
I had the same problem and found a solution.
In Dask 2.3.0 the distributed serialization protocol changed slightly. Your client is probably newer than 2.3.0 while your scheduler and workers aren't. Upgrade your cluster so that everything is either at or above 2.3.0, or everything is below it.
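To confirm the mismatch, you can ask distributed to compare versions across the client, scheduler, and workers; a minimal sketch (the scheduler address is a placeholder for your own service):

from distributed import Client

# Connect to the running scheduler (replace with your scheduler's address).
client = Client("tcp://dask-scheduler:8786")

# Collects Python and package versions from the client, scheduler, and all
# workers, and raises an error if they are inconsistent.
client.get_versions(check=True)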
I have multiple text files in an S3 bucket which I read and process. So I defined a PartitionedDataSet in the Kedro data catalog, which looks like this:
raw_data:
  type: PartitionedDataSet
  path: s3://reads/raw
  dataset: pandas.CSVDataSet
  load_args:
    sep: "\t"
    comment: "#"
In addition, I implemented this solution to get all secrets from the credentials file via environment variables, including the AWS secret keys.
When I run things locally using kedro run, everything works just fine, but when I build a Docker image (using kedro-docker) and run the pipeline in the Docker environment with kedro docker run, providing all environment variables via the --docker-args option, I get the following error:
Traceback (most recent call last):
File "/usr/local/bin/kedro", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 724, in main
cli_collection()
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/kedro/kedro_cli.py", line 230, in run
pipeline_name=pipeline,
File "/usr/local/lib/python3.7/site-packages/kedro/framework/context/context.py", line 767, in run
raise exc
File "/usr/local/lib/python3.7/site-packages/kedro/framework/context/context.py", line 759, in run
run_result = runner.run(filtered_pipeline, catalog, run_id)
File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 101, in run
self._run(pipeline, catalog, run_id)
File "/usr/local/lib/python3.7/site-packages/kedro/runner/sequential_runner.py", line 90, in _run
run_node(node, catalog, self._is_async, run_id)
File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 213, in run_node
node = _run_node_sequential(node, catalog, run_id)
File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 221, in _run_node_sequential
inputs = {name: catalog.load(name) for name in node.inputs}
File "/usr/local/lib/python3.7/site-packages/kedro/runner/runner.py", line 221, in <dictcomp>
inputs = {name: catalog.load(name) for name in node.inputs}
File "/usr/local/lib/python3.7/site-packages/kedro/io/data_catalog.py", line 392, in load
result = func()
File "/usr/local/lib/python3.7/site-packages/kedro/io/core.py", line 213, in load
return self._load()
File "/usr/local/lib/python3.7/site-packages/kedro/io/partitioned_data_set.py", line 240, in _load
raise DataSetError("No partitions found in `{}`".format(self._path))
kedro.io.core.DataSetError: No partitions found in `s3://reads/raw`
Note: the pipeline works just fine in the Docker environment if I move the files to some local directory, define the PartitionedDataSet over that directory, build the Docker image, and provide the environment variables through --docker-args.
The solution (at least in my case) was to also provide the AWS_DEFAULT_REGION environment variable in the kedro docker run command.
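For illustration, passing the region along with the other AWS variables might look like this (the region value here is just an example; the two key variables were already being passed):

kedro docker run --docker-args="-e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION=eu-west-1"

Passing -e VAR without a value tells Docker to forward that variable from the host environment.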
I am having issues keeping the following script running for longer periods of time:
I use ampy to execute the script on the ESP:
sudo ampy --port /dev/ttyUSB0 run photoresistor.py
photoresistor.py:
#!/usr/bin/env python3
import machine
import network
from time import sleep
from urllib.urequest import urlopen
import json
import gc

# Disable the access point interface and connect to the router as a station
wifiap = network.WLAN(network.AP_IF)
wifiap.active(False)
routercon = network.WLAN(network.STA_IF)
routercon.active(True)
routercon.ifconfig(('10.0.0.128', '255.255.255.0', '10.0.0.138', '10.0.0.138'))
routercon.connect('mywifi', '123')
while not routercon.isconnected():
    pass

posturl = 'http://10.0.0.156:23102/rest/v2/send'
adc = machine.ADC(0)
gc.enable()

while True:
    value = adc.read()
    if value < 200:
        # POST the reading to the web service
        message = {'username': 'test', 'message': value, 'chatid': 'test', 'password': 'test', 'notifyself': 'false'}
        r = urlopen(posturl, data=json.dumps(message).encode())
        r.close()
        gc.collect()
    sleep(1)
It works as expected in the beginning, but after some time I get the following stack trace:
Traceback (most recent call last):
File "/usr/local/bin/ampy", line 11, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ampy/cli.py", line 337, in run
output = board_files.run(local_file, not no_output)
File "/usr/local/lib/python3.6/dist-packages/ampy/files.py", line 303, in run
out = self._pyboard.execfile(filename)
File "/usr/local/lib/python3.6/dist-packages/ampy/pyboard.py", line 273, in execfile
return self.exec_(pyfile)
File "/usr/local/lib/python3.6/dist-packages/ampy/pyboard.py", line 267, in exec_
raise PyboardError('exception', ret, ret_err)
ampy.pyboard.PyboardError: ('exception', b'', b'Traceback (most recent call last):\r\n File "<stdin>", line 28, in <module>\r\n File "urequests.py", line 152, in post\r\n File "urequests.py", line 89, in request\r\nOSError: [Errno 103] ECONNABORTED\r\n')
No idea what to do.
I tried to play around with the garbage collection but it didn't help.
I suspect that the board doesn't clean up sockets properly.
If the board sends POST requests quickly in the loop (every second for a minute) and I then let it sit for a short period of time, it fails quickly with the ECONNABORTED above.
If the board sends POST requests more slowly (about 2 per minute), it takes much longer to fail. To conclude: I suspect the OS does not properly clean up resources and still has active connections after r.close(), or I am overlooking something in the code.
I am not sure what else I can do to make sure these sockets are closed.
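One pattern worth trying, to rule out leaked responses, is a wrapper that guarantees the response is closed even when reading fails. This is just a sketch (post_reading is a hypothetical helper, and it cannot reach a socket that urlopen opens and loses internally before returning):

def post_reading(url, payload):
    # Always close the response, even if the request raises.
    r = None
    try:
        r = urlopen(url, data=json.dumps(payload).encode())
        return True
    except OSError:
        # e.g. ECONNABORTED: drop this reading and retry on a later loop pass
        return False
    finally:
        if r is not None:
            r.close()
        gc.collect()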
EDIT:
I found out it fails on connect (https://github.com/micropython/micropython-lib/blob/master/urllib.urequest/urllib/urequest.py), at line 28:
s.connect(ai[-1])
However, routercon.isconnected() returns True:
>>> routercon.isconnected()
True
>>>
How can it be that, although there is an active connection, I am unable to send an HTTP POST request?
EDIT2:
When this happens, I sometimes also can't post to another endpoint, e.g. the test server running the same web service:
>>> r = urlopen(posturl, data=json.dumps(message).encode())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "urllib/urequest.py", line 28, in urlopen
OSError: [Errno 103] ECONNABORTED
>>> r = urlopen("http://10.0.0.8:23102/rest/v2/send", data=json.dumps(message).encode())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "urllib/urequest.py", line 28, in urlopen
OSError: [Errno 103] ECONNABORTED
>>>
Interestingly, an HTTP GET to Google works:
>>> r = urlopen("http://www.google.com")
>>>
If I let it sit idle for some time, HTTP POSTs start to work again.
Could it be that the OS is performing a cleanup in the background?
I faced the same problem. Restarting your API endpoint device will solve the issue.
I've started dask-scheduler on Windows.
Now I attempt to run dask-worker <ip>:<port> on an EC2 instance.
It throws the following error:
distributed.nanny - INFO - Start Nanny at: 'tcp://10.34.33.12:36525'
distributed.diskutils - INFO - Found stale lock file and directory '/dask-worker-space/worker-v_5Vmm', purging
distributed.nanny - ERROR - Failed to start worker
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/distributed/nanny.py", line 541, in run
yield worker._start(*worker_start_args)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/usr/lib64/python2.7/site-packages/tornado/concurrent.py", line 260, in result
raise_exc_info(self._exc_info)
File "/usr/lib64/python2.7/site-packages/tornado/gen.py", line 315, in wrapper
yielded = next(result)
File "/usr/lib/python2.7/site-packages/distributed/worker.py", line 425, in _start
self.start_services(listen_host)
File "/usr/lib/python2.7/site-packages/distributed/worker.py", line 368, in start_services
self.services[k] = v(self, io_loop=self.loop, **kwargs)
File "/usr/lib/python2.7/site-packages/distributed/bokeh/worker.py", line 634, in __init__
main = Application(FunctionHandler(partial(main_doc, worker, extra)))
File "/usr/lib/python2.7/site-packages/bokeh/application/handlers/function.py", line 11, in __init__
_check_callback(func, ('doc',))
File "/usr/lib/python2.7/site-packages/bokeh/util/callback_manager.py", line 12, in _check_callback
sig = signature(callback)
File "/usr/lib/python2.7/site-packages/bokeh/util/future.py", line 85, in signature
for name in func.keywords.keys():
AttributeError: 'NoneType' object has no attribute 'keys'
distributed.nanny - INFO - Closing Nanny at 'tcp://10.34.33.12:36525'
distributed.dask_worker - INFO - End worker
Can you tell me what is happening?
Is it even possible for Dask to form a cluster across machines running different operating systems?
I am using Azure to run Python notebooks via JupyterHub. After spinning up the VM, I was able to access the notebooks just by using my username and password (just like SSH). However, one day later, when I switched to another network (I am not claiming the network is the problem), I was unable to access the link. It gives me a "The site can't be reached" error.
So I tried rerunning the process, and since then I have been struggling to make it run again. I have searched for similar issues on GitHub, but they aren't helpful either.
After killing the process using the kill <pid> command, I tried running JupyterHub through this command:
/anaconda/envs/py35/bin/python /anaconda/envs/py35/bin/jupyterhub-singleuser --port=50387 --notebook-dir="~/notebooks" --config=/etc/jupyterhub/jupyterhub_config.py
And it gives me the error:
JUPYTERHUB_API_TOKEN env is required to run jupyterhub-singleuser. Did you launch it manually?
So I searched through GitHub issues similar to this one. I tried generating a token manually using:
jupyterhub token username
And I added that token to JUPYTERHUB_API_TOKEN via export JUPYTERHUB_API_TOKEN=token. I also added token:username to c.Authenticator.tokens in jupyterhub_config.py. Now I get this error:
Traceback (most recent call last):
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 528, in get
value = obj._trait_values[self.name]
KeyError: 'oauth_client_id'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda/envs/py35/bin/jupyterhub-singleuser", line 6, in <module>
main()
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyterhub/singleuser.py", line 455, in main
return SingleUserNotebookApp.launch_instance(argv)
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyter_core/application.py", line 267, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/config/application.py", line 657, in launch_instance
app.initialize(argv)
File "<decorator-gen-7>", line 2, in initialize
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/notebook/notebookapp.py", line 1296, in initialize
self.init_webapp()
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyterhub/singleuser.py", line 393, in init_webapp
self.init_hub_auth()
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyterhub/singleuser.py", line 388, in init_hub_auth
if not self.hub_auth.oauth_client_id:
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 556, in __get__
return self.get(obj, cls)
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 535, in get
value = self._validate(obj, dynamic_default())
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 593, in _validate
value = self._cross_validate(obj, value)
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 599, in _cross_validate
value = obj._trait_validators[self.name](obj, proposal)
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/traitlets.py", line 907, in __call__
return self.func(*args, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/jupyterhub/services/auth.py", line 439, in _ensure_not_empty
raise ValueError("%s cannot be empty." % proposal.trait.name)
ValueError: oauth_client_id cannot be empty.
I am not sure where I went wrong in this process. Is anybody familiar with this issue?
Try running jupyterhub instead of jupyterhub-singleuser.
For your specific use case, the command would be as follows:
sudo /anaconda/envs/py35/bin/python /anaconda/envs/py35/bin/jupyterhub --port=50387 --notebook-dir="~/notebooks" --config=/etc/jupyterhub/jupyterhub_config.py
Make sure that jupyterhub is installed (correctly) in the path you mentioned.
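A quick way to verify both the install and that you're invoking the right binary (same path as in your command):

/anaconda/envs/py35/bin/jupyterhub --version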
When running IPython Notebook on Windows 7 64-bit and launching a notebook with the Python 2 kernel, I get an error:
Traceback (most recent call last):
File "C:\Users\USER1\Anaconda2\lib\site-packages\notebook\base\handlers.py", line 436, in wrapper
result = yield gen.maybe_future(method(self, *args, **kwargs))
File "C:\Users\USER1\Anaconda2\lib\site-packages\notebook\services\sessions\handlers.py", line 56, in post
model = sm.create_session(path=path, kernel_name=kernel_name)
File "C:\Users\USER1\Anaconda2\lib\site-packages\notebook\services\sessions\sessionmanager.py", line 66, in create_session
kernel_name=kernel_name)
File "C:\Users\USER1\Anaconda2\lib\site-packages\notebook\services\kernels\kernelmanager.py", line 84, in start_kernel
**kwargs)
File "C:\Users\USER1\Anaconda2\lib\site-packages\jupyter_client\multikernelmanager.py", line 109, in start_kernel
km.start_kernel(**kwargs)
File "C:\Users\USER1\Anaconda2\lib\site-packages\jupyter_client\manager.py", line 244, in start_kernel
**kw)
File "C:\Users\USER1\Anaconda2\lib\site-packages\jupyter_client\manager.py", line 190, in _launch_kernel
return launch_kernel(kernel_cmd, **kw)
File "C:\Users\USER1\Anaconda2\lib\site-packages\jupyter_client\launcher.py", line 115, in launch_kernel
proc = Popen(cmd, **kwargs)
File "C:\Users\USER1\Anaconda2\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "C:\Users\USER1\Anaconda2\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
I investigated further and added the following print lines before proc = Popen(cmd, **kwargs) inside the launcher.py file:
print cmd
print kwargs
Now I see that proc = Popen(cmd, **kwargs) is called with cmd =
['C:\\Users\\USER1\\Anaconda2_32bit\\python.exe', '-m', 'ipykernel', '-f', 'C:\\Users\\USER1\\AppData\\Roaming\\jupyter\\runtime\\kernel-a3f46334-4491-4fef-aeb1-6772b8392954.json']
This is a problem because my python.exe is not in
C:\\Users\\USER1\\Anaconda2_32bit\\python.exe
but in
C:\\Users\\USER1\\Anaconda2\\python.exe
However, I have checked the paths in Computer/Advanced system settings/Advanced/Environment Variables, and \\Anaconda2_32bit\\ is never specified there.
Thus I suspect the false path is specified somewhere else. Where could this be, and how can I fix it?
Also, I previously had an installation of Anaconda in \\Anaconda2_32bit\\, but I have uninstalled it.
IPython registers kernels in special configuration files (kernelspecs). I ran the command:
ipython kernelspec list
The output was:
Available kernels:
python2 C:\ProgramData\jupyter\kernels\python2
I looked into the C:\ProgramData\jupyter\kernels\python2\kernel.json file, and there was a wrong path set for python2. I fixed the path, and it works now.
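For reference, the corrected kernel.json should look something like this (the python.exe path must match your actual install; display_name may differ):

{
  "argv": [
    "C:\\Users\\USER1\\Anaconda2\\python.exe",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "display_name": "Python 2",
  "language": "python"
}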