ValueError: file descriptor out of range in select() in py2neo - neo4j

After one of the recent py2neo updates, we are seeing a lot of randomly appearing errors saying
ValueError: filedescriptor out of range in select(). We are using py2neo to connect to a remote Neo4j instance.
client_identifier = request.args.get('tribes_client_id')
graph_obj = generic_helpers.get_graph_object(client_identifier) # <- returns py2neo instance
graph_transaction = graph_obj.begin() # <- this is the line causing the exception
Below is stack trace of the exception being raised
Traceback (most recent call last):
File "/env/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/env/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/env/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/env/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/env/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/env/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/srv/neo4j_maintenance_routes.py", line 116, in get_words_list_for_lookup
graph_transaction = graph_obj.begin()
File "/env/lib/python3.7/site-packages/py2neo/database.py", line 353, in begin
return Transaction(self, autocommit)
File "/env/lib/python3.7/site-packages/py2neo/database.py", line 781, in __init__
self.transaction = self.connector.begin()
File "/env/lib/python3.7/site-packages/py2neo/internal/connectors.py", line 297, in begin
tx = self.pool.acquire()
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 715, in acquire
return self.acquire_direct(self.address)
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 608, in acquire_direct
connection = self.connector(address, error_handler=self.connection_error_handler)
File "/env/lib/python3.7/site-packages/py2neo/internal/connectors.py", line 227, in connector
encrypted=cx_data["secure"], **kwargs)
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 972, in connect
raise last_error
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 964, in connect
connection = _handshake(s, address, der_encoded_server_certificate, **config)
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 898, in _handshake
ready_to_read, _, _ = select((s,), (), (), 1)
ValueError: filedescriptor out of range in select()
py2neo -> version 4.3.0
python -> version 3.7.3
neo4j -> version (Enterprise 3.5.3)
Note that we don't get these errors every time we create a new transaction; they appear at random times and go away on their own after a while.

I also run into this issue occasionally. The difference is that I am using neo4j-python-driver (1.7.6).
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neo4j/__init__.py", line 444, in run
self._connect()
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neo4j/__init__.py", line 383, in _connect
self._connection = self._acquirer(access_mode)
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 715, in acquire
return self.acquire_direct(self.address)
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 608, in acquire_direct
connection = self.connector(address, error_handler=self.connection_error_handler)
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neo4j/__init__.py", line 232, in connector
return connect(address, **dict(config, **kwargs))
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 972, in connect
raise last_error
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 964, in connect
connection = _handshake(s, address, der_encoded_server_certificate, **config)
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 898, in _handshake
ready_to_read, _, _ = select((s,), (), (), 1)
ValueError: filedescriptor out of range in select()
After I reran the same code, the error did not recur and the program worked fine. I think this issue is probably caused by multiple logins by the same user: the error can happen when I am still accessing the Neo4j database in the browser and use the same account to fetch data with neo4j-python-driver at the same time. However, sometimes it happens even when I do nothing but run the code.
Extra Info:
Neo4j Graph Database: 3.5 community version
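For context, CPython's select.select() raises this ValueError when a socket's file descriptor number is at or above the platform's FD_SETSIZE (commonly 1024), which can happen when the process holds many descriptors open at that moment. The sketch below is only an illustration, not neobolt's code: wait_readable is a hypothetical helper showing how the standard selectors module can wait on a socket without that cap on platforms that provide epoll, kqueue, or poll.

```python
import selectors
import socket


def wait_readable(sock, timeout=1.0):
    """Return True if `sock` has data to read within `timeout` seconds."""
    # DefaultSelector picks epoll, kqueue, or poll where the platform offers
    # them, so it is not bound by the FD_SETSIZE cap that select() enforces.
    with selectors.DefaultSelector() as selector:
        selector.register(sock, selectors.EVENT_READ)
        return bool(selector.select(timeout))


# Demonstration with a local socket pair instead of a real Bolt connection.
left, right = socket.socketpair()
try:
    idle = wait_readable(right, timeout=0.1)   # nothing has been sent yet
    left.sendall(b"ping")
    ready = wait_readable(right, timeout=1.0)  # data is now pending
finally:
    left.close()
    right.close()

print(idle, ready)
```

If the error appears only intermittently, comparing the number of descriptors the process has open against this limit at the time of failure may help confirm whether descriptors are accumulating.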

Related

Apache Beam stateful ParDo Work token invalid

I have a stateful DoFn that batches incoming elements; when the buffer reaches a certain size, the buffer is cleared and the elements are inserted into BigQuery. I've noticed that from time to time the pipeline raises an exception, although the exception does not stop the job. Below is the stack trace:
Error message from worker: generic::unknown: Traceback (most recent call last):
File "apache_beam/runners/common.py", line 1213, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 742, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 867, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "/usr/local/lib/python3.7/site-packages/gp/pipelines/common/writer_transforms.py", line 140, in process
self._flush_buffer(buffer_state, count_state, buffer_size_state)
File "/usr/local/lib/python3.7/site-packages/gp/pipelines/common/writer_transforms.py", line 162, in _flush_buffer
rows = self._extract_rows(buffer_state)
File "/usr/local/lib/python3.7/site-packages/gp/pipelines/common/writer_transforms.py", line 197, in _extract_rows
for row in buffer.read():
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 510, in __iter__
for elem in self.first:
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 1039, in _lazy_iterator
self._underlying.get_raw(state_key, continuation_token))
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 846, in get_raw
continuation_token=continuation_token)))
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 886, in _blocking_request
raise RuntimeError(response.error)
RuntimeError: INTERNAL: Work token invalid
This is raised when the process method is called and tries to extract the elements from the buffer; see rows = self._extract_rows(buffer_state).
The DoFn is implemented exactly like the example at https://beam.apache.org/blog/timely-processing/#example-batched-rpc
I've confirmed this error is expected during work reassignments, e.g. when autoscaling. The work item will be retried on the new machine and the pipeline will continue processing correctly. (I agree the error message could be improved.)
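For readers unfamiliar with the batching pattern the question describes, here is a framework-free sketch of the buffer-and-flush logic. It does not use Beam's state API; BatchBuffer and the written list are hypothetical stand-ins for the DoFn's bag state and the BigQuery insert.

```python
class BatchBuffer:
    """Collects elements and hands back a full batch once max_size is reached."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = []

    def add(self, item):
        """Store item; return the batch to write if the buffer is full, else None."""
        self._items.append(item)
        if len(self._items) >= self.max_size:
            return self.flush()
        return None

    def flush(self):
        """Return the buffered items and clear the buffer."""
        batch, self._items = self._items, []
        return batch


written = []  # stand-in for the BigQuery insert
buffer = BatchBuffer(max_size=3)
for element in range(7):
    batch = buffer.add(element)
    if batch is not None:
        written.append(batch)

leftover = buffer.flush()  # in the linked Beam example, a timer flushes the remainder
print(written, leftover)
```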

Dask not starting workers

I am trying to use Dask to perform a groupby operation on a Dataframe.
The code below does not work. It does seem to work if I initialize the Client from another console, but even then I can't see anything on the dashboard (http://localhost:8787/status): the dashboard loads, but all the figures look empty. I am on macOS.
Code:
from datetime import datetime
import numpy as np
import os
from dask import dataframe as dd
from dask.distributed import Client
import pandas as pd
client = Client()
# open http://localhost:8787/status
csv_path = 'chicago-complete.monthly.2018-07-01-to-2018-07-31/data.csv'
dir_destination = 'data'
df = dd.read_csv(csv_path,
    dtype = {
        'timestamp': str,
        'node_id': str,
        'subsystem': str,
        'sensor': str,
        'parameter': str,
        'value_raw': str,
        'value_hrf': str,
    },
    parse_dates=['timestamp'],
    date_parser=lambda x: pd.datetime.strptime(x, '%Y/%m/%d %H:%M:%S')
)
#%%
if not os.path.exists(dir_destination):
    os.makedirs(dir_destination)
def create_node_csv(df_node):
    # test function
    return len(df_node)
res = df.groupby('node_id').apply(create_node_csv, meta=int)
The CSV file simply consists of string columns. My goal is to group all the rows that contain a certain value in a column and then save each group as a separate file using create_node_csv(df_node) (even though right now it is a dummy function). Any other way to do it is appreciated, but I would like to understand what's going on here.
When I run it, the console prints the following errors multiple times:
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 208, in _start_worker
yield w._start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 157, in _start
response = yield self.instantiate()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 226, in instantiate
self.process.start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 370, in start
yield self.process.start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 35, in _call_and_set_future
res = func(*args, **kwargs)
File "/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 184, in _start
process.start()
File "/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/anaconda3/lib/python3.7/multiprocessing/context.py", line 291, in _Popen
return Popen(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 35, in __init__
super().__init__(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
And:
distributed.nanny - WARNING - Worker process 1844 exited with status 1
distributed.nanny - WARNING - Restarting worker
And:
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
send_bytes(obj)
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 217, in _start_worker
raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 217, in _start_worker
EDIT:
Based on the answer:
- How do I prevent the creation of a new Client if I run the program again?
- How can I do the following?
def create_node_csv(df_node):
    return len(df_node)
It gives me the following error. Is it related to the meta parameter?
ValueError: cannot reindex from a duplicate axis
When you run the script, Client() causes new Dask workers to be spawned, which also get copies of variables from the original main process. In some cases, this involves re-importing the script in each worker, each of which, of course, then tries to create a Client and a new set of processes.
The best answer, as in general with anything running in processes, is to use functions, and protect the main execution. The following would be a way to do this, without changing your one-script structure:
from datetime import datetime
import numpy as np
import os
from dask import dataframe as dd
from dask.distributed import Client
import pandas as pd
csv_path = 'chicago-complete.monthly.2018-07-01-to-2018-07-31/data.csv'
dir_destination = 'data'
def run():
    client = Client()
    df = dd.read_csv(csv_path, ...)
    if not os.path.exists(dir_destination):
        os.makedirs(dir_destination)
    def create_node_csv(df_node):
        # test function
        return len(df_node)
    res = df.groupby('node_id').apply(create_node_csv, meta=int)
    print(res.compute())
if __name__ == "__main__":
    run()
How do I prevent the creation of a new Client if I run the program again?
In the call to Client() you can include the address of an existing cluster, if you know what that would be. Also, some specific types of deployments (and there are a few) may have a concept of the "current cluster".

Unable to Replace a Dask Series Partition

I'm trying to replace a partition of a Dask Series with my own partition.
I've used the code snippet given by @MRocklin in this post.
list_of_delayed = dask_df.to_delayed()
new_partition = dask.delayed(pd.read_csv)(filename)
list_of_delayed[i] = new_partition
new_dask_df = dd.from_delayed(list_of_delayed, meta=dask_df._meta)
I've done exactly the same except dask_df is a series in my case. I'm getting the following error:
Traceback (most recent call last):
File "sdfr_dhruvkmr.py", line 465, in <module>
pts = task[(task.task_date <= dtm.Time.iloc[i]) & (task.T_Date == dtm.Date.iloc[i])]
File "/usr/lib/python2.7/site-packages/edask/dataframe.py", line 130, in __getitem__
new_dask_df = dd.from_delayed(list_of_delayed)
File "/usr/lib/python2.7/site-packages/edask/edask/dask/dataframe/io/io.py", line 493, in from_delayed
type(df).__name__)
TypeError: Expected Delayed object, got Delayed

Error accessing BigQuery using DirectRunner

I'm trying to access a BigQuery table from my pipeline. Everything works fine when using DataflowRunner, but I get the following error when using DirectRunner.
WARNING:root:Dataset does not exist so we will create it
WARNING:root:Task failed: Traceback (most recent call last):
File "local/lib/python2.7/site-packages/apache_beam/runners/direct/executor.py", line 300, in __call__
result = evaluator.finish_bundle()
File "local/lib/python2.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 209, in finish_bundle
bundles = _read_values_to_bundles(reader)
File "local/lib/python2.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 196, in _read_values_to_bundles
read_result = [GlobalWindows.windowed_value(e) for e in reader]
File "local/lib/python2.7/site-packages/apache_beam/io/gcp/bigquery.py", line 606, in __iter__
yield self.client.convert_row_to_dict(row, schema)
File "local/lib/python2.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1073, in convert_row_to_dict
for x in value]
TypeError: 'NoneType' object is not iterable
This is the code snippet that initializes the runner:
options = {
    'project': args.project_id,
}
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
UPDATE:
Here is a snippet of the relevant part where the pipeline is constructed.
reader = (
    pipeline
    | 'ReadFromBigQuery %s' % query_path >> beam.io.Read(
        beam.io.BigQuerySource(query=query, use_standard_sql=True)))
readers.append(reader)
readers | 'FlattenPaths %s' % ':'.join(path_names) >> beam.Flatten()

Running queue in background in Tensorflow causes strange exceptions

I am implementing the following graph in TensorFlow: there is a queue Q, to which a background thread enqueues tensors. In the main thread, I sequentially dequeue elements from Q.
My code can be simplified as follows:
import time
import threading
import tensorflow as tf
sess = tf.InteractiveSession()
coord = tf.train.Coordinator()
q = tf.FIFOQueue(32, dtypes=tf.int32)
def loop(g):
    with g.as_default():
        enqueue_op = q.enqueue(1, name="example_enqueue")
        for i in range(20):
            if coord.should_stop():
                return
            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")
threads = [
    threading.Thread(target=loop, args=(tf.get_default_graph(),))
]
sess.run(tf.initialize_all_variables())
for t in threads: t.start()
# If I sleep 1 seconds, it will be fine!
# time.sleep(1)
print(sess.run(q.dequeue()))
coord.request_stop()
coord.join(threads)
sess.close()
As noted in the comment, if I sleep for 1 second before running the dequeue operation, things are fine. However, if I run it immediately, the following exception is raised:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
return fn(*args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
status, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
"__main__", mod_spec)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 32, in <module>
print(sess.run(q.dequeue()))
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
HanXus-MacBook-Pro:BrainSeg hanxu$ python3 -m playgrounds.7
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
return fn(*args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
status, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
"__main__", mod_spec)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 34, in <module>
print(sess.run(q.dequeue()))
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
Could anyone help? Thanks very much!!
Update
I am using TensorFlow 0.9.0rc0.
My real situation is a little more complicated. The enqueued tensor is in fact different each time, for example:
def loop(g):
    with g.as_default():
        for i in range(20):
            if coord.should_stop():
                return
            # Look here!
            enqueue_op = q.enqueue(i, name="example_enqueue")
            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")
So it is not trivial to move the enqueue operation to the main thread, and I don't know how to do it. Please help.
This was an issue with old (pre-0.9) versions of TensorFlow, which was fixed in version 0.9. The issue is that adding nodes to the graph (i.e. in your calls to q.dequeue() and q.enqueue()) was not thread-safe when other threads (i.e. your loop() thread) were using the graph.
There are two issues you'd need to fix to avoid the race condition (in pre-0.9 versions):
Don't call q.enqueue() in the loop() thread. Instead create it in the main thread. For example:
q = tf.FIFOQueue(32, dtypes=tf.int32)
enqueue_op = q.enqueue(1, name="example_enqueue")
def loop(g):
    for i in range(20):
        if coord.should_stop():
            return
        try:
            sess.run(enqueue_op)
        except tf.errors.CancelledError:
            print("enqueue cancelled")
Move the call to q.dequeue() (which adds a node to the graph) before where you start the loop() thread:
dequeued_t = q.dequeue()
for t in threads: t.start()
print(sess.run(dequeued_t))

Resources