This might be obvious to seasoned py2neo users, but I'm new and could not get past it. I'm trying to follow the py2neo online doc: http://book.py2neo.org/en/latest/graphs_nodes_relationships/. I can use the Node methods on the instance returned from GraphDatabaseService.create,
but when I use GraphDatabaseService.node to retrieve the node, all the expected Node methods stop working. I've narrowed it down to the example below, which calls len() on the node (Node.__len__).
Thanks in advance for any helpful insights.
Bruce
My env:
windows 7 professional
pycharm 3.4
py2neo 1.6.4
python2.7.5
Here is the code:
from py2neo import node, neo4j
db = neo4j.GraphDatabaseService()
db.clear()
a, = db.create(node({'name': ['a']}))
a.add_labels('Label')
b = db.node(a._id)
print db.neo4j_version
print b, type(b)
print "There is %s node in db" % db.order
print len(b)
Here is the output:
C:\Python27\python.exe C:/Users/you_zhang/PycharmProjects/py2neo/ex11.py
(2, 0, 3, u'')
(10) <class 'py2neo.neo4j.Node'>
There is 1 node in db
Traceback (most recent call last):
File "C:/Users/you_zhang/PycharmProjects/py2neo/ex11.py", line 11, in <module>
print len(b)
File "C:\Users\you_zhang\AppData\Roaming\Python\Python27\site-packages\py2neo\neo4j.py", line 1339, in __len__
return len(self.get_properties())
File "C:\Users\you_zhang\AppData\Roaming\Python\Python27\site-packages\py2neo\neo4j.py", line 1398, in get_properties
self._properties = assembled(self._properties_resource._get()) or {}
File "C:\Users\you_zhang\AppData\Roaming\Python\Python27\site-packages\py2neo\neo4j.py", line 1349, in _properties_resource
return self._subresource("properties")
File "C:\Users\you_zhang\AppData\Roaming\Python\Python27\site-packages\py2neo\neo4j.py", line 403, in _subresource
uri = URI(self.__metadata__[key])
File "C:\Users\you_zhang\AppData\Roaming\Python\Python27\site-packages\py2neo\neo4j.py", line 338, in __metadata__
self.refresh()
File "C:\Users\you_zhang\AppData\Roaming\Python\Python27\site-packages\py2neo\neo4j.py", line 360, in refresh
self._metadata = ResourceMetadata(self._get().content)
File "C:\Users\you_zhang\AppData\Roaming\Python\Python27\site-packages\py2neo\neo4j.py", line 367, in _get
raise ClientError(e)
py2neo.exceptions.ClientError: Not Found
Your exact code snippet works for me (OS X, neo4j 2.1.2). There shouldn't be any problem. Have you tried to install the latest version of neo4j and run your code on a fresh and untouched database? I have encountered inconsistencies in corrupted databases.
Have you tried to load the node with .find()?
result = db.find('Label')
for n in result:
    print(n)
I am loading a large number of Neo4j nodes from a JSON file. It fails with the error message "Failed to invoke procedure apoc.merge.node: Caused by: java.lang.NullPointerException". I don't see enough information to figure out what I am doing wrong, and since this is the first time I have used this, I just can't spot it. Below are the last seven or so frames of the stack trace. It looks like the error originates when merge_node is called.
File "F:\ClientSide\current\testload1.py", line 104, in <lambda>
nodes.apply(lambda h: merge_node(h), axis=1)
File "F:\ClientSide\current\testload1.py", line 61, in merge_node
ses.run("UNWIND $batch AS row CALL apoc.merge.node(['ProgNode', row.nodetype], {node:row.node}, apoc.map.removeKeys(properties(row), ['nodetype', 'node'])) YIELD node RETURN 1", batch=BATCH["batch"])
File "C:\Users\Bill Dickenson\AppData\Local\Programs\Python\Python37\lib\site-packages\neo4j\work\simple.py", line 217, in run
self._autoResult._run(query, parameters, self._config.database, self._config.default_access_mode, self._bookmarks, **kwparameters)
File "C:\Users\Bill Dickenson\AppData\Local\Programs\Python\Python37\lib\site-packages\neo4j\work\result.py", line 101, in _run
self._attach()
File "C:\Users\Bill Dickenson\AppData\Local\Programs\Python\Python37\lib\site-packages\neo4j\work\result.py", line 202, in _attach
self._connection.fetch_message()
File "C:\Users\Bill Dickenson\AppData\Local\Programs\Python\Python37\lib\site-packages\neo4j\io\_bolt3.py", line 326, in fetch_message
response.on_failure(summary_metadata or {})
File "C:\Users\Bill Dickenson\AppData\Local\Programs\Python\Python37\lib\site-packages\neo4j\io\_bolt3.py", line 512, in on_failure
raise Neo4jError.hydrate(**metadata)
neo4j.exceptions.ClientError: Failed to invoke procedure `apoc.merge.node`: Caused by: java.lang.NullPointerException
The "batch" data structure contains lists of variables like this one
{'EIEO': True, 'FILECOUNT': 1, 'KDM': 'data:Writes', 'changed': False, 'ctx': '113540257', 'level': 'code', 'location': [55, 8, 55, 94], 'node': 100, 'quvioDensity': 1.0, 'quviolations': 2, 'szAFP': '', 'szaep': 17, 'szlocs': 2, 'text': 'FilemavenWrapperPropertyFile=newFile(baseDirectory,MAVEN_WRAPPER_PROPERTIES_PATH);', 'type': 'localVariableDeclarationStatement'}
and the code that is processing it looks like this, including the print statement that generated the data above.
def merge_node(args):
    global INNODE, NODECOUNT
    """
    Function to create nodes from a batch.
    """
    INNODE += 1
    if (INNODE % 10000) == 0:
        print("...Sent %s of %s for processing" % (INNODE, NODECOUNT))
    if len(BATCH['batch']) == 4:
        print(BATCH['batch'][3])
    if (len(BATCH['batch']) > 1000) or (INNODE == NODECOUNT):
        if INNODE == NODECOUNT:
            print("...Final Record (%s) added and transmitted" % INNODE)
            BATCH['batch'].append(args.to_dict())
        with graphDB_Driver.session() as ses:
            ses.run("UNWIND $batch AS row CALL apoc.merge.node(['ProgNode', row.nodetype], {node:row.node}, apoc.map.removeKeys(properties(row), ['nodetype', 'node'])) YIELD node RETURN 1", batch=BATCH["batch"])
        reset_batch()
    BATCH['batch'].append(args.to_dict())
Oddly enough, when I run this locally it fails with this error. When it runs against my remote Neo4j db, it processes fine (no errors) but does NOT generate anything on the server. So I assume it's failing up there too, but APOC is redirecting the console and just moving on.
Does anyone see what I am doing incorrectly?
The sample row from the "batch" data structure in your question has node but no nodetype (it has type instead), while your Cypher code requires both of these properties:
nodetype
node
You have to make sure that all the elements in the $batch list have at least those 2 properties if you want to use that Cypher code.
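One way to catch such rows before they reach the server is to check the batch on the Python side. The following is a minimal sketch under stated assumptions, not part of the original code: the helper name split_batch and the constant REQUIRED_KEYS are hypothetical, and a row is treated as unusable when a required key is missing or None.

```python
# Hypothetical helper: names are illustrative and not taken from the question.
REQUIRED_KEYS = ('nodetype', 'node')


def split_batch(batch, required=REQUIRED_KEYS):
    """Return (valid_rows, rejected_rows).

    A row is rejected when any required key is missing or set to None,
    because the Cypher statement reads row.nodetype and row.node.
    """
    valid, rejected = [], []
    for row in batch:
        if all(row.get(key) is not None for key in required):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected


# Trimmed version of the sample row from the question: it has 'type', not 'nodetype'.
sample = {'node': 100, 'type': 'localVariableDeclarationStatement'}
valid, rejected = split_batch([sample, {'node': 101, 'nodetype': 'Statement'}])
print(len(valid), len(rejected))
```

Only the valid rows would then be passed as the batch parameter; the rejected rows can be printed or logged to see which keys are absent.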
I've just started learning my way around Biopython and I'm trying to use ExPASy to retrieve SwissProt records, as described on page 180 of the Biopython tutorial (http://biopython.org/DIST/docs/tutorial/Tutorial.pdf) and also in a relevant ROSALIND exercise (http://rosalind.info/problems/dbpr/ - click to expand the "Programming shortcut" section).
The code I'm using is basically the same as in the ROSALIND exercise:
from Bio import ExPASy
from Bio import SwissProt
handle = ExPASy.get_sprot_raw('Q5SLP9')
record = SwissProt.read(handle)
However, the SwissProt.read function gives the following error messages (I've trimmed some of the filepaths):
Traceback (most recent call last):
File "code.py", line 4, in <module>
record = SwissProt.read(handle)
File "lib\site-packages\Bio\SwissProt\__init__.py", line 151, in read
record = _read(handle)
File "lib\site-packages\Bio\SwissProt\__init__.py", line 255, in _read
_read_ft(record, line)
File "lib\site-packages\Bio\SwissProt\__init__.py", line 594, in _read_ft
assert not from_res and not to_res, line
AssertionError: /note="Single-stranded DNA-binding protein"
I found that this has been reported on GitHub (https://github.com/biopython/biopython/issues/2417), so I'm not the first one to hit this, but I can't find an updated version of the package or any way to fix the issue. Maybe it's because I'm very new to using packages. Could someone help me please?
Please update your BioPython to version 1.77. The issue has been fixed with pull request 2484.
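To check whether an installed copy is new enough, the version string can be compared against 1.77. This is a small sketch, assuming the version string starts with dot-separated integers (for example "1.77" or "1.78.dev0"); the helper names version_tuple and has_swissprot_fix are hypothetical.

```python
import re


def version_tuple(version):
    """Turn '1.77' or '1.78.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if not match:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def has_swissprot_fix(version, minimum=(1, 77)):
    """True if the given version is at least the release named in the answer."""
    return version_tuple(version) >= minimum


print(has_swissprot_fix('1.76'))
print(has_swissprot_fix('1.77'))
```

With Biopython installed, the string to pass in is Bio.__version__ (after import Bio).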
I am having some trouble getting my Firebird connection to work, and it all seems related to encodings. I am connecting to the database like this (local_copy is /path/to/database.fdb):
conn = fdb.connect(dsn=local_copy, user='****', password='****', charset="ISO8859_1")
which only works for certain charsets. I need to have the ISO8859_1 charset, which worked before, but not anymore (perhaps because of an update).
Traceback (most recent call last):
File "sync.py", line 10, in <module>
conn = fdb.connect(dsn=local_copy, user='**', password='**', charset="ISO8859_1")
File "/usr/local/lib/python3.6/site-packages/fdb/fbcore.py", line 848, in connect
"Error while connecting to database:")
fdb.fbcore.DatabaseError: ('Error while connecting to database:\n- SQLCODE: -924\n- bad parameters on attach or create database\n- CHARACTER SET ISO8859_1 is not defined', -924, 335544325)
When I use ISO88591, the connect works, but Python is not happy with that:
Traceback (most recent call last):
File "sync.py", line 10, in <module>
conn = fdb.connect(dsn=local_copy, user='***', password='***', charset="ANSI")
File "/usr/local/lib/python3.6/site-packages/fdb/fbcore.py", line 826, in connect
no_reserve, db_key_scope, no_gc, no_db_triggers, no_linger)
File "/usr/local/lib/python3.6/site-packages/fdb/fbcore.py", line 759, in build_dpb
dpb.add_string_parameter(isc_dpb_user_name, user)
File "/usr/local/lib/python3.6/site-packages/fdb/fbcore.py", line 624, in add_string_parameter
value = value.encode(charset_map.get(self.charset, self.charset))
LookupError: unknown encoding: ISO88591
So, I thought perhaps adding an alias ISO88591 to Python would work. I tried to edit the /usr/lib64/python3.6/encodings/aliases.py, but that didn't seem to have any effect.
As a short summary of what was posted on Firebird-support, it looks like the fbintl module in Firebird 2.5.8 on CentOS is broken.
As indicated by Philippe Makowski:
Sorry, it is broken, and I don't know how to fix it :
https://bugzilla.redhat.com/show_bug.cgi?id=1636177
but Firebird 3 is ok
https://copr.fedorainfracloud.org/coprs/makowski/firebird/
A possible workaround suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1636177 is to either downgrade to 2.5.7, or to continue using 2.5.8, but replace its fbintl module with the one from 2.5.7.
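On the Python side, the LookupError in the second traceback comes from the line value.encode(charset_map.get(self.charset, self.charset)): the charset name is looked up in a mapping and, if absent, passed to Python unchanged. The sketch below only imitates that lookup with the standard codecs module to show which names Python accepts; the function python_codec_for and the example mappings are hypothetical, and none of this repairs the server-side fbintl problem.

```python
import codecs


def python_codec_for(fb_charset, charset_map):
    """Imitate the lookup seen in the traceback.

    Map the Firebird charset name through charset_map (falling back to the
    name itself) and return the canonical Python codec name, or None if
    Python does not know the resulting encoding.
    """
    name = charset_map.get(fb_charset, fb_charset)
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# 'ISO8859_1' is already a codec alias that Python understands.
print(python_codec_for('ISO8859_1', {}))
# A name Python does not know only resolves if the mapping translates it first.
print(python_codec_for('NO_SUCH_CHARSET_XYZ', {}))
print(python_codec_for('NO_SUCH_CHARSET_XYZ', {'NO_SUCH_CHARSET_XYZ': 'latin_1'}))
```

This suggests that a charset name is resolved through the mapping before Python's own alias table is consulted, so the name Firebird accepts and the name Python accepts are two separate questions.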
I am trying to use Dask to perform a groupby operation on a Dataframe.
The code below does not work but it seems that if I initialize the Client from another console the code works, even though I can't see anything on the dashboard ( http://localhost:8787/status ): I mean, there is a dashboard, but all the figures look empty. I am on macOS.
Code:
from datetime import datetime
import numpy as np
import os
from dask import dataframe as dd
from dask.distributed import Client
import pandas as pd
client = Client()
# open http://localhost:8787/status
csv_path = 'chicago-complete.monthly.2018-07-01-to-2018-07-31/data.csv'
dir_destination = 'data'
df = dd.read_csv(csv_path,
    dtype={
        'timestamp': str,
        'node_id': str,
        'subsystem': str,
        'sensor': str,
        'parameter': str,
        'value_raw': str,
        'value_hrf': str,
    },
    parse_dates=['timestamp'],
    date_parser=lambda x: pd.datetime.strptime(x, '%Y/%m/%d %H:%M:%S')
)
#%%
if not os.path.exists(dir_destination):
    os.makedirs(dir_destination)
def create_node_csv(df_node):
    # test function
    return len(df_node)
res = df.groupby('node_id').apply(create_node_csv, meta=int)
The csv file consists simply of columns of strings. My goal is to group all the rows that contain a certain value in a column and then save each group as a separate file using create_node_csv(df_node) (even though right now it is a dummy function). Any other way to do it is appreciated, but I would like to understand what's going on here.
When I run it, the console prints the following errors multiple times:
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 208, in _start_worker
yield w._start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 157, in _start
response = yield self.instantiate()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 226, in instantiate
self.process.start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 370, in start
yield self.process.start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 35, in _call_and_set_future
res = func(*args, **kwargs)
File "/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 184, in _start
process.start()
File "/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/anaconda3/lib/python3.7/multiprocessing/context.py", line 291, in _Popen
return Popen(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 35, in __init__
super().__init__(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
And:
distributed.nanny - WARNING - Worker process 1844 exited with status 1
distributed.nanny - WARNING - Restarting worker
And:
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
send_bytes(obj)
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 217, in _start_worker
raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 217, in _start_worker
EDIT:
Based on the answer:
- How do I prevent the creation of a new Client if I run the program again?
- How can I do the following?
def create_node_csv(df_node):
    return len(df_node)
It gives me the following error. Is it related to the meta parameter?
ValueError: cannot reindex from a duplicate axis
When you run the script, Client() causes new Dask workers to be spawned, which also get copies of variables from the original main process. In some cases, this involves re-importing the script in each worker, each of which, of course, then tries to create a Client and a new set of processes.
The best answer, as in general with anything running in processes, is to use functions, and protect the main execution. The following would be a way to do this, without changing your one-script structure:
from datetime import datetime
import numpy as np
import os
from dask import dataframe as dd
from dask.distributed import Client
import pandas as pd
csv_path = 'chicago-complete.monthly.2018-07-01-to-2018-07-31/data.csv'
dir_destination = 'data'
def run():
    client = Client()
    df = dd.read_csv(csv_path, ...)
    if not os.path.exists(dir_destination):
        os.makedirs(dir_destination)
    def create_node_csv(df_node):
        # test function
        return len(df_node)
    res = df.groupby('node_id').apply(create_node_csv, meta=int)
    print(res.compute())
if __name__ == "__main__":
    run()
How do I prevent the creation of a new Client if I run the program again?
In the call to Client() you can include the address of an existing cluster, if you know what that would be. Also, some specific types of deployments (and there are a few) may have a concept of the "current cluster".
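As for the stated goal of saving the rows of each node_id to a separate file, the question asked for any other way to do it. The following is a sketch that swaps in Python's standard csv module instead of Dask: it streams the file once and writes one CSV per distinct value. The function name split_csv_by_column is hypothetical, and the sketch assumes that each key value is safe to use as a file name and that the number of distinct keys stays below the operating system's open-file limit.

```python
import csv
import os


def split_csv_by_column(csv_path, key_column, dir_destination):
    """Write one CSV per distinct value of key_column.

    Returns a dict mapping each key to the number of rows written for it.
    """
    os.makedirs(dir_destination, exist_ok=True)
    handles = {}  # key -> open output file
    writers = {}  # key -> csv.DictWriter bound to that file
    counts = {}   # key -> rows written so far
    try:
        with open(csv_path, newline='') as src:
            reader = csv.DictReader(src)
            for row in reader:
                key = row[key_column]
                if key not in writers:
                    # First row for this key: open its file and write the header.
                    out_path = os.path.join(dir_destination, key + '.csv')
                    handles[key] = open(out_path, 'w', newline='')
                    writers[key] = csv.DictWriter(handles[key], fieldnames=reader.fieldnames)
                    writers[key].writeheader()
                    counts[key] = 0
                writers[key].writerow(row)
                counts[key] += 1
    finally:
        # Close every output file even if reading fails part-way through.
        for out in handles.values():
            out.close()
    return counts
```

Called as split_csv_by_column(csv_path, 'node_id', dir_destination), it would produce files such as data/<node_id>.csv. This runs in a single process, so it avoids the worker start-up issue entirely but gives up Dask's parallelism.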
While running the PyBBIO examples phant_test.py and analog_test.py I received the following error (I believe 'could' is a typo meant to be 'could not'):
Traceback (most recent call last):
File "analog_test.py", line 47, in <module>
run(setup, loop)
File "/usr/lib/python2.7/site-packages/PyBBIO-0.9-py2.7-linux-armv7l.egg/bbio/bbio.py", line 63, in run
loop()
File "analog_test.py", line 37, in loop
val1 = analogRead(pot1)
File "/usr/lib/python2.7/site-packages/PyBBIO-0.9-py2.7-linux-armv7l.egg/bbio/platform/beaglebone/bone_3_8/adc.py", line 46, in analogRead
raise Exception('*Could load overlay for adc_pin: %s' % adc_pin)
Exception: *Could load overlay for adc_pin: ['/sys/devices/ocp.2/PyBBIO-AIN0.*/AIN0', 'PyBBIO-AIN0', 'P9.39']
I have tried restarting the BeagleBone (rev A6 running Angstrom with a 3.8 kernel, with no capes connected) to clear the /sys/devices/bone_capemgr.7/slots file, but that did not work. It seems PyBBIO is accessing the slots file and adding overlays because the slots file looks like this after the example program runs:
0: 54:PF---
1: 55:PF---
2: 56:PF---
3: 57:PF---
4: ff:P-O-L Override Board Name,00A0,Override Manuf,PyBBIO-ADC
5: ff:P-O-L Override Board Name,00A0,Override Manuf,PyBBIO-AIN0
Since the slots file was being changed, I checked which files the analogRead(adc_pin) function in PyBBIO's adc.py was reading. With some print statements I found that the root problem is that the /sys/devices/ocp.2/PyBBIO-AIN0.*/AIN0 file, which apparently stores the analog read values, does not exist. The glob.glob call returns an empty list, and ls /sys/devices/ocp.2/PyBBIO-AIN0.10/ shows modalias, power, subsystem and uevent as the only contents. Is there something wrong in the overlay file? Or could another program or problem be preventing the BeagleBone from creating the AIN0 file that PyBBIO is trying to read? The Python code seems logically correct, but the overlay appears to be working incorrectly or being blocked in some way.
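The two checks described above can be reproduced outside PyBBIO with a short script. This is a diagnostic sketch only, with hypothetical helper names (loaded_overlays, find_ain_file); it assumes that a loaded overlay appears in the slots file as a line with four comma-separated fields whose last field is the overlay name, as in the listing above.

```python
import glob


def loaded_overlays(slots_text):
    """Extract overlay names from the text of the capemgr slots file.

    Assumption: a loaded overlay line has four comma-separated fields and
    the last one is the overlay name; empty slots contain no commas.
    """
    names = []
    for line in slots_text.splitlines():
        fields = line.strip().split(',')
        if len(fields) == 4:
            names.append(fields[3].strip())
    return names


def find_ain_file(pattern):
    """Return the first path matching the glob pattern, or None if none exists."""
    matches = glob.glob(pattern)
    return matches[0] if matches else None


slots = """0: 54:PF---
4: ff:P-O-L Override Board Name,00A0,Override Manuf,PyBBIO-ADC
5: ff:P-O-L Override Board Name,00A0,Override Manuf,PyBBIO-AIN0"""
print(loaded_overlays(slots))
print(find_ain_file('/sys/devices/ocp.2/PyBBIO-AIN0.*/AIN0'))
```

On the board, the slots text would be read from /sys/devices/bone_capemgr.7/slots. If the overlay name is listed but find_ain_file returns None, the overlay was registered without creating the AIN0 file, which matches the symptom described.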