Cypher executed from Stored Procedure much slower than raw cypher

I have a query pattern that I ran with different parameter values to identify and create nodes. I wanted to make working with the query simpler, so I put it in a stored procedure, compiled the jar, and began my processing.
While it was easier to call the stored procedure and pass the parameters, execution was MUCH slower (around 10 times slower), and it got progressively worse as I loaded more and more data into the graph. When I switched back to the raw queries (and more copy/paste), the execution time dropped back down.
It feels as if the database is recompiling and/or replanning the query every time the stored procedure is called.
Is there a way to cache the query from the stored procedure?
From what I can tell, my code is identical inside and outside the stored procedure. The stored procedure runs, just very, very slowly compared to calling the Cypher outside of the procedure.
Below is my raw Cypher query:
with ['register'] as verbs match (e:Entity {type:'PRODUCT', graphId: $graphId})
USING INDEX e:Entity(graphId)
with e, verbs
match (e)-[:REFERS]->(eWord:Word {graphId:$graphId})<-[:OBJ|OBL|NMOD]-(vb:Word {graphId: $graphId})-[]->(notWord:Word {graphId: $graphId})
USING INDEX vb:Word(graphId)
USING INDEX notWord:Word(graphId)
where vb.lemma in verbs
create (event:Event {graphId: $graphId, type: 'registerFail'})
with event, e, vb, notWord
merge (event)-[:TRIGGER]->(vb)
merge (event)-[:TRIGGER]->(notWord)
merge (event)-[:RELATED_PRODUCT]->(e)
with event
match (event)-[:TRIGGER]->(word:Word {graphId: $graphId})-[:COMPOUND|COMPOUND_PRT]->(compWord:Word {graphId: $graphId})
USING INDEX word:Word(graphId)
USING INDEX compWord:Word(graphId)
merge (event)-[:TRIGGER]->(compWord);
Here is the code for my stored procedure:
@Procedure(name = "ie.createInabilityCypher", mode = Mode.WRITE)
public void createInabilityFromProduct(@Name("listOfVerbs") List<String> verbs, @Name("inabilityType") String inabilityType, @Name("graphId") String graphId) {
String cypherQuery = "" +
"with $verbsList as verbs " +
"match (e:Entity {type:'PRODUCT', graphId: $graphId}) " +
"USING INDEX e:Entity(graphId) " +
"with e, verbs " +
"match (e)-[:REFERS]->(eWord:Word {graphId:$graphId})<-[:OBJ|OBL|NMOD]-(vb:Word {graphId: '" + graphId +"'})-[]->(notWord:Word {graphId: $graphId}) " +
"USING INDEX vb:Word(graphId) " +
"USING INDEX notWord:Word(graphId) " +
"where vb.lemma in verbs " +
"create (event:Event {graphId: $graphId, type: $inabilityType}) " +
"with event, e, vb, notWord " +
"merge (event)-[:TRIGGER]->(vb) " +
"merge (event)-[:TRIGGER]->(notWord) " +
"merge (event)-[:RELATED_PRODUCT]->(e) " +
"with event " +
"match (event)-[:TRIGGER]->(word:Word {graphId: $graphId})-[:COMPOUND|COMPOUND_PRT]->(compWord:Word {graphId: $graphId}) " +
"USING INDEX word:Word(graphId) " +
"USING INDEX compWord:Word(graphId) " +
"merge (event)-[:TRIGGER]->(compWord)";
Map<String, Object> params = new HashMap<>();
params.put("graphId", graphId);
params.put("verbsList", verbs);
params.put("inabilityType", inabilityType);
tx.execute(cypherQuery, params);
}

You missed one place where the $graphId parameter should be used. That is why the Cypher code is being "recompiled" every time.
Try replacing this snippet:
(vb:Word {graphId: '" + graphId +"'})
with this:
(vb:Word {graphId: $graphId})
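For background on why this matters: Neo4j caches execution plans keyed on the query text. With graphId concatenated into the string, every call with a new graphId produces a brand-new query that has to be planned from scratch; once every value is passed as a parameter, the text is byte-for-byte identical on each call and the cached plan is reused. The corrected line in the procedure would read:
"match (e)-[:REFERS]->(eWord:Word {graphId:$graphId})<-[:OBJ|OBL|NMOD]-(vb:Word {graphId: $graphId})-[]->(notWord:Word {graphId: $graphId}) " +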

Related

Neo4j: Sequence of Events as Nodes Not working

I am new to Cypher query syntax and have tried different kinds of syntax and relationships to build a sequence graph. My data contains a group_id, and within each group_id a code occurs based on the 'number' column: the lowest number is the first step in the sequence and the highest number is the last, per group id. I am able to load the data from CSV and create nodes with properties; however, I cannot work out how to connect the 'code' nodes in numerical sequence. I am reading/referencing this tutorial. Is there special Cypher syntax to achieve this result?
Sample Data:
group_id,code,date,number
123,abc,2/18/21,4
123,def,11/11/20,3
123,ghi,11/10/20,2
123,jkl,10/1/20,1
456,gtg,11/28/20,5
456,abc,10/30/20,4
456,def,10/5/20,3
456,jkl,10/1/20,2
456,uuu,10/1/20,1
My Code to load data:
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
WHERE row.group_id IS NOT NULL
MERGE (g:group_id {group_id: row.group_id});
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
WHERE row.code IS NOT NULL
MERGE (c:code {code: row.code})
ON CREATE SET c.number = row.number,
c.date = row.date;
Here is what I have tried:
// Building relationship
LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS row
WITH row
MATCH (g:group_id {group_id: row.group_id})
MATCH (c:code {code: row.code})
MERGE (g)-[:GROUPS]->(c) // Connects ALL codes to group id, but how to connect to 'code' and 'number' sequentially?
MERGE (c:{code: row.number})-[:NEXT]->(c) // Neo.ClientError.Statement.SyntaxError
The result I get connects all the codes to their group id (screenshot omitted); what I am trying to get is the codes also chained together in numerical sequence.
This will be a two-step process: first the initial loading of the data as you have outlined, then an enhancement in which you create the NEXT relationships. We do this in healthcare analytics of patient journeys or trajectories. By analogy, your yellow nodes might be patients and the blue ones encounters, so each patient has a sequence of encounters.
You can query and sort by the date or another ordering variable. For example, collect a sorted list of encounters per subject:
match (e:encounter)
with e order by e.enc_date
with e.subjectId as sid, collect(distinct e.enc_date) as eo
return sid, size(eo) as ct, eo
I used this in some Python code to iterate through each collection and create the enc_seq edge, equivalent to your NEXT:
dfeo = Neo4jLib.CypherToPandas("match (e:encounter) with e order by e.enc_date with e.subjectId as sid, collect(distinct e.enc_date) as eo return sid, size(eo) as ct, eo", 'ppmi')
csv = dfeo.to_csv(index=False).split('\n')
cts = 0
sw = open("c:\\temp\\error.txt", "a")
for i in range(1, len(dfeo)):
    cc = csv[i].split(',')
    for j in range(0, int(cc[1]) - 1):
        try:
            # Match consecutive encounters for this subject and link them.
            q = "match (e1:encounter{subjectId:" + str(dfeo['sid'][i]) + ",enc_date:date('" + str(dfeo['eo'][i][j]) + "')}) match (e2:encounter{subjectId:" + str(dfeo['sid'][i]) + ",enc_date:date('" + str(dfeo['eo'][i][j+1]) + "')}) merge (e1)-[r:enc_seq{subjectId:" + str(dfeo['sid'][i]) + ", seqCt:" + str(j) + "}]-(e2)"
            Neo4jLib.CypherNoReturn(q, 'ppmi')
        except:
            cts = cts + 1
            sw.write(str(i) + ':' + str(j) + "\n" + q + "\n")
print("exceptions: " + str(cts))
sw.flush()
sw.close()
You can probably do this within a Cypher query, using WITH on each row followed by a CALL to a procedure that does what my Python code does; for my purposes it was more convenient to use Python.
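If you would rather stay entirely in Cypher, here is a minimal sketch of the same idea, assuming the group_id/code nodes and GROUPS relationships created by the load above (and that number was stored as a string, hence toInteger):
// Collect each group's codes ordered by their sequence number,
// then link each consecutive pair with a NEXT relationship.
MATCH (g:group_id)-[:GROUPS]->(c:code)
WITH g, c ORDER BY toInteger(c.number)
WITH g, collect(c) AS codes
UNWIND range(0, size(codes) - 2) AS i
WITH codes[i] AS cur, codes[i + 1] AS next
MERGE (cur)-[:NEXT]->(next)
One caveat: because the load MERGEs code nodes globally, a code shared by two groups keeps the number and date from whichever row created it first, so you may want per-group sequence nodes instead.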

Adwords Scripts - access keywords search terms

I'm trying to access the keywords search terms report and query it by the "Add/Exclude" column, cost, etc.
I couldn't find it in the docs; is there any way to get the report?
Thanks
Edit:
There is an existing option to save the search report and schedule it, so if there is a way to access the saved reports section, that would also work.
Take a look at this solution.
You're able to acquire the data from the search query report through an AWQL query. Example as used in the linked solution:
var report = AdWordsApp.report(
    "SELECT Query,Clicks,Cost,Ctr,ConversionRate,CostPerConversion,Conversions,CampaignId,AdGroupId " +
    " FROM SEARCH_QUERY_PERFORMANCE_REPORT " +
    " WHERE " +
    " Conversions > 0" +
    " AND Impressions > " + IMPRESSIONS_THRESHOLD +
    " AND AverageCpc > " + AVERAGE_CPC_THRESHOLD +
    " DURING LAST_7_DAYS");

Hardcoded Query in PetaPoco ORM

We are planning to use PetaPoco in our project. It is a tiny ORM with performance benefits, but one thing that worries me is the hard-coded queries.
Because of that, whenever a column changes (is added or removed), we have trouble finding every query that references it.
Can we generate the column names in a T4 (.tt) template and use them in place of the hard-coded query, something like:
"Select * from " + Tables.Customer + " Where " + CustomerTable.CustomerId + " = 1"

How to count tag-to-tag relationships without having it explode?

I'm using neo4j, storing a simple "content has-many tags" data structure.
I'd like to find out "what tags co-exist with what other tags the most?"
I've got around 500K content-to-tag relationships, so unfortunately that works out to on the order of 0.5M^2 possible coexist relationships, and then you need to count how often each pair occurs! Or do you? Am I doing this the long way?
It never seems to return, and my CPU has been pegged for quite some time now.
final ExecutionResult result = engine.execute(
    "START metag=node(*)\n"
    + "MATCH metag<-[:HAS_TAG]-content-[:HAS_TAG]->othertag\n"
    + "WHERE metag.name>othertag.name\n"
    + "RETURN metag.name, othertag.name, count(content)\n"
    + "ORDER BY count(content) DESC");
for (Map<String, Object> row : result) {
    System.out.println(row.get("metag.name") + "\t" + row.get("othertag.name") + "\t" + row.get("count(content)"));
}
You should try to decrease your bound points to make the traversal faster. I assume your graph will always have more tags than content, so you should make the content your bound points. Something like:
start
content = node:node_auto_index(' type:"CONTENT" ')
match
metatag<-[:HAS_TAG]-content-[:HAS_TAG]->othertag
where
metatag<>othertag
return
metatag.name, othertag.name, count(content)
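For anyone reading this on a current Neo4j version: START and node_auto_index are long gone. A rough modern equivalent, assuming :Content and :Tag labels (an assumption; the original data model predates labels) and keeping the name comparison from the question so each pair is counted only once:
// Count tag co-occurrence per content node, deduplicating pairs
// by ordering on name, and sort the pairs by frequency.
MATCH (metag:Tag)<-[:HAS_TAG]-(content:Content)-[:HAS_TAG]->(othertag:Tag)
WHERE metag.name > othertag.name
RETURN metag.name, othertag.name, count(content) AS cooccurrences
ORDER BY cooccurrences DESC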

Why is my hash autosorting itself?

The actual problem I was originally working on was converting a hash of arrays into an array of hashes. If anyone has comments on doing that conversion, that's fine, but the actual question I have is why the order of the hash keys changes after I edit them.
I'm fully aware of this question, but this is not a duplicate. I'm specifically having trouble with the order they are coming out in.
I have one array and one hash.
The array (@headers) contains a list of keys. @contents is a hash filled with arrays. As explained, my task is to get an array of hashes. So here's my code, pretty straightforward.
@headers = params[:headers]
puts "ORIGINAL PARAMS"
puts YAML::dump(@headers)
@contentsArray = [] # The purpose of this is to contain hashes of each object
@contents.each_with_index do |page, contentIndex|
  @currentPage = Hash.new
  @headers.each_with_index do |key, headerIndex|
    @currentPage[key] = "testing"
  end
  puts YAML::dump(@currentPage)
  @contentsArray[contentIndex] = @currentPage
end
puts "UPDATED CONTENTS"
puts YAML::dump(@contentsArray[0])
Here's the bit I can't wrap my head around: the keys of the original params are in a different order than the updated ones.
Note the puts "ORIGINAL PARAMS" and puts "UPDATED CONTENTS" parts. This is their output:
ORIGINAL PARAMS
---
- " Page Title "
- " WWW "
- " Description "
- " Keywords "
- " Internal Links "
- " External Links "
- " Content files "
- " Notes "
---
UPDATED CONTENTS
---
" WWW ": page
" Internal Links ": testing
" External Links ": testing
" Description ": testing
" Notes ": testing
" Content files ": testing
" Page Title ": testing
" Keywords ": testing
Why is this?
For the record, printing @currentPage after the header loop gives this:
" WWW ": page
" Internal Links ": page
" External Links ": page
" Description ": page
" Notes ": page
" Content files ": page
" Page Title ": page
" Keywords ": page
So it must be the way the values and keys are assigned to @currentPage, and not when it goes into the array.
In Ruby 1.8, hashes are UNORDERED:
The order in which you traverse a hash by either key or value may seem arbitrary, and will generally not be in the insertion order.
In Ruby 1.9+, they keep the order in which you insert items:
Hashes enumerate their values in the order that the corresponding keys were inserted.
http://apidock.com/ruby/v1_9_2_180/Hash
This is because Ruby 1.8's Hash uses a hash table data structure internally, and hash tables do not keep track of the order of their elements. Ruby 1.9's Hash instead uses a linked hash table, which does keep track of the order in which elements were inserted.
So in Ruby 1.8 hashes are unordered, and in Ruby 1.9 they preserve insertion order.
