In an AST, should IF nodes have a huge amount of branches? - parsing

Here's a picture of what I'm describing:
My question is, in an abstract syntax tree, should an IF node have a huge amount of branches? Imagine an IF node which is connected directly to hundreds of statements. It feels wrong and would look absolutely ridiculous in visual form. Is this the correct formation of an AST or am I getting it wrong?

Try it this way: give the IF node only three children, Condition, TrueBody and FalseBody. This is also what you get when you use something like CodeDOM.
In practice, TrueBody and FalseBody will each have a single child, a StatementGroup, which in turn has as many children as there are statements in that block. Similarly, if there are multiple and/or conditions, the Condition node will have a single child, say an AndCondition, with as many children as there are clauses.
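As a rough illustration of that shape, here is a minimal Python sketch. The class names (IfNode, StatementGroup, AndCondition, Clause, Statement) are made up for this example and are not CodeDOM's API. The sketch also folds TrueBody and FalseBody into fields that hold the statement group directly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Base class for every AST node in this sketch (hypothetical)."""


@dataclass
class Statement(Node):
    text: str  # placeholder for a real statement node


@dataclass
class StatementGroup(Node):
    # One group node owns all statements of a block.
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Clause(Node):
    text: str  # placeholder for a real expression node


@dataclass
class AndCondition(Node):
    # One condition node owns all clauses joined by "and".
    clauses: List[Clause] = field(default_factory=list)


@dataclass
class IfNode(Node):
    condition: Node
    true_body: StatementGroup
    false_body: Optional[StatementGroup] = None

    def children(self) -> List[Node]:
        # The IF node never has more than three direct children.
        candidates = (self.condition, self.true_body, self.false_body)
        return [c for c in candidates if c is not None]


if_node = IfNode(
    condition=AndCondition([Clause("x > 0"), Clause("y > 0")]),
    true_body=StatementGroup([Statement(f"stmt_{i}") for i in range(300)]),
    false_body=StatementGroup([Statement("log()")]),
)
```

However many statements the block contains, they hang off the StatementGroup, so the IF node itself stays small.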

Related

When should inferred relationships and nodes be used over explicit ones?

I was looking up how to use temporary relationships in Neo4j when I came across this question: Cypher temp relationship.
The comment underneath it made me wonder when they should be used, and since no one argued against the commenter, I thought I would bring it up here.
I come from a mainly SQL background and my main reason for using virtual relationships was to eliminate duplicated data and do traversals to get properties of something instead.
For a more specific example, let's say we have a robust cake recipe, which has sugar as an ingredient. The sugar is what makes the cake sweet.
Now imagine a use case where I don't like sweet cakes so I want to get all the ingredients of the recipe that make the cake sweet and possibly remove them or find alternatives.
Then there's another use case where I just want foods that are sweet. I could work backwards from the sweet ingredients to get to the food or just store that a cake is sweet in general, which saves time from traversal and makes a query easier. However, as I mentioned before, this duplicates known data that can be inferred.
Sorry if the example is too strange, I suck at making them. I hope the main question comes across, though.
My feeling is that the only valid scenario for creating redundant "shortcut" relationships is this:
Your use case has a stringent time constraint (e.g., average query time must be less than 200ms), but your neo4j query -- despite optimization -- exceeds that constraint, and you have verified that adding "shortcut" relationships will indeed make the response time acceptable.
You should be aware that adding redundant "shortcut" relationships comes with its own costs:
Queries that modify the DB would need to be more complex (to modify the redundant relationships) and also slower.
You'd always have to add the redundant relationships -- even if actually you never need some (most?) of them.
If you want to make concurrent updates to the DB, the chances that you may lose some updates and introduce inconsistencies into the DB would increase -- meaning that you'd have to work even harder to avoid inconsistencies.
NOTE: For visualization purposes, you can use virtual nodes and relationships, which are temporary and not actually stored in the DB.

How should I represent tasks and sub-tasks in neo4j?

I'm trying to decide on a good data model for representing tasks and sub-tasks. It's a two part problem:
First, I want to be able to get a string of tasks (task1)-[:NEXT]->(task2)-[:NEXT]->(task3) etc. And I want to be able to gather them starting with the first one and display them in order. The cypher is simple enough ... something like
match p = (first:Task)-[:NEXT*]->(others:Task)
return others.name, others.instructions
order by length(p) // or something like this, probably with a union to get both the first task and other tasks in the same output
However, I'd also like to let a sub-task have children. For instance, I might have a set of tasks that constitute "How to make coffee", but then when I'm creating a set of tasks that constitute "How to make breakfast", I'd like to point to the "How to make coffee" set of tasks and re-use them.
It would be nice to get cypher to return a staggered list (e.g. 1, 1.1, 1.1.1, 2, etc.), but I'd actually be equally happy with just 1, 2, 3 ... n.
I've been looking and haven't seen a clear solution anywhere. Here's a picture of what I'm imagining. Any directions, thoughts, or references much appreciated.
I'm going to reuse my answer to another question, but it really is the best solution to this problem. The problem you have is that your scheme starts to fall apart once you have multiple distinct but intersecting paths. So you end up trying to corrupt your data to try and resolve the conflicts generated.
1) For each chain, create a node to represent that chain.
2) Create a relation from that node to each node in the chain, and add an index property on the relationship.
3) Run your Cypher queries against the "chain" nodes instead.
To expand on the above for your case...
In this instance, the chain node represents a list of tasks that have to be done in order, and it can itself also be a task. Separating the list from the task is more work, but it lets you define multiple ways to complete a certain task. For example, I can make coffee by starting my coffee machine, by asking Google to make it (which will start the machine), or by going to Starbucks. This should be flexible enough to support anything you need to represent, without instances trampling on each other. Part of the key here is that relationships can have properties too! Don't be afraid to use them to make relationships distinct. (You could just add a 'group-id' to each task chain to make them distinct, but that will have scaling issues.)
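To make the idea concrete outside the database, here is a small in-memory Python sketch. It is not Neo4j code. The Task class and outline function are hypothetical, a single class plays both the "chain" and the "task" role, and the (index, task) pairs stand in for relationships that carry an index property. It assumes the chains contain no cycles.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Task:
    name: str
    # (index, child) pairs model relationships with an index property.
    steps: List[Tuple[int, "Task"]] = field(default_factory=list)

    def add_step(self, index: int, task: "Task") -> None:
        self.steps.append((index, task))


def outline(task: Task, prefix: str = "") -> List[str]:
    """Return a staggered listing (1, 1.1, 1.2, 2, ...) of a task's steps."""
    lines: List[str] = []
    for index, child in sorted(task.steps, key=lambda step: step[0]):
        number = f"{prefix}{index}"
        lines.append(f"{number} {child.name}")
        # A child that is itself a chain contributes its own numbered steps.
        lines.extend(outline(child, prefix=number + "."))
    return lines


coffee = Task("Make coffee")
coffee.add_step(1, Task("Boil water"))
coffee.add_step(2, Task("Pour over grounds"))

breakfast = Task("Make breakfast")
breakfast.add_step(2, Task("Toast bread"))
breakfast.add_step(1, coffee)  # reuse the whole coffee chain as one step

listing = outline(breakfast)
```

Because the position lives on the link rather than on the task, the same "Make coffee" chain can sit at step 1 of breakfast and at a different position in any other chain.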

Compound relationship in neo4j

I'm playing around with neo4j - seeing what I can and can't do with it before suggesting it for something serious. One thing that I'm trying to work out is if you can have what I'm calling a compound relationship.
In my playing, I'm doing a family tree - it seems an ideal fit. I'm wanting to express that a life event occurred between two people - getting married for example - and where it happened. The MARRIED_TO relationship between two PERSON nodes is easy. I'm struggling with the relationship to the PLACE node though.
In my head, it seems that what I really want is a relationship that goes from the PLACE node to the MARRIED_TO relationship, and I don't think that's possible.
Alternatively, I could see the MARRIED_TO relationship going between three nodes, but that not only doesn't feel right but also isn't possible.
The best I can see to do is either have an EVENT node representing the marriage, which feels clunky, or else have relationships from both PERSON nodes to the PLACE, which is then duplication of data.
Is there a proper way of managing this kind of data? Or am I just missing something?
Consider "Marriage" being a vital part of your domain. Anything being an entity deserves a separate node - so "Marriage" (or Event) becomes a node. That node then can be connected to the two people and the location.

Query templating in cypher? How to avoid repeating myself

My group has many queries that tend to refer to a class of relationship types. So we tend to write a lot of repetitive queries that look like this:
match (n:Provenance)-[r:`input to`|triggered|contributed|generated]->(m:Provenance)
where (...etc...)
return n, r, m
The question has to do with the repetition of the set of different relationship types. Really we're looking for any relationship in a set of relationship types. Is there a way to enumerate a bunch of relationship types into a set ("foo relationships") and then use that as a variable to avoid repeating myself over and over in many queries? This repetition of relationship types tends to create problems when we add a new relationship type: many queries distributed through the code base then all need to be updated.
Enumerating all possible relationships isn't such a big deal in an individual query, but it starts to get difficult to manage and update when distributed across dozens (or hundreds) of queries. What's the recommended solution pattern here? Query templating?
This is not currently possible as a built-in feature, but it seems like an interesting feature. I would encourage you to post this to the ideas trello board here:
https://trello.com/b/2zFtvDnV/public-idea-board
Perhaps suggesting something like allowing parameters for relationship types:
MATCH (n)-[r:{types}]->(p)
Of course, that makes it much harder for the query engine to optimize queries ahead of time. A relationship type hierarchy could work, but we are incredibly hesitant to introduce new abstractions to the model unless absolutely necessary. Still, suggestions for improvements are very welcome!
For now, yes, something like you suggest with templates would solve it. Ideally, you'd send the query to neo containing all the relationship types you are interested in, and with other items parameterized, to allow optimal planning. So to do that, you'd do some string replacement on your side to inject the long list of reltypes into the query before sending it off.
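A minimal sketch of that client-side string replacement, written in Python. The names PROVENANCE_RELS, QUERY_TEMPLATE, rel_type_pattern and build_query are invented for this example, and the {rel_types} marker is a convention of the sketch, not Cypher syntax. Only the marker is replaced, so any real query parameters in the template are left untouched for the server.

```python
from typing import Iterable

# Single place to maintain the set of "foo relationships" (hypothetical name).
PROVENANCE_RELS = ("input to", "triggered", "contributed", "generated")

# {rel_types} is a marker for this sketch only, replaced before sending.
QUERY_TEMPLATE = (
    "MATCH (n:Provenance)-[r:{rel_types}]->(m:Provenance) RETURN n, r, m"
)


def rel_type_pattern(rel_types: Iterable[str]) -> str:
    """Join relationship types into a backtick-quoted a|b|c pattern."""
    quoted = []
    for rel_type in rel_types:
        # Reject backticks so a type name cannot break out of its quoting.
        if "`" in rel_type:
            raise ValueError(f"unsupported relationship type: {rel_type!r}")
        quoted.append(f"`{rel_type}`")
    if not quoted:
        raise ValueError("at least one relationship type is required")
    return "|".join(quoted)


def build_query(template: str, rel_types: Iterable[str]) -> str:
    """Inject the relationship-type list into the template."""
    # str.replace leaves every other brace in the template alone.
    return template.replace("{rel_types}", rel_type_pattern(rel_types))


query = build_query(QUERY_TEMPLATE, PROVENANCE_RELS)
```

Adding a new relationship type then means editing PROVENANCE_RELS once rather than touching every query.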

Create Unique Relationship is taking much amount of time

START names = node(*),
target=node:node_auto_index(target_name="TARGET_1")
MATCH names
WHERE NOT names-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have nearly 1,80,000 names nodes. I ran the above query more than 100 times, changing the target each time, to create the unique relationships. It is taking too much time. How can I resolve this?
I build the query in Java and iterate. I am using neo4j 2.0.0.5 and Java 1.7.
I edited your cypher query because I think I understand it, but I can barely read the rest of your question. If you edit it with white spaces and punctuation it might be easier to understand what you are trying to do. Until then, here are some thoughts about your query being slow.
You bind all the nodes in the graph, that's typically pretty slow.
You bind all the nodes in the graph twice. First you bind universally in your start clause: names=node(*), and then you bind universally in your match clause: MATCH names, and only then you limit your pattern. I don't quite know what the Cypher engine makes of this (possibly it gets a migraine and goes off to make a pot of coffee). It's unnecessary, you can at least drop the names=node(*) from your start clause. Or drop the match clause, I suppose that could work too, since you don't really do anything there, and you will still need a start clause for as long as you use legacy indexing.
You are using Neo4j 2.x, but you use legacy indexing instead of labels, at least in this query. Without knowing your data and model it's hard to know what the difference would be for performance, but it would certainly make it much easier to write (and read) your queries. So, that's a different kind of slow. It's likely that if you had labels and label indices, the query performance would improve.
So, first try removing one of the universal bindings of nodes, then use the 2.x schema tools to structure your data. You should be able to write queries like
MATCH (target:Target)
WHERE target.target_name="TARGET_1"
WITH target
MATCH (names:Name)
WHERE NOT names-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have no idea if such a query would be fast on your data, however. If you put the "Name" label on all your nodes, then MATCH (names:Name) will still bind all nodes in the database, so it'll probably still be slow.
P.S. The relationships you create have a TYPE called contains, and you give them a property called type with value declared. Maybe you have a good reason, but that's potentially very confusing.
Edit:
Reading through your question and my answer again I no longer think that I understand even your cypher query. (Why are you returning both the bound nodes and properties of those nodes?) Please consider posting sample data on console.neo4j.org and explain in more detail what your model looks like and what you are trying to do. Let me know if my answer meets your question at all or I'll consider removing it.

Resources