I am trying to use a filter step to filter vertices by id, but I am unsure of how to do this.
Here is roughly what I am trying to do.
g.V().has(label, 'Users').filter(id().is(eq("Users:77287168:1051")))
g.V().has(label, 'Users').filter(id().is("Users:77287168:1051"))
Both of the above always return 0 records. However
g.V().has(label, 'Users').filter(hasId("Users:77287168:1051"))
This works as intended, and I get a User with the given id.
Again, the above is only representational; I would just like to know what it would take to do an id() comparison inside my filter step. For instance, I expect the traversal argument passed to filter() to traverse vertices, and I would like to compare each one against the current traverser's vertex id.
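For clarity, here is a sketch of the kind of comparison I mean, written against the TinkerPop Java API (this assumes string vertex ids and uses the lambda overload of filter(), which receives each Traverser directly; g is the GraphTraversalSource):

// The lambda form of filter() exposes the current traverser, so the
// vertex id can be compared directly. Ids are assumed to be strings here.
g.V().hasLabel("Users")
     .filter(t -> t.get().id().equals("Users:77287168:1051"))
     .toList();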
I get the following message when I try to validate the mapping (see Warning attached):
...Joiner jnr_Normal_jnr_Master_ZC_OR_Delay_Reason must have exactly two inputs.
WARNING: Joiner transformation jnr_Normal_jnr_Master_ZC_OR_Delay_Reason Condition field OR_CASE_ID1 is unconnected.
I have a joiner (jnr_Master_ZC_OR_Delay_Reason) and an expression (exp_Text) that I would like to join. I tried to do this with a Normal joiner (jnr_Normal_jnr_Master_ZC_OR_Delay_Reason). However, the data from jnr_Master_ZC_OR_Delay_Reason does not connect to jnr_Normal_jnr_Master_ZC_OR_Delay_Reason. See Joiners-Two Inputs attached.
Should I be using a different transformation to join the joiner and expression?
I tried to use a Sorter transformation, but I still get the same error message. Am I using the Sorter correctly? Please see the attached images.
If you want to join flows that originate from the same source (let's call that a self-join), you need to have the data sorted on both branches of the flow and check the Sorted Input property on the Joiner Transformation (jnr_Normal_jnr_Master_ZC_OR_Delay_Reason in this case).
A self-join is only allowed if both flows are sorted. Depending on your flow, it may be enough to sort data only once, before the flow gets split.
Note that if you enable the Sorted Input property but the data is not actually sorted, you will get an error during session execution.
I'm working on a pipeline that reads a PCollection<TableRow> from BigQuery and filters it based on a cell value.
Is it better to filter it with a ParDo, like in this example, or should I be using the Filter<T> class?
Basically I'd like to be able to filter based on personType. For example:
if ("customer".equals(personType)) {
    c.output(outputTableRow);
}
What's the difference, how am I approaching this wrong and what should I try instead?
They are pretty much the same. All of the Filter transforms are implemented using a ParDo with a DoFn much like you mentioned (see Filter.java).
The Filter transform exists to be a convenient short-hand for filtering. If it works, it is probably more concise. The only major difference is that the Filter transform can only filter based on the input element. For example, if you wanted to use a side-input containing a list of elements that should be passed through, then you would need to use a ParDo. If you're just filtering on "does this field equal 'customer'", then the Filter is probably fine.
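For illustration, a minimal sketch of both options in the Beam Java API (the personType field and the BigQuery-sourced PCollection come from the question; everything else is an assumption):

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

// Option 1: the Filter transform, concise but element-only.
PCollection<TableRow> customers =
    rows.apply(Filter.by((TableRow row) -> "customer".equals(row.get("personType"))));

// Option 2: the equivalent ParDo; use this form when you need side inputs
// or anything beyond a per-element predicate.
PCollection<TableRow> customersViaParDo =
    rows.apply(ParDo.of(new DoFn<TableRow, TableRow>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        TableRow row = c.element();
        if ("customer".equals(row.get("personType"))) {
          c.output(row);
        }
      }
    }));

Here rows stands for the PCollection<TableRow> that was read from BigQuery.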
Nowadays I am learning the new traversal API of Neo4j, and I followed the link below:
http://neo4j.com/docs/stable/tutorial-traversal-java-api.html
So now I know how to use uniqueness, evaluators, etc.; that is, I know how to change the behaviour of the API.
But what I want to know is how exactly it traverses.
For example, I am trying to find the neighbours of a node.
Does Neo4j use an index to find them?
Does Neo4j keep a hash to find neighbours?
More specifically, suppose I write the following code:
TraversalDescription desc = database.traversalDescription()
    .breadthFirst()
    .evaluator(Evaluators.toDepth(3));
Node node = database.getNodeById(4601410);
Traverser traverser = desc.traverse(node);
In my description I used breadthFirst, so when I give a node to traverse, the code should find its first neighbours. How the API finds those first neighbours is what I want to know. Is there a pointer to the neighbours in the node? When I say traverse to depth 3, does it find the first neighbours and then take each neighbour as the node in a recursive step, and so on? If so, could depth 10 become slow?
So what I want, exactly, is to know how I can change the natural traversal behaviour of the API.
Simplified: Neo4j stores records representing nodes, relationships, and so on in its store. Every node is represented by a node record on disk, and that record contains a pointer (a direct offset into the relationship store) to its first relationship (neighbour, if you will). Relationship records link to each other, so getting all neighbours of a node reads the node record, follows its relationship pointer to that relationship record, and continues following those forward pointers until the end of the chain. Does that answer your question?
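To make that concrete, a small sketch in the plain Neo4j Java API (run inside a transaction): iterating a node's relationships follows exactly that chain of forward pointers, with no index involved.

// Each iteration step follows one forward pointer in the relationship chain.
for (Relationship rel : node.getRelationships()) {
    Node neighbour = rel.getOtherNode(node);
    // neighbour is one first-level neighbour per chain entry
}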
TraversalDescription features the concept of a PathExpander: the component that decides which relationships will be used for the next step. Use TraversalDescription.expand() to set it.
You can either supply your own PathExpander implementation or use one of the predefined ones from PathExpanders.
If you just want your traversal to follow specific relationship types, you can use TraversalDescription.relationships() to specify them.
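Putting those together, a small sketch (the KNOWS relationship type is made up for illustration; RelTypes would be your own enum implementing RelationshipType):

TraversalDescription desc = database.traversalDescription()
    .breadthFirst()
    .relationships(RelTypes.KNOWS, Direction.OUTGOING) // follow only KNOWS
    .evaluator(Evaluators.toDepth(3));

// Or, for full control over expansion:
TraversalDescription custom = database.traversalDescription()
    .expand(PathExpanders.forTypeAndDirection(RelTypes.KNOWS, Direction.OUTGOING));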
I have the following two Cypher calls that I'd like to combine into one:
start r=relationship:link("key:\"foo\" and value:\"bar\"") return r.guid
This returns a relationship that contains a guid that I need based on a key value pair (in this case key:foo and value:bar).
Let's assume r.guid above returns 12345.
I then need all the property relationships for the object in question, based on the returned guid and a property type key:
start r=relationship:properties("to:\"12345\" and key:\"baz\"") return r
This returns several relationships which have the values I need, in this case all property types baz that belong to guid 12345.
How do I combine these two calls into one? I'm sure it's simple, but I'm stumbling.
The answer I've gotten is that there is no way to perform an index lookup in the middle of a Cypher query, or to use a variable you have declared to perform the lookup.
Perhaps in a later version of Cypher, as this ability should be standard, especially given the dense node issue and the suggested solution of indexing.
I have to create a graph with a self-defined node type, where the nodes and connections are read from a txt file one by one.
The file format is like this: startNode attributes endNode.
Every time I read one line, I create two node objects, startNode and endNode, and add an edge between them.
However, the same startNode may appear on several lines,
e.g. V1 ... V2 ; V1 ... V3
Therefore, I have to check whether the graph already contains a node before I add edges, and I should use the vertex already in the graph instead of a newly created one.
Does JUNG have any built-in method to solve this problem?
Or any suggestions?
The short answer is: by contract, JUNG's graph implementations take care of this for you, as long as your custom node/edge objects' implementations of equals() and hashCode() do the Right Thing.
If you try to add a vertex to a graph and it's already present in the graph, the addVertex() method will return false (meaning 'nothing done') just as with the analogous add() method in Set.
Also note that the addEdge() methods will add the connected vertices to the graph for you if they're not already present.
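A minimal sketch of what that contract looks like in practice (MyNode and its label field are illustrative, not from the question):

import edu.uci.ics.jung.graph.DirectedSparseGraph;
import edu.uci.ics.jung.graph.Graph;

// A node type whose identity is its label, so "V1" always means the same vertex.
class MyNode {
    final String label;
    MyNode(String label) { this.label = label; }
    @Override public boolean equals(Object o) {
        return o instanceof MyNode && ((MyNode) o).label.equals(label);
    }
    @Override public int hashCode() { return label.hashCode(); }
}

public class Demo {
    public static void main(String[] args) {
        Graph<MyNode, String> graph = new DirectedSparseGraph<MyNode, String>();
        // addEdge() adds missing endpoints; the second "V1" resolves to the first.
        graph.addEdge("e1", new MyNode("V1"), new MyNode("V2"));
        graph.addEdge("e2", new MyNode("V1"), new MyNode("V3"));
        System.out.println(graph.getVertexCount()); // prints 3, not 4
    }
}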
JUNG considers vertices (and edges) different as long as they are references to different objects. If you create two vertex objects with the same properties, they will be considered different vertices, and you will be able to insert both of them into the graph. JUNG doesn't have an equals method that you can override (to check the vertex object's properties) to determine whether two vertex objects are the same or not. Therefore you need to manually maintain the list of vertices (and edges) in your graph to avoid adding a vertex you already have. However, you can easily do that with a HashMap (if your graph is not too big).
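For the manual bookkeeping this answer suggests, a sketch (reusing the illustrative MyNode type from the example above; seen would be a field of your loader class):

import edu.uci.ics.jung.graph.Graph;
import java.util.HashMap;
import java.util.Map;

Map<String, MyNode> seen = new HashMap<String, MyNode>();

// Returns the vertex for this label, creating and adding it only on first sight.
MyNode getOrCreate(String label, Graph<MyNode, String> graph) {
    MyNode node = seen.get(label);
    if (node == null) {
        node = new MyNode(label);
        seen.put(label, node);
        graph.addVertex(node);
    }
    return node;
}

With this map in place, the same object is reused for every line that mentions V1, so plain reference identity is enough and no equals()/hashCode() override is needed.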