Logical node deletion in Neo4j embedded - neo4j

I have the following graph in an embedded Neo4j instance:
I want to find all the people who are not greeted by anyone else. That's simple enough: MATCH (n) WHERE NOT ()-[:GREETS]->(n) RETURN n.
However, whenever I find non-greeted people, I want to remove those nodes from the db and repeat the query, as long as it matches one or more nodes. In other words, starting from the graph in the picture, I want to:
Run the query, which returns "Alice"
Remove "Alice" from the db
Run the query, which returns "Bob"
Remove "Bob" from the db
Run the query, which returns no matches
Return the names "Alice" and "Bob"
Moreover, I want to execute this algorithm without actually removing any nodes from the database - i.e., a sort of "logical deletion".
One solution I have found is to not call success() on the transaction, so that node deletions are not committed to the db, as in the following code:
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import java.io.File;
import java.util.*;
public class App
{
    static String dbPath = "~/neo4j/data/databases/graph.db";

    private enum RelTypes implements RelationshipType { GREETS }

    public static void main(String[] args) {
        File graphDirectory = new File(dbPath);
        GraphDatabaseService graph = new GraphDatabaseFactory().newEmbeddedDatabase(graphDirectory);
        Set<String> notGreeted = new HashSet<>();
        try (Transaction tx = graph.beginTx()) {
            while (true) {
                Node notGreetedNode = getFirstNode(graph, "MATCH (n) WHERE NOT ()-[:GREETS]->(n) RETURN n");
                if (notGreetedNode == null) {
                    break;
                }
                notGreeted.add((String) notGreetedNode.getProperty("name"));
                detachDeleteNode(graph, notGreetedNode);
            }
            // Here I do NOT call tx.success()
        }
        System.out.println("Non greeted people: " + String.join(", ", notGreeted));
        graph.shutdown();
    }

    private static Node getFirstNode(GraphDatabaseService graph, String cypherQuery) {
        try (Result r = graph.execute(cypherQuery)) {
            if (!r.hasNext()) {
                return null;
            }
            Collection<Object> nodes = r.next().values();
            if (nodes.size() == 0) {
                return null;
            }
            return (Node) nodes.iterator().next();
        }
    }

    private static boolean detachDeleteNode(GraphDatabaseService graph, Node node) {
        final String query = String.format("MATCH (n) WHERE ID(n) = %s DETACH DELETE n", node.getId());
        try (Result r = graph.execute(query)) {
            return true;
        }
    }
}
This code works correctly and prints "Non greeted people: Bob, Alice".
My question is: does this approach (i.e. keeping a series of db operations within an open transaction) have any drawbacks that I should be aware of (e.g. potential memory issues)? Are there other approaches I could use to accomplish this?
I have also considered using a boolean property on the nodes to mark them as either deleted or not deleted. My concern is that the actual application I'm working on contains thousands of nodes and various kinds of relationships, and the actual queries are non-trivial, so I'd rather not change them to accommodate a soft-deletion boolean property (but I am open to doing that, if that turns out to be the best approach).
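For reference, a minimal sketch of how that boolean-property variant could look in the code above. The property name "deleted", the helper method and the exact query shape are just a rough idea of mine, not something I have implemented:
private static final String FIND_NOT_GREETED_ALIVE =
    "MATCH (n) WHERE NOT coalesce(n.deleted, false) " +
    "OPTIONAL MATCH (m)-[:GREETS]->(n) WHERE NOT coalesce(m.deleted, false) " +
    "WITH n, count(m) AS liveGreeters WHERE liveGreeters = 0 RETURN n";

private static void softDeleteNode(GraphDatabaseService graph, Node node) {
    // mark the node instead of physically deleting it
    graph.execute(String.format("MATCH (n) WHERE ID(n) = %s SET n.deleted = true", node.getId())).close();
}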
Also, please note that I am not simply looking for nodes that are not in cycles. Rather, the underlying idea is as follows. I have some nodes that satisfy a certain condition c; I want to (logically) remove those nodes; this will potentially make new nodes satisfy the same condition c, and so on, until the set of nodes that satisfy c is empty.

Related

Having trouble getting Entity's "child" loaded via @Query in Spring Neo4j

I have an entity like this:
@NodeEntity
public class Move {
    @Id @GeneratedValue private Long id;

    @Property("placement")
    private String placement;

    @Relationship(type = "PARENT", direction = Relationship.INCOMING)
    BoardPosition parent;

    @Relationship("CHILD")
    BoardPosition child;
    [...]
When I "load" it like this:
Move the_move = m_store.findByPlacement("C1");
log.info("Move direct load: " + the_move.toString());
it works fine. Both the parent and child properties have the correct value.
However, when I access it via a query defined like this:
public interface MoveStore extends Neo4jRepository<Move, Long> {
    public Move findByPlacement(String placement);

    @Query("MATCH (n:Move) RETURN n")
    public List<Move> findAll();
}
and accessed like this:
ListIterator<Move> moves = m_store.findAll().listIterator();
while (moves.hasNext()) {
    log.info(moves.next().toString());
}
the child value is missing (it is null).
This experiment is strange:
while (moves.hasNext()) {
    Move m = moves.next();
    log.info(m.toString()); // This is missing m.child's value
    // find the same Move, by passing in the id of this one:
    m = m_store.findById(m.id).orElse(null);
    log.info(m.toString()); // This has the correct value for m.child!
}
What am I missing? How can I make the query load the child property?
When you are using a custom query, you also have to return the relationship and the related nodes to get the child populated.
E.g. @Query("MATCH (n:Move)-[rel:CHILD]-(c:BoardPosition) RETURN n, rel, c") would do the job for a 1:1 relationship; otherwise a collect(...) is needed to get the list in the same result "row" as the node you are querying for.
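A hedged sketch of the collect(...) variant, assuming a hypothetical to-many field such as List<BoardPosition> children mapped with @Relationship("CHILD") on Move (not part of the original entity):
@Query("MATCH (n:Move) OPTIONAL MATCH (n)-[rel:CHILD]->(c:BoardPosition) " +
       "RETURN n, collect(rel), collect(c)")
List<Move> findAllWithChildren();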

How to dynamically change entity types in neo4j-ogm or spring-data-neo4j?

There was a question on "how to add labels dynamically to nodes in Neo4j". Is there a way to dynamically change the entity types?
Take an example:
@NodeEntity
public class User {
    @Properties(prefix = "custom")
    private Map userProperties;
}
I see from https://neo4j.com/blog/spring-data-neo4j-5-0-release/ that I can create dynamic properties. Can I have dynamic types during run-time as well? I want to change "User" type to "Consumer"/"Admin"/"Producer" dynamically when needed. The entity types are non-exhaustive.
Thanks in advance! :)
There is an @Labels annotation on a Set<String> that is stored/managed in addition to the main type from the class and interfaces.
see: https://docs.spring.io/spring-data/neo4j/docs/current/reference/html/#reference:annotating-entities:node-entity:runtime-managed-labels
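A minimal sketch of what that can look like (the field name and the labels are illustrative, following the linked documentation):
@NodeEntity
public class User {
    @Id @GeneratedValue
    private Long id;

    // labels managed at runtime, in addition to the static :User label
    @Labels
    private Set<String> roles = new HashSet<>();
}
Adding e.g. "Admin" to that set and saving the entity makes the node carry both the :User and :Admin labels.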
The @Labels mechanism is great and, for many use cases, the best solution I'd say.
If you want to have another class back out of your repository, then there's indeed much more work needed.
I'm doing this in a music related project. I have Artist (not abstract and totally usable for anything where I don't know whether it's a band or not) and Band and SoloArtist extending from Artist, with additional labels:
@NodeEntity
public class Artist {}

@NodeEntity
public class Band extends Artist {}
What I do in a custom repository extension is this:
interface ArtistRepository<T extends Artist> extends Repository<T, Long>, ArtistRepositoryExt {
    Optional<T> findOneByName(String name);

    // Specifying the relationships is necessary here because the generic queries won't recognize that
    // Band has a relationship to country that _should_ be loaded with default depth of 1.
    @Query("MATCH (n:Artist) WITH n MATCH p=(n)-[*0..1]-(m) RETURN p ORDER BY n.name")
    List<T> findAllOrderedByName();

    @Query("MATCH (n:Artist) WHERE id(n) = $id WITH n MATCH p=(n)-[*0..1]-(m) RETURN p")
    Optional<T> findById(@Param("id") Long id);

    <S extends T> S save(S artist);
}

interface ArtistRepositoryExt {
    Band markAsBand(Artist artist);
    SoloArtist markAsSoloArtist(Artist artist);
}
class ArtistRepositoryExtImpl implements ArtistRepositoryExt {
    private static final String CYPHER_MARK_AS_BAND = String.format(
        "MATCH (n) WHERE id(n) = $id\n" +
        "OPTIONAL MATCH (n) - [f:BORN_IN] -> (:Country)\n" +
        "REMOVE n:%s SET n:%s\n" +
        "DELETE f",
        SoloArtist.class.getSimpleName(),
        Band.class.getSimpleName());

    private static final String CYPHER_MARK_AS_SOLO_ARTIST = String.format(
        "MATCH (n) WHERE id(n) = $id\n" +
        "OPTIONAL MATCH (n) - [f:FOUNDED_IN] -> (:Country)\n" +
        "REMOVE n:%s SET n:%s\n" +
        "DELETE f",
        Band.class.getSimpleName(),
        SoloArtist.class.getSimpleName());

    private final Session session;

    public ArtistRepositoryExtImpl(Session session) {
        this.session = session;
    }

    @Override
    public Band markAsBand(Artist artist) {
        session.query(CYPHER_MARK_AS_BAND, Map.of("id", artist.getId()));
        // Needs to clear the mapping context at this point because this shared session
        // will know the node only as class Artist in this transaction otherwise.
        session.clear();
        return session.load(Band.class, artist.getId());
    }

    @Override
    public SoloArtist markAsSoloArtist(Artist artist) {
        session.query(CYPHER_MARK_AS_SOLO_ARTIST, Map.of("id", artist.getId()));
        // See above
        session.clear();
        return session.load(SoloArtist.class, artist.getId());
    }
}
While this works neatly, you get an idea of the effort needed in a more deeply nested class scenario. Also, you have to redeclare derived finder methods, as I did, if you want to use a repository in a polymorphic way.
I keep dedicated repositories, too.
If this question is still relevant to you, let me know if this works for you, too. You'll find the whole project here:
https://github.com/michael-simons/bootiful-music
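For illustration, a hypothetical usage of the extension above (the repository instance and the artist name are placeholders of mine):
Artist artist = artistRepository.findOneByName("Some Artist")
        .orElseThrow(IllegalStateException::new);
// removes the :SoloArtist label, adds :Band, and reloads the node as a Band entity
Band band = artistRepository.markAsBand(artist);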

Attaching a subscriber/listener within EPL module to a statement with context partitions

I have the following EPL module which successfully deploys:
module context;
import events.*;
import configDemo.*;
import annotations.*;
import main.*;
import subscribers.*;
import listeners.*;
@Name('schemaCreator')
create schema InitEvent(firstStock String, secondStock String, bias double);
@Name('createSchemaEvent')
create schema TickEvent as TickEvent;
@Name('contextCreator')
create context TwoStocksContext
initiated by InitEvent as initEvent;
@Name('compareStocks')
@Description('Compare the difference between two different stocks and make a decision')
@Subscriber('subscribers.MySubscriber')
context TwoStocksContext
select * from TickEvent
match_recognize (
measures A.currentPrice as a_currentPrice, B.currentPrice as b_currentPrice,
A.stockCode as a_stockCode, B.stockCode as b_stockCode
pattern (A C* B)
define
A as A.stockCode = context.initEvent.firstStock,
B as A.currentPrice - B.currentPrice >= context.initEvent.bias and
B.stockCode = context.initEvent.secondStock
);
I have a problem with the listeners/subscribers. According to my checks and debugging, the classes don't have any problems, the annotations work, and they are attached to the statement upon deployment, and yet neither of them receives any updates from the events.
This is my subscriber, I simply want to print that it has been received:
package subscribers;
import java.util.Map;
public class MySubscriber {
public void update(Map row) {
System.out.println("got it");
}
}
I previously had the same module without any context partitions and then the subscribers worked without a problem. After I added the context, it stopped.
So far I have tried:
Checking if the statement has any subscriber/listener attached (it does)
Checking their names
Removing the annotations and setting the subscriber/listener manually in Java code after deployment (same thing: they attach and I can retrieve their names, but they still don't receive updates)
Debugging the subscriber class: the program either never reaches a breakpoint there at all, or I get a "missing line number attribute" error ("can't place a breakpoint there"), which I tried to fix to no avail
Any idea what could cause this or what is the best way to set a subscriber to a statement which has context partitions?
This is a continuation of a previous problem which was solved here - Creating instances of Esper's epl
EDIT: The events being sent, both in the form I use them in my code and in the EPL online tool format:
I first get the pair to be followed from the user:
System.out.println("First stock:");
String first = scanner.nextLine();
System.out.println("Second stock:");
String second = scanner.nextLine();
System.out.println("Difference:");
double diff= scanner.nextDouble();
InitEvent init = new InitEvent(first, second, diff);
After that I have an engine thread that continuously sends events, but before the loop starts an InitEvent is sent, as shown here:
@Override
public void run() {
    runtime.sendEvent(initEvent);
    while (contSimulation) {
        TickEvent tick1 = new TickEvent(Math.random() * 100, "YAH");
        runtime.sendEvent(tick1);
        TickEvent tick2 = new TickEvent(Math.random() * 100, "GOO");
        runtime.sendEvent(tick2);
        TickEvent tick3 = new TickEvent(Math.random() * 100, "IBM");
        runtime.sendEvent(tick3);
        TickEvent tick4 = new TickEvent(Math.random() * 100, "MIC");
        runtime.sendEvent(tick4);
        try {
            TimeUnit.SECONDS.sleep(1);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        latch.countDown();
    }
}
I haven't used the online tool before but I think I got it working. This is the module text:
module context;
create schema InitEvent(firstStock String, secondStock String, bias double);
create schema TickEvent(currentPrice double, stockCode String);
create context TwoStocksContext
initiated by InitEvent as initEvent;
context TwoStocksContext
select * from TickEvent
match_recognize (
measures A.currentPrice as a_currentPrice, B.currentPrice as b_currentPrice,
A.stockCode as a_stockCode, B.stockCode as b_stockCode
pattern (A C* B)
define
A as A.stockCode = context.initEvent.firstStock,
B as A.currentPrice - B.currentPrice >= context.initEvent.bias and
B.stockCode = context.initEvent.secondStock
);
And the sequence of events:
InitEvent={firstStock='YAH', secondStock = 'GOO', bias=5}
TickEvent={currentPrice=55.6, stockCode='YAH'}
TickEvent={currentPrice=50.4, stockCode='GOO'}
TickEvent={currentPrice=30.8, stockCode='MIC'}
TickEvent={currentPrice=24.9, stockCode='APP'}
TickEvent={currentPrice=51.6, stockCode='YAH'}
TickEvent={currentPrice=45.8, stockCode='GOO'}
TickEvent={currentPrice=32.8, stockCode='MIC'}
TickEvent={currentPrice=28.9, stockCode='APP'}
The result I get using them:
At: 2001-01-01 08:00:00.000
Statement: Stmt-4
Insert
Stmt-4-output={a_currentPrice=55.6, b_currentPrice=50.4, a_stockCode='YAH',
b_stockCode='GOO'}
At: 2001-01-01 08:00:00.000
Statement: Stmt-4
Insert
Stmt-4-output={a_currentPrice=51.6, b_currentPrice=45.8, a_stockCode='YAH',
b_stockCode='GOO'}
If I make the second set of events have a difference of less than 5 between YAH/GOO, I only get output from the first pair, which makes sense. This is, I think, what it is supposed to do.
In case it's needed, these two methods read and process the annotations of the EPL module (I didn't write them myself; they are taken from the coinTrader Context class, which can be found here - https://github.com/timolson/cointrader/blob/master/src/main/java/org/cryptocoinpartners/module/Context.java ):
private static Object getSubscriber(String className) throws Exception {
    Class<?> cl = Class.forName(className);
    return cl.newInstance();
}

private static void processAnnotations(EPStatement statement) throws Exception {
    Annotation[] annotations = statement.getAnnotations();
    for (Annotation annotation : annotations) {
        if (annotation instanceof Subscriber) {
            Subscriber subscriber = (Subscriber) annotation;
            Object obj = getSubscriber(subscriber.className());
            System.out.println(subscriber.className());
            statement.setSubscriber(obj);
        } else if (annotation instanceof Listeners) {
            Listeners listeners = (Listeners) annotation;
            for (String className : listeners.classNames()) {
                Class<?> cl = Class.forName(className);
                Object obj = cl.newInstance();
                if (obj instanceof StatementAwareUpdateListener) {
                    statement.addListener((StatementAwareUpdateListener) obj);
                } else {
                    statement.addListener((UpdateListener) obj);
                }
            }
        }
    }
}
Well, after a month of struggle I finally solved it. In case anyone has a similar problem in the future, here's where the problem was. The EPL worked fine in the online tool but not in my code. Eventually, I figured out that the initial events weren't firing, hence the context partitions weren't being created, and as a result the subscribers and listeners did not receive any updates. My mistake was that I fired a POJO InitEvent, but the event the context was using was created within the EPL module via create schema. I don't know what I was thinking; it makes sense now that it didn't work. As a result, the events I fired from Java weren't the events that the context used. My solution was entirely within the EPL. Since I couldn't figure out whether I can fire events from Java that are created within the module, I created a schema which is populated by my POJO, and that stream is then used by the context, as such:
@Name('schemaCreator')
create schema StartEvent(firstStock string, secondStock string, difference double);

@Name('insertInitEvent')
insert into StartEvent
select * from InitEvent;
All else remains the same, as well as the Java code.
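One wiring detail worth noting (this is my assumption, not something spelled out above): for insert into StartEvent select * from InitEvent to see the Java-fired events, the InitEvent POJO has to be known to the engine as an event type, unless your configuration already resolves it. A minimal sketch with the pre-Esper-8 API:
Configuration config = new Configuration();
config.addEventType("InitEvent", InitEvent.class); // the POJO fired via runtime.sendEvent(initEvent)
config.addEventType("TickEvent", TickEvent.class);
EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);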

Flink Gelly Path/Trail Usecase

Our team is new to the Gelly API. We are looking to implement a simple use case that will list all paths originating from an initial vertex. For example, if the input edge CSV file is
1,2\n2,3\n3,4\n1,5\n5,6
then the required output will be (the full paths that start from 1)
1,2,3,4\n1,5,6
Can someone please help?
You can use one of Gelly's iteration abstractions, e.g. vertex-centric iterations. Starting from the source vertex, you can iteratively extend the paths, one hop per superstep. Upon receiving a path, a vertex appends its ID to the path and propagates it to its outgoing neighbors. If a vertex has no outgoing neighbors, then it prints / stores the path and does not propagate it further. To avoid loops a vertex could also check if its ID exists in the path, before propagating. The compute function could look like this:
public static final class ComputePaths extends ComputeFunction<Integer, Boolean, NullValue, ArrayList<Integer>> {

    @Override
    public void compute(Vertex<Integer, Boolean> vertex, MessageIterator<ArrayList<Integer>> paths) {
        if (getSuperstepNumber() == 1) {
            // the source propagates its ID
            if (vertex.getId().equals(1)) {
                ArrayList<Integer> msg = new ArrayList<>();
                msg.add(1);
                sendMessageToAllNeighbors(msg);
            }
        }
        else {
            // go through received messages
            for (ArrayList<Integer> p : paths) {
                if (!p.contains(vertex.getId())) {
                    // if no cycle => append ID and forward to neighbors
                    p.add(vertex.getId());
                    if (!vertex.getValue()) {
                        sendMessageToAllNeighbors(p);
                    }
                    else {
                        // no out-neighbors: print p
                        System.out.println(p);
                    }
                }
                else {
                    // found a cycle => print the path and don't propagate further
                    System.out.println(p);
                }
            }
        }
    }
}
In this code I have assumed that you have pre-processed vertices to mark the ones that have no out-neighbors with a "true" value. You could e.g. use graph.outDegrees() to find those.
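A rough, untested sketch of that pre-processing step (it assumes the graph already carries Boolean vertex values and uses Gelly's outDegrees()/joinWithVertices() from Flink 1.x; names like maxIterations are placeholders):
// mark vertices that have no outgoing edges with "true"
Graph<Integer, Boolean, NullValue> marked = graph.joinWithVertices(
        graph.outDegrees(),
        new VertexJoinFunction<Boolean, LongValue>() {
            @Override
            public Boolean vertexJoin(Boolean currentValue, LongValue outDegree) {
                return outDegree.getValue() == 0L; // true => no out-neighbors
            }
        });

// then run the path enumeration for a bounded number of supersteps
Graph<Integer, Boolean, NullValue> result =
        marked.runVertexCentricIteration(new ComputePaths(), null, maxIterations);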
Keep in mind that enumerating all paths in a large and dense graph is expensive to compute. The intermediate path state can explode quite quickly. You could look into using a more compact way of representing paths than an ArrayList of ints, but beware of the cost if you have a dense graph with a large diameter.
If you don't need the paths themselves but you're only interested in reachability or shortest paths, then there exist more efficient algorithms.

Neo4J traversal in correct order for two different relationships?

I'm using the Neo4J Traversal API and trying to traverse from "1" to find nodes "2" and "3" fitting the pattern below:
1-[:A]-2-[:B]-3
However, in my traversal, I'm getting caught out because the following relationship also exists:
1-[:B]-3
As far as I understand, my TraversalDescription needs to specify both relationship types, but I'm unsure of the most elegant way to traverse the :A relationship first, and then branch out to the :B relationship. Unfortunately the relationship Direction can't be used to differentiate in my case.
My Scala code is:
db.traversalDescription()
  .evaluator(isAThenBRelationship)
  .breadthFirst()
  .relationships(A)
  .relationships(B)

private val isAThenBRelationship = new PathEvaluator.Adapter() {
  override def evaluate(path: Path, state: BranchState[Nothing]): Evaluation = {
    if (path.length == 0) {
      EXCLUDE_AND_CONTINUE
    } else if (path.length == 1) {
      Evaluation.of(false, path.relationships().iterator().next().getType.name() == A.toString)
    } else {
      Evaluation.of(path.relationships().iterator().next().getType.name() == B.toString, false)
    }
  }
}
As an aside, what's a better way of comparing relationships than this?
path.relationships().iterator().next().getType.name() == MyRelationship.toString
Using relationships() multiple times does not imply an order. Instead, there is an internal list to which each relationships() call adds an entry.
To limit a certain relationship type to a certain depth, you need to implement and use your own PathExpander. The example below uses Java and implements the PathExpander using an anonymous inner class:
traversalDescription.expand(new PathExpander<Object>() {
    @Override
    public Iterable<Relationship> expand(Path path, BranchState<Object> state) {
        switch (path.length()) {
            case 0:
                return path.endNode().getRelationships(
                        DynamicRelationshipType.withName("A"));
            case 1:
                return path.endNode().getRelationships(
                        DynamicRelationshipType.withName("B"));
            default:
                return Iterables.empty();
        }
    }

    @Override
    public PathExpander<Object> reverse() {
        // not used for unidirectional traversals
        throw new UnsupportedOperationException();
    }
});
Regarding your second question:
Neo4j contains a very convenient class, IteratorUtil. With that, your snippet can be written as (assuming MyRelationship is an instance of RelationshipType):
IteratorUtil.first(path.relationships()).getType().equals(MyRelationship)
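A small addition of my own: if you only need a type check, Relationship.isType(...) from the core API avoids the string comparison entirely:
path.relationships().iterator().next().isType(MyRelationship)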
In addition to @StefanArmbruster's answer, here's the equivalent Scala code:
db.traversalDescription()
  .breadthFirst()
  .expand(isAThenBRelationship)

private val isAThenBRelationship =
  new PathExpander[Object]() {
    override def expand(path: Path, state: BranchState[Object]) =
      path.length() match {
        case 0 => path.endNode().getRelationships(DynamicRelationshipType.withName("A"))
        case 1 => path.endNode().getRelationships(DynamicRelationshipType.withName("B"))
        case _ => Iterables.empty()
      }

    override def reverse(): PathExpander[Object] = ???
  }
Note that the expander must come after the relationships.
