SDN4 or neo4j-ogm performance issue

I wrote some simple Java code and ran into poor performance with SDN4 that I did not see with SDN3. I suspect the depth parameter of the repository find methods does not work exactly as it should. Let me explain the problem.
Here are my Java classes (just an example), with getters, setters, constructors, etc. removed.
The first class is 'Element':
@NodeEntity
public class Element {
    @GraphId
    private Long id;
    private int age;
    private String uuid;
    @Relationship(type = "HAS_VALUE", direction = Relationship.OUTGOING)
    private Set<Value> values = new HashSet<Value>();
    // getters and setters omitted
}
The second one is 'Attribute':
@NodeEntity
public class Attribute {
    @GraphId
    private Long id;
    @Relationship(type = "HAS_PROPERTIES", direction = Relationship.OUTGOING)
    private Set<HasInterProperties> properties;
    // getters and setters omitted
}
The 'Value' class allows my user to add a value to an Element for a specific attribute:
@RelationshipEntity(type = "HAS_VALUE")
public class Value {
    @GraphId
    private Long id;
    @StartNode
    Element element;
    @EndNode
    Attribute attribute;
    private Integer value;
    private String uuid;
    public Value() {
    }
    public Value(Element element, Attribute attribute, Integer value) {
        this.element = element;
        this.attribute = attribute;
        this.value = value;
        this.element.getValues().add(this);
        this.uuid = UUID.randomUUID().toString();
    }
}
The 'Element' class really needs to know its values, but the 'Attribute' class does not care about values at all.
An attribute holds references to the InternationalizedProperties class, which looks like this:
@NodeEntity
public class InternationalizedProperties {
    @GraphId
    private Long id;
    private String name;
    // getters and setters omitted
}
The relationship entity between an attribute and its InternationalizedProperties looks like this:
@RelationshipEntity(type = "HAS_PROPERTIES")
public class HasInterProperties {
    @GraphId
    private Long id;
    @StartNode
    private Attribute attribute;
    @EndNode
    private InternationalizedProperties properties;
    private String locale;
    // constructors, getters and setters omitted
}
I then wrote a small main method that creates two attributes and 10000 elements. All my elements have a specific value for the first attribute but no value for the second one (no relationship between them). Both attributes have two different InternationalizedProperties. Here is a sample:
public static void main(String[] args) {
    ApplicationContext context = new ClassPathXmlApplicationContext("spring/*.xml");
    Session session = context.getBean(Session.class);
    session.query("START n=node(*) OPTIONAL MATCH n-[r]-() WHERE ID(n) <> 0 DELETE n,r", new HashMap<String, Object>());
    ElementRepository elementRepository = context.getBean(ElementRepository.class);
    AttributeRepository attributeRepository = context.getBean(AttributeRepository.class);
    InternationalizedPropertiesRepository internationalizedPropertiesRepository = context.getBean(InternationalizedPropertiesRepository.class);
    HasInterPropertiesRepository hasInterPropertiesRepository = context.getBean(HasInterPropertiesRepository.class);
    // Create an attribute with two internationalized properties
    Attribute att = new Attribute();
    attributeRepository.save(att);
    InternationalizedProperties p1 = new InternationalizedProperties();
    p1.setName("bonjour");
    internationalizedPropertiesRepository.save(p1);
    InternationalizedProperties p2 = new InternationalizedProperties();
    p2.setName("hello");
    internationalizedPropertiesRepository.save(p2);
    hasInterPropertiesRepository.save(new HasInterProperties(att, p1, "fr"));
    hasInterPropertiesRepository.save(new HasInterProperties(att, p2, "en"));
    LOGGER.info("First attribute id is {}", att.getId());
    // Create 10000 elements, each with a different value for the same attribute
    for (int i = 0; i < 10000; i++) {
        Element elt = new Element();
        new Value(elt, att, i);
        elementRepository.save(elt);
        if (i % 50 == 0) {
            LOGGER.info("{} elements created. Last element created with id {}", i + 1, elt.getId());
        }
    }
    // Another attribute without any values from elements
    Attribute att2 = new Attribute();
    attributeRepository.save(att2);
    InternationalizedProperties p12 = new InternationalizedProperties();
    p12.setName("bonjour");
    internationalizedPropertiesRepository.save(p12);
    InternationalizedProperties p22 = new InternationalizedProperties();
    p22.setName("hello");
    internationalizedPropertiesRepository.save(p22);
    hasInterPropertiesRepository.save(new HasInterProperties(att2, p12, "fr"));
    hasInterPropertiesRepository.save(new HasInterProperties(att2, p22, "en"));
    LOGGER.info("Second attribute id is {}", att2.getId());
}
Finally, in another main method, I fetch the first attribute and the second one several times:
private static void getFirstAttribute(AttributeRepository attributeRepository) {
    StopWatch st = new StopWatch();
    st.start();
    Attribute attribute = attributeRepository.findOne(25283L, 1);
    LOGGER.info("time to get attribute (some element have values on it) is {}ms", st.getTime());
}
private static void getSecondAttribute(AttributeRepository attributeRepository) {
    StopWatch st = new StopWatch();
    st.start();
    Attribute attribute2 = attributeRepository.findOne(26286L, 1);
    LOGGER.info("time to get attribute (no element have values on it) is {}ms", st.getTime());
}
public static void main(String[] args) {
    ApplicationContext context = new ClassPathXmlApplicationContext("spring/*.xml");
    AttributeRepository attributeRepository = context.getBean(AttributeRepository.class);
    getFirstAttribute(attributeRepository);
    getSecondAttribute(attributeRepository);
    getFirstAttribute(attributeRepository);
    getSecondAttribute(attributeRepository);
    getFirstAttribute(attributeRepository);
    getSecondAttribute(attributeRepository);
    getFirstAttribute(attributeRepository);
    getSecondAttribute(attributeRepository);
}
Here are the logs of this execution :
time to get attribute (some element have values on it) is 2983ms
time to get attribute (no element have values on it) is 4ms
time to get attribute (some element have values on it) is 1196ms
time to get attribute (no element have values on it) is 2ms
time to get attribute (some element have values on it) is 1192ms
time to get attribute (no element have values on it) is 3ms
time to get attribute (some element have values on it) is 1194ms
time to get attribute (no element have values on it) is 3ms
Getting the second attribute (and its internationalized properties, thanks to depth=1) is very quick, but getting the first one remains very slow. I know that there are many relationships (exactly 10000) pointing to the first attribute, but when I fetch an attribute with its internationalized properties I clearly do not want to fetch all the values pointing to it (since no Set<Value> is declared on the Attribute class).
That's why I think there is a performance problem here. Or maybe I am doing something wrong?
Thanks for your help

When loading data from the graph we don't currently analyse how your domain model is wired together, so we may potentially bring back related nodes that you do not require. These will then be discarded if they are not mappable in your domain, but if there are many of them, it could potentially impact response times.
There are two reasons for this approach.
First, it is much simpler to create generic queries to any depth than it would be to dynamically analyse your domain model to any arbitrary depth and generate custom queries on the fly; it is also much simpler to analyse and prove the correctness of generic queries.
Second, we want to preserve the capability to support polymorphic domain models in the future, where we don't necessarily know what's in the database from one day to the next, but we want to adapt our domain model hydration according to what we find.
In this case I would suggest writing a custom query to load the Attribute objects, to ensure you don't bring back all the unwanted relationships.
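A minimal sketch of such a query, assuming the default labels (the simple class names `Attribute` and `InternationalizedProperties`) and the `{param}` placeholder syntax used elsewhere in this thread. The class name `AttributeQuerySketch` and the parameter name `id` are hypothetical. The query follows only `HAS_PROPERTIES` relationships, so the 10000 `HAS_VALUE` relationships are never traversed. Whether the returned rows are hydrated into `Attribute` objects depends on the SDN/OGM version, so treat this as a starting point rather than a drop-in fix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for a custom Cypher query; not part of SDN or neo4j-ogm.
class AttributeQuerySketch {

    // Loads one Attribute plus its HAS_PROPERTIES relationships and targets only.
    // Assumes the node labels match the entity class names.
    static final String ATTRIBUTE_WITH_PROPERTIES =
            "MATCH (a:Attribute) WHERE id(a) = {id} "
            + "OPTIONAL MATCH (a)-[r:HAS_PROPERTIES]->(p:InternationalizedProperties) "
            + "RETURN a, r, p";

    // Builds the parameter map expected by the query above.
    static Map<String, Object> parameters(long attributeId) {
        Map<String, Object> params = new HashMap<>();
        params.put("id", attributeId);
        return params;
    }

    public static void main(String[] args) {
        System.out.println(ATTRIBUTE_WITH_PROPERTIES);
        System.out.println(parameters(25283L));
    }
}
```

The query string and parameters could then be passed to `session.query(String, Map)`, the same call the question uses for its cleanup statement, for example `session.query(AttributeQuerySketch.ATTRIBUTE_WITH_PROPERTIES, AttributeQuerySketch.parameters(25283L))`.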

Related

How to use a custom mapper with MapStruct with nested values and conditional values

I am trying to map one object to another using MapStruct and am currently facing some challenges with how to use it in certain cases.
public class TargetOrderDto {
    String id;
    String preferedItem;
    List<Item> items;
    String status;
    Address address;
}
public class Item {
    String id;
    String name;
}
public abstract class TargetOrderMapper {
    @Autowired
    private StatusRepository statusRepository;
    @Mappings({
        @Mapping(target = "id", source = "reference"),
        @Mapping(target = "preferedItem", source = ""), // Here I need to loop through these values checking for a single value with a specific tag
        @Mapping(target = "items", source = "items"), // List of objects to another list of different data types.
        @Mapping(target = "status", source = "remoteStatus") // may need to extract a value from a repository
    })
    abstract TargetOrderDto toTargetOrderDto(RemoteOrder remoteOrder);
}
// Remote Data
public class RemoteOrder {
    String reference;
    List<RemoteItem> items;
    String remoteStatus;
}
public class RemoteItem {
    String id;
    String flag;
    String description;
}
These are the scenarios that I have failed to get my head around (maybe I am mapping a complex object).
preferedItem :
For this, I need to loop through the items in the order and identify the item with a specific flag (if it matches, I take that value; otherwise I use null).
items :
I need to convert one list into a different list, List<Item> from List<RemoteItem>; each has different mapping rules of its own.
remoteStatus :
This one is a bit more tricky: I need to extract the status from remoteOrder, then look it up in the DB using the statusRepository to get an alternate mapped value.
Any help is highly appreciated.
You can't do business logic with MapStruct. So keep mappings simple and define your own methods when it comes to conditional mappings in lists. Note: you can write your own method and MapStruct will select it. Also, from your own implementation you can call MapStruct-generated methods again.
public abstract class TargetOrderMapper {
    @Autowired
    private StatusRepository statusRepository;
    @Mappings({
        @Mapping(target = "id", source = "reference"),
        @Mapping(target = "preferedItem", source = ""), // Here I need to loop through these values checking for a single value with a specific tag
        @Mapping(target = "items", source = "items"), // List of objects to another list of different data types.
        @Mapping(target = "status", source = "remoteStatus") // may need to extract a value from a repository
    })
    abstract TargetOrderDto toTargetOrderDto(RemoteOrder remoteOrder);
    protected List<Item> toItemList(List<RemoteItem> items) {
        // do whatever you want here,
        // and call toItem while iterating
        if (items == null) {
            return null;
        }
        List<Item> result = new ArrayList<>();
        for (RemoteItem item : items) {
            result.add(toItem(item));
        }
        return result;
    }
    protected abstract Item toItem(RemoteItem item);
}
The same goes for status. I added a FAQ entry some time ago about lists (mainly about updating, but I guess the same applies here).
About lookups, you can use @Context to pass down a context object that contains the logic to access a DB.
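For the preferedItem case, the hand-written logic could look like the sketch below. It is plain Java with no MapStruct dependency; in a real mapper the method would live in the abstract mapper class next to toItemList. The class name `PreferredItemSketch`, the method name `preferredItemId` and the flag value "PREFERRED" are assumptions made for illustration, and `RemoteItem` is a simplified copy of the class from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing the "find the flagged item" step in isolation.
class PreferredItemSketch {

    // Simplified copy of the RemoteItem class from the question.
    static final class RemoteItem {
        final String id;
        final String flag;
        final String description;

        RemoteItem(String id, String flag, String description) {
            this.id = id;
            this.flag = flag;
            this.description = description;
        }
    }

    // Returns the id of the first item carrying the given flag, or null if none matches.
    static String preferredItemId(List<RemoteItem> items, String flag) {
        if (items == null || flag == null) {
            return null;
        }
        for (RemoteItem item : items) {
            if (flag.equals(item.flag)) {
                return item.id;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<RemoteItem> items = Arrays.asList(
                new RemoteItem("1", "NORMAL", "pen"),
                new RemoteItem("2", "PREFERRED", "book"));
        System.out.println(preferredItemId(items, "PREFERRED"));
    }
}
```

Keeping the lookup in its own small method makes it easy to unit test without starting Spring or generating the mapper.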

How to group by key with custom logic in Cloud Dataflow

I am trying to group by key based on a custom object in a Cloud Dataflow pipeline.
public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());
    List<KV<Student, StudentValues>> studentList = new ArrayList<>();
    studentList.add(KV.of(new Student("pawan", 10, "govt"),
            new StudentValues("V1", 123, "govt")));
    studentList.add(KV.of(new Student("pawan", 13223, "word"),
            new StudentValues("V2", 456, "govt")));
    PCollection<KV<Student, StudentValues>> pc =
            pipeline.apply(Create.of(studentList));
    PCollection<KV<Student, Iterable<StudentValues>>> groupedWords =
            pc.apply(GroupByKey.<Student, StudentValues>create());
}
I just want to group both PCollection records by the Student object.
@DefaultCoder(AvroCoder.class)
static class Student /*implements Serializable*/ {
    public Student() {}
    public Student(String n, Integer i, String sc) {
        name = n;
        id = i;
        school = sc;
    }
    public String name;
    public Integer id;
    public String school;
    @Override
    public boolean equals(Object obj) {
        System.out.println("obj = " + obj);
        System.out.println("this = " + this);
        if (!(obj instanceof Student)) {
            return false;
        }
        Student stObj = (Student) obj;
        // Compare names by value rather than by reference
        return java.util.Objects.equals(stObj.name, this.name);
    }
    @Override
    public int hashCode() {
        // Keep hashCode consistent with equals
        return java.util.Objects.hashCode(name);
    }
}
I have overridden the equals method of my custom class, but each time the same Student instance is passed to equals for comparison.
Ideally it should compare the first student key with the second one.
What am I doing wrong here?
Why do you think you are doing anything wrong? The keys of each element are serialized (using the AvroCoder you specified) and the GroupByKey can group all of the elements with the same serialized representation together. After that it doesn't need to compare the students to make sure that the values with the same key have been grouped together.
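The effect can be illustrated with a small plain-Java simulation. This is not Beam or Dataflow code: `encode` is a hypothetical stand-in for the coder, and `groupByEncodedKey` only mimics the idea of grouping on the serialized key. It shows that `equals` is never consulted, and that two students sharing a name but differing in id or school encode differently and land in different groups.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of grouping by the encoded form of a key.
class GroupByEncodedKeyDemo {

    // Simplified stand-in for the Student key from the question.
    static final class Student {
        final String name;
        final int id;
        final String school;

        Student(String name, int id, String school) {
            this.name = name;
            this.id = id;
            this.school = school;
        }
    }

    // Hypothetical stand-in for a coder: a deterministic byte encoding of all fields.
    static ByteBuffer encode(Student s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s.name);
            out.writeInt(s.id);
            out.writeUTF(s.school);
            out.flush();
            return ByteBuffer.wrap(bytes.toByteArray());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Groups values by the encoded bytes of their key; Student.equals is never called.
    static Map<ByteBuffer, List<String>> groupByEncodedKey(List<Student> keys, List<String> values) {
        Map<ByteBuffer, List<String>> groups = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            groups.computeIfAbsent(encode(keys.get(i)), k -> new ArrayList<>()).add(values.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Student> keys = Arrays.asList(
                new Student("pawan", 10, "govt"),
                new Student("pawan", 13223, "word"));
        List<String> values = Arrays.asList("V1", "V2");
        // Same name, but different id and school, so the encodings differ.
        System.out.println(groupByEncodedKey(keys, values).size());
    }
}
```

If the goal is to group by name alone, one option is to key the collection by the name itself (for example a String key) instead of the whole Student object, so that the encoded key contains only the field that should decide the grouping.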

Neo4j-spring-data(4.1.1) self relationship (parent-child) is duplicated

I am using spring-data-4.1.1 & Neo4j 2.3.2 with ogm annotations
Below is my entity
@NodeEntity(label = "Component")
public class Component extends BaseEntity {
    .........
    @Relationship(type = Relation.LINK_TO)
    private Set<Link> links = new HashSet<>();
    @Relationship(type = Relation.PARENT)
    private Set<Component> parents = new HashSet<>();
    .........
    .........
}
And Link class
@RelationshipEntity(type = Relation.LINK_TO)
public class Link extends BaseEntity {
    @Property(name = "isSelfLink")
    private boolean isSelfLink;
    @StartNode
    private Component component;
    @EndNode
    private Component linkComponent;
}
I've removed getter/setter/hashcode/equals for keeping it clean
Now, here is my code to add two component parent/child and a Link
Component parentcomp = new Component(1, name);
Component childcomp = new Component(2, name);
childcomp.getParents().add(parentcomp);
Link link = new Link();
link.setComponent(parentcomp);
link.setLinkComponent(childcomp);
parentcomp.getLinks().add(link);
componentRepository.save(parentcomp,-1);
Now, as per the logic
object parentcomp property 'parent' should be empty
object childcomp property 'parent' should have parentcomp object
And parentcomp property 'links' should have childcomp
(parentcomp)----LINKS_TO---->(childcomp)
(parentcomp)<----PARENT----(childcomp)
Note: My requirement is such that we need a two-way relationship.
But, below is the result when I load parent or child entity
object parentcomp property 'parent' has both childcomp,parentcomp instead of empty
object childcomp property 'parent' has both childcomp,parentcomp instead of only parentcomp
This behavior persists until the Neo4j session is cleared internally. After some time (or after an app restart) the mappings show up correctly.
I tried cleaning up the session using neo4joperations.clear(), but the problem persists. However, if I query
match (c:Component)-[:PARENT]->(p) where c.componentId = {0} return p
results are correct.
I am not sure how to solve this problem...

How to get the direct relationship entities and directly related nodes in custom query in SDN4?

I have an annotated finder method in my repository:
@Query("MATCH (me:User)<-[ab:ASKED_BY]-(q:Question) WHERE id(me) = {0} RETURN q")
Iterable<Question> findQuestionsByUserId(Long id);
My objects like:
@NodeEntity
public class Question {
    private AskedBy askedBy;
    @Relationship(type = "TAGGED_WITH")
    private Set<Tag> tags = new HashSet<>();
    //...
}
@RelationshipEntity(type = "ASKED_BY")
public class AskedBy {
    @GraphId private Long id;
    @StartNode
    private User user;
    @EndNode
    private Question question;
    // other props
}
When I call the repository method, the askedBy field is null in the result. How can I populate that field with the relationship?
Update:
I have tried to load the relationship with session loadAll(collection) but it did not help.
final Collection<Question> questions = (Collection<Question>) questionRepository.findQuestionsByUserId(user.getId());
final Question q = questions.iterator().next();
System.out.println("After `findQuestionsByUserId`:");
System.out.println("`q.getTags().size()`: " + q.getTags().size());
System.out.println("`q.getAskedBy()`: " + q.getAskedBy());
neo4jOperations.loadAll(questions, 1);
System.out.println("After `neo4jOperations.loadAll(questions, 1)`:");
System.out.println("`q.getTags().size()`: " + q.getTags().size());
System.out.println("`q.getAskedBy()`: " + q.getAskedBy());
final Collection<AskedBy> askedByCollection = neo4jOperations.loadAll(AskedBy.class);
System.out.println("`askedByCollection.size()`: " + askedByCollection.size());
The above snippet outputs
After findQuestionsByUserId:
q.getTags().size(): 0
q.getAskedBy(): null
After neo4jOperations.loadAll(questions, 1):
q.getTags().size(): 1
q.getAskedBy(): null
askedByCollection.size(): 0
So it seems the default depth is 0 for the custom query, and for some unknown reason I can not load the relationship entity.
The graph itself looks okay.
At the moment, custom queries do not support a depth parameter (it's on the roadmap), so you have the following options:
a) Use repository.findOne(userId) (the default is depth 1 so it should load AskedBy). Or customize the depth with repository.findOne(userId,depth). Or use Neo4jTemplate.load(type,id,depth)
b) If you need to query on more than the id, use the loadAll methods on the org.neo4j.ogm.session.Session that accept a set of org.neo4j.ogm.cypher.Filter. Examples available in MusicIntegrationTest
c) Continue with the custom query but after you get the entity ID back, load it via the load* methods providing a custom depth.

Neo4j: how to store additional parameters on an entity that is already saved?

I need to have different parameters for Device\Software from Entity to Entity.
Device\Software will be stored in the DB, and once a new Entity is created it will get a relationship to some Device\Software that is already stored, but their parameters can vary between Entities (either the number of parameters or their values).
I tried to save the additional parameters directly on the edge of the graph (in a RelationshipEntity between Entity and Device\Software), but it seems this could add complexity when implementing a network comparison algorithm.
Has anyone had a similar scenario? What is the best practice or approach?
@NodeEntity
public class Entity {
    @GraphId
    protected Long id;
    protected String title;
    /**
     * More fields here
     */
    @Fetch
    @RelatedTo(direction = Direction.BOTH, type = "has_devices")
    protected Set<Device> devices = new HashSet<>();
    @Fetch
    @RelatedTo(direction = Direction.BOTH, type = "has_software")
    protected Set<Software> software = new HashSet<>();
}
@NodeEntity
public class Device {
    @GraphId
    protected Long id;
    protected String identifier;
    /**
     * More fields here
     */
    @Fetch
    @RelatedTo(direction = Direction.BOTH, type = "has_parameter")
    protected Collection<ParameterEntity> parameter = new HashSet<>();
}
@NodeEntity
public class Software {
    @GraphId
    protected Long id;
    protected String identifier;
    /**
     * More fields here
     */
    @Fetch
    @RelatedTo(direction = Direction.BOTH, type = "has_parameter")
    protected Collection<ParameterEntity> parameter = new HashSet<>();
}
ParameterEntity is just a key-value object.
Not really sure what you mean by parameters, but saving them on the relationship should work. You can ignore the parameters in your algorithm if you don't need them.
It really depends on whether your parameters just quantify the relationship or are real entities in their own right.
