How to count cypher labels with specific condition? - neo4j

I have a graph database with information about different companies and their subsidiaries. My task is to display the structure of a company, which I have achieved with a d3 vertical tree.
In addition, I have to show summary statistics about the company that is currently displayed. Companies are chosen from a dropdown list that fetches its data dynamically via an AJAX call.
In the same HTML page I have to write a short summary like:
Total amount of subsidiaries for CompanyA: 300
Companies in Corporate Havens: 45%
Companies in Tax Havens: 5%
My database consists of two kinds of nodes, Company and Country, and a Country node can carry an additional label such as CH (corporate haven) or TH (tax haven).
CREATE (:TH:Country{name:'Nauru', capital:'Yaren', lng:166.920867,lat:-0.5477})
WITH 1 as dummy MATCH (a:Company), (b:Country) WHERE a.name='CompanyA' AND b.name='Netherlands' CREATE (a)-[:IS_REGISTERED]->(b)
So how can I find the number of subsidiaries of CompanyA that are registered in corporate havens and in tax havens? And how do I pass this information on to the HTML?
I found various Cypher queries that list all labels, as well as the APOC stats procedures (e.g. apoc.meta.stats), but none of them lets me filter on the mother company. I would appreciate any help.

Cypher is nice because you write a query almost in natural language (the query below may be incorrect, I did not check it, but the idea should be clear):
MATCH (motherCompany:Company {name: 'CompanyA'})-[:HAS_SUBSIDIARY]->(childCompany:Company)
WITH motherCompany,
     childCompany
MATCH (childCompany)-[:IS_REGISTERED]->(country:Country)
WITH motherCompany,
     collect(labels(country)) AS countriesLabels
WITH motherCompany,
     countriesLabels,
     size([countryLabels IN countriesLabels WHERE 'TH' IN countryLabels]) AS inTaxHaven
RETURN motherCompany,
       size(countriesLabels) AS total,
       inTaxHaven,
       size(countriesLabels) - inTaxHaven AS inCorporateHaven
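The subtraction in the query above treats every registered country that is not a tax haven as a corporate haven, which only holds if each Country carries exactly one of the two labels. A minimal sketch that counts each label separately and also returns the percentages needed for the summary, assuming the HAS_SUBSIDIARY relationship used above (it does not appear in the question) and direct subsidiaries only:

```cypher
// Count subsidiaries per haven label and derive percentages.
// HAS_SUBSIDIARY is an assumed relationship type; CH/TH are the labels from the question.
MATCH (mother:Company {name: 'CompanyA'})-[:HAS_SUBSIDIARY]->(child:Company)
// OPTIONAL so that subsidiaries without a registered country still count towards the total.
OPTIONAL MATCH (child)-[:IS_REGISTERED]->(country:Country)
WITH mother,
     count(DISTINCT child) AS total,
     count(DISTINCT CASE WHEN 'CH' IN labels(country) THEN child END) AS inCorporateHaven,
     count(DISTINCT CASE WHEN 'TH' IN labels(country) THEN child END) AS inTaxHaven
RETURN mother.name AS company,
       total,
       inCorporateHaven,
       inTaxHaven,
       round(100.0 * inCorporateHaven / total) AS pctCorporateHaven,
       round(100.0 * inTaxHaven / total) AS pctTaxHaven
```

The single row returned can be sent back as JSON by whatever endpoint already serves the AJAX call and then written into the page. For a multi-level hierarchy, a variable-length pattern such as [:HAS_SUBSIDIARY*] would include indirect subsidiaries as well.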

Related

Correct order of operations in neo4j - LOAD, MERGE, MATCH, WITH, SET

I am loading simple CSV data into Neo4j. The data looks as follows:
uniqueId     compound     value  category
ACT12_M_609  mesulfen     21     carbon
ACT12_M_609  MNAF         23     carbon
ACT12_M_609  nifluridide  20     suphate
ACT12_M_609  sulfur       23     carbon
I am loading the data from the URL using the following query:
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE (t:Transaction {transactionId: row.uniqueId})
MERGE (c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
  ON CREATE SET c.category = row.category
  ON CREATE SET r.price = row.value
Next I do the aggregation to count the total orders for each compound and store the result as a property on the node:
MATCH (c:Compound)<-[:CONTAINS]-(t:Transaction)
WITH c, count(DISTINCT t.transactionId) AS ord
SET c.orders = ord
So far so good. I can accomplish what I want, but I have the following two questions:
1. How can I create the orders property for the compound node in the first step itself, i.e. perform the aggregation straight away while loading the data?
2. For a compound node I am also setting the category as a property. Theoretically, it could also be modelled as category -contains-> compound by creating a Category node. But what advantage would that give me? I can already execute my queries and get the expected output without creating this additional node.
Thank you for your answer.
For the first question: I don't think that's possible. LOAD CSV goes over one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those, and then use them to create the real nodes, but that would be far more complicated (see "Virtual Nodes/Rels" in the APOC documentation).
For the second question: that depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run a query where the category is a criterion (e.g. MATCH (c:Category {category_id: 12})-[r]-(:Compound) ), it might be more performant to create a Category node for it.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
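A minimal sketch of the node-based alternative, assuming a Category label keyed by name and reusing the CONTAINS type for the category-to-compound relationship, as the question words it (both are illustrative choices, not something the original data model defines):

```cypher
// Load step with the category modelled as its own node.
LOAD CSV WITH HEADERS FROM "url" AS row
MERGE (t:Transaction {transactionId: row.uniqueId})
MERGE (c:Compound {name: row.compound})
MERGE (cat:Category {name: row.category})
MERGE (cat)-[:CONTAINS]->(c)
MERGE (t)-[r:CONTAINS]->(c)
  ON CREATE SET r.price = row.value;

// All compounds in one category, found by following relationships
// from a single Category node instead of filtering every Compound by property.
MATCH (:Category {name: 'carbon'})-[:CONTAINS]->(c:Compound)
RETURN c.name;
```

Because both relationships share the CONTAINS type here, queries should keep the node labels in the pattern, as the existing orders aggregation already does.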

Grails 3 - return list in query result from HQL query

I have a domain object:
class Business {
    String name
    List subUnits
    static hasMany = [
        subUnits : SubUnit,
    ]
}
I want to get name and subUnits using HQL, but I get an error
Exception: org.springframework.orm.hibernate4.HibernateQueryException: not an entity
when using:
List businesses = Business.executeQuery("select business.name, business.subUnits from Business as business")
Is there a way I can get subUnits returned in the result query result as a List using HQL? When I use a left join, the query result is a flattened List that duplicates name. The actual query is more complicated - this is a simplified version, so I can't just use Business.list().
I thought I should add this as an answer, since I have been doing this sort of thing for a while and can share what I have learned.
As per the suggestion from Yariash above: this is the difference between walking forward through a domain object and grabbing the information as a flat list of maps. Loading an entire object and then asking it to loop through and return many relations is more expensive than fetching everything in one self-contained list.
@anonymous1, that sounds correct for a left join. You could try adding 'group by name' to the end of your query. Alternatively, once you have all the results you can use businesses.groupBy{it.name} (a cool Groovy feature); take a look at the output of groupBy to understand what it has done to the results.
But if you are attempting to grab the entire object and map it back, the cost is still very hefty, probably as high as with Yariash's suggestion and possibly worse.
List businesses = Business.executeQuery("select new map(b.name as name, su.field1 as field1, su.field2 as field2) from Business b left join b.subUnits su")
The above is really what you should be trying to do: left join, then select each field you need from the hasMany side as part of the map returned for each row of the list.
Then, when you have your results:
def groupedBusinesses = businesses.groupBy{it.name}
where name is the field from the main class that has the hasMany relation.
If you then look at groupedBusinesses, you will see that each name has its own list:
groupedBusinesses: [name1: [ [field1,field2,field3], [field1,field2,field3] ] ]
You can now call
groupedBusinesses.get(name)
to get the entire list for that hasMany relation.
Enable SQL logging for the HQL query above, then compare it to
List businesses = Business.executeQuery("select new map(b.name as name, su as subUnits) from Business b left join b.subUnits su")
What you will see is that the second query generates huge SQL queries to get the data, since it attempts to map the entire subUnit entity for every row.
I have tested this, and the second form consistently produces around a full page of SQL, sometimes several pages, compared with the few lines generated by the first example.

Match nodes of variable depth

I have users who like different geographies (could be a country, state or city) and I want to match those users who like geographies in the same country.
For example:
user A likes USA
user B likes USA
user C likes San Jose
user D likes France
then I want user A to be matched to users B and C.
What cypher query will get me the results? This is what I tried:
/** node id of user A is 0 **/
START u=node(0) MATCH (u:users) - [:likes] - (g1) - [:contains*0..5] - (g2) - [:likes] - (o:users) RETURN o;
This query is not working as expected. What would be the right syntax?
If I understood you correctly, a query with directed relationships might work in your case. Pay attention, though: circular paths may cause issues.
The main idea is to specify not only the relationship types but also their directions.
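A minimal sketch of that idea, assuming likes points from a user to a geography, contains points from the larger geography to the smaller one (country to state to city), and a country is a geography that nothing else contains. None of these directions are stated in the question, so adjust the arrows to the actual model:

```cypher
// Walk up from the geography user 0 likes to its country (0 hops if it already is one)...
MATCH (u:users)-[:likes]->(g1)<-[:contains*0..5]-(country)
WHERE id(u) = 0
  AND NOT ( ()-[:contains]->(country) )   // top of the hierarchy, i.e. a country
// ...then walk down again to everything inside that country and to the users who like it.
MATCH (country)-[:contains*0..5]->(g2)<-[:likes]-(o:users)
WHERE o <> u
RETURN DISTINCT o
```

With the example data this pattern would reach users B and C from user A, because USA contains San Jose, while France sits in a separate tree. Fixing the directions also keeps the variable-length parts from wandering up and down the hierarchy, which is where the undirected query goes wrong.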

Neo4j Facet Fields like Solr

I'm using Neo4j in a community e-commerce site built in PHP, through the REST interface.
I need to get all categories related to the search results, like Amazon does. This feature is available in other engines such as Solr (which is built on Lucene) as Faceted Search.
How can I do a faceted search in Neo4j? Or what is the best way, performance-wise, to recreate this feature?
All required modules related to this feature are excluded from the core package of Neo4j. I want to know whether someone has tried something like this without traversing all nodes in the graph, grabbing some properties, and doing a groupCount of those values. With 200k nodes, the traversal took 10 seconds just to get the categories.
This is my Gremlin approach.
(new Neo4jVertexSequence(
g.getRawGraph().index().forNodes('products').query(
new org.neo4j.index.lucene.QueryContext('category:?')
), g
))._().groupBy{it.category}.cap.next();
Results in 90 rows and took 54 seconds.
Books = 12002
Movies_Music_Games = 19233
Electronics_Computers = 60540
Home_Garden_Tools = 9123
Grocery_Health_Beauty = 15643
Toys_Kids_Baby = 15099
Clothing_Shoes_Jewelry = 12543
Sports_Outdoors = 10342
Automotive_Industrial = 9638
... (more rows)
Of course, I can't put these results in a cache, because this is only the "no input" search. If the user searches for something like "Iphone", the query looks like
(new Neo4jVertexSequence(
g.getRawGraph().index().forNodes('products').query(
new org.neo4j.index.lucene.QueryContext('search:"iphone" AND category:?')
), g
))._().groupBy{it.category}.cap.next();
What about your domain model? Did you just put everything in the index? Usually you would model your categories as nodes and have your products being related to the category nodes.
(product)-[:HAS_CATEGORY]->(category)<-[:IS_CATEGORY]-(categories)
In your query you would just traverse this little tree and count the relationships of type :HAS_CATEGORY starting from each category node.
start categories=node(x)
match (product)-[:HAS_CATEGORY]->(category)<-[:IS_CATEGORY]-(categories)
return category.name, count(*)
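The same facet count can be sketched in label-based Cypher for the "iphone" search case. This assumes Product and Category labels, a name property on both, and a Neo4j version that supports CONTAINS and toLower; none of these come from the original post, which predates labels:

```cypher
// Facet counts: number of matching products per category.
// Product, Category and the name properties are assumed names.
MATCH (p:Product)-[:HAS_CATEGORY]->(c:Category)
WHERE toLower(p.name) CONTAINS 'iphone'
RETURN c.name AS category, count(p) AS products
ORDER BY products DESC
```

The text filter here is a plain property check; on a large catalogue the matching products would more likely come from an index lookup first, with the HAS_CATEGORY traversal and the count applied to that smaller set.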

Can Neo4j be effectively used to show a collection of nodes in a sortable and filterable table?

I realise this may not be ideal usage, but apart from all the graphy goodness of Neo4j, I'd like to show a collection of nodes, say People, in a tabular format with indexed properties for sorting and filtering.
I'm guessing the type of a node can be stored as a link, say Bob -> type -> Person, which would allow us to retrieve all People.
Are the following possible to do efficiently (indexed?) and in a scalable manner?
Retrieve all People nodes and display all of their names, ages, cities of birth, etc. (NOTE: some of this data will be properties, some links to other nodes, which could be denormalised as properties for the sake of table display and simplicity)
Show me all People sorted by Age
Show me all People with Age < 30
A quick how-to for the above (or a link to the place in the docs describing it) would also be lovely.
Thanks very much!
And if the above isn't a good idea, please suggest a storage solution that allows both graph-like and relational-like retrieval.
If you want to operate on these person nodes, you can put them into an index (the default is Lucene) and then retrieve and sort the nodes using Lucene (see, for instance, "How do I sort Lucene results by field value using a HitCollector?" on how to do a custom sort in Java). This will get you, for instance, People sorted by Age. The code in Neo4j could look like
Transaction tx = neo4j.beginTx();
idxManager = neo4j.index()
personIndex = idxManager.forNodes('persons')
personIndex.add(meNode,'name',meNode.getProperty('name'))
personIndex.add(youNode,'name',youNode.getProperty('name'))
tx.success()
tx.finish()
// Prepare a custom Lucene query context with the Neo4j API
query = new QueryContext( 'name:*' ).sort( new Sort(new SortField( 'name',SortField.STRING, true ) ) )
results = personIndex.query( query )
For combining index lookups and graph traversals, Cypher is a good choice, e.g.
START people = node:people_index(name="E*") MATCH people-[r]->() return people.name, r.age order by r.age asc
in order to return data on both the node and the relationships.
Sure, that's easily possible with the Neo4j query language Cypher.
For example:
start cat=node:Types(name='Person')
match cat<-[:IS_A]-person-[born:BORN]->city
where person.age > 30
return person.name, person.age, born.date, city.name
order by person.age asc
limit 10
You can experiment with it in our cypher console.
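For the three table operations in the question, a label-based sketch might look as follows. It assumes a Person label with name and age properties, the BORN relationship from the example above, and Neo4j 4.x or later for the index syntax; the queries in the answers use the older START/index style instead:

```cypher
// Index on the property used for sorting and filtering (Neo4j 4.x+ syntax).
CREATE INDEX person_age FOR (p:Person) ON (p.age);

// People younger than 30, sorted by age, one page of 25 rows.
MATCH (p:Person)
WHERE p.age < 30
// City of birth is a linked node; OPTIONAL keeps people without one in the table.
OPTIONAL MATCH (p)-[:BORN]->(city)
RETURN p.name AS name, p.age AS age, city.name AS cityOfBirth
ORDER BY age ASC
SKIP 0 LIMIT 25;
```

Dropping the WHERE clause gives the plain "all People sorted by Age" listing, and changing SKIP pages through the table.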

Resources