Solr join - return parent and child document

I am using Solr's (4.0.0-beta) join capability to query an index that has documents with parent/child relationships. The join query works great, but I only get the parent documents in the search results. I believe this is the expected behavior.
Is it possible, though, to get both the parent and the child documents returned in the search results (as separate search hits)?
For example:
Parents:
SolrDocument{uid=m_1, media_id=1}
SolrDocument{uid=m_2, media_id=2}
SolrDocument{uid=m_3, media_id=3}
Children:
SolrDocument{uid=p_1, page_id=1, fk_media_id=[1], partNumber=[abc, def, xyz]}
SolrDocument{uid=p_2, page_id=2, fk_media_id=[1,2], partNumber=[123, 456]}
SolrDocument{uid=p_3, page_id=3, fk_media_id=[1,3], partNumber=[100, 101]}
I query by partNumber like this:
{!join from=fk_media_id to=media_id}partNumber:abc
and I get the parent document (uid=m_1) in the results, as expected. But I would like, in this case, both the parent and the child to be returned in the results. Is that possible?

No, it's not possible. According to the Solr Wiki:
"For people who are used to SQL, it's important to note that Joins in Solr are not really equivalent to SQL Joins because no information about the table being joined "from" is carried forward into the final result. A more appropriate SQL analogy would be an "inner query"."
http://wiki.apache.org/solr/Join
You have to either denormalize all your data or run two different queries.
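The "two different queries" option can be sketched roughly as follows. This is only an illustration in Python of how the two `q` values might be built from the same child filter. The helper name `build_parent_and_child_queries` is hypothetical, and sending the requests and merging the two result lists is left to whatever Solr client you use.

```python
from urllib.parse import urlencode


def build_parent_and_child_queries(child_filter,
                                   from_field="fk_media_id",
                                   to_field="media_id"):
    """Return the two Solr `q` values: parents via the join, children directly.

    Hypothetical helper; field names default to those in the question.
    """
    # First request: the join query, which returns only the parent documents.
    parent_q = "{!join from=%s to=%s}%s" % (from_field, to_field, child_filter)
    # Second request: the plain child filter, which returns the child documents.
    child_q = child_filter
    return parent_q, child_q


parent_q, child_q = build_parent_and_child_queries("partNumber:abc")

# URL-encoded query strings that could be appended to a /select request.
print(urlencode({"q": parent_q}))
print(urlencode({"q": child_q}))
```

The application would then run both requests and concatenate the hits, since Solr itself does not carry the "from" side of the join into the result.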


How to view Cypher queries back for a database

I used the Neo4j ETL tool to create a Neo4j database from PostgreSQL, and the result looks perfect. I can see all nodes, relationships, data, etc.
Now I want to see the database files and the Cypher queries used to create all the nodes and relationships, along with the constraints applied to this database.
How can I view these? The database folder under the Neo4j home appears to be empty:
C:\Users\I\.Neo4jDesktop\relate-data\dbmss\dbms-9cf178b6-f37f-4139-8b80-dadf0fa03866\data\databases
Second question: can I generate a GraphQL schema from the Cypher script using some tool or other means?
Thanks!
I think the queries below can help you get the metadata and schema:
// Show meta-graph
CALL db.schema.visualization()
// List node labels
CALL db.labels()
// List relationship types
CALL db.relationshipTypes()

Create relationship with properties using a query in Cypher

I would like to know if this is possible. I have a query that produces a nice report showing a relationship between two entities through two other nodes. There can be more than one path. I now want to create a direct relationship between those two nodes, with properties that count the number of paths and sum values taken from the nodes in between. The report query is below.
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
return bo.AgencyName, count(sol.Number) as awards, so.orgName, sum(prop.finalPrice) as awardVolume;
What I want to do is similar to the query below, which does not work.
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
create (bo)-[:HAS_AWARDED{awardCount: count(sol.Number), awardVolume: sum(prop.finalPrice)}]->(so);
If I remove the properties from the relationship, it works, but I want to add the properties without too much programming.
I am using the most recent version of Neo4j 3.2.
thanks
The problem here is you are trying to use count() and sum() functions in an invalid context. The below query should work:
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
with bo, so, count(sol.Number) as count_sol, sum(prop.finalPrice) as sum_finalPrice
create (bo)-[:HAS_AWARDED{awardCount: count_sol, awardVolume: sum_finalPrice}]->(so);
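This query uses WITH to pass bo, so, and the results of the aggregation functions count(sol.Number) and sum(prop.finalPrice) to the next part of the query. Because bo and so are not aggregated, they act as the grouping keys, so one row is produced per (bo, so) pair. These values are then used to create the new relationship between bo and so.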
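To make the grouping behaviour of WITH concrete, here is a small Python sketch of the same aggregation done by hand. The rows are made-up sample data (one tuple per matched path), and `aggregate_awards` is a hypothetical name. It only illustrates the semantics and is not how Neo4j evaluates the query.

```python
from collections import defaultdict

# Hypothetical sample data: one row per matched path
# (agency name, vendor name, solicitation number, final price).
paths = [
    ("Agency A", "Vendor X", "S-1", 100.0),
    ("Agency A", "Vendor X", "S-2", 250.0),
    ("Agency A", "Vendor Y", "S-3", 75.0),
]


def aggregate_awards(rows):
    """Group rows by (agency, vendor), counting paths and summing prices.

    Assumes every row has a non-null solicitation number, so counting rows
    matches what count(sol.Number) would count.
    """
    totals = defaultdict(lambda: {"awardCount": 0, "awardVolume": 0.0})
    for agency, vendor, _number, price in rows:
        entry = totals[(agency, vendor)]
        entry["awardCount"] += 1
        entry["awardVolume"] += price
    return dict(totals)


awards = aggregate_awards(paths)
for (agency, vendor), props in awards.items():
    # Each entry corresponds to one HAS_AWARDED relationship to create.
    print(agency, "-[:HAS_AWARDED]->", vendor, props)
```

Each key of `awards` corresponds to one (bo, so) pair, which is why the CREATE after WITH produces a single relationship per pair rather than one per path.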

Neo4j ordered tree

We are working with a hierarchy tree structure where a parent has zero or more children, and a child has either one or zero parents. When we query for a list of direct children for a given parent the query returns the children in random order. We need the children to return in the order we define when we create or update the children.
I have added a relationship between children -[:Sibling]-> so the 'top' sibling has only an incoming :Sibling relationship, and the 'bottom' sibling has only an outgoing relationship.
Given this, is there a Cypher query to return the children in sibling order?
I have a query that returns each child, and its sibling, but now I have to write some code to return the list in the correct order.
An alternative approach might be to add a sort number to each child node. This would need to be updated for all the children if one of them changes order. This approach seems slightly foreign to the graph database concept.
If this problem has been encountered before, is there a standard algorithm for solving it programmatically?
Update1
sample data as requested by Bruno
(parent1)
(child1)-[:ChildOf]->(parent1)
(child2)-[:ChildOf]->(parent1) (child2)-[:Sibling]->(child1)
(child3)-[:ChildOf]->(parent1) (child3)-[:Sibling]->(child2)
Is there a Cypher query to return child1, child2, child3 in that order?
If not, then the ordering can be done programmatically.
Using properties instead of relationships:
(parent1)
(child1)-[:ChildOf]->(parent1) (child1 {order: 1})
(child2)-[:ChildOf]->(parent1) (child2 {order: 2})
(child3)-[:ChildOf]->(parent1) (child3 {order: 3})
`match (c)-[:ChildOf]->(parent1) return c order by c.order`
I do not expect that there is a cypher query that can update the order of children.
Update2
I have now arrived at the following query which returns children in the right order
`match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling`
This query depends on adding a -[:FirstChildOf]->(parent) relationship.
If I don't hear otherwise I'll set this to the answer.
Shall I assume there is no cypher query for inserting a node into an ordered list?
When you create your graph model, you should concentrate on the questions you want your data model to answer.
If you want to get the children of a parent ordered by a "create or update" property, then you should store that property, since a relationship in general does not represent an order.
It is not foreign to the graph database concept, since graph databases use properties. It is not always an easy task to decide whether to store something as a relationship or a property. It is all about proper modelling.
If you have the concept of '[:NEXT_SIBLING]' or similar, then it will be a pain when you have to remove a node. So I would use it only when this state is constant, for example in a time tree, where the days follow each other and the order does not change.
In general, if the creation order should be used, I would use timestamps like this:
create (n:Node {created:timestamp()});
Then you can use the timestamp to order.
match (p:Parent)-[:HAS_CHILDREN]->(n:Child) where p.name='xy' return n order by n.created;
And you can use timestamps for relationships too.
A similar question is here:
Modeling an ordered tree with neo4j
Here are some tips I use for dealing with ordered lists of nodes in Neo4J.
Reverse the relation direction to (child1)<-[:HasChild]-(parent1). (mostly just logical reinforcement of the next items, since a "parent has list of children")
Add the property index to HasChild. This will let you do WITH child, hasChild.index as sid ORDER BY sid ASC for sibling order. (This is how I maintain arbitrary order information on lists of children) This is on the relationship because it assumes that a node can be part of more than one ordered list.
You can use LENGTH(shortestPath((root)-[*]->(child1))) AS depth to then order them by depth from a root.
Since this is for arbitrary order, you must update all the indexes to update the order. You can do something like WITH COLLECT(child) as children, FILTER(c IN COLLECT(child) WHERE c.index >= 3) as subset FOREACH (c IN subset | SET c.index = c.index + 1) for a basic insert; otherwise you will have to just rewrite them all to arbitrarily change the order.
If you don't actually care about the order, just that it is consistent, you can use WITH child, ID(child) as sid ORDER BY sid ASC. This essentially is "Order by node age"
One other option is to use a meta relationship. So :HasChild would act as a list of nodes, and then something like :NextMember would tell you what the next item from this one is. This is more flexible, but in my opinion harder to work with (you need to handle the case where there is no next item, you have to 'trace' through the nodes to get the right order, it doesn't work if you want to add this node to another ordered list later, etc.).
Of course, if the order isn't arbitrary (based on name or age or something), then it is much better to just sort on the non-arbitrary logic.
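The "shift every index at or after the insert position" idea from the tips above can be sketched outside the database like this. It is a plain Python illustration with hypothetical names (`insert_child`, `ordered`), assuming each child carries one integer index. In Neo4j the same update would be done with SET on the property that holds the index.

```python
def insert_child(indexes, new_child, position):
    """Return a new name -> index mapping with new_child placed at `position`.

    Every existing child whose index is >= position is shifted up by one,
    which mirrors the basic insert described above.
    """
    shifted = {
        name: (idx + 1 if idx >= position else idx)
        for name, idx in indexes.items()
    }
    shifted[new_child] = position
    return shifted


def ordered(indexes):
    """Return the child names sorted by their index."""
    return sorted(indexes, key=indexes.get)


children = {"child1": 1, "child2": 2, "child3": 3}
updated = insert_child(children, "childX", 2)
print(ordered(updated))
```

Here `childX` takes index 2 and the former holders of indexes 2 and 3 move to 3 and 4, so only the children after the insert point need to be rewritten.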
The query for returning the children in the right order is
match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling
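If the ordering ends up being done in application code instead, as the question considers, following the :Sibling chain is a simple linked-list walk. The sketch below assumes the edges have already been fetched as (child, previous sibling) pairs, matching the direction (child2)-[:Sibling]->(child1) in the sample data. The function name is hypothetical.

```python
def order_by_sibling_chain(sibling_edges, children):
    """Order children by walking the sibling chain from the first child.

    sibling_edges: iterable of (child, previous_sibling) pairs.
    children: all child names, in any order.
    """
    edges = list(sibling_edges)
    has_previous = {child for child, _prev in edges}
    next_of = {prev: child for child, prev in edges}

    # The first child is the only one without an outgoing :Sibling edge.
    firsts = [c for c in children if c not in has_previous]
    if len(firsts) != 1:
        raise ValueError("expected exactly one first child, found %d" % len(firsts))

    result = [firsts[0]]
    while result[-1] in next_of:
        if len(result) > len(children):
            # Guard against a malformed chain that loops back on itself.
            raise ValueError("sibling chain contains a cycle")
        result.append(next_of[result[-1]])
    return result


edges = [("child3", "child2"), ("child2", "child1")]
print(order_by_sibling_chain(edges, ["child2", "child3", "child1"]))
```

The walk visits each child once, so it is linear in the number of siblings.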

ActiveRecord .joins breaking other queries

I'm writing a Rails API on top of a legacy database with tons of tables. The search feature gives users the ability to query around 20 separate columns spread across 13 tables. I have a number of queries that check the params to see if they need to return results. They look like this:
results << Company.where('city LIKE ?', "#{params[:city]}").select('id') unless params[:city].blank?
and they work fine. However, I just added another query that looks like this:
results << Company.joins("JOIN Contact ON Contact.company_id = Company.id").where("Contact.first_name LIKE ?", "%#{params[:first_name]}%").select('company_id') unless params[:first_name].blank?
and suddenly my first set of queries started returning null, rather than the list of IDs they had been returning. The query with the join works perfectly well whether the other queries are functional or not. When I comment the join query out, the previous queries start working again. Is there some reason the query with a join would break other queries on the page?
I can't think of a particular reason why the join would be breaking your previous queries; however, I do have some suggestions for your query overall.
Assuming you've modelled these relationships correctly you shouldn't need to define the join manually. On another note, you're not querying against the company at all so you can use an includes instead of a join - this will allow you to access its data without firing another query.
If you wanted to access company data (i.e. query.company.name), use an includes like so:
Contact.includes(:company).where('first_name LIKE ?', param).select(:company_id).distinct
However, it appears all you really want is an array of IDs (which exist on the contact model), so you can lighten things up and not include the company at all.
Contact.where('first_name LIKE ?', param).select(:company_id).distinct
Whenever you get stuck, never forget to check out the great resources over at http://api.rubyonrails.org/ - they are an absolute life saver sometimes!
It turned out that the queries with a join needed to be placed above the queries without a join. I'm not sure why it behaves this way, but hopefully this helps someone else down the line.

Apache Solr - Count of subquery as a superquery parameter

I'm having a little trouble trying to make a query in Solr.
The problem is: I must be able to retrieve documents that have the same value for a specified field, but they should only be retrieved if this value appears more than X times for a specified user.
In pseudosql it would be something like:
select user_id from documents
where my_field="my_value"
and
(select count(*) from documents where my_field="my_value" and user_id=super.user_id) > X
I know that Solr returns a 'numFound' for each query you make, but I don't know how to retrieve this value in a subquery.
My Solr is organized in a way that a user is a document, and the properties of the user (such as name, age, etc) are grouped in another document with a 'root_id' field.
So let's suppose the following query, which gets all the root documents whose children have the prefix "some_prefix".
is_root:true AND _query_:"{!join from=root_id to=id}requests_prefix:\"some_prefix\""
Now, how can I get the root documents (users in some sense) that have more than X children matching 'requests_prefix:"some_prefix"' or any other condition?
Is it possible?
P.S. It must be done in a single query. Fields can be added at will, but the root/children structure should be preserved (preferably).
As it turns out, Solr didn't match my needs and I ended up using Elasticsearch with its native parent-child mapping.
