SOLR Analysis Query - join

I have a SOLR instance with millions of documents. The schema is well defined (i.e. all fields are typed). All the searching/faceting etc. works ok without any issues.
However, I am trying to do something new which I "think" is not supported in current version. I am running SOLR 3.5 on Windows using Jetty.
To simplify the question, my document contains some fields like:
Id,
Name,
City,
JobTitle
Lets say I have a sample data like:
P Wood, London, Director
J Smith, London, Project Manager
D Lock, Brighton, Developer
K Pracy, London, Developer
For the sake of example, assume that this is a matching system which allows people to find each other. Also assume that Id is unique Id.
I want to write a "sampling" query which should find me the set of records that will match other records for any criteria.
So for example, I want to define a criteria like:
Find me the people who will match people in different cities with differfent job titles:
If the above schema was a RDBMS-SQL table (lets say People), the approximate query would be something like this:
SELECT P.Id,
(
SELECT COUNT(1)
FROM People PI Where PI.Id != P.Id
AND PI.City != P.City
AND PI.JobTitle != P.JobTitle
) AS FindCount
FROM
People P
Well, the query may not be workable but you get the idea. Anyway, there are other requirements also that Findcount should be greater than x and less than y.
Can someone let me know if this is possible in SOLR or if this is something not meant for SOLR. I know SOLR 4 is coming with a Join operator but that seems to me more like an IN clause which limits the use. For example, consider that I want the matching Id's also in above query rather than counts.

I don't think that is doable in 1 query and you might end up with running "inner select" as separate query for every person

Related

Can graph database query "nodes that a given node has no relationship with"?

I am working on a dating app where users can "like" or "dislike" other users and get matched.
As you can imagine the most important query of the app would be:
Give me a stack of nearby user profiles that I have NOT liked/disliked before.
I tried to work on this with a document database (Firestore) and figured it's simply not suitable for such kind of application and hence landed in the graph database world which is new and fascinating to me.
I understand that by nature a graph database retrieves data by tracing through the relationships and make relationships first-class citizens. My question now is that what if the nodes that I am trying to get are those with no relationship from the given node? What would the query look like? Can anyone provide an example query?
Edit:
- added nearby criteria to the query statement
This is definitely possible, here is a query example :
MATCH (me:Profile {name: "Chris"})
MATCH (other:Profile) WHERE NOT (other)-[:LIKES]->(me)
As stated in the comments of your original question, on a large dataset it might not scale well, that said it is pretty uncommon that you would use only one criteria for matching, for example, the list of possible profiles to match from can be grouped by :
geolocation
profiles in depth 2 ( who is liking me, then find who other people they like, do those people like me ? )
shared interests
age group
skin color
...

How to Order by relevance in a SQL query in Rails?

I am using a SQL query for a specified and faster search in my Rails app, currently I have the following query that works wonderfully for gathering the data from the initial search:
SELECT * FROM selected_tables
WHERE field1 LIKE '%#{present_param}'
AND field2 LIKE '%#{present_param2}'
And so on like that, with each LIKE line only appearing if the relevant parameter is present from the form.
So I am now able to get back a large amount of results from this query, but they're not ordered in any helpful way. I need some way of ordering the results based on their relevance to the original user input from the form, but I can't seem to find anything on google about it. Is there a way in SQL (specifically postgresql) that I can order the results based on this?
To be clear, when I say relevance I mean that a given search keyword should be in the title or company name for the result, not just present somewhere in the content.
For example: if you search "Sony" you get Sony Electronics first, not another listing containing Sony somewhere in the middle of its name.
I ended up using a series of Case/When statements that were weighted with various integer scores to apply priority to my results. They work wonderfully and turned out something like this:
SELECT title, company, user, CASE
WHEN upper(company_name) LIKE '%#{word[0].upcase}%' THEN 3
WHEN upper(company_name) LIKE '%#{company_name.upcase}%' THEN 2
ELSE 0 END as score
FROM selected_tables
WHERE company_name LIKE '%#{company_name}%'
ORDER BY score DESC;

Neo4j first node to meet relationship in a movie model

I have read the Neo4j manual and saw the numerous short examples regarding movie graph. I have also installed it locally and played with the cypher.
Here is the setup:
I have the following nodes: Movies (with name and id, owned by friend), Actors(with name and ids) Directors (with names and id), Genre (with id and name)
Relations are: Actors acted in Movies (1 movie - many actors), Directors directed a movie (1 director per movie but a director can direct many movies), and Movies has several genre "(many to many)
1) Owned by friend I dont know why but following the LOAD CSV example they put USA as a node rather than a property but is there a logical reason why its better to put it as a node rather than a property like i did?
2)
What I want to search is similar to the answer given to this question:
Nearest nodes to a give node, assigning dynamically weight to relationship types
However - I do not have a weight on the relationship and its more of a "go find the first give nodes connected to it"
Given that the "owned by friend" can only be owned by 1 person.
If given movie title "Spider-Man" (which for example purpose is owned by frank) go find the next occurrence of a movie that is owned by John.
So after reading Neo4j I believe that I dont need to specify which relationship is needed to traverse but just go find the next movie that meets my criteria, right?
So Following the above link
MATCH (n:Start { title: 'Spider-Man' }),
(n)-[:CONNECTED*0..2]-(x)
RETURN x
So go to node Spider-Man and go find me X as long as it is connected but I got stump by *0..2 because its the range...what if I just say "go find me the first you that means the own by John"
3) following up to #2 - how do i insert the fitler "own by john" ?
There are a number of things in your question that don't quite make sense. Here's a stab at an answer.
1) Making 'USA' a node rather than a property is useful if you want to search based on country. If 'USA' is a node, you are able to limit your search by starting at the 'USA' node. If you don't care to do this, then it doesn't really matter. It may also save a small amount of space for longer country names to store the name once and link to it via relationships.
2) Your example doesn't match your described graph. I can't really speak to it without a better example.
3) This is probably easy to answer once you improve your example.
OK. Based on the comments to the answer, here's what you need. To find one movie owned by John that is connected via common actors, directors, etc to the movie Spider-man owned by Frank (that is, sub-graphs like (movie)<--(actor)-->(movie) ) you can write:
MATCH (n:Movie {title : 'Spider-Man', owned_by : 'Frank'})<-[*2]->(m:Movie {owned_by : 'John'})
RETURN m LIMIT 1
If you want more responses, alter or remove the LIMIT on the RETURN clause. If you want to allow chains that pass through chains like (movie)<--(actor)-->(movie)<--(director)-->(movie), you can increase the number of relationships matched (the *2) to 4, 6, 8, etc. You probably shouldn't just write the relationship part of the MATCH clause as -[*]-, because this could get into infinite loops.

How to query recommendation using Cypher

I'm trying to query Book nodes for recommendation by Cypher.
I want to recommend A:Book and C:Book for A:User.
i'm sorry I need some graph to explain this question, but I could't up graph image because my lepletion lacks for upload function.
I wrote query below.
match (u1:User{uid:'1003'})-->(o1:Order)-->(b1:Book)<--(o2:Order)
<--(u2:User)-->(o3:Order)-->(b2:Book)
return b2
This query return all Books(A,B,C,D) dispite cypher's Uniqueness.
I expect to only return A:Book and C:Book.
Is this behavior Neo4j' specification?
How do I get expected return? Thanks, everyone.
environment:
Neo4j ver.v2.0.0-RC1
Using Neo4j Server with REST API
Without the sample graph its hard to say why you get something back when you expected something else. You can share a sample graph by including a create statement that would generate said graph, or by creating it in Neo4j console and putting the link in your question. Here is an example of the latter: console.neo4j.org/r/fnnz6b
In the meantime, you probably want to declare the type of the relationships in your pattern. If a :User has more than one type of outgoing relationships you will be excluding those other paths based on the labels of the nodes on the other end, which is much less efficient than to only traverse the right relationships to begin with.
To my mind its not clear whether (u:User)-->(o:Order)-->(b:Book) means that a user has one or more orders, and each order consists of one or more books; or if it means only that a user ordered a book. If you can share a sample, hopefully that will be clear too.
Edit:
Great, so looking at the graph: You get B and D back because others who bought B also bought D, and others who bought D also bought B, which is your criterion for recommendation. You can add a filter in the WHERE clause to exclude those books that the user has already bought, something like
WHERE NOT (u1)-[:BUY]->()-[:CONTAINS]->(b2)
This will give you A, C, C back, since there are two matching paths to C. It's probably not important to get two result items for C, so you can either limit the return to give only distinct values
RETURN DISTINCT(b2)
or group the return values by counting the matching paths for each result as a 'recommendation score'
RETURN b2, COUNT(b2) as score
Also, if each order only [CONTAINS] one book, you could try modelling without order, just (:User)-[:BOUGHT]->(:Book).

how to design my dataset using neo4j and gremlin

i have a dataset containg fields like below:
id amount date s_pName s_cName b_pName b_cName
1 100 2/3/2012 IBM IBM_USA Pepsi Pepsi_USA
2 200 21/3/2012 IBM IBM_USA Coke Coke_UK
3 300 12/3/2012 IBM IBM_USA Pepsi Pepsi_USA
4 1100 22/3/2012 Pepsi IBM_Aus IBM IBM_USA
here all 4 fields like s_pName s_cName b_pName b_cName can be saler or buyer.
how to models this dataset in neo4j so that when I query using gremlin like,
select b_CName,id,amount,date from tableName where s_cName = IBM_USA,IBM_AUS;
I noted your question on the gremlin-users mailing list as well (where you provided a bit more information about things you'd tried): https://groups.google.com/forum/#!topic/gremlin-users/AxsF2eJvpOA
I'm sure there are a few ways to approach this modelling issue, so I'll just provide some things to consider and hopefully that will inspire you to solution. First, instead of thinking of buyers and sellers, just think about the fact that you have "companies" that sells things to other companies and that companies have hierarchy (meaning that a company can have a parent). Your model then comes down to:
company --sellsTo--> company
company --parent--> company
Place your transaction amount and date on the "sellsTo" edge creating one such edge per row in your dataset. Create a key index on the "companyName" field of the company vertex so that you can look up the company. Your Gremlin would then be something like:
['IBM_USA','IBM_AUS'].collect{g.V('companyName',it).next()}._().outE('sellsTo').as('tx').inV.as('buyer').select{[it.id, it.amount, it.date]}{it.companyName}
so breaking that down you do a lookup of your two companies you care about by key index on companyName and get them into a pipeline with _(). Then you traverse out to the companies those two companies sold to. You use select to grab the tx (transaction edge) and buyer vertex executing a closure on each of them to transform them into the fields you want which will yield you something like (for one result, your Gremlin would likely return several of these with your full dataset obviously):
[[1,100,2/3/2012],Pepsi_USA]
You could use some Groovy JDK (http://groovy.codehaus.org/groovy-jdk/) operations to transform it further from there if that's not the final format you need.

Resources