Is JOIN possible between two indexes on different ports in SOLR?

A JOIN can be made between multiple cores (see "Is Solr 4.0 capable of using 'join' for multiple core?"), but is it possible to JOIN two cores that are running on different ports?
For example:
Instance 1: http://example:8983/solrInd1/#/person/
Instance 2: http://example:9097/solrInd2/#/engineers/
I want to get age, qualification, etc. from the person index and engineering information from the engineers index.

The short answer is no, not with Solr 4 out of the box, although this might change in the future. The longer answer is yes, if you roll your own join plugin or perform the join on the client (as you would with MongoDB). In your example, it might be easier to use multiple cores in a single Solr instance.
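The client-side option can be sketched in Ruby as a hash join. This is a minimal sketch under several assumptions. First, the select endpoint of each instance is taken to be the admin-UI path plus the core name, for example `http://example:8983/solrInd1/person/select`. Second, `person_id`, `age` and `discipline` are hypothetical field names that stand in for whatever key the two indexes share. Third, the helper names are illustrative and are not part of any Solr client library.

```ruby
require "json"
require "net/http"
require "uri"

# Builds a /select URL for one Solr instance (base = instance URL + core,
# an assumed layout). rows caps how many docs come back in one request.
def select_url(base, query, rows: 1000)
  uri = URI("#{base}/select")
  uri.query = URI.encode_www_form(q: query, wt: "json", rows: rows)
  uri
end

# Fetches the docs array from a Solr JSON response (needs a live server).
def fetch_docs(uri)
  JSON.parse(Net::HTTP.get(uri)).fetch("response").fetch("docs")
end

# Inner hash join on the client: index the right side by the join key,
# then probe it with every document from the left side.
def client_side_join(left_docs, right_docs, key)
  by_key = right_docs.group_by { |doc| doc[key] }
  left_docs.flat_map do |left|
    by_key.fetch(left[key], []).map { |right| left.merge(right) }
  end
end
```

To use it, call `fetch_docs(select_url("http://example:8983/solrInd1/person", "*:*"))` and the same call for the engineers core, then pass both arrays to `client_side_join` with the shared key. For large indexes you would page through results (`start`/`rows`) instead of relying on a single `rows` cap.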

Related

Joining three tables in solr

I am trying to join three Cassandra tables using Solr. According to the DataStax documentation,
DataStax Enterprise 4.0.2 and later supports the OS Solr query time join through a custom implementation. You can join Solr documents, including those having different Solr cores, under these conditions:
Solr cores need to have the same keyspace and same Cassandra partition key.
Both Cassandra tables that support the Solr cores to be joined have to be either Thrift- or CQL-compatible. You cannot have one that is Thrift-compatible and one that is CQL-compatible.
The type of the unique key (Cassandra key validator of the partition key) is the same.
The order of table partition keys and schema unique keys are the same.
I could join two tables as per the documentation, and I was wondering whether I can join three tables together while satisfying the conditions. I couldn't find any documentation on joining more than two tables. I desperately need to join three tables. Is it possible, or should I drop the idea right now?
What I needed was a recursive (nested) join. The same documentation quoted above contains this example:
Use a nested join query to recursively join the songs and lyrics documents with the videos document, and to select the song that mentions love and also won a video award.
http://localhost:8983/solr/internet.songs/select/?q={!join+fromIndex=internet.lyrics}words:love AND _query_:{!join+fromIndex=internet.videos}award:true&indent=true&wt=json
The output is:
"response":{"numFound":1,"start":0,"docs":[
{
"song":"a3e64f8f-bd44-4f28-b8d9-6938726e34d4",
"title":"Dangerous",
"artist":"Big Data"}]
}}
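In the URL above, each `+` is an encoded space, and the braces, colons and remaining spaces also have to be percent-encoded before the request is sent. The Ruby sketch below assembles the same query string from `[fromIndex, inner_query]` pairs and encodes it. The helper names are hypothetical. The nesting shape is copied from the DataStax example. Adding a third pair produces a longer chain, but whether DSE accepts that depth under its join conditions is an assumption you would need to verify.

```ruby
require "uri"

# Chains {!join fromIndex=...} clauses with AND, wrapping every clause after
# the first in _query_, in the same shape as the DataStax example above.
# Each element of clauses is a [fromIndex, inner_query] pair.
def nested_join_query(clauses)
  parts = clauses.each_with_index.map do |(from_index, inner_query), i|
    clause = "{!join fromIndex=#{from_index}}#{inner_query}"
    i.zero? ? clause : "_query_:#{clause}"
  end
  parts.join(" AND ")
end

# Percent-encodes q (braces, spaces, colons) for a GET against one core.
def select_uri(base, core, q)
  uri = URI("#{base}/#{core}/select/")
  uri.query = URI.encode_www_form(q: q, indent: "true", wt: "json")
  uri
end

q = nested_join_query([
  ["internet.lyrics", "words:love"],
  ["internet.videos", "award:true"]
])
uri = select_uri("http://localhost:8983/solr", "internet.songs", q)
```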

solr join vs lucene join

I am trying to find out how Solr join compares with Lucene joins. Specifically, I want to know whether Lucene joins use any filter cache during the JOIN operation. I looked into the code, and it seems that the QParser has a reference to a cache, but I am not sure whether it's the filter cache. If anybody has experience with this, please share, or tell me how I can find out.
The Solr join wiki states:
Fields or other properties of the documents being joined "from" are not available for use in processing of the resulting set of "to" documents (ie: you can not return fields in the "from" documents as if they were a multivalued field on the "to" documents).
I am finding it hard to understand the above limitation of Solr join. Does it mean that, unlike traditional RDBMS joins, which can return columns from both the FROM and TO sides, Solr joins only return fields from the TO documents? Is my understanding correct? If yes, why this limitation?
There is also a difference with respect to scoring; the wiki says:
The Join query produces constant scores for all documents that match -- scores computed by the nested query for the "from" documents are not available to use in scoring the "to" documents
Does this mean the subquery's score is not available to the main query? If so, why did Solr take this approach to scoring?
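Both quoted limitations follow from the way the join is evaluated. It behaves like an SQL `IN` subquery, `SELECT * FROM to_docs WHERE to IN (SELECT from FROM from_docs WHERE inner_query)`, and not like an SQL `JOIN`. The in-memory Ruby model below illustrates those semantics only and is not Solr's implementation. `pseudo_join` and the field names are hypothetical, and the `1.0` score is an illustrative constant. The only thing that survives from the "from" side is a set of key values, so its fields and scores have nothing left to attach to.

```ruby
require "set"

# Models {!join from=... to=...}: run the inner query on the "from" docs,
# keep only their join-key values, then return the "to" docs whose key is
# in that set. "from" fields and scores are dropped; every hit gets the
# same constant score (1.0 here, purely illustrative).
def pseudo_join(from_docs, to_docs, from:, to:, &inner_query)
  keys = from_docs.select(&inner_query).map { |doc| doc[from] }.to_set
  to_docs.select { |doc| keys.include?(doc[to]) }
         .map { |doc| doc.merge("score" => 1.0) }
end
```

For example, joining engineers (`from: "person_id"`) to people (`to: "id"`) with an inner query on `discipline` returns only people documents, and the `discipline` field does not appear in them.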
If there are any other differences that are worth considering when moving from Lucene join to Solr, please share.
This post is quite old, but I'll jump in anyway. Sorry if it's no longer active.
To tell the truth, it's far better to avoid the join strategy in Solr/Lucene. You have to think of each object as a whole (denormalized into a single document); joining is very much an SQL approach that is not close to the philosophy of Solr.
Despite that, Solr supports very limited join operations. Take a look at this very good reference on join in Solr and Lucene, and also at this document about block join support in Solr.

In Lucene/Solr what is the difference between Join and BlockJoin?

Join is described as a pseudo-join, because it is closer to an SQL inner query (subquery).
BlockJoin, by contrast, is described as more like an SQL join, but it requires a sophisticated indexing schema, one that anticipates all the possible joins you'd want to make.
Could someone explain the difference between these features in terms of how to implement them at index time and query time. And what are the implications for performance?
I don't think BlockJoinQuery is a Solr function; I think it's a Lucene feature.
The Solr join doesn't score documents in the "from" query, and it doesn't return combined results, so it's best used as a filter query. That way the main query does the scoring.
Block join, on the other hand, does use scoring and returns both results (not 100% sure).
You can also use the query-time join, which has several scoring options. This is also a Lucene feature, but it doesn't require special index blocks. I've used it in combination with a Solr query parser plugin. The performance is a bit lower than block join, but it works.
I have only used the Solr join and the query-time join, so I can't really say much about block join.
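The "join as a filter query" advice can be sketched as request parameters. `q` carries the scored main query and `fq` carries the constant-score `{!join from=... to=... fromIndex=...}` restriction. This is a minimal Ruby sketch. The core and field names (`engineers`, `person_id`, `id`, `qualification`, `discipline`) and the helper name are hypothetical. In Solr 4, `fromIndex` only works when both cores run in the same Solr instance.

```ruby
require "uri"

# Puts the join in fq so it only narrows the result set, while q alone
# decides relevance. All core and field names are placeholders.
def join_filter_params(main_query:, from_core:, from_field:, to_field:, from_query:)
  join = "{!join from=#{from_field} to=#{to_field} fromIndex=#{from_core}}#{from_query}"
  { "q" => main_query, "fq" => join, "wt" => "json" }
end

params = join_filter_params(
  main_query: "qualification:engineering",
  from_core: "engineers",
  from_field: "person_id",
  to_field: "id",
  from_query: "discipline:civil"
)
query_string = URI.encode_www_form(params)  # append to .../select?
```

A side benefit of putting the join in `fq` is that Solr can cache the filter separately from the scored query.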
As I understand it, BlockJoin is for joining against nested/child documents within the same core, while Join is for joining against a separate core.
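The index-time cost of block join is that a parent and its children must be sent together as a single block, so the relationship is fixed when you index. The Ruby sketch below builds such a block with the `_childDocuments_` key, which Solr's JSON update handler accepts in versions with nested-document support. It also builds the matching `{!parent which=...}` query, which matches child documents and returns their parents. The ids and fields (`content_type`, `title`, `comment`) are hypothetical, and the exact Solr version required should be checked against the reference guide.

```ruby
require "json"

# Index time: one parent with its children, sent together as one block.
parent = {
  "id" => "post-1",
  "content_type" => "parent",
  "title" => "Solr joins",
  "_childDocuments_" => [
    { "id" => "post-1-c1", "content_type" => "child", "comment" => "great post" },
    { "id" => "post-1-c2", "content_type" => "child", "comment" => "nested docs" }
  ]
}
update_body = JSON.generate([parent])  # body for a POST to /update

# Query time: match children on comment, return their parents.
# `which` must select ALL parent documents, not just the wanted ones.
parent_query = '{!parent which="content_type:parent"}comment:great'
```

Re-indexing any child means re-sending the whole block, which is why block join suits data whose relationships you know up front.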

SOLR : joining from separate indices

Is there a good way to join data from Solr indices? Of course, I assume server-side support for this is limited, but I want to do it on the client side. Right now I'm doing it manually in Java, using hash maps and loops, and I assume there might be a better way to combine data from different indices.
If by join you mean relational joining à la SQL, then no.
If by join you mean merging, then server-side support is far from limited.
What you are looking for is index sharding.
This is not "fast" since searches are distributed and then merged, but it scales really well.
Have a read of the following articles:
http://lucidworks.lucidimagination.com/display/solr/Distributed+Search+with+Index+Sharding
http://wiki.apache.org/solr/DistributedSearch
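With index sharding, the merge happens on the server. You send one request to any instance and list every shard in the `shards` parameter as `host:port/path`, with no `http://` prefix. This is a minimal Ruby sketch in which the hosts, paths and helper name are hypothetical. Solr merges the result lists and does not combine fields from different documents, so the shards need compatible schemas and unique ids across shards.

```ruby
require "uri"

# Builds a distributed /select request: entry_point receives the query and
# fans it out to every shard listed in the comma-separated `shards` param.
def sharded_select_url(entry_point, shards, query)
  uri = URI("#{entry_point}/select")
  uri.query = URI.encode_www_form(q: query, shards: shards.join(","), wt: "json")
  uri
end

url = sharded_select_url(
  "http://localhost:8983/solr",
  ["localhost:8983/solr", "localhost:7574/solr"],  # hypothetical shard list
  "title:solr"
)
```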

Order Solr results by degrees of friendship

I am currently using Solr 1.4 (soon to upgrade to 3.3). The friendship table is pretty standard:
id | follower_id | user_id
I would like to perform a regular keyword Solr search and order the results by degrees of separation as well as by the standard score ordering. Within the result set, any of my immediate friends that the keyword matched would show up first. Friends of my friends would come second, and friends at the 3rd degree of separation third. All other results would come after.
I am pretty sure Solr doesn't offer any 'pre-baked' way of doing this, so I would likely have to do a join in MySQL to order the results properly. I'm curious whether anyone has done this before and/or has some insights.
It's simply not possible in Solr. However, if you aren't too restricted and could use another platform for this, consider Neo4j.
Connections and degrees of separation are exactly where Neo4j steps in.
http://neo4j.org/
One way might be to create fields like degree_1, degree_2, etc. and store the list of friends at degree x in the field degree_x. Then you could fire multiple queries: the first restricting the results to those who have you in degree_1, the second restricting the results to those who have you in degree_2, and so on.
It is a bit complicated, but it's the only solution I could think of using Solr.
I haven't represented a graph in Solr before, but at a high level, I think this is what you could do. First, represent people as nodes and the social network as a graph in the database. Implement a transitive closure function in SQL so you can walk the graph. Then index the result into Solr, with the social network information stored in payloads, for example.
I was able to achieve this by performing multiple queries, using the Sunspot "with" scope to restrict each one to the ids of colleagues, 2nd-degree colleagues and 3rd-degree colleagues, with MySQL doing the select that produces those ids.
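The degree_x fields have to be computed before indexing. The Ruby sketch below does this with a breadth-first walk over rows shaped like the question's friendship table (`follower_id`, `user_id`), and then builds one filter query per degree. The helper names are hypothetical. Treating a follow as a two-way link is also an assumption, so drop the reverse edge if only one direction should count. On the Solr side, each `degree_x` would need to be a multivalued field.

```ruby
require "set"

# Computes the degree_1..degree_N field values for one user document via
# breadth-first search. Each degree lists users first reached at that depth.
def degree_fields(rows, user_id, max_degree: 3)
  graph = Hash.new { |hash, key| hash[key] = Set.new }
  rows.each do |row|
    graph[row[:follower_id]] << row[:user_id]
    graph[row[:user_id]] << row[:follower_id]  # assumed two-way link
  end

  seen = Set[user_id]
  frontier = [user_id]
  (1..max_degree).each_with_object({}) do |degree, fields|
    frontier = frontier.flat_map { |id| graph[id].to_a }.uniq.reject { |id| seen.include?(id) }
    seen.merge(frontier)
    fields["degree_#{degree}"] = frontier.sort
  end
end

# One filter query per degree, run in order with the same keyword q.
def degree_filter_queries(my_id, max_degree: 3)
  (1..max_degree).map { |degree| "degree_#{degree}:#{my_id}" }
end
```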
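# One Sunspot search per degree; User and options[:keywords] are placeholders.
def perform_search(degree, options)
  User.search do
    fulltext options[:keywords]
    if degree == 1
      with(:id).any_of(options[:colleague_ids])
    elsif degree == 2
      with(:id).any_of(options[:second_degree_colleagues])
    end
  end
end

search_1 = perform_search(1, options)
search_2 = perform_search(2, options)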
It's kind of a dirty solution, because I have to perform multiple Solr queries. But until I can use dynamic field sorting options (Solr 3.3, not currently supported by Sunspot), I really don't know any other way to achieve this.
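After the per-degree searches run, their hits still have to be merged into one ranked list. The Ruby sketch below takes the ids returned by each search in degree order (1st, 2nd, 3rd, then the unrestricted search) and concatenates them. It keeps each id's first and best position and then slices out one page. The helper name and the choice to paginate after merging are assumptions. Each search has to return at least `page * per_page` hits for the slice to be complete.

```ruby
# Concatenates per-degree id lists in priority order, keeps the first
# occurrence of each id (Array#uniq preserves order), then returns one page.
def merge_by_degree(*result_lists, page: 1, per_page: 20)
  merged = result_lists.flatten.uniq
  merged.slice((page - 1) * per_page, per_page) || []
end
```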

Resources