how to get solr results in given order specified in query - grails

I have framed query to submit to solr which is of following format.
id:95154 OR id:68209 OR id:89482 OR id:94233 OR id:112481 OR id:93843
i want to get records according to order from starting. say i need to get document with id 95154 document first then id 68209 next and so on. but its not happening right now its giving last id 93843 first and some times random.i am using solr in grails 2.1 and my solr version is 1.4.0. here is sample way i am getting documents from solr
def server = solrService.getServer('provider')
SolrQuery sponsorSolrQuery = new SolrQuery(solarQuery)
def queryResponse = server.query(sponsorSolrQuery);
documentsList = queryResponse.getResults()

As #injecteer mentions, there is nothing built-in to Lucene to consider the sequence of clauses in a boolean query, but:
You are able to apply boosts to each term, and as long as the field is a basic field (meaning, not a TextField), the boosts will apply cleanly to give you a decent sort by score.
id:95154^6 OR id:68209^5 OR id:89482^4 OR id:94233^3 OR id:112481^2 OR id:93843

there's no such thing in Lucene (I strongly assume, that in Solr as well). In Lucene you can sort the results based on contents of documents' fields, but not on the order of clauses in a query.
that means, that you have to sort the results yourself:
documentsList = queryResponse.getResults()
def sordedByIdOrder = solarQueryAsList.collect{ id -> documentList.find{ it.id == id } }

Related

How to get total number of db-hits from Cypher query within a Java code?

I am trying to get total number of db-hits from my Cypher query. For some reason I always get 0 when calling this:
String query = "PROFILE MATCH (a)-[r]-(b)-[p]-(c)-[q]-(a) RETURN a,b,c";
Result result = database.execute(query);
while (result.hasNext()) {
result.next();
}
System.out.println(result.getExecutionPlanDescription().getProfilerStatistics().getDbHits());
The database seems to be ok. Is there something wrong about the way of reaching such value?
ExecutionPlanDescription is a tree like structure. Most likely the top element does not directly hit the database by itself, e.g. a projection.
So you need to write a recursive function using ExecutionPlanDescription.getChildren() to drill to the individual parts of the query plan. E.g. if one of the children (or sub*-children) is a plan of type Expand you can use plan.getProfilerStatistics().getDbHits().

Adding index on neo4j node property value

I have imported freebase dump to neo4j. But currently i am facing issue with get queries because of size of db. While import i just created node index and indexed URI property to index for each node. For each node i am adding multiple properties like label_en, type_content_type_en.
props.put(URI_PROPERTY, subject.stringValue());
Long subjectNode = db.createNode(props);
tmpIndex.put(subject.stringValue(), subjectNode);
nodeIndex.add(subjectNode, props);
Now my cypher queries are like this. Which are timing out. I am unable to add index on label_en property. Can anybody help?
match (n)-[r*0..1]->(a) where n.label_en=~'Hibernate.*' return n, a
Update
BatchInserter db = BatchInserters.inserter("ttl.db", config);
BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(db);
BatchInserterIndex index = indexProvider.nodeIndex("ttlIndex", MapUtil.stringMap("type", "exact"));
Question: When i have added node in nodeindex i have added with property URI
props.put(URI_PROPERTY, subject.stringValue());
Long subjectNode = db.createNode(props);
nodeIndex.add(subjectNode, props);
Later in code i have added another property to node(Named as label_en). But I have not added or updated nodeindex. So as per my understanding lucene does not have label_en property indexed. My graph is already built so i am trying to add index on label_en property of my node because my query is on label_en.
Your code sample is missing how you created your index. But I'm pretty sure what you're doing is using a legacy index, which is based on Apache Lucene.
Your Cypher query is using the regex operator =~. That's not how you use a legacy index; this seems to be forcing cypher to ignore the legacy index, and have the java layer run that regex on every possible value of the label_en property.
Instead, with Cypher you should use a START clause and use the legacy indexing query language.
For you, that would look something like this:
START n=node:my_index_name("label_en:Hibernate.*")
MATCH (n)-[r*0..1]->(a)
RETURN n, a;
Notice the string label_en:Hibernate.* - that's a Lucene query string that says to check that property name for that particular string. Cypher/neo4j is not interpreting that; it's passing it through to Lucene.
Your code didn't provide the name of your index. You'll have to change my_index_name above to whatever you named it when you created the legacy index.

Ascending sort order Index versus descending sort order index when performing OrderBy

I am working on an asp.net mvc web application, and I am using Sql server 2008 R2 + Entity framework.
Now on the sql server I have added a unique index on any column that might be ordered by . for example I have created a unique index on the Sql server on the Tag colum and I have defined that the sort order for the index to be Ascending. Now I have some queries inside my application that order the tag ascending while other queries order the Tag descending, as follow:-
LatestTechnology = tms.Technologies.Where(a=> !a.IsDeleted && a.IsCompleted).OrderByDescending(a => a.Tag).Take(pagesize).ToList(),;
TechnologyList = tms.Technologies.Where(a=> !a.IsDeleted && a.IsCompleted).OrderBy (a => a.Tag).Take(pagesize).ToList();
So my question is whether the two OrderByDescending(a => a.Tag). & OrderBy(a => a.Tag), can benefit from the asending unique index on the sql server on the Tag colum ? or I should define two unique indexes on the sql server one with ascending sort order while the other index with decedning sort order ?
THanks
EDIT
the following query :-
LatestTechnology = tms.Technologies.Where(a=> !a.IsDeleted && a.IsCompleted).OrderByDescending(a => a.Tag).Take(pagesize).ToList();
will generate the following sql statement as mentioned by the sql server profiler :-
SELECT TOP (15)
[Extent1].[TechnologyID] AS [TechnologyID],
[Extent1].[Tag] AS [Tag],
[Extent1].[IsDeleted] AS [IsDeleted],
[Extent1].[timestamp] AS [timestamp],
[Extent1].[TypeID] AS [TypeID],
[Extent1].[StartDate] AS [StartDate],
[Extent1].[IT360ID] AS [IT360ID],
[Extent1].[IsCompleted] AS [IsCompleted]
FROM [dbo].[Technology] AS [Extent1]
WHERE ([Extent1].[IsDeleted] <> cast(1 as bit)) AND ([Extent1].[IsCompleted] = 1)
ORDER BY [Extent1].[Tag] DESC
To answer your question:
So my question is whether the two OrderByDescending(a => a.Tag). &
OrderBy(a => a.Tag), can benefit from the asending unique index on the
sql server on the Tag colum ?
Yes, SQL Server can read an index in both directions: as in index definition or in the exact opposite direction.
However, from your intro I suspect that you still have a wrong impression how indexing works for order by. If you have both, a where clause and an order by clause, you must make sure to have a single index that covers both clauses! It does not help to have on index for the where clause (like on isDeleted and isCompleted — whatever that is in your example) and another index on tag. You need to have a single index that first has the columns of the where clause followed by the columns of the order by clause (multi-column index).
It can be tricky to make it work correctly, but it's worth the effort especially if your are only fetching the first few rows (like in your example).
If it doesn't work out right away, please have a look at this:
http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by
It is generally best to show the actual SQL query—not the .NET source code—when asking for performance advice. Then I could tell you which index to create exactly. At the moment I'm unsure about isDeleted and isCompleted — are these table columns or expressions that evaluate upon other columns?
EDIT (after you added the SQL query)
There are two ways to make your query work as indexed top-n query:
http://sqlfiddle.com/#!6/260fb/4
The first option is a regular index on the columns from the where clause followed by those from the order by clause. However, as you query uses this filter IsDeleted <> cast(1 as bit) it cannot use the index in a order-preserving way. If, however, you re-phrase the query so that it reads like this IsDeleted = cast(0 as bit) then it works. Please look at the fiddle, I've prepared everything there. Yes, SQL Server could be smart enough to know that, but it seems like it isn't.
I don't know how to tweak EF to produce the query in the above described way, sorry.
However, there is a second option using a so called filtered index — that is an index that only contains a sub-set of the table rows. It's also in the SQL Fiddle. Here it is important that you add the where clause to the index definition in the very same way as it appears in your query.
In both ways it still works if you change DESC to ASC.
The important part is that the execution plan doesn't show a sort operation. You can also verify this in SQL Fiddle (click on 'View execution plan').

Can Neo4j be effectively used to show a collection of nodes in a sortable and filterable table?

I realise this may not be ideal usage, but apart from all the graphy goodness of Neo4j, I'd like to show a collection of nodes, say, People, in a tabular format that has indexed properties for sorting and filtering
I'm guessing the Type of a node can be stored as a Link, say Bob -> type -> Person, which would allow us to retrieve all People
Are the following possible to do efficiently (indexed?) and in a scalable manner?
Retrieve all People nodes and display all of their names, ages, cities of birth, etc (NOTE: some of this data will be properties, some Links to other nodes (which could be denormalised as properties for table display's and simplicity's sake)
Show me all People sorted by Age
Show me all People with Age < 30
Also a quick how to do the above (or a link to some place in the docs describing how) would be lovely
Thanks very much!
Oh and if the above isn't a good idea, please suggest a storage solution which allows both graph-like retrieval and relational-like retrieval
if you want to operate on these person nodes, you can put them into an index (default is Lucene) and then retrieve and sort the nodes using Lucene (see for instance How do I sort Lucene results by field value using a HitCollector? on how to do a custom sort in java). This will get you for instance People sorted by Age etc. The code in Neo4j could look like
Transaction tx = neo4j.beginTx();
idxManager = neo4j.index()
personIndex = idxManager.forNodes('persons')
personIndex.add(meNode,'name',meNode.getProperty('name'))
personIndex.add(youNode,'name',youNode.getProperty('name'))
tx.success()
tx.finish()
'*** Prepare a custom Lucene query context with Neo4j API ***'
query = new QueryContext( 'name:*' ).sort( new Sort(new SortField( 'name',SortField.STRING, true ) ) )
results = personIndex.query( query )
For combining index lookups and graph traversals, Cypher is a good choice, e.g.
START people = node:people_index(name="E*") MATCH people-[r]->() return people.name, r.age order by r.age asc
in order to return data on both the node and the relationships.
Sure, that's easily possible with the Neo4j query language Cypher.
For example:
start cat=node:Types(name='Person')
match cat<-[:IS_A]-person-[born:BORN]->city
where person.age > 30
return person.name, person.age, born.date, city.name
order by person.age asc
limit 10
You can experiment with it in our cypher console.

Search records having comma seperated values that contains any element from the given list

I have a domain class Schedule with a property 'days' holding comma separated values like '2,5,6,8,9'.
Class Schedule {
String days
...
}
Schedule schedule1 = new Schedule(days :'2,5,6,8,9')
schedule1.save()
Schedule schedule2 = new Schedule(days :'1,5,9,13')
schedule2.save()
I need to get the list of the schedules having any day from the given list say [2,8,11].
Output: [schedule1]
How do I write the criteria query or HQL for the same. We can prefix & suffix the days with comma like ',2,5,6,8,9,' if that helps.
Thanks,
Hope you have a good reason for such denormalization - otherwise it would be better to save the list to a child table.
Otherwise, querying would be complicated. Like:
def days = [2,8,11]
// note to check for empty days
Schedule.withCriteria {
days.each { day ->
or {
like('username', "$day,%") // starts with "$day"
like('username', "%,$day,%")
like('username', "%,$day") // ends with "$day"
}
}
}
In MySQL there is a SET datatype and FIND_IN_SET function, but I've never used that with Grails. Some databases have support for standard SQL2003 ARRAY datatype for storing arrays in a field. It's possible to map them using hibernate usertypes (which are supported in Grails).
If you are using MySQL, FIND_IN_SET query should work with the Criteria API sqlRestriction:
http://grails.org/doc/latest/api/grails/orm/HibernateCriteriaBuilder.html#sqlRestriction(java.lang.String)
Using SET+FIND_IN_SET makes the queries a bit more efficient than like queries if you care about performance and have a real requirement to do denormalization.

Resources