I have a compound index as follows.
index({ account_id: 1, is_private: 1, visible_in_list: 1, sent_at: -1, user_id: 1, status: 1, type: 1, 'tracking.last_opened_at' => -1 }, {name: 'email_page_index'})
Then I have a query with these exact fields,
selector:
{"account_id"=>BSON::ObjectId('id'), "is_private"=>false, "visible_in_list"=>{:$in=>[true, false]}, "status"=>{:$in=>["ok", "queued", "processing", "failed"]}, "sent_at"=>{"$lte"=>2021-03-22 15:29:18 UTC}, "tracking.last_opened_at"=>{"$gt"=>1921-03-22 15:29:18 UTC}, "user_id"=>BSON::ObjectId('id')}
options: {:sort=>{"tracking.last_opened_at"=>-1}}
The winningPlan is the following
"inputStage": {
"stage": "SORT_KEY_GENERATOR",
"inputStage": {
"stage": "FETCH",
"filter": {
"$and": [
{
"account_id": {
"$eq": {
"$oid": "objectid"
}
}
},
{
"is_private": {
"$eq": false
}
},
{
"sent_at": {
"$lte": "2021-03-22T14:06:10.000Z"
}
},
{
"tracking.last_opened_at": {
"$gt": "1921-03-22T14:06:10.716Z"
}
},
{
"status": {
"$in": [
"failed",
"ok",
"processing",
"queued"
]
}
},
{
"visible_in_list": {
"$in": [
false,
true
]
}
}
]
},
"inputStage": {
"stage": "IXSCAN",
"keyPattern": {
"user_id": 1
},
"indexName": "user_id_1",
"isMultiKey": false,
"multiKeyPaths": {
"user_id": []
},.....
And the rejected plan has the compound index and forms as follows
"rejectedPlans": [
{
"stage": "FETCH",
"inputStage": {
"stage": "SORT",
"sortPattern": {
"tracking.last_opened_at": -1
},
"inputStage": {
"stage": "SORT_KEY_GENERATOR",
"inputStage": {
"stage": "IXSCAN",
"keyPattern": {
"account_id": 1,
"is_private": 1,
"visible_in_list": 1,
"sent_at": -1,
"user_id": 1,
"status": 1,
"type": 1,
"tracking.last_opened_at": -1
},
"indexName": "email_page_index",
"isMultiKey": false,
"multiKeyPaths": {
"account_id": [],
"is_private": [],
"visible_in_list": [],
"sent_at": [],
"user_id": [],
"status": [],
"type": [],
"tracking.last_opened_at": []
},
"isUnique": false,
The problem is that the winningPlan is slow. Wouldn't it be better if Mongoid chose the compound index? Is there a way to force it?
Also, how can I see the execution time for each separate STAGE?
I am posting some information that can help resolve the issue of performance and use an appropriate index. Please note this may not be the solution (and the issue is open to discussion).
...Also, how can I see the execution time for each separate STAGE?
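For this, generate the query plan using explain with the "executionStats" verbosity mode. In that output, each stage under executionStats.executionStages reports its own executionTimeMillisEstimate.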
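As a rough illustration of reading that output, here is a minimal sketch that walks a stage tree and lists the time estimate for each stage. The function name collectStageTimes and the sample plan are hypothetical; only the field names stage, inputStage, inputStages, nReturned and executionTimeMillisEstimate come from explain output. As far as I know, the estimate for a stage includes the time spent in its input stages, so read the numbers top-down.

```javascript
// Walk an explain("executionStats") stage tree and collect per-stage timings.
// Assumption: `stage` is the object found at executionStats.executionStages.
function collectStageTimes(stage, depth = 0, out = []) {
  if (!stage) return out;
  out.push({
    depth,
    stage: stage.stage,
    millis: stage.executionTimeMillisEstimate,
    returned: stage.nReturned,
  });
  // Most stages have one inputStage; a few have an inputStages array instead.
  const children = stage.inputStages || (stage.inputStage ? [stage.inputStage] : []);
  for (const child of children) collectStageTimes(child, depth + 1, out);
  return out;
}

// Hypothetical, heavily trimmed sample; real output has many more fields.
const sampleExecutionStages = {
  stage: "SORT",
  executionTimeMillisEstimate: 120,
  nReturned: 50,
  inputStage: {
    stage: "FETCH",
    executionTimeMillisEstimate: 110,
    nReturned: 50,
    inputStage: { stage: "IXSCAN", executionTimeMillisEstimate: 15, nReturned: 4000 },
  },
};

for (const s of collectStageTimes(sampleExecutionStages)) {
  console.log(`${"  ".repeat(s.depth)}${s.stage}: ~${s.millis} ms, nReturned=${s.returned}`);
}
```

A large gap between the IXSCAN and FETCH estimates, or a high nReturned on the index scan compared with the final result, usually points at an index that is not selective enough for the filter.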
The problem is that the winningPlan is slow, wouldn't be better if
mongoid choose the compound index? Is there a way to force it?
As posted, the plans include a "stage": "SORT_KEY_GENERATOR", which implies that the sort is being performed in memory (that is, without using an index for the sort). That is likely one of the reasons, possibly the main one, for the slow performance. So, how can the query and the sort both use the index?
A single compound index can serve a query that has both a filter and a sort. That makes for an efficient index and query, but it requires the compound index to be defined in a certain way; some rules need to be followed. See the documentation topic Sort and Non-prefix Subset of an Index, which is the case in this post. I quote the example from the documentation for illustration:
Suppose there is a compound index: { a: 1, b: 1, c: 1, d: 1 }
And all the fields are used in a query with a filter and a sort. The ideal query has the filter and sort as follows:
db.test.find( { a: "val1", b: "val2", c: 1949 } ).sort( { d: 1 })
Note that the query filter has three fields with equality conditions (there are no $gt, $lt, etc.). The query's sort then uses d, the last field of the index. This is the ideal situation, where the index is used for the query's filter as well as for the sort.
In your case, this cannot be applied from the posted query. So, to work towards a solution you may have to define a new index so as to take advantage of the rule Sort and Non-prefix Subset of an Index.
Is it possible? It depends upon your application and the use case. I have an idea that may help. Create a compound index like the following and see how it works:
account_id: 1,
is_private: 1,
visible_in_list: 1,
status: 1,
user_id: 1,
'tracking.last_opened_at': -1
I think that having the condition "tracking.last_opened_at"=>{"$gt"=>1921-03-22 15:29:18 UTC} in the query's filter may not help the index to be used.
Also, include some details such as the version of the MongoDB server, the size of the collection and some platform details. In general, query performance depends upon many factors, including indexes, RAM, the size and type of the data, and the kind of operations performed on the data.
The ESR Rule:
When using a compound index for a query with multiple filter conditions and a sort, the Equality-Sort-Range (ESR) rule is sometimes useful for optimizing the query: put the fields tested for equality first, then the sort fields, then the fields with range conditions. See the following post with such a scenario: MongoDB - Index not being used when sorting and limiting on ranged query
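To make the ordering concrete, here is a small sketch that arranges candidate index fields as Equality, then Sort, then Range. The helper esrIndexSpec is hypothetical and not part of any driver. Treating the two $in fields (visible_in_list, status) as equality-like is an assumption of mine, so check the resulting index with explain before relying on it.

```javascript
// Order candidate index fields by the ESR rule: Equality, then Sort, then Range.
// Hypothetical helper; it only builds a key pattern object, it does not talk to MongoDB.
function esrIndexSpec({ equality, sort, range }) {
  const spec = new Map();
  for (const field of equality) spec.set(field, 1);
  for (const [field, direction] of Object.entries(sort)) {
    if (!spec.has(field)) spec.set(field, direction);
  }
  for (const field of range) {
    if (!spec.has(field)) spec.set(field, 1);
  }
  return Object.fromEntries(spec);
}

// Fields taken from the posted query.
const candidate = esrIndexSpec({
  // Assumption: the $in conditions are treated like equality matches here.
  equality: ["account_id", "is_private", "visible_in_list", "status", "user_id"],
  sort: { "tracking.last_opened_at": -1 },
  range: ["sent_at"],
});

console.log(JSON.stringify(candidate, null, 2));
```

The range condition on tracking.last_opened_at needs no separate entry because that field is already present as the sort key.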
I am trying to get only the matched data from a nested array in an Elasticsearch document, but the whole nested array is being returned as output.
This is my query:
QueryBuilders.nestedQuery("questions",
        QueryBuilders.boolQuery()
            .must(QueryBuilders.matchQuery("questions.questionTypeId", quesTypeId)),
        ScoreMode.None)
    .innerHit(new InnerHitBuilder());
I am using QueryBuilders to get data from the nested class. It works, but I am not able to get only the matched data.
Request body:
{
"questionTypeId" : "MCMC"
}
When questionTypeId = "MCMC", this is the output I am getting. I want to exclude the entries for which questionTypeId = "SCMC".
Output:
{
"id": "46",
"subjectId": 1,
"topicId": 1,
"subtopicId": 1,
"languageId": 1,
"difficultyId": 4,
"isConceptual": false,
"examCatId": 3,
"examId": 1,
"usedIn": 1,
"questions": [
{
"id": "46_31",
"pid": 31,
"questionId": "QID41336691",
"childId": "CID1",
"questionTypeId": "MCMC",
"instruction": "This is a single correct multiple choice question.",
"question": "Who holds the most english premier league titles?",
"solution": "Manchester United",
"status": 1000,
"questionTranslation": []
},
{
"id": "46_33",
"pid": 33,
"questionId": "QID41336677",
"childId": "CID1",
"questionTypeId": "SCMC",
"instruction": "This is a single correct multiple choice question.",
"question": "Who holds the most english premier league titles?",
"solution": "Manchester United",
"status": 1000,
"questionTranslation": []
}
]
}
As you have tagged this with spring-data-elasticsearch:
Support for returning inner hits was recently added in version 4.1.M1 and so will be included in the next released version. In a SearchHit you will then get the complete top-level document, but the innerHits property will contain only the matching inner hits.
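For orientation, this is roughly how the matching nested entries can be picked out of a raw Elasticsearch search response, independent of the Spring Data mapping. It is a sketch only: matchedInnerHits is a hypothetical helper, the sample response is trimmed and made up from the output above, and it assumes the inner hits are keyed by the nested path name ("questions"), which I believe is the default when no explicit name is set.

```javascript
// Collect the _source of every inner hit registered under `name`.
// Assumption: `response` has the usual hits.hits[].inner_hits.<name>.hits.hits[] shape.
function matchedInnerHits(response, name) {
  return response.hits.hits.flatMap((hit) => {
    const inner = hit.inner_hits && hit.inner_hits[name];
    return inner ? inner.hits.hits.map((h) => h._source) : [];
  });
}

// Hypothetical, trimmed response for the document with id "46".
const sampleResponse = {
  hits: {
    hits: [
      {
        _id: "46",
        // The top-level _source still carries the whole nested array.
        _source: { id: "46", questions: [{ id: "46_31" }, { id: "46_33" }] },
        inner_hits: {
          questions: {
            hits: { hits: [{ _source: { id: "46_31", questionTypeId: "MCMC" } }] },
          },
        },
      },
    ],
  },
};

console.log(matchedInnerHits(sampleResponse, "questions"));
```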
I have a relatively complex search query that is already built and sorts correctly.
But I think searching is slow because of the script, so I want to remove the script and write an equivalent query instead.
Current code:
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "double pscore = 0;for(id in params.boost_ids){if(params._source.midoffice_master_id == id){pscore = -999999999;}}return pscore;",
"params": {
"boost_ids": [
3,
4,
5
]
}
}
}
}]
Explanation of the above code:
For example, if a match query would give a result like:
[{m_id: 1, name: A}, {m_id: 2, name: B}, {m_id: 3, name: C}, {m_id: 4, name: D}, ...]
I want to boost the documents whose m_id is in the array [3, 4, 5], which would transform the result into:
[{m_id: 3, name: C}, {m_id: 4, name: D}, {m_id: 1, name: A}, {m_id: 2, name: B}, ...]
You can use the query below, which combines a Function Score Query (for boosting) with a Terms Query (for matching against an array of values).
Note that the boosting logic is in the should clause of the bool query.
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {} //just a sample must clause to retrieve all docs
}
],
"should": [
{
"function_score": { <---- Function Score Query
"query": {
"terms": { <---- Terms Query
"m_id": [
3,4,5
]
}
},
"boost": 100 <---- Boosting value
}
}
]
}
}
}
So basically, you can remove the sort logic completely and add the above function query in your should clause, which would give you the results in the order you are looking for.
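If the request body is assembled in code, the same structure can be generated from the list of IDs. The sketch below is only an illustration: buildBoostedQuery is a hypothetical helper, the field name midoffice_master_id is taken from the script in the question, and the boost of 100 is the sample value from the query above.

```javascript
// Build a search body whose should clause boosts documents matching `boostIds`.
// Hypothetical helper; it returns a plain object and sends nothing to Elasticsearch.
function buildBoostedQuery(mustClauses, field, boostIds, boost = 100) {
  return {
    query: {
      bool: {
        must: mustClauses,
        should: [
          {
            function_score: {
              query: { terms: { [field]: boostIds } },
              boost,
            },
          },
        ],
      },
    },
  };
}

const body = buildBoostedQuery([{ match_all: {} }], "midoffice_master_id", [3, 4, 5]);
console.log(JSON.stringify(body, null, 2));
```

The boosted documents only come first when the results are ordered by _score, so the request must not carry another explicit sort.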
Note that you will have to work out how to add this logic correctly if your query is more complex. If you are struggling with anything, do let me know; I'd be happy to help!
Hope this helps!
I am new to Neo4j. I am trying to load the Yelp dataset into Neo4j. I am interested in three of the JSON files they provide:
user.json
{
"user_id": "-lGwMGHMC_XihFJNKCJNRg",
"name": "Gabe",
"review_count": 277,
"yelping_since": "2014-10-31",
"friends": ["Oa84FFGBw1axX8O6uDkmqg", "SRcWERSl4rhm-Bz9zN_J8g", "VMVGukgapRtx3MIydAibkQ", "8sLNQ3dAV35VBCnPaMh1Lw", "87LhHHXbQYWr5wlo5W7_QQ"],
"useful": 45,
"funny": 4,
"cool": 55,
"fans": 17,
"elite": [],
"average_stars": 4.72,
"compliment_hot": 5,
"compliment_more": 1,
"compliment_profile": 0,
"compliment_cute": 1,
"compliment_list": 0,
"compliment_note": 11,
"compliment_plain": 20,
"compliment_cool": 15,
"compliment_funny": 15,
"compliment_writer": 1,
"compliment_photos": 8
}
I have omitted several entries from the friends array to make the output readable.
business.json
{
"business_id": "YDf95gJZaq05wvo7hTQbbQ",
"name": "Richmond Town Square",
"neighborhood": "",
"address": "691 Richmond Rd",
"city": "Richmond Heights",
"state": "OH",
"postal_code": "44143",
"latitude": 41.5417162,
"longitude": -81.4931165,
"stars": 2.0,
"review_count": 17,
"is_open": 1,
"attributes": {
"RestaurantsPriceRange2": 2,
"BusinessParking": {
"garage": false,
"street": false,
"validated": false,
"lot": true,
"valet": false
},
"BikeParking": true,
"WheelchairAccessible": true
},
"categories": ["Shopping", "Shopping Centers"],
"hours": {
"Monday": "10:00-21:00",
"Tuesday": "10:00-21:00",
"Friday": "10:00-21:00",
"Wednesday": "10:00-21:00",
"Thursday": "10:00-21:00",
"Sunday": "11:00-18:00",
"Saturday": "10:00-21:00"
}
}
review.json
{
"review_id": "VfBHSwC5Vz_pbFluy07i9Q",
"user_id": "-lGwMGHMC_XihFJNKCJNRg",
"business_id": "YDf95gJZaq05wvo7hTQbbQ",
"stars": 5,
"date": "2016-07-12",
"text": "My girlfriend and I stayed here for 3 nights and loved it.",
"useful": 0,
"funny": 0,
"cool": 0
}
As the sample files show, the relationship between a user and a business is expressed through the review.json file. How can I create a relationship edge between user and business using review.json?
I have also seen Mark Needham's tutorial, where he populates Neo4j with Stack Overflow data, but in that case a relationship file was already present in the sample data. Do I need to build a similar file? If yes, how should I approach this problem? Or is there another way to build the relationship between user and business?
It very much depends on your model as to what you want to do, but you could do 3 imports:
//Create Users - does assume the data is unique
CALL apoc.load.json('file:///c://temp//SO//user.json') YIELD value AS user
CREATE (u:User)
SET u = user
then add the businesses:
CALL apoc.load.json('file:///c://temp//SO//business.json') YIELD value AS business
CREATE (b:Business {
business_id : business.business_id,
name : business.name,
neighborhood : business.neighborhood,
address : business.address,
city : business.city,
state : business.state,
postal_code : business.postal_code,
latitude : business.latitude,
longitude : business.longitude,
stars : business.stars,
review_count : business.review_count,
is_open : business.is_open,
categories : business.categories
})
For the businesses, we can't just do SET b = business because the JSON has nested maps. So you need to decide whether you want them, and if you do, you might have to go down a different route.
Lastly, the reviews, which is where we join it all up.
CALL apoc.load.json('file:///c://temp//SO//review.json') YIELD value AS review
CREATE (r:Review)
SET r = review
WITH r
//Match user to a review
MATCH (u:User {user_id: r.user_id})
CREATE (u)-[:HAS_REVIEW]->(r)
WITH r, u
//Match business to a review, and a user to a business
MATCH (b:Business {business_id: r.business_id})
//Merge here in case of multiple reviews
MERGE (u)-[:HAS_REVIEWED]->(b)
CREATE (b)-[:HAS_REVIEW]->(r)
Obviously, change the labels and relationship types to whatever you want. It might need tuning depending on the size of the data, so you might need to use apoc.periodic.iterate to make it work.
Apoc is here if you need it (and you should use it!)
I am trying to use cypher to perform the query in full text index. It returns results, but they are not ranked. Is there a way to get the match score?
start recordEmployee=node:fidx_RecordEmployee("F01:Leela* OR F01:Ph*") return recordEmployee.F01
Returns this, and I cannot find match score:
{
"results": [
{
"columns": [
"recordEmployee.F01"
],
"data": [
{
"row": [
"Philip"
],
"graph": {
"nodes": [],
"relationships": []
}
},
{
"row": [
"Leela"
],
"graph": {
"nodes": [],
"relationships": []
}
}
],
"stats": {
"contains_updates": false,
"nodes_created": 0,
"nodes_deleted": 0,
"properties_set": 0,
"relationships_created": 0,
"relationship_deleted": 0,
"labels_added": 0,
"labels_removed": 0,
"indexes_added": 0,
"indexes_removed": 0,
"constraints_added": 0,
"constraints_removed": 0
}
}
],
"errors": []
}
It's not possible in Cypher yet, but with stored procedures in Neo4j 3.0 it will be again.
Until then if you really need the score you can use the REST endpoint.
http://neo4j.com/docs/stable/rest-api-indexes.html#rest-api-find-node-by-query
Getting the results with a predefined ordering requires adding the
request parameter
?order=<ordering>
where <ordering> is one of index, relevance or score. In this case an
additional field will be added to each result, named score, that holds
the float value that is the score reported by the query result.
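For illustration, a request URL for the query in the question could be assembled as below. This is a sketch under assumptions: buildIndexQueryUrl is a hypothetical helper, http://localhost:7474 is a placeholder host, and the /db/data/index/node/{index}?query=... path is the one used by the legacy REST API in the linked documentation.

```javascript
// Build a legacy REST index-query URL that asks for ordered results.
// Hypothetical helper; it only builds the string and performs no request.
function buildIndexQueryUrl(baseUrl, indexName, luceneQuery, order = "score") {
  const allowed = ["index", "relevance", "score"];
  if (!allowed.includes(order)) {
    throw new Error(`unsupported order: ${order}`);
  }
  return (
    `${baseUrl}/db/data/index/node/${encodeURIComponent(indexName)}` +
    `?query=${encodeURIComponent(luceneQuery)}&order=${order}`
  );
}

const url = buildIndexQueryUrl(
  "http://localhost:7474", // placeholder host
  "fidx_RecordEmployee",
  "F01:Leela* OR F01:Ph*"
);
console.log(url);
```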
I'm experimenting with using Falcor to front the Guild Wars 2 API and want to use it to show game item details. I'm especially interested in building a router that can use multiple datasources to combine the results of different APIs.
The catch is, Item IDs in Guild Wars 2 aren't contiguous. Here's an example:
[
1,
2,
6,
11,
24,
56,
...
]
So I can't just write paths on the client like items[100..120].name because there's almost certainly going to be a bunch of holes in that list.
I've tried adding a route to my router so I can just request items, but that sends it into an infinite loop on the client. You can see that attempt on GitHub.
Any pointers on the correct way to structure this? As I think about it more maybe I want item.id instead?
You shouldn't find yourself asking for IDs from a Falcor JSON Graph object.
It seems like you want to build an array of game ids:
{
games: [
{ $type: "ref", value: ["gamesById", 352] },
{ $type: "ref", value: ["gamesById", 428] }
// ...
],
gamesById: {
352: {
gameProp1: ...,
},
428: {
gameProp2: ...
}
}
}
["games", { from: 5, to: 17 }, "gameProp1"]
Does that work?
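Applied to the item IDs from the question, the list of references could be generated like this. It is a minimal sketch: buildItemsList and the branch names items and itemsById are hypothetical, and I use an index-keyed object with a length instead of an array, which is my own choice and not something the answer above prescribes.

```javascript
// Turn a sparse list of IDs into a dense, index-keyed list of JSON Graph refs.
// Hypothetical helper; the itemsById branch itself would be served by another route.
function buildItemsList(itemIds) {
  const items = { length: itemIds.length };
  itemIds.forEach((id, index) => {
    items[index] = { $type: "ref", value: ["itemsById", id] };
  });
  return { items };
}

// Non-contiguous IDs as in the question.
const graph = buildItemsList([1, 2, 6, 11, 24, 56]);
console.log(JSON.stringify(graph, null, 2));
```

With a list like this, a client path such as items[0..5].name ranges over list positions, so the gaps between the actual IDs no longer matter.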
You can use Falcor's 'get' API, which retrieves multiple values. You can pass any number of paths, as shown below:
var model=new falcor.Model({
cache:{
genereList:[
{name:"Recently Watched",
titles:[
{id:123,
name: "Ignatius",
rating: 4}
]
},
{name:"New Release",
titles:[
{id:124,
name: "Jessy",
rating: 3}
]
}
]
}
});
Getting single value
model.getValue('genereList[0].titles[0].name').
then(function(value){
console.log(value);
});
Getting multiple values
model.get('genereList[0..1].titles[0].name', 'genereList[0..1].titles[0].rating').
then(function(json){
console.log(JSON.stringify(json, null, 4));
})