Bulk Data Delete in elasticsearch - ruby-on-rails

This is my code:
HTTParty.delete("http://#{SERVER_DOMAIN}:9200/monitoring/mention_reports/_query?q=id:11321779,11321779", {
})
I want to delete documents in bulk by id, but this query does not delete anything from Elasticsearch.
Can anyone help me figure out how to delete data in bulk?

Replace index_name with the name of the index used in your code, and put the ids to delete in the array ([1,2,3]).
CGI::escape URL-encodes the query.
HTTParty.delete "http://#{SERVER_DOMAIN}:9200/index_name/_query?source=#{CGI::escape("{\"terms\":{\"_id\":[1,2,3]}}")}"
This uses the delete-by-query API of Elasticsearch.

If you are using the Tire Ruby client to connect to Elasticsearch:
id_array = [1,2,3]
query = Tire.search do |search|
  search.query { |q| q.terms :_id, id_array }
end
index = Tire.index('<index_name>') # provide the index name as you have in your code
Tire::Configuration.client.delete "#{index.url}/_query?source=#{Tire::Utils.escape(query.to_hash[:query].to_json)}"
Reference: https://github.com/karmi/tire/issues/309

Elasticsearch provides the bulk API for this: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/docs-bulk.html
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
OR
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
See also: How to handle multiple updates / deletes with Elasticsearch?
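For the original question, the delete actions shown above can be generated from a list of ids. The following is a minimal Ruby sketch, assuming the index and type names from the question (monitoring, mention_reports). The helper name bulk_delete_body and the second id are hypothetical.

```ruby
require 'json'

# Hypothetical helper: builds a newline-delimited _bulk body that contains
# one "delete" action per id. The body ends with a trailing newline.
def bulk_delete_body(index, type, ids)
  ids.map { |id|
    { delete: { _index: index, _type: type, _id: id.to_s } }.to_json
  }.join("\n") + "\n"
end

body = bulk_delete_body('monitoring', 'mention_reports', [11321779, 11321780])
puts body
```

The resulting string could then be POSTed to the _bulk endpoint, for example with HTTParty.post("http://#{SERVER_DOMAIN}:9200/_bulk", body: body).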

Related

How to boost the closest created_at field in Elasticsearch?

I want to sort my query results according to some boost rules and, at the same time, keep them ordered by creation date as far as possible. If I add a created_at sort, it changes everything and my results are no longer relevant. So I guess the only way to do this is to boost the created_at field (the newest document gets the biggest bonus when the score is calculated), but I don't know how to implement it. This is my query:
query = {
"query" : {
"bool" : {
"must" : [
{
"range" : {
"deadline" : {
"gte" : "2016-05-30T11:39:10+02:00"
}
}
},
{
"terms" : {
"state" : [
"open"
]
}
},
{
"query_string" : {
"query" : "chant",
"default_operator" : "AND",
"analyzer" : "search_francais",
"fields" : [
"title^6",
"description",
"brand",
"category_name"
]
}
}
]
}
},
"filter" : {
"and" : [
{
"geo_distance" : {
"distance" : "40km",
"location" : {
"lat" : 48.855736,
"lon" : 2.32927300000006
}
}
}
]
},
"sort" : [
{
"_score" : "desc"
},
#{
# "created_at" : "desc" ==> i tried this but it doesnt change results
#}
]
}
Try adding your condition in a should block.
i) If the created date should be close to some value in the search query, or you have an idea of how close the date should be, use a range query.
ii) If you are not sure of those values, a decay function can be used. In that case, the query has to be changed to a function_score query.
{
"query" : {
"bool" : {
"must" : [
{
"range" : {
"deadline" : {
"gte" : "2016-05-30T11:39:10+02:00"
}
}
},
{
"terms" : {
"state" : [
"open"
]
}
},
{
"query_string" : {
"query" : "chant",
"default_operator" : "AND",
"analyzer" : "search_francais",
"fields" : [
"title^6",
"description",
"brand",
"category_name"
]
}
}
],
"should": [
{"created_at" : "condition here .. "}
]
}
},
"filter" : {
"and" : [
{
"geo_distance" : {
"distance" : "40km",
"location" : {
"lat" : 48.855736,
"lon" : 2.32927300000006
}
}
}
]
}
}
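Option ii) can be sketched in Ruby as a hash that wraps the existing query in a function_score query with a gauss decay on created_at. This is only an illustration: the helper name with_recency_boost and the scale and decay values are assumptions, and origin is left out on the assumption that Elasticsearch defaults it to now for date fields.

```ruby
require 'json'

# Wraps an existing query in a function_score query with a gauss decay on a
# date field, so that newer documents get a higher score multiplier.
# The scale ('30d') and decay (0.5) defaults are illustrative assumptions.
def with_recency_boost(query, field: 'created_at', scale: '30d', decay: 0.5)
  {
    'function_score' => {
      'query' => query,
      'functions' => [
        { 'gauss' => { field => { 'scale' => scale, 'decay' => decay } } }
      ],
      'boost_mode' => 'multiply'
    }
  }
end

base = { 'query_string' => { 'query' => 'chant', 'analyzer' => 'search_francais' } }
puts JSON.pretty_generate('query' => with_recency_boost(base))
```

In the query above, the whole bool clause would take the place of base, and the sort on _score would stay as it is.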

How to check elasticsearch tokens after running a query in Rails?

My problem is the following:
I run an Elasticsearch query in a Rails app, with specific settings for my index and my search analyzer. The problem is that it doesn't return any results in the app. On the other hand, when I run the analyzer directly against my Elasticsearch Docker container, tokens are returned, and if I use these tokens in my app query, I get results.
This is my Elasticsearch request:
curl -XGET 'localhost:9200/development-stoot-services/_analyze?analyzer=search_francais' -d 'cours de guitare'
{"tokens":[{"token":"cour","start_offset":0,"end_offset":5,"type":"<ALPHANUM>","position":1},{"token":"guitar","start_offset":9,"end_offset":16,"type":"<ALPHANUM>","position":3}]}
Here is the query from my Rails app to Elasticsearch:
query = {
"query" : {
"bool" : {
"must" : [
{
"range" : {
"deadline" : {
"gte" : "2016-05-26T10:27:19+02:00"
}
}
},
{
"terms" : {
"state" : [
"open"
]
}
},
{
"query_string" : {
"query" : "cours de guitare",
"default_operator" : "AND",
"fields" : [
"title",
"description",
"brand",
"category_name"
]
}
}
]
}
},
"filter" : {
"and" : [
{
"geo_distance" : {
"distance" : "40km",
"location" : {
"lat" : 48.855736,
"lon" : 2.32927300000006
}
}
}
]
},
"sort" : [
{
"created_at" : "desc"
}
]
}
The last query does not return any results, but if I run a query with the tokens returned by Elasticsearch ('cour', 'guitar'), I get the expected results. So I guess there is a problem between Rails and Elasticsearch that I can't find...
Can anyone help with that?
Try modifying your query like this: to analyze cours de guitare the same way the _analyze endpoint did, you need to specify the search_francais analyzer in your query_string:
...
{
"query_string" : {
"query" : "cours de guitare",
"default_operator" : "AND",
"analyzer": "search_francais", <--- add this line
"fields" : [
"title",
"description",
"brand",
"category_name"
]
}
},
...
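To compare what the analyzer produces with what the app sends, the token strings can be pulled out of the _analyze response. A small Ruby sketch, using the response quoted in the question; the helper name analyzed_tokens is hypothetical.

```ruby
require 'json'

# The _analyze response quoted in the question, pasted in verbatim.
response = '{"tokens":[{"token":"cour","start_offset":0,"end_offset":5,"type":"<ALPHANUM>","position":1},{"token":"guitar","start_offset":9,"end_offset":16,"type":"<ALPHANUM>","position":3}]}'

# Hypothetical helper: returns only the token strings of an _analyze response.
def analyzed_tokens(json)
  JSON.parse(json).fetch('tokens').map { |t| t['token'] }
end

p analyzed_tokens(response) # => ["cour", "guitar"]
```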

Tire gem: How to access Elasticsearch's 'highlight' property?

I have some Rails models that are indexed in Elasticsearch (via Tire gem). I can index new documents and query the existing index.
What I can't seem to do is get ahold of the highlight attached to a record from within my Rails app. I can however see that highlight is returned in the json when I interact with Elasticsearch directly via curl.
When I try to access the highlight property of my record I get: undefined method 'highlight' for #<Report:0x007fe8afa54700>
# app/views/reports/index.html.haml
%h1 Listing reports
...
- @reports.results.each do |report|
  %tr
    %td= report.title
    %td= raw report.highlight.attachment.first.to_s
But if I use curl I can see the highlight is returned to Tire...
$ curl -X GET "http://localhost:9200/testapp_development_reports/report/_search?load=true&pretty=true" -d '{"query":{"query_string":{"query":"contains","default_operator":"AND"}},"highlight":{"fields":{"attachment":{}}}}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.111475274,
"hits" : [ {
"_index" : "testapp_development_reports",
"_type" : "report",
"_id" : "1",
"_score" : 0.111475274, "_source" : {"id":1,"title":"Sample Number One",...,"attachment":"JVBERi0xMJ1Ci... ...UlRU9GCg==\n"},
"highlight" : {
"attachment" : [ "\nThis <em>contains</em> one\n\nodd\n\n\n" ]
}
}, {
"_index" : "testapp_development_reports",
"_type" : "report",
"_id" : "2",
"_score" : 0.111475274, "_source" : {"id":2,"title":"Number two",...,"attachment":"JVBERi0xLKM3OA... ...olJVPRgo=\n"},
"highlight" : {
"attachment" : [ "\nThis <em>contains</em> two\n\neven\n\n\n" ]
}
} ]
}
}
The search method in the model:
...
def self.search(params)
  tire.search(load: true) do
    query { string params[:query], default_operator: "AND" } if params[:query].present?
    highlight :attachment
  end
end
...
The highlight method is not accessible when you use the load: true option. This should be fixed in future versions of Tire.
Edit: you can now use the each_with_hit method to access the values returned by Elasticsearch.
For example:
results = Article.search 'One', :load => true
results.each_with_hit do |result, hit|
  puts "#{result.title} (score: #{hit['_score']})"
end
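Since each_with_hit yields the raw hit next to the loaded record, the highlight can be read from the hit hash. The lookup itself can be sketched without Tire, on a plain hash shaped like one entry of the curl response above. The helper name first_highlight is hypothetical, and the sample hit is trimmed down for illustration.

```ruby
require 'json'

# A trimmed-down hit, shaped like the entries under hits.hits in the curl output.
hit = JSON.parse(<<~'HIT')
  {
    "_id": "1",
    "_score": 0.111475274,
    "highlight": { "attachment": ["This <em>contains</em> one"] }
  }
HIT

# Hypothetical helper: returns the first highlighted fragment for a field,
# or nil when the hit carries no highlight for that field.
def first_highlight(hit, field)
  fragments = hit.fetch('highlight', {})[field.to_s]
  fragments && fragments.first
end

puts first_highlight(hit, :attachment) # prints: This <em>contains</em> one
```

Inside an each_with_hit block, the same lookup would be applied to the hit argument rather than to the loaded record.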
You can find my answer in this post:
Elasticsearch/Lucene highlight
That method works fine for me, and I hope you can get it to work as well.

Rails & Mongoid unique results

Consider the following example of mongo collection:
{"_id" : ObjectId("4f304818884672067f000001"), "hash" : {"call_id" : "1234"}, "something" : "AAA"}
{"_id" : ObjectId("4f304818884672067f000002"), "hash" : {"call_id" : "1234"}, "something" : "BBB"}
{"_id" : ObjectId("4f304818884672067f000003"), "hash" : {"call_id" : "1234"}, "something" : "CCC"}
{"_id" : ObjectId("4f304818884672067f000004"), "hash" : {"call_id" : "5555"}, "something" : "DDD"}
{"_id" : ObjectId("4f304818884672067f000005"), "hash" : {"call_id" : "5555"}, "something" : "CCC"}
I would like to query this collection and get only the first entry for each "call_id"; in other words, I'm trying to get unique results based on "call_id".
I tried to use the .distinct method:
@result = Myobject.all.distinct('hash.call_id')
but the resulting array will contain only the unique call_id fields:
["1234", "5555"]
and I need all the other fields too.
Is it possible to make a query like this one?:
@result = Myobject.where('hash.call_id' => Myobject.all.distinct('hash.call_id'))
Thanks
You cannot return the whole document (or a subset of it) with distinct. As per the documentation, it only returns an array of the distinct values for the given key. But you can achieve this with map-reduce:
var _map = function () {
    emit(this.hash.call_id, {doc: this});
};
var _reduce = function (key, values) {
    var ret = {doc: []};
    var doc = {};
    values.forEach(function (value) {
        if (!doc[value.doc.hash.call_id]) {
            ret.doc.push(value.doc);
            doc[value.doc.hash.call_id] = true; // mark the doc as seen, so it will be picked only once
        }
    });
    return ret;
};
The code above is fairly self-explanatory: the map function groups the documents by the key hash.call_id and emits the whole document so that it can be processed by the reduce function.
The reduce function just loops through each grouped result set and picks only one item from it (among the multiple documents sharing the same key), which simulates distinct.
Finally create some test data
> db.disTest.insert({hash:{call_id:"1234"},something:"AAA"})
> db.disTest.insert({hash:{call_id:"1234"},something:"BBB"})
> db.disTest.insert({hash:{call_id:"1234"},something:"CCC"})
> db.disTest.insert({hash:{call_id:"5555"},something:"DDD"})
> db.disTest.insert({hash:{call_id:"5555"},something:"EEE"})
> db.disTest.find()
{ "_id" : ObjectId("4f30a27c4d203c27d8f4c584"), "hash" : { "call_id" : "1234" }, "something" : "AAA" }
{ "_id" : ObjectId("4f30a2844d203c27d8f4c585"), "hash" : { "call_id" : "1234" }, "something" : "BBB" }
{ "_id" : ObjectId("4f30a2894d203c27d8f4c586"), "hash" : { "call_id" : "1234" }, "something" : "CCC" }
{ "_id" : ObjectId("4f30a2944d203c27d8f4c587"), "hash" : { "call_id" : "5555" }, "something" : "DDD" }
{ "_id" : ObjectId("4f30a2994d203c27d8f4c588"), "hash" : { "call_id" : "5555" }, "something" : "EEE" }
and running this map reduce
> db.disTest.mapReduce(_map,_reduce, {out: { inline : 1}})
{
"results" : [
{
"_id" : "1234",
"value" : {
"doc" : [
{
"_id" : ObjectId("4f30a27c4d203c27d8f4c584"),
"hash" : {
"call_id" : "1234"
},
"something" : "AAA"
}
]
}
},
{
"_id" : "5555",
"value" : {
"doc" : [
{
"_id" : ObjectId("4f30a2944d203c27d8f4c587"),
"hash" : {
"call_id" : "5555"
},
"something" : "DDD"
}
]
}
}
],
"timeMillis" : 2,
"counts" : {
"input" : 5,
"emit" : 5,
"reduce" : 2,
"output" : 2
},
"ok" : 1,
}
You get the first document of each distinct set. You can do the same in Mongoid by first turning the map/reduce functions into strings and then calling mapreduce like this:
MyObject.collection.mapreduce(_map,_reduce,{:out => {:inline => 1},:raw=>true })
Hope it helps
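For comparison, the grouping that the map and reduce functions perform can be illustrated in plain Ruby on in-memory hashes. This is not a Mongoid API, only a sketch of the logic; it assumes the documents are already loaded, and the helper name first_per_call_id is hypothetical.

```ruby
# Sample documents mirroring the collection in the question.
docs = [
  { 'hash' => { 'call_id' => '1234' }, 'something' => 'AAA' },
  { 'hash' => { 'call_id' => '1234' }, 'something' => 'BBB' },
  { 'hash' => { 'call_id' => '1234' }, 'something' => 'CCC' },
  { 'hash' => { 'call_id' => '5555' }, 'something' => 'DDD' },
  { 'hash' => { 'call_id' => '5555' }, 'something' => 'CCC' }
]

# "map": group the documents by hash.call_id.
# "reduce": keep only the first document of each group.
def first_per_call_id(docs)
  docs.group_by { |doc| doc['hash']['call_id'] }
      .map { |_call_id, group| group.first }
end

first_per_call_id(docs).each do |doc|
  puts "#{doc['hash']['call_id']} #{doc['something']}"
end
# prints:
# 1234 AAA
# 5555 DDD
```

Loading every document into memory only makes sense for small collections; for larger ones the server-side map-reduce above avoids transferring the duplicates.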

How to join query in mongodb?

I have user document collection like this:
User {
id:"001",
name:"John",
age:30,
friends:["userId1","userId2","userId3"....]
}
A user has many friends, I have the following query in SQL:
select * from user where id in (select friends from user where id=?) order by age
I would like to have something similar in MongoDB.
To get everything in a single query using the $lookup stage of the aggregation framework, try this:
db.User.aggregate(
    [
        // First step is to extract the "friends" field to work with the values
        {
            $unwind: "$friends"
        },
        // Lookup all the linked friends from the User collection
        {
            $lookup:
            {
                from: "User",
                localField: "friends",
                foreignField: "_id",
                as: "friendsData"
            }
        },
        // Sort the results by age
        {
            $sort: { 'friendsData.age': 1 }
        },
        // Flatten friendsData so that each result holds a single friend document
        {
            $unwind: "$friendsData"
        },
        // Group the friends by user id
        {
            $group:
            {
                _id: "$_id",
                friends: { $push: "$friends" },
                friendsData: { $push: "$friendsData" }
            }
        }
    ]
)
Let's say the content of your User collection is the following:
{
"_id" : ObjectId("573b09e6322304d5e7c6256e"),
"name" : "John",
"age" : 30,
"friends" : [
"userId1",
"userId2",
"userId3"
]
}
{ "_id" : "userId1", "name" : "Derek", "age" : 34 }
{ "_id" : "userId2", "name" : "Homer", "age" : 44 }
{ "_id" : "userId3", "name" : "Bobby", "age" : 12 }
The result of the query will be:
{
"_id" : ObjectId("573b09e6322304d5e7c6256e"),
"friends" : [
"userId3",
"userId1",
"userId2"
],
"friendsData" : [
{
"_id" : "userId3",
"name" : "Bobby",
"age" : 12
},
{
"_id" : "userId1",
"name" : "Derek",
"age" : 34
},
{
"_id" : "userId2",
"name" : "Homer",
"age" : 44
}
]
}
Edit: this answer only applies to versions of MongoDB prior to v3.2.
You can't do what you want in just one query. You would have to first retrieve the list of friend user ids, then pass those ids to a second query to retrieve the documents and sort them by age.
var user = db.user.findOne({"id" : "001"}, {"friends": 1});
db.user.find({"id" : {$in : user.friends}}).sort({"age" : 1});
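The two steps can be illustrated in plain Ruby, with an array of hashes standing in for the collection. This is a sketch of the logic only, not a driver call: the sample data is borrowed from the other answers in this thread, and the helper name friends_by_age is hypothetical.

```ruby
# In-memory stand-in for the user collection.
users = [
  { 'id' => '001', 'name' => 'John', 'age' => 30, 'friends' => %w[userId1 userId2 userId3] },
  { 'id' => 'userId1', 'name' => 'Derek', 'age' => 34 },
  { 'id' => 'userId2', 'name' => 'Homer', 'age' => 44 },
  { 'id' => 'userId3', 'name' => 'Bobby', 'age' => 12 }
]

# Step 1: fetch the user and read the list of friend ids.
# Step 2: fetch every user whose id is in that list (the $in part), sorted by age.
def friends_by_age(users, user_id)
  user = users.find { |u| u['id'] == user_id }
  return [] unless user

  ids = user.fetch('friends', [])
  users.select { |u| ids.include?(u['id']) }.sort_by { |u| u['age'] }
end

puts friends_by_age(users, '001').map { |u| u['name'] }.join(', ')
# prints: Bobby, Derek, Homer
```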
https://docs.mongodb.org/manual/reference/operator/aggregation/lookup/
This is the documentation for join queries in MongoDB ($lookup), a new feature as of version 3.2, so it should be helpful.
In Mongoose you can use .populate() and { populate : { path : 'field' } }.
Example:
Models:
mongoose.model('users', new Schema({
    name: String,
    status: Boolean,
    friends: [{type: Schema.Types.ObjectId, ref: 'users'}],
    posts: [{type: Schema.Types.ObjectId, ref: 'posts'}],
}));
mongoose.model('posts', new Schema({
    description: String,
    comments: [{type: Schema.Types.ObjectId, ref: 'comments'}],
}));
mongoose.model('comments', new Schema({
    comment: String,
    status: Boolean
}));
If you want to see your friends' posts, you can use this.
Users.find()                          // Collection 1
    .populate({
        path: 'friends',              // Collection 2
        populate: {path: 'posts'}     // Collection 3
    })
    .exec();
If you want to see your friends' posts together with all of their comments, you can use the following. You can also name the model explicitly when Mongoose cannot resolve the collection on its own and the query fails.
Users.find()                          // Collection 1
    .populate({
        path: 'friends',              // Collection 2
        populate: {
            path: 'posts',            // Collection 3
            populate: {path: 'comments', model: 'Collection'} // Collection 4 and more
        }
    })
    .exec();
Finally, if you want to get only some fields of a collection, you can use the select property. Example:
Users.find()
    .populate({
        path: 'friends',
        select: 'name status friends',
        populate: {path: 'comments'}
    })
    .exec();
MongoDB doesn't have joins, but in your case you can do:
db.coll.find({friends: userId}).sort({age: -1})
One way to emulate a join in MongoDB is to ask one collection for the ids that match, put those ids in a list (idlist), and then run a find on the other (or the same) collection with $in : idlist.
u = db.friends.find({"friends": ? }).toArray()
idlist= []
u.forEach(function(myDoc) { idlist.push(myDoc.id ); } )
db.friends.find({"id": {$in : idlist} } )
Populate only the friends array.
User.findOne({ _id: "userId"})
    .populate('friends')
    .exec((err, user) => {
        //do something
    });
The result looks like this:
{
    "_id" : "userId",
    "name" : "John",
    "age" : 30,
    "friends" : [
        { "_id" : "userId1", "name" : "Derek", "age" : 34 },
        { "_id" : "userId2", "name" : "Homer", "age" : 44 },
        { "_id" : "userId3", "name" : "Bobby", "age" : 12 }
    ]
}
See also: Mongoose - using Populate on an array of ObjectId
You can use playOrm to do what you want in one query (with S-SQL, Scalable SQL).
var p = db.sample1.find().limit(2),
    h = [];
for (var i = 0; i < p.length(); i++) {
    h.push(p[i]['name']);
}
db.sample2.find({ 'doc_name': { $in : h } });
It works for me.
You can do it in one go using mongo-join-query. Here is what it would look like:
const joinQuery = require("mongo-join-query");
joinQuery(
    mongoose.models.User,
    {
        find: {},
        populate: ["friends"],
        sort: { age: 1 },
    },
    (err, res) => (err ? console.log("Error:", err) : console.log("Success:", res.results))
);
The result will have your users ordered by age and all of the friends objects embedded.
How does it work?
Behind the scenes mongo-join-query will use your Mongoose schema to determine which models to join and will create an aggregation pipeline that will perform the join and the query.
