Simulating a join in ElasticSearch - join

Assume there are documents in an ES index that have two fields, user_id and action_id. How to count users such that there are documents both with action_id = 1 and action_id = 2?
Equivalent SQL would be
SELECT COUNT(DISTINCT `a`.`uuid`)
FROM `action` AS `a`
JOIN `action` AS `b` ON `a`.`user_id` = `b`.`user_id`
WHERE `a`.`action_id` = 1
AND `b`.`action_id` = 2
I found the only way to do so: request twice all unique user_ids with these action_ids and find intersection of resulting sets on the ES client. Yet this approach needs to transfer megabytes of data from ES, so I'm searching for an alternative.

You can do it like this:
first you have a query that filters your documents with actions 1 and 2 only (I have no idea if you can have other action types)
then the magic is with aggregations
the first aggregation is a terms one for user_id, so that you can do individual calculations per user
then you use a cardinality sub-aggregation to count the number of distinct actions per user. Since the query is for actions 1 and 2 that number can only be 1 or 2
then you use a bucket_selector sub-aggregation to only keep those users that have the cardinality result of 2.
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"action_id": [
1,
2
]
}
}
]
}
},
"aggs": {
"users": {
"terms": {
"field": "user_id",
"size": 10
},
"aggs": {
"actions": {
"cardinality": {
"field": "action_id"
}
},
"actions_count_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"totalActions": "actions"
},
"script": "totalActions >= 2"
}
}
}
}
}
}
The result will look like this:
"aggregations": {
"users": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 2,
"actions": {
"value": 2
}
},
{
"key": 5,
"doc_count": 2,
"actions": {
"value": 2
}
}
]
}
}
The keys are the user_ids whose actions are 1 and 2. bucket_selector aggregation is available in 2.x+ version of ES.

Related

How to find ids on array who is created facet operator

I have Customer collection on MongoDB. With status field. Which can have the same Id fields.
And I need find first changed value like 'Guest' and push it Id's to specific pipeline named as 'guests'.
And customers with status 'Member' I need push tu another pipeline named as 'members' who Id'd equal Id's from aggregation pipeline 'guests'.
This is done in order to obtain the quantity elements in 'guests' and 'members'.
Its member item:
{"_id"=>{"$oid"=>"5ce2ecb3ad71852e7fa9e73f"},
"status"=>"member",
"duration"=>nil,
"is_deleted"=>false,
"customer_id"=>"17601",
"customer_journal_item_id"=>"62769",
"customer_ids"=>"17601",
"customer_journal_item_ids"=>"62769",
"self_customer_status_id"=>"21078",
"self_customer_status_created_at"=>"2017-02-01T00:00:00.000Z",
"self_customer_status_updated_at"=>"2017-02-01T00:00:00.000Z",
"updated_at"=>"2019-05-20T18:06:43.655Z",
"created_at"=>"2019-05-20T18:06:43.655Z"}}
My aggregation
{
'$sort': {'self_customer_status_created_at': 1}
},
{'$match':
{
'self_customer_status_created_at':
{
"$gte": Time.parse('2017-01-17').beginning_of_month,
"$lte": Time.parse('2017-01-17').end_of_month
}
}
},
{
"$facet": {
"guests":
[
{
"$group": {
"_id": "$_id",
"data": {
'$first': '$$ROOT'
}
}
},
{
"$match": {
"data.status": "guest"
}
}, {
"$group": {
"_id":nil,
"array":{
"$push": "$data.self_customer_status_id"
}
}
},
{
"$project":{
"array": 1,
"_id":0
}
}
], "members":
[
{
"$group": {
"_id": "$_id", "data": {
'$last': '$$ROOT'
}
}
},
{
"$match": {
"data.status": "member",
"data.self_customer_status_id": {
"$in": [
"$guests.array"
]
}
}
}
}
]
}
}, {
"$project":
{
"members": 1,
"guests.array": 1
}
}
]
).as_json
Instead "guests.array" array? I have error:
Mongo::Error::OperationFailure: $in needs an array (2)
What am I doing wrong?
Sorry my English!
second expression in faced doesnt seen first expression
need delete
,
"data.self_customer_status_id": {
"$in": {
"$arrayElemAt":
[
"$guests.array",
0
]
}
}
{"$match": {"data.self_customer_status_id": { "$in": ["guests.array"] } } }
```
this link paste before $project

Application Insights API $select not returning all results when values share part of path

I'm not sure if this is an OData issue or an Application Insights issue, but the App Insights API is not giving me all of the values I selected. It works normally most of the time, but when I ask for two values that share the beginning of their path, it only gives me the second value I asked for.
Here's an example of my issue:
data:
{
"count": 1,
"type": "customEvent",
"customDimensions": {
"success": "true",
"version": "ver-1"
},
"other": {
"key": "val-1"
}
},
{
"count": 2,
"type": "customEvent",
"customDimensions": {
"success": "false",
"version": "ver-2"
},
"other": {
"key": "val-2"
}
}
These all return the results that I'm expecting:
Query: $select=count,type
{
"count": 1,
"type": "customEvent"
},
{
"count": 2,
"type": "customEvent"
}
Query: select=customDimensions/success,other/key
{
"customDimensions": {
"success":"true"
},
"other": {
"key":"ver-1"
}
},
{
"customDimensions": {
"success":"false"
},
"other": {
"key":"ver-2"
}
}
However, if I try to get two values that start with the same path, it only shows me the second one.
Query: select=customDimensions/success,customDimensions/version
{
"customDimensions": {
"version":"ver-1"
}
},
{
"customDimensions": {
"version":"ver-2"
}
}
Is this an issue with either OData or Application Insights, or is there some other way I can format my query to give me the information I want? Thanks!
Update:
You can use the query api as following to fetch the data:
https://api.applicationinsights.io/v1/apps/Your_application_id/query?query=requests
| where timestamp >ago(5h)
| project customDimensions.UsersNamed, customDimensions.TenantsCoded
I test it in postman, see screenshot below:
Seems that your App Insights query is ok, I tested it using this .
I fetch the operation/name and operation/id(which starts with same path), original like this:
Then input some necessary condition, as screenshot below:
After click "Fetch" button, you can see the operation/name and operation/id are both returned.

how to get the union of hashes in ruby for this json structure

Below is json I translated from ruby hash for ease of representation for this question using hash.to_json. Notice how the key range is being repeated since the values in the nested doc are different. How do I merge the ranges so that for the weight key both "gt": 2232, "lt": 4444 fall under the one hash key weight inside range. Is there some union or collapse method in ruby to sort of "compactify" hashes?
{
"must": [
{
"match": {
"status_type": "good"
}
},
{
"range": {
"created_date": {
"lte": 43252
}
}
},
{
"range": {
"created_date": {
"gt": "42323"
}
}
},
{
"range": {
"created_date": {
"gte": 523432
}
}
},
{
"range": {
"weight": {
"gt": 2232
}
}
},
{
"range": {
"weight": {
"lt": 4444
}
}
}
],
"should": [
{
"match": {
"product_age": "old"
}
}
]
}
Want to change the above to this:
{
"must": [
{
"range": {
"created_date": {
"gte": 523432,
"gt": "42323"
}
}
},
{
"range": {
"weight": {
"gt": 2232,
"lt": 4444
}
}
}
],
"should": [
{
"match": {
"product_age": "old"
}
}
]
}
I don't know of a built in way to handle something like this, but you could write a method that does something like this:
def collapse(array, key)
# Get only the hashes with :range
to_collapse = array.select { |elem| elem.has_key? key }
uncollapsed = array - to_collapse
# Get the hashes that :range points to
to_collapse = to_collapse.map { |elem| elem.values }.flatten
collapsed = {}
# Iterate through each range hash and their subsequent subhashes.
# Collapse the values into the collapsed hash as necessary
to_collapse.each do |elem|
elem.each do |k, v|
collapsed[k] = {} unless collapsed.has_key? k
v.each do |inner_key, inner_val|
collapsed[k][inner_key] = inner_val
end
end
end
[uncollapsed, collapsed].flatten
end
hash[:must] = collapse hash[:must], :range
Note that this is a specific solution that's mainly applicable to the presented problem. It only works for the hash/array depths specified here. You could probably write a recursive solution that could potentially work at any level of depth with a bit more work.

Finding nested hash in hashes of JSON

The JSON structure I have is as follows:
{
"result": {
"status": 1,
"num_results": 100,
"total_results": 500,
"results_remaining": 400,
"matches": [
{
"match_id": 381440228,
"match_seq_num": 347730324,
"start_time": 1384292236,
"lobby_type": 0
},
{
"match_id": 380304327,
"match_seq_num": 346687624,
"start_time": 1384203633,
"lobby_type": 0
}
]
There would be many more "matches" underneath that.
What I'm wondering is how I would pull one of the hashes in the 'matches' array by its 'match_id'.
Since the match_id is inside the hash, how would I pull the entire hash by searching for that value?
One way could be:
hash["result"]["matches"].select {|m| m["match_id"] == match_id }
How about
hash["result"]["matches"].find { |m| m["match_id"] == "your id here" }
See Enumberable#find.

Filter result based on a count of inner data

I am building my search query for some listing data. As part of the search people can ask for multiple rooms which sleeps a min amount of people, ie two rooms which sleep 2 and 3 people.
Im not sure how I can perform that with a filter.
Here is a shortened search query so far.
{
"query":{
"filtered":{
"query":{
"match_all":{}
}
}
},
"filter":{
"and":
[
{
"term":{
"status":"live"
}
},
{
"geo_bounding_box":{
"location":{
"top_left":"60.856553, -8.64935719999994",
"bottom_right":"49.8669688, 1.76270959999999"
}
}
}
,{
"range":{
"bedrooms":{
"gte":"2"
}
}
}
]
}
,
"size":10
}
Test Data
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":3,
"max_score":1.0,
"hits":[
{
"_index":"listings",
"_type":"listing",
"_id":"1",
"_score":1.0,
"_source":{
"name:":"Listing One",
"address1":"Some Street",
"bedrooms":2,
"city":"A City",
"id":1,
"refno":"FI451",
"user_id":1,
"rooms":[
{
"bathroom":"Shared bathroom with bath",
"double_standard":null,
"id":5,
"single":2,
"sleeps":2,
"title":"Twinny",
},
{
"bathroom":"Ensuite with bath",
"double_king_size":1,
"double_standard":1,
"id":1,
"single":null,
"sleeps":2,
"title":"Double Ensuite Room",
}
]
}
},
{
"_index":"listings",
"_type":"listing",
"_id":"2",
"_score":1.0,
"_source":{
"name":"Listing Two",
"address1":"Some Street",
"bedrooms":2,
"city":"A City",
"id":2,
"refno":"BL932",
"user_id":1,
"rooms":[
{
"bathroom":"Ensuite with bath",
"double_standard":1,
"id":4,
"single":1,
"sleeps":3,
"title":"Family Room",
},
{
"bathroom":"Ensuite with shower",
"double_standard":1,
"id":2,
"single":null,
"sleeps":2,
"title":"Single Room",
}
]
}
},
{
"_index":"listings",
"_type":"listing",
"_id":"3",
"_score":1.0,
"_source":{
"name":"Listing Three",
"address1":"Another Address",
"bedrooms":1,
"city":"Your City",
"id":3,
"refno":"TE2116",
"user_id":1,
"rooms":[
{
"bathroom":"Ensuite with shower",
"double_king_size":null,
"double_standard":1,
"id":3,
"single":1,
"sleeps":3,
"title":"Family Room",
}
]
}
}
]
}
}
If you look at my data I have 3 listings, two of them have multiple rooms (Listing One & Two) but only Listing Two would match my search, Reason it has one room with that sleeps two and the other sleeps three.
Is it possible to perform this query with elasticsearch?
If what you want is "Find all listings where a bedroom sleeps 2 AND another bedroom sleeps 3", this query will work. It makes one big assumptions: that you are using inner objects, and not the Nested data type.
This query is using the fact that inner objects are collapsed into a single field, causing "rooms.sleeps" to equal [2,3] for the desired field. Since the field is collapsed into a single array, a simple Terms query will match them. When you change the execution mode to And, it forces both 2 and 3 to be matched.
The caveat is that a room that has [2,3,4] will also be matched.
I've omitted the geo and status portion since that data wasn't provided in the source documents.
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
}
},
"filter": {
"and": [
{
"range": {
"bedrooms": {
"gte": "2"
}
}
},
{
"terms": {
"rooms.sleeps": [2,3],
"execution": "and"
}
}
]
},
"size": 10
}
As far as I know the filter has to be a sibling of the query inside the filtered element. See: http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/
If you combine that with Zach's solution it should work.
{
"query":
{
"filtered":
{
"query":
{
"match_all":{}
},
"filter":
{
"put" : "your filter here"
}
}
}
}

Resources