Filter result based on a count of inner data - ruby-on-rails

I am building a search query for some listing data. As part of the search, people can ask for multiple rooms that each sleep a minimum number of people, e.g. two rooms, one sleeping 2 and one sleeping 3 people.
I'm not sure how to do that with a filter.
Here is a shortened version of the search query so far.
{
"query":{
"filtered":{
"query":{
"match_all":{}
}
}
},
"filter":{
"and":
[
{
"term":{
"status":"live"
}
},
{
"geo_bounding_box":{
"location":{
"top_left":"60.856553, -8.64935719999994",
"bottom_right":"49.8669688, 1.76270959999999"
}
}
}
,{
"range":{
"bedrooms":{
"gte":"2"
}
}
}
]
}
,
"size":10
}
Test Data
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":3,
"max_score":1.0,
"hits":[
{
"_index":"listings",
"_type":"listing",
"_id":"1",
"_score":1.0,
"_source":{
"name":"Listing One",
"address1":"Some Street",
"bedrooms":2,
"city":"A City",
"id":1,
"refno":"FI451",
"user_id":1,
"rooms":[
{
"bathroom":"Shared bathroom with bath",
"double_standard":null,
"id":5,
"single":2,
"sleeps":2,
"title":"Twinny"
},
{
"bathroom":"Ensuite with bath",
"double_king_size":1,
"double_standard":1,
"id":1,
"single":null,
"sleeps":2,
"title":"Double Ensuite Room"
}
]
}
},
{
"_index":"listings",
"_type":"listing",
"_id":"2",
"_score":1.0,
"_source":{
"name":"Listing Two",
"address1":"Some Street",
"bedrooms":2,
"city":"A City",
"id":2,
"refno":"BL932",
"user_id":1,
"rooms":[
{
"bathroom":"Ensuite with bath",
"double_standard":1,
"id":4,
"single":1,
"sleeps":3,
"title":"Family Room"
},
{
"bathroom":"Ensuite with shower",
"double_standard":1,
"id":2,
"single":null,
"sleeps":2,
"title":"Single Room"
}
]
}
},
{
"_index":"listings",
"_type":"listing",
"_id":"3",
"_score":1.0,
"_source":{
"name":"Listing Three",
"address1":"Another Address",
"bedrooms":1,
"city":"Your City",
"id":3,
"refno":"TE2116",
"user_id":1,
"rooms":[
{
"bathroom":"Ensuite with shower",
"double_king_size":null,
"double_standard":1,
"id":3,
"single":1,
"sleeps":3,
"title":"Family Room"
}
]
}
}
]
}
}
My data contains three listings. Two of them (Listing One and Listing Two) have multiple rooms, but only Listing Two should match my search, because it has one room that sleeps two and another that sleeps three.
Is it possible to perform this query with Elasticsearch?

If what you want is "find all listings where one bedroom sleeps 2 AND another bedroom sleeps 3", this query will work. It makes one big assumption: that you are using inner objects and not the nested data type.
The query relies on the fact that inner objects are collapsed into a single field, so "rooms.sleeps" equals [2,3] for the desired listing. Since the values are collapsed into a single array, a simple terms filter will match them. Changing the execution mode to "and" forces both 2 and 3 to be matched.
The caveat is that a listing whose rooms sleep [2,3,4] will also be matched.
I've omitted the geo and status portions since that data wasn't provided in the source documents.
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
}
},
"filter": {
"and": [
{
"range": {
"bedrooms": {
"gte": "2"
}
}
},
{
"terms": {
"rooms.sleeps": [2,3],
"execution": "and"
}
}
]
},
"size": 10
}
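To see why the collapsed field matches, here is a small plain-Python sketch (not Elasticsearch code) that mimics the behaviour on the sample listings above. The helper names `flatten_sleeps` and `matches` are hypothetical and exist only for illustration. Only the `bedrooms` and `sleeps` values are taken from the test data.

```python
# Sample listings from the question, reduced to the fields the filter uses.
listings = [
    {"name": "Listing One", "bedrooms": 2, "rooms": [{"sleeps": 2}, {"sleeps": 2}]},
    {"name": "Listing Two", "bedrooms": 2, "rooms": [{"sleeps": 3}, {"sleeps": 2}]},
    {"name": "Listing Three", "bedrooms": 1, "rooms": [{"sleeps": 3}]},
]


def flatten_sleeps(listing):
    """Mimic inner-object collapsing: all rooms.sleeps values in one list."""
    return [room["sleeps"] for room in listing["rooms"]]


def matches(listing, required=(2, 3), min_bedrooms=2):
    """Mimic the range filter plus a terms filter with execution 'and'."""
    sleeps = set(flatten_sleeps(listing))
    return listing["bedrooms"] >= min_bedrooms and all(v in sleeps for v in required)


matching = [listing["name"] for listing in listings if matches(listing)]
print(matching)
```

Because the values are treated as a set here, the sketch also suggests a limitation: a request such as "two rooms that each sleep 2" cannot be told apart from "one room that sleeps 2" with this approach.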

As far as I know, the filter has to be a sibling of the query inside the filtered element. See: http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/
If you combine that with Zach's solution, it should work.
{
"query":
{
"filtered":
{
"query":
{
"match_all":{}
},
"filter":
{
"put" : "your filter here"
}
}
}
}
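As a rough sketch of that combination, the request body could be assembled like this in Python. The function name `build_listing_query` and its parameters are assumptions made for illustration. The filters themselves are the ones from the question and from Zach's answer, and the `filtered` query and `and` filter belong to the older Elasticsearch versions this thread is about.

```python
import json


def build_listing_query(sleeps_values, min_bedrooms=2, size=10):
    """Build a filtered query with the filter nested next to the query."""
    filters = [
        {"term": {"status": "live"}},
        {"range": {"bedrooms": {"gte": min_bedrooms}}},
        # Zach's terms filter: every listed value must be present.
        {"terms": {"rooms.sleeps": list(sleeps_values), "execution": "and"}},
    ]
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"and": filters},
            }
        },
        "size": size,
    }


body = build_listing_query([2, 3])
print(json.dumps(body, indent=2))
```

The geo_bounding_box filter from the question can be appended to `filters` in the same way.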

Related

How to find IDs in an array created by the $facet operator

I have a Customer collection in MongoDB with a status field. Several documents can share the same ID fields.
I need to find the first status value, such as 'guest', and push those IDs into a sub-pipeline named 'guests'.
Customers with status 'member' should be pushed into another sub-pipeline named 'members' if their IDs equal the IDs from the 'guests' sub-pipeline.
The goal is to get the number of elements in 'guests' and 'members'.
Here is a member item:
{"_id"=>{"$oid"=>"5ce2ecb3ad71852e7fa9e73f"},
"status"=>"member",
"duration"=>nil,
"is_deleted"=>false,
"customer_id"=>"17601",
"customer_journal_item_id"=>"62769",
"customer_ids"=>"17601",
"customer_journal_item_ids"=>"62769",
"self_customer_status_id"=>"21078",
"self_customer_status_created_at"=>"2017-02-01T00:00:00.000Z",
"self_customer_status_updated_at"=>"2017-02-01T00:00:00.000Z",
"updated_at"=>"2019-05-20T18:06:43.655Z",
"created_at"=>"2019-05-20T18:06:43.655Z"}
My aggregation
{
'$sort': {'self_customer_status_created_at': 1}
},
{'$match':
{
'self_customer_status_created_at':
{
"$gte": Time.parse('2017-01-17').beginning_of_month,
"$lte": Time.parse('2017-01-17').end_of_month
}
}
},
{
"$facet": {
"guests":
[
{
"$group": {
"_id": "$_id",
"data": {
'$first': '$$ROOT'
}
}
},
{
"$match": {
"data.status": "guest"
}
}, {
"$group": {
"_id":nil,
"array":{
"$push": "$data.self_customer_status_id"
}
}
},
{
"$project":{
"array": 1,
"_id":0
}
}
], "members":
[
{
"$group": {
"_id": "$_id", "data": {
'$last': '$$ROOT'
}
}
},
{
"$match": {
"data.status": "member",
"data.self_customer_status_id": {
"$in": [
"$guests.array"
]
}
}
}
]
}
}, {
"$project":
{
"members": 1,
"guests.array": 1
}
}
]
).as_json
Instead of "$guests.array" being resolved to an array, I get this error:
Mongo::Error::OperationFailure: $in needs an array (2)
What am I doing wrong?
Sorry for my English!
The second sub-pipeline in $facet cannot see the result of the first one.
You need to delete this condition from the "members" $match:
,
"data.self_customer_status_id": {
"$in": {
"$arrayElemAt":
[
"$guests.array",
0
]
}
}
and paste this stage before the $project:
{"$match": {"data.self_customer_status_id": { "$in": ["guests.array"] } } }
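The underlying idea can be illustrated with a plain-Python sketch that does not talk to MongoDB at all: each $facet sub-pipeline receives the same input documents and none of them sees another's output, so the comparison against the guest IDs has to happen in a later step. The sample documents and the function names below are hypothetical.

```python
# Hypothetical sample documents with only the fields used in the question.
docs = [
    {"_id": 1, "status": "guest", "self_customer_status_id": "100"},
    {"_id": 2, "status": "guest", "self_customer_status_id": "101"},
    {"_id": 3, "status": "member", "self_customer_status_id": "100"},
    {"_id": 4, "status": "member", "self_customer_status_id": "999"},
]


def guests_pipeline(input_docs):
    """Sub-pipeline 'guests': collect the IDs of guest documents."""
    return [d["self_customer_status_id"] for d in input_docs if d["status"] == "guest"]


def members_pipeline(input_docs):
    """Sub-pipeline 'members': keep member documents, no access to 'guests'."""
    return [d for d in input_docs if d["status"] == "member"]


# Like $facet: every sub-pipeline gets the same input, independently.
facet_result = {"guests": guests_pipeline(docs), "members": members_pipeline(docs)}

# A later step can see both results and compare them.
guest_ids = set(facet_result["guests"])
matched_members = [
    m for m in facet_result["members"] if m["self_customer_status_id"] in guest_ids
]
counts = {"guests": len(facet_result["guests"]), "members": len(matched_members)}
print(counts)
```

In this sketch the member with ID "999" is dropped only after both facets have been computed, which mirrors moving the comparison out of the "members" sub-pipeline.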

Simulating a join in ElasticSearch

Assume there are documents in an ES index that have two fields, user_id and action_id. How can I count the users that have documents with both action_id = 1 and action_id = 2?
The equivalent SQL would be
SELECT COUNT(DISTINCT `a`.`uuid`)
FROM `action` AS `a`
JOIN `action` AS `b` ON `a`.`user_id` = `b`.`user_id`
WHERE `a`.`action_id` = 1
AND `b`.`action_id` = 2
The only way I have found is to request all unique user_ids for each of these action_ids separately and intersect the resulting sets on the ES client. That approach requires transferring megabytes of data from ES, so I'm looking for an alternative.
You can do it like this:
First, use a query that keeps only the documents with action 1 or 2 (I have no idea whether you can have other action types).
Then the magic is in the aggregations:
The first aggregation is a terms aggregation on user_id, so that you can do individual calculations per user.
Then a cardinality sub-aggregation counts the number of distinct actions per user. Since the query only matches actions 1 and 2, that number can only be 1 or 2.
Finally, a bucket_selector sub-aggregation keeps only those users whose cardinality result is 2.
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"action_id": [
1,
2
]
}
}
]
}
},
"aggs": {
"users": {
"terms": {
"field": "user_id",
"size": 10
},
"aggs": {
"actions": {
"cardinality": {
"field": "action_id"
}
},
"actions_count_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"totalActions": "actions"
},
"script": "totalActions >= 2"
}
}
}
}
}
}
The result will look like this:
"aggregations": {
"users": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 2,
"actions": {
"value": 2
}
},
{
"key": 5,
"doc_count": 2,
"actions": {
"value": 2
}
}
]
}
}
The keys are the user_ids that have both action 1 and action 2. The bucket_selector aggregation is available in ES 2.x and later.
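The logic of the three steps can be sketched in plain Python to make clear what each part contributes. This only simulates the aggregation on an in-memory list. The sample documents and the function name `users_with_all_actions` are made up for illustration.

```python
from collections import defaultdict


def users_with_all_actions(docs, wanted=(1, 2)):
    """Return the user_ids that have a document for every wanted action_id."""
    wanted = set(wanted)
    actions_per_user = defaultdict(set)
    for doc in docs:
        # Step 1, the query: keep only documents with a wanted action.
        if doc["action_id"] in wanted:
            # Steps 2 and 3, terms + cardinality: distinct actions per user.
            actions_per_user[doc["user_id"]].add(doc["action_id"])
    # Step 4, bucket_selector: keep users whose distinct count reaches 2.
    return sorted(
        user for user, actions in actions_per_user.items()
        if len(actions) >= len(wanted)
    )


docs = [
    {"user_id": 1, "action_id": 1},
    {"user_id": 1, "action_id": 2},
    {"user_id": 3, "action_id": 1},
    {"user_id": 5, "action_id": 2},
    {"user_id": 5, "action_id": 1},
    {"user_id": 7, "action_id": 3},
    {"user_id": 7, "action_id": 2},
]

matching_users = users_with_all_actions(docs)
print(matching_users, len(matching_users))
```

Two differences from the real aggregation are worth keeping in mind: the terms aggregation in the query above returns at most `size` buckets (10 there), and the cardinality aggregation is an approximate count in general, so the sketch is exact where Elasticsearch may not be.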

Filtered search with Authorization for Elasticsearch

I'm trying to do a search where I look for "test" in any field while filtering for a specific client in the client_id field. I can't seem to figure this one out. This is how far I got (but it's not working):
{
query: {
filtered: {
query: "test",
filter: {
term: {client_id: @client.id}
}
}
}
}
This is the right syntax:
{
"query": {
"filtered": {
"query": {
"match": {
"_all": "test"
}
},
"filter": {
"term": {
"client_id": @client.id
}
}
}
}
}
From the ES docs: the _all field allows you to search for values in documents without knowing which field contains the value.
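A minimal sketch of building that body programmatically, so that the client id is inserted as a real value rather than as a placeholder in the JSON text. The function name `build_client_search` and the example id 42 are hypothetical. The query structure is the one from the answer above.

```python
import json


def build_client_search(text, client_id):
    """Search the _all field for text, restricted to one client."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"_all": text}},
                "filter": {"term": {"client_id": client_id}},
            }
        }
    }


body = build_client_search("test", 42)
print(json.dumps(body))
```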

How to build a nested structure with Rails and ElasticSearch?

I have a Feature model that belongs_to FeatureKey and FeatureValue.
FeatureKey#name => 'color'
FeatureValue#name => 'red'
I would like to generate a nested aggregation structure to build a shopping cart filter (facet) navigation.
Ideally, the structure would look something like
{ features: [
{ key: color, values: [ red, blue, yellow ] },
{ key: size, values: [ large, medium, small ] }
]}
Can anyone suggest how I can do this?
What I'm currently using:
{
"size":1000,
"fields":[
"id",
"name",
"price"
],
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"term":{
"categories":4838
}
}
]
}
}
}
},
"aggs":{
"price":{
"stats":{
"field":"price"
}
},
"discounted":{
"terms":{
"field":"discounted"
}
},
"stock":{
"filter":{
"range":{
"stock":{
"gt":0
}
}
}
},
"colour":{
"terms":{
"field":"colour"
}
},
"size":{
"terms":{
"field":"size"
}
}
}
}
Add or remove aggregations as you wish. You most likely wish to filter by category, so I left that in for simplicity's sake.
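To get from terms aggregations like these to the structure asked for in the question, the response can be reshaped on the client. The following is only a sketch: the function name `features_from_aggregations` and the sample response are hypothetical, and it assumes the usual terms-aggregation response shape with a `buckets` list of `key`/`doc_count` entries.

```python
def features_from_aggregations(aggregations, feature_names):
    """Turn terms aggregation buckets into [{key, values}] entries."""
    features = []
    for name in feature_names:
        buckets = aggregations.get(name, {}).get("buckets", [])
        features.append({"key": name, "values": [b["key"] for b in buckets]})
    return {"features": features}


# Hypothetical excerpt of the "aggregations" part of a search response.
sample_aggregations = {
    "colour": {"buckets": [
        {"key": "red", "doc_count": 12},
        {"key": "blue", "doc_count": 7},
    ]},
    "size": {"buckets": [
        {"key": "large", "doc_count": 9},
        {"key": "small", "doc_count": 4},
    ]},
    "price": {"min": 1.0, "max": 99.0},
}

result = features_from_aggregations(sample_aggregations, ["colour", "size"])
print(result)
```

Aggregations without buckets, such as the price stats, are simply not listed in `feature_names` and are left for separate handling.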

ElasticSearch returns items that are too far away when using a geo_distance filter

When I search my ElasticSearch documents using a nested filter -> and -> geo_distance, I retrieve documents that are too far away (and that I don't want returned). You can see the query and a screenshot below of the results (raw results on the left and manually filtered results on the right).
Here's another copy of the query:
{
"query":{
"match_all":{
}
},
"filter":{
"and":[
{
"term":{
"PropertySubType":"Single Family"
}
},
{
"term":{
"City":"Los Angeles"
}
},
{
"geo_distance":{
"distance":"2.25miles",
"Location":[
34.111583657,
-118.324646099
]
}
},
{
"range":{
"BedroomsTotal":{
"gte":3
}
}
},
{
"range":{
"BuildingSize":{
"gte":3000
}
}
},
{
"range":{
"YearBuilt":{
"lte":2000
}
}
},
{
"terms":{
"ListingStatus":[
"Active",
"Pending",
"Closed"
]
}
}
]
},
"size":100
}
Adding the option "distance_type" and setting it to "plane" fixed this issue. See "distance_type" here:
http://www.elasticsearch.org/guide/reference/query-dsl/geo-distance-filter.html
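To double-check which hits really lie within 2.25 miles, the manual filtering mentioned in the question can be reproduced with a haversine calculation. This is a sketch under assumptions: the sample hits and the helper names are made up, the Earth radius is the common mean value of about 3958.8 miles, and the coordinates from the query are read as latitude, longitude. Elasticsearch documents the array form of a geo point as [lon, lat], so the ordering in the query may be worth verifying as well.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_distance(center, hits, max_miles):
    """Keep only the hits whose location is within max_miles of center."""
    lat0, lon0 = center
    return [h for h in hits
            if haversine_miles(lat0, lon0, h["lat"], h["lon"]) <= max_miles]


center = (34.111583657, -118.324646099)  # point from the query, read as lat, lon
hits = [  # hypothetical hits
    {"id": "near", "lat": 34.12, "lon": -118.33},
    {"id": "far", "lat": 34.05, "lon": -118.25},
]

nearby = within_distance(center, hits, 2.25)
print([h["id"] for h in nearby])
```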
