rails elasticsearch searching a nested json field - ruby-on-rails

I'm using the elasticsearch-model gem for a model that has a has_many relationship. To match the documentation, let's say the model is Article and the relationship is has_many categories. So I wrote the custom serializer as follows (directly from the documentation):
def as_indexed_json(options={})
  self.as_json(
    include: { categories: { only: :title } }
  )
end
The serialization seems to work. The result of an example Article's as_indexed_json contains a "categories" => [{"title" => "one"}, {"title" => "two"}, {"title" => "three"}] block.
What I'm struggling with, and haven't been able to find in the documentation, is how to search this nested field.
Here's what I've tried:
From the elasticsearch documentation on nested query I figured it ought to look like this:
r = Article.search query: {
nested: {
path: "categories",
query: {match: {title: "one"}}
}
}
but when I do r.results.first I get an error: nested object under path [categories] is not of nested type]...
I've tried changing the one line in the serializer to: include: { categories: { type: "nested", only: :title} } but that doesn't change anything; it still says that categories is not of a nested type.
Of course, I tried just querying the field without any nesting too like this:
r = Article.search query: {match: {categories: 'one'}}
But that just doesn't return any results.
Full text search like this:
r = Article.search query: {match: {_all: 'one'}}
Returns results, but of course I only want to search for 'one' in the categories field.
Any help would be much appreciated!

Rails does not create a nested mapping in Elasticsearch for you. With dynamic mapping, Elasticsearch maps categories as a plain object rather than as a nested type ("When Elasticsearch encounters a previously unknown field in a document, it uses dynamic mapping to determine the datatype for the field and automatically adds the new field to the type mapping as an object without nesting them"). To make categories a nested object, you need to create the mapping again in Elasticsearch with categories declared as a nested type; note the "type": "nested" on categories:
PUT /my_index
{
  "mappings": {
    "article": {
      "properties": {
        "categories": {
          "type": "nested",
          "properties": {
            "name": { "type": "string" },
            "comment": { "type": "string" },
            "age": { "type": "short" },
            "stars": { "type": "short" },
            "date": { "type": "date" }
          }
        }
      }
    }
  }
}
After this you can either reindex the data from the client or, if you want zero downtime, read here.
P.S.: You have to delete the old mapping for that index before creating a new one.

Ok, so it looks like: r = Article.search query: {match: {"categories.title" => 'one'}} works, but I'll leave the question open in case someone can explain what's going on with the nested thing...
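As for what is going on: under dynamic mapping, categories is indexed as a plain object, so the titles of all categories end up flattened under the dotted field name categories.title. That is why matching on "categories.title" works, while matching on categories itself finds nothing and the nested query is rejected. The sketch below builds both query shapes as plain Ruby hashes; the helper names are hypothetical and not part of elasticsearch-model. It also assumes that, inside a nested clause, the field is addressed by its full dotted path.

```ruby
require "json"

# Hypothetical helper: query for a field mapped as a plain object.
# The sub-field is addressed by its dotted path.
def object_field_query(path, field, value)
  { query: { match: { "#{path}.#{field}" => value } } }
end

# Hypothetical helper: query for a field mapped with "type": "nested".
# The same match is wrapped in a nested clause that names the path.
def nested_field_query(path, field, value)
  {
    query: {
      nested: {
        path: path,
        query: { match: { "#{path}.#{field}" => value } }
      }
    }
  }
end

# Print the request body that the nested variant would send.
puts JSON.pretty_generate(nested_field_query("categories", "title", "one"))
```

Either hash could be passed to Article.search in place of the inline hashes above; the nested variant only succeeds once the index has been recreated with categories mapped as nested.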

Related

Specifying map key regex for additionalProperties through Swagger/OpenAPI [duplicate]

The API I'm trying to describe has a structure where the root object can contain an arbitrary number of child objects (properties that are themselves objects). The "key", or property in the root object, is the unique identifier of the child object, and the value is the rest of the child object's data.
{
"child1": { ... bunch of stuff ... },
"child2": { ... bunch of stuff ... },
...
}
This could similarly be modeled as an array, e.g.:
[
{ "id": "child1", ... bunch of stuff ... },
{ "id": "child2", ... bunch of stuff ... },
...
]
but this both makes it structurally less clear what the identifying property is and makes uniqueness among the children's ID implicit rather than explicit, so we want to use an object, or a map.
I've seen the Swagger documentation for Model with Map/Dictionary Properties, but that doesn't adequately suit my use case. Writing something like:
"Parent": {
  "additionalProperties": {
    "$ref": "#/components/schemas/Child"
  }
}
Yields something like this:
This adequately communicates the descriptiveness of the value in the property, but how do I document what the restrictions are for the "key" in the object? Ideally I'd like to say something like "it's not just any arbitrary string, it's the ID that corresponds to the child". Is this supported in any way?
Your example is correct.
how do I document what the restrictions are for the "key" in the object? Ideally I'd like to say something like "it's not just any arbitrary string, it's the ID that corresponds to the child". Is this supported in any way?
OpenAPI 3.1
OAS 3.1 fully supports JSON Schema 2020-12, including patternProperties. This keyword lets you define the format of dictionary keys by using a regular expression:
"Parent": {
  "type": "object",
  "patternProperties": {
    "^child\\d+$": {
      "$ref": "#/components/schemas/Child"
    }
  },
  "description": "A map of `Child` schemas, where the keys are IDs that correspond to the child"
}
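To make concrete what such a key pattern accepts, here is a minimal Ruby sketch that checks the keys of a parsed payload against the same regular expression. The helper name and the sample payload are hypothetical. Note that patternProperties on its own only says which schema applies to matching keys; keys that do not match are still allowed unless the schema also sets "additionalProperties": false.

```ruby
# Same pattern as in the schema. Ruby's ^ and $ match at line boundaries,
# so \A and \z are used to anchor to the whole string instead.
KEY_PATTERN = /\Achild\d+\z/

# Hypothetical helper: return the keys that do not look like child IDs.
def invalid_keys(parent)
  parent.keys.reject { |key| key.match?(KEY_PATTERN) }
end

payload = { "child1" => {}, "child22" => {}, "other" => {} }
puts invalid_keys(payload).inspect
```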
Or, if the property names are defined by an enum, you can use propertyNames to define that enum:
"Parent": {
  "type": "object",
  "propertyNames": {
    "enum": ["foo", "bar"]
  },
  "additionalProperties": {
    "$ref": "#/components/schemas/Child"
  }
}
OpenAPI 3.0 and 2.0
Dictionary keys are assumed to be strings, but there's no way to limit the contents/format of keys. You can document any restrictions and specifics verbally in the schema description. Adding schema examples could help illustrate what your dictionary/map might look like.
"Parent": {
  "type": "object",
  "additionalProperties": {
    "$ref": "#/components/schemas/Child"
  },
  "description": "A map of `Child` schemas, where the keys are IDs that correspond to the child",
  "example": {
    "child1": { ... bunch of stuff ... },
    "child2": { ... bunch of stuff ... }
  }
}
If the possible key names are known (for example, they are part of an enum), you can define your dictionary as a regular object and the keys as individual object properties:
// Keys can be: key1, key2, key3
"Parent": {
  "type": "object",
  "properties": {
    "key1": { "$ref": "#/components/schemas/Child" },
    "key2": { "$ref": "#/components/schemas/Child" },
    "key3": { "$ref": "#/components/schemas/Child" }
  }
}
Then you can add "additionalProperties": false to really ensure that only those keys are used.

How to construct nested query using criteria and criteriaQuery in spring-data-elasticsearch

I am using spring-data-elasticsearch. I have constructed most of the query conditions using Criteria with lots of sub-criteria. Now I want to include a simple query condition for a nested field. But Criteria builds the query using the query_string API, which does not work for nested fields. I am expecting a nested query.
How to support this using Criteria without NativeSearchQuery?
Nested Mapping
{
  "ae": {
    "type": "nested",
    "properties": {
      "atb": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "su": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
I want to query the "ae.su.keyword" field. Building a CriteriaQuery on this field constructs a query_string query, which does not return the expected documents. Is there any way to build a nested query using Criteria, or to override the existing criteria query?
Criteria criteria = new Criteria("ae.su.keyword").is("VALUE");
CriteriaQuery query = new CriteriaQuery(criteria);
elasticOperations.search(query, Foo.class, index);
Currently Spring Data Elasticsearch's CriteriaQuery does not support creating nested queries.
The query that is created currently with your example code is (cleaning out irrelevant parts):
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "one",
            "fields": [
              "ae.su.keyword^1.0"
            ]
          }
        }
      ]
    }
  }
}
With the nested object you have the needed query would be:
{
  "query": {
    "nested": {
      "path": "ae",
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "one",
                "fields": [
                  "ae.su.keyword"
                ]
              }
            }
          ]
        }
      }
    }
  }
}
basically wrapping the query that is built now in an additional nested query.
As I wrote, creating this nested query is not supported at the moment. What you can do, besides building a NativeSearchQuery, is use a StringQuery (yes, it's very ugly as well):
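// Build the whole template first so that replace() sees the "$1" placeholder.
// Note: value is inserted verbatim, so it must not contain characters that break the JSON string.
String template = "{\"nested\":{\"path\":\"ae\"," +
    "\"query\":{\"bool\":{\"must\":[{\"query_string\":{\"query\":\"$1\"," +
    "\"fields\":[\"ae.su.keyword\"]}}]}}}}";
Query query = new StringQuery(template.replace("$1", value));
return operations.search(query, Foo.class);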
Edit 01.04.2021:
This topic came up again in an issue, so I opened a bug issue to address it. I will implement the fix in the next few days.
Edit 05.04.2021:
fixed from 4.2.0.GA on

Searchkick highlights unnecessary word

Searchkick is highlighting unnecessary words. For example, I'm searching for "Document CASE No. 2015-331".
Here is the list of what Searchkick highlights:
"Document CASE No. 2015-331"
"Not"
"no"
"on"
"case is" <----- this is very weird, I don't know why this is highlighted lol
"2015"
"2017"
"2018"
"2016"
"to"
"Not to"
Here's what my search looks like:
search = ::Document.search params[:q], fields: [:content], where: {id: params[:id]},
  highlight: { tag: 'span class=match-matcher', fragment_size: @document.content.length }
search.with_highlights.each do |document, highlights|
  document.content = highlights[:content]
end
The goal here is to highlight only "Document CASE No. 2015-331".
It looks like your field was analyzed when it was indexed, so the search matches individual tokens.
Here you are looking for an exact match. For that, the field should be mapped as "not_analyzed" (a keyword field in newer Elasticsearch versions) and the data needs to be re-indexed.
Modify the mapping for your field by adding something like the following:
"mappings": {
  "properties": {
    "city": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}

serializing array when nested with other attributes in rails 5

I have a ruby (2.4.0p0) rails (5.0.2) controller from which I wish to return a json result containing a list of Thing objects as well as some high level info (such as next and previous from Kaminari paging).
Consider a Thing with an association to Owner. Thing has a owner_id attribute.
For @things = Thing.page(1).per(2) I will be able to use
render json: @things
and get:
[
{ "id": 1, "owner_id": 1, "name": "thing1" },
{ "id": 2, "owner_id": 1, "name": "thing2" }
]
Good. If I then create a serializer called ThingSerializer.rb and define owner such that it adds "owner":"CatInHat" instead of "owner_id":1
This works as well;
[
{ "id": 1, "owner": "CatInHat", "name": "thing1" },
{ "id": 2, "owner": "CatInHat", "name": "thing2" }
]
This is good, but, my problem comes when I want to add higher level data and label the list as "results" such as when I try;
render json: { next: "some_url_link", previous: "some_other_url_link", results: @things }
I'd like to get;
{ "next": "some_url_link",
  "prev": "some_other_url_link",
  "results": [ { "id": 1, "owner": "CatInHat", "name": "thing1" }, { "id": 2, "owner": "CatInHat", "name": "thing2" } ]
}
What I get is nearly the above but with "owner_id":1 instead of "owner":"CatInHat" - my serializer does not seem to be used when I label and nest my list of things. What is the appropriate way to use my serializer and get this output?
If I create config/initializers/active_model_serializers.rb and add
ActiveModel::Serializer.config.adapter = :json_api
It gives me an api which is similar but I don't know if it can be customized to fit the spec I need above.
thank you for any help
It looks like the serialization logic in render json: ... only kicks in if the attribute is an ActiveRecord object or an array of ActiveRecord objects. Since you are giving it a hash, it will not inspect the individual attributes and recursively apply the serializers.
You can try manually applying ThingSerializer:
render json: {
  next: ...,
  prev: ...,
  results: @things.map { |thing|
    ThingSerializer.new(thing).attributes
  },
}
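The same idea can be tried outside Rails. The following is a minimal sketch using plain Ruby stand-ins: the Thing struct, PlainThingSerializer, and paginated_payload are all hypothetical names that only mimic the shape of the model and serializer from the question. The point is that each record is serialized explicitly before it is placed inside the wrapping hash.

```ruby
require "json"

# Hypothetical stand-in for the ActiveRecord model from the question.
Thing = Struct.new(:id, :owner_name, :name)

# Hypothetical stand-in for ThingSerializer: exposes "owner" instead of "owner_id".
class PlainThingSerializer
  def initialize(thing)
    @thing = thing
  end

  def attributes
    { id: @thing.id, owner: @thing.owner_name, name: @thing.name }
  end
end

# Serialize every record first, then wrap the list with the paging links.
def paginated_payload(things, next_url:, previous_url:)
  {
    next: next_url,
    previous: previous_url,
    results: things.map { |thing| PlainThingSerializer.new(thing).attributes }
  }
end

things = [Thing.new(1, "CatInHat", "thing1"), Thing.new(2, "CatInHat", "thing2")]
puts JSON.generate(
  paginated_payload(things, next_url: "some_url_link", previous_url: "some_other_url_link")
)
```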

Updating multiple documents on nested object change

I am using the elasticsearch-rails and elasticsearch-model gems for my Ruby on Rails app, which is like a question-and-answer site.
My main question is: how do you tell Elasticsearch which documents to update when there was a change to a nested object that's nested in multiple documents?
I have one index my_index and mappings for question, and answer. In particular, question has a nested object with a user:
"question": {
  "properties": {
    "user": {
      "type": "nested",
      "properties": {
        "created_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "name": {
          "type": "string"
        },
        "id": {
          "type": "long"
        },
        "email": {
          "type": "string"
        }
      }
    }
    ...
  }
}
It's possible for a user to change his name, and I have hooks to update the user in Elasticsearch:
after_commit lambda { __elasticsearch__.index_document}, on: :update
But this isn't updating the appropriate question objects correctly, and I don't know what to pass to the index_document call to make sure it updates all the corresponding questions with the new user name. Does anyone know? It might even help me to see what a RESTful/curl request should look like?
Any help would be appreciated!
There are a couple of different ways you can go about this. They are all probably going to require some code changes, though. I don't think there is a way to do what you are asking directly, with your current setup.
You can read about the various options here. If you can set things up as a one-to-many relationship, then the parent/child relationship is probably the way to go. Then you could set up something like this:
PUT my_index
{
  "mappings": {
    "user": {
      "properties": {...}
    },
    "question": {
      "_parent": {
        "type": "user"
      },
      "properties": {...}
    }
  }
}
And in that case you would be able to update users independently of questions. But it makes querying more complicated, which may or may not be a problem in your application code.
Given that you already have nested documents set up, you could simply query for all the documents that have that particular user as a nested document, with something like:
POST /test_index/question/_search
{
  "filter": {
    "nested": {
      "path": "user",
      "filter": {
        "term": {
          "user.id": 2
        }
      }
    }
  }
}
and once you have all the affected question documents you can modify the user name in each one and update all the documents with a bulk index request.
Here is some code I used to play around with that last bit:
http://sense.qbox.io/gist/d2a319c6b4e7da0d5ff910b4118549228d90cba0
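The "modify and bulk index" step described above might look roughly like this in Ruby. This is only a sketch under stated assumptions: the question documents are given as plain hashes (as they might come back from the search), bulk_reindex_actions is a hypothetical helper, and the index and type names are taken from the question. It only builds the list of bulk actions; sending them to the cluster is left to the client.

```ruby
require "json"

# Hypothetical helper: for every question that embeds the given user,
# build a bulk "index" action that rewrites the document with the new name.
def bulk_reindex_actions(question_docs, user_id, new_name, index: "my_index", type: "question")
  affected = question_docs.select { |doc| doc.dig("user", "id") == user_id }
  affected.map do |doc|
    updated = doc.merge("user" => doc["user"].merge("name" => new_name))
    { index: { _index: index, _type: type, _id: doc["id"], data: updated } }
  end
end

# Sample documents (made up for illustration).
docs = [
  { "id" => 10, "title" => "q1", "user" => { "id" => 2, "name" => "old" } },
  { "id" => 11, "title" => "q2", "user" => { "id" => 3, "name" => "other" } }
]

puts JSON.pretty_generate(bulk_reindex_actions(docs, 2, "new name"))
```

The resulting array has the shape of a bulk request body, so it could be handed to the Elasticsearch client's bulk call from an after_commit hook on the user model.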

Resources