I'm having trouble getting Elasticsearch result highlighting to work. I have found many examples and tried different versions, but I can't get it working on my own index. What am I doing wrong?
Here is my test script:
init() {
curl -XDELETE http://localhost:9200/twitter
echo
curl -XPUT http://localhost:9200/twitter
echo
curl -XPUT http://localhost:9200/twitter/tweet/_mapping -d '{
"tweet" : {
"properties" : {
"user" : { "type" : "string" },
"message" : {
"type" : "string",
"index": "analyzed",
"store": "yes",
"term_vector" : "with_positions_offsets"
}
}
}
}'
echo
curl -XPOST http://localhost:9200/twitter/tweet -d '{
"user": "kimchy",
"message": "You know, for Search"
}'
echo
curl -XPOST http://localhost:9200/twitter/tweet -d '{
"user": "bar",
"message": "You know, foo for Search"
}'
echo
sleep 2
echo '-------------------'
}
[ "$1" = "init" ] && init
curl -X GET 'http://localhost:9200/twitter/_search/?pretty=true' -d '{
"query":{
"query_string":{
"query":"foo"
}
}
},
"highlight":{
"pre_tags": "<b>",
"post_tags": "</b>",
"fields" : {
"message" : {"number_of_fragments": 20}
}
}
}'
And here is the output:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.09492774,
"hits" : [ {
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1tgGWGhnRLy-nJIAunFeeQ",
"_score" : 0.09492774, "_source" : {
"user": "bar",
"message": "You know, foo for Search"
}
} ]
}
}
As you can see, the highlight property is missing completely.
You have one closing curly bracket too many in your query part:
"query":{
"query_string":{
"query":"foo"
}
} <---- This one is not needed.
},
Because of it, the top-level object already ends before "highlight", so the highlight portion is simply ignored by the parser.
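To see the effect, here is a small Python sketch (standard library only) that mimics a parser which stops after the first complete JSON object. It uses json.JSONDecoder.raw_decode, which returns the parsed object and the index where it ended. Elasticsearch's own parser is not Python, so treat this as an illustration of the symptom, not of its internals.

```python
import json

# The request body from the question, including the stray closing brace.
BROKEN_BODY = """{
"query":{
"query_string":{
"query":"foo"
}
}
},
"highlight":{
"pre_tags": "<b>",
"post_tags": "</b>",
"fields" : {
"message" : {"number_of_fragments": 20}
}
}
}"""


def parse_first_object(text):
    """Return the first complete JSON object in text plus whatever follows it."""
    obj, end = json.JSONDecoder().raw_decode(text)
    return obj, text[end:].strip()


body, leftover = parse_first_object(BROKEN_BODY)
print(sorted(body))              # ['query'] -> no "highlight" key
print(leftover.startswith(","))  # True: the highlight block is left over
```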
By the way, the pre_tags and post_tags should be arrays:
curl "localhost:9200/twitter/tweet/_search?pretty=true" -d '{
"query": {
"query_string": {
"query": "foo"
}
},
"highlight": {
"pre_tags": ["<b>"],
"post_tags": ["</b>"],
"fields": {
"message": {"number_of_fragments": 20}
}
}
}'
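One way to avoid unbalanced braces altogether is to build the request body as a data structure and let a JSON library serialize it. A minimal Python sketch, standard library only; build_highlight_query is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json


def build_highlight_query(text, field, pre="<b>", post="</b>", fragments=20):
    """Build a query_string search with highlighting on one field (hypothetical helper)."""
    return {
        "query": {"query_string": {"query": text}},
        "highlight": {
            # pre_tags and post_tags are arrays, not plain strings.
            "pre_tags": [pre],
            "post_tags": [post],
            "fields": {field: {"number_of_fragments": fragments}},
        },
    }


body = json.dumps(build_highlight_query("foo", "message"), indent=2)
print(body)
```

The printed JSON has the same content as the corrected request above and can be sent with the same curl command.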
I have documents like these:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "journeys-development-latest",
"_type" : "_doc",
"_id" : "1399",
"_score" : 1.0,
"_source" : {
"draft_recent_edit_at" : "2023-01-14T04:16:41.318Z",
"recent_edit_at" : "2022-09-23T14:13:41.246Z"
}
},
{
"_index" : "journeys-development-latest",
"_type" : "_doc",
"_id" : "1394",
"_score" : 1.0,
"_source" : {
"draft_recent_edit_at" : "2022-07-02T16:19:41.347Z",
"recent_edit_at" : "2022-12-26T10:12:41.333Z"
}
},
{
"_index" : "journeys-development-latest",
"_type" : "_doc",
"_id" : "1392",
"_score" : 1.0,
"_source" : {
"draft_recent_edit_at" : "2022-05-20T11:33:41.372Z",
"recent_edit_at" : "2021-12-21T03:36:41.359Z"
}
}
]
}
}
I know that if I run
{
"size": 12,
"from": 0,
"query": {
......,
......
},
"sort": [
{
"recent_edit_at": {
"order": "desc"
}
}
]
}
This orders the results by recent_edit_at in descending order.
Similarly, replacing recent_edit_at with draft_recent_edit_at orders them by draft_recent_edit_at in descending order.
What I am struggling with is sorting by the later of the two dates: for each document, take the maximum of draft_recent_edit_at and recent_edit_at, and then order the documents by that value.
===========================Update===========================
After adding the sort proposed by HPringles, the output is:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"Math.max(doc['draft_recent_edit_at'].value.toInstant().toEpochMilli(),\n doc['recent_edit_at'].value.toInstance().toEpochMilli())\n ",
" ^---- HERE"
],
"script": "\n Math.max(doc['draft_recent_edit_at'].value.toInstant().toEpochMilli(),\n doc['recent_edit_at'].value.toInstance().toEpochMilli())\n ",
"lang": "painless"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "journeys-development-latest",
"node": "GGAHq1ufQQmSqeLRyzka5A",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"Math.max(doc['draft_recent_edit_at'].value.toInstant().toEpochMilli(),\n doc['recent_edit_at'].value.toInstance().toEpochMilli())\n ",
" ^---- HERE"
],
"script": "\n Math.max(doc['draft_recent_edit_at'].value.toInstant().toEpochMilli(),\n doc['recent_edit_at'].value.toInstance().toEpochMilli())\n ",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "dynamic method [org.elasticsearch.script.JodaCompatibleZonedDateTime, toInstance/0] not found"
}
}
}
]
},
"status": 400
}
If I understand correctly, you can do this with a Painless script sort, which computes the sort value at query time.
See below:
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": """
Math.max(doc['draft_recent_edit_at'].value.toInstant().toEpochMilli(),
doc['recent_edit_at'].value.toInstance().toEpochMilli())
""",
"params": {
"factor": 1.1
}
},
"order": "asc"
}
}
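This works out the maximum of the two timestamps (in epoch milliseconds) and then sorts on that value. If you want the most recently edited documents first, as in your original sort, use "order": "desc".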
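The "toInstance/0 not found" error in your update comes from a typo in the script above: toInstance() should be toInstant(). Here is a corrected version that also assigns the epoch values to long variables (toEpochMilli() already returns a long), which makes the script easier to read: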
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": """
long draft_recent_edit_at = doc['draft_recent_edit_at'].value.toInstant().toEpochMilli();
long recent_edit_at = doc['recent_edit_at'].value.toInstant().toEpochMilli();
Math.max(draft_recent_edit_at, recent_edit_at);
"""
},
"order": "asc"
}
}
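To check what ordering the script sort should produce, here is a minimal Python sketch that applies the same rule (the later of the two timestamps, as epoch milliseconds) to the three sample documents from the question. It runs client-side only and is meant for sanity-checking results, not as a replacement for the script sort.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The _id and the two date fields of the sample hits from the question.
DOCS = [
    {"_id": "1399", "draft_recent_edit_at": "2023-01-14T04:16:41.318Z",
     "recent_edit_at": "2022-09-23T14:13:41.246Z"},
    {"_id": "1394", "draft_recent_edit_at": "2022-07-02T16:19:41.347Z",
     "recent_edit_at": "2022-12-26T10:12:41.333Z"},
    {"_id": "1392", "draft_recent_edit_at": "2022-05-20T11:33:41.372Z",
     "recent_edit_at": "2021-12-21T03:36:41.359Z"},
]


def epoch_millis(value):
    """Parse a UTC timestamp like '2023-01-14T04:16:41.318Z' into epoch milliseconds."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def sort_key(doc):
    # Same rule as the Painless script: the later of the two edit timestamps.
    return max(epoch_millis(doc["draft_recent_edit_at"]), epoch_millis(doc["recent_edit_at"]))


newest_first = sorted(DOCS, key=sort_key, reverse=True)
print([doc["_id"] for doc in newest_first])  # ['1399', '1394', '1392']
```

reverse=True corresponds to "order": "desc"; with "order": "asc", document 1392 would come first.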
Query
curl -X POST \
http://my-neo4j.example.com:7474/db/data/cypher \
-H 'Accept: application/json; charset=UTF-8' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"query" : "MATCH (c:category) RETURN c {.categoryName} ORDER BY c.categoryName"
}'
Response
{
"columns": [
"c"
],
"data": [
[
{
"categoryName": "Scenario"
}
],
[
{
"categoryName": "Theme"
}
],
[
{
"categoryName": "Video Mood"
}
]
]
}
Question: Why doesn't the result look like this…
{
"columns": [
"c"
],
"data": [
{
"categoryName": "Scenario"
},
{
"categoryName": "Theme"
},
{
"categoryName": "Video Mood"
}
]
}
❓
The returned data is an array of rows.
Each row is an array of columns (one for each item in the RETURN clause).
RETURN c {.categoryName} returns just a single column. And, since you used a map projection to specify the column value, the resulting value is a map (that contains a single field in your case).
If your query had used RETURN c.categoryName instead of RETURN c {.categoryName}, then you might have found the result less confusing:
{
"columns": [
"c.categoryName"
],
"data": [
[
"Scenario"
],
[
"Theme"
],
[
"Video Mood"
]
]
}
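If you want the flat list of maps from the question, you can take the first (and only) column of each row on the client side. A minimal Python sketch, assuming the original response (the one returned for RETURN c {.categoryName}) has already been decoded from JSON into a dict; first_column is a hypothetical helper name.

```python
def first_column(response):
    """Return the value of the first column of every row in a Cypher REST response."""
    return [row[0] for row in response["data"]]


response = {
    "columns": ["c"],
    "data": [
        [{"categoryName": "Scenario"}],
        [{"categoryName": "Theme"}],
        [{"categoryName": "Video Mood"}],
    ],
}
print(first_column(response))
# [{'categoryName': 'Scenario'}, {'categoryName': 'Theme'}, {'categoryName': 'Video Mood'}]
```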
I have an index named books in which reviews is an object field that can hold an array of reviews.
In one particular case, I only want to retrieve the review with the maximum rating.
"books" :{
"reviews": {
"properties": {
"rating": {
"type": "float"
},
"comments": {
"type": "string"
}
}
},
"author" : {
"type" : "string"
}
}
Each book can have many reviews, each with its own rating. For one use case, I want the result set to contain only the review with the maximum rating. How can I build a search query for that kind of result?
POST books/_search
{
"size": 51,
"sort": [
{
"reviews.rating": {
"order": "asc",
"mode" : "min"
}
}
],
"fields": [
"reviews","author"]
}
It seems that script_fields can build dynamic fields but not objects. Otherwise I could have built a dynamic reviews object with one field for the rating and another for the comments.
script_fields can be used to build both dynamic fields and objects:
curl -XDELETE localhost:9200/test-idx
curl -XPUT localhost:9200/test-idx -d '{
"mappings": {
"books" :{
"properties": {
"reviews": {
"properties": {
"rating": {
"type": "float"
},
"comments": {
"type": "string"
}
}
},
"author" : {
"type" : "string"
}
}
}
}
}'
curl -XPOST "localhost:9200/test-idx/books?refresh=true" -d '{
"reviews": [{
"rating": 5.5,
"comments": "So-so"
}, {
"rating": 9.8,
"comments": "Awesome"
}, {
"rating": 1.2,
"comments": "Awful"
}],
"author": "Roversial, Cont"
}'
curl "localhost:9200/test-idx/books/_search?pretty" -d '{
"fields": ["author"],
"script_fields": {
"highest_review": {
"script": "max_rating = 0.0; max_review = null; for(review : _source[\"reviews\"]) { if (review.rating > max_rating) { max_review = review; max_rating = review.rating;}} max_review"
}
}
}'
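The script walks the reviews array and keeps the entry with the highest rating. Here is the same selection as a small Python function, which may make the logic easier to follow or to test client-side; highest_review is a hypothetical name and the sample data is the document indexed above. One difference: the script starts from max_rating = 0.0, so it returns null when no rating is above zero, whereas max() here always picks the highest rating and returns None only for an empty list.

```python
def highest_review(reviews):
    """Return the review dict with the largest rating, or None if there are no reviews."""
    return max(reviews, key=lambda review: review["rating"], default=None)


book = {
    "reviews": [
        {"rating": 5.5, "comments": "So-so"},
        {"rating": 9.8, "comments": "Awesome"},
        {"rating": 1.2, "comments": "Awful"},
    ],
    "author": "Roversial, Cont",
}
print(highest_review(book["reviews"]))  # {'rating': 9.8, 'comments': 'Awesome'}
```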
I am running a simple query like so:
{
"query": {
"term": {
"statuses": "active"
}
},
"script_fields": {
"test": {
"script": "_source.name"
}
}
}
The problem is that once I introduce the script_fields, I no longer get _source in my results.
I have tried:
{
"fields": [
"_all"
],
"query": {
"term": {
"statuses": "active"
}
},
"script_fields": {
"email": {
"script": "_source.name"
}
}
}
and
{
"fields": [
"*"
],
"query": {
"term": {
"statuses": "active"
}
},
"script_fields": {
"email": {
"script": "_source.name"
}
}
}
But they did not make any difference. Is there a way to get _source returned in addition to the script_fields?
Add _source to the fields array (stored_fields in newer versions) so that it is loaded alongside the script fields:
{
"stored_fields": [
"_source"
],
"query": {
"term": {
"statuses": "active"
}
},
"script_fields": {
"email": {
"script": "_source.name"
}
}
}
This works for me:
curl -X DELETE localhost:9200/a
curl -X POST localhost:9200/a/b/c -d '{"title" : "foo"}'
curl -X POST localhost:9200/a/_refresh
echo;
curl localhost:9200/a/_search?pretty -d '{
"fields": [
"_source"
],
"query": {
"match_all": {}
},
"script_fields": {
"title_script": {
"script": "_source.title"
}
}
}'
Output:
"hits" : {
# ...
"hits" : [ {
# ...
"_source" : {"title" : "foo"},
"fields" : {
"title_script" : "foo"
}
} ]
}
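If you build these requests from code, the pattern is the same: list _source in the fields array next to your script_fields. A minimal Python sketch; search_body is a hypothetical helper, and its fields_key parameter exists only because the answers in this thread use fields (older Elasticsearch) and stored_fields (5.x and later).

```python
import json


def search_body(query, script_fields, fields_key="fields"):
    """Build a search body that asks for _source alongside the given script fields (hypothetical helper)."""
    return {
        fields_key: ["_source"],
        "query": query,
        "script_fields": {
            name: {"script": script} for name, script in script_fields.items()
        },
    }


body = search_body({"match_all": {}}, {"title_script": "_source.title"})
print(json.dumps(body, indent=2))
```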
My question is similar to this one.
Simply put, is there a way to return the geo distance when NOT sorting by _geo_distance?
Update:
To clarify, I want the results in random order AND include distance.
Yes, you can, by using a script field.
For instance, assuming your docs have a geo-point field called location, you could use the following:
(note the \u0027 is just an escaped single quote, so \u0027location\u0027 is really 'location')
curl -XGET 'http://127.0.0.1:9200/geonames/_search?pretty=1' -d '
{
"script_fields" : {
"distance" : {
"params" : {
"lat" : 2.27,
"lon" : 50.3
},
"script" : "doc[\u0027location\u0027].distanceInKm(lat,lon)"
}
}
}
'
# [Thu Feb 16 11:20:29 2012] Response:
# {
# "hits" : {
# "hits" : [
# {
# "_score" : 1,
# "fields" : {
# "distance" : 466.844095463887
# },
# "_index" : "geonames_1318324623",
# "_id" : "6436641_en",
# "_type" : "place"
# },
... etc
If you want the _source field to be returned as well, then you can specify that as follows:
curl -XGET 'http://127.0.0.1:9200/geonames/_search?pretty=1' -d '
{
"fields" : [ "_source" ],
"script_fields" : {
"distance" : {
"params" : {
"lat" : 2.27,
"lon" : 50.3
},
"script" : "doc[\u0027location\u0027].distanceInKm(lat,lon)"
}
}
}
'
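To sanity-check the numbers the script field returns, here is a minimal Python haversine sketch that computes the great-circle distance in kilometers between two lat/lon points. It assumes a spherical Earth with a mean radius of 6371.0088 km; Elasticsearch's distance functions may use different formulas and constants, so expect small differences rather than an exact match.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0088  # assumption: spherical Earth, mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # 111.2
```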
Great answer by DrTech. Here is an updated version for Elasticsearch 5.x with Painless as the script language. I also added "stored_fields" to include _source in the result. arcDistance returns meters, so the result is multiplied by 0.001 to get kilometers:
curl -XGET 'http://127.0.0.1:9200/geonames/_search?pretty=1' -d '
{
"stored_fields" : [ "_source" ],
"script_fields" : {
"distance" : {
"script" : {
"inline": "doc[\u0027location\u0027].arcDistance(params.lat,params.lon) * 0.001",
"lang": "painless",
"params": {
"lat": 2.27,
"lon": 50.3
}
}
}
}
}'
To return the distance as well as all the default fields and _source, you could also add a _geo_distance sort.
To keep it from becoming the primary sort, sort by _score (or whatever you want the results sorted by) first; the distance is then returned as the second value in each hit's sort array:
{
"sort": [
"_score",
{
"_geo_distance": {
"location": {
"lat": 40.715,
"lon": -73.998
},
"order": "asc",
"unit": "km",
"distance_type": "plane"
}
}
]
}
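With this sort, each hit carries a sort array whose first entry is the score and whose second entry is the computed distance in the requested unit. A minimal Python sketch that pulls the distance out of an already-decoded search response; distances_from_hits is a hypothetical helper, and position=1 assumes the _geo_distance clause is the second sort key, as above.

```python
def distances_from_hits(response, position=1):
    """Map each hit's _id to the sort value at `position` (the geo distance here)."""
    return {hit["_id"]: hit["sort"][position] for hit in response["hits"]["hits"]}
```

Call it with the decoded JSON of the search response, for example distances_from_hits(json.loads(raw_response)).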
Since ES 1.3, MVEL is disabled by default, so use a Groovy script with a query like:
GET some-index/_search
{
"sort": [
{
"_geo_distance": {
"geo_location": "47.1, 8.1",
"order": "asc",
"unit": "m"
}
}
],
"query": {
"match_all": {}
},
"script_fields" : {
"distance" : {
"lang": "groovy",
"params" : {
"lat" : 47.1,
"lon" : 8.1
},
"script" : "doc[\u0027geo_location\u0027].distanceInKm(lat,lon)"
}
}
}
Note the "lang": "groovy" part.