Elasticsearch - Performing a Geo Location Query on Bulk-imported JSON

I have a file modified.json containing JSON documents which contain the following:
{"index":{"_index": "trial", "_type": "trial", "_id":"1"}}
{"clinics" : [{"price": 1048, "city": "Bangalore", "Location": {"lon": 77.38381692924742, "lat": 12.952155989068519}}, {"price": 1048, "city": "Bangalore", "Location": {"lon": 77.38381692924742, "lat": 12.952155989068519}}]
.... and more similar documents. I am running a bulk insert with this command:
curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @modified.json
Next, through the Sense (Google Chrome) plugin, I am issuing the following request
Server: localhost:9200/trial/trial
PUT _search
{
"mappings": {
"clinics": {
"properties": {
"Location": {"type": "geo_point"}
}
}
}
}
I don't know whether or not this is creating a mapping for the Location fields.
Then I issue a search request like this:
Server: localhost:9200/trial/trial
GET _search
{
"query": {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "20km",
"Location" : {
"lat" : 12.958487287824958,
"lon" : 77.69648881178146
}
}
}
}
}
}
I am getting an error saying:
failed to find geo_point field [Location]
Please help me with this. Also, if possible, guide me on how to do a faceted search on this data to show clinics within a range of distances, e.g. between 20 and 30 kilometers.

You need to reverse your ordering: PUT your mapping first and then do the bulk insert.
By putting the mapping in place before the insert, you are letting ES know what the datatypes are so that it can index them correctly. ES allows you to add mappings to an existing index, but the data that's already there isn't going to get re-indexed automatically. (This is a known issue.)
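For example, a minimal sketch (assuming an ES 1.x-era cluster as in the question and the index/type names trial/trial; since the Location objects sit inside the clinics array, the geo_point is mapped under clinics): delete the index if it already exists, create it with the mapping, then re-run the bulk import.
DELETE /trial

PUT /trial
{
  "mappings": {
    "trial": {
      "properties": {
        "clinics": {
          "properties": {
            "Location": { "type": "geo_point" }
          }
        }
      }
    }
  }
}

curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @modified.json
With this mapping in place, the geo_distance filter should reference the field as clinics.Location rather than Location.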

Related

Google Slides API reports Invalid requests[0].updateTableCellProperties: Invalid field: table_cell_properties

I am trying to troubleshoot an error message my app gets after sending a batchUpdate request to the Google Slides API:
Invalid requests[19].updateTableCellProperties: Invalid field: table_cell_properties
The 19th request in the batch is the only updateTableCellProperties request I have. If I remove the 19th request from the batch, everything works fine.
Other requests which I run in this batchUpdate with no issues are insertTableRows, deleteTableRow, insertText, updateParagraphStyle, updateTextStyle, and updateTableColumnProperties. They all work on the same table, so I use the same objectId, but depending on the request I have to specify it as tableObjectId instead of objectId.
Unsure whether I am generating a wrong request for the only updateTableCellProperties request I have, or whether there is a problem in the Google Slides ruby gem itself, I tried sending just this updateTableCellProperties request from the Google Slides API Explorer, which has some validation on the request structure. So I sent this updateTableCellProperties batchUpdate request:
{
"requests": [
{
"updateTableCellProperties": {
"objectId": "gf9d8fea71f_22_1",
"tableRange": {
"location": {
"columnIndex": 0,
"rowIndex": 1
}
},
"fields": "tableCellProperties",
"tableCellProperties": {
"tableCellBackgroundFill": {
"solidFill": {
"color": {
"themeColor": "LIGHT1"
}
}
}
}
}
}
]
}
And I got this error:
{
"error": {
"code": 400,
"message": "Invalid requests[0].updateTableCellProperties: Invalid field: table_cell_properties",
"status": "INVALID_ARGUMENT"
}
}
Why is this updateTableCellProperties request reported as invalid? I am also confused by the output of the error message as it mentions table_cell_properties in snake case, while the documentation only mentions tableCellProperties in camel case, and my request also only mentions tableCellProperties in camel case. I am only aware of the ruby gems translating between snake case and camel case, but this is not relevant to the API Explorer.
The error Invalid field: table_cell_properties originates from the erroneously specified fields property.
See documentation:
fields
At least one field must be specified. The root tableCellProperties is implied and should not be specified. A single "*" can be used as short-hand for listing every field.
So you need to modify fields
from
"fields": "tableCellProperties"
to
"fields": "tableCellBackgroundFill.solidFill.color"
or to
"fields": "*"
There is a second problem with your request:
When specifying the table range, it is required to set the properties rowSpan and columnSpan.
A complete, correct request would be:
{
"requests": [
{
"updateTableCellProperties": {
"objectId": "gf9d8fea71f_22_1",
"tableRange": {
"location": {
"columnIndex": 0,
"rowIndex": 1
},
"rowSpan": 1,
"columnSpan": 1
},
"fields": "tableCellBackgroundFill.solidFill.color",
"tableCellProperties": {
"tableCellBackgroundFill": {
"solidFill": {
"color": {
"themeColor": "LIGHT1"
}
}
}
}
}
}
]
}

Twitter API 2.0 - Unable to fetch user.fields

I am using API version 2.0 and am unable to fetch the user.fields results. All other parameters seem to return results correctly. I'm following this documentation.
url = "https://api.twitter.com/2/tweets/search/all"
query_params = {
"query": "APPL",
"max_results": "10",
"tweet.fields": "created_at,lang,text,author_id",
"user.fields": "name,username,created_at,location",
"expansions": "referenced_tweets.id.author_id",
}
response = requests.request("GET", url, headers=headers, params=query_params).json()
Sample result:
{
'author_id': '1251347502013521925',
'text': 'All conspiracy. But watch for bad news on Apple. Such a vulnerable stocktechnically for the biggest market cap # $2.1T ( Thanks Jay). This is the glue for the bulls. But, they stopped innovating when Steve died, built a fancy office and split the stock. $appl',
'lang': 'en',
'created_at': '2021-06-05T02:33:48.000Z',
'id': '1401004298738311168',
'referenced_tweets': [{
'type': 'retweeted',
'id': '1401004298738311168'
}]
}
As you can see, the following information is not returned: name, username, and location.
Any idea how to retrieve this info?
Your query does actually return the correct data. I tested this myself.
A full example response will be structured like this:
{
"data": [
{
"created_at": "2021-06-05T02:33:48.000Z",
"lang": "en",
"id": "1401004298738311168",
"text": "All conspiracy. But watch for bad news on Apple. Such a vulnerable stocktechnically for the biggest market cap # $2.1T ( Thanks Jay). This is the glue for the bulls. But, they stopped innovating when Steve died, built a fancy office and split the stock. $appl",
"author_id": "1251347502013521925",
"referenced_tweets": [
{
"type": "retweeted",
"id": "1401004298738311168"
}
]
}
],
"includes": {
"users": [
{
"name": "Gary Casper",
"id": "1251347502013521925",
"username": "Hisel1979",
"created_at": "2020-07-11T13:39:58.000Z"
}
]
}
}
The sample result you provided comes from within the data object. However, the expanded object data (in your case name, username, and location) is nested in the includes object. The corresponding user object can be looked up via the author_id field.
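A minimal sketch of joining the two, assuming response is the parsed JSON from the request above (location only appears for users who have set one on their profile):
# build a lookup of the expanded user objects keyed by user id
users_by_id = {u["id"]: u for u in response.get("includes", {}).get("users", [])}

for tweet in response.get("data", []):
    author = users_by_id.get(tweet["author_id"], {})
    print(tweet["id"], author.get("name"), author.get("username"), author.get("location"))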

How to get a sub-field of a struct-type map in the search response of a YQL query in Vespa?

Sample Data:
"fields": {
"key1":0,
"key2":"no",
"Lang": {
"en": {
"firstName": "Vikrant",
"lastName":"Thakur"
},
"ch": {
"firstName": "维克兰特",
"lastName":"塔库尔"
}
}
}
Expected Response:
"fields": {
"Lang": {
"en": {
"firstName": "Vikrant",
"lastName":"Thakur"
}
}
}
I have added the following in my search-definition demo.sd:
struct lang {
field firstName type string {}
field lastName type string {}
}
field Lang type map <string, lang> {
indexing: summary
struct-field key {
indexing: summary | index | attribute
}
}
I want to write a YQL query something like this (this doesn't work):
http://localhost:8080/search/?yql=select Lang.en from sources demo where key2 contains 'no';
My temporary workaround approach
I have implemented a custom searcher in MySearcher.java, through which I am able to extract the required sub-field and set a new field 'defaultLang', and remove the 'Lang' field. The response generated by the searcher:
"fields": {
"defaultLang": {
"firstName": "Vikrant",
"lastName":"Thakur"
}
}
I have written the following in MySearcher.java:
for (Hit hit: result.hits()) {
String language = "en"; //temporarily hard-coded
StructuredData Lang = (StructuredData) hit.getField("Lang");
Inspector o = Lang.inspect();
for (int j=0;j<o.entryCount();j++){
if (o.entry(j).field("key").asString("").equals(language)){
SlimeAdapter value = (SlimeAdapter) o.entry(j).field("value");
hit.setField("defaultLang",value);
break;
}
}
hit.removeField("Lang");
}
Edit-1: A more efficient way instead is to make use of the Inspectable interface and Inspector, like above (thanks to @Jo Kristian Bergum).
But in the above code I have to loop through all the languages to filter out the required one. I want to avoid this O(n) time complexity and make use of the map structure to access it in O(1). (The languages may grow to 1000, and this would be done for each hit.)
All this is due to the StructuredData type I am getting in the results. StructuredData doesn't keep the map structure and instead gives an array of JSON like:
[{
"key": "en",
"value": {
"firstName": "Vikrant",
"lastName": "Thakur"
}
}, {
"key": "ch",
"value": {
"firstName": "维克兰特",
"lastName": "塔库尔"
}
}]
Please suggest a better approach altogether, or help with my current one. Both are appreciated.
I guess the YQL sample query is meant to illustrate what you want, as that syntax is not valid. Picking a given key from the Lang field of type map can be done as you do in your searcher, but deserializing into JSON and parsing the JSON is probably inefficient: StructuredData implements the Inspectable interface, so you can inspect it directly without going through the JSON format. See https://docs.vespa.ai/documentation/reference/inspecting-structured-data.html

Default datatype mapping in elasticsearch

The following are the contents of config/default_mapping.json:
{
"_default_" : [
{
"int_template" : {
"match": "*",
"match_mapping_type": "int",
"mapping": {
"type": "string"
}
}
}
]
}
What I want ES to do is to pick out all numbers from my logs and map them as strings.
Use case:
After clearing all indexes with curl -XDELETE 'http://localhost:9200/_all', I run this to send the following to ES (through fluentd's tailf plugin):
echo "{\"this\" : 134}" >> /home/user/logs/program-data/logs/tiger/tiger.log
Elasticsearch happily creates the initial indexes. Now, to test whether my default_mapping works, I send a string as the value where I previously sent an int:
echo "{\"this\" : \"ABC\"}" >> /home/user/logs/program-data/logs/tiger/tiger.log
Exception caught by ES:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [this]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:398)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:618)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:471)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:513)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:457)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:342)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:401)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:155)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.NumberFormatException: For input string: "ABC"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:438)
at java.lang.Long.parseLong(Long.java:478)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:89)
What could be wrong here?
Update:
My default_mapping.json now looks like this:
{
"_default_": {
"dynamic_templates": [
{
"string_template": {
"match": "*",
"mapping": {
"type": "string"
}
}
}
]
}
}
First of all, I'd suggest not using file-system-based configuration or mappings; just do it via the API.
Your mapping is malformed: you have the type name (_default_), but you don't specify that what you are submitting is a dynamic template.
As for the content, I'd remove the match_mapping_type if you want to map everything as a string.
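A sketch of submitting the same dynamic template through the put-mapping API instead (assuming an ES 1.x-era cluster and a hypothetical index named my_logs):
curl -XPUT 'http://localhost:9200/my_logs/_mapping/_default_' -d '{
  "_default_": {
    "dynamic_templates": [
      {
        "string_template": {
          "match": "*",
          "mapping": { "type": "string" }
        }
      }
    ]
  }
}'
The same mappings block can also go into an index template so it applies to every index that fluentd creates.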

How to make elasticsearch add the timestamp field to every document in all indices?

Elasticsearch experts,
I have been unable to find a simple way to just tell ElasticSearch to insert the _timestamp field for all the documents that are added in all the indices (and all document types).
I see an example for specific types:
http://www.elasticsearch.org/guide/reference/mapping/timestamp-field/
and also see an example for all indices for a specific type (using _all):
http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/
but I am unable to find any documentation on adding it by default for all documents that get added irrespective of the index and type.
Elasticsearch used to support automatically adding timestamps to documents being indexed, but deprecated this feature in 2.0.0.
From the version 5.5 documentation:
The _timestamp and _ttl fields were deprecated and are now removed. As a replacement for _timestamp, you should populate a regular date field with the current timestamp on application side.
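A minimal sketch of that replacement, with a hypothetical index my-index and field created_at whose value the application supplies at index time:
PUT my-index/_doc/1
{
  "message": "hello",
  "created_at": "2021-06-05T02:33:48Z"
}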
Prior to the removal, you could get automatic timestamps by enabling _timestamp when creating your index:
curl -XPOST localhost:9200/test -d '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"_default_":{
"_timestamp" : {
"enabled" : true,
"store" : true
}
}
}
}'
That will then automatically create a _timestamp for everything you put in the index.
After indexing something, the _timestamp field will be returned when you request it.
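For example, a sketch against that index (since _timestamp is stored but not part of _source, it has to be requested explicitly via fields):
GET /test/_search
{
  "fields": ["_source", "_timestamp"],
  "query": { "match_all": {} }
}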
Adding another way to get an indexing timestamp; hope this helps someone.
An ingest pipeline can be used to add a timestamp when a document is indexed. Here is an example:
PUT _ingest/pipeline/indexed_at
{
"description": "Adds indexed_at timestamp to documents",
"processors": [
{
"set": {
"field": "_source.indexed_at",
"value": "{{_ingest.timestamp}}"
}
}
]
}
Earlier, Elasticsearch used named pipelines, which meant the 'pipeline' param had to be specified on the endpoint used to write/index documents (ref: link). This was a bit troublesome, as you would need to change the endpoints on the application side.
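For example, with only a named pipeline every index request has to carry the pipeline parameter explicitly (a sketch with a hypothetical index my-index):
PUT my-index/_doc/1?pipeline=indexed_at
{
  "message": "hello"
}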
With Elasticsearch version >= 6.5, you can specify a default pipeline for an index using the index.default_pipeline setting (refer to the link for details).
Here is how to set the default pipeline:
PUT ms-test/_settings
{
"index.default_pipeline": "indexed_at"
}
I haven't tried it out yet, as I haven't upgraded to ES 6.5, but the above command should work.
You can make use of default index pipelines, leverage the script processor, and thus emulate the auto_now_add functionality you may know from Django and DEFAULT GETDATE() from SQL.
The process of adding a default yyyy-MM-dd HH:mm:ss date goes like this:
1. Create the pipeline and specify which indices it'll be allowed to run on:
PUT _ingest/pipeline/auto_now_add
{
"description": "Assigns the current date if not yet present and if the index name is whitelisted",
"processors": [
{
"script": {
"source": """
// skip if not whitelisted
if (![ "myindex",
"logs-index",
"..."
].contains(ctx['_index'])) { return; }
// don't overwrite if present
if (ctx['created_at'] != null) { return; }
ctx['created_at'] = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
"""
}
}
]
}
Side note: the ingest processor's Painless script context is documented here.
2. Update the default_pipeline setting in all of your indices:
PUT _all/_settings
{
"index": {
"default_pipeline": "auto_now_add"
}
}
Side note: you can restrict the target indices using the multi-target syntax:
PUT myindex,logs-2021-*/_settings?allow_no_indices=true
{
"index": {
"default_pipeline": "auto_now_add"
}
}
3. Ingest a document to one of the configured indices:
PUT myindex/_doc/1
{
"abc": "def"
}
4. Verify that the date string has been added:
GET myindex/_search
An example for ElasticSearch 6.6.2 in Python 3:
from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=["localhost"])
timestamp_pipeline_setting = {
"description": "insert timestamp field for all documents",
"processors": [
{
"set": {
"field": "ingest_timestamp",
"value": "{{_ingest.timestamp}}"
}
}
]
}
es.ingest.put_pipeline("timestamp_pipeline", timestamp_pipeline_setting)
conf = {
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"default_pipeline": "timestamp_pipeline"
},
"mappings": {
"articles":{
"dynamic": "false",
"_source" : {"enabled" : "true" },
"properties": {
"title": {
"type": "text",
},
"content": {
"type": "text",
},
}
}
}
}
response = es.indices.create(
index="articles_index",
body=conf,
ignore=400 # ignore 400 already exists code
)
print ('\nresponse:', response)
doc = {
'title': 'automatically adding a timestamp to documents',
'content': 'prior to version 5 of Elasticsearch, documents had a metadata field called _timestamp. When enabled, this _timestamp was automatically added to every document. It would tell you the exact time a document had been indexed.',
}
res = es.index(index="articles_index", doc_type="articles", id=100001, body=doc)
print(res)
res = es.get(index="articles_index", doc_type="articles", id=100001)
print(res)
For ES 7.x, the example should work after removing the doc_type-related parameters, as mapping types are not supported any more.
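A minimal sketch of the adjusted calls for 7.x (the mappings body would likewise drop the articles type level and keep only the properties):
res = es.index(index="articles_index", id=100001, body=doc)
print(res)
res = es.get(index="articles_index", id=100001)
print(res)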
First create the index and the properties of the index, such as fields and datatypes, and then insert the data using the REST API.
Below is the way to create an index with the field properties. Execute the following in the Kibana console:
PUT /vfq-jenkins
{
"mappings": {
"properties": {
"BUILD_NUMBER": { "type" : "double"},
"BUILD_ID" : { "type" : "double" },
"JOB_NAME" : { "type" : "text" },
"JOB_STATUS" : { "type" : "keyword" },
"time" : { "type" : "date" }
}
}
}
The next step is to insert the data into that index:
curl -u elastic:changeme -X POST http://elasticsearch:9200/vfq-jenkins/_doc/?pretty \
-H 'Content-Type: application/json' -d '{
"BUILD_NUMBER":"83","BUILD_ID":"83","JOB_NAME":"OMS_LOG_ANA","JOB_STATUS":"SUCCESS",
"time" : "2019-09-08T12:39:00" }'
