Default datatype mapping in elasticsearch

Default datatype mapping in elasticsearch - mapping

The following are the contents of config/default_mapping.json:
{
"_default_" : [
{
"int_template" : {
"match": "*",
"match_mapping_type": "int",
"mapping": {
"type": "string"
}
}
]
}
Want i want ES to do is to pick out all numbers from my logs and map them as strings.
Use case-
After clearing all indexes- curl -XDELETE 'http://localhost:9200/_all', i run this to send the following to ES (through fluentd's tailf plugin)-
echo "{\"this\" : 134}" >> /home/user/logs/program-data/logs/tiger/tiger.log
Elastic happily creates the initial indexes. Now, to test weather my default_mapping works, i send a string at the value where i previously sent an int.
echo "{\"this\" : \"ABC\"}" >> /home/user/logs/program-data/logs/tiger/tiger.log
Exception caught by ES-
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [this]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:398)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:618)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:471)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:513)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:457)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:342)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:401)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:155)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.NumberFormatException: For input string: "ABC"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:438)
at java.lang.Long.parseLong(Long.java:478)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:89)
What could be wrong here?
Update-
My default_mapping.json now looks like-
{
"_default_": {
"dynamic_templates": [
{
"string_template": {
"match": "*",
"mapping": {
"type": "string"
}
}
}
]
}
}

First of all, I'd suggest not to use file system based configuration or mappings. Just do it via api.
Your mapping is malformed, as you have the type name (_default_) but you don't specify that what you are submitting is a dynamic template.
As for the content, I'd remove that match_mapping_type if you want to map everything as a string.

Related

Google Slides API reports Invalid requests[0].updateTableCellProperties: Invalid field: table_cell_properties

Trying to troubleshoot an error message my app gets after sending a batchUpdate request to Google Slides API
Invalid requests[19].updateTableCellProperties: Invalid field: table_cell_properties
The 19th request in the batch is the only updateTableCellProperties request I have. If I removing the 19th request from the batch, everything works fine.
Other requests which I run in this batchUpdate with no issues are are insertTableRows, deleteTableRow, insertText, updateParagraphStyle, updateTextStyle, updateTableColumnProperties. They all work on the same table, so I use the same objectId, but depending on the request I have to specify it as tableObjectId instead of objectId.
Unsure if I am generating a wrong request for the only updateTableCellProperties request I have, or if there is a problem in the Google Slides ruby gem itself, I tried sending just this updateTableCellProperties request from the Google Slides API explorer which has some validation on the request structure. So I sent this updateTableCellProperties batchUpdate request
{
"requests": [
{
"updateTableCellProperties": {
"objectId": "gf9d8fea71f_22_1",
"tableRange": {
"location": {
"columnIndex": 0,
"rowIndex": 1
}
},
"fields": "tableCellProperties",
"tableCellProperties": {
"tableCellBackgroundFill": {
"solidFill": {
"color": {
"themeColor": "LIGHT1"
}
}
}
}
}
}
]
}
And I got this error:
{
"error": {
"code": 400,
"message": "Invalid requests[0].updateTableCellProperties: Invalid field: table_cell_properties",
"status": "INVALID_ARGUMENT"
}
}
Why is this updateTableCellProperties request reported as invalid? I am also confused by the output of the error message as it mentions table_cell_properties in snake case, while the documentation only mentions tableCellProperties in camel case, and my request also only mentions tableCellProperties in camel case. I am only aware of the ruby gems translating between snake case and camel case, but this is not relevant to the API Explorer.

The error Invalid field: table_cell_properties originates from the erroneously specified fields property
See documentation:
fields
At least one field must be specified. The root tableCellProperties is implied and should not be specified. A single "*" can be used as short-hand for listing every field.
So you need to modify fields
from
"fields": "tableCellProperties"
to
"fields": "tableCellBackgroundFill.solidFill.color"
or to
"fields": "*"
There is a second problem with your request:
When specifying the table range, it is required to set the properties rowSpan and columnSpan.
A complete, correct request would be:
{
"requests": [
{
"updateTableCellProperties": {
"objectId": "gf9d8fea71f_22_1",
"tableRange": {
"location": {
"columnIndex": 0,
"rowIndex": 1
},
"rowSpan": 1,
"columnSpan": 1
},
"fields": "tableCellBackgroundFill.solidFill.color",
"tableCellProperties": {
"tableCellBackgroundFill": {
"solidFill": {
"color": {
"themeColor": "LIGHT1"
}
}
}
}
}
}
]
}

Artifactory and Jenkins - get file with newest/biggest custom property

I have generic repository "my_repo". I uploaded files there from jenkins with to paths like my_repo/branch_buildNumber/package.tar.gz and with custom property "tag" like "1.9.0","1.10.0" etc. I want to get item/file with latest/newest tag.
I tried to modify Example 2 from this link ...
https://www.jfrog.com/confluence/display/JFROG/Using+File+Specs#UsingFileSpecs-Examples
... and add sorting and limit the way it was done here ...
https://www.jfrog.com/confluence/display/JFROG/Artifactory+Query+Language#ArtifactoryQueryLanguage-limitDisplayLimitsandPagination
But im getting "unknown property desc" error.

The Jenkins Artifactory Plugin, like most of the JFrog clients, supports File Specs for downloading and uploading generic files.
The File Specs schema is described here. When creating a File Spec for downloading files, you have the option of using the "pattern" property, which can include wildcards. For example, the following spec downloads all the zip files from the my-local-repo repository into the local froggy directory:
{
"files": [
{
"pattern": "my-local-repo/*.zip",
"target": "froggy/"
}
]
}
Alternatively, you can use "aql" instead of "pattern". The following spec, provides the same result as the previous one:
{
"files": [
{
"aql": {
"items.find": {
"repo": "my-local-repo",
"$or": [
{
"$and": [
{
"path": {
"$match": "*"
},
"name": {
"$match": "*.zip"
}
}
]
}
]
}
},
"target": "froggy/"
}
]
}
The allowed AQL syntax inside File Specs does not include everything the Artifactory Query Language allows. For examples, you can't use the "include" or "sort" clauses. These limitations were put in place, to make the response structure known and constant.
Sorting however is still available with File Specs, regardless of whether you choose to use "pattern" or "aql". It is supported throw the "sortBy", "sortOrder", "limit" and "offset" File Spec properties.
For example, the following File Spec, will download only the 3 largest zip file files:
{
"files": [
{
"aql": {
"items.find": {
"repo": "my-local-repo",
"$or": [
{
"$and": [
{
"path": {
"$match": "*"
},
"name": {
"$match": "*.zip"
}
}
]
}
]
}
},
"sortBy": ["size"],
"sortOrder": "desc",
"limit": 3,
"target": "froggy/"
}
]
}
And you can do the same with "pattern", instead of "aql":
{
"files": [
{
"pattern": "my-local-repo/*.zip",
"sortBy": ["size"],
"sortOrder": "desc",
"limit": 3,
"target": "local/output/"
}
]
}
You can read more about File Specs here.
(After answering this question here, we also updated the File Specs documentation with these examples).

After a lot of testing and experimenting i found that there are many ways of solving my main problem (getting latest version of package) but each of way require some function which is available in paid version. Like sort() in AQL or [RELEASE] in REST API. But i found that i still can get JSON with a full list of files and its properties. I can also download each single file. This led me to solution with simple python script. I can't publish whole but only the core which should bu fairly obvious
import requests, argparse
from packaging import version
...
query="""
items.find({
"type" : "file",
"$and":[{
"repo" : {"$match" : \"""" + args.repository + """\"},
"path" : {"$match" : \"""" + args.path + """\"}
}]
}).include("name","repo","path","size","property.*")
"""
auth=(args.username,args.password)
def clearVersion(ver: str):
new = ''
for letter in ver:
if letter.isnumeric() or letter == ".":
new+=letter
return new
def lastestArtifact(response: requests):
response = response.json()
latestVer = "0.0.0"
currentItemIndex = 0
chosenItemIndex = 0
for results in response["results"]:
for prop in results['properties']:
if prop["key"] == "tag":
if version.parse(clearVersion(prop["value"])) > version.parse(clearVersion(latestVer)):
latestVer = prop["value"]
chosenItemIndex = currentItemIndex
currentItemIndex += 1
return response["results"][chosenItemIndex]
req = requests.post(url,data=query,auth=auth)
if args.verbose:
print(req.text)
latest = lastestArtifact(req)
...
I just want to point that THIS IS NOT permanent solution. We just didnt want to buy license yet only because of one single problem. But if there will be more of such problems then we definetly buy PRO subscription.

How to get a sub-field of a struct type map, in the search response of YQL query in Vespa?

Sample Data:
"fields": {
"key1":0,
"key2":"no",
"Lang": {
"en": {
"firstName": "Vikrant",
"lastName":"Thakur"
},
"ch": {
"firstName": "维克兰特",
"lastName":"塔库尔"
}
}
}
Expected Response:
"fields": {
"Lang": {
"en": {
"firstName": "Vikrant",
"lastName":"Thakur"
}
}
}
I have added the following in my search-definition demo.sd:
struct lang {
field firstName type string {}
field lastName type string {}
}
field Lang type map <string, lang> {
indexing: summary
struct-field key {
indexing: summary | index | attribute
}
}
I want to write a yql query something like this (This doesn't work):
http://localhost:8080/search/?yql=select Lang.en from sources demo where key2 contains 'no';
My temporary workaround approach
I have implemented a custom searcher in MySearcher.java, through which I am able to extract the required sub-field and set a new field 'defaultLang', and remove the 'Lang' field. The response generated by the searcher:
"fields": {
"defaultLang": {
"firstName": "Vikrant",
"lastName":"Thakur"
}
}
I have written the following in MySearcher.java:
for (Hit hit: result.hits()) {
String language = "en"; //temporarily hard-coded
StructuredData Lang = (StructuredData) hit.getField("Lang");
Inspector o = Lang.inspect();
for (int j=0;j<o.entryCount();j++){
if (o.entry(j).field("key").asString("").equals(language)){
SlimeAdapter value = (SlimeAdapter) o.entry(j).field("value");
hit.setField("defaultLang",value);
break;
}
}
hit.removeField("Lang");
}
Edit-1: A more efficient way instead is to make use of the Inspectable interface and Inspector, like above (Thanks to #Jo Kristian Bergum)
But, in the above code, I am having to loop through all the languages to filter out the required one. I want to avoid this O(n) time-complexity and make use of the map structure to access it in O(1). (Because the languages may increase to 1000, and this would be done for each hit.)
All this is due to the StructuredData data type I am getting in the results. StructureData doesn't keep the Map Structure and rather gives an array of JSON like:
[{
"key": "en",
"value": {
"firstName": "Vikrant",
"lastName": "Thakur"
}
}, {
"key": "ch",
"value": {
"firstName": "维克兰特",
"lastName": "塔库尔"
}
}]
Please, suggest a better approach altogether, or any help with my current one. Both are appreciated.

The YQL sample query I guess is to illustrate what you want as that syntax is not valid. Picking a given key from the field Lang of type map can be done as you do in your searcher but deserializing into JSON and parsing the JSON is probably inefficient as StructuredData implements the Inspectable interface and you can inspect it directly without the need to go through JSON format. See https://docs.vespa.ai/documentation/reference/inspecting-structured-data.html

Elastic Search - Performing Geo Location Query on Bulk imported JSON

I have a file modified.json containing JSON documents which contain the following:
{"index":{"_index": "trial", "_type": "trial", "_id":"1"}}
{"clinics" : [{"price": 1048, "city": "Bangalore", "Location": {"lon": 77.38381692924742, "lat": 12.952155989068519}}, {"price": 1048, "city": "Bangalore", "Location": {"lon": 77.38381692924742, "lat": 12.952155989068519}}]
.... And more similar documents. I am running a bulk insert through this command:
curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary #modified.json
Next, through the Sense (Google Chrome) plugin, I am issuing the following request
Server: localhost:9200/trial/trial
PUT _search
{
"mappings": {
"clinics": {
"properties": {
"Location": {"type": "geo_point"}
}
}
}
}
I don't know whether or not this is creating a mapping for the Location fields.
After issuing a search request, like this:
Server: localhost:9200/trial/trial
GET _search
{
"query": {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "20km",
"Location" : {
"lat" : 12.958487287824958,
"lon" : 77.69648881178146
}
}
}
}
}
}
I am getting an error saying:
failed to find geo_point field [Location]]; }]",
Please help me regarding this. Also, if possible also guide me on how to do faceted search on this data, to show clinics within a range of distance: like between 20 and 30 kilometers

You need to reverse your ordering. PUT your mapping mapping first and then do the bulk insert.
By putting the map before the insert you are letting ES know what the datatype are so that it can index them correctly. ES allows you to add mappings to an existing index but the data that's already there isn't going to get re-indexed automatically. (This is a known issue).

How to make elasticsearch add the timestamp field to every document in all indices?

Elasticsearch experts,
I have been unable to find a simple way to just tell ElasticSearch to insert the _timestamp field for all the documents that are added in all the indices (and all document types).
I see an example for specific types:
http://www.elasticsearch.org/guide/reference/mapping/timestamp-field/
and also see an example for all indices for a specific type (using _all):
http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/
but I am unable to find any documentation on adding it by default for all documents that get added irrespective of the index and type.

Elasticsearch used to support automatically adding timestamps to documents being indexed, but deprecated this feature in 2.0.0
From the version 5.5 documentation:
The _timestamp and _ttl fields were deprecated and are now removed. As a replacement for _timestamp, you should populate a regular date field with the current timestamp on application side.

You can do this by providing it when creating your index.
$curl -XPOST localhost:9200/test -d '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"_default_":{
"_timestamp" : {
"enabled" : true,
"store" : true
}
}
}
}'
That will then automatically create a _timestamp for all stuff that you put in the index.
Then after indexing something when requesting the _timestamp field it will be returned.

Adding another way to get indexing timestamp. Hope this may help someone.
Ingest pipeline can be used to add timestamp when document is indexed. Here, is a sample example:
PUT _ingest/pipeline/indexed_at
{
"description": "Adds indexed_at timestamp to documents",
"processors": [
{
"set": {
"field": "_source.indexed_at",
"value": "{{_ingest.timestamp}}"
}
}
]
}
Earlier, elastic search was using named-pipelines because of which 'pipeline' param needs to be specified in the elastic search endpoint which is used to write/index documents. (Ref: link) This was bit troublesome as you would need to make changes in endpoints on application side.
With Elastic search version >= 6.5, you can now specify a default pipeline for an index using index.default_pipeline settings. (Refer link for details)
Here is the to set default pipeline:
PUT ms-test/_settings
{
"index.default_pipeline": "indexed_at"
}
I haven't tried out yet, as didn't upgraded to ES 6.5, but above command should work.

You can make use of default index pipelines, leverage the script processor, and thus emulate the auto_now_add functionality you may know from Django and DEFAULT GETDATE() from SQL.
The process of adding a default yyyy-MM-dd HH:mm:ss date goes like this:
1. Create the pipeline and specify which indices it'll be allowed to run on:
PUT _ingest/pipeline/auto_now_add
{
"description": "Assigns the current date if not yet present and if the index name is whitelisted",
"processors": [
{
"script": {
"source": """
// skip if not whitelisted
if (![ "myindex",
"logs-index",
"..."
].contains(ctx['_index'])) { return; }
// don't overwrite if present
if (ctx['created_at'] != null) { return; }
ctx['created_at'] = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
"""
}
}
]
}
Side note: the ingest processor's Painless script context is documented here.
2. Update the default_pipeline setting in all of your indices:
PUT _all/_settings
{
"index": {
"default_pipeline": "auto_now_add"
}
}
Side note: you can restrict the target indices using the multi-target syntax:
PUT myindex,logs-2021-*/_settings?allow_no_indices=true
{
"index": {
"default_pipeline": "auto_now_add"
}
}
3. Ingest a document to one of the configured indices:
PUT myindex/_doc/1
{
"abc": "def"
}
4. Verify that the date string has been added:
GET myindex/_search

An example for ElasticSearch 6.6.2 in Python 3:
from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=["localhost"])
timestamp_pipeline_setting = {
"description": "insert timestamp field for all documents",
"processors": [
{
"set": {
"field": "ingest_timestamp",
"value": "{{_ingest.timestamp}}"
}
}
]
}
es.ingest.put_pipeline("timestamp_pipeline", timestamp_pipeline_setting)
conf = {
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"default_pipeline": "timestamp_pipeline"
},
"mappings": {
"articles":{
"dynamic": "false",
"_source" : {"enabled" : "true" },
"properties": {
"title": {
"type": "text",
},
"content": {
"type": "text",
},
}
}
}
}
response = es.indices.create(
index="articles_index",
body=conf,
ignore=400 # ignore 400 already exists code
)
print ('\nresponse:', response)
doc = {
'title': 'automatically adding a timestamp to documents',
'content': 'prior to version 5 of Elasticsearch, documents had a metadata field called _timestamp. When enabled, this _timestamp was automatically added to every document. It would tell you the exact time a document had been indexed.',
}
res = es.index(index="articles_index", doc_type="articles", id=100001, body=doc)
print(res)
res = es.get(index="articles_index", doc_type="articles", id=100001)
print(res)
About ES 7.x, the example should work after removing the doc_type related parameters as it's not supported any more.

first create index and properties of the index , such as field and datatype and then insert the data using the rest API.
below is the way to create index with the field properties.execute the following in kibana console
`PUT /vfq-jenkins
{
"mappings": {
"properties": {
"BUILD_NUMBER": { "type" : "double"},
"BUILD_ID" : { "type" : "double" },
"JOB_NAME" : { "type" : "text" },
"JOB_STATUS" : { "type" : "keyword" },
"time" : { "type" : "date" }
}}}`
the next step is to insert the data into that index:
curl -u elastic:changeme -X POST http://elasticsearch:9200/vfq-jenkins/_doc/?pretty
-H Content-Type: application/json -d '{
"BUILD_NUMBER":"83","BUILD_ID":"83","JOB_NAME":"OMS_LOG_ANA","JOB_STATUS":"SUCCESS" ,
"time" : "2019-09-08'T'12:39:00" }'

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Default datatype mapping in elasticsearch - mapping

Related

Google Slides API reports Invalid requests[0].updateTableCellProperties: Invalid field: table_cell_properties

Artifactory and Jenkins - get file with newest/biggest custom property

How to get a sub-field of a struct type map, in the search response of YQL query in Vespa?

Elastic Search - Performing Geo Location Query on Bulk imported JSON

How to make elasticsearch add the timestamp field to every document in all indices?

Categories

Resources