Get distinct values of a string field using an Elasticsearch query (Java client API) - elasticsearch-5

I'm using Elasticsearch 5.4.
I need to get the distinct values of a field using an Elasticsearch query.
I'm trying this query but it is not working:
GET /index/index_type/_search
{
  "size": 0,
  "aggs": {
    "distinct": {
      "terms": {
        "field": "status"
      }
    }
  }
}
thank you

Solution with the Java client API, Elasticsearch 5.4:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

TransportClient client; // an already initialized client

// Run a terms aggregation on the keyword sub-field to collect the distinct values
SearchResponse response = client.prepareSearch(indexName)
        .setTypes(typeName)
        .addAggregation(AggregationBuilders.terms("distinct_line_status")
                .field("line_status.keyword"))
        .setSize(0)
        .get();
System.out.println(response);

// Each bucket corresponds to one distinct value of the field
Terms statuses = response.getAggregations().get("distinct_line_status");
for (Terms.Bucket entry : statuses.getBuckets()) {
    System.out.println(entry.getKey());      // term (the distinct value)
    System.out.println(entry.getDocCount()); // doc count
}

You can use a cardinality aggregation if you only need the number of distinct values; note that it returns an (approximate) count, not the values themselves:
GET /index/index_type/_search
{
  "size": 0,
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "status"
      }
    }
  }
}
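If you need the same thing through the Java client used above, here is a minimal sketch, assuming the same TransportClient and that "status" has a keyword sub-field ("status.keyword" is an assumption, adjust to your mapping):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality;

// Cardinality gives an approximate count of distinct values, not the values themselves
SearchResponse response = client.prepareSearch(indexName)
        .addAggregation(AggregationBuilders.cardinality("distinct_status")
                .field("status.keyword"))
        .setSize(0)
        .get();

Cardinality distinct = response.getAggregations().get("distinct_status");
System.out.println(distinct.getValue()); // number of distinct values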

Related

Prometheus metric returns no data

I have installed prometheus-es-exporter for querying Elasticsearch, and I have also written some queries. E.g. one of the queries looks like:
[query_database_connection_exception]
QueryIntervalSecs = 300
QueryIndices = logs.*
QueryJson = {
    "size": 0,
    "query": {
        "query_string": {
            "query": "message: \"com.microsoft.sqlserver.jdbc.SQLServerException: \" AND #timestamp:(>=now-1h AND <now)"
        }
    },
    "aggs": {
        "application": {
            "terms": {
                "field": "kubernetes.labels.app.keyword"
            }
        }
    }
}
After this configuration, ES-Exporter exposes the metric database_connection_exception_application_doc_count, but I face the issue that sometimes I get an error message in Prometheus.
This happens not only for this query but for other queries as well. My understanding and expectation is that if my query does not find the string com.microsoft.sqlserver.jdbc.SQLServerException for the last 1h, it must return the value 0 in Prometheus, but for some reason it returns no data. How should I understand this?
ES-Exporter is running smoothly, health checks of ES-Exporter and Elastic show no errors, and all Elastic nodes are in state green.

Keep getting this error, what is the proper syntax for: ConditionType 'NUMBER_BETWEEN' requires exactly two ConditionValues, but 1 value was supplied

I am trying to use the conditional filter to check a range between two numbers using the Google Sheets API. I keep getting this syntax error; how do I fix it, please? Here is my code:
const addFilterViewRequest = [
  {
    'addFilterView': {
      filter: {
        title: 'PO_Log Precentage',
        range: {
          sheetId: 1701531392,
          'startRowIndex': 1,
          'startColumnIndex': 1
        },
        'criteria': {
          19: {
            'condition': {
              'type': 'NUMBER_BETWEEN',
              'values': [
                {
                  "userEnteredValue": "80"
                }
              ],
            }
          }
        }
      }
    }
  }
];
Sheets.Spreadsheets.batchUpdate({ requests: addFilterViewRequest }, spreadsheet_id);
From the ConditionType documentation:
NUMBER_BETWEEN: The cell's value must be between the two condition values. Supported by data validation, conditional formatting and filters. Requires exactly two ConditionValues.
So you must supply both bounds in the values array:
'values': [
  {
    "userEnteredValue": "0"
  },
  {
    "userEnteredValue": "80"
  }
],

Elasticsearch validate API: explain query terms from more_like_this against a single field, getting highlighted terms

I have an index, "document_texts", containing effectively the converted Word or PDF document plain text. It is built on a Rails stack; the ActiveModel is DocumentText, using the elasticsearch-rails gems for the model and API. I want to be able to match similar Word documents or PDFs based on the document text.
I have been able to match documents against each other by using
response = DocumentText.search \
  query: {
    filtered: {
      query: {
        more_like_this: {
          ids: ["12345"]
        }
      }
    }
  }
But I want to see HOW the result set was queried, i.e. what query terms were used to match the documents.
Using the elasticsearch API gem I can do the following:
client = Elasticsearch::Client.new log: true
client.indices.validate_query index: 'document_texts',
  explain: true,
  body: {
    query: {
      filtered: {
        query: {
          more_like_this: {
            ids: ['12345']
          }
        }
      }
    }
  }
But I get this in response
{"valid":true,"_shards":{"total":1,"successful":1,"failed":0},"explanations":[{"index":"document_texts","valid":true,"explanation":"+(like:null -_uid:document_text#12345)"}]}
I would like to find out how the query was built. It uses up to 25 terms for the matching; what were those 25 terms, and how can I get them from the query?
I'm not sure if it's possible, but I would like to see if I can get the 25 terms used by Elasticsearch's analyzer and then reapply the query with boosted values on those terms, depending on my choice.
I also want to highlight this in the document text, so I tried this:
response = DocumentText.search \
  from: 0, size: 25,
  query: {
    filtered: {
      query: {
        more_like_this: {
          ids: ["12345"]
        }
      },
      filter: {
        bool: {
          must: [
            { match: { documentable_type: model } }
          ]
        }
      }
    }
  },
  highlight: {
    pre_tags: ["<tag1>"],
    post_tags: ["</tag1>"],
    fields: {
      doc_text: {
        type_name: {
          content: { term_vector: "with_positions_offsets" }
        }
      }
    }
  }
But this fails to produce anything; I think I was being rather hopeful. I know that this should be possible, but I would be keen to know if anyone has done this or what the best approach is. Any ideas?
Including some stop words, just for anyone else out there: this will give an easy way to show the terms used for the query. It doesn't solve the highlight issue, but it can give the terms used for the mlt matching process. Some other settings are included just for illustration.
curl -XGET 'http://localhost:9200/document_texts/document_text/_validate/query?rewrite=true' -d '
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "ids": ["12345"],
          "min_term_freq": 1,
          "max_query_terms": 50,
          "stop_words": ["this", "of"]
        }
      }
    }
  }
}'
https://github.com/elastic/elasticsearch-ruby/pull/359
Once this is merged, this should be easier:
client.indices.validate_query index: 'document_texts',
  rewrite: true,
  explain: true,
  body: {
    query: {
      filtered: {
        query: {
          more_like_this: {
            ids: ['10538']
          }
        }
      }
    }
  }

Grails - ElasticSearch - QueryParsingException[[index] No query registered for [query]]; with elasticSearchHelper; JSON via curl works fine though

I have been working on a Grails project, clubbed with ElasticSearch (v 0.20.6), with a custom build of the elasticsearch-grails-plugin (to support geo_point indexing: v 0.20.6).
I have been trying to do a filtered search while using script_fields (to calculate distance). Following are the closure & the generated JSON from the GXContentBuilder:
Closure
records = Domain.search(searchType: 'dfs_query_and_fetch') {
  query {
    filtered = {
      query = {
        if (queryTxt) {
          query_string(query: queryTxt)
        } else {
          match_all {}
        }
      }
      filter = {
        geo_distance = {
          distance = "${userDistance}km"
          "location" {
            lat = latlon[0] ?: 0.00
            lon = latlon[1] ?: 0.00
          }
        }
      }
    }
  }
  script_fields = {
    distance = {
      script = "doc['location'].arcDistanceInKm($latlon)"
    }
  }
  fields = ["_source"]
}
The GXContentBuilder-generated query JSON:
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": {
            "lat": "37.752258",
            "lon": "-121.949886"
          }
        }
      }
    }
  },
  "script_fields": {
    "distance": {
      "script": "doc['location'].arcDistanceInKm(37.752258, -121.949886)"
    }
  },
  "fields": ["_source"]
}
The JSON query works perfectly fine the curl way. But when I try to execute it from Groovy code, with this (taken from ElasticSearchService.groovy), where request is a SearchRequest instance:
elasticSearchHelper.withElasticSearch { Client client ->
def response = client.search(request).actionGet()
}
It throws the following error:
Failed to execute phase [dfs], total failure; shardFailures {[1][index][3]: SearchParseException[[index][3]: from[0],size[60]: Parse Failure [Failed to parse source [{"from":0,"size":60,"query_binary":"eyJxdWVyeSI6eyJmaWx0ZXJlZCI6eyJxdWVyeSI6eyJtYXRjaF9hbGwiOnt9fSwiZmlsdGVyIjp7Imdlb19kaXN0YW5jZSI6eyJkaXN0YW5jZSI6IjVrbSIsImNvbXBhbnkuYWRkcmVzcy5sb2NhdGlvbiI6eyJsYXQiOiIzNy43NTIyNTgiLCJsb24iOiItMTIxLjk0OTg4NiJ9fX19fSwic2NyaXB0X2ZpZWxkcyI6eyJkaXN0YW5jZSI6eyJzY3JpcHQiOiJkb2NbJ2NvbXBhbnkuYWRkcmVzcy5sb2NhdGlvbiddLmFyY0Rpc3RhbmNlSW5LbSgzNy43NTIyNTgsIC0xMjEuOTQ5ODg2KSJ9fSwiZmllbGRzIjpbIl9zb3VyY2UiXX0=","explain":true}]]]; nested: QueryParsingException[[index] No query registered for [query]]; }
The above closure works if I only use filtered = { ... } and script_fields = { ... }, but then it doesn't return the calculated distance.
Has anyone had a similar problem?
Thanks in advance :)
It's possible that I might have been dim to point out the obvious here :P

How to make elasticsearch add the timestamp field to every document in all indices?

Elasticsearch experts,
I have been unable to find a simple way to just tell ElasticSearch to insert the _timestamp field for all the documents that are added in all the indices (and all document types).
I see an example for specific types:
http://www.elasticsearch.org/guide/reference/mapping/timestamp-field/
and also see an example for all indices for a specific type (using _all):
http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/
but I am unable to find any documentation on adding it by default for all documents that get added irrespective of the index and type.
Elasticsearch used to support automatically adding timestamps to documents being indexed, but this feature was deprecated in 2.0.0 and later removed.
From the version 5.5 documentation:
The _timestamp and _ttl fields were deprecated and are now removed. As a replacement for _timestamp, you should populate a regular date field with the current timestamp on application side.
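For example, a minimal sketch of populating such a date field on the application side with the Java client (the index, type, field names and the client variable are illustrative assumptions, not from the original answer):

import java.time.Instant;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.common.xcontent.XContentType;

// The application sets the timestamp itself; ISO-8601 parses as an ES date
String json = "{\"message\": \"hello\", \"created_at\": \"" + Instant.now() + "\"}";
IndexResponse response = client.prepareIndex("myindex", "mytype", "1")
        .setSource(json, XContentType.JSON)
        .get();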
You can do this by providing it when creating your index.
curl -XPOST localhost:9200/test -d '{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_default_": {
      "_timestamp": {
        "enabled": true,
        "store": true
      }
    }
  }
}'
That will then automatically create a _timestamp for everything that you put in the index.
Then, after indexing something, the _timestamp field will be returned when you request it.
Adding another way to get an indexing timestamp; hope this may help someone.
An ingest pipeline can be used to add a timestamp when a document is indexed. Here is a sample example:
PUT _ingest/pipeline/indexed_at
{
  "description": "Adds indexed_at timestamp to documents",
  "processors": [
    {
      "set": {
        "field": "_source.indexed_at",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
Earlier, Elasticsearch used named pipelines, because of which a 'pipeline' param needed to be specified on the Elasticsearch endpoint used to write/index documents (Ref: link). This was a bit troublesome, as you would need to make changes to endpoints on the application side, as sketched below.
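For instance, a per-request pipeline through the Java client would look roughly like this (a sketch; the index, type, source and client variable are made-up placeholders):

import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.common.xcontent.XContentType;

// Every single write has to name the pipeline explicitly
IndexResponse response = client.prepareIndex("myindex", "mytype")
        .setPipeline("indexed_at")
        .setSource("{\"foo\": \"bar\"}", XContentType.JSON)
        .get();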
With Elasticsearch version >= 6.5, you can now specify a default pipeline for an index using the index.default_pipeline setting. (Refer to the link for details.)
Here is how to set the default pipeline:
PUT ms-test/_settings
{
  "index.default_pipeline": "indexed_at"
}
I haven't tried it out yet, as I haven't upgraded to ES 6.5, but the above command should work.
You can make use of default index pipelines, leverage the script processor, and thus emulate the auto_now_add functionality you may know from Django and DEFAULT GETDATE() from SQL.
The process of adding a default yyyy-MM-dd HH:mm:ss date goes like this:
1. Create the pipeline and specify which indices it'll be allowed to run on:
PUT _ingest/pipeline/auto_now_add
{
  "description": "Assigns the current date if not yet present and if the index name is whitelisted",
  "processors": [
    {
      "script": {
        "source": """
          // skip if not whitelisted
          if (![ "myindex",
                 "logs-index",
                 "..."
               ].contains(ctx['_index'])) { return; }

          // don't overwrite if present
          if (ctx['created_at'] != null) { return; }

          ctx['created_at'] = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        """
      }
    }
  ]
}
Side note: the ingest processor's Painless script context is documented here.
2. Update the default_pipeline setting in all of your indices:
PUT _all/_settings
{
  "index": {
    "default_pipeline": "auto_now_add"
  }
}
Side note: you can restrict the target indices using the multi-target syntax:
PUT myindex,logs-2021-*/_settings?allow_no_indices=true
{
  "index": {
    "default_pipeline": "auto_now_add"
  }
}
3. Ingest a document to one of the configured indices:
PUT myindex/_doc/1
{
  "abc": "def"
}
4. Verify that the date string has been added:
GET myindex/_search
An example for ElasticSearch 6.6.2 in Python 3:
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=["localhost"])

timestamp_pipeline_setting = {
    "description": "insert timestamp field for all documents",
    "processors": [
        {
            "set": {
                "field": "ingest_timestamp",
                "value": "{{_ingest.timestamp}}"
            }
        }
    ]
}
es.ingest.put_pipeline("timestamp_pipeline", timestamp_pipeline_setting)

conf = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
        "default_pipeline": "timestamp_pipeline"
    },
    "mappings": {
        "articles": {
            "dynamic": "false",
            "_source": {"enabled": "true"},
            "properties": {
                "title": {"type": "text"},
                "content": {"type": "text"}
            }
        }
    }
}
response = es.indices.create(
    index="articles_index",
    body=conf,
    ignore=400  # ignore 400 (index already exists)
)
print('\nresponse:', response)

doc = {
    'title': 'automatically adding a timestamp to documents',
    'content': 'prior to version 5 of Elasticsearch, documents had a metadata field called _timestamp. When enabled, this _timestamp was automatically added to every document. It would tell you the exact time a document had been indexed.',
}
res = es.index(index="articles_index", doc_type="articles", id=100001, body=doc)
print(res)
res = es.get(index="articles_index", doc_type="articles", id=100001)
print(res)
For ES 7.x, the example should work after removing the doc_type-related parameters, as mapping types are not supported any more.
First create the index and the properties of the index, such as fields and datatypes, and then insert the data using the REST API.
Below is the way to create an index with field properties; execute the following in the Kibana console:
PUT /vfq-jenkins
{
  "mappings": {
    "properties": {
      "BUILD_NUMBER": { "type": "double" },
      "BUILD_ID": { "type": "double" },
      "JOB_NAME": { "type": "text" },
      "JOB_STATUS": { "type": "keyword" },
      "time": { "type": "date" }
    }
  }
}
The next step is to insert the data into that index:
curl -u elastic:changeme -X POST 'http://elasticsearch:9200/vfq-jenkins/_doc/?pretty' \
  -H 'Content-Type: application/json' -d '{
  "BUILD_NUMBER": "83", "BUILD_ID": "83", "JOB_NAME": "OMS_LOG_ANA", "JOB_STATUS": "SUCCESS",
  "time": "2019-09-08T12:39:00"
}'
