When I use SageMaker A2i I get timeSpentInSeconds in the object returned, which is useful as we can get stats on how long it takes for workers to complete certain tasks and plan around it. However for sagemaker groundtruth, I receive a list of objects like this;
{
"datasetObjectId": "0",
"consolidatedAnnotation": {
"content": {
"translation2": {
"annotationsFromAllWorkers": [
{
"workerId": "private.us-east-2.ex11121331faeb5c25c",
"annotationData": {
"content": "{\"semantic-similarity\":{\"label\":\"New\"}}"
}
}
]
}
}
}
}
No information on time to complete is included....is there a way to get this included?
i have installed the prometheus-es-exporter for querying the elasticsearch and also i have written some queries.E.g one of the query looks like:
[query_database_connection_exception]
QueryIntervalSecs = 300
QueryIndices = logs.*
QueryJson = {
"size": 0,
"query": {
"query_string": {
"query": "message: \"com.microsoft.sqlserver.jdbc.SQLServerException: \" AND #timestamp:(>=now-1h AND <now)"
}
},
"aggs": {
"application": {
"terms": {
"field": "kubernetes.labels.app.keyword"
}
}
}
}
ES-Exporter exposes after the configuration the metric database_connection_exception_application_doc_count but i face the issue that sometimes i get in prometheus the error message:
This happens not only for this query but for other queries as well.My understanding and expectation is that if my query does not find the string com.microsoft.sqlserver.jdbc.SQLServerException for the last 1h it must return the value=0 in prometheus but for some reason it returns no data.How should i understand this?
ES-Exporter is running smoothly,health check of ES-Exporter and Elastic shows no error,all elastic nodes are at state green.
I'm developing a workspace add-on with alternate runtime; I configured the add-on to work with spreadsheets and I need to retrieve the spreadsheet id when the user opens the add-on. For test purposes I created a cloud function that contains the business logic.
My deployment.json file is the following:
{
"oauthScopes": ["https://www.googleapis.com/auth/spreadsheets.currentonly", "https://www.googleapis.com/auth/drive.file"],
"addOns": {
"common": {
"name": "My Spreadsheet Add-on",
"logoUrl": "https://cdn.icon-icons.com/icons2/2070/PNG/512/penguin_icon_126624.png"
},
"sheets": {
"homepageTrigger": {
"runFunction": "cloudFunctionUrl"
}
}
}
}
However, the request I receive seems to be empty and without the id of the spreadsheet in which I am, while I was expecting to have the spreadsheet id as per documentation
Is there anything else I need to configure?
The relevant code is quite easy, I'm just printing the request:
exports.getSpreadsheetId = function addonsHomePage (req, res) { console.log('called', req.method); console.log('body', req.body); res.send(createAction()); };
the information showed in the log is:
sheets: {}
Thank you
UPDATE It's a known issue of the engineering team, here you can find the ticket
The information around Workspace Add-ons is pretty new and the documentation is pretty sparse.
In case anyone else comes across this issue ... I solved it in python using CloudRun by creating a button that checks for for the object then if there is no object it requests access to the sheet in question.
from flask import Flask
from flask import request
app = Flask(__name__)
#app.route('/', methods=['POST'])
def test_addon_homepage():
req_body = request.get_json()
sheet_info = req_body.get('sheets')
card = {
"action": {
"navigations": [
{
"pushCard": {
"sections": [
{
"widgets": [
{
"textParagraph": {
"text": f"Hello {sheet_info.get('title','Auth Needed')}!"
}
}
]
}
]
}
}
]
}
}
if not sheet_info:
card = create_file_auth_button(card)
return card
def create_file_auth_button(self, card):
card['action']['navigations'][0]['pushCard']['fixedFooter'] = {
'primaryButton': {
'text': 'Authorize file access',
'onClick': {
'action': {
'function': 'https://example-cloudrun.a.run.app/authorize_sheet'
}
}
}
}
return card
#app.route('/authorize_sheet', methods=['POST'])
def authorize_sheet():
payload = {
'renderActions': {
'hostAppAction': {
'editorAction': {
'requestFileScopeForActiveDocument': {}
}
}
}
}
return payload
I'm using the following function score for outfits purchased:
{
"query": {
"function_score": {
"field_value_factor": {
"field": "purchased",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
However, when I create a search - I get the following error:
"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [purchased] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
The syntax is correct for the search as I've run it locally and it works perfectly. I'm now running it on my server and it's not workings. Do I need to define purchased as an integer somewhere or is this due to something else?
The purchased field is an analyzed string field, hence the error you see.
When indexing your documents, make sure that the numbers are not within double quotes, i.e.
Wrong:
{
"purchased": "324"
}
Right:
{
"purchased": 324
}
...or if you can't change the source documents (because you're not responsible for producing them), make sure that you create a mapping that defines the purchased field as being an integer field.
{
"your_type": {
"properties": {
"purchased": {
"type": "integer"
}
}
}
}
Elasticsearch experts,
I have been unable to find a simple way to just tell ElasticSearch to insert the _timestamp field for all the documents that are added in all the indices (and all document types).
I see an example for specific types:
http://www.elasticsearch.org/guide/reference/mapping/timestamp-field/
and also see an example for all indices for a specific type (using _all):
http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/
but I am unable to find any documentation on adding it by default for all documents that get added irrespective of the index and type.
Elasticsearch used to support automatically adding timestamps to documents being indexed, but deprecated this feature in 2.0.0
From the version 5.5 documentation:
The _timestamp and _ttl fields were deprecated and are now removed. As a replacement for _timestamp, you should populate a regular date field with the current timestamp on application side.
You can do this by providing it when creating your index.
$curl -XPOST localhost:9200/test -d '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"_default_":{
"_timestamp" : {
"enabled" : true,
"store" : true
}
}
}
}'
That will then automatically create a _timestamp for all stuff that you put in the index.
Then after indexing something when requesting the _timestamp field it will be returned.
Adding another way to get indexing timestamp. Hope this may help someone.
Ingest pipeline can be used to add timestamp when document is indexed. Here, is a sample example:
PUT _ingest/pipeline/indexed_at
{
"description": "Adds indexed_at timestamp to documents",
"processors": [
{
"set": {
"field": "_source.indexed_at",
"value": "{{_ingest.timestamp}}"
}
}
]
}
Earlier, elastic search was using named-pipelines because of which 'pipeline' param needs to be specified in the elastic search endpoint which is used to write/index documents. (Ref: link) This was bit troublesome as you would need to make changes in endpoints on application side.
With Elastic search version >= 6.5, you can now specify a default pipeline for an index using index.default_pipeline settings. (Refer link for details)
Here is the to set default pipeline:
PUT ms-test/_settings
{
"index.default_pipeline": "indexed_at"
}
I haven't tried out yet, as didn't upgraded to ES 6.5, but above command should work.
You can make use of default index pipelines, leverage the script processor, and thus emulate the auto_now_add functionality you may know from Django and DEFAULT GETDATE() from SQL.
The process of adding a default yyyy-MM-dd HH:mm:ss date goes like this:
1. Create the pipeline and specify which indices it'll be allowed to run on:
PUT _ingest/pipeline/auto_now_add
{
"description": "Assigns the current date if not yet present and if the index name is whitelisted",
"processors": [
{
"script": {
"source": """
// skip if not whitelisted
if (![ "myindex",
"logs-index",
"..."
].contains(ctx['_index'])) { return; }
// don't overwrite if present
if (ctx['created_at'] != null) { return; }
ctx['created_at'] = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
"""
}
}
]
}
Side note: the ingest processor's Painless script context is documented here.
2. Update the default_pipeline setting in all of your indices:
PUT _all/_settings
{
"index": {
"default_pipeline": "auto_now_add"
}
}
Side note: you can restrict the target indices using the multi-target syntax:
PUT myindex,logs-2021-*/_settings?allow_no_indices=true
{
"index": {
"default_pipeline": "auto_now_add"
}
}
3. Ingest a document to one of the configured indices:
PUT myindex/_doc/1
{
"abc": "def"
}
4. Verify that the date string has been added:
GET myindex/_search
An example for ElasticSearch 6.6.2 in Python 3:
from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=["localhost"])
timestamp_pipeline_setting = {
"description": "insert timestamp field for all documents",
"processors": [
{
"set": {
"field": "ingest_timestamp",
"value": "{{_ingest.timestamp}}"
}
}
]
}
es.ingest.put_pipeline("timestamp_pipeline", timestamp_pipeline_setting)
conf = {
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"default_pipeline": "timestamp_pipeline"
},
"mappings": {
"articles":{
"dynamic": "false",
"_source" : {"enabled" : "true" },
"properties": {
"title": {
"type": "text",
},
"content": {
"type": "text",
},
}
}
}
}
response = es.indices.create(
index="articles_index",
body=conf,
ignore=400 # ignore 400 already exists code
)
print ('\nresponse:', response)
doc = {
'title': 'automatically adding a timestamp to documents',
'content': 'prior to version 5 of Elasticsearch, documents had a metadata field called _timestamp. When enabled, this _timestamp was automatically added to every document. It would tell you the exact time a document had been indexed.',
}
res = es.index(index="articles_index", doc_type="articles", id=100001, body=doc)
print(res)
res = es.get(index="articles_index", doc_type="articles", id=100001)
print(res)
About ES 7.x, the example should work after removing the doc_type related parameters as it's not supported any more.
first create index and properties of the index , such as field and datatype and then insert the data using the rest API.
below is the way to create index with the field properties.execute the following in kibana console
`PUT /vfq-jenkins
{
"mappings": {
"properties": {
"BUILD_NUMBER": { "type" : "double"},
"BUILD_ID" : { "type" : "double" },
"JOB_NAME" : { "type" : "text" },
"JOB_STATUS" : { "type" : "keyword" },
"time" : { "type" : "date" }
}}}`
the next step is to insert the data into that index:
curl -u elastic:changeme -X POST http://elasticsearch:9200/vfq-jenkins/_doc/?pretty
-H Content-Type: application/json -d '{
"BUILD_NUMBER":"83","BUILD_ID":"83","JOB_NAME":"OMS_LOG_ANA","JOB_STATUS":"SUCCESS" ,
"time" : "2019-09-08'T'12:39:00" }'