install plugin for Open Distro - elasticsearch-opendistro

Amazon Elasticsearch Service offers k-Nearest Neighbor (k-NN) search which can enhance search by similarity use cases.
https://aws.amazon.com/about-aws/whats-new/2020/03/build-k-nearest-neighbor-similarity-search-engine-with-amazon-elasticsearch-service/
I tried this official code that I found here...
https://github.com/opendistro-for-elasticsearch/k-NN
PUT /myindex
{
"settings" : {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 2
},
"my_vector2": {
"type": "knn_vector",
"dimension": 4
},
"my_vector3": {
"type": "knn_vector",
"dimension": 8
}
}
}
}
Getting this error:
"unknown setting [index.knn] please check that any required plugins
are installed, or check the breaking changes documentation for removed
settings"
How do I check if my Elastic installation supports this feature?

t2.small and t2.medium instance types are not supported. (It is not mentioned anywhere in the documentation.) It worked as expected when r5.large instance type was selected.

Related

GAS - create google slide scribble line with script

I would like to create a line with some point array with google-slide-api script. but I do not found any document about how to do it.
Then I create a line with scribble tool manually then read it by google-slide-api. I got below output:
- Slide #1 contains 1 elements.
elem #0 - {
"line": {
"lineProperties": {
"dashStyle": "SOLID",
"endArrow": "NONE",
"lineFill": {
"solidFill": {
"alpha": 1,
"color": {
"themeColor": "DARK2"
}
}
},
"startArrow": "NONE",
"weight": {
"magnitude": 9525,
"unit": "EMU"
}
}
},
"objectId": "gf0b4bb3376_0_2",
"size": {
"height": {
"magnitude": 786825,
"unit": "EMU"
},
"width": {
"magnitude": 763325,
"unit": "EMU"
}
},
"transform": {
"scaleX": 1,
"scaleY": 1,
"translateX": 1196175,
"translateY": 777044.3325,
"unit": "EMU"
}
}
That means google slide actual create it as a line type but nothing special to define the data points! then it's not possible to create the same from the output json data.
Answer:
Unfortunately, this isn't currently possible.
More Information:
As explained by Iamblichus in this answer, the Slides API does not support this kind of line, and as you have pointed out smoothing between separate lines does not work.
Feature Request:
You can however let Google know that this is a feature that is important for access to their APIs, and that you would like to request they implement it.
Google's Issue Tracker is a place for developers to report issues and make feature requests for their development services, I'd urge you to make a feature request there. The best component to file this under would be the Google Slides component, with the Feature Request template.

Artifactory and Jenkins - get file with newest/biggest custom property

I have generic repository "my_repo". I uploaded files there from jenkins with to paths like my_repo/branch_buildNumber/package.tar.gz and with custom property "tag" like "1.9.0","1.10.0" etc. I want to get item/file with latest/newest tag.
I tried to modify Example 2 from this link ...
https://www.jfrog.com/confluence/display/JFROG/Using+File+Specs#UsingFileSpecs-Examples
... and add sorting and limit the way it was done here ...
https://www.jfrog.com/confluence/display/JFROG/Artifactory+Query+Language#ArtifactoryQueryLanguage-limitDisplayLimitsandPagination
But im getting "unknown property desc" error.
The Jenkins Artifactory Plugin, like most of the JFrog clients, supports File Specs for downloading and uploading generic files.
The File Specs schema is described here. When creating a File Spec for downloading files, you have the option of using the "pattern" property, which can include wildcards. For example, the following spec downloads all the zip files from the my-local-repo repository into the local froggy directory:
{
"files": [
{
"pattern": "my-local-repo/*.zip",
"target": "froggy/"
}
]
}
Alternatively, you can use "aql" instead of "pattern". The following spec, provides the same result as the previous one:
{
"files": [
{
"aql": {
"items.find": {
"repo": "my-local-repo",
"$or": [
{
"$and": [
{
"path": {
"$match": "*"
},
"name": {
"$match": "*.zip"
}
}
]
}
]
}
},
"target": "froggy/"
}
]
}
The allowed AQL syntax inside File Specs does not include everything the Artifactory Query Language allows. For examples, you can't use the "include" or "sort" clauses. These limitations were put in place, to make the response structure known and constant.
Sorting however is still available with File Specs, regardless of whether you choose to use "pattern" or "aql". It is supported throw the "sortBy", "sortOrder", "limit" and "offset" File Spec properties.
For example, the following File Spec, will download only the 3 largest zip file files:
{
"files": [
{
"aql": {
"items.find": {
"repo": "my-local-repo",
"$or": [
{
"$and": [
{
"path": {
"$match": "*"
},
"name": {
"$match": "*.zip"
}
}
]
}
]
}
},
"sortBy": ["size"],
"sortOrder": "desc",
"limit": 3,
"target": "froggy/"
}
]
}
And you can do the same with "pattern", instead of "aql":
{
"files": [
{
"pattern": "my-local-repo/*.zip",
"sortBy": ["size"],
"sortOrder": "desc",
"limit": 3,
"target": "local/output/"
}
]
}
You can read more about File Specs here.
(After answering this question here, we also updated the File Specs documentation with these examples).
After a lot of testing and experimenting i found that there are many ways of solving my main problem (getting latest version of package) but each of way require some function which is available in paid version. Like sort() in AQL or [RELEASE] in REST API. But i found that i still can get JSON with a full list of files and its properties. I can also download each single file. This led me to solution with simple python script. I can't publish whole but only the core which should bu fairly obvious
import requests, argparse
from packaging import version
...
query="""
items.find({
"type" : "file",
"$and":[{
"repo" : {"$match" : \"""" + args.repository + """\"},
"path" : {"$match" : \"""" + args.path + """\"}
}]
}).include("name","repo","path","size","property.*")
"""
auth=(args.username,args.password)
def clearVersion(ver: str):
new = ''
for letter in ver:
if letter.isnumeric() or letter == ".":
new+=letter
return new
def lastestArtifact(response: requests):
response = response.json()
latestVer = "0.0.0"
currentItemIndex = 0
chosenItemIndex = 0
for results in response["results"]:
for prop in results['properties']:
if prop["key"] == "tag":
if version.parse(clearVersion(prop["value"])) > version.parse(clearVersion(latestVer)):
latestVer = prop["value"]
chosenItemIndex = currentItemIndex
currentItemIndex += 1
return response["results"][chosenItemIndex]
req = requests.post(url,data=query,auth=auth)
if args.verbose:
print(req.text)
latest = lastestArtifact(req)
...
I just want to point that THIS IS NOT permanent solution. We just didnt want to buy license yet only because of one single problem. But if there will be more of such problems then we definetly buy PRO subscription.

With Google Cloud Speech-to-text, why do I get different results for the same audio file, depending on which bucket do I put it into?

I am trying to use Google Cloud Speech-to-text, using the client libraries, from a node.js environment, and I see something I don't understand: I get a different result for the same example audio file, and the same configuration, depending on whether I am using it from the original sample bucket, or from my own bucket.
There are the requests and responses:
The baseline is Google's own test data file, available here: https://storage.googleapis.com/cloud-samples-tests/speech/brooklyn.flac
Request:
{
"config": {
"encoding": "FLAC",
"languageCode": "en-US",
"sampleRateHertz": 16000,
"enableAutomaticPunctuation": true
},
"audio": {
"uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
}
}
Response:
{
"results": [
{
"alternatives": [
{
"transcript": "How old is the Brooklyn Bridge?",
"confidence": 0.9831430315971375
}
]
}
]
}
So far, so good. But, if I download this audio file, re-upload it to my own bucket, and do the same, then:
Request:
{
"config": {
"encoding": "FLAC",
"languageCode": "en-US",
"sampleRateHertz": 16000,
"enableAutomaticPunctuation": true
},
"audio": {
"uri": "gs://goe-transcript-creation/brooklyn.flac"
}
}
Response:
{
"results": [
{
"alternatives": [
{
"transcript": "how old is",
"confidence": 0.8902621865272522
}
]
}
]
}
As you can see this is the same request. The re-uploaded audio data is here: https://storage.googleapis.com/goe-transcript-creation/brooklyn.flac
This the exact same file as in the first example... not a bit of difference.
Still, the results are different; I only get half of the sentence.
What am I missing here? Thanks.
Update 1:
The same thing happens with the CLI tool, too:
$ gcloud ml speech recognize gs://cloud-samples-tests/speech/brooklyn.flac --language-code=en-US
{
"results": [
{
"alternatives": [
{
"confidence": 0.98314303,
"transcript": "how old is the Brooklyn Bridge"
}
]
}
]
}
$ gcloud ml speech recognize gs://goe-transcript-creation/brooklyn.flac --language-code=en-US
ERROR: (gcloud.ml.speech.recognize) INVALID_ARGUMENT: Invalid recognition 'config': bad encoding..
$ gcloud ml speech recognize gs://goe-transcript-creation/brooklyn.flac --language-code=en-US --encoding=FLAC
ERROR: (gcloud.ml.speech.recognize) INVALID_ARGUMENT: Invalid recognition 'config': bad sample rate hertz.
$ gcloud ml speech recognize gs://goe-transcript-creation/brooklyn.flac --language-code=en-US --encoding=FLAC --sample-rate=16000
{
"results": [
{
"alternatives": [
{
"confidence": 0.8902483,
"transcript": "how old is"
}
]
}
]
}
It's also interesting that when pulling the audio from the other bucket, I need to specify encoding and sample rate, otherwise it doesn't work... but it's not necessary when I am using the original test bucket.
Update 2:
If I don't use Google Cloud Storage, but upload the data directly in the speech-to-text request, it works as intended:
$ gcloud ml speech recognize brooklyn.flac --language-code=en-US
{
"results": [
{
"alternatives": [
{
"confidence": 0.98314303,
"transcript": "how old is the Brooklyn Bridge"
}
]
}
]
}
So the problem doesn't seems to be with the recognition itself, but accessing the audio data. The obvious guess would be that maybe it's the fault of the uploading, and the data is somehow corrupted along the way?
We can verify that by pulling the data from the cloud, and comparing with the original. It doesn't seem to be broken.
So maybe it's a problem when the S-T-T service is accessing the storage service? But why with one bucket only? Or is it some kind of file metadata problem?

Drag and drop a GeoJSON with linked CRS

I have a GeoJSON file that looks like this:
{
"type": "FeatureCollection",
"crs": {
"type": "link",
"properties": {
"href": "http://spatialreference.org/ref/epsg/32198/proj4/",
"type": "proj4"
}
},
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [200000, 20000]
},
"properties": {
"id": 1,
"name": "foo"
}
}
]
}
As you can see, the crs definition uses the link type, which is documented here: http://geojson.org/geojson-spec.html#linked-crs
I'm dropping the file in an OL3 map that has the ol.interaction.DragDrop interaction enabled, but it fails to load it. OpenLayers 3 doesn't currently support this type of crs definition, thus the reason it fails to load it. It supports those of type name and EPSG, see: https://github.com/openlayers/ol3/blob/master/src/ol/format/geojsonformat.js#L484 (snippet below):
if (crs.type == 'name') {
return ol.proj.get(crs.properties.name);
} else if (crs.type == 'EPSG') {
// 'EPSG' is not part of the GeoJSON specification, but is generated by
// GeoServer.
// TODO: remove this when http://jira.codehaus.org/browse/GEOS-5996
// is fixed and widely deployed.
return ol.proj.get('EPSG:' + crs.properties.code);
} else {
goog.asserts.fail('Unknown crs.type: ' + crs.type);
return null;
}
Looking at it, I don't know if it would be possible to natively support the link type inside OpenLayers directly, as it would require to do an asynchronous request to fetch the projection definition within code that's synchronous. I suspect that I'm stuck with this problem.
I'm looking for an alternative to approach the problem, or maybe I'm just wrong about the fact that it could be possible to support this (with a proper patch) natively in OL3.
Any hint ?
Alexandre, your best bet is to avoid using the old GeoJSON linked CRS (which is very poorly supported by software) and either 1) convert your data to GeoJSON's default WGS84 long/lat – this is the best option by far or 2) use a CRS name like "urn:ogc:def:crs:EPSG::32198".

How to make elasticsearch add the timestamp field to every document in all indices?

Elasticsearch experts,
I have been unable to find a simple way to just tell ElasticSearch to insert the _timestamp field for all the documents that are added in all the indices (and all document types).
I see an example for specific types:
http://www.elasticsearch.org/guide/reference/mapping/timestamp-field/
and also see an example for all indices for a specific type (using _all):
http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/
but I am unable to find any documentation on adding it by default for all documents that get added irrespective of the index and type.
Elasticsearch used to support automatically adding timestamps to documents being indexed, but deprecated this feature in 2.0.0
From the version 5.5 documentation:
The _timestamp and _ttl fields were deprecated and are now removed. As a replacement for _timestamp, you should populate a regular date field with the current timestamp on application side.
You can do this by providing it when creating your index.
$curl -XPOST localhost:9200/test -d '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"_default_":{
"_timestamp" : {
"enabled" : true,
"store" : true
}
}
}
}'
That will then automatically create a _timestamp for all stuff that you put in the index.
Then after indexing something when requesting the _timestamp field it will be returned.
Adding another way to get indexing timestamp. Hope this may help someone.
Ingest pipeline can be used to add timestamp when document is indexed. Here, is a sample example:
PUT _ingest/pipeline/indexed_at
{
"description": "Adds indexed_at timestamp to documents",
"processors": [
{
"set": {
"field": "_source.indexed_at",
"value": "{{_ingest.timestamp}}"
}
}
]
}
Earlier, elastic search was using named-pipelines because of which 'pipeline' param needs to be specified in the elastic search endpoint which is used to write/index documents. (Ref: link) This was bit troublesome as you would need to make changes in endpoints on application side.
With Elastic search version >= 6.5, you can now specify a default pipeline for an index using index.default_pipeline settings. (Refer link for details)
Here is the to set default pipeline:
PUT ms-test/_settings
{
"index.default_pipeline": "indexed_at"
}
I haven't tried out yet, as didn't upgraded to ES 6.5, but above command should work.
You can make use of default index pipelines, leverage the script processor, and thus emulate the auto_now_add functionality you may know from Django and DEFAULT GETDATE() from SQL.
The process of adding a default yyyy-MM-dd HH:mm:ss date goes like this:
1. Create the pipeline and specify which indices it'll be allowed to run on:
PUT _ingest/pipeline/auto_now_add
{
"description": "Assigns the current date if not yet present and if the index name is whitelisted",
"processors": [
{
"script": {
"source": """
// skip if not whitelisted
if (![ "myindex",
"logs-index",
"..."
].contains(ctx['_index'])) { return; }
// don't overwrite if present
if (ctx['created_at'] != null) { return; }
ctx['created_at'] = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
"""
}
}
]
}
Side note: the ingest processor's Painless script context is documented here.
2. Update the default_pipeline setting in all of your indices:
PUT _all/_settings
{
"index": {
"default_pipeline": "auto_now_add"
}
}
Side note: you can restrict the target indices using the multi-target syntax:
PUT myindex,logs-2021-*/_settings?allow_no_indices=true
{
"index": {
"default_pipeline": "auto_now_add"
}
}
3. Ingest a document to one of the configured indices:
PUT myindex/_doc/1
{
"abc": "def"
}
4. Verify that the date string has been added:
GET myindex/_search
An example for ElasticSearch 6.6.2 in Python 3:
from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=["localhost"])
timestamp_pipeline_setting = {
"description": "insert timestamp field for all documents",
"processors": [
{
"set": {
"field": "ingest_timestamp",
"value": "{{_ingest.timestamp}}"
}
}
]
}
es.ingest.put_pipeline("timestamp_pipeline", timestamp_pipeline_setting)
conf = {
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"default_pipeline": "timestamp_pipeline"
},
"mappings": {
"articles":{
"dynamic": "false",
"_source" : {"enabled" : "true" },
"properties": {
"title": {
"type": "text",
},
"content": {
"type": "text",
},
}
}
}
}
response = es.indices.create(
index="articles_index",
body=conf,
ignore=400 # ignore 400 already exists code
)
print ('\nresponse:', response)
doc = {
'title': 'automatically adding a timestamp to documents',
'content': 'prior to version 5 of Elasticsearch, documents had a metadata field called _timestamp. When enabled, this _timestamp was automatically added to every document. It would tell you the exact time a document had been indexed.',
}
res = es.index(index="articles_index", doc_type="articles", id=100001, body=doc)
print(res)
res = es.get(index="articles_index", doc_type="articles", id=100001)
print(res)
About ES 7.x, the example should work after removing the doc_type related parameters as it's not supported any more.
first create index and properties of the index , such as field and datatype and then insert the data using the rest API.
below is the way to create index with the field properties.execute the following in kibana console
`PUT /vfq-jenkins
{
"mappings": {
"properties": {
"BUILD_NUMBER": { "type" : "double"},
"BUILD_ID" : { "type" : "double" },
"JOB_NAME" : { "type" : "text" },
"JOB_STATUS" : { "type" : "keyword" },
"time" : { "type" : "date" }
}}}`
the next step is to insert the data into that index:
curl -u elastic:changeme -X POST http://elasticsearch:9200/vfq-jenkins/_doc/?pretty
-H Content-Type: application/json -d '{
"BUILD_NUMBER":"83","BUILD_ID":"83","JOB_NAME":"OMS_LOG_ANA","JOB_STATUS":"SUCCESS" ,
"time" : "2019-09-08'T'12:39:00" }'

Resources