not_analyzed field with doc_values still in fielddata cache - mapping

While experimenting with fielddata vs. doc_values, I ran into a weird case. My earlier mapping didn't use doc values at all. In my new mapping, I've added doc_values: true to all fields except analyzed string fields and booleans (not supported until 2.0).
In detail, here is how I proceeded:
Before reindexing all my data, I restarted my ES 1.7 cluster fresh and ran a query with sorting, aggregations and script fields to "warm up" the fielddata cache. Then I queried the _cat/fielddata endpoint to get an idea of the fielddata cache usage. It looked something like this:
curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'
id host ip node total items.desc.raw more_fields...
rKX7... myhost 192.168.1.100 Doom 32.9mb 2.3mb ...
As you can see, the field items.desc.raw used 2.3mb of heap space. items is of type nested and contains a string multi-field with a not_analyzed sub-field called raw. In short, the mapping of that nested field looks like this:
"items": {
"type": "nested",
"properties": {
"desc": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
After adding doc_values: true to items.desc.raw, reindexing the whole index, and running some aggregations, sorting and scripting again to warm up the fielddata cache, I queried the _cat/fielddata endpoint again. Here was the result:
curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'
id host ip node total items.desc.raw some_bools...
tAB5... myhost 192.168.1.100 Yack 2.1mb 9.2kb ...
So the fielddata usage has indeed dropped drastically (which is good). Most of the fields I still see are boolean fields (i.e. some_bools above), which was expected. To my surprise, though, my nested not_analyzed string field also appeared, albeit with much lower space usage.
What could be the cause of items.desc.raw still appearing in the fielddata cache?

I had forgotten about global ordinals. They are the reason I still see fielddata usage after enabling doc_values: global ordinals cannot be stored in doc values, so they are built in memory and accounted for in the fielddata cache.
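As a rough illustration of why something still has to sit on the heap: each segment keeps its own ordinals for a field's terms, but aggregating across segments needs a mapping from every segment-local ordinal to one index-wide ordinal. The toy Python below is only a conceptual sketch of that mapping, not how Elasticsearch implements it; the function name and the sample terms are made up.

```python
def build_global_ordinals(segments):
    """segments: one sorted list of terms per segment.

    Returns (global_terms, mappings), where
    mappings[segment_index][local_ordinal] == global_ordinal.
    """
    # The global ordinal of a term is its position in the sorted union of all terms.
    global_terms = sorted({term for seg in segments for term in seg})
    global_ord = {term: i for i, term in enumerate(global_terms)}
    # One small lookup table per segment: local ordinal -> global ordinal.
    mappings = [[global_ord[term] for term in seg] for seg in segments]
    return global_terms, mappings


segments = [["apple", "pear"], ["banana", "pear"]]
global_terms, mappings = build_global_ordinals(segments)
print(global_terms)  # ['apple', 'banana', 'pear']
print(mappings)      # [[0, 2], [1, 2]]
```

The lookup tables hold only integers, not the term values or per-document data, which would be consistent with the much smaller figure reported for items.desc.raw after the reindex.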


Istio EnvoyFilter - Wasm - Classifying Metrics Based on Request or Response

I am trying to insert a custom dimension for an istio metric for URL path.
I am following the steps here -
https://istio.io/latest/docs/tasks/observability/metrics/classify-metrics/
Specifically, this part, where I can parse the URL and decide the value :
configuration:
"@type": type.googleapis.com/google.protobuf.StringValue
value: |
{
"attributes": [
{
"output_attribute": "istio_operationId",
"match": [
{
"value": "ListReviews",
"condition": "request.url_path == '/reviews' && request.method == 'GET'"
},
{
"value": "GetReview",
"condition": "request.url_path.matches('^/reviews/[[:alnum:]]*$') && request.method == 'GET'"
},
{
"value": "CreateReview",
"condition": "request.url_path == '/reviews/' && request.method == 'POST'"
}
]
}
]
}
I want to add a fallback: if the request doesn't match any of the conditions, set istio_operationId to the original request.url_path.
How can I do that?
I tried adding this as the last condition, but it doesn't work:
{
"value": "request.url_path", //tried without quotes as well
"condition": "request.url_path.matches('.+$')"
}
Also, is this possible in Lua?
To set a fallback value, leave the condition blank. From the docs: "An empty condition evaluates to true and should be used to provide a default value."
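A minimal sketch of what that looks like, written in Python only to generate the JSON. The helper name with_default and the static value "unknown" are placeholders, not anything from the Istio docs; the catch-all entry is appended last on the assumption that match clauses are evaluated in order.

```python
import json


def with_default(matches, default_value):
    """Return a copy of the match list with a catch-all entry appended."""
    # An empty condition is the documented way to provide a default value.
    return list(matches) + [{"value": default_value, "condition": ""}]


matches = [
    {"value": "ListReviews",
     "condition": "request.url_path == '/reviews' && request.method == 'GET'"},
    {"value": "CreateReview",
     "condition": "request.url_path == '/reviews/' && request.method == 'POST'"},
]

config = {
    "attributes": [
        {"output_attribute": "istio_operationId",
         "match": with_default(matches, "unknown")}
    ]
}

# This JSON string is what would go into the filter's "value" field.
print(json.dumps(config, indent=2))
```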
That said, I don't think there's a way to set the attribute value to anything but a static string. What you can do instead is add the url_path as a dimension to one (or all) metrics generated by the stats filter. To say it another way, I don't think you can combine request classification with custom dimensions the way you're describing.
See this blog post for details as well as an explanation of the difference between attributes and dimensions.
Also, you may want to reconsider trying this at all. When metrics are emitted with unbounded cardinality, monitoring systems fall over. Here is a description of an Istio user crashing their Prometheus instance this way. Here is the (then) co-lead of the Istio extensions and telemetry working group discussing why they do not recommend doing this.

/beta MS Graph /sites/lists lastModifiedDateTime is wrong?

I'm using the MS Graph API /beta endpoint to figure out whether a list has been updated or changed.
I first used the following query:
https://graph.microsoft.com/beta/sites/xxxxx.sharepoint.com:/sites/xxxxx?$expand=lists(select=id, name, system, lastModifiedDateTime)
It returned the following date:
"lastModifiedDateTime": "2018-10-08T10:23:37Z",
But when I query the list's items and look at the dates on the latest item with the following query:
https://graph.microsoft.com/beta/sites/xxxxx.sharepoint.com,92af4fbc-04bc-46d8-9c78-f63832fbf48a,1b59d85a-41bd-4498-a64c-17bd13069d90/lists/b9c39323-076a-4ae7-942b-1d0060a6b352/items
I get these dates:
"createdDateTime": "2018-10-08T10:23:37Z"
"lastModifiedDateTime": "2018-10-08T10:29:14Z",
So the lastModifiedDateTime property on the list actually looks like a "last created" date/time?
Best Regards,
Kim
edit:
First graph request gets the SitePages list and its lastModifiedDateTime:
{
"id": "b9c39323-076a-4ae7-942b-1d0060a6b352",
"lastModifiedDateTime": "2018-10-08T10:23:37Z",
"name": "SitePages",
"system": {}
},
But if we then look at the items of the list, we can see that it has an item with a higher lastModifiedDateTime (second graph request):
"createdDateTime": "2018-10-08T10:23:37Z",
"eTag": "\"27e03a98-9321-4586-8ef1-0b5323c26730,6\"",
"id": "8",
"lastModifiedDateTime": "2018-10-08T10:29:14Z",
We can also see that the createdDateTime of the list item is the same as the list's lastModifiedDateTime. That looks like a bug in the API to me. The date in the first request should be "2018-10-08T10:29:14Z". Don't you agree?
From your description, I assume you want to know why the two lastModifiedDateTime values differ.
Based on my test, your first request returns the lastModifiedDateTime for the specified site,
while your second request returns the items of the list b9c39....
You can use Graph Explorer to check whether the two sites differ.
Indeed, it appears to be a bug: the lastModifiedDateTime property of the List resource returns an invalid value. It seems to be mapped to the date and time at which the most recent list item was created (ListItem.createdDateTime).
This can also be confirmed as a bug using the following endpoints (both return a valid lastModifiedDateTime value):
https://graph.microsoft.com/beta/sites/{site-id}/lists/{list-id}/drive/root?select=lastModifiedDateTime
https://tenant.sharepoint.com/_api/web/lists/getbyid({list-id})?$select=LastItemModifiedDate
Meanwhile, as a workaround, the following request can be used to enumerate the site's lists:
https://graph.microsoft.com/beta/sites/{site-id}/drives?expand=root(select=lastModifiedDateTime)
where root/lastModifiedDateTime returns a valid value
Limitation: Only returns document libraries
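A minimal sketch of reading that workaround's response on the client side. It assumes the body has the usual Graph collection shape (a "value" array) with an expanded root on each drive; the sample payload, the drive id, and the function names are hypothetical, and real timestamps may carry fractional seconds that this parser does not handle.

```python
from datetime import datetime, timezone


def parse_graph_time(value):
    """Parse a UTC timestamp such as '2018-10-08T10:29:14Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def library_modified_times(drives_response):
    """Map drive id -> root.lastModifiedDateTime from a /drives?expand=root(...) body."""
    return {
        drive["id"]: parse_graph_time(drive["root"]["lastModifiedDateTime"])
        for drive in drives_response.get("value", [])
        if "root" in drive  # skip drives whose root was not expanded
    }


# Hypothetical response body; the timestamp is the item date from the question.
sample = {
    "value": [
        {"id": "drive-1", "root": {"lastModifiedDateTime": "2018-10-08T10:29:14Z"}}
    ]
}

times = library_modified_times(sample)
print(times["drive-1"].isoformat())  # 2018-10-08T10:29:14+00:00
```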

BitBucket 1.0 REST API Retrieve all Pull-Requests for repository

When I curl the rest api, I get back an empty response but I know that there are pull-requests open.
What is the setting in bitbucket stash that allows anyone to view/read pull-requests without being authenticated?
curl -X GET https://bitbucket/rest/api/1.0/projects/{project}/repos/{repo}/pull-requests
response:
{
"size": 0,
"limit": 25,
"isLastPage": true,
"values": [],
"start": 0
}
Try
curl -X GET 'https://bitbucket/rest/api/1.0/projects/{project}/repos/{repo}/pull-requests?state=ALL'
You can find more options for this specific API call at https://developer.atlassian.com/static/rest/bitbucket-server/latest/bitbucket-rest.html#idm140236731714560
This worked for me:
curl -D- -u user:password -H "Content-Type: application/json" -X GET 'https://bitbucket.com/rest/api/1.0/projects/ONEP/repos/oneplanner/pull-requests?state=OPEN'
To check the status of a particular PR:
curl -X GET https://bitbucket/rest/api/1.0/projects/{project}/repos/{repo}/pull-requests/{pr-id}
Documentation: https://docs.atlassian.com/bitbucket-server/rest/5.16.0/bitbucket-rest.html#idm8287391664
Paged APIs
Bitbucket uses paging to conserve server resources and limit response size for resources that return potentially large collections of items. A request to a paged API will result in a values array wrapped in a JSON object with some paging metadata, like this:
{
"size": 3,
"limit": 3,
"isLastPage": false,
"values": [
{ /* result 0 */ },
{ /* result 1 */ },
{ /* result 2 */ }
],
"start": 0,
"filter": null,
"nextPageStart": 3
}
Clients can use the limit and start query parameters to retrieve the desired number of results.
The limit parameter indicates how many results to return per page. Most APIs default to returning 25 if the limit is left unspecified. This number can be increased, but note that a resource-specific hard limit will apply. These hard limits can be configured by server administrators, so it's always best practice to check the limit attribute on the response to see what limit has been applied. The request to get a larger page should look like this:
http://host:port/context/rest/api-name/api-version/path/to/resource?limit={desired size of page}
For example:
https://stash.atlassian.com/rest/api/1.0/projects/JIRA/repos/jira/commits?limit=1000
The start parameter indicates which item should be used as the first item in the page of results. All paged responses contain an isLastPage attribute indicating whether another page of items exists.
Important: If more than one page exists (i.e. the response contains "isLastPage": false), the response object will also contain a nextPageStart attribute which must be used by the client as the start parameter on the next request. Identifiers of adjacent objects in a page may not be contiguous, so the start of the next page is not necessarily the start of the last page plus the last page's size. A client should always use nextPageStart to avoid unexpected results from a paged API. The request to get a subsequent page should look like this:
http://host:port/context/rest/api-name/api-version/path/to/resource?start={nextPageStart from previous response}
For example:
https://stash.atlassian.com/rest/api/1.0/projects/JIRA/repos/jira/commits?start=25
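The paging rules above can be sketched as a small generator. This is only an illustration: iter_paged and fake_fetch are hypothetical names, and fake_fetch stands in for whatever HTTP call actually returns the decoded JSON page (for example a GET with start and limit as query parameters).

```python
def iter_paged(fetch_page, limit=25):
    """Yield every item from a paged resource.

    fetch_page(start, limit) must return one decoded JSON page as a dict.
    """
    start = 0
    while True:
        page = fetch_page(start, limit)
        yield from page["values"]
        if page.get("isLastPage", True):
            break
        # Always follow nextPageStart; do not compute start + size yourself.
        start = page["nextPageStart"]


def fake_fetch(start, limit):
    """Stand-in for the HTTP request, serving five made-up items."""
    data = ["pr-1", "pr-2", "pr-3", "pr-4", "pr-5"]
    chunk = data[start:start + limit]
    end = start + len(chunk)
    page = {
        "size": len(chunk),
        "limit": limit,
        "start": start,
        "values": chunk,
        "isLastPage": end >= len(data),
    }
    if not page["isLastPage"]:
        page["nextPageStart"] = end
    return page


print(list(iter_paged(fake_fetch, limit=2)))
# ['pr-1', 'pr-2', 'pr-3', 'pr-4', 'pr-5']
```

Passing the fetch function in keeps the paging logic independent of the HTTP client and of how authentication is handled.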

How to deal with GeoJson in CKAN DataStore?

Is it true CKAN DataStore is able to deal with GeoJson? I've not seen any reference in the documentation except for this link about the DataStore Map visualization, saying:
Shows data stored on the DataStore in an interactive map. It supports plotting markers from a pair of latitude / longitude fields or from a field containing a GeoJSON representation of the geometries.
So I'm supposing GeoJSON is accepted in DataStore columns. However, I've not found any GeoJSON type in CKAN, so, again, I'm guessing the plain JSON type must be used for this purpose.
Can anybody confirm this? Thanks!
EDIT 1
I've created a resource and a datastore and a "recline_map_view" associated to the resource. Then, I've upserted a value, which is shown by this datastore_search operation:
$ curl -X POST "https://host:port/api/3/action/datastore_search" -d '{"resource_id":"14418d40-de42-4fdd-84f7-3c51244c7469"}' -H "Authorization: xxx" -k
{"help": "https://host:port/api/3/action/help_show?name=datastore_search", "success": true, "result": {"resource_id": "14418d40-de42-4fdd-84f7-3c51244c7469", "fields": [{"type": "int4", "id": "_id"}, {"type": "text", "id": "label"}, {"type": "json", "id": "geojson"}], "records": [{"_id": 1, "geojson": {"type": "Point", "coordinates": [48.856699999999996, 2.3508]}, "label": "Paris"}], "_links": {"start": "/api/3/action/datastore_search", "next": "/api/3/action/datastore_search?offset=100"}, "total": 1}}
Nevertheless, nothing is shown in CKAN :(
EDIT 2
It was a problem with my CKAN. I've tested Ifurini's solution at demo.ckan.org and it works.
GeoJSON is just a (particular kind of) JSON, so it does not have a particular treatment as a database field.
So, you can create a resource with a GeoJSON field from a simple CSV file like this:
Name,Position
"Paris","{""type"":""Point"",""coordinates"":[2.3508,48.8567]}"
(note the double double quotes "" instead of just a single double quote ")
If you call the column "GeoJSON" (or "geojson", "gEoJsOn", etc., as capitalization is not important) the Map View will automatically use that field to mark the data in the map, instead of just letting you manually select which field to use.
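If you generate the CSV programmatically, a CSV library takes care of the doubled quotes for you. The sketch below uses Python's standard csv and json modules; the function name geojson_csv is made up, and the column is named GeoJSON so the Map View would pick it up as described above. Note that GeoJSON coordinates are ordered longitude first, then latitude.

```python
import csv
import io
import json


def geojson_csv(rows):
    """Build CSV text with a Name column and a GeoJSON point column.

    rows: iterable of (name, longitude, latitude) tuples.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "GeoJSON"])
    for name, lon, lat in rows:
        point = {"type": "Point", "coordinates": [lon, lat]}
        # Compact separators keep the cell free of extra spaces;
        # csv.writer quotes the cell and doubles the inner quotes.
        writer.writerow([name, json.dumps(point, separators=(",", ":"))])
    return buffer.getvalue()


csv_text = geojson_csv([("Paris", 2.3508, 48.8567)])
print(csv_text)
```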

Unable to create course via api

I am trying to create a course in a semester through the D2L Valence API. I keep getting a 404 Not Found error, both in my program and in the "getting started" application. The call I am making is a POST to /d2l/api/lp/1.0/courses/. I pass the following JSON object along with it:
{
"Name": "COMM291 - Test A",
"Code": "C-COMM291",
"Path": "/enforced/C-COMM291/",
"CourseTemplateId": 20992,
"SemesterId": 20993,
"StartDate": "2013-08-22T19:41:14.0983532Z",
"EndDate": "2013-08-27T19:41:14.0993532Z",
"LocaleId": 4105,
"ForceLocale": false,
"ShowAddressBook": false
}
I have also tried passing null for the fields that say they accept null values, but no luck. The course template ID and the semester ID are correct: I have triple-checked that they exist, that I am enrolled in them, and that I am using the correct ID numbers.
Try reducing the precision in your start and end dates to three decimals after the final point (e.g., "2013-08-22T19:41:14.0983532Z" becomes "2013-08-22T19:41:14.098Z").
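A minimal sketch of that trimming step, assuming the timestamps are UTC strings ending in Z; the function name trim_fraction is made up. It truncates the extra digits rather than rounding them.

```python
import re


def trim_fraction(timestamp, digits=3):
    """Truncate fractional seconds in a '...Z' timestamp to `digits` places."""
    return re.sub(
        r"\.(\d+)Z$",
        lambda match: "." + match.group(1)[:digits] + "Z",
        timestamp,
    )


print(trim_fraction("2013-08-22T19:41:14.0983532Z"))  # 2013-08-22T19:41:14.098Z
```

Timestamps without a fractional part are returned unchanged.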
If your org is configured to automatically enforce, and generate, paths for course offerings, then you should not provide one in your CreateCourseOffering block at all. The following structure works on our test instance: notice the empty string for path (shouldn't be null, but an empty string, I believe):
{ "Name": "Extensibility 104",
"Code": "EXT-104",
"Path": "",
"CourseTemplateId": 8082,
"SemesterId": 6984,
"StartDate": "2013-09-01T19:41:14.098Z",
"EndDate": "2013-12-27T19:41:14.098Z",
"LocaleId": 1,
"ForceLocale": false,
"ShowAddressBook": false }
The other thing to note is that if your CreateCourse form doesn't have a form element to provide a Semester ID, then your API call should pass null for that property.
I found that part of my problem was the API version in the call. If I change it to /d2l/api/lp/1.3/courses/ instead of 1.0, it works (1.0 will work too, but it seems you can only pass null for the semester).
The dates were also picky and needed the fractional seconds limited to 3 decimal places.
Passing null for LocaleId also helped.
