I have article data indexed in Elasticsearch as follows:
{
"id": 1011,
"title": "abcd",
"author": "author1",
"status": "published"
}
Now I want to get all the article ids grouped by status.
The result should look something like this:
{
"published": [1011, 1012, ....],
"draft": [2011],
"deleted": [3011]
}
NB: I tried a normal aggregation (Article.search('*', aggs: [:status], load: false).aggs), but it just gives me the count of items in each bucket; I want the ids in each bucket instead.
You can restructure your query in this way:
Sort (in ascending or descending order) your response from ES on the field "status".
Ask ES to return only the id and status fields.
The effect of the sorting is that your response is arranged like this: [the first N results with "deleted" status, then results N+1 to M with "draft", and then results M+1 to K with "published"].
Now the advantages of this technique:
You get the ids of every document, over which you can apply operations in your application.
Your query is lightweight compared to an aggs query.
You also get the metadata of every document, like its docId.
Now the disadvantages:
You always have to give a high upper bound for your page size, but you can also make use of the count coming back in the metadata.
It might take a bit more network bandwidth, since the redundant status field is returned in every document.
I hope this redesign of your query is helpful to you.
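For illustration, here is a minimal sketch of the request body this approach implies (assuming status is mapped as a keyword field, and using a deliberately high size as the upper bound mentioned above):

{
  "_source": ["id", "status"],
  "sort": [{ "status": { "order": "asc" } }],
  "size": 10000,
  "query": { "match_all": {} }
}

Your application can then walk the hits in order and start a new group whenever the status value changes.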
I am trying to figure out how to correctly support returning the number of items in a (filtered) data set in my OData API.
My understanding is that adding the $count=true argument to the query string should allow for this.
Now, based on the example from the tutorial in the official docs, adding that parameter should cause the web API to return just an integer number:
The request below returns the total number of people in the collection.
GET serviceRoot/People?$count=true
Response Payload
20
On the other hand, this accepted and quite upvoted1 answer indicates that a query with $count=true will actually return an object, one of whose properties holds said integer number. It provides an example query against a sample endpoint:
https://services.odata.org/V4/Northwind/Northwind.svc/Customers?$count=true&$top=0&$filter=Country eq 'Germany'
Indeed, the actual result from that endpoint is the complex object
{
"@odata.context": "https://services.odata.org/V4/Northwind/Northwind.svc/$metadata#Customers",
"@odata.count": 11,
"value": []
}
instead of the expected result of a single integer number
11
Why is this? Am I misunderstanding the documentation?
1: The answer had 25 upvotes at the time of writing.
The main issue is that the OData v4 specification is an evolving standard; as such, many implementations handle some requests differently, either because the standard has changed, because the standard was hard to implement, or because the suggested behavior in the spec does not conform to the rest of the conventions.
Why is this? Am I misunderstanding the documentation?
So your main issue is that you were reading the wrong documentation for the API that you were querying. It is important to recognize that with each implementation of a standard it is up to the developer to choose how conformant to that standard they are, so you need to read the documentation that goes specifically with that API.
This is the specification in question for OData v4:
4.8 Addressing the Count of a Collection
To address the raw value of the number of items in a collection, clients append /$count to the resource path of the URL identifying the entity set or collection.
The /$count path suffix identifies the integer count of records in the collection and SHOULD NOT be combined with the system query options $top, $skip, $orderby, $expand, and $format. The count MUST NOT be affected by $top, $skip, $orderby, or $expand.
The count is calculated after applying any /$filter path segments, or $filter or $search system query options to the collection.
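In other words, a raw integer is what the /$count path suffix yields, e.g.:
GET serviceRoot/People/$count
Response Payload
20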
In the .NET implementation, because $count is the result of a query, it needs to be evaluated as part of the query-options pipeline, not as part of the path.
MS OData QueryOptions - Count
The $count system query option allows clients to request a count of the matching resources included with the resources in the response. The $count query option has a Boolean value of true or false.
Examples:
Return, along with the results, the total number of products in the collection
http://host/service/Products?$count=true
The count of related entities can be requested by specifying the $count query option within the $expand clause.
http://host/service/Categories?$expand=Products($count=true)
From an implementation point of view, mixing this query option into the path breaks the convention used for all other processing and URL parsing; it really is the odd one out, straddling both path and query.
Regarding the object response
In the .NET implementation, because $count is supported in collection expansions as well as on the root (see the second example), they have chosen to inject the value as metadata/attributes mixed in with the results. That way the response is still valid for serialization purposes, and the count behaviour is again consistent wherever it is used.
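As a minimal client-side sketch (assuming the public Northwind demo service from the question and a fetch-capable runtime such as a browser or Node 18+), the count is read off the annotation rather than being the whole payload:

// sketch: the count rides along as an annotation on the object response
async function countGermanCustomers() {
  const url = "https://services.odata.org/V4/Northwind/Northwind.svc/Customers" +
    "?$count=true&$top=0&$filter=Country eq 'Germany'";
  const body = await (await fetch(url)).json();
  return body["@odata.count"]; // 11; body.value is [] because of $top=0
}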
I leave you with this last example from one of my own APIs, demonstrating the attribute response for expanded collections; if $count=true didn't return the object graph, I would not be able to get at the counts of the expansions at all:
https://localhost/OData/Residents?$count=true&$expand=Entity($select=Id;$expand=Contacts($count=true;$top=0))&$select=id&$top=2
{
"#odata.context": "https://localhost/odata/$metadata#Residents(Id,Entity(Id,Contacts()))",
"#odata.count": 29,
"value": [
{
"Id": 13110,
"Entity": {
"Id": 13110,
"Contacts#odata.count": 6,
"Contacts": []
}
},
{
"Id": 13164,
"Entity": {
"Id": 13164,
"Contacts#odata.count": 6,
"Contacts": []
}
}
],
"#odata.nextLink": "localhost/OData/Residents?$expand=Entity%28%24select%3DId%3B%24expand%3DContacts%28%24count%3Dtrue%3B%24top%3D0%29%29&$select=id&$top=2&$skip=2"
}
Using the Graph Explorer, I'm trying to limit (time-box) the number of entries being returned. This is so I can extract the data from Azure to upload into our SIEM portal. I am getting the data back (tens of thousands of data points), but I need to time-box them.
This works as a query (both in Graph Explorer and from PowerShell), but the results are not in the time frame requested. I've tried different time formats (including down to the second) and they don't limit the results.
It seems like it isn't going deeper into the data structure for the filter to operate on.
Any suggestions on the filter, or a different approach (without accepting all the data on each query and doing a post-results filter)?
Note: I also tried prefixing activityDateTime with value/ and value\ (reading from a different article I found), i.e. value/activityDateTime and value\activityDateTime. No different results (no errors either).
This is the GET from Graph Explorer (beta selected):
https://graph.microsoft.com/beta/auditLogs/directoryAudits?=activityDateTime ge 2018-07-16T15:48:00 and activityDateTime lt 2018-07-16T15:58:00
returned this (only partial results; GUID/hex strings were removed). You'll notice that the activityDateTime returned below is not >= and < the date/time passed in the query:
{
"#odata.context": "https://graph.microsoft.com/beta/$metadata#auditLogs/directoryAudits",
"#odata.nextLink": "https://graph.microsoft.com/beta/auditLogs/directoryAudits?=value%2factivityDateTime+ge+2018-07-16T15%3a48%3a00+and+value%2factivityDateTime+lt+2018-07-16T15%3a58%3a00&$skiptoken=[hex string removed]_1000",
"value": [
{
"id": "Directory_[hex string removed]",
"category": "Core Directory",
"correlationId": "[GUID removed]",
"result": "success",
"resultReason": "",
"activityDisplayName": "Update group",
**"activityDateTime": "2018-07-18T14:30:44.6046176Z"**,
"loggedByService": "AzureAD",
"initiatedBy": {
"user": null,
"app": null
},
"targetResources": [
{
"#odata.type": "#microsoft.graph.targetResourceGroup",
[rest of data returned 1000 total removed]
You need to specify the parameter name. Otherwise, the API has no way of knowing what operation you want ($select, $orderby, $filter). In this case, you want $filter, like this: $filter=activityDateTime ge 2018-07-16T15:48:00Z and activityDateTime lt 2018-07-16T15:58:00Z.
https://graph.microsoft.com/beta/auditLogs/directoryAudits?$filter=activityDateTime ge 2018-07-16T15:48:00Z and activityDateTime lt 2018-07-16T15:58:00Z
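Note that if your HTTP client does not encode the query string for you, the same request with the spaces percent-encoded looks like this (parameter names and date-time literals unchanged):
https://graph.microsoft.com/beta/auditLogs/directoryAudits?$filter=activityDateTime%20ge%202018-07-16T15:48:00Z%20and%20activityDateTime%20lt%202018-07-16T15:58:00Z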
In a question on CouchDB I asked previously (Can you implement document joins using CouchDB 2.0 'Mango'?), the answer mentioned creating domain objects instead of storing relational data in Couch.
My use case, however, is not necessarily to store relational data in Couch but to flatten relational data. For example, I have the entity Invoice, which I collect from several suppliers, so I have two different schemas for that entity.
So I might end up with 2 docs in Couch that look like this:
{
"type": "Invoice",
"subType": "supplier B",
"total": 22.5,
"date": "10 Jan 2017",
"customerName": "me"
}
{
"type": "Invoice",
"subType": "supplier A",
"InvoiceTotal": 10.2,
"OrderDate": <some other date format>,
"customerName": "me"
}
I also have a doc like this:
{
"type": "Customer",
"name": "me",
"details": "etc..."
}
My intention then is to 'flatten' the Invoice entities, and then join on the reduce function. So, the map function looks like this:
function(doc) {
  switch(doc.type) {
    case 'Customer':
      // Customer docs store the name in doc.name, not doc.customerName
      emit(doc.name, { details: doc.details, type: "Customer" });
      break;
    case 'Invoice':
      switch (doc.subType) {
        case 'supplier B':
          emit(doc.customerName, { total: doc.total, date: doc.date, type: "Invoice" });
          break;
        case 'supplier A':
          emit(doc.customerName, { total: doc.InvoiceTotal, date: doc.OrderDate, type: "Invoice" });
          break;
      }
      break;
  }
}
Then I would use the reduce function to compare docs with the same customerName (i.e. a join).
Is this advisable using CouchDB? If not, why?
First of all, apologies for getting back to you late; I thought I'd look at it directly, but I haven't been on SO since we exchanged the other day.
Reduce functions should only be used to reduce scalar values, not to aggregate data. So you wouldn't use them to achieve things such as doing joins or removing duplicates, but you would, for example, use them to compute the number of invoices per customer - you see the idea. The reason is that you can only make weak assumptions about the calls made to your reduce functions (the order in which records are passed, the rereduce parameter, etc.), so you can easily end up with serious performance problems.
But this is by design, since the intended usage of reduce functions is to reduce scalar values. An easy way to think about it is to say that no filtering should ever happen in a reduce function; filtering and things such as checking keys should be done in map.
If you just want to compare docs with the same customer name, you do not need a reduce function at all; you can query your view with the following parameters:
startkey=["customerName"]
endkey=["customerName", {}]
Otherwise you may want to create a separate view to filter on customers first, returning their names, and then use those names to query your view in bulk using the keys view parameter (see the sketch below). Startkey/endkey is good if you only want to filter one customer at a time, and/or need to match complex keys in a partial way.
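As a sketch, the bulk form could look like this over HTTP (the database, design document, and view names here are hypothetical, and the keys array must be URL-encoded in practice):
GET /mydb/_design/invoices/_view/by_customer?keys=["me","another-customer"]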
If what you are after is the numbers, you may want to do:
if(doc.type == "Invoice") {
emit([doc.customerName, doc.supplierName, doc.date], doc.amount)
}
And then use the _stats built-in reduce function to get statistics on the amount (sum, min, max, etc.).
Then, to get the amount spent with a supplier, you'd just need to make a reduce query to your view, using the parameter group_level=2 to aggregate by the first 2 elements of the key. You can combine this with startkey and endkey to filter specific values of this key:
startkey=["name1", "supplierA"]
endkey=["name1", "supplierA", {}]
You can then build from this example to do things such as:
if(doc.type == "Invoice") {
emit(["BY_DATE", doc.customerName, doc.date], doc.amount);
emit(["BY_SUPPLIER", doc.customerName, doc.supplierName], doc.amount);
emit(["BY_SUPPLIER_AND_DATE", doc.customerName, doc.supplierName, doc.date], doc.amount);
}
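For instance, with a view built on these emits and the _stats reduce, the following parameters would return per-supplier statistics for a single customer ("me" is a placeholder value):
group_level=3
startkey=["BY_SUPPLIER", "me"]
endkey=["BY_SUPPLIER", "me", {}]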
Hope this helps
It is totally OK to "normalize" your different schemas (or subTypes) via a view. You cannot create views based on those normalized schemas, though, and in the long run it might be hard to manage the different schemas.
The better solution might be to normalize the documents before writing them to CouchDB. If you still need the documents in their original schema, you can add a sub-property original where you store your documents in their original form. This would make working with the data much easier:
{
"type": "Invoice",
"total": 22.5,
"date": "2017-01-10T00:00:00.000Z",
"customerName": "me",
"original": {
"supplier": "supplier B",
"total": 22.5,
"date": "10 Jan 2017",
"customerName": "me"
}
},
{
"type": "Invoice",
"total": 10.2,
"date": "2017-01-12T00:00:00:00.000Z,
"customerName": "me",
"original": {
"subType": "supplier A",
"InvoiceTotal": 10.2,
"OrderDate": <some other date format>,
"customerName": "me"
}
}
I'd also convert the date to ISO format, because it parses well with new Date(), sorts correctly, and is human-readable. You can easily emit invoices grouped by year, month, day, and so on with that; a sketch follows.
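For example, a map function along these lines (a sketch, assuming the normalized document shape above) lets you aggregate per year, month, or day by varying group_level:

function (doc) {
  if (doc.type === "Invoice") {
    // split the ISO date string into a [year, month, day] array key
    var d = new Date(doc.date);
    emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate()], doc.total);
  }
}

Combined with the built-in _sum reduce, group_level=1 gives yearly totals and group_level=2 monthly ones.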
Use reduce preferably only with built-in functions, because reduces have to be re-executed on queries, and executing JavaScript on many documents is a complex and time-intensive operation, even if the database has not changed at all. You can find more information about the reduce process in the CouchDB documentation. It makes more sense to preprocess the documents as much as you can before storing them in CouchDB.
I want to know the total number of active listings by shop id. Is there any such API available?
I could find the API which returns paginated results for all the listings in a shop.
'/shops/:shop_id/listings/active'
I cannot give a limit of over 100 in this API, so to fetch the total count of all listings I would have to make a lot of requests if there are, let's say, several thousand listings. A simple API endpoint that returns the count of total active listings would be really helpful.
It's included in the standard response:
{
  "count": integer,
  "results": [
    { result object }
  ],
  "params": { parameters },
  "type": result type
}
Docs can be found here: https://www.etsy.com/developers/documentation/getting_started/api_basics#section_standard_response_format
Got it.
The response contains a count field which gives the exact count of the active listings.
100 is the highest limit you can set; you will need to use the "page" parameter to move to the next 100, and so on.
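Since the count field is independent of paging, a single small request is enough to read it. A minimal sketch (assuming the v2 endpoint from the question and a valid api_key):

// sketch: read the total from the count field of the standard response envelope
async function countActiveListings(shopId, apiKey) {
  const url = "https://openapi.etsy.com/v2/shops/" + shopId +
    "/listings/active?limit=1&api_key=" + apiKey;
  const body = await (await fetch(url)).json();
  return body.count; // total active listings, regardless of the page size
}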
I have merged two Fusion Tables together on the key "PID". Now I would like to do a SELECT query WHERE PID = 'value'. The error comes back that no column with the name PID exists in the table. A query for another column gives this result:
"kind": "fusiontables#sqlresponse",
"columns": [
"\ufeffPID",
"Address",
"City",
"Zoning"
],
"rows": [
[
"001-374-079",
"# LOT 15 MYSTERY BEACH RD",
"No_City_Value",
"R-1"
],
It appears that the column name has been changed from "PID" to "\ufeffPID", and no matter how many attempts I make at the syntax of the GET URL, I keep getting an error.
Is there any limitation on querying the key of a merged table? Since I cannot seem to get the name right for the column, a workaround would be to use the column ID, but that does not seem to be an option either. Here is the URL:
https://www.googleapis.com/fusiontables/v1/query?sql=SELECT 'PID','Address','City','Zoning' FROM 1JanYNl3T45kFFxqAmGS0BRgkopj4AS207qnLVQI WHERE '\ufeffPID' = 001-493-078&key=myKey
Cheers
I have no explanation for \ufeff in there; that's the Unicode character ZERO WIDTH NO-BREAK SPACE (U+FEFF, which also serves as the byte-order mark, so it may well have come along with an uploaded file). It's conceivable that it's actually there in the column name, because it would be invisible in the UI. So, first off, I would recommend changing the name in the base tables and seeing if that works.
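If renaming is not an option, it may also be possible to address the column exactly as reported by percent-encoding the BOM (U+FEFF is EF BB BF in UTF-8) and quoting the string value; a hypothetical sketch of the query URL:
https://www.googleapis.com/fusiontables/v1/query?sql=SELECT '%EF%BB%BFPID' FROM 1JanYNl3T45kFFxqAmGS0BRgkopj4AS207qnLVQI WHERE '%EF%BB%BFPID' = '001-493-078'&key=myKey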
Column IDs for merge tables have a different form than for base tables. An easy way to get them is to add the filters of interest to one of your tabs (any type will do) and then do Tools > Publish. The top text ("Send a link in email or IM") has a query URL that has what you need. Run it through a URL decoder such as http://meyerweb.com/eric/tools/dencoder/ and you'll see the column ID for PID is col0>>0.
Rod