I have a query method in my ElasticsearchRepository and I've noticed that it sends two requests to my Elasticsearch index. I can pull the JSON used for the query out of my debugger, and I can see that the first query is sent with "size": 0 and "track_total_hits": 2147483647, while the second query is run with "size": 103 and no "track_total_hits".
Why is it doing this?
This happens when the repository method is supposed to return multiple objects and when no Pageable is set on the query (you can do that with a Pageable parameter of the method for example).
In this case, if no from or size parameters are set, Elasticsearch would only return the first 10 documents found for a query.
So the first request is a count request that finds out how many documents the query matches. Setting size to zero makes Elasticsearch return only the meta information, such as the number of matching documents, but not the documents themselves; the track_total_hits value makes sure the number is not capped at 10,000 (the default threshold up to which Elasticsearch counts hits exactly).
The second call you see is then the "real" call for data having set the size so that all documents are returned.
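To make the two calls concrete, here is a minimal sketch of the two request bodies as described above. It is not Spring Data Elasticsearch's actual code: the helper names and the match_all query are made up for illustration, and only the size and track_total_hits values come from the question.

```python
# Sketch of the two request bodies; helper names are hypothetical.
INT_MAX = 2147483647  # value seen for track_total_hits in the first request


def count_request(query):
    """Body for the first call: return no documents, only an exact hit count."""
    return {"query": query, "size": 0, "track_total_hits": INT_MAX}


def data_request(query, total_hits):
    """Body for the second call: size is set so that every hit is returned."""
    return {"query": query, "size": total_hits}


query = {"match_all": {}}           # hypothetical query
first = count_request(query)
second = data_request(query, 103)   # 103 stands for the total reported by the first call
```

Passing a Pageable to the repository method avoids the extra count call, because the size is then known up front.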
Related
I have a need to be able to edit multiple (10-20) noncontiguous rows in an Excel table via the Microsoft Graph API. My application receives a list of 10-20 strings as input. It then needs to be able to find the rows of data associated with those strings (they are all in the same column) and update each row (separate column) with different values. I am able to update the rows using individual PATCH requests that specify the specific row index to update, however, sending 10-20 separate HTTP requests is not sufficient due to performance reasons.
Here is what I have tried so far:
JSON batching. I created a JSON batch request where each request in the batch updates a row of data at a specific row index. However, only a few of the calls actually succeed while the rest of them fail due to being unable to acquire a lock to edit the Excel document. Using the dependsOn feature in JSON batching fixed the issue, but performance was hardly better than sending the update requests separately.
Concurrent PATCH requests. If I use multiple threads to make the PATCH requests concurrently I run into the same issue as above. A few of them succeed while the others fail as they can not acquire a lock to edit the Excel document.
Filtering/sorting the table in order to perform a range update on the specific rows currently visible. I was able to apply a table filter using the Microsoft Graph API; however, it appears that you can only define two criteria to filter on, and I need to be able to filter the data on 10-20 different values. Thus it does not seem like I will be able to accomplish this using a range update, since I cannot filter on enough values at the same time and the rows cannot be sorted in such a way that would leave them all in a contiguous block.
Is there any feature in the Microsoft Graph API I am not aware of that would enable me to do what I am proposing? Or any other idea/approach I am not thinking of? I would think that making bulk edits to noncontiguous rows in a range/table would be a common problem. I have searched through the API documentation/forums/etc. and cannot seem to find anything else that would help.
Any help/information in the right direction would be greatly appreciated!
After much trial and error I was able to solve my problem using filtering. I stumbled across this readme on filter apply: https://github.com/microsoftgraph/microsoft-graph-docs/blob/master/api-reference/v1.0/api/filter_apply.md which has an example request body of:
{
"criteria": {
"criterion1": "criterion1-value",
"criterion2": "criterion2-value",
"color": "color-value",
"operator": {
},
"icon": {
"set": "set-value",
"index": 99
},
"dynamicCriteria": "dynamicCriteria-value",
"values": {
},
"filterOn": "filterOn-value"
}
}
Although this didn't help me immediately, it got me thinking in the right direction. I was unable to find any more documentation about how the request format works, but I kept playing with the request body until I finally got something working. I changed "values" to an array of strings and "filterOn" to "values". Now, rather than being limited to criterion1 and criterion2, I can filter on whatever values I pass in the "values" array.
{
"criteria": {
"values": [
"1",
"2",
"3",
"4",
"5"
],
"filterOn": "values"
}
}
After applying the filter I retrieve the visibleView range, which I discovered here: https://developer.microsoft.com/en-us/excel/blogs/additions-to-excel-rest-api-on-microsoft-graph/, like this:
/workbook/tables('tableName')/range/visibleView?$select=values
Lastly, I perform a bulk edit on the visibleView range with a PATCH request like this:
/workbook/tables('tableName')/range/visibleView
and a request body with a "values" array that matches the number of columns/rows I am updating.
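The three steps above (apply the filter, read visibleView, PATCH visibleView) could be sketched roughly as follows. This only builds the request descriptions and sends nothing. The column name "Key", the sample rows, and the path used for the filter apply call are assumptions that should be checked against the Graph documentation; the two visibleView paths and the "values"/"filterOn" body come from the answer.

```python
import json


def filter_body(values):
    """Body for the filter apply call: filter one column on a list of values."""
    return {"criteria": {"values": [str(v) for v in values], "filterOn": "values"}}


def plan_requests(table, column, keys, new_rows):
    """Return the three calls as (method, relative path, body) tuples."""
    base = f"/workbook/tables('{table}')"
    return [
        # Assumed path for applying a filter to a single column.
        ("POST", f"{base}/columns('{column}')/filter/apply", filter_body(keys)),
        # Read back only the rows that are still visible after filtering.
        ("GET", f"{base}/range/visibleView?$select=values", None),
        # Bulk edit: new_rows must match the shape of the visible range.
        ("PATCH", f"{base}/range/visibleView", {"values": new_rows}),
    ]


# Hypothetical table, column and data.
calls = plan_requests("tableName", "Key", [1, 2], [["1", "a"], ["2", "b"]])
for method, path, body in calls:
    print(method, path, json.dumps(body))
```

Because all three calls run one after another against the same workbook, they do not compete for the edit lock the way concurrent PATCH requests do.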
Unfortunately this simple task was made difficult by a lack of Microsoft Graph API documentation, but hopefully this information here is able to help someone else.
Is it possible to count all rows in a given entity, bypassing the 5000 row limit and bypassing the pagesize limit?
I do not want to return more than 5000 rows in one request, but only want the count of all the rows in that given entity.
According to Microsoft, you cannot do it in the request URI:
The count value does not represent the total number of entities in the system.
It is limited by the maximum number of entities that can be returned.
I have tried this:
GET [Organization URI]/api/data/v9.0/accounts/?$count=true
Any other way?
Use function RetrieveTotalRecordCount:
If you want to retrieve the total number of records for an entity beyond 5000, use the RetrieveTotalRecordCount Function.
Your query will look like this:
https://<your api url>/RetrieveTotalRecordCount(EntityNames=['accounts'])
Update:
Latest release v9.1 has the direct function to achieve this - RetrieveTotalRecordCount
————————————————————————————
Unfortunately, we have to pick one of these routes to get the record count, depending on how many records we expect relative to the limits.
1. If less than 5000, use this: (You already tried this)
GET [Organization URI]/api/data/v9.0/accounts/?$count=true
2. Less than 50,000, use this:
GET [Organization URI]/api/data/v8.2/accounts?fetchXml=[URI-encoded FetchXML query]
Exceeding the limit returns the error: AggregateQueryRecordLimit exceeded. Cannot perform this operation.
Sample query:
<fetch version="1.0" mapping="logical" aggregate="true">
<entity name="account">
<attribute name="accountid" aggregate="count" alias="count" />
</entity>
</fetch>
Do a browser address bar test with URI:
[Organization URI]/api/data/v8.2/accounts?fetchXml=%3Cfetch%20version=%221.0%22%20mapping=%22logical%22%20aggregate=%22true%22%3E%3Centity%20name=%22account%22%3E%3Cattribute%20name=%22accountid%22%20aggregate=%22count%22%20alias=%22count%22%20/%3E%3C/entity%3E%3C/fetch%3E
The only way to get around this is to partition the dataset based on some property so that you get smaller subsets of records to aggregate individually.
3. The last resort is iterating through @odata.nextLink and counting the records in each page with a code variable (code example to query the next page)
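As a rough illustration of that last option, the sketch below follows @odata.nextLink from page to page and sums the page sizes. The fetch_page callable is a stand-in for whatever HTTP client you use (an assumption, not part of the Web API), and the two fake pages only mimic the "value" array and "@odata.nextLink" property of a real response.

```python
def count_records(first_url, fetch_page):
    """Follow @odata.nextLink from page to page and sum the page sizes."""
    total = 0
    url = first_url
    while url:
        page = fetch_page(url)                # parsed JSON for one page
        total += len(page.get("value", []))   # records returned on this page
        url = page.get("@odata.nextLink")     # absent on the last page
    return total


# Fake two-page result set standing in for real HTTP calls.
pages = {
    "page1": {"value": [{}, {}, {}], "@odata.nextLink": "page2"},
    "page2": {"value": [{}, {}]},
}
total = count_records("page1", pages.__getitem__)
```

Selecting only the primary key column in the query keeps each page small, since only the number of records matters here.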
The XrmToolBox has a counting tool that can help with this.
Also, we here at MetaTools Inc. have just released an online tool called AggX that runs aggregates on any number of records in a Dynamics 365 Online org, and it's free during the beta release.
You may try OData's $inlinecount query option.
Adding only $inlinecount=allpages in the querystring will return all records, so add $top=1 in the URI to fetch only one record along with count of all records.
Your URL will look like /accounts/?$inlinecount=allpages&$top=1
For example, the response XML will contain the count as <m:count>11</m:count>
Note: This query option is only supported in OData version 2.0 and above
This works:
[Organization URI]/api/data/v8.2/accounts?$count
As far as I can tell, I'm paging through API results as I should:
Make a request
Get a result back containing the 'totalResults' and the 'nextPage' token
Make the same request, adding the 'pageToken' parameter
Some issues I'm having:
If I make any request multiple times, I'll often get one of two different 'totalResults' values
If I page through and grab all the results for various queries, I'll get different numbers of items
Here's a set of queries followed by their 'nextPage' and 'totalResults' values:
https://www.googleapis.com/youtube/v3/search?key=~key~&part=id&channelId=UC8hzWaP0JfFJ8a1UjLIl1FA&maxResults=50&publishedAfter=2016-02-23T14%3A32%3A58.928-06%3A00&publishedBefore=2017-02-24T14%3A32%3A58.928-06%3A00&type=video
CDIQAA/239
https://www.googleapis.com/youtube/v3/search?key=~key~&part=id&channelId=UC8hzWaP0JfFJ8a1UjLIl1FA&maxResults=50&pageToken=CDIQAA&publishedAfter=2016-02-23T14%3A32%3A58.928-06%3A00&publishedBefore=2017-02-24T14%3A32%3A58.928-06%3A00&type=video
CGQQAA/188
https://www.googleapis.com/youtube/v3/search?key=~key~&part=id&channelId=UC8hzWaP0JfFJ8a1UjLIl1FA&maxResults=50&pageToken=CGQQAA&publishedAfter=2016-02-23T14%3A32%3A58.928-06%3A00&publishedBefore=2017-02-24T14%3A32%3A58.928-06%3A00&type=video
CJYBEAA/188
https://www.googleapis.com/youtube/v3/search?key=~key~&part=id&channelId=UC8hzWaP0JfFJ8a1UjLIl1FA&maxResults=50&pageToken=CJYBEAA&publishedAfter=2016-02-23T14%3A32%3A58.928-06%3A00&publishedBefore=2017-02-24T14%3A32%3A58.928-06%3A00&type=video
null/239
The first three queries contained 50 items, and the last contained 18, so I got 168 items total. This is really frustrating since I don't know if any of the three counts is the correct count.
Again, if I put any one query in my browser and hit 'refresh' over and over, I'll get either 188 or 239.
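For reference, the paging loop described above can be sketched like this. The fetch_page callable is hypothetical and stands in for the HTTP request; nextPageToken, items and pageInfo.totalResults are the field names of a v3 search response. The fake pages reproduce the token sequence and counts from the queries listed above.

```python
def collect_items(fetch_page):
    """Page through search results; return all items and every totalResults seen."""
    items, totals, token = [], [], None
    while True:
        page = fetch_page(token)              # token is None for the first request
        items.extend(page.get("items", []))
        totals.append(page["pageInfo"]["totalResults"])
        token = page.get("nextPageToken")     # missing on the last page
        if not token:
            return items, totals


# Fake responses keyed by the pageToken that was sent.
fake = {
    None: {"items": [0] * 50, "pageInfo": {"totalResults": 239}, "nextPageToken": "CDIQAA"},
    "CDIQAA": {"items": [0] * 50, "pageInfo": {"totalResults": 188}, "nextPageToken": "CGQQAA"},
    "CGQQAA": {"items": [0] * 50, "pageInfo": {"totalResults": 188}, "nextPageToken": "CJYBEAA"},
    "CJYBEAA": {"items": [0] * 18, "pageInfo": {"totalResults": 239}},
}
items, totals = collect_items(fake.__getitem__)
```

Counting the collected items rather than reading totalResults gives a number that at least matches what was actually returned.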
Through a script I can collect the sequence of videos that search list returns. The maxResults parameter was set to 50. The total number of items is large, but there are not enough next page tokens to retrieve all the desired results. Is there any way to get all the returned items, or is this restricted by YouTube?
Thank you.
No, retrieving the results of a search is limited in size.
The total number of results that you are allowed to retrieve seems to have been reduced to 500 (in the past it was limited to 1000). The API does not allow you to retrieve more from a single query. To try to get more, use a number of queries with different parameters, such as publishedAfter, publishedBefore, order, type, or videoCategoryId, or vary the query tags, and keep track of the distinct video IDs returned.
See for a reference:
https://code.google.com/p/gdata-issues/issues/detail?id=4282
BTW. "totalResults" is an estimate and its value can change on the next page call.
See: YouTube API v3 totalResults field is returning 1 000 000 when it shoudn't
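One way to run several queries with different parameters is to split the date range into smaller windows and issue one search per window. The sketch below only builds the publishedAfter/publishedBefore pairs; the function names and the number of windows are arbitrary choices for illustration, and whether each window stays under the result cap depends on the channel.

```python
from datetime import datetime, timezone


def split_range(start, end, parts):
    """Split the span from start to end into `parts` equal, back-to-back windows."""
    step = (end - start) / parts
    edges = [start + step * i for i in range(parts)] + [end]
    return list(zip(edges[:-1], edges[1:]))


def window_params(start, end, parts):
    """One dict of date parameters per window, as ISO 8601 timestamps."""
    return [
        {"publishedAfter": a.isoformat(), "publishedBefore": b.isoformat()}
        for a, b in split_range(start, end, parts)
    ]


start = datetime(2016, 1, 1, tzinfo=timezone.utc)
end = datetime(2016, 1, 5, tzinfo=timezone.utc)
windows = split_range(start, end, 4)
params = window_params(start, end, 4)
```

Adjacent windows share a boundary timestamp, so results should be de-duplicated by video ID when they are merged.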
I am currently trying to pull data about videos from a YouTube user upload feed. This feed contains all of the videos uploaded by a certain user, and is accessed from the API by a request to:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads
Where USERNAME is the name of the YouTube user who owns the feed.
However, I have encountered problems when trying to access feeds which are longer than 1000 videos. Since each request to the API can return 50 items, I am iterating through the feed using max-results and start-index as follows:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1&max-results=50&orderby=published
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=51&max-results=50&orderby=published
And so on, incrementing start-index by 50 on each call. This works perfectly up until:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1001&max-results=50&orderby=published
At which point I receive a 400 error informing me that 'You cannot request beyond item 1000.' This confused me, as I assumed that the query would have returned only 50 videos: 1001-1050 in order of most recently published. Having looked through the documentation, I discovered this:
Limits on result counts and accessible results
...
For any given query, you will not be able to retrieve more than 1,000
results even if there are more than that. The API will return an error
if you try to retrieve greater than 1,000 results. Thus, the API will
return an error if you set the start-index query parameter to a value
of 1001 or greater. It will also return an error if the sum of the
start-index and max-results parameters is greater than 1,001.
For example, if you set the start-index parameter value to 1000, then
you must set the max-results parameter value to 1, and if you set the
start-index parameter value to 980, then you must set the max-results
parameter value to 21 or less.
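The rule quoted above boils down to a small check: start-index must be at most 1000, and start-index plus max-results must not exceed 1,001. A minimal sketch (the function name is made up for illustration):

```python
def within_limit(start_index, max_results, limit=1000):
    """True if the request stays within the first `limit` results."""
    return 1 <= start_index <= limit and start_index + max_results <= limit + 1


print(within_limit(1000, 1))    # last allowed single item
print(within_limit(980, 21))    # items 980 through 1000
print(within_limit(980, 22))    # would reach item 1001
print(within_limit(1001, 50))   # the request that failed above
```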
I am at a loss about how to access a generic user's 1001st last uploaded video and beyond in a consistent fashion, since they cannot be indexed using only max-results and start-index. Does anyone have any useful suggestions for how to avoid this problem? I hope that I've outlined the difficulty clearly!
Getting all the videos for a given account is supported, but you need to make sure that your request for the uploads feed is going against the backend database and not the search index. Because you're including orderby=published in your request URL, you're going against the search index. Search index feeds are limited to 1000 entries.
Get rid of the orderby=published and you'll get the data you're looking for. The default ordering of the uploads feed is reverse-chronological anyway.
This is a particularly easy mistake to make, and we have a blog post up explaining it in more detail:
http://apiblog.youtube.com/2012/03/keeping-things-fresh.html
The nice thing is that this is something that will no longer be a problem in version 3 of the API.