Okay, let's say I have a YouTube playlist with 500 items in it. YouTube's PlaylistItems endpoint only allows you to retrieve 50 items at a time:
https://developers.google.com/youtube/v3/docs/playlistItems/list
After 50 items, it gives you a nextPageToken, which you can specify in your next query to get the next page. Doing this, you could iterate through the entire playlist to get all 500 items in 10 queries.
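In code, that iteration looks roughly like this (a sketch using Python and requests; the API key and playlist ID are placeholders):

import requests

URL = "https://www.googleapis.com/youtube/v3/playlistItems"
params = {"part": "snippet", "maxResults": 50,
          "playlistId": "PLxxxxxxxxxxxx",  # placeholder
          "key": "YOUR_API_KEY"}           # placeholder

items, token = [], None
while True:
    if token:
        params["pageToken"] = token
    page = requests.get(URL, params=params).json()
    items += page.get("items", [])
    token = page.get("nextPageToken")
    if not token:
        break
# For a 500-item playlist this makes 10 requests.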
However, what if I only wanted to get the last page? Page 10?
In YouTube's V2 API, you could tell it to start at index 451, and it would give you the results for 451-500. This doesn't seem to be an option in their V3 API. Now, it seems that if I wanted just page 10, I would have to iterate through the entire playlist again, throw out the first 9 pages, and take only the 10th page.
This seems like a HUGE waste of resources and the cURL operations alone could be a killer.
So is it possible to set the starting index in the V3 API like in the V2 API?
You can still use a start index but you have to generate the corresponding page token yourself.
As far as I can tell from observation, page tokens are basically a byte sequence encoded in base64, with the first byte always being 8 and the last two being 16, 0. I generate tokens roughly like this (using Python 3):
import base64

i = 451                  # the start index you want
k = i // 128
i -= 128 * (k - 1)
b = [8, i]               # first byte is always 8, then the index as a varint
if k > 1 or i > 127:
    b += [k]
b += [16, 0]             # the last two bytes are always 16, 0
t = base64.b64encode(bytes(b)).decode('utf8').strip('=')
The final strip('=') removes the trailing '=' characters that base64 uses to pad incomplete blocks. The result ('CMMDEAA') is your page token.
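As a quick check, you can pass the generated token as the pageToken parameter of playlistItems.list; a minimal sketch with requests (API key and playlist ID are placeholders):

import requests

params = {
    "part": "snippet",
    "maxResults": 50,
    "playlistId": "PLxxxxxxxxxxxx",  # placeholder
    "pageToken": "CMMDEAA",          # the token generated above
    "key": "YOUR_API_KEY",           # placeholder
}
page = requests.get("https://www.googleapis.com/youtube/v3/playlistItems",
                    params=params).json()
print([item["snippet"]["title"] for item in page.get("items", [])])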
I think I've found a bug with the date filtering on the delta API.
On one of the email accounts I'm working with via the Office 365 Graph API, the messages delta request returns a different number of items than are actually in the folder for the expected time range. There are 150,000 items covering 10 years in the folder, but delta only returns the last 5,000 or so items, covering the last 60 or so days.
Paging Works Fine
When querying the Graph API for the folder "Inbox", it reports 154,045 total items and 57,456 unread items.
IUserMailFoldersCollectionPage foldersPage =
    await client.Users[mailboxid].MailFolders.Request().GetAsync();
I can skip over 10,000, 50,000 or more messages using paging.
model.messages = await client.Users[mailboxid].MailFolders[folderid].Messages.Request().Top(top)
    .Skip(skip).GetAsync();
Delta with Date Filter doesn't work
But when looping with the nextToken and deltaToken, the deltaToken appears after 5,000 or so email messages. Basically, it seems like it's only returning results for the last couple of months, even though the filter asks for messages from the last 20 years.
Here is an example of how we generate the delta request. The time is hardcoded here, but in reality it is a variable.
var sFilter = $"receivedDateTime ge {DateTimeOffset.UtcNow.AddYears(-20).ToString("yyyy-MM-dd")}";
model.messages = await client.Users[mailboxid].MailFolders[folderid].Messages.Delta().Request()
    .Header("Prefer", "odata.maxpagesize=" + maxpagesize)
    .Filter(sFilter)
    .OrderBy("receivedDateTime desc")
    .GetAsync();
And then on each paging operation I do the following. "nexttoken" is either the next or delta link depending on what came back from the first request.
model.messages = new MessageDeltaCollectionPage();
model.messages.InitializeNextPageRequest(client, nexttoken);
model.messages = await model.messages.NextPageRequest
    .Header("Prefer", "odata.maxpagesize=" + maxpagesize)
    .GetAsync();
Delta without Filter works
If I do the exact same code for delta above but remove the "Filter" operation on date, then I get all the messages in the folder.
This isn't a great solution, since I normally only need messages for the last year or two, and if there are 15 years of messages it is a huge waste to query everything.
Update on 12/3/2019
I'm still getting this issue. I recently switched back to using delta queries; before, I was querying everything from the server even though I might only need the last month of data, which is super wasteful.
This code works fine for most mailboxes but sometimes I encounter a mailbox with this issue.
My code looks like this.
string sStartingTime = startingTime.ToString("yyyy'-'MM'-'dd'T'HH':'mm':'ss") + "Z";
var messageCollectionPage = await client.Users[mailboxsource.GetMailboxIdFromAccountID()].MailFolders[folder.Id].Messages.Delta().Request()
    .Filter("receivedDateTime+ge+" + Uri.EscapeDataString(sStartingTime))
    .Select(select)
    .Header("Prefer", "odata.maxpagesize=" + preferredPageSize)
    .OrderBy("receivedDateTime desc")
    .GetAsync(cancellationToken);
At around 5000 results the Delta request just stops returning results even though there are 66K items in the folder.
Paul, my peers confirmed there is indeed a 5000-item limit if you apply $filter to a delta query of the message resource.
Within the next day, the docs will also be updated with this information. Thank you for your patience and support!
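For illustration, one workaround (the one the question already hints at) is to drop the $filter and apply the date cutoff client side. A rough sketch over raw Graph REST rather than the .NET SDK used above; the access token, mailbox ID, and folder ID are placeholders:

import requests
from datetime import datetime, timedelta, timezone

GRAPH = "https://graph.microsoft.com/v1.0"
headers = {
    "Authorization": "Bearer <access-token>",   # placeholder
    "Prefer": "odata.maxpagesize=50",
}
cutoff = (datetime.now(timezone.utc) - timedelta(days=365)).strftime("%Y-%m-%dT%H:%M:%SZ")

url = f"{GRAPH}/users/<mailbox-id>/mailFolders/<folder-id>/messages/delta"  # placeholders
messages = []
while url:
    page = requests.get(url, headers=headers).json()
    # Keep only messages newer than the cutoff; ISO-8601 UTC timestamps compare lexicographically.
    messages += [m for m in page.get("value", [])
                 if m.get("receivedDateTime", "") >= cutoff]
    url = page.get("@odata.nextLink")          # follow until pages are exhausted
    delta_link = page.get("@odata.deltaLink")  # store for the next incremental sync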
I am querying a Fusion Table using a request like this:
https://www.googleapis.com/fusiontables/v2/query?alt=media&sql=SELECT ROWID,State,County,Name,Location,Geometry,[... many more ...] FROM <table ID>
The results from this query exceed 10 MB, so I must include the alt=media option (for smaller queries, where I can remove this option, the problem does not exist). The response is in CSV, as promised by the documentation. The first line of the response appears to be a header line which exactly matches my query string (except that it shows rowid instead of ROWID):
rowid,State,County,Name,Location,Geometry,[... many more ...]
The following rows, however, do not include the row ID. Each line begins with the second item requested in the query; it seems as though the row ID was ignored:
WV,Calhoun County,"Calhoun County, WV, USA",38.858 -81.1196,"<Polygon><outerBoundaryIs>...
Is there any way around this? How can I retrieve row IDs from a table when the table is large?
ROWIDs are also missing from "media" requests made via Google's Python API client library, e.g.:
def doGetQuery(query):
    request = FusionTables.query().sqlGet_media(sql=query)
    response = request.execute()
    ft = {'kind': "fusiontables#sqlresponse"}
    s = response.decode()            # bytestring to string
    data = []
    for line in s.splitlines():
        data.append(line.split(','))
    ft['columns'] = data.pop(0)      # ['Rowid', 'A', 'B', 'C']
    ft['rows'] = data                # [['a1', 'b1', 'c1'], ...]
    return ft
You may at least have one fewer issue than I do: this sqlGet_media method can only be called as a "pure" GET request. A query long enough (2-8k characters) to be sent as an overridden POST generates a 502 Bad Gateway error, even for tiny response sizes such as the result of SELECT COUNT(). The same query as a non-media request works flawlessly (provided the response is not over 10 MB, of course).
The solution to both your and my issue is to batch the request using OFFSET and LIMIT, such that the 10 MB response limit isn't hit. Estimate the size of an average response row at the call site, pass it into a wrapper function, and let the wrapper handle adding OFFSET and LIMIT to your input SQL, and then collate the multi-query result into a single output of the desired format:
from math import floor

def doGetQuery(query, rowSize=1.):
    limitValue = floor(9.5 * 1024 / rowSize)   # rowSize in kB; stay safely under the 10 MB cap
    offsetValue = 0
    ft = {'kind': "fusiontables#sqlresponse"}
    data = []
    done = False
    while not done:
        tail = ' '.join(['OFFSET', str(offsetValue), 'LIMIT', str(limitValue)])
        request = FusionTables.query().sqlGet(sql=query + ' ' + tail)
        response = request.execute()
        offsetValue += limitValue
        if 'rows' in response:
            data.extend(response['rows'])
        # Check the exit condition: a short (or missing) page means we're done.
        if 'rows' not in response or len(response['rows']) < limitValue:
            done = True
        if 'columns' not in ft and 'columns' in response:
            ft['columns'] = response['columns']
    ft['rows'] = data
    return ft
This wrapper can be extended to handle cases where you actually want OFFSET and LIMIT yourself. Ideally, the Fusion Tables API (like other Google REST APIs) would provide list() and list_next() methods for native pagination, but no such feature exists for its query method. Given the horrendously slow rate of Fusion Tables API updates, I wouldn't expect ROWIDs to magically appear in alt=media downloads, or GET-turned-POST media-format queries to ever work, so writing your own wrapper is going to save you a lot of headache.
I have a problem with the Enhanced Ecommerce Measurement Protocol (docs). Sometimes a client buys about 100 different products in one transaction. That exceeds the 8192-byte payload limit (reference) and the request doesn't go through.
I tried to split it into smaller packs:
transaction details + one item, always at index 1 (every item sent as pr1id)
I also tried to split it with an incrementing index:
transaction details + one item with an incrementing index (e.g. first I send the transaction + pr1id, then the transaction + pr2id, etc.)
I always end up with only one item in Google Analytics. Is there any way to split it so that it works correctly? I couldn't find a solution on Google or in the docs.
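For reference, a rough sketch of building such a purchase hit and measuring its encoded size against the 8192-byte limit (the tid/cid values are placeholders; the product fields are the standard pr<N>id / pr<N>nm / pr<N>pr / pr<N>qt parameters):

from urllib.parse import urlencode

def purchase_payload(transaction_id, products):
    payload = {
        "v": "1", "tid": "UA-XXXXX-Y", "cid": "555",  # placeholders
        "t": "event", "ec": "ecommerce", "ea": "purchase",
        "pa": "purchase", "ti": transaction_id,
    }
    for n, p in enumerate(products, start=1):
        payload[f"pr{n}id"] = p["id"]
        payload[f"pr{n}nm"] = p["name"]
        payload[f"pr{n}pr"] = p["price"]
        payload[f"pr{n}qt"] = p["qty"]
    return urlencode(payload)

body = purchase_payload("T-1", [{"id": f"SKU-{i}", "name": f"Product {i}",
                                 "price": "9.99", "qty": "1"} for i in range(100)])
print(len(body.encode("utf-8")), "bytes vs. the 8192-byte limit")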
I am following the YQL sample query:
select * from local.search(500) where query="sushi" and location="san francisco, ca"
but I get a maximum of 260 results instead of 500. I also tried using limit 500 after the 'where' clause, and different keywords, but I always get at most 260 results. How do I increase this?
The underlying API that the local.search table uses (Yahoo! Local Search Web Service) has restrictions on the number of results returned.
The results parameter (the number of results "per page") has a maximum value of 20.
The start parameter (the offset at which to start) has a maximum value of 250.
Since you ask for the first 500 results, YQL makes multiple queries against the Local Search API returning 20 results at a time. Therefore the start values are 1, 21, 41, ... 241. This brings back 260 results, as you have seen.
Since the YQL query asks for more results, the next start value is tried (261) which is beyond the allowed range so the underlying service returns an error (with the message "invalid value: start (261) must be between 1 and 250"). If you turn on "diagnostics" in the YQL console, you will see the "Bad Request" being returned.
Nothing you do to the query will bring back more results than the underlying service allows.
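Spelling out the arithmetic (Python, just for illustration):

starts = list(range(1, 251, 20))   # 1, 21, 41, ..., 241 -- the only start values YQL can use
print(len(starts) * 20)            # 260, the ceiling you are seeing
print(starts[-1] + 20)             # 261, the next start value, which the service rejects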
I figured it out: I was missing the paging number, so specifying the offset (starting at 0) works:
select * from local.search(0,500) where query="sushi" and location="san francisco, ca"
I am trying to get the count of the items in a SharePoint document library programmatically. The scale I am working with is 30-70000 items. We have a user control in a smart part to display the count. Ours is a team site.
This is the code to get the total count:
SPList VoulnterrList = web.Lists[ListTitle];
SPQuery query = new SPQuery();
query.ViewAttributes = "Scope=\"Recursive\"";
string queries = "<Where><Eq><FieldRef Name='ApprovalStatus' /><Value Type='Choice'>Pending</Value></Eq></Where>";
query.Query = queries;
SPListItemCollection lstitemcollAssoID = VoulnterrList.GetItems(query);
lblCount.Text = "Total Proofs: " + VoulnterrList.Items.Count.ToString() + " Pending Proofs: " + lstitemcollAssoID.Count.ToString();
The problem is that this has a serious performance issue: it takes 75 to 80 seconds to load the page. If we comment this out, the page load drops to about 4 seconds. Is there a better approach to this problem?
Ours is SharePoint 2007.
Use VoulnterrList.ItemCount instead of VoulnterrList.Items.Count.
When List.Items is used, all items in the list are loaded from the content database. Since we don't actually need the items to get the count, this is wasted overhead.
This fixes the performance of the total count, but you may still have issues with the GetItems(query) call for the pending proofs, depending on the number of results returned by the query.
You can do two optimizations here:
Create an index on one of the columns of your list.
Use that column in the <ViewFields> section of your CAML query (e.g. <ViewFields><FieldRef Name='ApprovalStatus' /></ViewFields>) so that only the indexed column is retrieved.
This should speed things up. See this article on how to create an index on a column:
http://sharepoint.microsoft.com/Blogs/GetThePoint/Lists/Posts/Post.aspx?ID=162