Varying search results for multiple pages YouTube data API v3 - youtube

I have been using the YouTube Data API v3 search endpoint to fetch basic details of every video on a particular channel. The channel contains a few hundred videos, so I request them in batches of 50 using the nextPageToken, for example:
https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=50&channelId=<channelid>&key=<key>&pageToken=<nextPageToken>
This appears to work fine until the penultimate and final pages, where I begin to see varying results. Some of the videos in the results don't have an id associated with them, which seems strange. More interesting still, the number of videos missing an id varies when the same request is repeated.
An example of the results I am getting (based on 100 identical requests per page):
Penultimate page (100 requests): 60 responses had 41 videos with an id and 9 without; 38 responses had 37 with an id and 13 without; 2 responses had 38 with an id and 12 without.
Final page (100 requests): 64 responses had 25 videos, all with an id; 36 responses had 24 videos with an id and 1 without.
Clearly this leads to an inconsistent total number of videos, since I require every video to have an id (which is how I noticed the problem in the first place).
I am currently running these requests from a unit test to keep things isolated.
Is there something I'm missing here or is there just a bug in the API?
UPDATE
Added the parameter "type=video" as suggested in the comments. This seems to limit the issue to the last page of the search as noted, but the issue still persists.
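A minimal Python sketch of the paging loop described above, assuming the video id sits under `id.videoId` in each search result item. `fetch_json` is a hypothetical callable that takes a URL and returns the decoded JSON response; it is passed in so the loop can be run without network access. Items that lack an id are collected separately rather than dropped, which makes the varying counts easy to log.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlencode

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(channel_id: str, key: str, page_token: Optional[str] = None) -> str:
    """Build the search URL from the question, with type=video added."""
    params = {
        "part": "snippet",
        "maxResults": 50,
        "channelId": channel_id,
        "type": "video",
        "key": key,
    }
    if page_token:
        params["pageToken"] = page_token
    return SEARCH_URL + "?" + urlencode(params)


def collect_videos(
    fetch_json: Callable[[str], Dict], channel_id: str, key: str
) -> Tuple[List[Dict], List[Dict]]:
    """Follow nextPageToken until it is absent.

    Returns (items with id.videoId, items without it).
    """
    with_id: List[Dict] = []
    without_id: List[Dict] = []
    token: Optional[str] = None
    while True:
        page = fetch_json(build_search_url(channel_id, key, token))
        for item in page.get("items", []):
            # Assumption: a usable result carries its id under id.videoId.
            if item.get("id", {}).get("videoId"):
                with_id.append(item)
            else:
                without_id.append(item)
        token = page.get("nextPageToken")
        if not token:
            return with_id, without_id
```

Logging `len(without_id)` per run is one way to confirm whether the missing ids still vary between otherwise identical requests.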

Related

Why do I get inconsistent results consuming the v3 YouTube API?

As far as I can tell, I'm paging through API results as I should:
1. Make a request
2. Get a result back containing the 'totalResults' value and the 'nextPageToken'
3. Make the same request again, adding the 'pageToken' parameter
Some issues I'm having:
1. If I make any request multiple times, I'll often get one of two different 'totalResults' values
2. If I page through and grab all the results for various queries, I'll get different numbers of items
Here's a set of queries, each followed by its 'nextPageToken' and 'totalResults' values:
https://www.googleapis.com/youtube/v3/search?key=~key~&part=id&channelId=UC8hzWaP0JfFJ8a1UjLIl1FA&maxResults=50&publishedAfter=2016-02-23T14%3A32%3A58.928-06%3A00&publishedBefore=2017-02-24T14%3A32%3A58.928-06%3A00&type=video
CDIQAA/239
https://www.googleapis.com/youtube/v3/search?key=~key~&part=id&channelId=UC8hzWaP0JfFJ8a1UjLIl1FA&maxResults=50&pageToken=CDIQAA&publishedAfter=2016-02-23T14%3A32%3A58.928-06%3A00&publishedBefore=2017-02-24T14%3A32%3A58.928-06%3A00&type=video
CGQQAA/188
https://www.googleapis.com/youtube/v3/search?key=~key~&part=id&channelId=UC8hzWaP0JfFJ8a1UjLIl1FA&maxResults=50&pageToken=CGQQAA&publishedAfter=2016-02-23T14%3A32%3A58.928-06%3A00&publishedBefore=2017-02-24T14%3A32%3A58.928-06%3A00&type=video
CJYBEAA/188
https://www.googleapis.com/youtube/v3/search?key=~key~&part=id&channelId=UC8hzWaP0JfFJ8a1UjLIl1FA&maxResults=50&pageToken=CJYBEAA&publishedAfter=2016-02-23T14%3A32%3A58.928-06%3A00&publishedBefore=2017-02-24T14%3A32%3A58.928-06%3A00&type=video
null/239
The first three queries contained 50 items, and the last contained 18, so I got 168 items total. This is really frustrating since I don't know if any of the three counts is the correct count.
Again, if I put any one query in my browser and hit 'refresh' over and over, I'll get either 188 or 239.
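One way to quantify the inconsistency is to tally the distinct 'totalResults' values across repeated responses and to count the unique video ids actually returned. The sketch below assumes 'totalResults' sits under `pageInfo.totalResults` and the video id under `id.videoId`; `responses` is a hypothetical list of already-decoded JSON responses.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def tally_total_results(responses: Iterable[Dict]) -> Counter:
    """Count how often each totalResults value was reported."""
    return Counter(r["pageInfo"]["totalResults"] for r in responses)


def unique_video_ids(responses: Iterable[Dict]) -> Set[str]:
    """Collect the distinct video ids seen across all pages."""
    ids: Set[str] = set()
    for r in responses:
        for item in r.get("items", []):
            video_id = item.get("id", {}).get("videoId")
            if video_id:
                ids.add(video_id)
    return ids
```

Comparing `len(unique_video_ids(...))` with the reported totals shows which figure, if any, matches the number of items actually delivered.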

YouTube V3 - LiveChatMessages.list only returns a max of 75 results

When I try and send the following request:
GET https://www.googleapis.com/youtube/v3/liveChat/messages?liveChatId={..}&part=snippet&maxResults=250&key={...}
I only get a max of 75 results returned even though there are more than 75 comments in my livestream. These 75 comments returned are also the 75 newest comments. Setting the nextPageToken with the value in the response above returns an empty set of comments. It's almost like I need a way to view previous pages. Setting maxResults, as I have in the url above does nothing as well. When I add a new comment to the livestream, the first entry of the 75 disappears and the new comment shows up at the bottom of the list.
I am perplexed as to why I cannot receive more than 75 comments, and why the number is 75, since this is not mentioned anywhere in the documentation. Do you have any idea what's going on here? I can provide more information as needed.
Having tested the liveChatMessages API against the most active rooms I could find, and having reviewed the documentation, I have concluded that a query returns the 75 most recent messages, ordered from oldest to newest. If the nextPageToken from that response is added to the next query, the API returns any newer messages that Google's servers have gathered since your first query. I am not sure how to get older messages; it does not seem possible.
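The behaviour described in this answer suggests a polling loop: keep passing the latest nextPageToken and accumulate whatever new messages arrive. The Python sketch below assumes each response carries `items`, `nextPageToken` and `pollingIntervalMillis`, and that each message has an `id`. `fetch_page` is a hypothetical callable that takes a page token (or None) and returns the decoded JSON response.

```python
import time
from typing import Callable, Dict, List, Optional


def poll_live_chat(
    fetch_page: Callable[[Optional[str]], Dict],
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Poll a live chat `rounds` times, accumulating messages oldest to newest."""
    messages: List[Dict] = []
    seen = set()
    token: Optional[str] = None
    for _ in range(rounds):
        page = fetch_page(token)
        for item in page.get("items", []):
            # Skip anything already collected in an earlier round.
            if item["id"] not in seen:
                seen.add(item["id"])
                messages.append(item)
        # Keep the previous token if the response did not supply a new one.
        token = page.get("nextPageToken", token)
        # Wait for the interval the response asks for (assumed field).
        sleep(page.get("pollingIntervalMillis", 0) / 1000.0)
    return messages
```

This only captures messages from the first poll onwards; it does not recover anything older than the initial batch.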

Accessing an item beyond start_index=1000 in a YouTube user upload feed

I am currently trying to pull data about videos from a YouTube user upload feed. This feed contains all of the videos uploaded by a certain user, and is accessed from the API by a request to:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads
Where USERNAME is the name of the YouTube user who owns the feed.
However, I have encountered problems when trying to access feeds which are longer than 1000 videos. Since each request to the API can return at most 50 items, I am iterating through the feed using max-results and start-index as follows:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1&max-results=50&orderby=published
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=51&max-results=50&orderby=published
And so on, incrementing start-index by 50 on each call. This works perfectly up until:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1001&max-results=50&orderby=published
At which point I receive a 400 error informing me that 'You cannot request beyond item 1000.' This confused me, as I assumed that the query would only have returned 50 videos: numbers 1001-1050 in order of most recently published. Having looked through the documentation, I discovered this:
Limits on result counts and accessible results
...
For any given query, you will not be able to retrieve more than 1,000
results even if there are more than that. The API will return an error
if you try to retrieve greater than 1,000 results. Thus, the API will
return an error if you set the start-index query parameter to a value
of 1001 or greater. It will also return an error if the sum of the
start-index and max-results parameters is greater than 1,001.
For example, if you set the start-index parameter value to 1000, then
you must set the max-results parameter value to 1, and if you set the
start-index parameter value to 980, then you must set the max-results
parameter value to 21 or less.
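The rule quoted above can be written as a small check. This is only a sketch of the stated limits (start-index no greater than 1000, and start-index plus max-results no greater than 1001); the function name is hypothetical.

```python
def within_search_index_limit(start_index: int, max_results: int) -> bool:
    """Return True if the request stays inside the 1,000-result window."""
    return 1 <= start_index <= 1000 and start_index + max_results <= 1001
```

For the examples in the quoted text, `within_search_index_limit(1000, 1)` and `within_search_index_limit(980, 21)` hold, while a request starting at 1001 does not.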
I am at a loss about how to access a generic user's 1001st last uploaded video and beyond in a consistent fashion, since they cannot be indexed using only max-results and start-index. Does anyone have any useful suggestions for how to avoid this problem? I hope that I've outlined the difficulty clearly!
Getting all the videos for a given account is supported, but you need to make sure that your request for the uploads feed is going against the backend database and not the search index. Because you're including orderby=published in your request URL, you're going against the search index. Search index feeds are limited to 1000 entries.
Get rid of the orderby=published and you'll get the data you're looking for. The default ordering of the uploads feed is reverse-chronological anyway.
This is a particularly easy mistake to make, and we have a blog post up explaining it in more detail:
http://apiblog.youtube.com/2012/03/keeping-things-fresh.html
The nice thing is that this is something that will no longer be a problem in version 3 of the API.
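As a sketch of the fix described in this answer, the page URLs can be generated without the orderby parameter so that the requests are not routed to the search index. `total` is a hypothetical upper bound on the number of uploads; the function only builds URLs and makes no requests.

```python
from typing import Iterator
from urllib.parse import urlencode


def uploads_page_urls(username: str, total: int, page_size: int = 50) -> Iterator[str]:
    """Yield uploads-feed URLs covering items 1..total, with no orderby parameter."""
    base = f"http://gdata.youtube.com/feeds/api/users/{username}/uploads"
    for start in range(1, total + 1, page_size):
        query = urlencode({"start-index": start, "max-results": page_size})
        yield base + "?" + query
```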

Collecting follower/friend Ids of large number of users - Twitter4j

I'm working on a research project which analyses closure patterns in social networks.
Part of my requirement is to collect followers and following IDs of thousands of users under scrutiny.
My problem is that I keep exceeding the rate limit of 350 requests/hour.
After just 4-5 requests my limit is exhausted, i.e. as soon as the number of followers I have collected passes the 350 mark.
For example, with 7 members who each have 50 followers, collecting the follower details of just those 7 members exhausts the limit (7 * 50 = 350).
I found a related question in stackoverflow here - What is the most effective way to get a list of followers using Twitter4j?
The resolution mentioned there was to use the lookupUsers(long[] ids) method, which returns a list of User objects. But I can find no way in the API to get the screen names of the friends/followers of a particular "User" object. Am I missing something here? Is there a way to collect the friends/followers of thousands of users efficiently?
(Right now I'm using standard code: OAuth authentication (to get 350 requests/hour) followed by a call to twitter.getFollowersIDs.)
It's fairly straightforward to do this with a limited number of API calls.
It can be done with two API calls.
Let's say you want to get all my followers
https://api.twitter.com/1/followers/ids.json?screen_name=edent
That will return up to 5,000 user IDs.
You do not need 5,000 calls to look them up!
You simply post those IDs to users/lookup
You will then get back the full profile of all the users following me - including screen name.
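If users/lookup caps the number of IDs per request, the IDs returned by followers/ids need to be posted in batches. The Python sketch below assumes a cap of 100 IDs per call and a comma-separated `user_id` parameter; both should be checked against the API documentation, and the function names are hypothetical.

```python
from typing import Dict, List


def chunk_ids(ids: List[int], size: int = 100) -> List[List[int]]:
    """Split a list of user IDs into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def lookup_params(ids: List[int], size: int = 100) -> List[Dict[str, str]]:
    """Build one POST parameter dict per batch (assumed 'user_id' parameter)."""
    return [
        {"user_id": ",".join(str(i) for i in batch)}
        for batch in chunk_ids(ids, size)
    ]
```

Under that assumption, 5,000 follower IDs would take 50 lookup requests rather than 5,000.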

Delay and Inconsistent results using Twitter search API when using "since_id" parameter

We've noticed what seems to be a delay and/or inconsistent results from the Twitter Search API when specifying a since_id in the query parameters. For example:
http://search.twitter.com/search?ors=%23b4esummit+#b4esummit+b4esummit&q=&result_type=recent&rpp=100&show_user=true&since_id=
Will give the most recent Tweets, but:
http://search.twitter.com/search?ors=%23b4esummit+#b4esummit+b4esummit&q=&result_type=recent&rpp=100&show_user=true&since_id= 12642940173
will often not return tweets that come after that ID for several hours (even though they're visible in the first query).
Has anyone had similar problems?
First off, those are not Twitter search API URLs. You should be querying the API like this:
http://search.twitter.com/search.json?q=%23b4esummit%20OR%20#b4esummit%20OR%20b4esummit&result_type=recent&rpp=100&show_user=true
Second, since_id cuts off from the bottom of the list. You can see the behavior illustrated in this documentation: https://dev.twitter.com/docs/working-with-timelines
For example, at the time of this writing, the above URL returns 31 entries. Picking the ID of a Tweet in the middle of that list, I constructed:
http://search.twitter.com/search.json?q=%23b4esummit%20OR%20#b4esummit%20OR%20b4esummit&result_type=recent&rpp=100&show_user=true&since_id=178065448397574144
This returns only 12 entries, which match the top 12 entries of the first URL.
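The since_id behaviour in this answer can be modelled as a filter over a newest-first list: only entries with an ID greater than since_id survive, so the older entries at the bottom are cut off. This is a sketch of the semantics only, assuming each tweet is a dict with a numeric `id`; it does not call the API.

```python
from typing import Dict, List


def apply_since_id(tweets: List[Dict], since_id: int) -> List[Dict]:
    """Keep only tweets newer than since_id, preserving newest-first order."""
    return [t for t in tweets if t["id"] > since_id]
```

With 31 entries and since_id set to the ID of the 13th entry from the top, the filter keeps exactly the top 12, matching the counts reported above.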
