Collecting follower/friend IDs of a large number of users - Twitter4j - Twitter

I'm working on a research project which analyses closure patterns in social networks.
Part of my requirement is to collect followers and following IDs of thousands of users under scrutiny.
I have a problem with exceeding the rate limit of 350 requests/hour.
With just 4-5 requests my limit is exceeded - i.e., when the number of followers I have collected passes the 350 mark.
That is, if I have 7 members each with 50 followers, then when I collect the follower details of just those 7 members, my limit is exhausted (7*50 = 350).
I found a related question on Stack Overflow here - What is the most effective way to get a list of followers using Twitter4j?
The resolution mentioned there was to use the lookupUsers(long[] ids) method, which returns a list of User objects... But I find no way in the API to get the screen names of the friends/followers of a particular "User" object. Am I missing something here? Is there a way to collect the friends/followers of thousands of users efficiently?
(Right now I'm using standard code - OAuth authentication (to get 350 requests/hour) followed by a call to twitter.getFollowersIDs.)

It's fairly straightforward to do this with a limited number of API calls.
In fact, it can be done with two API calls.
Let's say you want to get all my followers:
https://api.twitter.com/1/followers/ids.json?screen_name=edent
That will return up to 5,000 user IDs.
You do not need 5,000 calls to look them up!
You simply post those IDs to users/lookup.
You will then get back the full profiles of all the users following me - including their screen names.
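In Twitter4j terms, a minimal sketch of that flow might look like the following (the screen name "edent" is just the example above, and twitter is assumed to be an already authenticated Twitter instance; error handling is omitted). Note that users/lookup accepts at most 100 IDs per request, so the IDs are resolved in batches:

import java.util.Arrays;
import twitter4j.*;

// Sketch: fetch one page of follower IDs (up to 5,000), then resolve
// screen names via users/lookup in batches of 100 IDs per call.
static void printFollowerScreenNames(Twitter twitter, String screenName) throws TwitterException {
    IDs ids = twitter.getFollowersIDs(screenName, -1);   // -1 = first cursor page
    long[] followerIds = ids.getIDs();
    for (int i = 0; i < followerIds.length; i += 100) {
        long[] batch = Arrays.copyOfRange(followerIds, i, Math.min(i + 100, followerIds.length));
        for (User user : twitter.lookupUsers(batch)) {    // one request per 100 users
            System.out.println(user.getScreenName());
        }
    }
}

So 5,000 followers cost one followers/ids call plus at most 50 lookup calls, rather than one call per follower.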

Related

Google Ads API: getMetrics doesn't get results for all passed keywords

I'm migrating from the old Google AdWords API to the new Google Ads API, using Google's PHP SDK.
This is the use case where I'm stuck:
I feed a number of keywords (paginated into keyword plans of 10k each) to generateHistoricalMetrics($keywordPlanResource) and collect the results.
To do so I followed the instructions at https://developers.google.com/google-ads/api/docs/keyword-planning/generate-historical-metrics and, especially, https://developers.google.com/google-ads/api/docs/keyword-planning/generate-historical-metrics#mapping_to_the_ui, using KeywordPlanAdGroupKeywords (with a single ad group) and not passing a specific date range for now, relying on the default value.
I also had to filter some of my keywords because of KEYWORD_HAS_INVALID_CHARS and KEYWORD_TEXT_TOO_LONG errors, but all the errors I'm aware of are gone now.
Now I have found that the KeywordPlanHistoricalMetrics object does not contain any keyword ID (of the form customers//keywordPlanAdGroupKeywords/), so I have to rely on the ordering. That seems OK, as the original ordering of the keywords appears to be preserved in the results, as described here: https://developers.google.com/protocol-buffers/docs/encoding#optional
But I still have the problem that
count($keywordPlanServiceClient->generateHistoricalMetrics($keywordPlanResource)->getMetrics()) is lower than count($passedKeywords), where each of $passedKeywords was passed to
new KeywordPlanAdGroupKeyword([
    'text' => $passedKeyword,
    'match_type' => KeywordMatchType::EXACT,
    'keyword_plan_ad_group' => $planAdGroupResource,
]);
So I have two questions here:
Why does getMetrics() not return the same number of results as the number of keywords passed?
I'm struggling with debugging at this point. Say I want to know which keywords were left out, either to provide more information here or simply to skip them and let my customer know that these particular keywords were not queried. How can I do this when, although I have a keyword ID for every passed keyword, I cannot match the returned metrics to them, because the KeywordPlanHistoricalMetrics object does not contain any keyword ID?
Detail: while testing I found that reducing the number of queried keywords reduces the share of lost keyword data:
10k queried keywords - 4.72% loss,
5k - 2.12%,
2.5k - 0.78%,
1.25k - 0.43%,
625 - 0.3%,
500 - 0.24%,
250 - 0.03%,
200 - 0.03% of keywords lost.
But I can't imagine that keywords are meant to be queried one by one.

Max Subscribers Returned (and duplicates) | YouTube API

[Problem 1]
I am using https://developers.google.com/youtube/v3/docs/subscriptions/list for a large channel (1 million subscribers) but after 100 successful pages of results (50 subscribers per page), the API always returns 0 subscribers.
Is there a hard limit of 100 pages or 5,000 subscribers that can be returned?
[Problem 2]
Of the 5,000 subscribers returned, only 3,577 are unique. The API seems to be returning duplicates in some cases, which I know is a long-standing issue with getting channel subscribers. Hoping to learn if this will be fixed?
I ran into the second problem today, and it seems the duplicates happen because the default order of the API list is SUBSCRIPTION_ORDER_RELEVANCE.
Acceptable values are:
alphabetical – Sort alphabetically.
relevance – Sort by relevance.
unread – Sort by order of activity.
So setting the order to alphabetical solves the problem entirely.
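As a rough sketch (not from the original answer), one page of a channel's subscribers could be requested with order=alphabetical like this, assuming an OAuth access token for the channel owner and paging via nextPageToken:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Fetch one page of the channel's subscribers, ordered alphabetically.
// pageToken may be null for the first page; the caller reads nextPageToken
// out of the returned JSON to request the following page.
static String fetchSubscribersPage(String accessToken, String pageToken) throws Exception {
    String url = "https://www.googleapis.com/youtube/v3/subscriptions"
            + "?part=subscriberSnippet&mySubscribers=true"
            + "&order=alphabetical&maxResults=50"
            + (pageToken == null ? "" : "&pageToken=" + pageToken);
    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
            .header("Authorization", "Bearer " + accessToken)
            .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();   // JSON with items[] and nextPageToken
}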

minimizing parse requests while looping through array

I'm working on a pet project using Parse as a back end. I'm setting up a view controller that contains a list of people you can possibly add as "friends"; these are people that
a) exist in your contacts list and
b) have already downloaded the app and signed up.
Different buttons will be displayed depending on their status as a user (invite button if they only exist in your contacts list, add to friends button if they're also using the app already).
I'm trying to keep my Parse account to 30 requests/second so that I'm not out of pocket for a pet app.
One way I've thought of to figure out who is registered as a user AND who exists in my contacts list is to loop through the contacts list on my phone and query each phone number on Parse. However, this would obviously go over my limit on requests/second.
Is there a way (I've looked through Parse documentation and googled it) to take an array (list of contacts on my phone) and run a PFQuery ON THAT ARRAY, checking each object and returning matches?
Unless you have a quarter of a million users in your app you shouldn't be too concerned; it doesn't work like one user going through a 30-iteration for loop with one query each and immediately hitting 30 req/s:
How does the requests/second limit translate to concurrent users?
Generally when your user count doubles, your requests per second also double. However, different apps send different numbers of requests per second depending on how frequently they save objects or issue queries. We estimate that the average app's active user will issue 10 requests. Thus, if you had a million users on a particular day, and their traffic was evenly spread throughout the day, you could estimate your app would need about 10,000,000 total API requests, or about 120 requests per second. Every app is different, so we strongly encourage you to measure how many requests your users send.
I have run through loops of requests and I barely hit 1 req/s
Is there a way (I've looked through Parse documentation and googled it) to take an array (list of contacts on my phone) and run a PFQuery ON THAT ARRAY, checking each object and returning matches?
Yes there is, use:
query?.whereKey("phoneNumber", containedIn: contactPhoneNumbers) // key name and array are illustrative: match any of the phone numbers pulled from the address book

Using Yahoo Pipes to filter tweets

I am trying to create a Yahoo pipe that ideally takes all tweets tweeted at any point in time and filters them down by a number of attributes to then display a filtered feed.
Basically in order this is what I want to happen:
Get a feed of all tweets at any one time.
Filter tweets by geolocation origin, i.e. UK,
Filter by a number of different combinations of keywords.
Output as an RSS feed (though this isn't really the crucial stage as Yahoo Pipes takes care of this anyway)
Disclaimer: of course I understand that there are limits to the amount of tweets that could come through etc but I would like to cast the input net as wide as possible.
I have managed to get stages 3 & 4 working correctly and for the time being I am not really worrying about step 2 (although if you have any suggestions I am all ears), but stage 1 is where I am struggling. What I have attempted is using a Fetch Feed module with the URL http://search.twitter.com/search.atom?q=lang:en - however it seems that this only pulls 15 tweets. Is there any way that I can pull more than 15 tweets every time the pipe is run? Otherwise I think this may all be in vain.
FYI, here is the link to the pipe as it stands - http://pipes.yahoo.com/ludus247/182ef4a83885698428d57865da5cf85b
Thanks in advance!

Accessing an item beyond start_index=1000 in a YouTube user upload feed

I am currently trying to pull data about videos from a YouTube user upload feed. This feed contains all of the videos uploaded by a certain user, and is accessed from the API by a request to:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads
Where USERNAME is the name of the YouTube user who owns the feed.
However, I have encountered problems when trying to access feeds which are longer than 1000 videos. Since each request to the API can return at most 50 items, I am iterating through the feed using max-results and start-index as follows:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1&max-results=50&orderby=published
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=51&max-results=50&orderby=published
And so on, incrementing start-index by 50 on each call. This works perfectly up until:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1001&max-results=50&orderby=published
At which point I receive a 400 error informing me that 'You cannot request beyond item 1000.' This confused me, as I assumed the query would have only returned 50 videos: 1001-1050 in order of most recent publication. Having looked through the documentation, I discovered this:
Limits on result counts and accessible results
...
For any given query, you will not be able to retrieve more than 1,000
results even if there are more than that. The API will return an error
if you try to retrieve greater than 1,000 results. Thus, the API will
return an error if you set the start-index query parameter to a value
of 1001 or greater. It will also return an error if the sum of the
start-index and max-results parameters is greater than 1,001.
For example, if you set the start-index parameter value to 1000, then
you must set the max-results parameter value to 1, and if you set the
start-index parameter value to 980, then you must set the max-results
parameter value to 21 or less.
I am at a loss as to how to access a generic user's 1001st most recently uploaded video and beyond in a consistent fashion, since they cannot be indexed using only max-results and start-index. Does anyone have any useful suggestions for how to avoid this problem? I hope that I've outlined the difficulty clearly!
Getting all the videos for a given account is supported, but you need to make sure that your request for the uploads feed is going against the backend database and not the search index. Because you're including orderby=published in your request URL, you're going against the search index. Search index feeds are limited to 1000 entries.
Get rid of the orderby=published and you'll get the data you're looking for. The default ordering of the uploads feed is reverse-chronological anyway.
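For example, the request that failed above succeeds once that parameter is dropped:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1001&max-results=50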
This is a particularly easy mistake to make, and we have a blog post up explaining it in more detail:
http://apiblog.youtube.com/2012/03/keeping-things-fresh.html
The nice thing is that this is something that will no longer be a problem in version 3 of the API.
