Can a group of 3 researchers share/pool Twitter API tokens to accelerate/improve data collection on a sentiment analysis project? - twitter

Our group is working on a sentiment analysis research project, and we are using the Twitter API to collect tweets. Our target dataset involves a lot of query terms and filters. Since each of us has a developer account, we were wondering if we can pool our API access tokens to accelerate the data collection. For example, we would write an app that reads a configuration file containing a list of our access tokens and cycles through them when searching for tweets. This app would run on our local computers. Since the app uses our individual access tokens, we believe we are not actually bypassing or changing any Twitter limit, as usage is still tracked per access token. Are there any legal or technical problems that may arise from this methodology? Thank you! =D
Here is pseudocode for what we are trying to do:
1. Define a list of search terms such as 'apple', 'banana', and 'oranges' (we have 100 of these search terms; we are okay with the 100-results-per-request limit).
2. Define a list of frequent emotional adjectives such as 'happy', 'sad', 'crazy', etc., selected using TF-IDF (we have 100 of these).
3. Take the product of the search terms and the emotional adjectives, giving 10,000 query terms in total. From the rate limit rules (180 search requests per 15-minute window per token), we computed that we would need about 56 runs of 15-minute sessions, i.e. 56 * 15 = 840 minutes, or ~14 hours, to collect this amount of tweets.
4. We were thinking of improving the data collection by pooling access tokens so that we can cut the collection time from ~14 hours to roughly a third of that (about 5 hours), e.g. by dividing the query terms into subsets and letting each access token work on one subset (see the sketch below).
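To make this concrete, here is a rough Python sketch of steps 3-4. The token values are placeholders, and the 180-requests-per-15-minute figure is our assumption from the standard search rate limits; this snippet only builds the query list and does the arithmetic, it does not call the Twitter API.

    from itertools import product
    from math import ceil

    # Steps 1-2: example inputs (the real lists have 100 entries each).
    search_terms = ["apple", "banana", "oranges"]   # ...100 terms in practice
    adjectives = ["happy", "sad", "crazy"]          # ...100 TF-IDF-selected adjectives in practice

    # Step 3: product of search terms and adjectives -> combined query strings.
    queries = [f"{term} {adj}" for term, adj in product(search_terms, adjectives)]

    # Rate-limit arithmetic (assumed: 180 search requests per 15-minute window per token).
    REQUESTS_PER_WINDOW = 180
    WINDOW_MINUTES = 15

    def collection_hours(n_queries, n_tokens):
        """Rough lower bound on wall-clock hours to issue one request per query."""
        windows = ceil(n_queries / (REQUESTS_PER_WINDOW * n_tokens))
        return windows * WINDOW_MINUTES / 60

    print(collection_hours(10_000, n_tokens=1))  # 14.0 hours with a single token
    print(collection_hours(10_000, n_tokens=3))  # 4.75 hours with three pooled tokens

    # Step 4: split the queries round-robin, one subset per access token (placeholder tokens).
    tokens = ["TOKEN_A", "TOKEN_B", "TOKEN_C"]
    subsets = {tok: queries[i::len(tokens)] for i, tok in enumerate(tokens)}

The round-robin split is just one way to partition; any disjoint split of the 10,000 queries would do.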
We are pushing for this because, if it is possible and permitted, it would simply be more efficient, and it might help future research as well.
The question is: are we actually breaking any Twitter rules or policies by doing this? By each of us three contributing one access token and registering apps that are effectively clones of the research project, we believe we are also giving something up, namely the headroom for one more app that we would fully control.
I can't find a specific Twitter rule about this so far. Our concern is that we will publish a paper along with the app we plan to build and use, for documentation purposes. Disclaimer: only the app's source code will be published, not the dataset, because of Twitter's explicit rules about datasets.

This is absolutely not allowed under the Twitter Developer Policy and Agreement.
Twitter Developer Policy, section 5a:
Do not do any of the following:
Use a single application API key for multiple use cases or multiple application API keys for the same use case.
Feel free to check with Twitter directly via the developer forums. Stack Overflow is not really the best place for this question, since it is not specifically a coding question.

Related

Is it possible to receive keyword-level Google Display Network ad cost via the API?

I've skimmed through the Keywords Performance Report section of the API documentation and couldn't work out whether I could use this report to determine daily keyword costs.
What I basically want is to look up a keyword in an API response and get the cost associated with it. Is such a thing possible? Am I looking in the right place?
Apparently, it's not possible to do so: all costs on Display Network items are reported under a special ID (3000000), which is meant to capture all GDN displays in aggregate.

YouTube Data API pricing [duplicate]

Is there a way (paid or unpaid) to increase the 5,000,000 units/day quota limit imposed when using version 3 of the YouTube API?
I have read that a video upload alone uses 16,000 units - this equates to only ~312 videos a day.
I have signed up for 'billing' but still don't get an option to increase from the "courtesy limit" of 5 million.
Yes, there is an elaborate form for applying for higher quota.
It is linked in the developer console for example at: http://console.developers.google.com/apis/api/youtube-json.googleapis.com/quotas?project=YOURPROJECTID
The form starts off with some statements:
This Application will ask you detailed questions about: (i) your business, (ii) your use of each YouTube API (current and proposed use, as applicable), and (iii) each of your website or software application that uses or will use YouTube API(s) (each an “API Client”).
This application also requires you to submit screenshots and design documents relating to your API Client(s) and your use of YouTube API(s). If you do not have these ready, please apply once these are available.
We will strive to respond to your application as soon as possible, provided that all required supporting materials are submitted and sufficient, and all questions are thoroughly answered.
Note: Please do not apply for more quota unless you are actually close to hitting your current limit.
There is a form, but they are not approving extra quota at the moment; maybe in the future. The best option would be to contact your Google account manager (if you have one assigned):
https://docs.google.com/spreadsheet/viewform?formkey=dDdnNVF3aGpuLXV1R2V2Nzg3QjJoZWc6MQ

How does Buzzfeed's "Pound" work?

A week ago Buzzfeed announced a new viral traffic tracking tool called "Pound" (Process for Optimizing and Understanding Network Diffusion). Whereas marketers and webmasters are currently used to seeing social traffic in aggregate buckets per source, Pound promises to help us visualize the actual person-to-person sharing of content and the traffic resulting from each step... sort of. Apparently the tool can't (or opts not to) match individual users to their corresponding node in the network:
Pound does not store usernames or any personally identifiable information (PII) with the share events. Each node in the sharing graph is anonymous. We are not able to figure out who a user is by looking at the graph data.
Interesting. I assume Buzzfeed is keeping this anonymous to preempt complaints when the company uses Pound to sell ads. More interesting is the hint the Buzzfeed engineers provide as to how the tool works:
Pound data is collected based on an oscillating, anonymous hash in a sharer’s URL as a UTM code.
How might this work? Does the UTM code mutate every time a link is shared or reshared? I don't see how that would be possible. If that's not it, how else might this functionality be implemented?
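Here is one purely hypothetical sketch of how I could imagine it working (nothing here is confirmed by Buzzfeed): on every page view the server mints a fresh random token, records an edge from the token the visitor arrived with to the new one, and embeds the new token in that page's share links. The parent-child edges then form an anonymous sharing graph.

    import secrets
    from urllib.parse import urlencode

    BASE_URL = "https://example.com/article"   # hypothetical article URL
    edges = []                                 # (parent_token, child_token) pairs: anonymous sharing graph

    def mint_share_url(parent_token=None):
        """Called on each page view: issue a fresh anonymous token and record
        which token (if any) the visitor arrived with."""
        child_token = secrets.token_urlsafe(8)         # random, carries no PII
        if parent_token is not None:
            edges.append((parent_token, child_token))  # one hop in the cascade
        # The new token rides along as a UTM-style parameter in this page's share links.
        return f"{BASE_URL}?{urlencode({'utm_content': child_token})}", child_token

    # Simulated cascade: A reads the article, shares it to B and C; B reshares to D.
    url_a, tok_a = mint_share_url()
    url_b, tok_b = mint_share_url(tok_a)
    url_c, tok_c = mint_share_url(tok_a)
    url_d, tok_d = mint_share_url(tok_b)

    print(edges)   # [(A, B), (A, C), (B, D)]: a tree of anonymous nodes

Under this reading, "oscillating" could simply mean that every impression gets a new token, so a URL identifies one hop in the cascade rather than a person.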

Are there two distinct rate limits for the YouTube API?

YouTube has stated that there's a rate limit for their API, and that's totally fine and understandable. However, it appears that even respecting their rate limit and following their best practices may be insufficient. Section 4H of the YouTube Terms of Service states: "You agree not to use or launch any automated system, including without limitation, "robots," "spiders," or "offline readers," that accesses the Service in a manner that sends more request messages to the YouTube servers in a given period of time than a human can reasonably produce in the same period by using a conventional on-line web browser."
So YouTube has an API to automate certain actions, but you have to limit yourself to an ill-defined notion of some human equivalent. Would following the best practices (in particular, waiting 10 minutes after any "too_many_recent_calls" 403) suffice to obey 4H?
In my particular application I intend to upload tens of thousands of videos to my YouTube channel, and I'm concerned that even obeying the best practices will still result in YouTube terminating my account without explanation.
(For those concerned that tens of thousands of videos is spammy and illegitimate, I assure you that this is not the case. These are not advertising any product, and according to a couple hundred test case uploads, these are videos which people like much more often than dislike and which have high audience retention. For an example of such a channel (not mine), see http://www.youtube.com/user/EmmaSaying)
The terms of use you linked (containing section 4H) are for the YouTube website. The API has a different set of terms, and you can check the quota information here.
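As for the backoff practice mentioned in the question, a minimal sketch might look like this; the error class, the 10-minute cool-down, and the attempt cap are placeholders for illustration, not an official client.

    import time

    class ApiRateLimitError(Exception):
        """Stand-in for however your client surfaces a 403 'too_many_recent_calls'."""

    COOL_DOWN_SECONDS = 10 * 60   # the 10-minute wait mentioned in the question
    MAX_ATTEMPTS = 5

    def call_with_cooldown(request_fn):
        """Run one API request, sleeping through rate-limit errors before retrying."""
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                return request_fn()
            except ApiRateLimitError:
                if attempt == MAX_ATTEMPTS:
                    raise                        # give up after repeated rate-limit errors
                time.sleep(COOL_DOWN_SECONDS)    # cool down, then retry the same request

    # Usage (hypothetical): wrap each upload call, e.g.
    # call_with_cooldown(lambda: youtube.videos().insert(...).execute())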

About data mining using Twitter data

I plan to write a thesis about using sentiment information to improve the predictive power of a financial trading model for currencies.
The sentiment data would be tweets containing some keyword, like "EUR.USD". I will then filter for sentiment words to identify the sentiment. Simple idea. Then we try to see whether there is any relation between the degree of sentiment and the movement of EUR/USD.
My big concern is the Twitter data. As we all know, Twitter limits access to historical data; you can only search back about 5 days. That is not enough, since our strategy is based on daily sentiment.
I noticed that Google had a fantastic feature, a timeline of Twitter updates: http://www.readwriteweb.com/archives/googles_twitter_timeline_lets_you_explore_the_past.php
But first of all, I am in Switzerland, and it seems my Google, which is smart enough to identify my location, doesn't offer this feature and may block some US-only functions like it. Secondly, even if I could see the fancy interactive Google timeline control in my Firefox, how could I dig the data out of my query and save it? Does Google supply such an API?
The Google service you mentioned was shut down recently, so you won't be able to use it (http://www.searchenginejournal.com/google-realtime-shuts-down-as-twitter-deal-expires/31007/).
If you need a longer timespan of data to analyze, I see the following options:
pay for historical data :) (https://dev.twitter.com/docs/twitter-data-providers)
if you don't want to pay, you need to fetch tweets containing EUR/USD or whatever else you need (you could use the streaming API for this) and store them yourself. Run this collector for a while (if possible) and you'll have more than just 5 days of data; a rough sketch is below.
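A minimal sketch of the second option, assuming the Tweepy 3.x client and the v1.1 streaming endpoint available at the time; the credentials, the tracked keywords, and the output file are placeholders.

    import json
    import tweepy  # assumes Tweepy 3.x and the v1.1 streaming API

    # Placeholder credentials from your developer account.
    CONSUMER_KEY, CONSUMER_SECRET = "...", "..."
    ACCESS_TOKEN, ACCESS_SECRET = "...", "..."

    class SaveToDisk(tweepy.StreamListener):
        """Append every matching tweet as one JSON object per line."""
        def on_status(self, status):
            with open("eurusd_tweets.jsonl", "a") as out:
                out.write(json.dumps(status._json) + "\n")

        def on_error(self, status_code):
            # Returning False disconnects the stream, e.g. on 420 (rate limited).
            return status_code != 420

    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

    stream = tweepy.Stream(auth=auth, listener=SaveToDisk())
    stream.filter(track=["EURUSD", "EUR/USD", "EUR.USD"])  # blocks; run as a long-lived collector

Left running, such a collector accumulates its own archive going forward, which sidesteps the roughly 5-day search window mentioned in the question.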
