Using Yahoo Pipes to filter tweets

I am trying to create a Yahoo Pipe that ideally takes all tweets posted at any point in time and filters them down by a number of attributes, then displays a filtered feed.
Basically, in order, this is what I want to happen:
1. Get a feed of all tweets at any one time.
2. Filter tweets by geolocation origin, i.e. the UK.
3. Filter by a number of different combinations of keywords.
4. Output as an RSS feed (though this isn't really the crucial stage, as Yahoo Pipes takes care of this anyway).
Disclaimer: of course I understand that there are limits to the number of tweets that can come through, but I would like to cast the input net as wide as possible.
I have managed to get stages 3 & 4 working correctly, and for the time being I am not really worrying about step 2 (although if you have any suggestions I am all ears), but stage 1 is where I am struggling. What I have attempted is using a Fetch Feed module with the URL http://search.twitter.com/search.atom?q=lang:en - however, it seems this only pulls 15 tweets. Is there any way I can pull more than 15 tweets each time the pipe is run? Otherwise I think this may all be in vain.
FYI, here is the link to the pipe as it stands - http://pipes.yahoo.com/ludus247/182ef4a83885698428d57865da5cf85b
Thanks in advance!
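A note on stage 1: the 15-result default matches the documented behaviour of the old v1 Search API behind that feed, which also accepted rpp (results per page, up to 100) and page parameters, so appending &rpp=100 to the Fetch Feed URL should widen each fetch. Below is a sketch of paging through the feed outside Pipes; treat the roughly 1,500-tweet overall cap as an assumption carried over from the old API docs.

# A sketch: page through the atom feed, 100 results per page (rpp maxes out at 100)
import urllib.request

base = "http://search.twitter.com/search.atom?q=lang:en&rpp=100&page=%d"
for page in range(1, 6):
    feed = urllib.request.urlopen(base % page).read()
    # ...hand the atom payload to a feed parser or the rest of the pipe...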

Related

Tweepy: Find all tweets in a specific language

I'd like to extract all tweets in the Arabic language in all countries.
I modified the code in this tutorial.
This is my search query.
api.search(q="*", count=tweetsPerQry, lang='ar', tweet_mode='extended'). I expect to find a very large number of tweets, but I only collected about 7,000.
I checked the content of some of them and noticed that they were posted in my country, even though I did not specify the location/country (can anyone explain why this happens?).
I tried to work out the reason for finding such a limited number of tweets, so I modified the query by replacing the lang parameter with geocode to find tweets in a city. I fetched more than 65,000 Arabic tweets. After that, I used the lang parameter together with geocode and again found a very limited number of tweets.
Can anyone help me understand why I'm not able to get a large number of tweets when I use the lang parameter?
The free Twitter APIs are good for small projects, but keep in mind that they don't surface all tweets. Twitter has paid APIs that are much more powerful, though what you are trying to achieve should be possible. I ran the query attached below and it seemed to work; I was able to find a considerable number of tweets. This method also seemed to work for #ebt_dev; I think your request was just structured like the stream-listener version rather than the cursor search.
# Search query: change the X in .items(X) to the number of tweets you are looking for
for tweet in tweepy.Cursor(api.search, q='*', tweet_mode='extended', lang='de').items(9999999):
    # Text of the tweet, encoded to drop characters that may not be displayable
    tweettext = str(tweet.full_text.lower().encode('ascii', errors='ignore'))
    # Id of the tweet
    tweetid = tweet.id
    # Print the text and the id of the tweet
    print('\ntweet text: ' + str(tweettext))
    print('tweet id: ' + str(tweetid))
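For completeness, a minimal sketch of the surrounding setup; the key/secret strings are placeholders for your own app's credentials, and wait_on_rate_limit makes tweepy sleep through rate-limit windows instead of raising:

import tweepy

# Placeholder credentials - substitute your own app's keys
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

# wait_on_rate_limit tells tweepy to sleep until the rate-limit window resets
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

for tweet in tweepy.Cursor(api.search, q='*', tweet_mode='extended', lang='ar').items(1000):
    print(tweet.id, tweet.full_text[:80])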

Getting a sum from Parse with Parse Cloud Code (for iOS app)

I'm new to Parse Cloud Code and am struggling with a seemingly simple task.
I'm working on a small iOS game where the users can choose from a list of characters to play -- imagine mario or luigi. In addition to tracking user scores in the game, I'm tracking total points for each character in Parse, so I can display a "mario" total and a "luigi" total (from all users).
There could be multiple users playing at once (I hope), so I don't have Parse saving to just one mario and one luigi counter. Instead, each user gets a running count of their own mario and luigi scores.
So how do I pull the total marioPoints and total luigiPoints?
Parse doesn't have SQL-style querying, so I've been looking at Parse Cloud Code, and their "average stars" example (https://parse.com/docs/cloudcode/guide#cloud-code) looked kind of close at first glance.
But I can't get it sorted. And even if I could, it's limited to 1,000 responses, which wouldn't be enough. (I'm optimistic.)
Thanks!
Your best option is to keep a running total whenever an individual user's update is saved. Do that using a save hook and the increment(attr, amount) function.
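A minimal sketch in Cloud Code (JavaScript), assuming a hypothetical GameResult class (one row saved per finished game, with character and points columns) and a singleton Totals object; those class/column names are placeholders, but afterSave, Parse.Query, and increment are the real Parse pieces:

// main.js - Cloud Code
// After each GameResult is saved, bump the matching running total.
Parse.Cloud.afterSave("GameResult", function(request) {
  var character = request.object.get("character"); // "mario" or "luigi"
  var points = request.object.get("points");
  var query = new Parse.Query("Totals");           // single row holding all totals
  query.first().then(function(totals) {
    totals.increment(character + "Points", points); // e.g. marioPoints += points
    return totals.save();
  });
});

The iOS app then just fetches that one Totals object to display the marioPoints and luigiPoints sums: one cheap query, and no 1,000-result limit to worry about.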

Collecting follower/friend Ids of large number of users - Twitter4j

I'm working on a research project which analyses closure patterns in social networks.
Part of my requirement is to collect followers and following IDs of thousands of users under scrutiny.
I have a problem with the rate limit of 350 requests/hour.
After just 4-5 users my limit is exceeded, i.e. as soon as the number of followers I have collected passes the 350 mark.
That is, if I have 7 members each with 50 followers, my limit is hit after collecting the follower details of just those 7 members (7*50 = 350).
I found a related question on Stack Overflow - What is the most effective way to get a list of followers using Twitter4j?
The resolution mentioned there was to use the lookupUsers(long[] ids) method, which returns a list of User objects... but I find no way in the API to get the screen names of the friends/followers of a particular User object. Am I missing something here? Is there a way to collect the friends/followers of thousands of users efficiently?
(Right now I'm using the standard approach: OAuth authentication (to get 350 requests/hour) followed by a call to twitter.getFollowersIDs.)
It's fairly straightforward to do this with a limited number of API calls - in fact, it can be done with two.
Let's say you want to get all my followers
https://api.twitter.com/1/followers/ids.json?screen_name=edent
That will return up to 5,000 user IDs.
You do not need 5,000 calls to look them up!
You simply post those IDs to users/lookup, in batches of up to 100 per request.
You will then get back the full profile of all the users following me - including screen name.
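A sketch with Twitter4j, assuming an already-authenticated Twitter instance; getFollowersIDs and lookupUsers are the Twitter4j methods involved, and the cursoring/batching is the point:

import java.util.Arrays;
import twitter4j.*;

public class FollowerNames {
    // Prints the screen names of everyone following screenName.
    static void printFollowers(Twitter twitter, String screenName) throws TwitterException {
        long cursor = -1;
        IDs ids;
        do {
            // One call returns up to 5,000 follower IDs per page
            ids = twitter.getFollowersIDs(screenName, cursor);
            long[] all = ids.getIDs();
            // users/lookup accepts at most 100 IDs per call
            for (int i = 0; i < all.length; i += 100) {
                long[] batch = Arrays.copyOfRange(all, i, Math.min(i + 100, all.length));
                for (User user : twitter.lookupUsers(batch)) {
                    System.out.println(user.getScreenName());
                }
            }
            cursor = ids.getNextCursor();
        } while (ids.hasNext());
    }
}

So 5,000 followers costs 1 + 50 calls rather than 5,000 - well inside the 350/hour budget for a single user, though thousands of users will still need spreading across rate-limit windows.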

Getting adjusted price information from Yahoo! Finance API for multiple symbols in one call

I would like to get the adjusted price (adjusting for splits and dividends) for a group of stock symbols using Yahoo! Finance. It looks like the historical prices call is limited to one symbol at a time. Could you please let me know if there is a way to get multiple symbols in one call?
I would like to get this data so I can do some back testing on that data. Since I may require quite a few symbols (say 500-1000), it will be easier if I can make just a few batch calls to Yahoo!'s servers instead of making one call per symbol everyday.
Another way of getting the adjusted price is to use their daily stock price API and adjust it manually using dividend and split information (they allow multiple symbols for their daily stock quotes). Unfortunately, I cannot find any way to get split information from the HTTP call (guessing based on 50% or 200% price moves is one option, but with penny stocks that can be dangerous, and it can't detect uneven splits). Also, the dividend information it returns is not easy to decode: they seem to return the total over 4 quarters, and the dividend date doesn't really correspond to the actual dividend date based on the historical price. The various options for the call can be found here: http://www.gummy-stuff.org/Yahoo-data.htm
Any suggestions on getting adjusted prices for multiple symbols? Or am I unnecessarily worrying about making hundreds of calls to Yahoo! every day? Ideally I would like to download all the required data within a couple of hours each day - that would be 10-20 calls per minute. Is that too much? I couldn't find any documentation on the permissible number of requests per second.
I am open to other places where I can get similar data. However, since I am just trying to learn the basics of quant trading and not trade, I would prefer free downloads.
Thanks
-e
This is an old question, but I did find a source where split data is available. Not sure how comprehensive these announcements are though:
http://biz.yahoo.com/c/09/s1.html
In the url, the "09" part is the year (2009), and the "s1" part is the month (s1 = Jan, s2 = Feb., s3 = Mar., etc.)
It isn't a nice clean CSV, but the format of the page is consistent and should be parseable. Just make a query each day for the current month, parse the page, and process any splits that you didn't see the day before.
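A sketch of that daily polling idea in Python; the URL scheme is exactly as described above, while the actual HTML parsing is left out since the page layout has to be inspected by hand:

import datetime
import urllib.request

# Build the URL for the current month: "09" = year 2009, "s1" = January, etc.
today = datetime.date.today()
url = "http://biz.yahoo.com/c/%02d/s%d.html" % (today.year % 100, today.month)

html = urllib.request.urlopen(url).read().decode("latin-1", errors="replace")
# ...parse the announcement table here and diff against yesterday's snapshot...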
ETA: And another source (probably less reliable than Yahoo, but can be queried by ticker):
http://getsplithistory.com/
I am not sure which language you are using, but I have a sample in C#. I think it will at least give you the idea, or maybe help someone else.
// YQL query: select * from yahoo.finance.quotes where symbol in (...)
private string BASE_URL = "http://query.yahooapis.com/v1/public/yql?q=" +
    "select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20({0})" +
    "&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";

Collection<Quote> quotes;  // assumed to be populated with the symbols you want

// Build the URL-encoded symbol list: "AAPL","MSFT",... (%22 = quote, %2C = comma)
string symbolList = String.Join("%2C", quotes.Select(w => "%22" + w.Symbol + "%22").ToArray());
string url = string.Format(BASE_URL, symbolList);

XDocument doc = XDocument.Load(url);  // one HTTP request for the whole batch
Parse(quotes, doc);                   // fills the quotes from the returned XML
What we are doing here is wrapping each symbol in URL-encoded quotes and joining them with commas, then passing that whole symbol list to Yahoo in a single request. I have successfully fetched prices for 700 symbols in each call. Hitting Yahoo's servers once per ticker is a pain: I fetch stock prices for all 6,500+ tickers every day, and what used to take 3 hours now takes less than 2 minutes... sweet.
Source link for that code is here - http://www.jarloo.com/get-yahoo-finance-api-data-via-yql/
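If your symbol list is too long for one URL, a simple chunking loop on top of the same code works; a sketch, assuming the Parse helper above accepts any batch of quotes (the batch size of 500 is a guess - tune it to your URL-length limits):

// Fetch in batches to keep each URL a manageable length
const int batchSize = 500;
for (int i = 0; i < quotes.Count; i += batchSize)
{
    var batch = quotes.Skip(i).Take(batchSize).ToList();
    string symbols = String.Join("%2C", batch.Select(w => "%22" + w.Symbol + "%22").ToArray());
    XDocument doc = XDocument.Load(string.Format(BASE_URL, symbols));
    Parse(batch, doc);
}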
P.S. Please get an API key to work smoothly. The above URL is the public endpoint, where the tables are timed out most of the time. Once you get an API key, your URL will be the following (minus "public"):
http://query.yahooapis.com/v1/yql

tag generation from a small text content (such as tweets)

I have already asked a similar question earlier, but I have noticed that I have a big constraint: I am working on small text sets, such as user tweets, to generate tags (keywords).
And it seems like the accepted suggestion (the pointwise mutual information algorithm) is meant to work on bigger documents.
With this constraint (working on small sets of text), how can I generate tags?
Regards
Two-Stage Approach for Multiword Tags
You could pool all the tweets into a single larger document and then extract the n most interesting collocations from the whole collection of tweets. You could then go back and tag each tweet with the collocations that occur in it. Using this approach, n would be the total number of multiword tags that would be generated for the whole dataset.
For the first stage, you could use the NLTK code posted here. The second stage could be accomplished with just a simple for loop over all the tweets. However, if speed is a concern, you could use pylucene to quickly find the tweets that contain each collocation.
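A sketch of both stages with NLTK, assuming tweets is a list of token lists (your tokenizer of choice); BigramCollocationFinder and the PMI measure are standard NLTK components:

import nltk
from nltk.collocations import BigramCollocationFinder, BigramAssocMeasures

# tweets: list of token lists - an assumption about your preprocessing
all_tokens = [tok for tweet in tweets for tok in tweet]

# Stage one: pool everything and pull out the n most interesting collocations
finder = BigramCollocationFinder.from_words(all_tokens)
finder.apply_freq_filter(3)                               # drop bigrams seen fewer than 3 times
collocations = finder.nbest(BigramAssocMeasures.pmi, 50)  # n = 50 multiword tags

# Stage two: tag each tweet with the collocations it contains
for tweet in tweets:
    present = set(nltk.bigrams(tweet))
    tags = [c for c in collocations if c in present]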
Tweet Level PMI for Single Word Tags
As also suggested here, for single-word tags you could calculate the pointwise mutual information of each individual word and the tweet itself, i.e.
PMI(term, tweet) = log [ P(term, tweet) / (P(term) * P(tweet)) ]
Again, this will roughly tell you how much less (or more) surprised you are to come across the term in the specific document as opposed to coming across it in the larger collection. You could then tag the tweet with a few terms that have the highest PMI with the tweet.
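A rough sketch of that per-tweet computation, where counts stand in for the probabilities (the estimator - each tweet equally likely, terms counted by document frequency - is an assumption):

import math
from collections import Counter

# tweets: list of token lists, assumed to exist
n_tweets = len(tweets)
doc_freq = Counter(term for tweet in tweets for term in set(tweet))

def pmi(term, tweet):
    # P(tweet) = 1/n, P(term) = doc_freq/n,
    # P(term, tweet) = 1/n if the term occurs in this tweet, else 0
    if term not in tweet:
        return float('-inf')
    return math.log((1.0 / n_tweets) /
                    ((doc_freq[term] / n_tweets) * (1.0 / n_tweets)))

# Tag each tweet with its top-3 PMI terms
for tweet in tweets:
    tags = sorted(set(tweet), key=lambda t: pmi(t, tweet), reverse=True)[:3]

Note that under this simple estimator the score reduces to log(n_tweets / doc_freq), i.e. inverse document frequency: exactly the "rare in the collection, present in this tweet" intuition.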
General Changes for Tweets
Some changes you might want to make when tagging tweets include:
Only use a word or collocation as a tag for a tweet if it occurs within a certain number or percentage of other tweets. Otherwise, PMI will tend to tag tweets with odd terms that occur in just one tweet but are not seen anywhere else, e.g. misspellings and keyboard noise like ##$##$%!.
Scale the number of tags used with the length of each tweet. You might be able to extract 2 or 3 interesting tags for longer tweets, but for a shorter two-word tweet you probably don't want to use every single word and collocation as a tag. It's probably worth experimenting with different cut-offs for how many tags to extract given the tweet length.
I have used a method earlier for small text content such as SMS messages, where I would simply repeat the same line twice. Surprisingly, that works well for such content, where a noun could well be the topic. I mean, the noun doesn't need to repeat for it to be the topic; duplicating the text just gives it enough weight to be picked up.
