Here is my example query. It specifies that Tweets must be:
Written in English
Tweeted between 23Jan2010 and 24Jan2010
Have at least 100 "favorites" (likes)
My idea is to use something like the binary search algorithm to narrow down the minimum number of likes the Tweet has. Once only one Tweet is returned by a query, I'll know it is the Tweet with the most likes. The problem is, min_faves--the value that specifies the minimum number of likes--doesn't seem to work. Look at this query. It specifies min_faves as 100. As you can see, this Justin Bieber Tweet appears. It has 1.6k likes. Now, when I attempt to increase the min_faves value to 300 (to narrow down the most liked Tweet), the Justin Bieber Tweet is excluded! I don't know if I am not understanding the query system correctly, or if it is not working, but this seems incorrect. The Justin Bieber Tweet should show up, as it has more than 300 likes. This is just one example of how it doesn't seem to work.
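The binary-search idea described above can be sketched roughly as follows. This is only an illustration: `count_results` is a hypothetical callback that runs the search with a given min_faves value and returns how many Tweets came back, and the default upper bound is an arbitrary assumption. The sketch also assumes that raising min_faves never brings back Tweets that a lower value excluded, which is exactly the behaviour the observations in this question call into doubt.

```python
from typing import Callable


def highest_min_faves(count_results: Callable[[int], int],
                      low: int = 0,
                      high: int = 10_000_000) -> int:
    """Return the largest min_faves value that still yields at least one result.

    count_results is a hypothetical callback: it should run the search with
    the given min_faves value and return the number of Tweets returned.
    """
    if count_results(low) == 0:
        raise ValueError("no results even at the lowest threshold")
    while low < high:
        # Round up so the loop always makes progress when low + 1 == high.
        mid = (low + high + 1) // 2
        if count_results(mid) > 0:
            low = mid        # a Tweet with at least `mid` likes exists
        else:
            high = mid - 1   # nothing has `mid` likes or more
    return low
```

If the search behaved consistently, the returned value would be the like count of the most liked Tweet in the date range.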
Perhaps this is occurring because, within the specified time range, the Justin Bieber Tweet did not have enough likes to meet the requirements. This would be very good for me, as I am trying to find the most liked Tweet on that particular day, and not the Tweet with the most likes right now that happens to have been posted on that particular day.
But, I do not believe this is the case. For instance, this query includes 3 Tweets from "Rev Run" when min_faves is set to 249, but returns 0 Tweets when min_faves is set to 250. I doubt that these Tweets all had exactly 249 likes on that day (as implied by these symptoms).
Does anyone either:
Understand why these results occur and how I can use this method to find the most liked Tweet of a particular day
Know of a better, alternative way I can find the most liked Tweet of a particular day
Thank you all
@sinanspd requested an example from 2018:
Here is a search with min_faves at 300k. It includes a post with 769k likes and a post with 479k likes. When the query's min_faves is bumped up to 400k, neither is returned.
I'd like to extract all tweets in the Arabic language in all countries.
I modified the code in this tutorial.
This is my search query.
api.search(q="*", count=tweetsPerQry, lang=['ar'], tweet_mode='extended'). I expected to find a very large number of tweets, but I only collected about 7,000.
I checked the content of some of them and noticed that they were posted from my country, even though I did not specify a location or country. Can anyone explain why this happens?
To find out why the number of tweets was so limited, I modified the query by replacing the lang parameter with geocode to find tweets in a city, and fetched more than 65,000 Arabic tweets. When I then used the lang parameter together with geocode, I again found only a very limited number of tweets.
Can anyone help me understand why I am not able to get a large number of tweets when I use the lang parameter?
The free Twitter APIs are good for small projects, but keep in mind that they don't return all of the tweets. Twitter has paid APIs that are much more powerful, though what you are trying to achieve should be possible. I ran the query below and it seemed to work: I was able to find a considerable number of tweets. This method also seemed to work for @ebt_dev. I think the problem was just that your request was structured like the stream listener version rather than the cursor search.
import tweepy

# `api` is assumed to be an already authenticated tweepy.API instance.
# Search query: change the X in .items(X) to the number of tweets you are looking for
for tweet in tweepy.Cursor(api.search, q='*', tweet_mode='extended', lang='de').items(9999999):
    # Tweet text, lowercased; non-ASCII characters are dropped so the text can always be displayed
    tweettext = tweet.full_text.lower().encode('ascii', errors='ignore').decode('ascii')
    # Tweet id
    tweetid = tweet.id
    # Print the text of the tweet
    print('\ntweet text: ' + tweettext)
    # Print the id of the tweet
    print('tweet id: ' + str(tweetid))
Each day I want to find the "most popular" post on the website and feature it on the home page.
For each post, I'm keeping track of how many times it has been "liked", "disliked", "favorited" and "viewed".
I would like to run a daily cron job where I do something like:
post = Post.order("popularity_score DESC").first
post.feature!
My question is, how should I compute the value of popularity_score?
Is there a formula that takes "statistical significance" into consideration? Meaning, a post which has 1 "like" vote and nothing else has a 100% approval rating, but that shouldn't mean much because only one person voted on it.
In general I have these loose ideas off the top of my head:
a post with 10 likes and no other votes is more popular than a post with 1 like vote.
a post with more "dislikes" than "likes" should have a lower score than a post with more "likes" than "dislikes".
a post with 20 views and no other votes is more popular than a post with 3 views.
I've punched in some arbitrary formulas to try to satisfy this goal, but they are exactly that, arbitrary, and I don't really know if there is a better way to go about this.
Suggestions?
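One commonly used way to account for the "only one person voted" problem is the lower bound of the Wilson score confidence interval on the like ratio. This technique is not mentioned in the thread; it is brought in here purely as an illustration, and the function name and the default z value (1.96, roughly 95% confidence) are assumptions. Views are not included in this sketch.

```python
import math


def wilson_lower_bound(likes: int, dislikes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the fraction of likes.

    Posts with few votes get a low bound even when every vote is a like,
    so a single like no longer outranks many likes.
    """
    n = likes + dislikes
    if n == 0:
        return 0.0  # no votes at all: nothing to be confident about
    p = likes / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)
```

With this score, a post with 10 likes and no dislikes ranks above a post with 1 like and no dislikes, and a post with more dislikes than likes ranks below one with the reverse split.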
Maybe you could just take the SO approach? It seems rather decent.
+ gives 10 points
- subtracts 2 points
a view adds a low number, like 0.01 points
a comment adds 2 points
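The point scheme above can be written as a small scoring function. This is a minimal sketch: the function and parameter names are made up for illustration, and the weights are simply the ones suggested in this answer.

```python
def popularity_score(likes: int, dislikes: int, views: int, comments: int) -> float:
    """Weighted sum using the point values suggested above."""
    return 10 * likes - 2 * dislikes + 0.01 * views + 2 * comments
```

For example, a post with 1 like, 1 dislike, 100 views and 1 comment would score 10 - 2 + 1 + 2 = 11 points.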
One suggestion is to not reset your counter each day (that leaves the "most popular" open to a single vote).
Instead, weight the votes by their age -- newer votes count more than older votes. This will give you gradual and meaningful rerankings over time.
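One possible way to weight votes by age is exponential decay, where a vote loses half its weight after a fixed period. This is only a sketch of the idea: the answer does not say which weighting to use, and the function name and the 7-day half-life are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable


def decayed_vote_score(vote_times: Iterable[datetime],
                       now: datetime,
                       half_life_days: float = 7.0) -> float:
    """Sum of votes where each vote's weight halves every half_life_days."""
    total = 0.0
    for cast_at in vote_times:
        age_days = (now - cast_at).total_seconds() / 86400
        total += 0.5 ** (age_days / half_life_days)
    return total
```

A vote cast right now counts as 1.0, a vote that is one half-life old counts as 0.5, and so on, so the ranking shifts gradually rather than resetting each day.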
It looks like there is a method that gives the retweets of a particular tweet. Is there any way to find out the total number of all retweets of my tweets?
The answer is no. There may be a few hacks to get an approximation, but the answer is still no.
Twitter urges developers to think of timelines as an infinite stream rather than a finite list of tweets. You cannot count something when it has infinite length, so you cannot get the total number of retweets.
What you can do is take a small piece of the timeline (1000 tweets?) and say "I was retweeted 200 times in my past 1000 tweets".
When developing Twitter applications, always take this into consideration. There's no such thing as "all tweets", just the last x.
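The "retweeted 200 times in my past 1000 tweets" approach could look something like the sketch below. It assumes each tweet object exposes a `retweet_count` attribute (as tweet objects returned by client libraries such as tweepy typically do); the function name and the default limit are hypothetical.

```python
from itertools import islice
from typing import Iterable, Tuple


def retweets_in_recent(tweets: Iterable, limit: int = 1000) -> Tuple[int, int]:
    """Return (tweets examined, total retweets) for the first `limit` tweets.

    `tweets` is any iterable of objects with a retweet_count attribute,
    newest first; only a finite slice of the timeline is ever read.
    """
    recent = list(islice(tweets, limit))
    total = sum(getattr(tweet, "retweet_count", 0) for tweet in recent)
    return len(recent), total
```

The result is a statement about the last x tweets only, which is consistent with treating the timeline as a stream rather than a finite list.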
I'd like to retrieve the tweets for a given hashtag and sort them from the most retweeted to the least retweeted.
The closest thing I've found is the search call with the result_type parameter:
E.g.: http://search.twitter.com/search.json?q=TheHashTagHere&result_type=popular
However, I'm not sure how the "popular" option works.
For instance, if it finds 100 tweets with that hashtag I believe it should show the X most retweeted tweets, and if none of those tweets have been retweeted then it should show X of them randomly (or sorted in some other way like the most recent).
Unfortunately, it follows some kind of unknown rule to identify what's popular and what's not, and even hashtags with thousands of tweets might return only one or two results.
I hope I made myself clear. Thanks in advance :)
PS: I'll use PHP but I think that shouldn't affect the question?
Results will sometimes contain a result_type field into the metadata with a value of either "recent" or "popular". Popular results are derived by an algorithm that Twitter computes, and up to 3 will appear in the default mixed mode that the Search API operates under. Popular results include another node in the metadata called recent_retweets. This field indicates how many retweets the Tweet has had.
Source (emphasis mine)
Just call with result_type=popular and check the recent_retweets node to see how popular it is. result_type=popular will become the default in an upcoming release, so beware if you omit this parameter.
Results with popular tweets aren't ordered chronologically. *
If you would like to always have results to show, use result_type=mixed. Each result then carries a result_type in its "metadata" section: recent results have the value "recent", and popular results have "popular". A small reference about result_types:
mixed: Include both popular and real time results in the response.
recent: return only the most recent results in the response
popular: return only the most popular results in the response.
If a search query has any popular results, those will be returned at the top, even if they are older than the other results. *
*[Twitter API Announcements]
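Once the search response has been parsed, sorting by the recent_retweets node could be done along these lines. This is a sketch in Python rather than PHP, and it assumes each result is a dictionary with an optional "metadata" entry holding result_type and recent_retweets, as described in the quoted documentation; the function name is made up.

```python
from typing import Any, Dict, List


def sort_by_recent_retweets(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order parsed search results from most to least retweeted.

    Results without a recent_retweets node (for example "recent" results
    in mixed mode) are treated as having 0 retweets and keep their
    original relative order at the end.
    """
    def retweets(item: Dict[str, Any]) -> int:
        return item.get("metadata", {}).get("recent_retweets", 0)

    return sorted(results, key=retweets, reverse=True)
```

Because Python's sort is stable, results without a retweet count stay in the order the API returned them.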
This isn't a programmatic method, but it works in the browser with a Chrome extension (HackyBird):
Install the extension
Search for a phrase e.g. #Social (twitter.com/search?q=%23Social)
Click the extension to sort it (you can adjust the ratio of retweets/likes used for sorting in extension options).
P.S. It'll also sort your or any other user's timeline.
I have already asked a similar question earlier, but I have noticed that I have a big constraint: I am working on small text sets, such as user Tweets, to generate tags (keywords).
And it seems like the accepted suggestion (the point-wise mutual information algorithm) is meant to work on bigger documents.
With this constraint (working on small sets of texts), how can I generate tags?
Regards
Two Stage Approach for Multiword Tags
You could pool all the tweets into a single larger document and then extract the n most interesting collocations from the whole collection of tweets. You could then go back and tag each tweet with the collocations that occur in it. Using this approach, n would be the total number of multiword tags that would be generated for the whole dataset.
For the first stage, you could use the NLTK code posted here. The second stage could be accomplished with just a simple for loop over all the tweets. However, if speed is a concern, you could use pylucene to quickly find the tweets that contain each collocation.
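The two stages can be sketched with the standard library alone. Note that this is a simplified stand-in and not the NLTK code referred to above: it scores adjacent word pairs by PMI over the pooled tweets, uses naive whitespace tokenization, and all function names and defaults (`n`, `min_count`) are assumptions.

```python
import math
from collections import Counter
from typing import List


def top_collocations(tweets: List[str], n: int = 5, min_count: int = 2) -> List[str]:
    """Stage 1: pool all tweets and return the n highest-PMI word pairs."""
    tokenized = [t.lower().split() for t in tweets]
    unigrams = Counter(w for toks in tokenized for w in toks)
    bigrams = Counter(pair for toks in tokenized for pair in zip(toks, toks[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # ignore pairs seen too rarely to be trusted
        p_pair = count / total_bi
        p_a, p_b = unigrams[a] / total_uni, unigrams[b] / total_uni
        scores[(a, b)] = math.log(p_pair / (p_a * p_b))
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
    return [" ".join(pair) for pair in ranked[:n]]


def tag_tweets(tweets: List[str], collocations: List[str]) -> List[List[str]]:
    """Stage 2: tag each tweet with the collocations that occur in it."""
    tagged = []
    for tweet in tweets:
        padded = " " + " ".join(tweet.lower().split()) + " "
        tagged.append([c for c in collocations if f" {c} " in padded])
    return tagged
```

The plain loop in `tag_tweets` corresponds to the simple for loop mentioned above; an index such as pylucene would only matter for larger collections.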
Tweet Level PMI for Single Word Tags
As also suggested here, for single word tags, you could calculate the point-wise mutual information of each individual word and the tweet itself, i.e.
PMI(term, tweet) = log [ P(term, tweet) / (P(term)*P(tweet)) ]
Again, this will roughly tell you how much less (or more) surprised you are to come across the term in the specific document as opposed to coming across it in the larger collection. You could then tag the tweet with a few terms that have the highest PMI with the tweet.
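As a rough illustration of the formula above, the probabilities can be estimated from token counts: P(term, tweet) as the term's count in the tweet over all tokens, P(term) as its count in the whole collection over all tokens, and P(tweet) as the tweet's length over all tokens. These estimates, the whitespace tokenization, and the names `pmi_tags`, `k` and `min_tweets` are assumptions made for this sketch.

```python
import math
from collections import Counter
from typing import List


def pmi_tags(tweets: List[str], k: int = 2, min_tweets: int = 1) -> List[List[str]]:
    """Tag each tweet with its k highest-PMI words.

    Words that occur in fewer than min_tweets tweets are skipped.
    """
    tokenized = [t.lower().split() for t in tweets]
    total = sum(len(toks) for toks in tokenized)
    term_counts = Counter(w for toks in tokenized for w in toks)
    tweet_freq = Counter(w for toks in tokenized for w in set(toks))
    tags = []
    for toks in tokenized:
        p_tweet = len(toks) / total if total else 0.0
        scores = {}
        for word, count in Counter(toks).items():
            if tweet_freq[word] < min_tweets:
                continue
            p_joint = count / total
            p_term = term_counts[word] / total
            scores[word] = math.log(p_joint / (p_term * p_tweet))
        # Highest PMI first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        tags.append(ranked[:k])
    return tags
```

Words that appear in every tweet, such as "the", end up with a PMI near zero, while words concentrated in one tweet score highest.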
General Changes for Tweets
Some changes you might want to make when tagging with tweets include:
Only use a word or collocation as a tag for a tweet, if it occurs within a certain number or percentage of other tweets. Otherwise, PMI will tend to tag tweets with odd terms that occur in just one tweet but that are not seen anywhere else, e.g. misspellings and keyboard noise like ##$##$%!.
Scale the number of tags used with the length of each tweet. You might be able to extract 2 or 3 interesting tags for longer tweets. But, for a shorter 2 word tweet, you probably don't want to use every single word and collocation to tag it. It's probably worth experimenting with different cut-offs for how many tags you want to extract given the tweet length.
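Scaling the number of tags with tweet length could be as simple as the rule below. The specific numbers (one tag per four words, at most three tags) are placeholders to experiment with, not recommendations from this answer, and the function name is hypothetical.

```python
def tag_budget(num_tokens: int, tokens_per_tag: int = 4, max_tags: int = 3) -> int:
    """How many tags to extract from a tweet with num_tokens words.

    Empty tweets get no tags; any non-empty tweet gets at least one,
    and long tweets are capped at max_tags.
    """
    if num_tokens <= 0:
        return 0
    return max(1, min(max_tags, num_tokens // tokens_per_tag))
```

Under these placeholder values a 2-word tweet gets one tag rather than one per word, and a 12-word tweet gets three.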
I have used a method earlier, for small text content such as SMSes, where I would just repeat the same line two times. Surprisingly, that works well for such content where a noun could well be the topic. I mean, you don't need it to repeat for it to be the topic.