Twitter: Unique ID for mention

Can one user occur twice (or more times) in user_mentions for a specific tweet?
I need to create a unique key for each mention in a specific tweet.
For this I want to use: "tweet_id"+"_"+"person_id"
But if a user can occur twice in user_mentions, I can't use this combination. Must I also use the indices of the name's position?
"tweet_id"+"_"+"person_id"+"-"+"left_position"+"-"+"right_position"

Can one user occur twice (or more times) in user_mentions for a specific tweet?
It can happen.
If a user can occur twice in user_mentions, I can't use this combination. Must I use the indices of the name's position?
"tweet_id"+"_"+"person_id"+"-"+"left_position"+"-"+"right_position"
Your answer is in your question. Tweet Entities gives you all the information you need: each entry in user_mentions carries the indices of that particular occurrence, so including them keeps the key unique.
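A minimal sketch of the key scheme from the question, assuming the tweet has already been decoded into a dict shaped like a Twitter API v1.1 tweet (an `id_str` field and an `entities.user_mentions` list whose entries carry `id_str` and `indices`). The function name `mention_keys` and the sample tweet are hypothetical:

```python
def mention_keys(tweet):
    """Build one unique key per user mention, including its character offsets.

    Assumes `tweet` is a dict shaped like a Twitter API v1.1 tweet, with
    tweet["id_str"] and tweet["entities"]["user_mentions"], where each
    mention carries "id_str" and "indices" ([start, end] offsets).
    """
    keys = []
    for mention in tweet["entities"]["user_mentions"]:
        start, end = mention["indices"]
        # Each occurrence has its own offsets, so the key stays unique even
        # when the same user is mentioned more than once.
        keys.append(f'{tweet["id_str"]}_{mention["id_str"]}-{start}-{end}')
    return keys


# Hypothetical tweet that mentions the same user twice.
example = {
    "id_str": "100",
    "text": "@alice and @alice",
    "entities": {
        "user_mentions": [
            {"id_str": "42", "screen_name": "alice", "indices": [0, 6]},
            {"id_str": "42", "screen_name": "alice", "indices": [11, 17]},
        ]
    },
}
print(mention_keys(example))  # ['100_42-0-6', '100_42-11-17']
```

Without the offsets, both mentions would collapse to the same `100_42` key.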


What is an efficient way to capture the username after the "#" on Instagram using regex?

In other words: if I call the Instagram API and want to capture the usernames of people tagged in the comments, and I know the tag is on the first line of the comment every time, what is the most accurate way to capture the string (username) after the '#' that precedes a tagged user, and to repeat that for a known number of posts?
Matching literal characters wouldn't be useful, since the tagged name changes from post to post. So if the comment is "picture taken by #JohnSmith," what is the best way to capture the string immediately following the '#' when its length and characters are unknown?
Implementation example: a user wants to find the tagged photographer for all photos on a page. There are 100 photos on the #coolphotographer Instagram page, and the photographer is tagged on the first line of the comments on each post.
You can try the following regex:
(?:^|[^\w])(?:#)([A-Za-z0-9_](?:(?:[A-Za-z0-9_]|(?:\.(?!\.))){0,28}(?:[A-Za-z0-9_]))?)
NOTE: The above regex is taken from this blog.
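As a sketch of how that regex behaves, here it is dropped unchanged into Python's `re` module (an assumption on my part; the answer doesn't name a language). As written, the pattern appears to allow 1 to 30 characters of letters, digits, underscores and single dots, never ending in a dot, so a trailing comma or full stop is not captured. The helper name `tagged_usernames` is hypothetical:

```python
import re

# The regex from the answer, unchanged; Python's re supports the
# non-capturing groups and the (?!\.) lookahead it uses.
INSTAGRAM_TAG = re.compile(
    r"(?:^|[^\w])(?:#)([A-Za-z0-9_](?:(?:[A-Za-z0-9_]|(?:\.(?!\.))){0,28}(?:[A-Za-z0-9_]))?)"
)


def tagged_usernames(text):
    """Return every username tagged with '#' in text (capture group 1)."""
    return INSTAGRAM_TAG.findall(text)


print(tagged_usernames("picture taken by #JohnSmith,"))  # ['JohnSmith']
print(tagged_usernames("#jane_doe and #john.smith."))    # ['jane_doe', 'john.smith']
```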
Thanks #RobbyCornelissen.
Try this; it uses gsub:
"#JohnSmith".gsub('#', "")
You can use the following regexp:
/#(\w+)/
http://rubular.com/r/5bgU2DtEqX
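For the "100 posts" case in the question, the simpler `/#(\w+)/` pattern can be applied to each post's first comment line in a loop. This Python sketch assumes you already have those first lines as strings (the `first_lines` list is hypothetical). Note that `\w+` stops at a dot, so `#john.smith` yields only `john`. Also, Python's `\w` matches non-ASCII letters by default, while Ruby's `\w` is ASCII-only.

```python
import re

# Same pattern as /#(\w+)/ above.
TAG = re.compile(r"#(\w+)")


def first_tag(comment_first_line):
    """Return the first #-tagged name in a comment line, or None."""
    match = TAG.search(comment_first_line)
    return match.group(1) if match else None


first_lines = [  # hypothetical first-line comments, one per post
    "picture taken by #JohnSmith,",
    "shot by #jane_doe at sunset",
    "no photographer credited",
]
photographers = [first_tag(line) for line in first_lines]
print(photographers)  # ['JohnSmith', 'jane_doe', None]
```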
Try this; it uses split:
"#JohnSmith".split("#").last

Automatically updating Data Validation lists based on user input

I have a very large data set (about 16k rows). I have 10 higher-level blocks, and within each block I have 4 categories (10 rows each) that use Data Validation lists to show the items available in each category. The lists should update automatically based on user input. What I need help with is using the same data set for every block, preferably with the least calculation- and size-intensive approach. I have put together a sample file that outlines the issue with examples.
Sample File
Thank you for your help in advance.
Okay, I've found something, but it can be quite time-consuming to do.
Select each range of cells. For instance, for the first one, select B3:B18 and right-click the selection. Find "Name a Range..." and give it the name "_FIN_CNY". Repeat for all the other ranges, changing the name where necessary.
Select the first range of cells to get the data validation, and click on "Data validation", pick the option "Allow: List" (you already have it) and then in the source, put the formula:
=INDIRECT($G$4&"_CNY")
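$G$4 is the cell where the user makes the input; for the formula to resolve to the name _FIN_CNY, it must contain _FIN. This reference changes as you move to a different block.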
_CNY is the category. Change it to _CNY2 for the second category.
Click "OK" and that should be it. Repeat for the other categories.
I have put an updated file on Dropbox where you can see I already did this for the _FIN data for categories CNY, CNY2 and INT, and for _GER as well. You'll notice that the INT category for _GER doesn't work; that's because the named range _GER_INT doesn't exist yet.
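To make the naming convention concrete, here is a toy Python model of the lookup that `=INDIRECT($G$4&"_CNY")` performs: the input cell's value and the category suffix are concatenated into a range name, and a name that hasn't been defined fails. This is an illustration, not Excel itself; the dictionary `NAMED_RANGES`, its item values, and the function `validation_list` are hypothetical.

```python
# Toy model of the lookup that =INDIRECT($G$4&"_CNY") performs; not Excel.
# Range contents below are hypothetical placeholders.
NAMED_RANGES = {
    "_FIN_CNY": ["FIN item 1", "FIN item 2"],
    "_FIN_CNY2": ["FIN item 3"],
    "_FIN_INT": ["FIN item 4"],
    "_GER_CNY": ["GER item 1"],
    "_GER_CNY2": ["GER item 2"],
    # "_GER_INT" has not been defined yet, as in the updated file.
}


def validation_list(block_input, category):
    """Return the dropdown items for a block/category pair.

    block_input plays the role of the value in $G$4 (e.g. "_FIN") and
    category is the suffix in the formula (e.g. "_CNY").
    """
    name = block_input + category  # same concatenation as $G$4&"_CNY"
    if name not in NAMED_RANGES:
        # Excel's INDIRECT returns #REF! for a name that does not exist.
        raise KeyError(f"no named range {name}")
    return NAMED_RANGES[name]


print(validation_list("_FIN", "_CNY"))  # ['FIN item 1', 'FIN item 2']
```

The design point is that one validation formula per category serves every block; only the value in the input cell changes.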

Retrieve most retweeted tweets for a given hashtag

I'd like to retrieve the tweets for a given hashtag and sort them from most retweeted to least retweeted.
The closest thing I've found is the search call with the result_type parameter:
E.g.: http://search.twitter.com/search.json?q=TheHashTagHere&result_type=popular
However, I'm not sure how the "popular" option works.
For instance, if it finds 100 tweets with that hashtag, I believe it should show the X most retweeted tweets, and if none of those tweets has been retweeted, it should show X of them at random (or sorted some other way, such as most recent first).
Unfortunately, it follows some unknown rule to decide what is popular and what isn't, and even hashtags with thousands of tweets might return only one or two results.
I hope I made myself clear. Thanks in advance :)
PS: I'll use PHP, but I don't think that should affect the question.
Results will sometimes contain a result_type field into the metadata with a value of either "recent" or "popular". Popular results are derived by an algorithm that Twitter computes, and up to 3 will appear in the default mixed mode that the Search API operates under. Popular results include another node in the metadata called recent_retweets. This field indicates how many retweets the Tweet has had.
Source (emphasis mine)
Just call with result_type=popular and check the recent_retweets node to see how popular each tweet is. result_type=popular will become the default in an upcoming release, so beware if you omit this parameter.
Results with popular tweets aren't ordered chronologically. *
If you always want to have results to show, use result_type=mixed: recent results will have a result_type of "recent" in the "metadata" section, and popular results will have "popular". A short reference on the result_type values:
mixed: include both popular and real-time results in the response.
recent: return only the most recent results in the response.
popular: return only the most popular results in the response.
If a search query has any popular results, those will be returned at the top, even if they are older than the other results. *
*[Twitter API Announcements]
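Since popular results aren't ordered chronologically (or by retweets), you still have to sort them yourself. A minimal Python sketch, assuming the search response has already been decoded into a list of dicts whose optional "metadata" node holds "recent_retweets" as described in the quote; results without that field (e.g. "recent" ones) are treated as having 0 retweets. The function name and sample data are hypothetical, and the same idea carries over directly to PHP's `usort`.

```python
def sort_by_recent_retweets(results):
    """Sort decoded search results from most to least retweeted.

    Assumes each result may carry result["metadata"]["recent_retweets"];
    results without it count as 0 retweets.
    """
    def retweets(result):
        return result.get("metadata", {}).get("recent_retweets", 0)

    # sorted() is stable, so ties keep the order the API returned them in.
    return sorted(results, key=retweets, reverse=True)


results = [  # hypothetical decoded results
    {"id": 1, "metadata": {"result_type": "recent"}},
    {"id": 2, "metadata": {"result_type": "popular", "recent_retweets": 50}},
    {"id": 3, "metadata": {"result_type": "popular", "recent_retweets": 120}},
]
print([r["id"] for r in sort_by_recent_retweets(results)])  # [3, 2, 1]
```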
This isn't a programmatic method; it works in the browser with a Chrome extension (HackyBird):
Install the extension
Search for a phrase e.g. #Social (twitter.com/search?q=%23Social)
Click the extension to sort it (you can adjust the ratio of retweets/likes used for sorting in extension options).
P.S. It will also sort your own timeline or any other user's.

Users/Friends with twitterizer that don't follow back

I am using Twitterizer in an ASP.NET project. I need an example (code) of how to use Twitterizer to show a paged list of the users a specific account follows who don't follow back.
The query is: give me all users I have been following for, e.g., 5 days who haven't followed me back. The result should be displayed in a GridView with paging.
Thanks!
As I stated in the email, there is no way to filter friends or followers by date.
Your best bet is to use TwitterFriendship.FriendsIds() and TwitterFriendship.FollowersIds(), then take the difference between the two. Friends are users you follow, so FriendsIds minus FollowersIds gives the users you follow who don't follow you back, and FollowersIds minus FriendsIds gives the followers you don't follow. To identify new friends or followers, you'll need to keep a list of the IDs, then consult that list at a later date to see the changes over time.
You could create a database (or list, etc.) of users you follow and users who follow you. Update it as often as you need and add a timestamp for each new addition. Then you can query this database to build the list you want.
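The set logic and the timestamp bookkeeping described in these answers can be sketched in a few lines. This is Python rather than .NET, and it assumes you have already fetched the two ID lists (e.g. via FriendsIds() and FollowersIds()); all function names here are hypothetical. Because the API can't filter by follow date, "first seen" records when your own snapshot first saw an ID, which only approximates when the follow happened.

```python
from datetime import datetime, timedelta


def not_following_back(friend_ids, follower_ids):
    """Users you follow (friends) who do not follow you back."""
    return set(friend_ids) - set(follower_ids)


def record_first_seen(first_seen, friend_ids, now):
    """Store when each friend ID was first observed; earlier times are kept."""
    for uid in friend_ids:
        first_seen.setdefault(uid, now)
    return first_seen


def stale_non_followers(first_seen, friend_ids, follower_ids, now, min_age):
    """Friends first seen at least min_age ago who still don't follow back."""
    return sorted(
        uid
        for uid in not_following_back(friend_ids, follower_ids)
        if uid in first_seen and now - first_seen[uid] >= min_age
    )


def page(items, page_number, page_size):
    """Slice one 1-based page of results, e.g. for a paged grid."""
    start = (page_number - 1) * page_size
    return items[start:start + page_size]


# Hypothetical ID snapshots taken six days apart.
first_seen = {}
day0 = datetime(2024, 1, 1)
record_first_seen(first_seen, [1, 2, 3], day0)

day6 = day0 + timedelta(days=6)
friends_now, followers_now = [1, 2, 3, 4], [2]
record_first_seen(first_seen, friends_now, day6)
stale = stale_non_followers(first_seen, friends_now, followers_now, day6, timedelta(days=5))
print(stale)              # [1, 3]
print(page(stale, 1, 1))  # [1]
```

User 4 is excluded because it was only followed in the latest snapshot, and user 2 follows back.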

Tag generation from small text content (such as tweets)

I asked a similar question earlier, but I have noticed that I have a big constraint: I am working on small texts, such as user tweets, to generate tags (keywords).
It seems the accepted suggestion (the point-wise mutual information algorithm) is meant to work on bigger documents.
With this constraint (working on small texts), how can I generate tags?
Regards
Two-Stage Approach for Multiword Tags
You could pool all the tweets into a single larger document and then extract the n most interesting collocations from the whole collection of tweets. You could then go back and tag each tweet with the collocations that occur in it. Using this approach, n would be the total number of multiword tags that would be generated for the whole dataset.
For the first stage, you could use the NLTK code posted here. The second stage could be accomplished with a simple for loop over all the tweets. However, if speed is a concern, you could use PyLucene to quickly find the tweets that contain each collocation.
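A self-contained sketch of both stages, using a small hand-rolled bigram PMI scorer as a stand-in for the NLTK collocation code (so it runs without NLTK installed). It assumes naive lowercase whitespace tokenization and a minimum bigram count to filter out one-off pairs; the function names and sample tweets are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def top_collocations(tweets, n, min_count=2):
    """Stage 1: pool all tweets and rank adjacent word pairs by PMI."""
    unigrams, bigrams = Counter(), Counter()
    for tweet in tweets:
        words = tokenize(tweet)
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))  # pairs never span two tweets
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs seen too rarely to trust
        p_xy = count / total_bi
        p_x = unigrams[w1] / total_uni
        p_y = unigrams[w2] / total_uni
        scored.append((math.log(p_xy / (p_x * p_y)), (w1, w2)))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pair for _, pair in scored[:n]]


def tag_tweets(tweets, collocations):
    """Stage 2: a simple loop tagging each tweet with the collocations it contains."""
    tags = []
    for tweet in tweets:
        words = tokenize(tweet)
        present = set(zip(words, words[1:]))
        tags.append([" ".join(pair) for pair in collocations if pair in present])
    return tags


tweets = [
    "new york is big",
    "i love new york",
    "machine learning rocks",
    "new machine learning course",
]
colls = top_collocations(tweets, n=2)
print(colls)                      # [('machine', 'learning'), ('new', 'york')]
print(tag_tweets(tweets, colls))  # [['new york'], ['new york'], ['machine learning'], ['machine learning']]
```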
Tweet Level PMI for Single Word Tags
As also suggested here, for single-word tags you could calculate the point-wise mutual information of each individual word and the tweet itself, i.e.
$$\mathrm{PMI}(\mathit{term}, \mathit{tweet}) = \log \frac{P(\mathit{term}, \mathit{tweet})}{P(\mathit{term})\,P(\mathit{tweet})}$$
Again, this will roughly tell you how much less (or more) surprised you are to come across the term in the specific tweet as opposed to coming across it in the larger collection. You could then tag the tweet with the few terms that have the highest PMI with it.
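One way to estimate the tweet-level PMI from counts (an assumption; the answer doesn't fix the estimator): with N total tokens, take P(term, tweet) = c(term, tweet)/N, P(term) = c(term)/N and P(tweet) = len(tweet)/N, so the ratio becomes c(term, tweet)·N / (c(term)·len(tweet)). The sketch below also skips words that appear in fewer than `min_tweets` tweets, in line with the filtering advice further down. The function name and sample tweets are hypothetical.

```python
import math
from collections import Counter


def tweet_pmi_tags(tweets, k=2, min_tweets=2):
    """Tag each tweet with its k words of highest PMI against the tweet."""
    docs = [t.lower().split() for t in tweets]  # naive tokenization (assumption)
    corpus = Counter(w for d in docs for w in d)         # c(term) over all tweets
    doc_freq = Counter(w for d in docs for w in set(d))  # tweets containing term
    n_total = sum(corpus.values())                       # N
    result = []
    for words in docs:
        counts = Counter(words)
        scored = []
        for term, c in counts.items():
            if doc_freq[term] < min_tweets:
                continue  # skip one-off terms (typos, keyboard noise)
            # P(term, tweet) / (P(term) * P(tweet)) with all counts over N
            pmi = math.log(c * n_total / (corpus[term] * len(words)))
            scored.append((pmi, term))
        scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken alphabetically
        result.append([term for _, term in scored[:k]])
    return result


tweets = ["the cat sat", "the cat ran", "the dog ran"]
print(tweet_pmi_tags(tweets, k=1))  # [['cat'], ['cat'], ['ran']]
```

Note how "the", which occurs in every tweet, scores log 1 = 0, while rarer shared words such as "cat" and "ran" score higher.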
General Changes for Tweets
Some changes you might want to make when tagging with tweets include:
Only use a word or collocation as a tag for a tweet if it occurs within a certain number or percentage of other tweets. Otherwise, PMI will tend to tag tweets with odd terms that occur in just one tweet but are not seen anywhere else, e.g. misspellings and keyboard noise like ##$##$%!.
Scale the number of tags with the length of each tweet. You might be able to extract 2 or 3 interesting tags from longer tweets, but for a short two-word tweet you probably don't want to use every single word and collocation as a tag. It's probably worth experimenting with different cut-offs for how many tags to extract given the tweet length.
I have previously used a simple trick for short texts such as SMS messages: repeat the same line twice. Surprisingly, that works well for such content, where a noun may well be the topic even though it appears only once; a word doesn't need to repeat to be the topic.
