I'm getting familiar with data on Twitter-Insight on Blue-Mix. I have noticed that there is an added field "USER-GENDER". As this field is not in user profiles, I looked in the documentation, but it does not mention how the gender is evaluated (by surname? by previous tweets? The latter would probably mean that a specific algorithm is used; which one?).
If you can point me to any documentation that I missed, that would be highly appreciated.
Martin
You are probably referring to the Insights for Twitter documentation that lists gender information as an enrichment.
That information can be directly obtained from the Twitter profiles, by analyzing the profile data. There are other services including Twitter itself that provide gender information based on the profile data. How the gender is derived is not documented. :)
I was asked to find Twitter accounts associated with the Dominican Republic (the project had to do with voting). This was a tricky request: although some Twitter accounts have geospatial data associated with them, we have no idea whether it is accurate.
I wound up searching by hand for hashtags that I knew were related (#dominican, #washingtonheights), then hopped along the friends and followers of the accounts using them until I found the people I was looking for.
More generally:
How do I search for Twitter accounts associated with a given topic? How might it be possible to train a bot to identify hashtags relevant to a given topic? And then we can search for those keywords.
#Moderators: This is not really a coding question. If you can think of a better StackExchange, please migrate this!
Since you already have a given topic, I would suggest the following:
Collect a couple of accounts by hand using the hashtags you already mentioned.
Retrieve X tweets for these accounts.
Do some natural language processing on these tweets to get new ideas for keywords.
Some things I have used in this or a similar context:
tf-idf + NMF to get topics, then sort by component weight to retrieve the topics a user is talking about (a user can have multiple topics).
some sort of clustering (your biggest problem here will be the high sparsity of the data, so PCA could be an option)
use WordNet etc. to collect similar keywords
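The keyword-discovery step above can be sketched with plain tf-idf, using only the Python standard library. This is a minimal illustration, not the answerer's actual pipeline: it leaves out the NMF and clustering steps, treats each account's tweets as one document, and uses a naive tokenizer. The names `tokenize`, `top_keywords` and the sample tweets are hypothetical.

```python
import math
import re
from collections import Counter

# Naive tokenizer (an assumption): words and hashtags only, URLs are not handled specially.
TOKEN_RE = re.compile(r"#?\w+")


def tokenize(text):
    """Lowercase a tweet and split it into words and hashtags."""
    return TOKEN_RE.findall(text.lower())


def top_keywords(tweets_by_account, k=5):
    """Rank each account's terms by tf-idf, treating each account as one document."""
    docs = {
        account: Counter(tok for tweet in tweets for tok in tokenize(tweet))
        for account, tweets in tweets_by_account.items()
    }
    n_docs = len(docs)
    # Document frequency: in how many accounts does each term appear?
    df = Counter(term for counts in docs.values() for term in counts)

    ranked = {}
    for account, counts in docs.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked[account] = [term for term, _ in ordered[:k]]
    return ranked


if __name__ == "__main__":
    # Made-up tweets, for illustration only.
    sample_tweets = {
        "account_a": ["Voting day in #santodomingo", "Proud #dominican voter"],
        "account_b": ["Great game tonight", "Voting is important"],
    }
    print(top_keywords(sample_tweets, k=3))
```

Terms that appear in every account get a score of zero, so words shared by on-topic and off-topic accounts drop to the bottom, while hashtags specific to the hand-picked accounts rise to the top and can be fed back into the search.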
I am not a very avid user, so I am having a hard time figuring out what each field means in the Twitter API response. Going through the documentation has only resulted in me going in circles.
What I am trying to do is analyze how things go viral. So I grabbed data from the Twitter streaming API, hoping to analyze the response, but I am totally confused.
A sample JSON response is:
{"created_at":"Thu Mar 14 18:19:12 +0000 2013","id":312266679390457857,"id_str":"312266679390457857","text":"The first four winners of our March Madness Giveaway (4x ADATA Technology (USA) 16GB DashDrives) are:\n\nAaron... http:\/\/t.co\/ikPbfRZQdq","source":"\u003ca href=\"http:\/\/www.facebook.com\/twitter\" rel=\"nofollow\"\u003eFacebook\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":179622147,"id_str":"179622147","name":"Levetron","screen_name":"Levetron","location":"Los Angeles","url":"http:\/\/www.aziocorp.com","description":"Official Twitter for Levetron by AZiO. Here for customer questions, gaming tips & tricks, sharing cool ideas, product launch releases, reviews and more!","protected":false,"followers_count":1042,"friends_count":25,"listed_count":4,"created_at":"Tue Aug 17 18:56:29 +0000 2010","favourites_count":5,"utc_offset":-28800,"time_zone":"Pacific Time (US & 
Canada)","geo_enabled":false,"verified":false,"statuses_count":707,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"131516","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme14\/bg.gif","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme14\/bg.gif","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3223061028\/999ac6efc782d85983cbcf7f2deab7c1_normal.png","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3223061028\/999ac6efc782d85983cbcf7f2deab7c1_normal.png","profile_banner_url":"https:\/\/si0.twimg.com\/profile_banners\/179622147\/1360294489","profile_link_color":"009999","profile_sidebar_border_color":"EEEEEE","profile_sidebar_fill_color":"EFEFEF","profile_text_color":"333333","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"entities":{"hashtags":[],"urls":[{"url":"http:\/\/t.co\/ikPbfRZQdq","expanded_url":"http:\/\/fb.me\/M6YPCk9W","display_url":"fb.me\/M6YPCk9W","indices":[112,134]}],"user_mentions":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium"}
1) My guess is that if this tweet is the result of a retweet, then "retweeted" should be true.
But how do I figure out which user it was retweeted from?
2) Is "id" the user id or the tweet id?
Basically, if, say, I want to analyze how Gangnam Style went viral (who retweeted or followed that particular tweet), how should I do this?
Also, has Twitter recently changed its API? I am using Python for this, but it looks to me like none of the examples for the Python API libraries are working.
For example: https://github.com/tweepy/tweepy
Any suggestions?
Thanks
Please see the Twitter API documentation relating to tweets. It describes all parameters returned in the Twitter JSON response.
That tweet was not retweeted because retweeted is set to false and retweet_count is 0.
From the documentation:
id = The integer representation of the unique identifier for this
Tweet.
Retweets can be distinguished from typical Tweets by the existence of
a retweeted_status attribute. This attribute contains a representation
of the original Tweet that was retweeted.
Also, here is a list of supported twitter libraries. There are several other libraries other than tweepy listed that might work for you.
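The retweeted_status check quoted above can be sketched in Python with only the standard json module. This is an illustration under assumptions: the function name `retweet_info` and the keys of the returned dictionary are hypothetical, and the input is assumed to be one tweet from the stream as a JSON string, with the field names shown in the sample response.

```python
import json


def retweet_info(raw_tweet):
    """Return details of the original tweet if raw_tweet is a retweet, else None."""
    tweet = json.loads(raw_tweet)
    # A retweet carries the original tweet under "retweeted_status".
    original = tweet.get("retweeted_status")
    if original is None:
        return None
    return {
        "retweet_id": tweet["id_str"],                       # id of the retweet itself
        "retweeted_by": tweet["user"]["screen_name"],        # who retweeted
        "original_id": original["id_str"],                   # id of the original tweet
        "original_author": original["user"]["screen_name"],  # who wrote the original
    }
```

Applied to every tweet in a captured stream, grouping the results by original_id gives the set of accounts that retweeted a particular tweet, which is one starting point for tracing how it spread.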
I have reviewed the Twitter documentation #anywhere
where I can use User Object properties, but I can't find a gender property in the user data.
When you create a Twitter account it never asks for gender, so you can't get it through the API.
You would need some kind of AI to determine it.
Demographic data such as gender and age are not available from the API -- and not always appropriate as Twitter accounts can represent many things not limited to persons alive or dead.
Latest answer #2016
You still can't get Twitter users' gender, but you can deduce it with some workable APIs.
See node-cheat gender_by_name for all reasonable and tested solutions.
Quick question, I'm having trouble finding this info on the Twitter dev centre.
On the object for a tweet, in the user section there is 'favourites_count'. What exactly does this mean? Is it how many tweets they themselves have favourited? Or is it how many of their tweets have been favourited? Or is it something else?
It's the number of tweets that given user has marked as favorite.
I checked this with an API call using my own screen name.
The number of Tweets this user has liked in the account’s lifetime. British spelling used in the field name for historical reasons.
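As a small illustration of where this field sits, the sketch below reads it from a tweet payload. The helper name `likes_given` is hypothetical, and the input is assumed to be a tweet as a JSON string with a nested user object.

```python
import json


def likes_given(raw_tweet):
    """Return how many tweets the author of raw_tweet has liked (favourited)."""
    tweet = json.loads(raw_tweet)
    # Note the British spelling of the field name.
    return tweet["user"]["favourites_count"]
```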