Retrieving tweets from Twitter using twitter4j

I am developing an application to guess the locations of tornadoes by analyzing Twitter data. For this, I first need to train a neural network on some manually annotated tweets. I am trying to get tweets from last year that contain the word 'tornado'. This is my code:
Query query = new Query("tornado");
query.setRpp(100);
query.setSince("2010-11-01");
query.setUntil("2011-01-13");
QueryResult queryResult = instance.search(query);
tweetList = queryResult.getTweets();
I am able to retrieve tweets from recent periods, such as last week, but I get no results for older periods such as the one above. Any clues or suggestions would be helpful. Thanks in advance.

I just found out the reason through a different medium and thought I'd share the answer in case other people have the same issue.
It turns out that the Twitter search API does not return tweets older than about a week, and depending on server load this window can be as short as 24 hours. Hence, any third-party library (such as twitter4j) that wraps the Twitter search API will behave the same way.
The best way to go about this would be to use third-party search and indexing sites such as snapbird, topsy, etc.
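Because a query for an old range simply comes back empty, it can help to check the requested dates against the search window before calling the API. The following is a minimal sketch in Python, assuming a 7-day window (the answer only says "around a week", and it can be shorter). `SEARCH_WINDOW_DAYS` and `clamp_to_search_window` are hypothetical names for illustration, not part of twitter4j or the Twitter API.

```python
from datetime import date, timedelta

# Assumption: the search index covers roughly the last 7 days.
SEARCH_WINDOW_DAYS = 7


def clamp_to_search_window(since, until, today=None):
    """Return the part of [since, until] that the search API could still
    cover, or None if the whole range is older than the window."""
    today = today or date.today()
    earliest = today - timedelta(days=SEARCH_WINDOW_DAYS)
    if until < earliest:
        return None  # entire range is too old; a search would return nothing
    return max(since, earliest), min(until, today)


if __name__ == "__main__":
    # The range from the question, checked as of a hypothetical later date.
    print(clamp_to_search_window(date(2010, 11, 1), date(2011, 1, 13),
                                 today=date(2011, 3, 1)))
```

A `None` result signals that the range has to come from an archive or a third-party index rather than from the search API.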

Related

How can I retrieve the N most popular tweets for a country using the Twitter API?

TL;DR: I want to be able to retrieve the N most popular tweets for any arbitrary country within the last X hours (up to 24 hours).
More detail
I want to show the details of the most popular tweets by geographic region (country) over the past few hours (adjustable up to 24 hours). How can I use the Twitter REST API to achieve this (v1.1 or v2)?
There are endpoints for querying tweets and filtering by popularity, but they require a search string (e.g. "NASA") and return the most popular tweets matching that search string. I am not interested in the contents of the tweets, I just want to know what is most popular.
I plan on using this functionality to show a world map (using Leaflet) to summarise the most popular tweets by country for the past day.
I am using Twit in NodeJS but not looking for answers specific to Node, rather how to leverage the capabilities of the API.
I am not aware of a way that this can be done directly through the API itself (V1 or V2). I also do not think that this is going to be a trivial task at all.
What I would suggest is using the search endpoint...
V1: Reference
V2: Reference Note that to use geolocation search parameters (see below) you'll need academic access.
... in conjunction with one of the geolocation search parameters. For example, you could pull some subset of tweets from within a country (you will not be able to download all tweets within a single country on any given day, not to mention all countries). After you get this data, you'll need to do some of your own data processing based on how you want to define "popular" (e.g. retweets, likes, etc.) and then go from there.
As I said earlier, this seems like a very large project and not something that can be solved simply with the Twitter API.
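The "do your own data processing" step could look roughly like the sketch below. It assumes the tweets have already been downloaded and flattened into dicts with `country_code`, `retweet_count` and `favorite_count` keys (a hypothetical shape chosen for illustration), and it assumes "popular" means retweets plus likes. `top_tweets_by_country` is not an API function.

```python
from collections import defaultdict
import heapq


def top_tweets_by_country(tweets, n, score=None):
    """Group already-downloaded tweets by country and keep the n best per country."""
    # Assumption: popularity = retweets + likes; pass another `score` to change it.
    score = score or (lambda t: t["retweet_count"] + t["favorite_count"])
    by_country = defaultdict(list)
    for tweet in tweets:
        by_country[tweet["country_code"]].append(tweet)
    return {country: heapq.nlargest(n, group, key=score)
            for country, group in by_country.items()}


# Made-up sample data, only to show the call.
sample = [
    {"id": 1, "country_code": "US", "retweet_count": 10, "favorite_count": 5},
    {"id": 2, "country_code": "US", "retweet_count": 50, "favorite_count": 1},
    {"id": 3, "country_code": "FR", "retweet_count": 2, "favorite_count": 2},
]
top = top_tweets_by_country(sample, n=1)
print({country: [t["id"] for t in group] for country, group in top.items()})
```

The result is only as representative as the subset of tweets that was pulled for each country.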

Not able to see time zone, place or geolocation of any tweets

I am following two tutorials right now and both are up and running and I've gotten plenty of tweets/sentiment scores from them:
1) Twitter Stream Analytics on Azure https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-twitter-sentiment-analysis-trends/
2) Twitter Analysis with Spark Streaming http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html
I am using the free oauth tool provided from apps.twitter.com.
Problem
I've tried getPlace and getGeoLocation in the Spark Streaming app, and every tweet I get has a null value for those two fields. I have also tried filtering for tweets that have values for getPlace or getGeoLocation, and I still get null for both (I ran the app for almost 20 minutes).
I've also tried getting TimeZone in the Azure app (so I can get some sort of geography data) and even then I kept getting null values for TimeZone.
Possible Obstacles
1) Does the free Twitter API filter out the place/geoLocation information, so that I would have to buy a subscription to a better API?
2) Do I need to explicitly search for tweets that have geoLocation/Places, rather than getting all tweets and then filtering out the ones that have geoLocation/Places? If so, can I execute this search in Spark Streaming?
This is the code that I have in Spark Streaming:
val stream = TwitterUtils.createStream(ssc, None, filters)
val hashTags = stream.map(status => Tweet(status.getPlace().getName(), classifyTweet(status.getText())))
Thank you for the help!
I've personally used the free Twitter API to get locations and publish them on a map in Power BI, so you can rule out the first obstacle.
One thing to note is that the location field is only available if the client specifically allows the application to have location, which makes it quite rare. In my sample data, about 8% of tweets had a location.
Don't have an answer for spark side, just wanted to help you rule out the first possibility.
Hope this helps.
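Since most tweets carry no location, code that reads the place should handle the empty case instead of calling straight through it. Here is a small sketch in Python on plain dicts shaped like v1.1 tweet JSON, where `place` and `coordinates` are null when absent. It illustrates the filtering idea only and is not Spark or twitter4j code; `place_name` and `with_location` are hypothetical helpers.

```python
def place_name(status):
    """Return the place name of a tweet, or None when the tweet carries no place."""
    place = status.get("place")  # null for the large majority of tweets
    return place.get("name") if place else None


def with_location(statuses):
    """Keep only tweets that have a place or exact coordinates attached."""
    return [s for s in statuses if s.get("place") or s.get("coordinates")]


# Made-up sample data, only to show the calls.
statuses = [
    {"text": "no location here", "place": None, "coordinates": None},
    {"text": "tagged", "place": {"name": "Oklahoma City"}, "coordinates": None},
]
print([place_name(s) for s in statuses])
print(len(with_location(statuses)))
```

With a location rate in the single digits, a short run can plausibly yield few or no located tweets, so filtering first and then reading the fields avoids null errors on the rest.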

twitter api 1.1 url count alternative

I've been using the old URL API (v1) to get the count for a given URL. Lately I also needed to get the retweets and started searching for a way to do that.
This is the exact URL I'm using right now:
http://urls.api.twitter.com/1/urls/count.json?url=http://google.com
From what I have read, the v1 API is deprecated, but at least it's still working.
I found some questions on Twitter's dev page:
https://dev.twitter.com/discussions/12643
Those questions are a little old and offer no specific solution to the problem. The nearest thing to a solution was using the Search API (search/tweets), which could be good but is not an exact replacement for the urls/count method.
Please note that Twitter's search service and, by extension, the
Search API is not meant to be an exhaustive source of Tweets. Not all
Tweets will be indexed or made available via the search interface.
It also has a limit of 100 results per 'page'. It does return a link to the next set of objects, which is good, but when a search reaches 1 million results I would have to go page by page to know how many tweets there are, making far too many requests to the API.
Some questions on Twitter's dev page suggested using the Streaming API. I tried statuses/filter, but it doesn't work very well when given a URL as the track param (which they say is the keyword to track).
So, has anyone who used the old urls/count found a reliable alternative in the new API v1.1, specifically to get the tweets and retweets for a given URL?
The official suggestion by Twitter staff is to use either the search/tweets endpoint (which only has the last 7 days of data) or the Streaming API (handling the counters yourself, which makes everything just too complicated for a d*mn counter).
As an extra warning, the old endpoint (http://urls.api.twitter.com/1/urls/count.json?url=YOUR_URL) will stop working on November 20th, and according to this blog post from Twitter there are no plans to replace it with anything in the short term and they are even removing the count from their own buttons.
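"Handling the counters yourself" might look something like the sketch below. It assumes tweets arrive as v1.1-style dicts, where links sit under `entities.urls[].expanded_url` and a retweet carries a `retweeted_status` object. `count_url_mentions` is a hypothetical helper, and the exact string match on the URL is a simplification (it ignores trailing slashes, query strings and shorteners).

```python
from collections import Counter


def count_url_mentions(tweets, url):
    """Tally original tweets and retweets whose entities link to `url`."""
    counts = Counter(tweets=0, retweets=0)
    for tweet in tweets:
        urls = tweet.get("entities", {}).get("urls", [])
        # Simplification: exact match only, no URL normalization.
        if not any(u.get("expanded_url") == url for u in urls):
            continue
        kind = "retweets" if "retweeted_status" in tweet else "tweets"
        counts[kind] += 1
    return counts


# Made-up sample data standing in for tweets received from a stream.
incoming = [
    {"entities": {"urls": [{"expanded_url": "http://google.com"}]}},
    {"entities": {"urls": [{"expanded_url": "http://google.com"}]},
     "retweeted_status": {"id": 1}},
    {"entities": {"urls": [{"expanded_url": "http://example.com"}]}},
    {"entities": {"urls": []}},
]
counts = count_url_mentions(incoming, "http://google.com")
print(counts["tweets"], counts["retweets"])
```

Unlike urls/count, a counter like this only covers tweets seen while the collector was running.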

Can I use twitter api to search tweets one week before?

I am trying to search for keywords on Twitter through tweepy.
However, it seems that I cannot search for tweets older than one week. The code below is the main search code.
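# Dates must be quoted strings; unquoted, 2014-02-03 is parsed as arithmetic.
for searched_tweets in tweepy.Cursor(API.search,
                                     q="python",
                                     since="2014-02-03",
                                     until="2014-02-04",
                                     lang="en").items():
    print(searched_tweets.text)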
I am not sure whether there is a limit, or whether there is a better way to search by time. Thanks for your help!
:)
Unfortunately, you can't get tweets older than a week with the twitter search API.
Also note that the search results at twitter.com may return historical results while the Search API usually only serves tweets from the past week. - Twitter documentation.
You can get specific tweets older than a week by looking at individual users or using specific post ids (if you have them) but it's not reasonable to index every single tweet ever to be searchable using the API.
If you need a large time range, you can collect them yourself using the streaming API or check out a service that does (see this dev twitter thread for examples).
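Collecting tweets yourself amounts to saving each one as it arrives, so that date ranges can be queried locally later. Below is a minimal sketch using Python's built-in `sqlite3`. It assumes v1.1-style tweet dicts with `id`, `created_at` and `text`, and UTC timestamps. `open_archive`, `store` and `search` are hypothetical helpers, and the step that feeds tweets in from the streaming API is left out.

```python
import sqlite3
from datetime import datetime

# Timestamp layout of v1.1 tweet objects, e.g. "Mon Feb 03 12:00:00 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def open_archive(path=":memory:"):
    """Open (or create) the local archive; in-memory by default for the demo."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS tweets ("
               "id INTEGER PRIMARY KEY, created_at TEXT, text TEXT)")
    return db


def store(db, tweet):
    """Save one tweet; the timestamp is kept as ISO 8601 text so it sorts by date."""
    created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    db.execute("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?)",
               (tweet["id"], created.isoformat(), tweet["text"]))


def search(db, keyword, since, until):
    """Return texts containing `keyword` with since <= created_at < until (ISO dates)."""
    rows = db.execute(
        "SELECT text FROM tweets WHERE text LIKE ? "
        "AND created_at >= ? AND created_at < ? ORDER BY created_at",
        (f"%{keyword}%", since, until))
    return [text for (text,) in rows]


# Made-up sample data standing in for tweets received from a stream.
db = open_archive()
store(db, {"id": 1, "created_at": "Mon Feb 03 12:00:00 +0000 2014",
           "text": "learning python"})
store(db, {"id": 2, "created_at": "Mon Jan 20 08:30:00 +0000 2014",
           "text": "python from weeks ago"})
print(search(db, "python", "2014-01-01", "2014-02-01"))
```

The archive only contains what was collected while it was running, so collection has to start before the period of interest.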

How can I obtain a sampling of all tweets from a specific time period?

I want to gather samplings of all tweets from the past year. Being able to request tweets from a specific date would be great, but I'll take what I can get.
I do not want to find tweets by a specific user or containing a specific term, just a sampling of all tweets. The Twitter search API claims that a query term is optional, but if I try an empty query like
http://search.twitter.com/search.atom
as opposed to giving a search term,
http://search.twitter.com/search.atom?q=twitter
the response is
<hash>
<error>
You must enter a query.
</error>
</hash>
If the API really doesn't provide any functionality for this type of query, how can I hack around it? Are tweet ids roughly sequential by date and can I somehow use this info to grab bunches of tweets centered around an id of a tweet whose date I know?
You are referencing the obsolete documentation. If you read the current version you will find that a query is required.
You should also know that the Search API only provides results going back about two weeks. You might be able to find historical data from sites like infochimps.
Not useful for historical data, but in case someone stumbles across this question looking for a sampling of all current tweets, you want the streaming API. (This is my first foray into Twitter and I hadn't noticed it. I only saw the public timeline method in the normal API.)
