I would like to retrieve the GPS longitude and latitude coordinates of Twitter users from their posts. I need highly granular geolocation, so I want to collect tweets whose location is automatically recorded by Twitter through the GPS and not self-reported by the user.
Previously, this was possible with the tweepy.Stream class, for example:
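```python
import tweepy

# [SW longitude, SW latitude, NE longitude, NE latitude], roughly the contiguous US
LOCATION = [-124.7771694, 24.520833, -66.947028, 49.384472]

class MyStreamListener(tweepy.Stream):
    def on_status(self, status):
        # status.coordinates is a GeoJSON Point only for exact GPS-tagged tweets
        print(status.coordinates)

stream = MyStreamListener(apikey, apikeysecret, accesstoken, accesstokensecret)
stream.filter(locations=LOCATION)
```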
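As a minimal sketch of what the `locations` box means, the helpers below check a received status against LOCATION locally. They assume the v1.1 payload shape, where an exact GPS fix appears in the status's `coordinates` field as a GeoJSON Point in [longitude, latitude] order. A status delivered by `filter(locations=...)` may match on its place rather than on exact coordinates, in which case `coordinates` is null. The names `point_in_box` and `gps_point`, and the sample payloads, are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Order used by the v1.1 `locations` filter:
# [southwest longitude, southwest latitude, northeast longitude, northeast latitude]
LOCATION = [-124.7771694, 24.520833, -66.947028, 49.384472]


def point_in_box(lon: float, lat: float, box: Sequence[float]) -> bool:
    """True if (lon, lat) lies inside the [W, S, E, N] bounding box."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def gps_point(status: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) from a v1.1 status dict, or None if there is no exact GPS point.

    Statuses matched via their `place` instead of exact coordinates have
    `coordinates` set to null (None after JSON decoding).
    """
    coords = status.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
        return float(lon), float(lat)
    return None


if __name__ == "__main__":
    # Hypothetical payloads, trimmed to the fields used here.
    with_gps = {"coordinates": {"type": "Point", "coordinates": [-73.99, 40.73]}}
    place_only = {"coordinates": None, "place": {"full_name": "Austin, TX"}}
    for status in (with_gps, place_only):
        point = gps_point(status)
        print(point, point is not None and point_in_box(*point, LOCATION))
```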
However, as stated in the Tweepy documentation, Twitter apps created on or after April 29, 2022 cannot use the Stream class:
New Twitter Developer Apps created on or after April 29, 2022 will not be able to gain access to v1.1 statuses/sample and v1.1 statuses/filter, the Twitter API v1.1 endpoints that Stream uses. Twitter API v2 can be used instead with StreamingClient.
Unfortunately, StreamingClient does not provide a locations parameter in its filter() method or in any other method of the class.
Does that mean Twitter has stopped providing this metadata to researchers?
First of all, addressing this comment from your original question...
so want to collect tweets whose location is automatically recorded by
Twitter t[h]rough the GPS and not self-reported by the user.
Twitter does not automatically record user location at all. It is entirely down to a user to choose to add location data to a Tweet. A relatively small proportion of Tweets carry location information, and far fewer of those carry specific GPS information since the option to add information at that level was removed from the Twitter app several years ago.
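To see how small the GPS-tagged share is in tweets you have already collected, you could classify each tweet by what its geo data contains. This is a minimal sketch, assuming v2 tweet payloads requested with the `geo` tweet field, where a precise location appears as `geo.coordinates` (a GeoJSON Point) and a place-only tag appears as `geo.place_id`. The function names and sample tweets are hypothetical.

```python
from collections import Counter
from typing import Iterable


def geo_kind(tweet: dict) -> str:
    """Classify a v2 tweet dict as 'gps', 'place', or 'none'."""
    geo = tweet.get("geo") or {}
    point = geo.get("coordinates") or {}
    if point.get("type") == "Point":
        return "gps"  # exact longitude/latitude attached by the user
    if geo.get("place_id"):
        return "place"  # only a named place (city, POI, ...) was tagged
    return "none"


def geo_breakdown(tweets: Iterable[dict]) -> Counter:
    """Count how many tweets fall into each geo category."""
    return Counter(geo_kind(t) for t in tweets)


if __name__ == "__main__":
    # Hypothetical tweets, trimmed to the fields used here.
    sample = [
        {"id": "1", "text": "no location"},
        {"id": "2", "geo": {"place_id": "abc123"}},
        {"id": "3", "geo": {"place_id": "abc123",
                            "coordinates": {"type": "Point", "coordinates": [-0.12, 51.5]}}},
    ]
    print(geo_breakdown(sample))
```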
To get into the details...
Streaming works differently in the modern version of the Twitter API.
In v1.1 you would pass filter parameters such as "track", "follow", and "locations" to a single API endpoint and then listen as matches came in.
In v2 of the API, you open a connection (via StreamingClient if you are using Tweepy) and create rules through a separate endpoint (in Tweepy, by passing StreamRule objects to StreamingClient.add_rules()). Each rule can combine a number of operators.
value (str | None) – The rule text. If you are using a Standard
Project at the Basic access level, you can use the basic set of
operators, can submit up to 25 concurrent rules, and can submit rules
up to 512 characters long. If you are using an Academic Research
Project at the Basic access level, you can use all available
operators, can submit up to 1,000 concurrent rules, and can submit
rules up to 1,024 characters long.
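The limits quoted above can be checked locally before submitting rules. This is a minimal sketch: the `LIMITS` table only restates the figures from the quoted documentation, and the dictionary keys and the `check_rules` name are hypothetical. It uses Python's `len()`, which may not match exactly how Twitter counts characters, and it does not detect operators your access level does not support.

```python
from typing import Dict, List

# Figures restated from the StreamRule `value` documentation quoted above.
LIMITS: Dict[str, Dict[str, int]] = {
    "standard_basic": {"max_rules": 25, "max_length": 512},
    "academic_basic": {"max_rules": 1000, "max_length": 1024},
}


def check_rules(rules: List[str], level: str) -> List[str]:
    """Return human-readable problems; an empty list means no quoted limit is exceeded."""
    limit = LIMITS[level]
    problems = []
    if len(rules) > limit["max_rules"]:
        problems.append(f"{len(rules)} rules exceeds the {limit['max_rules']}-rule limit")
    for i, rule in enumerate(rules):
        if len(rule) > limit["max_length"]:
            problems.append(
                f"rule {i} is {len(rule)} characters; limit is {limit['max_length']}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical rules: the second one is deliberately long.
    rules = ["has:geo place_country:US", "has:geo (" + " OR ".join(["coffee"] * 60) + ")"]
    for level in LIMITS:
        print(level, check_rules(rules, level) or "OK")
```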
You'll need to refer to the Twitter documentation on creating search queries for rules. At a high level, you can use the has:geo operator to find Tweets with geo information, and potentially the place: operator to narrow down to areas. That's how you can use the current API to filter based on location. Note that available operators may vary based on your Twitter API access level.
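A minimal sketch of that workflow with a recent Tweepy 4.x release, which provides `StreamingClient`, `StreamRule`, `add_rules()`, and `filter()`: add a rule combining `has:geo` and `place_country:`, then request the `geo` tweet field and the place expansion so location data comes back with each tweet. The rule text, the `GeoPrinter` class, the `build_geo_rule` helper, and the `BEARER_TOKEN` environment variable are assumptions. Whether the API accepts these operators depends on your access level, so the sketch first calls `add_rules(..., dry_run=True)` to validate the rule without saving it. Tweepy is imported only when a token is set, so the rule helper works without it.

```python
import os
from typing import Optional


def build_geo_rule(country_code: str, keywords: Optional[str] = None) -> str:
    """Build a filtered-stream rule for geo-tagged tweets in one country (hypothetical helper)."""
    parts = [keywords] if keywords else []
    parts += ["has:geo", f"place_country:{country_code}"]
    return " ".join(parts)


def main() -> None:
    token = os.environ.get("BEARER_TOKEN")
    if not token:
        print("Set BEARER_TOKEN to connect; the rule would be:", build_geo_rule("US"))
        return

    import tweepy  # imported here so build_geo_rule works without Tweepy installed

    class GeoPrinter(tweepy.StreamingClient):
        def on_tweet(self, tweet):
            # tweet.geo holds place_id and, only for exact GPS tags, coordinates
            print(tweet.id, tweet.geo)

    client = GeoPrinter(token)
    rule = tweepy.StreamRule(build_geo_rule("US"))
    print(client.add_rules(rule, dry_run=True))  # validate without saving the rule
    client.add_rules(rule)
    client.filter(
        tweet_fields=["geo"],
        expansions=["geo.place_id"],
        place_fields=["full_name", "country_code", "geo"],
    )


if __name__ == "__main__":
    main()
```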
Related
Does Twitter still allow searching tweets by location (for tweets that have been geo-tagged)? The articles I've read describe an advanced search that no longer works the same way, so I'm thinking Twitter discontinued this capability sometime within the past year or two.
I want to filter tweets by geolocation programmatically using their API, but I can't find documentation on how to do this anywhere. There is one search query operator called point_radius that is exactly what I want, but it appears to be accessible only to "Academic Research only" projects, while I've only been granted the "Standard product track".
TL;DR: I want to be able to retrieve the N most popular tweets for any arbitrary country within the last X hours (up to 24 hours)
More detail
I want to show the details of the most popular tweets by geographic region (country) over the past few hours (adjustable up to 24 hours). How can I use the Twitter REST API to achieve this (v1.1 or v2)?
There are endpoints for querying tweets and filtering by popularity, but they require a search string (e.g. "NASA") and return the most popular tweets matching that search string. I am not interested in the contents of the tweets, I just want to know what is most popular.
I plan on using this functionality to show a world map (using Leaflet) to summarise the most popular tweets by country for the past day.
I am using Twit in NodeJS, but I'm not looking for Node-specific answers, rather for how to leverage the capabilities of the API.
I am not aware of a way that this can be done directly through the API itself (V1 or V2). I also do not think that this is going to be a trivial task at all.
What I would suggest is using the search endpoint...
V1: Reference
V2: Reference (note that to use the geolocation search parameters described below, you'll need Academic Research access)
... in conjunction with one of the geolocation search parameters. For example, you could pull some subset of tweets from within a country (you will not be able to download all tweets within a single country on any given day, not to mention all countries). After you get this data, you'll need to do some of your own data processing based on how you want to define "popular" (e.g. retweets, likes, etc.) and then go from there.
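The "define popular, then rank" step could look like the following. This is a minimal sketch assuming v2-style tweet dicts with `created_at` (ISO 8601 with a trailing `Z`), `public_metrics` (`retweet_count`, `like_count`, and so on), and a country code you have already resolved yourself, for example from the tweet's place. The `country` key, the `WEIGHTS` values, and the function names are hypothetical choices, not anything the API defines.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Hypothetical weighting: how much each engagement type counts toward "popular".
WEIGHTS = {"retweet_count": 2.0, "quote_count": 2.0, "like_count": 1.0, "reply_count": 0.5}


def score(tweet: dict) -> float:
    """Weighted engagement score from the tweet's public_metrics."""
    metrics = tweet.get("public_metrics", {})
    return sum(w * metrics.get(name, 0) for name, w in WEIGHTS.items())


def top_by_country(tweets: List[dict], hours: int, n: int,
                   now: Optional[datetime] = None) -> Dict[str, List[dict]]:
    """Return the n highest-scoring tweets per country from the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for t in tweets:
        created = datetime.fromisoformat(t["created_at"].replace("Z", "+00:00"))
        if created >= cutoff and t.get("country"):
            buckets[t["country"]].append(t)
    return {c: sorted(ts, key=score, reverse=True)[:n] for c, ts in buckets.items()}
```

For the map use case, `top_by_country(tweets, hours=24, n=5)` would give one short list per country to plot; changing `WEIGHTS` is where your own definition of "popular" lives.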
As I said earlier, this seems like a very large project and not something that can be solved simply with the Twitter API.
There are Twitter platforms like http://www.socialbro.com/ or https://manageflitter.com/ that allow you to search users using filters such as gender, country, language, bio, age, profile photo...
How can they find users matching these conditions? I couldn't find any method for that in the Twitter API documentation.
For example, if you search for "marketing" on ManageFlitter, it returns 700k users with that word in their bio. How are they getting that many? The Twitter API only returns 1,000 users using the search/users method.
I found there are Google commands like [site:twitter.com bio:*keyword -inurl:status] which return every user with a keyword in their bio. Are these platforms using something like this?
There isn't a way to do that with the API, which is limited on every endpoint. The only way to access this much Twitter information is via a data provider. Twitter acquired Gnip, one of these companies, and there are others. You can subscribe to their service and get the entire history of tweets plus the other value-added benefits each of them offers. Unlike the API, you'll have to pay, but the trade-off is that you receive better features and service.
I am trying to extract data from Twitter. The data includes the tweets and the people who retweeted a particular tweet. I have 46,000 tweets and I need to find the retweeters for each tweet. Furthermore, the Twitter call retweet/id accepts only one ID at a time and is limited to 15 requests per 15 minutes.
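To put that limit in perspective, the wait can be estimated with simple arithmetic: at one ID per request and 15 requests per 15-minute window, 46,000 tweets need ⌈46,000 / 15⌉ = 3,067 windows, or 46,005 minutes (about 32 days) of continuous polling. A minimal sketch of that calculation; the function name is hypothetical, and it assumes every window's quota is fully used with no failed requests.

```python
import math


def minutes_needed(items: int, requests_per_window: int, window_minutes: int,
                   items_per_request: int = 1) -> int:
    """Minutes of polling needed if every window's quota is fully used."""
    requests = math.ceil(items / items_per_request)
    windows = math.ceil(requests / requests_per_window)
    return windows * window_minutes


if __name__ == "__main__":
    total = minutes_needed(46_000, requests_per_window=15, window_minutes=15)
    print(f"{total} minutes = {total / 60 / 24:.1f} days")
```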
Is there any way to surpass this limit and make unlimited calls?
Not through the REST API, no.
You may want to investigate Twitter's Streaming API to see if the functionality it provides meets your needs. Accessing it is a little more complex than the REST API, but it may help.
You will find people who will tell you to do things like set up dummy accounts and dummy applications. Don't do this. Twitter actively monitors the API for use patterns like this and you will find your applications and IP addresses blacklisted.
I'm trying to do a write-up of Twitter4J for part of a uni project, but I'm getting hung up on a few things. From the Twitter4J API:
void sample()
Starts listening on random sample of all public
statuses. The default access level provides a small proportion of the
Firehose. The "Gardenhose" access level provides a proportion more
suitable for data mining and research applications that desire a
larger proportion to be statistically significant sample.
This implies that by default, a "default access" is provided to the stream, but another type of access, "Gardenhose access" is available. Is this correct? And if so, how do you access the higher Gardenhose access?
I'm asking as I've seen some answers on SO suggest that there is only one level of access - the Gardenhose, and I'm trying to clear this up once and for all.
In addition to this, I would like a reference (if possible) to the number of tweets the sample stream allows access to. I've read lots of people cite 1% for "default access" and 10% for "gardenhose access" - but I can't find this anywhere in the API.
So to sum up, two questions:
Does the sample stream have a "default access" and a "gardenhose access", or just one of those?
How much of the Twitter firehose stream can these levels of access gain?
If replying, please include links to referenceable API documentation where possible.
The gardenhose is different from the default sample stream; you would have had to request access from Twitter in order to use it.
However, I am not sure if Twitter still allows access to the gardenhose, or even if it still exists. It seems the current mechanism may be to use one of Twitter's preferred data partners:
Using the Streaming API?
Every Twitter account can connect to a small sampling of the Streaming API. Accounts that need increased access for data gathering or analytical reasons should check out our preferred partners page.
(source)
It may be different for students or educational institutions, and the gardenhose may still be available to you. Previously you had to either e-mail api-research#twitter.com or use the following form, but I have no idea whether these methods still work; the post is quite old.
As for the percentage of Tweets that the default sample stream allows access to, the best reference I could find was a comment made by a Twitter employee on the developer forums - emphasis mine:
I would recommend just using the 1% sample stream from https://stream.twitter.com/1/statuses/sample.json that you can connect to with your Twitter account. It's unlikely that you'll be in a situation where you can access all of the data and will have to make do with a sample. At about 230 million tweets a day, you'd still be theoretically getting 2.3 million tweets a day.
(source)
Again, though, this is an old post.
Regarding the firehose stream: as the documentation specifies, you need to be granted permission to access it, and I believe very few people have full access to this stream:
GET statuses/firehose
This endpoint requires special permission to access.
Returns all public statuses. Few applications require this level of access. Creative use of a combination of other resources and various access levels can satisfy nearly every application use case.
Overall, documentation on the different access levels and what they offer is scarce. I suggest contacting Twitter directly to discuss your requirements, or contacting one of their data partners.
Apologies if this wasn't as concrete as you would have liked; good luck with your research.