Filter #tag tweets for a specific account using Twitter Streaming API - twitter

I am able to get tweets from a specific account using the streaming API. I can also manage to get tweets for specific #tags like below:
endpoint.trackTerms(Lists.newArrayList("twitterapi", "#myTwitter"));
and
endpoint.trackTerms(Lists.newArrayList("twitterapi", "#yolo"));
I wonder how to merge these two queries, as I want to get specific tweets (#yolo) from a specific user (#myTwitter).
Code can be found here
https://github.com/twitter/hbc

Take a look at Twitter's documentation on the streaming API, which explains how to track terms:
A comma-separated list of phrases which will be used to determine what
Tweets will be delivered on the stream. A phrase may be one or more
terms separated by spaces, and a phrase will match if all of the terms
in the phrase are present in the Tweet, regardless of order and
ignoring case. By this model, you can think of commas as logical ORs,
while spaces are equivalent to logical ANDs (e.g. ‘the twitter’ is the
AND twitter, and ‘the,twitter’ is the OR twitter).
The trackTerms method in twitter-hbc only joins terms with commas, so if you do this,
endpoint.trackTerms(Lists.newArrayList("#myTwitter", "#yolo"));
you will actually be asking for #myTwitter OR #yolo. Take a look at the implementation of the method trackTerms:
/**
 * @param terms a list of Strings to track. These strings should NOT be url-encoded.
 */
public StatusesFilterEndpoint trackTerms(List<String> terms) {
  addPostParameter(Constants.TRACK_PARAM, Joiner.on(',').join(terms));
  return this;
}
Instead of using trackTerms, you could add the terms directly to the endpoint, joined with spaces so that they are ANDed, like this:
endpoint.addPostParameter(Constants.TRACK_PARAM, Joiner.on(' ').join(Lists.newArrayList("twitterapi", "#yolo")));
Or of course you could create a new method.
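If you go the "new method" route, here is a minimal sketch of what such a helper could look like. TrackQuery, allOf and anyOf are hypothetical names, not part of hbc; the sketch only builds the string value of the track parameter, following the rule quoted from the documentation (spaces are AND, commas are OR), and uses plain String.join so it runs without hbc or Guava.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of hbc): builds the value of the "track" parameter.
class TrackQuery {

    // Terms inside one phrase are joined with spaces, which the API treats as AND.
    static String allOf(String... terms) {
        return String.join(" ", terms);
    }

    // Phrases are joined with commas, which the API treats as OR.
    static String anyOf(List<String> phrases) {
        return String.join(",", phrases);
    }

    public static void main(String[] args) {
        // (twitterapi AND #yolo) OR #myTwitter
        String track = anyOf(Arrays.asList(allOf("twitterapi", "#yolo"), "#myTwitter"));
        System.out.println(track);
        // With hbc on the classpath, the value could then be passed as shown above:
        // endpoint.addPostParameter(Constants.TRACK_PARAM, track);
    }
}
```

Keeping the AND and OR steps separate makes it harder to mix up the two separators when the list of terms grows.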
Hope it helps.

Related

exclude term in YouTube Data API without including term

I'm using the YouTube Data API's search.list method to return a list of videos by date. I'm interested in filtering out certain content without having to specify a search term. The documentation specifies that you can use the - operator as a Boolean NOT, but this only seems to work if I precede that with a search term, meaning I can do this:
q:'food -pizza'
which will return results for the query term 'food' but not 'pizza'. Now say I want it to return any result excluding pizza. You'd think this would work:
q:'-pizza'
but this returns an empty Array (no results). Am I doing this wrong? Is there a way to exclude certain terms without having to specify a search term to include beforehand?

Insights for Twitter service in IBM Bluemix: Search case-sensitive string and other questions

I am trying to search for a case-sensitive string in Twitter. I know the standard query is case-insensitive. How can I do case-sensitive search?
Also, the search arguments are "AND". Is there a way that I can write arguments and treat them as "OR"? To say it more clear, for illustration, I would like to search for tweets with either one of the following arguments: bio_location:"Philippines" or country_code:PH. I don't want to use "AND" because I am aware that there are users without bio_location and also some users only have country_code populated. So I want to get those who will satisfy any one of these arguments.
Another question, is there a way that I can filter out retweets?
Thank you!
2) "OR" is supported. See Query language for usage of the OR operator. Searching for "bio_location:"Philippines" OR country_code:PH" should work.
In answer to your questions....
1) You can use an 'exact phrase match' for case-sensitivity - see API docs for more on this. Not sure what your use case is, so this may or may not work for you.
2) Currently the 'OR' conditional is not supported by the API, or at least it is not documented in the API. I will get in touch with our developers to see if it is possible but not documented. In the meantime, you can make two API calls and process the data server side (such as removing duplicates).
3) Filtering out retweets is possible. See the API docs under GET /v1/messages/search. Look specifically at message.body, where it states: For Retweets, Twitter modifies the value of the body at the root level. Your application should look at the object.body to ensure that it is extracting the non-modified text.
Hope this helps!
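For the fallback mentioned in point 2 (two API calls, then removing duplicates server side), the merge step could look roughly like this. It is only a sketch: the Tweet class and its id and body fields are hypothetical stand-ins for whatever you parse out of the two responses, and no call to the service is made here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeResults {

    // Hypothetical minimal tweet holder; a real response carries many more fields.
    static final class Tweet {
        final String id;
        final String body;

        Tweet(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    // Union of two result lists, keeping the first occurrence of each id
    // and preserving the order in which tweets were first seen.
    static List<Tweet> union(List<Tweet> first, List<Tweet> second) {
        Map<String, Tweet> byId = new LinkedHashMap<>();
        for (Tweet t : first) {
            byId.putIfAbsent(t.id, t);
        }
        for (Tweet t : second) {
            byId.putIfAbsent(t.id, t);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        // Pretend these came from the bio_location query and the country_code query.
        List<Tweet> byLocation = Arrays.asList(new Tweet("1", "a"), new Tweet("2", "b"));
        List<Tweet> byCountry = Arrays.asList(new Tweet("2", "b"), new Tweet("3", "c"));
        for (Tweet t : union(byLocation, byCountry)) {
            System.out.println(t.id + " " + t.body);
        }
    }
}
```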

Parsing Wikipedia countries, regions, cities

Is it possible to get a list of all Wikipedia countries, regions and cities with relations between them? I couldn't find any API appropriate for this task.
What would be the easiest way to parse all the information I need?
PS: I know that there are other data sources I can get this information from, but I am interested in Wikipedia...
[2020 update] This is now best done using the Wikidata Query Service: you can run very specific queries with a bit of SPARQL. Example: Find all countries and their label. See Wikidata Query Help
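To give an idea of what the 2020 update means in practice, here is a rough sketch that builds a request URL for the Wikidata Query Service with a "countries and their labels" query. The class and method names are made up for illustration, the query uses the same instance of (P31) country (Q6256) pair discussed in this answer, and the code only constructs the URL; sending it (for example with java.net.http.HttpClient) is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class CountryQuery {

    // Public SPARQL endpoint of the Wikidata Query Service.
    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    // Items that are an instance of (P31) country (Q6256), with English labels.
    static final String SPARQL =
        "SELECT ?country ?countryLabel WHERE { "
        + "?country wdt:P31 wd:Q6256 . "
        + "SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\" . } "
        + "}";

    // The query text goes, URL-encoded, into the "query" parameter.
    static String buildUrl(String sparql) {
        return ENDPOINT + "?format=json&query="
            + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(SPARQL));
    }
}
```

Swapping Q6256 for another class in the query is the SPARQL counterpart of changing the id in the claim command described below.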
It might be a bit tedious to get the whole graph but you can get most of the data from the experimental/non-official Wikidata Query API.
I suggest the following workflow:
Go to an instance of the kind of entities you want to work with, say Estonia (Q191) and look for its instance of (P31) properties, you will find: country, sovereign state, member of the UN, member of the EU, etc.
Use the Wikidata Query API claim command to output every entity that has the chosen P31 property. Let's try with country (Q6256):
http://wdq.wmflabs.org/api?q=claim[31:6256]
It outputs an array of numeric ids: that's your countries! (notice that the result is still incomplete as there are only 141 items found: either countries are missing from Wikidata, or, as suggested by Nemo in comments, some countries are to be found in country (Q6256) subclasses(P279))
You may want more than ids though, so you can ask Wikidata Official API for entities data:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q16&format=json&props=labels|claims&languages=en|fr
(here Canada(Q16) data, in json, with only claims and labels data, in English and French. Look at the documentation to adapt parameters to your needs)
You can query multiple entities at a time, with a limit of 50, as follows:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q16|Q17|Q20|Q27|Q28|Q29|Q30|Q31|Q32|Q33|Q34|Q35|Q36|Q37|Q38|Q39|Q40|Q41|Q43|Q45|Q77|Q79|Q96|Q114&format=json&props=labels|claims&languages=en|fr
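Since the limit is 50 ids per call, a longer list of ids has to be split into batches. A minimal sketch of that step, assuming you already hold the ids as strings such as "Q16" (the class and method names are hypothetical, and the props and languages values are simply the ones from the URL above, with the | separator percent-encoded as %7C):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EntityBatches {

    // Limit of ids per wbgetentities call, as stated above.
    static final int MAX_IDS_PER_CALL = 50;

    // Builds one wbgetentities URL per batch of at most 50 ids.
    static List<String> buildUrls(List<String> ids) {
        List<String> urls = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += MAX_IDS_PER_CALL) {
            int end = Math.min(start + MAX_IDS_PER_CALL, ids.size());
            String joined = String.join("|", ids.subList(start, end));
            urls.add("https://www.wikidata.org/w/api.php?action=wbgetentities"
                + "&ids=" + URLEncoder.encode(joined, StandardCharsets.UTF_8)
                + "&format=json&props=labels%7Cclaims&languages=en%7Cfr");
        }
        return urls;
    }

    public static void main(String[] args) {
        // Dummy ids Q1..Q120, only to show how the list is split.
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 120; i++) {
            ids.add("Q" + i);
        }
        buildUrls(ids).forEach(System.out::println);
    }
}
```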
From each country's data, you could look for entities registered as administrative subdivisions (P150) and repeat on those new entities.
Alternatively, you can get the whole tree of administrative subdivisions with the tree command. For instance, for France (Q142) that would be http://wdq.wmflabs.org/api?q=tree[142][150] Tadaaa, 36994 items! But that's much harder to refine, given the different kinds of subdivision you can encounter from one country to another. And avoid running this kind of query from a browser, as it might crash.
You now just have to find cities by countries by refining this last query with the claim command, and the appropriate sub-class(P279) of municipality(Q15284) entity (all available here): for France, that's commune (Q484170), so your request looks like
http://wdq.wmflabs.org/api?q=tree[142][150] AND claim[31:484170]
then repeat for all the countries: have fun!
You should go with Wikidata and/or dbpedia.
Personally I'd start with Wikidata, as it runs directly on MediaWiki with the same API, so you can use similar code. I would use pywikibot to get started. That way you can still request pages from Wikipedia where that makes sense (e.g. list pages or categories).
Here's a nice overview of ways to access Wikidata

Does Youtube API have "AND and OR" search and explicit match search?

Does Youtube API allow me to do searches like
Search videos which have (in their title) strings both Lady Gaga AND (Cyrus OR Muse)
And does Youtube API allow me to do searches like
Search videos which have (in their title) string exactly Katy Perry. I don't want titles which have Katy Elizabeth Perry.
What's the most efficient code to write that type of search request? I want to code it using Ruby on rails.
I've gone through various introduction about how to search Youtube but they were mainly talking about other filtering things like relevance and view counts filtering.
AND is supported through include and exclude terms, just like the search query in the Web UI.
You can use -{query term} to exclude a query term, or |{query term} to OR.
For example {lady -gaga}, which as a full request looks like
https://www.googleapis.com/youtube/v3/search?part=snippet&q=lady+-gaga&key={YOUR_API_KEY}
You can also make separate calls, put results into sets and do all these operations in your client.
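The "separate calls plus sets" approach could be sketched like this for the Lady Gaga AND (Cyrus OR Muse) case from the question. The question asks about Ruby on Rails; this sketch is in Java and only covers the client-side combination step. It assumes you have already run three searches and collected the video ids of each into a set; the names are hypothetical and no API call is made.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class CombineSearches {

    // Computes a AND (b OR c) over sets of video ids from three separate searches.
    static Set<String> andOfOr(Set<String> a, Set<String> b, Set<String> c) {
        Set<String> either = new LinkedHashSet<>(b);
        either.addAll(c);            // union: b OR c
        Set<String> result = new LinkedHashSet<>(a);
        result.retainAll(either);    // intersection: a AND (b OR c)
        return result;
    }

    public static void main(String[] args) {
        // Made-up ids standing in for real search results.
        Set<String> ladyGaga = new LinkedHashSet<>(Arrays.asList("v1", "v2", "v3"));
        Set<String> cyrus = new LinkedHashSet<>(Arrays.asList("v2"));
        Set<String> muse = new LinkedHashSet<>(Arrays.asList("v3", "v4"));
        System.out.println(andOfOr(ladyGaga, cyrus, muse));
    }
}
```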

Twitter: How can I form a set of related hashtags?

Now that I know I can no longer communicate with Twitter mashups out there, how can I create a set of related hashtags? For instance, how can I get all tags similar or related to yankees?
You might be interested in the mathematical equations for clustering related things.
Another, naive option, would be to just look at what hashtags frequently (subjective, I know) appear with a known hashtag and work from there.
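A rough sketch of that naive co-occurrence count, assuming you already have the tweet texts as plain strings (fetching them is out of scope here). The class name and the simple #\w+ pattern are my own choices for illustration; what counts as "frequent" is left to you.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RelatedHashtags {

    // Simplified hashtag pattern: '#' followed by word characters.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Counts, per hashtag, how many tweets contain it together with the seed hashtag.
    static Map<String, Integer> coOccurrences(String seed, List<String> tweets) {
        String wanted = seed.toLowerCase();
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            Set<String> tags = new HashSet<>();
            Matcher m = HASHTAG.matcher(tweet);
            while (m.find()) {
                tags.add(m.group(1).toLowerCase());
            }
            if (!tags.contains(wanted)) {
                continue; // tweet does not mention the seed hashtag
            }
            for (String tag : tags) {
                if (!tag.equals(wanted)) {
                    counts.merge(tag, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
            "Go #Yankees! #MLB #baseball",
            "#yankees win again #mlb",
            "#redsox lose #mlb");
        System.out.println(coOccurrences("yankees", tweets));
    }
}
```

Sorting the resulting map by count and keeping the top entries gives a first set of related tags to start from.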
You can use a term extractor on the set of tweets returned for the topic of your choice. E.g.: get the list of tweets for the search query 'yankees' and apply a term extractor to that set of tweets. You can find Term Extraction APIs from Yahoo! and AlchemyAPI.
The result would be a set of important terms used in the tweets, and you can use them with a hash to search for more related information.
