Track multiple search terms with Twitter streaming

I would like to build a web application that tracks some user-defined search terms in real time and provides a real-time visualization. http://www.monitter.com/ is an app I've found that is similar in its requirements. What is the appropriate API to use for this? Initially I thought the streaming API was the obvious choice, but the limitation of one concurrent connection means that I can only track one search term at a time (with one user account). I could get around this by making multiple user accounts, but that seems like the wrong approach.
I looked at user streams but the language for that API seems to be more geared towards desktop applications.
So, what is the best API for my use case? Thanks.

Actually you can track up to 400 keywords/terms via one streaming API connection.
https://dev.twitter.com/docs/streaming-api/methods#track
Depending on the language you are using, there are multiple libraries available.
If you are using PHP, I can suggest Phirehose, as it works quite well and includes multiple examples for different usage scenarios.
http://code.google.com/p/phirehose/wiki/Introduction
What's not covered there: when processing received tweets, you will need to work out which tweet corresponds to which keyword/term, because the Twitter streaming API delivers all matching tweets in one combined stream.
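For example, a minimal matching sketch in Python (the term list is hypothetical, and this simplifies Twitter's actual rules, under which a multi-word term matches tweets containing all of its words):

    # A minimal sketch of matching incoming tweets back to tracked terms.
    # TRACKED_TERMS is hypothetical; Twitter treats a multi-word term as an
    # AND of its words, which this simplification mirrors.
    TRACKED_TERMS = ["python", "twitter api", "breaking news"]

    def matching_terms(tweet_text, terms=TRACKED_TERMS):
        """Return every tracked term whose words all appear in the tweet."""
        words = set(tweet_text.lower().split())
        return [t for t in terms if all(w in words for w in t.lower().split())]

    print(matching_terms("Loving the twitter api for realtime apps"))
    # -> ['twitter api']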

Investigating further using Firebug, I found that monitter.com simply polls the REST search API every second or so on the client side. This is what I ended up doing as well.
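For reference, the polling pattern looks roughly like this in Python; the endpoint and bearer token are placeholders, and you would want to respect rate limits rather than literally polling every second:

    import time
    import requests

    # The URL and token below are placeholder assumptions -- check the
    # current docs for the real search endpoint and authentication.
    SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
    HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}

    def poll(term, interval=1.0):
        since_id = None  # only request tweets newer than the last one seen
        while True:
            params = {"q": term, "result_type": "recent"}
            if since_id:
                params["since_id"] = since_id
            resp = requests.get(SEARCH_URL, headers=HEADERS, params=params)
            for status in resp.json().get("statuses", []):
                since_id = max(since_id or 0, status["id"])
                print(status["text"])  # hand off to your visualization here
            time.sleep(interval)  # watch the rate limits at this frequency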

How can I get share count using the new Twitter Search APIs?

I wonder if it's still possible to use the Search API on the client side via an AJAX call (C# would be fine too), but so far I'm finding a lot of talk about this and no code samples or good instructions on how it can be done.
Some sites like these are doing it, but obviously it has become much more complicated than it used to be.
I don't see any practical examples on https://dev.twitter.com/rest/public/search
Any suggestions?
As you've discovered, Twitter has turned off the sharecount API - see https://twittercommunity.com/t/a-new-design-for-tweet-and-follow-buttons/52791/5 and https://blog.twitter.com/developer/en_us/a/2015/hard-decisions-for-a-sustainable-platform.html
There are some alternatives like https://opensharecount.com/
That works by continually searching the Twitter API and counting up all the shares that it sees.
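A rough sketch of that counting approach, assuming the v1.1 search endpoint and a bearer token (both simplifications; the search API also only returns a few days of history, which is why the counting has to be continuous):

    import requests

    seen_ids = set()  # persist this between runs in a real service

    def update_share_count(url, bearer_token):
        resp = requests.get(
            "https://api.twitter.com/1.1/search/tweets.json",
            headers={"Authorization": "Bearer " + bearer_token},
            params={"q": url, "count": 100},
        )
        for status in resp.json().get("statuses", []):
            seen_ids.add(status["id"])  # de-duplicate across polls
        return len(seen_ids)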

What is a good strategy for staying up-to-date with external API's?

My project relies on several APIs, like Twitter and YouTube for example. Recently, YouTube deprecated their old API, and it caused issues with my team's iPad app.
We could have stayed ahead of the change if we had been paying attention to YouTube's announcements of the upcoming deprecation. But alas, we were not, and the idea of staying up to date with all of our dependencies manually (browsing the web) seems exhausting and inefficient.
I have found the following tool to help notify when changes occur with external library dependencies: https://libraries.io. However, this does not help with API dependencies.
Besides checking the API source webpages every so often, I was wondering if anyone had suggestions on how to stay notified and up to date with news regarding updates to a specified list of external APIs?
After some time looking at different options, I have found a solution that is not perfect, but seems to fit this need best.
Solution Description
This solution uses a combination of Twitter, Google Apps Script, and the website blogtrottr.com. I created a Twitter list of reliable dev handles that often post updates on their APIs. For example, I made a list that contained @twitterapi and @YouTubeDev. I then used Google Apps Script to create an online feed out of the Twitter list, and used Blogtrottr to email me every time that feed gets a new posting.
Steps to Implement
Create a Twitter list of reliable handles that often post about updates to their APIs.
Create an RSS feed from that Twitter list. The details of how to do this can be found here.
Plug the URL that you get from Google Apps Script into Blogtrottr.
I did find some other ways to do this, but so far this is the only solution that was 100% free!
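If you would rather not use Google Apps Script, a rough Python equivalent of the feed-building step looks like this; how you fetch the list's tweets is left as a placeholder, and only the RSS assembly is shown:

    import xml.etree.ElementTree as ET
    from email.utils import formatdate

    def build_rss(items):
        """items: iterable of (title, link, unix_timestamp) tuples."""
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = "API announcements"
        ET.SubElement(channel, "description").text = "Tweets from my dev list"
        ET.SubElement(channel, "link").text = "https://example.com/api-news"
        for title, link, ts in items:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = title
            ET.SubElement(item, "link").text = link
            ET.SubElement(item, "pubDate").text = formatdate(ts)
        return ET.tostring(rss, encoding="unicode")

Blogtrottr then only needs the URL where you serve this XML.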

Get public users of a service (Tumblr, Twitter)

Assuming it's not available as part of API, how can one obtain a full or partial list of public users of a web service, e.g. Twitter, Tumblr, YouTube?
Acceptable alternative: get a random public user.
I was interested in this for testing APIs with a random account. This is useful for catching edge cases when developing an app against the API; for example, when developing a Tumblr theme, it helps to see what volumes of text/images are posted, how special characters are used, and so on.
Can you even imagine a full list of (public) users of widely used web services? That's a vast amount of data. I hardly believe that any API would offer it, for several reasons:
performance/load issues,
data/information privacy,
potential for abuse, ...
For regular usage of the service's API you simply don't need it; needing it would smell of gray- or black-hat techniques.
Anyway, to answer your question objectively: in order to get a full or partial list of users from a web service, the service has to provide some kind of API that allows it. So a good starting point is to look at the documentation, for example the Twitter API, YouTube API, etc.
At a quick glance I don't see any method that would offer this. That might change in the future, but as mentioned above, I strongly doubt it.
Another option is to mine a partial list of users via search APIs, or by traversing the site with a robot; obtaining an existing list elsewhere is also an option. However, I would check whether this is even legal and not against the terms of use.
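As an illustration of the mining route only, a sketch like the following would collect a partial user sample from search results; the endpoint and auth are assumptions, and again, check the terms of service first:

    import random
    import requests

    SEED_QUERIES = ["the", "today", "new", "love"]  # arbitrary common words

    def sample_public_users(bearer_token, n=50):
        users = set()
        while len(users) < n:
            resp = requests.get(
                "https://api.twitter.com/1.1/search/tweets.json",
                headers={"Authorization": "Bearer " + bearer_token},
                params={"q": random.choice(SEED_QUERIES), "count": 100},
            )
            for status in resp.json().get("statuses", []):
                users.add(status["user"]["screen_name"])
        return users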

What's the best service to use for filtering out spam/abuse/malware links for a link shortening webapp?

I have two services, Lincr and LinkBunch. Lincr is a plain-Jane URL shortening service, while LinkBunch lets you shorten multiple links into one link. I've had too much spam posted to the services, so I had to shut down Lincr. Now even LinkBunch seems to be facing the same problem, and it has been disabled by my web host for that reason.
I can't keep shutting down sites like this because of bad links being posted, so I need a malware-filtering API that I can use to filter out the links as and when they are posted.
There are services that let me download an entire bunch of bad links to check against, but instead, I'd prefer doing a live API call on a per-link basis. What can I use for that?
Finally, what's the best malware filtering service out there?
Lincr is down. On LinkBunch, where is your CAPTCHA?
On either site, do you limit the number of posts by IP? Do you use a delay in your response? What about using hidden fields to reduce spam (http://www.reviewmylife.co.uk/blog/2008/05/30/hidden-field-spam-trap-for-phpformmail/)?
I know I'm dodging the question a bit, but you should at least take basic anti-spam measures before resorting to API calls. Even APIs will still fail for newly-hacked / newly-spammed sites.
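For example, the hidden-field trap from that link boils down to something like this (shown with Flask purely as an assumption; any framework works the same way):

    from flask import Flask, request, abort

    app = Flask(__name__)

    @app.route("/shorten", methods=["POST"])
    def shorten():
        # "website" is a decoy field hidden from humans with CSS; bots that
        # blindly fill in every field give themselves away by populating it.
        if request.form.get("website"):
            abort(400)
        long_url = request.form["url"]
        # ... rate-limit by IP here, then actually shorten long_url ...
        return "ok"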

Search Engine without crawling?

Is there a way to collect web content for use in a search engine without going through a web crawling phase? Is there any alternative to web crawling?
Thanks
No, to collect the content you have to...collect the content. :-)
Yes (and sort-of no).
:)
You can download existing data dumps from various websites (Wikipedia, Stack Overflow, etc.) and construct a partial index that way. It obviously won't be a complete index of the internet.
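The core of such an index is small; here is a toy inverted-index sketch in Python (real engines add stemming, ranking, and on-disk storage):

    from collections import defaultdict

    def build_index(docs):
        """docs: {doc_id: text}. Returns {token: set of doc_ids}."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for token in text.lower().split():
                index[token].add(doc_id)
        return index

    def search(index, query):
        """AND semantics: a document must contain every query word."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        result = set(index.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= index.get(token, set())
        return result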
You could also use meta-search to construct your search engine. This is where you use the APIs of other search engines and use their search results as the basis of your index. Examples include Citosearch and OpenSearch. DuckDuckGo uses Yahoo's BOSS API (and now Yahoo uses Bing...) as part of its search engine.
There are also real-time streaming APIs that you could use instead of crawling the web; look at DataSift as an example. There are many more resources you could use cleverly to avoid or minimize crawling.
If you want to stay updated with the latest content on pages, you can use something like the PubSubHubbub protocol to get push notifications for subscribed links.
Or use a paid service like Superfeedr that makes use of the same protocol.
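A subscription request in that protocol is just a form-encoded POST to the feed's hub, which verifies your callback and then pushes new content to it; here is a sketch with placeholder URLs:

    import requests

    def subscribe(hub_url, topic_url, callback_url):
        resp = requests.post(hub_url, data={
            "hub.mode": "subscribe",
            "hub.topic": topic_url,        # the feed you want pushed to you
            "hub.callback": callback_url,  # your endpoint; the hub verifies it
        })
        return resp.status_code == 202     # hubs accept subscriptions async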
Directly or indirectly, you have to crawl the web in order to get the content.
Well, if you don't want to crawl, you can follow a wiki-like approach, where users submit links to sites (with a title, description, and tags). A collaborative link collection can be built this way.
To avoid spam, a +/- system can be involved, to vote useful sites or tags up and useless ones down.
To prevent spammers from mass-voting SERPs, you can weight votes by user reputation (a sketch of this follows below).
User reputation can be gained by submitting useful sites, or by somehow tracing usage patterns.
And by considering other abuse patterns too.
Well, you get the point, I think.
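A sketch of the vote-weighting idea, with made-up weights:

    def score(votes):
        """votes: (direction, voter_reputation) pairs, direction +1 or -1."""
        # Capping one voter's weight limits damage from a single account.
        return sum(d * min(rep, 100.0) for d, rep in votes)

    def updated_reputation(rep, submission_score):
        return max(0.0, rep + 0.1 * submission_score)  # made-up growth factor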
As spammers gradually discover the weaknesses of traditional search engines (see Google bombs, content scraper sites, etc.), a community-based approach may work. But it would suffer severely from the cold-start effect, and while the community is small, the system is easy to abuse and poison...
At least Wikipedia and Stack Exchange have not been spammed to useless levels so far...
PS: http://xkcd.com/810/
