Good way to get streaming Twitter data using Apache Storm? - twitter

I'm building a Twitter crawler system. Requirement is to crawl both Twitter Profile and Twitter Streaming. There's a project manager which puts all projects (1 project is a Twitter profile, or a keyword for Twitter Streaming) into Kafka. Then Storm will read from Kafka to get project metadata and start to run. The project manager will check all projects periodically and eventually restart the project (by putting data into Kafka), so every project has the latest data. I have a couple of questions:
Since we need to keep a connection to Twitter Streaming, we cannot let a Bolt run for a very long time for the Twitter streaming project. Can you suggest a good way to do this, like implement a separate process for crawling?
Another question is about tokens. We want each access token to run on one server only in order to improve stability and prevent reaching the rate limit too soon. When a project (tuple) starts to be processed in Storm, it would be assigned an access token of its supervisor IP. Is there any good solution for this? Someone recommends me to use Zookeeper to assign access token but I'm not sure if it's a good way and how to implement?

Related

how can we retrieve the publicly stored statements from Tin Can API?

what Tin Can API can do other than storing the state of the agent and how can we retrieve the publicly stored statements from Tin Can API
Thanks in advance
You can do a lot with the Tin Can API (Experience API). The point of the xAPI is to store user experiences, anything from I completed a course to I started watching a video. I've seen or worked on things as simple as using the xAPI to send SCORM tracking to an LRS, to support mobile, tracking sensor data from field exercises, to storing information collected in games and simulations. And the Experience API gives you the ability, like you said, to get data back out in a standard way, to support reporting and evaluation of data.
There are groups working with the Experience API to do interesting things. https://groups.google.com/a/adlnet.gov/forum/#!forum/xapi-design
There is also a spec working group forum where you can get more resources and answers: https://groups.google.com/a/adlnet.gov/forum/#!forum/xapi-spec
There are also resources and articles talking about what you can do with the Experience API. http://www.adlnet.gov/tla/experience-api/
and http://en.wikipedia.org/wiki/Tin_Can_API
There are some open source projects on ADL's GitHub page that also show how you can use the Experience API. https://github.com/adlnet
For sending and retrieving info from an LRS in web browsers there's a JavaScript library: https://github.com/adlnet/xAPIWrapper .. it's been built and minified..you can just include the xapiwrapper.min.js in your page and use the readme examples to get started.
For reporting and querying data you can look at the new project: https://github.com/adlnet/xAPI-Dashboard
There's a starting Java library to make talking to an LRS easier in Java, which could be used for regular Java apps or for Android apps: https://github.com/adlnet/jxapi
They're also starting a JQuery Mobile Plugin: https://github.com/adlnet/xapi-jqm
And even an example of using the Experience API with MedBiquitous and Common Core competencies to identify learner's progress toward becoming competent in some aspect: https://github.com/adlnet/xci
As for your question about getting statements from an LRS, you would just do a GET request to the statements endpoint. The spec currently says that requests must include the Experience API version header: https://github.com/adlnet/xAPI-Spec/blob/master/xAPI.md#62-api-versioning . And you will probably need to authenticate as a client using the LRS. This is generally done by registering on the LRS and getting some sort of credentials. This will vary based on the LRS you use, but they all have instructions on how to use and send the credentials. https://github.com/adlnet/xAPI-Spec/blob/master/xAPI.md#64-security
ADL's hosted example LRS opened up the GET statements endpoint so that people new to the Experience API could hit it and see statements without needing to figure out the request rules: https://lrs.adlnet.gov/xapi/statements

How to create a server accessible by an iphone app

I a thinking of creating an iPhone/iOS app that would include a feature where one user could create a list of words and then save them to their account on a server. Also (and this is very important), the user could share their list with other users by giving them permission.
So my question is, how can I go about creating such a server? For right now, I have a home computer (running Windows XP that just stores data for my music system) which I can use to host the server. I am also open to the use of other online storage services like Google Drive or Dropbox (I can't remember if Amazon does anything like that). However (and I know this may complicate things a bit), but at least for now, I want/need to stick with free services/options.
Just to recap, the key features that I am looking for are:
create users/accounts (on the server)
eventually I may [try] to incorporate the use of other services to log users in like with their email account, OpenId, etc.
the ability to access (log in to) the server (with credentials) from my app
the ability to send/receive data between the server and my app
the ability to share data between users
I know this is a lot to ask for, but if anyone has any suggestions or can get me going in the right direction, it would be much appreciated.
The basic setup would be as follows:
Backend: Database (MySQL), Web server (Apache), with server side scripting (PHP).
Client: iOS device with developed app.
Communication: use HTTP client/server model, communicating with something like JSON.
This is much the same setup as a web server, but instead of serving html/css/javascript etc the results will be JSON.
As far as implementing specifics such as login in, and sharing data between users, this is purely dependent on your implementation. This is not trivial, and not something that can be easily stated in a single post.
Hope this helps.
You could build your own webservice in PHP, Ruby or Python. If you do so I would recommend building a RESTful webservice (http://en.wikipedia.org/wiki/Representational_state_transfer) and then use RestKit (http://restkit.org/) to handle the data in the iOS app. Especially RestKit's CoreData integration is nice in my opinion.
Another solution would be using a service like Parse (https://parse.com/products/data). The first million or so requests per month are free but after that it could get pricy. I personally have not tried it so I couldn't tell you if it is any good.

How can I retrieve updated records in real-time? (push notifications?)

I'm trying to create a ruby on rails ecommerce application, where potential customers will be able to place an order and the store owner will be able to receive the order in real-time.
The finalized order will be recorded into the database (at the moment SQLite), and the storeowner will have a browser window open, where the new orders will appear just after the order is finalized.
(Application info: I'm using the HOBO rails framework, and planning to host the app in Heroku)
I'm now considering the best technology to implement this, as the application is expected to have a lot of users sending in a lot of orders:
1) Each browser window refreshes the page every X minutes, polling the server continuously for new records (new orders). Of course, this puts a heavy load on the server.
2) As above, but poll the server with some kind of AJAX framework.
3) Use some kind of server push technology, like 'comet' asynchronous messaging. Found Juggernaut, only problem is that it is using Flash and custom ports, and this could be a problem as my app should be accessible behind corporate firewalls and NAT.
4) I'm also checking node.js framework, seems to be efficient for this kind of asynchronous messaging, though it is not supported in Heroku.
Which is the most efficient way to implement this kind of functionality? Is there perhaps another method that I have not thought of?
Thank you for your time and help!
Node.js would probably be a nice fit - it's fast, loves realtime and has great comet support. Only downside is that you are introducing another technology into your solution. It's pretty fun to program in tho and a lot of the libraries have been inspired by rails and sinatra.
I know heroku has been running a node.js beta for a while and people were using it as part of the recent nodeknockout competition. See this blog post. If that's not an option, you could definitely host it elsewhere. If you host it at heroku, you might be able to proxy requests. Otherwise, you could happily run it off a sub domain so you can share cookies.
Also checkout socket.io. It does a great job of choosing the best way to do comet based on the browser's capabilities.
To share data between node and rails, you could share cookies and then store the session data in your database where both applications can get to it. A more involved architecture might involve using Redis to publish messages between them. Or you might be able to get away with passing everything you need in the http requests.
In HTTP, requests can only come from the client. Thus the best options are what you already mentioned (polling and HTTP streaming).
Polling is the easier to implement option; it will use quite a bit of bandwidth though. That's why you should keep the requests and responses as small as possible, so you should definitely use XHR (Ajax) for this.
Your other option is HTTP streaming (Comet); it will require more work on the set up, but you might find it worth the effort. You can give Realtime on Rails a shot. For more information and tips on how to reduce bandwidth usage, see:
http://ajaxpatterns.org/Periodic_Refresh
http://ajaxpatterns.org/HTTP_Streaming
Actually, if you have your storeowner run Chrome (other browsers will follow soon), you can use WebSockets (just for the storeowner's notification though), which allows you to have a constant connection open, and you can send data to the browser without the browser requesting anything.
There are a few websocket libraries for node.js, but i believe you can do it easily yourself using just a regular tcp connection.

How can I get twitter running on my local server?

I want to put the Twitter service on my server and customize it for my purpose. I have no idea how it works.
My goal is to communicate to your own Twitter server rather than the original twitter server and serve my purpose.
You should check out: StatusNet. It is an open source micro blogging platform. From their site, you can download the source and deploy it on your own server. Once you have it installed you can customize it to your liking.
Twitter isn't an Open Source project - they don't provide their server code.
From my experience at another company deploying very widely distributed systems, the chances are there's a bucket-load of infrastructure you'd need to get running first - complete overkill for a single-server solution, but vital for a global service with many millions of users. In other words, even if Twitter did provide their code, it probably wouldn't be an appropriate solution for your situation.
The actual Twitter (twitter.com) service is proprietary, you can't run it yourself.
There are plenty of open source twitter clones out there. The more general name is "microblogging". Pinax for example has basic microblogging. Try searching google for 'open source microblogging' for other projects.
I don't believe the Twitter platform is freely available to the general public. If you want to make your own "Twitter server", you're going to have to clone the service yourself.
You can't run Twitter on your own server, but you can write your own application that talks to Twitter through Twitter's API.
It all depends on what you mean by "customizing" Twitter. There are many applications like Twitpic and TweetDeck that are built "on top of" Twitter. They add their own functionality while leaving Twitter to do the "heavy lifting".
For example, I have written a personal project for moderating a stream of tweets. This application runs on my local server, but it gets its data by querying Twitter's API.
There are two main advantages to extending rather than rebuilding Twitter:
It takes a lot less effort because you can reuse all the basic functions of Twitter
You can take advantage of Twitter's huge user base. Even if you succeeded in cloning Twitter, it would be far less interesting than the original because Twitter works by strength of numbers.
You could use Wordpress and get the twitter developer add in then get a api code from them and there users can use your site and vice versa also apps for twitter will work for your site.
Wow. That's a highly ambitious request that you have there. Twitter isn't like Wordpress, there's no .org version that can be downloaded and run locally. Twitter is a highly scalable service that is designed to run on large scale servers.
Sorry to be the bearer of bad news to you on this.

How to provide your app with a network API

I am going to write a Ruby application that implements a video conversion workflow consisting of multiple audio and video encoding/processing steps.
The application interface has two core features:
queueing new videos
monitoring the progress for each video
The user can access these features using a website written in Ruby on Rails.
The challenge is this: I want make the workflow app a self-sufficient application, not dependent on the existence of the web view.
To enable this separation I think that adding a network API to the workflow application is a good solution because this allows the workflow app to reside on a different server than the web server.
My question is: Which solution do you suggest for such a network API?
A few options are:
implement a simple TCP server and invent my own string based API
use some sort of REST api (I don't know if this is appropriate for this situation)
some sort of web-services solution (SOAP, XML-RPC)
another existing framework
Feel free to share your thoughts on this.
I would suggest two things:
First, use REST as your API. This allows you to write one core application with both a user interface and an API for outside applications to use.
Second, take a look at PandaStream. It's a Merb application that encodes videos from multiple formats into flash. It has a REST API, and there's even a Rails plugin so you can integrate it with your application. It might be a good example codebase, or even a replacement for the one you're trying to build.
Hope my answer helped,
Mike

Resources