How can I use the YouTube SUP API to retrieve recent uploads of some predefined users? - ruby-on-rails

I want to be able to check for the latest videos (in near real time, or at most a couple of minutes out) for a set of users (up to 200 or so) in a single call to the YouTube API, and then store the IDs of the uploaded videos in my own database. I believe the only solution for this is the YouTube SUP API, but I'm not entirely clear on how it works and was wondering if someone could explain it. I have read the entire API documentation on it but am still not completely clear.
My assumption was that one can call the SUP URL (http://gdata.youtube.com/sup), check whether the user's hash has had any recent activity, and if it has, do something with that. My issue is that I don't understand how to interpret an activity entry like ["b305e88","afd4"] in the SUP feed. Also, is there any way to specify a subset of users, or must you search through the entire feed? It seems to take a fair few seconds to load the SUP feed.
On the SUP API page it also states that you can visit a URL such as https://gdata.youtube.com/feeds/api/users/bbc/events?v=2 to obtain the hash key for a user's feed, but as you can see if you try to visit it, the link appears to be broken. How else could I obtain the hash?
I currently want to do this in a Rails project using the youtube_it gem, but I don't believe it has support for this. Correct me if I'm wrong.
Edit
My mistake. The developer key is required to obtain a user's events, e.g. https://gdata.youtube.com/feeds/api/users/bbc/events?v=2&key=YOUR_DEVELOPER_KEY
Still no progress with the SUP method, although I'm considering creating a channel and automatically subscribing to each user; every minute I would then poll for the list of new videos from those users.
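A rough sketch of that polling fallback, assuming the youtube_it gem's videos_by(:user => ...) call and a Video model with a unique yt_video_id column (the usernames and developer key below are placeholders):

    require 'youtube_it'

    # Placeholder developer key and usernames; Video is an assumed ActiveRecord
    # model with a unique yt_video_id column.
    client = YouTubeIt::Client.new(:dev_key => 'YOUR_DEVELOPER_KEY')
    usernames = %w[bbc someotheruser]

    usernames.each do |username|
      client.videos_by(:user => username).videos.each do |video|
        # skip uploads we have already stored
        next if Video.exists?(:yt_video_id => video.unique_id)
        Video.create!(:yt_video_id => video.unique_id,
                      :username    => username,
                      :title       => video.title)
      end
    end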

I'd suggest using PubSubHubbub: http://apiblog.youtube.com/2010/10/pubsubhubbub-for-youtube-activities.html
A handler in your web application will automatically receive a POST whenever one of the feeds you're watching is updated, and the content of the POST will be the updated feed itself, saving you the trouble of having to fetch it.
There isn't much documentation specific to using PuSH and the YouTube API beyond that blog post, but the general PuSH docs all apply: https://pubsubhubbub.appspot.com/
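To make that concrete, a minimal Rails sketch of such a handler might look like the following; the controller name, route, and logging are mine, not anything YouTube-specific, and it only covers the hub's GET verification plus the POSTed Atom payload:

    # Hypothetical PubSubHubbub callback controller (Rails 3-era syntax).
    # The hub verifies the subscription with a GET (echo back hub.challenge),
    # then POSTs the updated Atom feed to the same URL when entries change.
    class PushCallbacksController < ApplicationController
      skip_before_filter :verify_authenticity_token

      # GET /push_callback: subscription verification handshake
      def show
        render :text => params['hub.challenge'], :status => 200
      end

      # POST /push_callback: updated feed delivered by the hub
      def create
        feed = RSS::Parser.parse(request.raw_post, false)
        feed.entries.each do |entry|
          # store whatever you need; here we just log the entry ID
          Rails.logger.info "Updated entry: #{entry.id.content}"
        end
        head :ok
      end
    end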
Failing that, SUP should still work, so we could try to debug that further if you'd rather use that.

Related

Beginner Question: How to access the number of impressions from *other users'* tweets?

I've got a bunch of free online HTML, CSS, and JS tutorials under my belt and I want to try using them to make a browser extension. But I want to make sure that the data I want to use is actually accessible before getting started.
My goal is to make a browser extension for twitter.com that shows the number of impressions of any tweet next to the likes, retweets, and replies. My basic idea is to get the status URL of any given tweet, poll the Twitter API for the number of impressions of that tweet, store that in a variable, and then use CSS to display a little eye icon and the number stored in the impressions variable.
I know that I can find the number of impressions for all of my own tweets, both through Twitter Analytics and also by going to my profile page and clicking the little bar chart icon next to views, retweets, etc. But I'm not clear on whether I can do that for other people's tweets via Twitter's API or anything else. Is that possible?
For the record, I'm not too concerned about the varying definition of "impression," since it will be consistently applied across all tweets and I'm mostly interested in giving users a comparison between tweets. This is part of a research project to see how this might change how people engage with social media if they know how many views a given post has. If there's a simpler way to go about that using existing platforms, I'm open to suggestions.
Thanks for the advice!
No, impressions data is private. If you are authenticated to the Twitter API then you can use the new Twitter Developer Labs Tweets API to get private metrics like impressions, but you cannot get that for other people's Tweets. Also, the Twitter API does not support CORS, so I don't think you'll be successful trying to use it from a browser extension.

Why does iTunes Store Reviews RSS feed sometimes return no results?

I'm trying to import reviews for certain apps on the iTunes App Store via the public reviews RSS feed. Most of the time the feed returns a list of 50 reviews per page, and gives me links for up to 10 pages. But in the case of some apps, some or all of those pages have 0 reviews, and I can't tell why.
At the time of this writing, the feed for Instagram (link below) returns no reviews, despite reporting that there are 10 pages of reviews available.
https://itunes.apple.com/us/rss/customerreviews/page=1/id=389801252/sortBy=mostrecent/xml
Even more confusing, I noticed last night that page 2 had 50 reviews but none of the other pages had any. This morning, page 2 is empty again.
If I remove the sortBy=mostrecent portion of the URL above, I actually do get 50 results back, but none of the other pages have any results.
Finally, it appears as if the JSON version of this feed (link below) returns results more reliably than the XML version. Unfortunately, the JSON version leaves out the review date, so I can't use it.
https://itunes.apple.com/us/rss/customerreviews/page=1/id=389801252/sortBy=mostrecent/json
Can anyone explain this? Is Apple's XML feed API just extremely unreliable? Am I forming a bad URL?
While this answer isn't very satisfying, it's the best I could work out after many trials. It appears as if the XML feed is really fallible and shouldn't be used for real-world usage. Furthermore, when using the public JSON feed, certain fields such as review date are missing. Neither feed reports developer response.
It's also clear that Apple doesn't use these feeds for iTunes (desktop) or App Store (iOS). I ultimately reverse-engineered the way iTunes requests review data and figured out that making a request the same way, making sure to match their User Agent and version, would return the data I needed. These requests seem to be rate-limited to a certain extent and the data comes as a mix of HTML and JSON that requires a lot of parsing. Furthermore, because they're private calls, Apple could easily shut the door at any moment.
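For reference, fetching the public JSON feed from the question in Ruby is straightforward; this is only a sketch, and the field names (feed.entry, title.label, im:rating.label) reflect how I remember the feed being shaped, so verify them against a live response:

    require 'net/http'
    require 'json'
    require 'uri'

    # JSON variant of the reviews feed from the question.
    uri = URI('https://itunes.apple.com/us/rss/customerreviews/page=1/id=389801252/sortBy=mostrecent/json')
    response = Net::HTTP.get_response(uri)
    entries = JSON.parse(response.body).fetch('feed', {}).fetch('entry', [])

    entries.each do |entry|
      title  = entry['title']     && entry['title']['label']
      rating = entry['im:rating'] && entry['im:rating']['label']
      puts "#{rating} stars: #{title}"
    end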

How do I get the retweet_count of a given tweet WITHOUT user interaction?

I have a list of tweets that use a hashtag I made. I'm getting these tweets using the Search API. All I want is to get the number of retweets. I DO NOT need to post on their behalf. It seems ridiculous that I would need to have every single user log in to my site, log in to Twitter, and approve my application via OAuth for EVERY TWEET IN MY LIST. There's got to be a way to get that number without OAuth.
I tried getting it directly from the Search API, but it's not consistently there. I've tried https://api.twitter.com/1/statuses/show/275729088709283840.json but that doesn't work, for some reason. Is there any way to do this extremely simple task without going down the asinine road of user interaction?
You have to create a background process that uses the Streaming API. Phirehose is a PHP library that is set up to do this: https://github.com/fennb/phirehose
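If you would rather stay in Ruby and keep polling, a bare-bones sketch against the statuses/show endpoint the question mentions could look like this; at the time, API v1 allowed a limited number of unauthenticated requests per IP, and newer API versions require OAuth, so treat this purely as an illustration of reading retweet_count from the response:

    require 'net/http'
    require 'json'
    require 'uri'

    # Sketch only: read retweet_count from statuses/show for a single tweet ID.
    def retweet_count(tweet_id)
      uri = URI("https://api.twitter.com/1/statuses/show/#{tweet_id}.json")
      response = Net::HTTP.get_response(uri)
      return nil unless response.is_a?(Net::HTTPSuccess)
      JSON.parse(response.body)['retweet_count']
    end

    puts retweet_count(275729088709283840)  # ID from the question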

Twitter Profile Image API deprecated

I am using version 1 of the Twitter API and am migrating to v1.1.
However, I cannot find the users/profile_image/:screen_name API in version 1.1.
https://api.twitter.com/1/users/profile_image/shonanshachu
Does anyone know the best way to replace users/profile_image?
What I want is a list of profile images, or a simple URL that takes a Twitter ID or screen name as a parameter.
Maybe this will be useful for you: the calls below return all the info you need, including the image link. If you only need one user's image for a given user_id or screen_name, then you can read more here:
https://dev.twitter.com/docs/user-profile-images-and-banners
https://api.twitter.com/1.1/followers/list.json (you can replace followers with friends)
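As a hedged example of the single-user case with a recent version of the `twitter` gem (just one way to do it; the credentials are placeholders, and depending on the gem version the reader method may be profile_image_url_https or profile_image_uri_https, mirroring the profile_image_url_https field in the API's user object):

    require 'twitter'

    # Placeholder credentials; users/show is authenticated and rate-limited,
    # so cache the URL you get back rather than calling this per page view.
    client = Twitter::REST::Client.new do |config|
      config.consumer_key        = 'YOUR_CONSUMER_KEY'
      config.consumer_secret     = 'YOUR_CONSUMER_SECRET'
      config.access_token        = 'YOUR_ACCESS_TOKEN'
      config.access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
    end

    user = client.user('shonanshachu')
    puts user.profile_image_url_https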
I haven't found a great solution to this yet. The closest thing I have found is the Twitter API 1.1 docs for users/show, but that is an authenticated call (requiring a user context) which is rate-limited (180 requests per 15 minutes). They mention this about getting the profile image URL on the page about uploading a new photo:
You can either update your local cache the next time you request the user's information, or, at least 5 seconds after uploading the image, ask for the updated URL using GET users/show.
I imagine once the 1.0 API is shut off in March 2013, this question will get a lot of votes. :)
There are some hints that Twitter is adding methods to their API to aid in retrieving images. We'll have to wait and see.

Ruby Rss parser and event trigger

I'm using the RSS library so I can parse Atom and RSS in Ruby and Rails and store it in a model.
I've looked at the standard RSS library, but is there one library that will auto-detect that there is a new RSS feed so I can update my database?
What is the best practice to trigger an instruction in order to store the new RSS feed?
Should I use threads to handle that problem? Is it going to be slow?
Thank you for your help.
OK, here's the deal.
If you want a really fast feed parser, go for Feedzirra. It does not work on Windows. http://github.com/pauldix/feedzirra
Autodiscovery?
- There's truffle-hog if you don't want to do GET redirects. http://github.com/pauldix/truffle-hog
- There's feedbag if you want to do GET redirects to find feeds from given URLs. This is slower, though. http://github.com/damog/feedbag
Feedzirra is the best bet if you want to poll for new entries in your feed. But if you want a more non-polling solution to your problem, then I would suggest going through the PubSubHubbub spec. While parsing your feeds, make sure they are PubSubHubbub enabled: check for the link tag. If it points to pubsubhubbub.appspot.com or any other PubSub-enabled hub, then just subscribe to the feed by sending a subscription request to the hub. You can then define an endpoint in your app which will in turn receive updated entry pings for your feed subscription from the hub. Just read the raw POST data and store it in your database. Stats are that 95% of Blogger blogs are PubSub enabled. That is a lot of data in your hands already. :)
If you are polling for changes, then you should check the Last-Modified or ETag headers rather than parse the entire feed again. That saves you from wasting resources. Feedzirra takes care of this for you.
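A short sketch of that Feedzirra polling loop (the feed URL and sleep interval are placeholders; fetch once, then call update and only touch new_entries, which is also where the ETag/Last-Modified handling comes in):

    require 'feedzirra'

    # Fetch once, keep the feed object around, then update it periodically.
    feed = Feedzirra::Feed.fetch_and_parse('http://example.com/atom.xml')

    loop do
      feed = Feedzirra::Feed.update(feed)
      if feed.updated?
        feed.new_entries.each do |entry|
          puts "#{entry.title} - #{entry.url}"
          # persist the entry here
        end
      end
      sleep 15 * 60  # respect the feed's update frequency instead of hammering it
    end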
I am not sure what you mean by "auto-detect" a new feed?
Are you looking for code that can discover when someone creates a new feed on a site? Or, do you mean discover when an existing feed has a new article?
The first is tough because your code needs to know what site to look at, so it needs some sort of auto-discovery of sites with new feeds. Searching Google for "new RSS feeds" doesn't return anything that looks useful, at least not on the first page. If you, or your users, know of a new site, then you can have an interface to add new sites to search. Then you grab the page at that URL, look for the RSS/Atom auto-discovery links, and go from there. Auto-discovery links can open a can of worms because of duplicate content being served using different protocols (RDF, RSS and Atom), so you have to determine which to use, or handle multiple feeds with alternate content being listed.
If you mean you want to discover when an existing feed has new articles, then you have to keep track of the last time your code looked at the feed, and the last article that was seen, then retrieve the feed and see if any articles were not in your list of previously seen articles. Your code needs to be sensitive to the time-to-live information in a lot of feeds too. Hitting the feed every fifteen minutes when they update once a week is bad form. Most aggregation code can do those things already but you might need to configure a database and tell the code how to find it.
Generally, for this sort of task I set up a crontab entry on a production Linux or Unix system and fire off the job periodically, looking in the database for feeds whose last-run-time plus the stored time-to-live value is in the past.
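As a sketch of that cron-driven check (every name here is made up: a Feed model with last_run_at and ttl_minutes columns and a refresh! method that fetches the feed and stores unseen articles):

    # Hypothetical rake task invoked from cron, e.g.:
    #   */15 * * * * cd /path/to/app && bundle exec rake feeds:refresh
    namespace :feeds do
      desc 'Refresh feeds whose time-to-live has expired'
      task :refresh => :environment do
        Feed.find_each do |feed|
          # skip feeds whose TTL has not yet elapsed since the last run
          next if feed.last_run_at && feed.last_run_at + feed.ttl_minutes.minutes > Time.now
          feed.refresh!
          feed.update_attributes(:last_run_at => Time.now)
        end
      end
    end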
Does that help any?
A very easy solution is to use dynamic attribute-based finders.
When you are filling your model with RSS feed data, instead of Model.create(...) use Model.find_or_create_by_column(value, :other_column => other_value).
You can use the date or the RSS message title as the unique value (whatever you want).
I think this is pretty easy. You can set up a cron task to fill your model once per hour, for example. Only new entries will be added.
There is no way to get an "event" when the RSS feed is updated without downloading the whole feed again.
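Concretely, the find_or_create approach from the answer above might look like this sketch (the Item model, its columns, and the feed URL are assumptions; on Rails 4+ the dynamic finder would become Item.where(...).first_or_create(...)):

    require 'rss'
    require 'open-uri'

    # Parse the feed and insert only entries we haven't stored yet,
    # using the guid as the unique value.
    feed = RSS::Parser.parse(open('http://example.com/feed.rss').read, false)

    feed.items.each do |entry|
      Item.find_or_create_by_guid(entry.guid.content,
                                  :title        => entry.title,
                                  :link         => entry.link,
                                  :published_at => entry.pubDate)
    end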
