Twitter Streaming API - tracking multiple keywords in exact order

I'm just beginning to play with the Twitter Streaming API.
If I specify
$sc->setTrack(array('just bought from'));
This correctly pulls only tweets that contain all three keywords, but it doesn't preserve their order.
1) I want the keywords to appear in the same order, as in
"I just bought apple from itunes"
but the above also returns tweets like
"I bought some apples and just removed them from the bag"
2) Is there a way to specify an exact phrase, say "NBA basketball", with nothing in between? I don't want tweets like this one to be returned:
Watching basketball on NBA tv
I only want tweets that contain the exact phrase, like:
I love watching NBA basketball
3) Is there also a way to specify negative keywords?
Any tips on whether any of this is possible?
Thanks

Currently the answer to all three questions is no. The general recommendation is to do this post-processing on your side. Negative keywords have been requested quite a bit, but we currently don't have a scheme that would let us support them in a scalable way.
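A minimal sketch of that client-side post-processing in Python, assuming you track the individual keywords on the stream and receive each tweet's text as a plain string. The helper names (`words`, `in_order`, `has_phrase`, `has_none`) are hypothetical, and the lowercase-word tokenizer is a simplifying assumption; hashtags, mentions, and URLs would need more care.

```python
import re


def words(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def in_order(text: str, keywords: list[str]) -> bool:
    """Question 1: every keyword appears in the given order, gaps allowed."""
    remaining = iter(words(text))
    # `k in remaining` consumes the iterator up to the first match, so each
    # keyword can only match after the previous one.
    return all(k.lower() in remaining for k in keywords)


def has_phrase(text: str, phrase: str) -> bool:
    """Question 2: the phrase's words appear consecutively, nothing in between."""
    tokens, target = words(text), words(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def has_none(text: str, negatives: list[str]) -> bool:
    """Question 3: none of the negative keywords appears as a word."""
    tokens = set(words(text))
    return not any(neg.lower() in tokens for neg in negatives)


if __name__ == "__main__":
    stream = [
        "I just bought apple from itunes",
        "I bought some apples and just removed them from the bag",
    ]
    kept = [t for t in stream if in_order(t, ["just", "bought", "from"])]
    print(kept)  # only the first tweet survives
```

Filtering locally like this means the stream still delivers every tweet that merely contains the keywords, so expect to discard a share of what you receive.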

Related

How to walk through YouTube?

I am curious what the most efficient way to walk the YouTube website would be. My goal is to eventually index all videos on YouTube (hypothetically), and the only way I can think of is to go channel by channel, indexing all of the videos. I am not very familiar with the v3 API, so if there is a better way to accomplish this, please let me know. This raises a few problems I can think of:
Where to begin? Channels and videos are accessed using random string IDs, so if I simply start with IDs beginning with 'A' I am going to run into a lot of null values. I'm not sure how IDs are assigned, but if the characters of an ID depend on the kind of video, working through them alphabetically may also keep the indexing stuck in one segment of video types.
I am hoping to move methodically through the YouTube directory, trying to avoid accidentally indexing the same channel/video twice.
Should I somehow separate the videos into groups and request them based on other parameters? A grouped scheme may be easier to work with, update, etc.
I won't know if the video has anything I am interested in indexing before accessing it.
First, you need to understand that there are far too many videos for you to do this without direct access to YouTube's own infrastructure, which you do not have and will not get.
To automate the selection of videos, you can try enumerating video IDs.
They are 11 characters long and consist only of a-z, A-Z, 0-9, _ and -. Knowing that format narrows what you have to scan when checking whether a video exists, although the space is still 64 to the power of 11 candidate IDs. When one exists, save that ID (with related info) and move on.
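The arithmetic behind "far too many" can be checked in a few lines of Python. This sketch assumes the 11-character alphabet described above and a purely hypothetical rate of 1,000 existence checks per second; it also includes a format check you could run before spending a request on a candidate ID. The names `looks_like_video_id` and `years_to_scan` are illustrative only.

```python
import re
import string

# Characters allowed in a YouTube video ID: a-z, A-Z, 0-9, "_" and "-".
ID_ALPHABET = string.ascii_letters + string.digits + "_-"
ID_LENGTH = 11
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")

# Number of syntactically possible IDs: 64 ** 11, which equals 2 ** 66.
SEARCH_SPACE = len(ID_ALPHABET) ** ID_LENGTH

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def looks_like_video_id(candidate: str) -> bool:
    """Format check only; it says nothing about whether the video exists."""
    return ID_PATTERN.fullmatch(candidate) is not None


def years_to_scan(checks_per_second: float) -> float:
    """Time to try every possible ID at a given (hypothetical) request rate."""
    return SEARCH_SPACE / checks_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(SEARCH_SPACE)  # 73786976294838206464
    print(looks_like_video_id("kffacxfA7G4"))  # True
    print(f"{years_to_scan(1000):.2e} years at 1,000 checks/second")
```

At that assumed rate, a full scan would take on the order of billions of years, which is why brute-force enumeration is only practical for sampling, not for a complete index.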
Not a perfect option, but best I can see with your options and requirements.

How to filter out unwanted/official Twitter posts

I am working on an NLP project that needs data from Twitter.
I want tweets posted by "real people" rather than by any kind of "official account", including celebrities, ads, institutions, media, etc., such as #CNN #TodayWeather #obama #DailySale #BestPrice #FashionTrend.
Is there a good way to do this?
I have thought about this for a long time. With Twitter's API, the returned JSON includes a key called "verified", which could be used to detect whether an account is that kind of "official account". Today, however, the blue "V" tick is not only for celebrities: anyone can apply for it as long as they are a real person. So I think relying on it would rule out a lot of valuable data.
I also considered using a text-based spam filter. Such filters work well in most cases. However, some accounts, such as #FT, post content that never sounds like a spammy ad, yet it is still not what I want.
I'm looking for a better solution. It can be a long-term one, such as training NLP models or neural networks on labeled data, but a quick solution would be very welcome.
Thanks
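For reference, the "verified" filter described in the question amounts to a one-line check on each tweet object. A minimal Python sketch, assuming classic (v1.1-style) tweet JSON in which the author's details sit under a "user" key with a boolean "verified" field; the sample tweets and the function name `from_unverified_author` are made up for illustration. As the poster points out, this both keeps unverified official accounts and drops verified individuals, so it can only be a first pass.

```python
import json


def from_unverified_author(tweet: dict) -> bool:
    """Keep tweets whose author is not flagged as verified.

    Assumes v1.1-style JSON: {"text": ..., "user": {"verified": bool, ...}}.
    A missing "user" object or "verified" key is treated as unverified.
    """
    return not tweet.get("user", {}).get("verified", False)


if __name__ == "__main__":
    # Hypothetical sample data, one JSON object per line as a stream would deliver it.
    raw_lines = [
        '{"text": "Breaking news from our newsroom", "user": {"screen_name": "some_outlet", "verified": true}}',
        '{"text": "my cat knocked over my coffee", "user": {"screen_name": "someone", "verified": false}}',
    ]
    tweets = [json.loads(line) for line in raw_lines]
    kept = [t["text"] for t in tweets if from_unverified_author(t)]
    print(kept)  # ['my cat knocked over my coffee']
```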

How to collect millions of tweets?

I was browsing through fflick, a nicely made app built on top of Twitter. How do they
1) collect millions of tweets?
2) accurately (mostly) categorize tweets into positive or negative sentiment?
They probably collect millions of tweets by crawling Twitter through its API, most likely by tracking film-related keywords with the Streaming API, or simply by searching their own timeline for what their followers have to say about films.
As for the sentiment categorization, I don't know. Probably with some natural language processing techniques from good old AI textbooks. :-)
2) Look for smileys: ;), :), :D, :(
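The smiley idea can be sketched as a crude rule-based classifier. This is a minimal illustration, not how fflick actually works: the emoticon lists are just the four from the answer above, plain substring counting is an assumption, and anything without a clear majority is labelled "unknown".

```python
# Emoticons from the answer above; real tweets use many more variants.
POSITIVE = (":)", ";)", ":D")
NEGATIVE = (":(",)


def emoticon_sentiment(text: str) -> str:
    """Label a tweet by counting positive vs. negative emoticons."""
    pos = sum(text.count(e) for e in POSITIVE)
    neg = sum(text.count(e) for e in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "unknown"  # no smileys, or a tie


if __name__ == "__main__":
    for tweet in ["Loved that film :D", "Worst sequel ever :(", "Saw a film today"]:
        print(tweet, "->", emoticon_sentiment(tweet))
```

Most tweets contain no emoticon at all, so on its own this labels only a fraction of them; emoticon-labelled tweets are also sometimes used as noisy training data for a proper text classifier.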
A few places now provide the latter as a service. Check out ViralHeat and Evri:
http://www.viralheat.com/home/features
http://www.readwriteweb.com/archives/sentiment_analysis_is_ramping_up_in_2009.php

Search results from data api full of pirated music videos?

I'm using the YouTube data API. It's worrisome that a lot of the content my users are searching for is likely pirated material - in other words, music videos that probably don't have the artist's permission to be on YouTube.
I see YouTube has a 'YouTube music' URL, which appears to be approved by participating artists. I'm not sure whether these results take precedence over others, or whether they are even allowed to show up in third-party search results. For example, if I search for:
Bad Romance - Lady Gaga
will the 'official' music video appear in the results? How can we tell?
I'm thinking of shutting down my service because I can't control this; I want to play fair.
Thanks
To make this a programming question, I'm going to assume you mean "can I programmatically determine Google's opinion of their agreement with the copyright holder?" If so, I think the answer is no. See for example: http://www.youtube.com/watch?v=kffacxfA7G4 - no music URL or other indicator (that I can see), but I'd guess that at 270m views it must have attracted sufficient attention for its copyright agreement status to be resolved.
So far as the rest of your question goes:
"How can I tell whether the artist's rights holder has an agreement with Google that covers a specific video?" - this is something to be answered by a court, not a programming question.
The ethical question raised by co-operation with Google as a potential infringer of copyright is not a programming question either.

Contest ranking question - how to rank entries in multiple categories?

I'm currently developing a video contest web application using Ruby on Rails. It integrates closely with YouTube, which it uses for submitting videos, comments, average rating, and popularity stats. The application will also count Twitter and (possibly) Facebook mentions, and count the number of times visitors have clicked an "Add This" social network button.
Instead of direct voting it will use each video's YouTube rating and social media presence to pick a winner.
My question is: What is the fairest method for ranking the entries?
My basic idea is to just find each video's ranking in each category separately by sorting the results of an ActiveRecord query, then compute the average of all these numbers and use it as the video's master rank. Then I'd sort all the entries by this rank, with the lowest number coming in first, etc. Is this a fair way to rank the contest entries?
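The rank-averaging scheme described in the question is easy to prototype outside Rails. This Python sketch stands in for the ActiveRecord sorting; it assumes each entry's stats are already loaded into a dict (the function name, category names, and numbers are hypothetical) and that a higher raw value is better in every category. Ties are broken arbitrarily by input order.

```python
def average_ranks(stats: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank entries per category (1 = best), then sort by mean rank, lowest first."""
    categories = list(next(iter(stats.values())))
    totals = {entry: 0 for entry in stats}
    for category in categories:
        # Higher raw value = better, so sort descending before assigning ranks.
        ordered = sorted(stats, key=lambda e: stats[e][category], reverse=True)
        for rank, entry in enumerate(ordered, start=1):
            totals[entry] += rank
    return sorted(
        ((entry, total / len(categories)) for entry, total in totals.items()),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    stats = {
        "video_a": {"rating": 4.8, "views": 1200, "tweets": 30},
        "video_b": {"rating": 4.2, "views": 5000, "tweets": 10},
        "video_c": {"rating": 3.9, "views": 800, "tweets": 45},
    }
    for entry, mean_rank in average_ranks(stats):
        print(entry, round(mean_rank, 2))
```

One property worth weighing for fairness: averaging ranks ignores margins, so in the sample data video_b's large lead in views counts for no more than a one-view lead would.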
Shouldn't the contest organizer be telling you how they want everything rated?
I would personally count the number of YouTube ratings submitted for each entry, add up their scores, then divide by that count to get an average score, and then supplement that somehow with social media mentions, but it is up to the organizers to tell you which should carry more weight. They have to understand that you can design the app to do whatever they want, but they are in charge of letting you know precisely what they want. That sort of decision should not be left up to the designer. Let them wrestle with it in committee for a bit, and don't sweat the actual algorithm until they come up with the answer for you.
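To illustrate why the organizers have to choose the weights, here is a minimal Python sketch of the approach just described: average each entry's YouTube ratings, then blend in social media mentions. The 1-5 rating scale, the normalization of mentions against the most-mentioned entry, the function name, the weight values, and the sample data are all assumptions.

```python
def blended_scores(entries, rating_weight, social_weight):
    """Return (entry, score) pairs, best first.

    Each entry maps to {"ratings": [...], "mentions": int}. Ratings are assumed
    to be on a 1-5 scale; mentions are scaled against the most-mentioned entry.
    """
    most_mentions = max(e["mentions"] for e in entries.values()) or 1
    scores = {}
    for name, e in entries.items():
        ratings = e["ratings"]
        average = sum(ratings) / len(ratings) if ratings else 0.0
        scores[name] = (rating_weight * average / 5.0
                        + social_weight * e["mentions"] / most_mentions)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    entries = {
        "video_a": {"ratings": [5, 4, 5], "mentions": 10},
        "video_b": {"ratings": [3, 4], "mentions": 40},
    }
    # Same data, two hypothetical weightings, two different winners.
    print(blended_scores(entries, rating_weight=0.9, social_weight=0.1)[0][0])  # video_a
    print(blended_scores(entries, rating_weight=0.7, social_weight=0.3)[0][0])  # video_b
```

The code is trivial; the weights are the real decision, and with identical data they alone determine the winner here.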
It depends on what you're trying to accomplish. However, it seems to me that the social media score is pointless. The net result is that someone bothered to watch and/or rate the video on YouTube. Those scores alone should tell you if someone is "doing a good job" on the social media front.

Resources