I would like to crawl YouTube for videos in a specific language that contain subtitles/closed captions (CC).
For example,
I want to crawl for 200 random English videos with English subtitles/(CC).
I want to crawl for 300 random Chinese videos with Chinese subtitles/(CC).
I want to crawl for 550 random Malay videos with Malay subtitles/(CC).
There's an API here that helps to extract transcripts, but the main bottleneck right now is that I have to go to YouTube, search for these videos, and watch them one by one to find out whether they are indeed in the correct language and whether they really contain subtitles/CC.
One option is:
Use the YouTube Data API search request to find videos that have subtitles; for that, set the videoCaption parameter to closedCaption.
You might need other parameters to narrow the search to specific topics or to get the desired results. For example, set the q parameter to a search term that retrieves the results you want, and consider parameters such as videoDuration, type=video, and relevanceLanguage.
Once you have the results, take the videoId values from the response and feed them to your web crawler to fetch more videos and the related ones.
For anyone still struggling with this: per the YouTube Data API documentation, for videoCaption to work you also need to set the type parameter's value to video:
If you specify a value for this parameter, you must also set the type parameter's value to video.
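The search request described above can be sketched as a small URL builder. This is only an illustration under stated assumptions: the function name build_caption_search_url, the query "berita", and the language code "ms" are made up for the example, and YOUR_API_KEY is a placeholder. Note that relevanceLanguage is documented as a relevance hint rather than a strict filter, so the language of the returned videos and their captions still needs to be checked afterwards.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_caption_search_url(query, language, api_key, max_results=50):
    """Build a search.list URL restricted to videos that have captions.

    Hypothetical helper for illustration; it only builds the URL and
    does not send the request.
    """
    params = {
        "part": "snippet",
        "type": "video",                  # required whenever videoCaption is set
        "videoCaption": "closedCaption",  # only videos that have captions
        "relevanceLanguage": language,    # a hint, not a strict filter
        "q": query,
        "maxResults": max_results,        # search.list accepts 0 to 50
        "key": api_key,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Example: an assumed Malay search term with the ISO 639-1 code "ms".
url = build_caption_search_url("berita", "ms", "YOUR_API_KEY")
```

The resulting URL can be fetched with any HTTP client; the video IDs are found under items[].id.videoId in the JSON response.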
I need to get a list of the most viewed music videos on YouTube.
I used this API:
https://youtube.googleapis.com/youtube/v3/videos?part=snippet%2CcontentDetails%2Cstatistics&chart=mostPopular&regionCode=US&videoCategoryId=10&key=[YOUR_API_KEY]
By default regionCode is US; I want to get global results.
Can someone help me?
You can't really. Using the chart parameter means you must include a region, and the default is the US.
regionCode (string)
The regionCode parameter instructs the API to select a video chart available in the specified region. This parameter can only be used in conjunction with the chart parameter. The parameter value is an ISO 3166-1 alpha-2 country code.
Removing &regionCode=US should give the same results as US:
GET https://youtube.googleapis.com/youtube/v3/videos?part=snippet%2CcontentDetails%2Cstatistics&chart=mostPopular&key=[YOUR_API_KEY]
You would have to query EACH region and take the videos with the highest cumulative viewCount over all regions. This would be difficult as different regions will have different music videos that are most popular. Unfortunately the API doesn't have a global regionCode option.
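The per-region approach could be sketched as a merge step over results that have already been fetched. This is a sketch, not a definitive implementation: the function name top_videos_across_regions and the sample data are hypothetical, and it assumes that statistics.viewCount is the video's overall total (returned as a string), so a video that charts in several regions is counted once instead of being summed.

```python
def top_videos_across_regions(region_results, limit=10):
    """Merge per-region chart results and rank videos by view count.

    region_results maps a region code to the list of video resources
    returned by videos.list for that region (part=statistics included).
    """
    best = {}
    for items in region_results.values():
        for item in items:
            views = int(item["statistics"].get("viewCount", 0))
            video_id = item["id"]
            # Assumption: viewCount is a global total, so keep one entry
            # per video rather than adding the regions together.
            if video_id not in best or views > best[video_id][0]:
                best[video_id] = (views, item)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [item for _, item in ranked[:limit]]


# Made-up sample data standing in for two regional responses.
sample = {
    "US": [
        {"id": "a", "statistics": {"viewCount": "100"}},
        {"id": "b", "statistics": {"viewCount": "50"}},
    ],
    "DE": [
        {"id": "a", "statistics": {"viewCount": "100"}},
        {"id": "c", "statistics": {"viewCount": "75"}},
    ],
}
top_ids = [video["id"] for video in top_videos_across_regions(sample, limit=2)]
```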
I'm trying to use the YouTube Data API for a project; the goal is to search for a keyword within a channel and show the response to the user. I used Search with the "snippet" part, querying for a specific keyword and specifying the channel ID, but the response didn't include all the videos I was expecting. For instance, say the channel has 10 videos with the character "c" in the title: setting the q field to "c" returns only one video.
On the other hand, if I search for a whole word, it returns some videos with that word in the title and some other videos that don't have it in either the title or the description. The ordering in this case seems to be OK (from the strongest match to the weakest), but I don't know whether all of this is working correctly.
Is this a normal behaviour or am I doing something wrong?
Setting the order parameter on the search seems to solve the issue.
Is there any way to get up-to-date lists of the top 100 YouTube videos by genre and/or by country?
Any kind of resources like json files or xml.
You'll want to use the chart parameter of the videos.list endpoint. Set the chart to mostPopular, and then include a regionCode parameter and videoCategoryId parameter for further narrowing down. For example,
GET https://www.googleapis.com/youtube/v3/videos?part=snippet&chart=mostPopular&regionCode=UA&key={YOUR_API_KEY}
Will retrieve the 5 most popular videos in Ukraine.
GET https://www.googleapis.com/youtube/v3/videos?part=snippet&chart=mostPopular&maxResults=25&regionCode=DE&videoCategoryId=1&key={YOUR_API_KEY}
Will retrieve the 25 most popular videos in Germany that relate to Film/Animation. And so on.
Note that if you don't include a videoCategoryId parameter, it will return results from all categories. If you don't include a regionCode, it returns the most popular videos across all regions. You can only set videoCategoryId to a value that's valid in the region you're searching in (you can use the videoCategories.list endpoint to find valid categories for regions, languages, etc.)
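As a rough illustration of how these parameters combine, the requests above could be generated by a small helper. The name build_chart_url is hypothetical and the key "K" is a placeholder. Since maxResults is capped at 50 per request, a top-100 list would need a second request that passes the nextPageToken from the first response as pageToken.

```python
from urllib.parse import urlencode

VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_chart_url(api_key, region_code=None, category_id=None,
                    max_results=5, page_token=None):
    """Build a videos.list URL for the mostPopular chart.

    Hypothetical helper; optional parameters are left out of the URL
    when they are not given.
    """
    params = {
        "part": "snippet",
        "chart": "mostPopular",
        "maxResults": max_results,  # at most 50 per request
    }
    if region_code:
        params["regionCode"] = region_code        # ISO 3166-1 alpha-2 code
    if category_id:
        params["videoCategoryId"] = category_id   # must be valid in that region
    if page_token:
        params["pageToken"] = page_token          # nextPageToken of the previous page
    params["key"] = api_key
    return VIDEOS_ENDPOINT + "?" + urlencode(params)


# Reproduces the Germany / Film & Animation example with a placeholder key.
german_film_url = build_chart_url("K", region_code="DE", category_id="1",
                                  max_results=25)
```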
I am trying to search for "Food+Show" in two YouTube channels, ABCNetwork and FoxBroadcasting. The query I gave is
http://gdata.youtube.com/feeds/api/videos?v=2&alt=jsonc&q=Food+Show&max-results=3&authors=ABCNetwork,FoxBroadcasting&prettyprint=true
The first result I got was id UKfLsIgJB1g, where the uploader is wafelsanddinges and not ABC or Fox. Please tell me why my query is not returning the correct result.
The parameter for the v2 data API is "author," not "authors." Unfortunately, fixing that won't solve the problem, as the retrieval of videos from a particular channel can only accept one author at a time. This is also true for v3 of the API.
The reason behind this is that the comma is treated as a concatenator, looking for a video that was published on FoxBroadcasting AND ABCNetwork (the use case for having multiple authors in that parameter is if you are retrieving activity feeds, in which case you want both feeds so having the comma serve as an AND is correct).
So for now, the only solution is two separate calls.
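The two-call workaround might look like the sketch below. The fetch argument is a hypothetical callable standing in for whatever function performs a single-channel search (one author or channel per request); search_each_channel and the fake data are made up for the example and are not part of any YouTube client library.

```python
def search_each_channel(fetch, query, channels):
    """Run one search per channel and merge the results without duplicates.

    fetch(query, channel) is assumed to return a list of dicts that each
    have an "id" key.
    """
    merged = []
    seen = set()
    for channel in channels:
        for video in fetch(query, channel):
            if video["id"] not in seen:
                seen.add(video["id"])
                merged.append(video)
    return merged


# A fake fetcher with made-up data, used in place of real API calls.
def fake_fetch(query, channel):
    data = {
        "ABCNetwork": [{"id": "abc1"}, {"id": "abc2"}],
        "FoxBroadcasting": [{"id": "fox1"}, {"id": "abc2"}],
    }
    return data[channel]


videos = search_each_channel(fake_fetch, "Food Show",
                             ["ABCNetwork", "FoxBroadcasting"])
```

Results come back grouped by channel in the order the channels were listed; any ranking across channels would have to be applied afterwards.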
I'd like to retrieve the tweets for a given hashtag and sort them from the most retweeted to the least retweeted.
The closest thing I've found is using the search call with the result_type parameter:
E.g.: http://search.twitter.com/search.json?q=TheHashTagHere&result_type=popular
However, I'm not sure how the "popular" option works.
For instance, if it finds 100 tweets with that hashtag, I believe it should show the X most retweeted tweets, and if none of those tweets have been retweeted then it should show X of them randomly (or sorted in some other way, such as most recent first).
Unfortunately, it follows some kind of unknown rule to decide what's popular and what's not, and even hashtags with thousands of tweets might return only one or two results.
I hope I made myself clear. Thanks in advance :)
PS: I'll use PHP but I think that shouldn't affect the question?
Results will sometimes contain a result_type field into the metadata with a value of either "recent" or "popular". Popular results are derived by an algorithm that Twitter computes, and up to 3 will appear in the default mixed mode that the Search API operates under. Popular results include another node in the metadata called recent_retweets. This field indicates how many retweets the Tweet has had.
Source (emphasis mine)
Just call with result_type=popular and check the recent_retweets node to see how popular each tweet is. result_type=popular will become the default in an upcoming release, so beware if you omit this parameter.
Results with popular tweets aren't ordered chronologically. *
If you would like to always have results to show, use result_type=mixed: recent results will have result_type in the "metadata" section with a value of "recent", and popular results will have "popular". A small reference about result_types:
mixed: Include both popular and real time results in the response.
recent: return only the most recent results in the response
popular: return only the most popular results in the response.
If a search query has any popular results, those will be returned at the top, even if they are older than the other results. *
*[Twitter API Announcements]
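Once the search response has been parsed, sorting by the recent_retweets node could be done as in the sketch below. It assumes each result is a dict with an optional "metadata" dict, as described in the quoted documentation; the function name sort_by_retweets and the sample tweets are made up. Results without a recent_retweets value are treated as having zero retweets and keep their original relative order.

```python
def sort_by_retweets(results):
    """Sort parsed search results from most to least retweeted."""
    def retweets(tweet):
        # Recent (non-popular) results carry no recent_retweets node.
        return int(tweet.get("metadata", {}).get("recent_retweets", 0))

    # sorted() is stable, so ties keep their original order.
    return sorted(results, key=retweets, reverse=True)


# Made-up results in the shape of a result_type=mixed response.
sample_results = [
    {"id": 1, "metadata": {"result_type": "recent"}},
    {"id": 2, "metadata": {"result_type": "popular", "recent_retweets": 12}},
    {"id": 3, "metadata": {"result_type": "popular", "recent_retweets": 40}},
    {"id": 4, "metadata": {"result_type": "recent"}},
]
ordered_ids = [tweet["id"] for tweet in sort_by_retweets(sample_results)]
```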
This isn't a programmatic method; it works in the browser with a Chrome extension (HackyBird):
Install the extension
Search for a phrase e.g. #Social (twitter.com/search?q=%23Social)
Click the extension to sort it (you can adjust the ratio of retweets/likes used for sorting in extension options).
P.S. It'll also sort your or any other user's timeline.