Retrieving Data in a While Loop from the Youtube API

I have a huge database in which some data sets link to certain YouTube videos. As we all know, some YouTube videos disappear after a while, and this leads to both my solution and my problem: I'd like to check whether a video still exists by simply checking via JSON whether there is data to retrieve for it. If not, then I'd simply delete that data set.
So the first part of my solution is to go through each row of my table and check for each id whether there is data to retrieve from YouTube, as in the following code:
$result = $db->query("SELECT id, link FROM songs");
while ($row = $result->fetch_assoc())
{
    $number = 1+$rown++;
    $id = $row['id'];
    $link = $row['link'];
    $video_ID = $link;
    $JSON = file_get_contents("https://gdata.youtube.com/feeds/api/videos/{$video_ID}?v=2&alt=json");
    $JSON_Data = json_decode($JSON);
    $views = $JSON_Data->{'entry'}->{'yt$statistics'}->{'viewCount'};
    echo $number .' row<br />';
    echo $link .' link<br />';
    echo $views .' views<br /><br />';
}
This attempt works fine and outputs the data I need. The only problem is that it only gets data for the first 150-190 rows and that's it. I am now looking for a solution that checks every row for empty YouTube data, and this leads to two concrete questions:
1st) Might YouTube be responsible for that, due to a restriction on how much data can be retrieved at once?
2nd) Might this be an issue with my server stopping queries after x seconds (I already extended the time limit by putting the line set_time_limit (10000000); into my PHP code, but without success)?
Hope you can help, thanks in advance.

YouTube, naturally, enforces limits on how many requests you can make per period of time. Unfortunately, there are no clear guidelines on what those limits are ... for v2, the guidelines merely state:
The YouTube API enforces quotas to prevent problems associated with
irregular API usage. Specifically, a too_many_recent_calls error
indicates that the API servers have received too many calls from the
same caller in a short amount of time. If you receive this type of
error, then we recommend that you wait a few minutes and then try your
request again.
If time isn't an issue for you, you could slow down your queries so that you make only one request every 10-15 seconds or so. Alternatively, you'd probably have better luck with batch processing. With this, you can make up to 50 requests at once (this counts as 50 requests against your overall requests-per-day quota, but only as one against your per-time quota). Batch processing with v2 of the API is a little involved, as you make a POST request to a batch endpoint first, and then based on those results you can send in the multiple requests. Here's the documentation:
https://developers.google.com/youtube/2.0/developers_guide_protocol?hl=en#Batch_processing
Batch processing is much easier with v3, as you just make the id parameter a comma-delimited list of the videos you want info on. So in your case, you'd execute file_get_contents on a URL like this:
https://www.googleapis.com/youtube/v3/videos?part=id&id={comma-separated-list-of-IDs}&maxResults=50&key={YOUR_API_KEY}
Any video ID in your list that doesn't come back in the JSON response doesn't exist anymore. If you do 50 at a time, wait for 15 seconds, do another 50, etc., that should give you better performance.
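A minimal sketch of the 50-at-a-time bookkeeping described above, written in Python rather than the question's PHP. The helper names (chunked, build_videos_url, missing_ids) are made up for illustration, the API key is a placeholder, and the HTTP call itself is left out. It assumes the decoded v3 response carries an items list whose entries each have an id field.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"
BATCH_SIZE = 50  # number of IDs per request suggested in the answer


def chunked(ids, size=BATCH_SIZE):
    """Yield successive lists of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_videos_url(ids, api_key):
    """Build one videos.list URL for a batch of IDs (api_key is a placeholder)."""
    query = urlencode({"part": "id", "id": ",".join(ids), "key": api_key})
    return f"{API_URL}?{query}"


def missing_ids(requested, response):
    """Return the requested IDs that are absent from a decoded JSON response."""
    returned = {item["id"] for item in response.get("items", [])}
    return [video_id for video_id in requested if video_id not in returned]
```

The intended loop would be: for each batch from chunked(all_ids), fetch build_videos_url(batch, key), decode the JSON, pass it to missing_ids(batch, decoded) to get the rows to delete, then sleep for a while before the next batch.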

Related

How to find if a youtube channel is currently live streaming without using search?

I'm working on a website that loads the live streams of multiple YouTube channels. At first I was trying to figure out a way to do this without using YouTube's API, but I have decided to give in.
To find whether a channel is live streaming and to get the live stream links I've been using:
https://www.googleapis.com/youtube/v3/search?part=snippet&channelId={CHANNEL_ID}&eventType=live&maxResults=10&type=video&key={API_KEY}
However, with the minimum quota being 10000 and each search costing 100, I'm only able to do about 100 searches before I exceed my quota limit, which doesn't help at all. I ended up exceeding the quota limit in about 10 minutes. :(
Does anyone know of a better way to figure out if a channel is currently live streaming and what the live stream links are, using as minimal quota points as possible?
I want to reload youtube data for each user every 3 minutes, save it into a database, and display the information using my own api to save server resources as well as quota points.
Hopefully someone has a good solution to this problem!
If nothing can be done about links just determining if the user is live without using 100 quota points each time would be a big help.

Since the question only specified that Search API quotas should not be used in finding out if the channel is streaming, I thought I would share a sort of work-around method. It might require a bit more work than a simple API call, but it reduces API quota use to practically nothing:
I used a simple Perl GET request to retrieve a YouTube channel's main page. Several unique elements are found in the HTML of a channel page that is streaming live:
The number of live viewers tag, e.g. <li>753 watching</li>.
The LIVE NOW badge tag: <span class="yt-badge yt-badge-live" >Live now</span>.
To ascertain whether a channel is currently streaming live requires a simple match to see if the unique HTML tag is contained in the GET request results. Something like: if ($get_results =~ /$unique_html/) (Perl). Then, an API call can be made only to a channel ID that is actually streaming, in order to obtain the video ID of the stream.
The advantage of this is that you already know the channel is streaming, instead of using thousands of quota points to find out. My test script successfully identifies whether a channel is streaming, by looking in the HTML code for: <span class="yt-badge yt-badge-live" > (note the weird extra spaces in the code from Youtube).
I don't know what language OP is using, or I would help with a basic GET request in that language. I used Perl, and included browser headers, User Agent and cookies, to look like a normal computer visit.
Youtube's robots.txt doesn't seem to forbid crawling a channel's main page, only the community page of a channel.
Let me know what you think about the pros and cons of this method, and please comment with what might be improved rather than disliking if you find a flaw. Thanks, happy coding!
2020 UPDATE
The yt-badge-live badge seems to have been deprecated; it no longer reliably shows whether the channel is streaming. Instead, I now check the HTML for this string:
{"text":" watching"}
If I get a match, it means the page is streaming. (Non-streaming channels don't contain this string.) Again, note the weird extra whitespace. I also escape all the quotation marks since I'm using Perl.
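For readers not using Perl, the matching step can be sketched in Python as a plain substring test. This is only an illustration: looks_live and LIVE_MARKERS are hypothetical names, fetching the channel page is not shown, and the marker string is the one quoted in the update above, which YouTube may change at any time.

```python
# Marker quoted in the 2020 update; note the leading space before "watching".
LIVE_MARKERS = ('{"text":" watching"}',)


def looks_live(html, markers=LIVE_MARKERS):
    """Return True if any of the marker strings occurs in the page HTML."""
    return any(marker in html for marker in markers)
```

Because Python string literals can use single quotes, the double quotes inside the marker need no escaping here.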

Here are my two suggestions:
Check my answer where I explain how you can retrieve videos from channels that are livestreaming.
Another option is to use the following URL and request it each time you want to check whether there's a livestream:
https://www.youtube.com/channel/<CHANNEL_ID>/live
Where CHANNEL_ID is the ID of the channel you want to check for a livestream [1].
[1] Note that the URL may not work for all channels (it depends on the channel itself).
For example, the /live URL for channel ID UC7_YxT-KID8kRbqZo7MyscQ shows whether that channel is livestreaming, but for channel ID UC4nprx9Vd84-ly7N-1Ce6Og (https://www.youtube.com/channel/UC4nprx9Vd84-ly7N-1Ce6Og/live) it shows the channel's main page instead.

Adding to the answer by Bman70, I tried to eliminate the need for a costly search request once you know that the channel is streaming live. I did this using two indicators in the HTML response of a channel page that is streaming live.
function findLiveStreamVideoId(channelId, cb){
  $.ajax({
    url: 'https://www.youtube.com/channel/'+channelId,
    type: "GET",
    headers: {
      'Access-Control-Allow-Origin': '*',
      'Accept-Language': 'en-US, en;q=0.5'
    }}).done(function(resp) {
      //one method to find live video
      let n = resp.search(/\{"videoId[\sA-Za-z0-9:"\{\}\]\[,\-_]+BADGE_STYLE_TYPE_LIVE_NOW/i);
      //If found
      if(n>=0){
        let videoId = resp.slice(n+1, resp.indexOf("}",n)-1).split("\":\"")[1]
        return cb(videoId);
      }
      //If not found, then try another method to find live video
      n = resp.search(/https:\/\/i.ytimg.com\/vi\/[A-Za-z0-9\-_]+\/hqdefault_live.jpg/i);
      if (n >= 0){
        let videoId = resp.slice(n,resp.indexOf(".jpg",n)-1).split("/")[4]
        return cb(videoId);
      }
      //No streams found
      return cb(null, "No live streams found");
    }).fail(function() {
      return cb(null, "CORS Request blocked");
    });
}
However, there's a tradeoff. This method confuses a recently ended stream with currently live streams. A workaround for this issue is to get status of the videoId returned from Youtube API (costs a single unit from your quota).
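As a rough sketch of that follow-up check (in Python, not the jQuery used above): if the video is requested from the v3 videos endpoint with part=snippet, the snippet carries a liveBroadcastContent field whose value is "live" while the broadcast is running. The function name is_currently_live is hypothetical, and the HTTP request and JSON decoding are assumed to have happened already.

```python
def is_currently_live(videos_response):
    """Return True if the first item of a decoded videos.list response is live."""
    items = videos_response.get("items", [])
    if not items:
        # Unknown or deleted video ID: nothing came back.
        return False
    snippet = items[0].get("snippet", {})
    return snippet.get("liveBroadcastContent") == "live"
```

A recently ended stream would then be filtered out, since its liveBroadcastContent is no longer "live".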

I found the YouTube API to be very restrictive given the cost of the search operation. The accepted answer did not work for me, as I found the string on non-live streams as well. Web scraping with aiohttp and beautifulsoup was not an option, since the better indicators required JavaScript support. Hence I turned to selenium. I look for the CSS selector
#info-text
and then search its text for the string Started streaming or watching now.
To reduce the load on my tiny server, which would otherwise have needed a lot more resources, I moved this functionality to a Heroku dyno with a small Flask app.
# import flask dependencies
import os
from flask import Flask, request, make_response, jsonify
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
base = "https://www.youtube.com/watch?v={0}"
delay = 3
# initialize the flask app
app = Flask(__name__)
# default route
@app.route("/")
def index():
    return "Hello World!"
# create a route for webhook
@app.route("/islive", methods=["GET", "POST"])
def is_live():
    chrome_options = Options()
    chrome_options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--remote-debugging-port=9222')
    driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'), chrome_options=chrome_options)
    url = request.args.get("url")
    if "youtube.com" in url:
        video_id = url.split("?v=")[-1]
    else:
        video_id = url
        url = base.format(url)
    print(url)
    response = { "url": url, "is_live": False, "ok": False, "video_id": video_id }
    driver.get(url)
    try:
        element = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#info-text")))
        result = element.text.lower().find("Started streaming".lower())
        if result != -1:
            response["is_live"] = True
        else:
            result = element.text.lower().find("watching now".lower())
            if result != -1:
                response["is_live"] = True
        response["ok"] = True
        return jsonify(response)
    except Exception as e:
        print(e)
        return jsonify(response)
    finally:
        driver.close()
# run the app
if __name__ == "__main__":
    app.run()
You'll however need to add the following buildpacks in settings
https://github.com/heroku/heroku-buildpack-google-chrome
https://github.com/heroku/heroku-buildpack-chromedriver
https://github.com/heroku/heroku-buildpack-python
Set the following Config Vars in settings
CHROMEDRIVER_PATH=/app/.chromedriver/bin/chromedriver
GOOGLE_CHROME_BIN=/app/.apt/usr/bin/google-chrome
You can find the supported Python runtimes here, but anything below Python 3.9 should be fine, since selenium had problems with improper use of the is operator.
I hope youtube will provide better alternatives than workarounds.

I know this is an old thread, but I thought I'd share my way of checking, for example to grab the status code for use in an app.
This is for a single channel, but you could easily wrap it in a foreach.
<?php
#####
$ytchannelID = "UCd0BTXriKLvOs1ANx3puZ3Q";
#####
$ytliveurl = "https://www.youtube.com/channel/".$ytchannelID."/live";
$ytchannelLIVE = '{"text":" watching now"}';
$contents = file_get_contents($ytliveurl);
if ( strpos($contents, $ytchannelLIVE) !== false ){http_response_code(200);} else {http_response_code(201);}
unset($ytliveurl);
?>

Adding onto the other answers here, I use a GET request to https://www.youtube.com/c/<CHANNEL_NAME>/live and then search for "isLive":true (rather than {"text":" watching"})

YouTube API showing 102 queries being made per request

So this is sort of weird. For every 1 request sent from my website using our YouTube API key, the developer console shows 102 queries actually being made. Here is the query format (using Python) -
search_q = '<query-string-here>'
service = build('youtube', 'v3', developerKey='<api-key>')
results = service.search().list(
part='snippet',
channelId='<specific-channel-id-to-search-through>',
type='video',
q=search_q,
).execute()
My logs show only one request being sent using this but my query count on the quotas page increases by 102.
Is there something I'm doing wrong? Or is this a bug on Google's end?

You can use the Quota Calculator to approximate the quota cost of your request. Sure enough, a search API request costs in the range of 100 units.
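To put that number in perspective, here is a small back-of-the-envelope calculation in Python. The figures are only the ones quoted in these threads (a daily quota of 10,000 units, and 100 or 102 units per search request); your project's actual quota may differ, and requests_per_day is a hypothetical helper name.

```python
def requests_per_day(daily_quota, cost_per_request):
    """Whole number of requests that fit into the daily quota."""
    return daily_quota // cost_per_request


print(requests_per_day(10_000, 100))  # 100
print(requests_per_day(10_000, 102))  # 98
```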

Accessing public Instagram content via Instagram API without expiring accesstoken

I want to show public content from Instagram related to a specific hashtag (everything works fine with that), but I can't renew the access_token every time it expires.
("do not assume your access_token is valid forever." -
https://www.instagram.com/developer/authentication/)
Renewing it manually is not an option; I have to make sure there is a valid access_token at ANY time without re-authenticating.
Any ideas or questions? :)

I have one idea, but without the API (and access_token). You can make requests to the web version of Instagram with the ?__a=1 parameter. I do not know how long it will keep working, but for now this is the workflow:
1. Say you want to show public content with the hashtag space, for example.
2. Add it to the URL and add the GET parameter ?__a=1: https://www.instagram.com/explore/tags/space/?__a=1
3. Make the GET request. It returns JSON with nodes in top_posts (8) and media (18). Each node has owner, caption, number of comments and likes. But the most important part is in thumbnail_src and display_src.
4. There is a page_info entry in the media object which helps to paginate results. You need end_cursor (for example, J0HWE9rjAAAAF0HWE9qvgAAAFiYA).
5. Add the value from end_cursor to the URL: https://www.instagram.com/explore/tags/space/?__a=1&max_id=J0HWE9rjAAAAF0HWE9qvgAAAFiYA
6. Repeat steps 3-5 to page through the posts with the specific hashtag.
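The URL-building part of this workflow could look roughly like the following Python sketch. tag_page_url is a hypothetical helper, the endpoint is the unofficial one described above (so it may stop working at any time), and fetching the page and reading end_cursor out of the JSON are not shown.

```python
from urllib.parse import quote, urlencode


def tag_page_url(tag, end_cursor=None):
    """Build the ?__a=1 URL for a hashtag, optionally continuing from end_cursor."""
    params = {"__a": "1"}
    if end_cursor:
        # Value taken from page_info.end_cursor of the previous response.
        params["max_id"] = end_cursor
    return f"https://www.instagram.com/explore/tags/{quote(tag)}/?{urlencode(params)}"
```

Calling tag_page_url("space") gives the first URL from the list above, and passing the end_cursor value gives the second.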

Update to the ?__a=1 url param. This appears to have stopped working with users '/account/?__a=1' endpoints.:( Still works on tags apparently.

Instagram shut down their public API. Here's a quick and dirty workaround in PHP:
<?php
function getPublicInfo($username) {
    $url = sprintf("https://www.instagram.com/$username");
    $content = file_get_contents($url);
    $content = explode("window._sharedData = ", $content)[1];
    $content = explode(";</script>", $content)[0];
    $data = json_decode($content, true);
    return $data['entry_data']['ProfilePage'][0];
}
Not sure for how long it's gonna work. Here's one for Javascript.

Instagram /tags/\(hashtag)/media/recent endpoint not returning pagination?

I've been trying to get this to work for probably 6 hours now to no avail, read every stackoverflow question I could find on the topic.
I'm trying to get 100, 200, or maybe 500 photos from a single tag:
func hashtags(hashtag: String, nextMaxTagId: String?) -> RequestParamters {
var params = "/tags/\(hashtag)/media/recent|access_token=\(accessToken)"
var parameters = Dictionary<String, AnyObject>()
parameters["access_token"] = accessToken
let urlString = "https://api.instagram.com/v1/tags/\(hashtag)/media/recent"
if let nextMaxTagId = nextMaxTagId {
params += "|max_tag_id=\(nextMaxTagId)"
parameters["max_tag_id"] = nextMaxTagId
}
let sig = HMAC.signWithKey(C.InstagramClientSecret(), usingData: params)
parameters["sig"] = sig
return (urlString: urlString, parameters: parameters)
}
This is what I use to construct my urls and parameters for my request. My first request does not have a nextMaxTagId, and that request goes through, returns 20 images and a pagination json.
Then, when I extract the next_max_tag_id from the pagination block, and create a request using that parameter, I get another 20 images, but they are the same images as before and now I do not get a pagination block.
I am signing my requests correctly (as all my other API requests throughout the app go through no problem) and I am not in Sandbox mode.
Edit: I've also tried using min_tag_id=\(nextMinTagId), still do not receive pagination in the next request.

Seems like:
1) You are using the Instagram Developer API with what seems like an authorized API key, and you mentioned you are NOT in Sandbox, so you're in the Production environment for that API.
I'm trying to get 100, 200, or maybe 500 photos from a single tag
2) This means, combined with returns 20 images and a pagination json, that for 100, you need to make 5 calls minimum (100/20 == 5), 200 == 10, 500 = 25.
3) According to the developer documentation rate limits, the overall cap on Production is 5000 req/hour, with several APIs restricted to a much smaller limit (some are 30/60 req/hour). I'm not sure I see the exact tag rate limit you are hitting, but since the question mentions:
for probably 6 hours now to no avail
it's also possible you've just been hitting the overall hourly request limit each hour.
I definitely know that this is not an answer that I enjoy giving, because it's essentially saying: you're stuck. I've actually played with the rate limits myself before, and I find them extremely limiting (pun fully intended). The only other option, albeit not as "above board", is to scrape Instagram itself for the information you need. I say it's not as "above board" because if you needed info not found on a web scrape, you could theoretically scrape the mobile API through some minor reverse engineering (ie using an HTTP proxy to spoof mobile traffic systematically).
In the end, the API Instagram publishes is definitely very limited, and will face rate limits for the foreseeable future (unless you can get those somehow lifted in a specific partnership they somehow deem worthy, although I'm not sure how this could be approached).

youtube data api v3 php search pagination?

I am trying out search with the YouTube API v3 in PHP.
This is the first time I'm using this API, so I am a beginner.
I have 3 questions:
1) How can the search list below show pagination numbers? (50 results per page)
2) How can the video duration be shown in the list? (3:20 min:second)
3) How can I order by viewCount?
if ($_GET['q']) {
require_once 'src/Google_Client.php';
require_once 'src/contrib/Google_YoutubeService.php';
$DEVELOPER_KEY = 'my key';
$client = new Google_Client();
$client->setDeveloperKey($DEVELOPER_KEY);
$youtube = new Google_YoutubeService($client);
try {
$searchResponse = $youtube->search->listSearch('id,snippet', array(
'q' => $_GET['q'],
'maxResults' => 50,
'type' => "video",
));
foreach ($searchResponse['items'] as $searchResult) {
$videos .= '<li style="clear:left"><img src="'.$searchResult['snippet']['thumbnails']['default']['url'].'" style="float:left; margin-right:18px" alt="" /><span style="float:left">'.$searchResult['snippet']['title'].'<br />'.$searchResult['id']['videoId'].'<br />'.$searchResult['snippet']['publishedAt'].'<br />'.$item['contentDetails']['duration'].'</span></li>';
}
$htmlBody .= <<<END
<ul>$videos</ul>
END;
} catch (Google_ServiceException $e) {
$htmlBody .= sprintf('<p>A service error occurred: <code>%s</code></p>',
htmlspecialchars($e->getMessage()));
} catch (Google_Exception $e) {
$htmlBody .= sprintf('<p>An client error occurred: <code>%s</code></p>',
htmlspecialchars($e->getMessage()));
}
}

1) how can below search list showing pagination numbers? (per page 50 result)
You need to write your own caching logic to implement this feature, because every response gives you two tokens, nextPageToken and prevPageToken, and a subsequent query must include one of them to get the next or previous page.
So whenever the results are not available in the cache, send either the nextPageToken or the prevPageToken, as below.
https://www.googleapis.com/youtube/v3/search?key=API_KEY&part=snippet&q=japan&maxResults=10&order=date&pageToken=NEXT_or_PREVIOUS_PAGE_TOKEN
In your particular case, where you need 50 results per page and show pagination like (1, 2, NEXT), you need to fetch results twice. Keep both result sets in the cache, so that pages 1 and 2 are served from the cache. For NEXT, make sure you query Google again, sending the nextPageToken.
Thus, to show pagination 1-n with 50 results per page, you need n queries to the Google API in total (the first one plus n-1 follow-ups). But if you show 10 results per page, a single query of 50 results is enough for the first 5 pages (1-5); after that you again send the next page token as above.
NOTE: the YouTube API returns at most 50 results per request.
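The bookkeeping for "10 results per page out of a 50-result fetch" can be sketched like this in Python. locate_page is a hypothetical helper, not part of any YouTube client library, and it assumes the fetch size is a multiple of the page size so that a display page never straddles two fetches.

```python
def locate_page(page, per_page=10, per_fetch=50):
    """Map a 1-based display page to (fetch_index, start, end) within that fetch.

    fetch_index 0 is the first API response, 1 is the one obtained with its
    nextPageToken, and so on; start/end slice the cached items of that fetch.
    """
    first_result = (page - 1) * per_page
    fetch_index, start = divmod(first_result, per_fetch)
    return fetch_index, start, start + per_page
```

With the defaults, pages 1 to 5 all map to fetch 0 and page 6 is the first one that needs a request with the nextPageToken. With per_page=50, every page maps to its own fetch.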
2) how can video duration show in list? (3:20 min:second)
The YouTube API v3 does not return the video duration in the search response. To get the duration, we need to make one extra call to the API, as below.
https://www.googleapis.com/youtube/v3/videos?id=VIDEO_ID1%2CVIDEO_ID2&part=contentDetails&key=API_KEY (max 50 IDs)
This issue is highlighted in "http://code.google.com/p/gdata-issues/issues/detail?id=4294". I posted my answer here too.
Hence, if we want to display the video duration, we need to make two calls every time.
3) how can order viewCount
Run the query below; it returns results ordered by view count.
https://www.googleapis.com/youtube/v3/search?key=KEY&part=snippet&q=japan&maxResults=5&order=viewCount
For detail please refer this - https://developers.google.com/youtube/v3/docs/search/list#order

The YouTube API v3 is somewhat complicated compared to v2.
My approach does not cover search results; it retrieves a user's uploaded videos instead. I believe it can still be useful here.
References
The way you create pagination in v3 is not the same as in v2, where you could simply make your call like this:
$youtube = "http://gdata.youtube.com/feeds/api/users/Qtube247/uploads?v=2&alt=jsonc&start-index=1&max-results=50";
In v3 you need to make two or three calls: the first gets the channel details, from which we take the ID of the channel's uploads playlist; the second retrieves that playlist's items; and finally you retrieve the individual video data.
I am using PHP cURL.
$youtube = "https://www.googleapis.com/youtube/v3/channels?part=snippet%2CcontentDetails%2Cstatistics&id=yourChannelIdgoeshere&key=yourApiKey";
Here we retrieve user playlist ID
$result = json_decode($return, true);
$playlistId=$result['items'][0]['contentDetails']['relatedPlaylists']['uploads'];
We define the page token:
$pageToken = '';
Each time the user clicks a control button, we take the page token stored in session[] (passed back through the ptk parameter) and feed it into the cURL URL, which in turn produces a nextPageToken or prevPageToken. Whichever token you feed the URL, the API knows which set of results to return.
if (isset($_REQUEST['ptk']) && $_REQUEST['ptk'] !== '') {
    $pageToken = $_REQUEST['ptk'];
}
Here we retrieve the user's playlist:
$playlistItems = "https://www.googleapis.com/youtube/v3/playlistItems?part=snippet&pageToken=" . $pageToken . "&maxResults=50&playlistId=$playlistId&key=yourApiKey";
If the user has more than maxResults videos, we should get a nextPageToken. For example, if the user has 200 uploaded videos, the first page token may be CDIQAA, the next may be CGQQAA and the previous may be CDIQAQ, or something like that, so it is not a number.
Here we save the page tokens:
if (isset($result['nextPageToken'])) {
    $_SESSION['nextToken'] = $result['nextPageToken'];
}
if (isset($result['prevPageToken'])) {
    $_SESSION['prevToken'] = $result['prevPageToken'];
}
We can then create our control buttons:
$next = $_SESSION['nextToken'];
$prev = $_SESSION['prevToken'];
The control buttons go here:
<a href="?ptk=<?php echo $prev ?>">&lt;&lt;prev</a>
<a href="?ptk=<?php echo $next ?>">next&gt;&gt;</a>
When the user clicks a link, it passes either the next or the previous page token (see above for how this works).
To get the video duration we use the same PHP cURL call:
$videoDetails="https://www.googleapis.com/youtube/v3/videos?part=id,snippet,contentDetails,statistics,status&id=videoIdHere&key=yourApiKey";
$videoData = json_decode($return, true);
$duration = $videoData['items'][0]['contentDetails']['duration'];
$viewCount = $videoData['items'][0]['statistics']['viewCount'];
You may get something like this: 'PT2H34M25S'.
I have answered a question Here which shows how to convert the duration data.
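Since the linked conversion answer is not reproduced here, the following is one possible way to turn such a duration into h:mm:ss, sketched in Python rather than PHP. format_duration is a hypothetical helper; it only handles days, hours, minutes and seconds, which is an assumption about what the API returns for videos.

```python
import re

# Matches ISO 8601 durations of the form P[nD][T[nH][nM][nS]].
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def format_duration(iso):
    """Convert e.g. 'PT2H34M25S' to '2:34:25' and 'PT3M20S' to '3:20'."""
    match = _DURATION.match(iso)
    if not match:
        raise ValueError(f"unsupported duration: {iso!r}")
    days, hours, minutes, seconds = (int(group or 0) for group in match.groups())
    hours += days * 24
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"
```

For the example value above, format_duration('PT2H34M25S') gives '2:34:25'.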
See Working Demo Here
