How to get links to third-party sites in the 'About channel' section via Python - youtube-api

I want to extract the links shown on a YouTube channel's profile and write them to a text document. I tried to do it through the requests library, but Google returned privacy and consent pages instead, and I did not find anything about this in the YouTube API documentation. Can anyone help with this?

This isn't possible to get using the YouTube API. I actually found myself needing to do the same thing and couldn't, because the YouTube API lacked the necessary functionality (hopefully it will be added soon!).
I see you mentioned Python. My only solution is in Node, but I will explain it in detail so you can base your code on it. To get the banner links without the YouTube API we need to scrape the data, and since YouTube uses client-side rendering we have to scrape a JSON configuration out of the page source.
There's a variable defined inside a script tag called ytInitialData, which is a big JSON string with a massive amount of information about the channel, the viewer, and YouTube configuration. We can find the banner links by parsing through this JSON.
const request = require("request-promise").defaults({
    simple: false,
    resolveWithFullResponse: true
})

const getBannerLinks = async () => {
    return request("https://www.youtube.com/user/pewdiepie").then(res => {
        if (res.statusCode === 200) {
            // Extract the ytInitialData JSON string from the page source
            const parsed = res.body.split("var ytInitialData = ")[1].split(";</script>")[0]
            const data = JSON.parse(parsed)
            const links = data.header.c4TabbedHeaderRenderer.headerLinks.channelHeaderLinksRenderer
            // Merge primary links (text shown) and secondary links (icon only)
            const allLinks = links.primaryLinks.concat(links.secondaryLinks || [])
            const parsedLinks = allLinks.map(l => {
                // The real destination sits in the "q" parameter of YouTube's redirect URL
                const url = new URLSearchParams(l.navigationEndpoint.commandMetadata.webCommandMetadata.url)
                return {
                    link: url.get("q"),
                    name: l.title.simpleText,
                    icon: l.icon.thumbnails[0].url
                }
            })
            return parsedLinks
        } else {
            // Error/rate limit - handle here
        }
    })
}
The way the links are scraped is as follows (a rough Python equivalent is sketched after this list):
We make an HTTP request to the channel's URL.
We parse the body with split to extract the JSON string that contains the banner links.
We parse the JSON string into a JSON object.
We extract the links from their section of the JSON object (data.header.c4TabbedHeaderRenderer.headerLinks.channelHeaderLinksRenderer).
Because there are two types of links (primary, which show their text, and secondary, which don't), we concatenate them so we can map over them together.
We then map over the links and use URLSearchParams to extract the q query parameter, since YouTube routes outgoing links through a redirect URL (most likely for security reasons), and extract the name and icon from their respective objects.
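Since you mentioned Python, here is a minimal sketch of the same flow using the requests library. The function name get_banner_links is mine, and it assumes the ytInitialData layout described above is still in place; the JSON path can change whenever YouTube updates their front end.

import json
import requests
from urllib.parse import urlparse, parse_qs

def get_banner_links(channel_url):
    # Browser-like headers reduce the chance of consent/privacy pages
    res = requests.get(channel_url, headers={"User-Agent": "Mozilla/5.0"})
    res.raise_for_status()
    # Extract the ytInitialData JSON string from the page source
    raw = res.text.split("var ytInitialData = ")[1].split(";</script>")[0]
    data = json.loads(raw)
    links = data["header"]["c4TabbedHeaderRenderer"]["headerLinks"]["channelHeaderLinksRenderer"]
    all_links = links.get("primaryLinks", []) + links.get("secondaryLinks", [])
    parsed = []
    for link in all_links:
        redirect = link["navigationEndpoint"]["commandMetadata"]["webCommandMetadata"]["url"]
        # The real destination sits in the "q" parameter of the /redirect URL
        query = parse_qs(urlparse(redirect).query)
        parsed.append({
            "link": query.get("q", [redirect])[0],
            "name": link["title"]["simpleText"],
            "icon": link["icon"]["thumbnails"][0]["url"],
        })
    return parsed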
This isn't a perfect solution: should YouTube update or change anything on their front end, it could easily break your program. YouTube also rate limits requests, so if you try to mass scrape you'll run into 429/403 errors.

Related

Images/Videos CDN URL - Detect file type/extension

I'm trying to implement Story Mention rendering according to the IG Messenger Graph API.
IG webhooks send the payload URL of the media as extensionless CDN URLs,
which means I can't detect the file type (it could be any kind of image or a video file).
The purpose is to render the URL into an HTML element and to prevent saving certain file extensions.
Did anybody find out how to get this information?
An example of an IG CDN URL:
https://lookaside.fbsbx.com/ig_messaging_cdn/?asset_id=17952754300482708&signature=AbxVoHUcW3qKGZvE0FwrbpSEKBqkYGH9wFDUY9xnywlxxek8lWtrTwE173Sxhta9jbp0bgDiL17IpyiI82vqHGNPUD1wdMUZphwQOggW-_877cCI1BxaY_aDUZ8hj5OwmHK9E8OnSybqtMVmGXCX_hBF399t1Hb44zspeL3d9NWb9rib
Python (a HEAD request returns the headers, including Content-Type, without downloading the body):

import requests

res = requests.head(url)  # url: the extensionless CDN URL from the webhook payload
print(res.headers)
print(res.headers.get("Content-Type"))  # e.g. "image/jpeg" or "video/mp4"
I was able to retrieve the content type by making a request with node-fetch.
const fetch = require('node-fetch');
const response = await fetch(mediaUrl, { method: 'HEAD' });
const contentType = response.headers.get('Content-Type');

How to find if a youtube channel is currently live streaming without using search?

I'm working on a website to load multiple YouTube channels' live streams. At first I was trying to figure out a way to do this without utilizing YouTube's API, but I've decided to give in.
To find whether a channel is live streaming and to get the live stream links, I've been using:
https://www.googleapis.com/youtube/v3/search?part=snippet&channelId={CHANNEL_ID}&eventType=live&maxResults=10&type=video&key={API_KEY}
However, with the minimum quota being 10,000 and each search costing 100, I'm only able to do about 100 searches before I exceed my quota limit, which doesn't help at all. I ended up exceeding the quota limit in about 10 minutes. :(
Does anyone know of a better way to figure out if a channel is currently live streaming, and what the live stream links are, using as few quota points as possible?
I want to reload the YouTube data for each user every 3 minutes, save it into a database, and serve the information through my own API to save server resources as well as quota points.
Hopefully someone has a good solution to this problem!
If nothing can be done about the links, just determining whether a user is live without spending 100 quota points each time would be a big help.
Since the question only specified that Search API quota should not be used to find out if the channel is streaming, I thought I would share a sort of work-around. It might require a bit more work than a simple API call, but it reduces API quota use to practically nothing:
I used a simple Perl GET request to retrieve a YouTube channel's main page. Several unique elements are found in the HTML of a channel page that is streaming live:
the live viewer count tag, e.g. <li>753 watching</li>, and the LIVE NOW
badge tag: <span class="yt-badge yt-badge-live" >Live now</span>.
To ascertain whether a channel is currently streaming live requires a simple match to see if the unique HTML tag is contained in the GET results, something like if ($get_results =~ /$unique_html/) in Perl. Then an API call can be made only for a channel ID that is actually streaming, in order to obtain the video ID of the stream.
The advantage of this is that you already know the channel is streaming, instead of using thousands of quota points to find out. My test script successfully identifies whether a channel is streaming by looking in the HTML for: <span class="yt-badge yt-badge-live" > (note the odd extra spaces in YouTube's code).
I don't know what language the OP is using, or I would help with a basic GET request in that language. I used Perl, and included browser headers, a User-Agent and cookies, to look like a normal browser visit.
YouTube's robots.txt doesn't seem to forbid crawling a channel's main page, only the community page of a channel.
Let me know what you think about the pros and cons of this method, and please comment with what might be improved rather than downvoting if you find a flaw. Thanks, happy coding!
2020 UPDATE
The yt-badge-live class seems to have been deprecated; it no longer reliably shows whether the channel is streaming. Instead, I now check the HTML for this string:
{"text":" watching"}
If I get a match, the page is streaming (non-streaming channels don't contain this string). Again, note the odd extra whitespace. I also escape all the quotation marks, since I'm using Perl.
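If it helps, here is a minimal Python sketch of the same check; the channel_is_live helper is hypothetical, and it assumes the marker string above is still served in the page source.

import requests

def channel_is_live(channel_id):
    # Fetch the channel's main page with browser-like headers
    html = requests.get(
        f"https://www.youtube.com/channel/{channel_id}",
        headers={"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US,en;q=0.5"},
    ).text
    # Marker observed on live channel pages; note the leading space
    return '{"text":" watching"}' in html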
Here are my two suggestions:
Check my answer where I explain how you can retrieve videos from channels that are livestreaming.
Another option could be to use the following URL and make a request each time to check whether there's a livestream:
https://www.youtube.com/channel/<CHANNEL_ID>/live
Where CHANNEL_ID is the id of the channel you want to check1.
1 Just notice that the URL won't work for all channels (it depends on the channel itself).
For example, for the channel id UC7_YxT-KID8kRbqZo7MyscQ the /live URL will show whether the channel is livestreaming, but for the channel id UC4nprx9Vd84-ly7N-1Ce6Og, https://www.youtube.com/channel/UC4nprx9Vd84-ly7N-1Ce6Og/live will show the channel's main page instead.
Adding to the answer by Bman70, I tried to eliminate the need for a costly search request after learning that the channel is streaming live. I did this using two indicators in the HTML response from the page of a channel that is streaming live.
function findLiveStreamVideoId(channelId, cb){
    $.ajax({
        url: 'https://www.youtube.com/channel/' + channelId,
        type: "GET",
        headers: {
            'Access-Control-Allow-Origin': '*',
            'Accept-Language': 'en-US, en;q=0.5'
        }
    }).done(function(resp) {
        // First method: look for a videoId next to the LIVE NOW badge
        let n = resp.search(/\{"videoId[\sA-Za-z0-9:"\{\}\]\[,\-_]+BADGE_STYLE_TYPE_LIVE_NOW/i);
        // If found
        if (n >= 0) {
            let videoId = resp.slice(n + 1, resp.indexOf("}", n) - 1).split("\":\"")[1]
            return cb(videoId);
        }
        // If not found, try the second method: the live thumbnail URL
        n = resp.search(/https:\/\/i.ytimg.com\/vi\/[A-Za-z0-9\-_]+\/hqdefault_live.jpg/i);
        if (n >= 0) {
            let videoId = resp.slice(n, resp.indexOf(".jpg", n) - 1).split("/")[4]
            return cb(videoId);
        }
        // No streams found
        return cb(null, "No live streams found");
    }).fail(function() {
        return cb(null, "CORS Request blocked");
    });
}
However, there's a tradeoff: this method can confuse a recently ended stream with a currently live one. A workaround for this issue is to check the status of the videoId it returns via the YouTube API, which costs a single unit of your quota.
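As a sketch of that single-unit verification in Python (the video_is_live helper is mine): a videos.list call with the snippet part costs 1 quota unit, and snippet.liveBroadcastContent is "live" only while the stream is actually in progress.

import requests

def video_is_live(video_id, api_key):
    # videos.list with part=snippet costs 1 quota unit;
    # snippet.liveBroadcastContent is "live", "upcoming", or "none"
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/videos",
        params={"part": "snippet", "id": video_id, "key": api_key},
    ).json()
    items = resp.get("items", [])
    return bool(items) and items[0]["snippet"]["liveBroadcastContent"] == "live"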
I found the YouTube API to be very restrictive given the cost of a search operation. Apparently the accepted answer did not work for me, as I found the string on non-live streams as well. Web scraping with aiohttp and BeautifulSoup was not an option, since the better indicators required JavaScript support. Hence I turned to Selenium. I look for the CSS selector
#info-text
and then search for the string Started streaming or watching now in it.
To reduce the load on my tiny server, which would otherwise have required a lot more resources, I moved this functionality to a Heroku dyno with a small Flask app.
# import flask dependencies
import os
from flask import Flask, request, make_response, jsonify
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

base = "https://www.youtube.com/watch?v={0}"
delay = 3

# initialize the flask app
app = Flask(__name__)

# default route
@app.route("/")
def index():
    return "Hello World!"

# create a route for the live check
@app.route("/islive", methods=["GET", "POST"])
def is_live():
    chrome_options = Options()
    chrome_options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--remote-debugging-port=9222')
    driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'), chrome_options=chrome_options)

    # Accept either a full watch URL or a bare video id
    url = request.args.get("url")
    if "youtube.com" in url:
        video_id = url.split("?v=")[-1]
    else:
        video_id = url
    url = base.format(video_id)
    print(url)

    response = {"url": url, "is_live": False, "ok": False, "video_id": video_id}
    driver.get(url)
    try:
        element = WebDriverWait(driver, delay).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "#info-text")))
        # A live stream's #info-text contains "Started streaming ..."
        # and/or "N watching now"
        result = element.text.lower().find("Started streaming".lower())
        if result != -1:
            response["is_live"] = True
        else:
            result = element.text.lower().find("watching now".lower())
            if result != -1:
                response["is_live"] = True
        response["ok"] = True
        return jsonify(response)
    except Exception as e:
        print(e)
        return jsonify(response)
    finally:
        driver.close()

# run the app
if __name__ == "__main__":
    app.run()
You'll however need to add the following buildpacks in settings
https://github.com/heroku/heroku-buildpack-google-chrome
https://github.com/heroku/heroku-buildpack-chromedriver
https://github.com/heroku/heroku-buildpack-python
Set the following Config Vars in settings
CHROMEDRIVER_PATH=/app/.chromedriver/bin/chromedriver
GOOGLE_CHROME_BIN=/app/.apt/usr/bin/google-chrome
You can find the supported Python runtimes here, but anything below Python 3.9 should be fine, since Selenium at the time had problems caused by improper use of the is operator.
I hope youtube will provide better alternatives than workarounds.
I know this is an old thread, but I thought I'd share my way of checking, for example to grab a status code to use in an app.
This is for a single channel, but you could easily do a foreach over several.
<?php
#####
$ytchannelID = "UCd0BTXriKLvOs1ANx3puZ3Q";
#####
$ytliveurl = "https://www.youtube.com/channel/".$ytchannelID."/live";
$ytchannelLIVE = '{"text":" watching now"}';
$contents = file_get_contents($ytliveurl);
if (strpos($contents, $ytchannelLIVE) !== false) {
    http_response_code(200);
} else {
    http_response_code(201);
}
unset($ytliveurl);
?>
Adding onto the other answers here, I use a GET request to https://www.youtube.com/c/<CHANNEL_NAME>/live and then search for "isLive":true (rather than {"text":" watching"})
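For instance, a minimal Python sketch of this check (the is_live helper is hypothetical, and the "isLive":true marker may change like any scraped string):

import requests

def is_live(channel_name):
    # The /live page of a live channel embeds "isLive":true in its source
    html = requests.get(
        f"https://www.youtube.com/c/{channel_name}/live",
        headers={"User-Agent": "Mozilla/5.0"},
    ).text
    return '"isLive":true' in html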

YouTube API "ChannelSections" results don't match with channel?

So for the YouTube channel Mindless Self Indulgence, the Home tab has 4 sections: the first is their music videos playlist, the second is albums (a group of different playlists), then another playlist section, and the last section is their uploads.
But when I do a channelSections API call I get about 20 different items, and it has me scratching my head as to why.
Here's the api response https://notepad.pw/raw/w27ot290s
https://www.googleapis.com/youtube/v3/channelSections?key={KEYHERE}&channelId=UChS8bULfMVx10SiyZyeTszw&part=snippet,contentDetails
So I figured this out finally; I neglected to read the documentation on the channelSections API 😅
here: https://developers.google.com/youtube/v3/docs/channelSections
I was getting channel sections for all regions; channels like music channels often have region-specific sections. To filter these out you need to also include targeting in the part parameter. If a section is region-free (or at least I assume so) it won't have the targeting object, so that's something to take into consideration when handling your API response and filtering sections by region.
Here's my code for filtering the data in a React app; maybe not the most practical, but I fumbled through it:
const data = response2.data.items;
console.log("response2 data", data);
const filtered = data.filter(item => {
    // Region-free sections have no targeting object; keep them
    if (item.targeting === undefined) return true;
    // Otherwise keep only sections targeted at the US
    let test = false;
    item.targeting.countries.forEach(i => {
        if (i === "US") test = true;
    });
    return test;
});

YouTube IFrame Player API getVideoData is removed: how to get title?

On November 13th, I got a call from a customer reporting that the YouTube player didn't work anymore. After a quick look in the dev tools, I found this error:
Uncaught TypeError: a.getVideoData is not a function
Looking into what the player object contained, I learned that there's no getVideoData function anymore.
The getVideoData function provided a way to get the video title. Now, how can I get the title?
Is there any article from Google about this change?
To get a video's title, you can query the YouTube Data API v3:
GET https://www.googleapis.com/youtube/v3/videos
?part=snippet
&id=VIDEO_ID
&key=YOUR_API_KEY
For that you need to sign up on the Google Cloud Console and create an API key (it's free). You can restrict the API key to only be used from your website, that way you can safely make it public in your JS source code/html code without others being able to make queries on your behalf. Make sure to enable the YouTube Data API v3 in the console as well, otherwise your queries will return errors.
The above query will return a JSON representation of the information on the video that you are interested in (the snippet part). Say you parse the JSON into an object called result. Then you can get the video title via
result.items[0].snippet.title
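As a concrete sketch, here is how that query might look in Python with the requests library (the get_video_title helper is mine; the endpoint and response shape are as described above):

import requests

def get_video_title(video_id, api_key):
    # videos.list with part=snippet returns the video's metadata
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/videos",
        params={"part": "snippet", "id": video_id, "key": api_key},
    )
    resp.raise_for_status()
    result = resp.json()
    return result["items"][0]["snippet"]["title"]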
getVideoData() seems to be back (Dec, 2017). So, try again !
As of today (October 1st, 2020), I am retrieving the title of the video from within YouTube's API object:

// Assigning YouTube's ID to your ID variable
const playerID = "xxxxxxx";

// Creating an object for the video using YouTube's API.
// Note: pass the handlers as function references, not calls.
const yPlayer = new YT.Player(playerID, {
    events: {
        'onReady': onPlayerReady,
        'onStateChange': onPlayerStateChange
    }
});

function onPlayerReady() {
}

function onPlayerStateChange() {
    // Title retrieved here ("j" is a minified internal property,
    // so this may break between player versions)
    let videoTitle = yPlayer.j.videoData.title;
}

// The IFrame API itself calls onYouTubeIframeAPIReady() once it has loaded.

How to retrieve Medium stories for a user from the API?

I'm trying to integrate Medium blogging into an app by showing some cards with posts images and links to the original Medium publication.
From the Medium API docs I can see how to retrieve publications and create posts, but they don't mention retrieving posts. Is retrieving posts/stories for a user currently possible using Medium's API?
The API is write-only and is not intended for retrieving posts (Medium staff told me).
You can simply use the RSS feed as such:
https://medium.com/feed/@your_profile
You can fetch the RSS feed via GET; if you need it in JSON format, just use an NPM module like rss-to-json and you're good to go.
Edit:
It is possible to make a request to the following URL and you will get a response. Unfortunately, the response is in RSS format, which requires some parsing if you need JSON.
https://medium.com/feed/@yourhandle
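If you are doing this in Python instead of Node, the third-party feedparser package (an assumption on my part, not something from Medium's docs) can handle that RSS parsing, e.g.:

import feedparser  # pip install feedparser

feed = feedparser.parse("https://medium.com/feed/@your_profile")
for entry in feed.entries:
    # Each entry carries the story's title, link, and publication date
    print(entry.title, entry.link, entry.published)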
⚠️ The following approach is not applicable anymore, as it now sits behind Cloudflare's DDoS protection.
If you're planning to fetch it from the client side using JavaScript, jQuery, Angular, etc., then you need to build an API gateway or web service that serves your feed. With PHP, RoR, or any server-side stack, that should not be necessary.
You can get it directly in JSON format as given beneath:
https://medium.com/@yourhandle/latest?format=json
In my case, I made a simple web service in an Express app and hosted it on Heroku. A React app hits the API exposed on Heroku and gets the data.
const express = require("express");
const request = require("request"); // assuming an Express router and the request package
const router = express.Router();

const MEDIUM_URL = "https://medium.com/@yourhandle/latest?format=json";

router.get("/posts", (req, res, next) => {
    request.get(MEDIUM_URL, (err, apiRes, body) => {
        if (!err && apiRes.statusCode === 200) {
            // Medium prefixes the JSON payload with an anti-hijacking
            // string, so strip everything before the first "{"
            let i = body.indexOf("{");
            const data = body.substr(i);
            res.send(data);
        } else {
            res.status(500).json(err);
        }
    });
});
Nowadays this URL:
https://medium.com/@username/latest?format=json
sits behind Cloudflare's DDoS protection service, so instead of consistently being served your feed in JSON format, you will usually receive an HTML page meant to render a reCAPTCHA, leaving you with no data from the request.
And the following:
https://medium.com/feed/@username
is limited to the latest 10 posts.
I'd suggest this free Cloudflare Worker that I made for this purpose. It works as a facade, so you don't have to worry about how the posts are obtained from the source, reCAPTCHAs, or pagination.
Full article about it.
Live example. To fetch the following items, add the query param ?next= with the value of the JSON field next which the API provides.
const MdFetch = async (name) => {
    const res = await fetch(
        `https://api.rss2json.com/v1/api.json?rss_url=https://medium.com/feed/${name}`
    );
    return await res.json();
};

const data = await MdFetch('@chawki726');
To get your posts as JSON objects, replace @USERNAME with your user name:
https://api.rss2json.com/v1/api.json?rss_url=https://medium.com/feed/@USERNAME
With that REST method you would do this: GET https://api.medium.com/v1/users/{{userId}}/publications and this would return the title, image, and the item's URL.
Further details: https://github.com/Medium/medium-api-docs#32-publications .
You can also add "?format=json" to the end of any URL on Medium and get useful data back.
Use this URL; it will give you the posts in JSON format. Replace studytact with your feed name:
https://api.rss2json.com/v1/api.json?rss_url=https://medium.com/feed/studytact
I have built a basic function using AWS Lambda and AWS API Gateway, if anyone is interested. A detailed explanation is found in this blog post here, and the repository for the Lambda function built with Node.js is here on GitHub. Hopefully someone here finds it useful.
(Updating the JS Fiddle and the Clay function that explains it as we updated the function syntax to be cleaner)
I wrapped the GitHub package @mark-fasel was mentioning below into a Clay microservice that enables you to do exactly this:
Simplified Return Format: https://www.clay.run/services/nicoslepicos/medium-get-user-posts-new/code
I put together a little fiddle, since a user was asking how to use the endpoint in HTML to get the titles for their last 3 posts:
https://jsfiddle.net/h405m3ma/3/
You can call the API as:
curl -i -H "Content-Type: application/json" -X POST -d '{"username":"nicolaerusan"}' https://clay.run/services/nicoslepicos/medium-get-users-posts-simple
You can also use it easily in your Node code using the clay-client npm package and just write:

Clay.run('nicoslepicos/medium-get-user-posts-new', { "profile": "profileValue" })
    .then((result) => {
        // Do what you want with returned result
        console.log(result);
    })
    .catch((error) => {
        console.log(error);
    });
Hope that's helpful!
Check this one; you will get all the info about your own posts.
mediumController.getBlogs = (req, res) => {
    // "parser" is a callback-style RSS parser that yields an array of feed items
    parser('https://medium.com/feed/@profileName', function (err, rss) {
        if (err) {
            console.log(err);
        }
        var stories = [];
        for (var i = rss.length - 1; i >= 0; i--) {
            var new_story = {};
            new_story.title = rss[i].title;
            new_story.description = rss[i].description;
            new_story.date = rss[i].date;
            new_story.link = rss[i].link;
            new_story.author = rss[i].author;
            new_story.comments = rss[i].comments;
            stories.push(new_story);
        }
        console.log('stories:');
        console.dir(stories);
        res.status(200).json({
            Data: stories
        });
    });
}
I have created a custom REST API to retrieve the stats of a given post on Medium. All you need is to send a GET request to my custom API, and you will receive the stats as a JSON object, as follows:
Request:
curl https://endpoint/api/stats?story_url=THE_URL_OF_THE_MEDIUM_STORY
Response:
{
    "claps": 78,
    "comments": 1
}
The API responds within a reasonable time (< 2 sec); you can find more about it in the following Medium article.
