Providing discovery of your web page in oEmbed means adding a link such as
<link rel="alternate" type="application/json+oembed"
  href="http://flickr.com/services/oembed?url=http%3A%2F%2Fflickr.com%2Fphotos%2Fbees%2F2362225867%2F&format=json"
  title="Bacon Lollys oEmbed Profile" />
so that's an item URL of
http://flickr.com/photos/bees/2362225867/
(the format=json part is a parameter of the endpoint, not of the item URL).
However, the URL scheme for this general area of Flickr is
http://www.flickr.com/photos/*
and this ends up in the providers list at
https://oembed.com/providers.json
as the entry:
{
  "provider_name": "Flickr",
  "provider_url": "https://www.flickr.com/",
  "endpoints": [
    {
      "schemes": [
        "http://*.flickr.com/photos/*",
        "http://flic.kr/p/*",
        "https://*.flickr.com/photos/*",
        "https://flic.kr/p/*"
      ],
      "url": "https://www.flickr.com/services/oembed/",
      "discovery": true
    }
  ]
},
Where is this scheme URL obtained from? I don't see it explicitly declared anywhere.
The providers JSON is supposed to be maintained through discovery via the original pages' <link> tags, but those do not contain the scheme URLs (with the wildcard *s).
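For context, a consumer uses those wildcard schemes to decide whether a URL belongs to a provider before calling its endpoint. A minimal sketch of that matching (my own illustration, not code from the spec):

// Sketch: match a URL against a provider's wildcard schemes from
// providers.json. Illustration only, not Flickr's or oEmbed's code.
const schemes = [
  'http://*.flickr.com/photos/*',
  'https://*.flickr.com/photos/*'
]

function matchesScheme(url, scheme) {
  // Escape regex metacharacters, then turn each * into a wildcard
  const pattern = scheme
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*')
  return new RegExp('^' + pattern + '$').test(url)
}

console.log(schemes.some((s) =>
  matchesScheme('https://www.flickr.com/photos/bees/2362225867/', s)
)) // -> true

As far as I know, the schemes are not derived from page discovery at all: providers declare them themselves when submitting their entry to the providers list.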
I'm using sw-precache in a Jekyll website to add offline capabilities with the following configuration:
gulp.task('generate-service-worker', function(cb) {
  var path = require('path');
  var swPrecache = require('sw-precache');
  var argv = require('yargs').argv; // assumed: argv was not defined in the original snippet
  var rootDir = '_site';
  var packageJson = require('./package.json');

  swPrecache.write('./service-worker.js', {
    staticFileGlobs: [rootDir + '/**/*.{html,css,png,jpg,gif,svg}', rootDir + '/js/*'],
    stripPrefix: rootDir + '/',
    runtimeCaching: [{
      urlPattern: /\/$/,
      handler: 'networkOnly'
    }],
    handleFetch: argv.cacheAssets || false,
    maximumFileSizeToCacheInBytes: 10485760, // 10 MB
    cacheId: packageJson.name + '-v' + packageJson.version
  }, cb);
});
The problem is that when I change content on the website (for example, text in a blog post or on the index page), the changes won't be shown until the new service worker version has been installed and the browser has been refreshed, which is, of course, the expected behaviour of cacheFirst.
What I want is for requests to the site's index to always go network first, which is what I'm trying here:
runtimeCaching: [{
  urlPattern: /\/$/,
  handler: 'networkFirst'
}]
But this isn't working: the index is always fetched from the service worker and not from the network. How can I accomplish this?
My problem was that I was including the actual page content in the precache: '/**/*.{html,css,png,jpg,gif,svg}'.
Excluding the HTML files works as expected:
'/**/*.{css,png,jpg,gif,svg}'
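Putting it together with the gulp task from the question, the relevant part of the configuration might look like this (a sketch; HTML is left out of the precache so the networkFirst runtime rule can take over):

swPrecache.write('./service-worker.js', {
  // No .html in the globs: precached pages would be served cache-first
  // and mask any runtimeCaching handler.
  staticFileGlobs: [rootDir + '/**/*.{css,png,jpg,gif,svg}', rootDir + '/js/*'],
  stripPrefix: rootDir + '/',
  runtimeCaching: [{
    urlPattern: /\/$/,
    handler: 'networkFirst' // fetch fresh, fall back to the cache offline
  }]
}, cb);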
Change the URL pattern to
urlPattern: '/'
This is an exact-match pattern. Your index will match it and nothing else.
The solution for this is to treat your index.html as dynamic content.
Change your sw-precache webpack config to:
new SWPrecacheWebpackPlugin({
  cacheId: 'yourcacheid',
  filename: 'service-worker.js',
  staticFileGlobs: [
    'dist/**/*.{js,css}'
  ],
  minify: true,
  stripPrefix: 'dist/',
  runtimeCaching: [{
    urlPattern: /\/$/,
    handler: 'networkFirst'
  }]
})
Remove your index.html from staticFileGlobs and add your root index to runtimeCaching.
Then look at your cache storage. You will see something like $$$toolbox-cache$$$https://your-domain.com as a new cache entry. Inspect it and you will see your index cached there.
With the new Firefox WebExtensions: is there a way to save the current page (or a part of it) as HTML (or text) to disk? If not, what are the chances that such an API will be implemented in the future?
I didn't find any suitable API and would appreciate any help.
Regards
There are probably several ways to do this. The following will get you started. It saves the webpage in the currently focused tab of the active window to the browser's default downloads path. The file name is set to 'samplePage.html' (you can change that by modifying the filename value in the downloads.download() options, or remove that field entirely and keep the default naming).
You will need to store icon images in your WebExtension package for the user to click on. Also, be sure to navigate to a webpage you want to save before you try to use the WebExtension; WebExtensions are not active on Firefox's about:debugging page.
manifest:
{
  "name": "SavePage",
  "version": "1.0",
  "description": "Clicking browser icon saves page html",
  "manifest_version": 2,
  "icons": {
    "48": "icons/clickme-48.png"
  },
  "permissions": [
    "tabs",
    "activeTab",
    "downloads"
  ],
  "browser_action": {
    "default_icon": "icons/clickme-32.png"
  },
  "background": {
    "scripts": ["background.js"]
  }
}
background script:
/* BACKGROUND SCRIPT
   Clicking on the browser toolbar button saves the webpage in the
   current tab to the browser's default downloads path with a
   filename of "samplePage.html". The "tabs" and "downloads"
   permissions are required.
*/
browser.browserAction.onClicked.addListener((tab) => {
  var currentUrl = tab.url;

  function onStartedDownload(id) {
    console.log(`Started to download: ${id}`);
  }

  function onFailed(error) {
    console.log(`Something stinks: ${error}`);
  }

  // downloads.download() re-fetches the page URL and saves it to disk
  var startDownload = browser.downloads.download({
    url: currentUrl,
    filename: 'samplePage.html'
  });
  startDownload.then(onStartedDownload, onFailed);
});
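Note that downloads.download() re-fetches the URL from the network rather than saving the DOM as rendered. If you want the page as currently displayed, or only part of it (as the question asks), a content-script capture might work. This is an untested sketch of mine, not part of the answer above:

// Hypothetical variant: capture the rendered DOM (or a fragment of it)
// with tabs.executeScript, then save it through a Blob URL.
browser.browserAction.onClicked.addListener(() => {
  browser.tabs.executeScript({
    // Swap in e.g. "document.querySelector('article').outerHTML"
    // to save only part of the page.
    code: 'document.documentElement.outerHTML'
  }).then((results) => {
    // results holds one value per frame; [0] is the top frame
    const blob = new Blob([results[0]], { type: 'text/html' });
    return browser.downloads.download({
      url: URL.createObjectURL(blob),
      filename: 'renderedPage.html'
    });
  }).catch((error) => console.log(`Something stinks: ${error}`));
});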
An alternative approach might be to try to save the webpage to local storage rather than to disk. I have not explored that option.
These pages may be helpful:
downloads.download()
browserAction.onClicked
There may be security risks in giving a webextension these permissions. You will have to weigh the risks for your own usage pattern.
I can't find any information on how to check whether a YouTube channel is currently live streaming.
With Twitch you just need the channel name, and with the API you can check whether a stream is live.
I don't want to use OAuth; normally a public API key is enough. Just as I can list a channel's videos, I want to know whether the channel is streaming.
You can do this by using search.list, specifying the channel ID, setting the type to video, and setting eventType to live.
For example, when I searched for:
https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=UCXswCcAMb5bvEUIDEzXFGYg&type=video&eventType=live&key=[API_KEY]
I got the following:
{
  "kind": "youtube#searchListResponse",
  "etag": "\"sGDdEsjSJ_SnACpEvVQ6MtTzkrI/gE5P_aKHWIIc6YSpRcOE57lf9oE\"",
  "pageInfo": {
    "totalResults": 1,
    "resultsPerPage": 5
  },
  "items": [
    {
      "kind": "youtube#searchResult",
      "etag": "\"sGDdEsjSJ_SnACpEvVQ6MtTzkrI/H-6Tm7-JewZC0-CW4ALwOiq9wjs\"",
      "id": {
        "kind": "youtube#video",
        "videoId": "W4HL6h-ZSws"
      },
      "snippet": {
        "publishedAt": "2015-09-08T11:46:23.000Z",
        "channelId": "UCXswCcAMb5bvEUIDEzXFGYg",
        "title": "Borussia Dortmund vs St. Pauli 1-0 Live Stream",
        "description": "Borussia Dortmund vs St. Pauli Live Stream Friendly Match.",
        "thumbnails": {
          "default": {
            "url": "https://i.ytimg.com/vi/W4HL6h-ZSws/default.jpg"
          },
          "medium": {
            "url": "https://i.ytimg.com/vi/W4HL6h-ZSws/mqdefault.jpg"
          },
          "high": {
            "url": "https://i.ytimg.com/vi/W4HL6h-ZSws/hqdefault.jpg"
          }
        },
        "channelTitle": "",
        "liveBroadcastContent": "live"
      }
    }
  ]
}
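To turn that response into a simple live/offline flag, here is a minimal sketch of mine (assumes Node 18+ for the built-in fetch; the key is a placeholder):

// Sketch: the channel is live if the live-event search returns results.
const API_KEY = 'YOUR_API_KEY' // placeholder
const CHANNEL_ID = 'UCXswCcAMb5bvEUIDEzXFGYg'

const url = 'https://www.googleapis.com/youtube/v3/search'
  + '?part=snippet&channelId=' + CHANNEL_ID
  + '&type=video&eventType=live&key=' + API_KEY

const data = await fetch(url).then((r) => r.json())
console.log(data.pageInfo.totalResults > 0 ? 'live' : 'offline')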
The search method (https://www.googleapis.com/youtube/v3/search) is awfully expensive to use, though. It costs 100 quota units (https://developers.google.com/youtube/v3/determine_quota_cost) out of the 10,000 you have by default.
This means you only get 100 requests per day, which is terrible.
You could request a quota increase, but that seems like brute-forcing the problem.
Is there really no other simpler method?
I know this is old, but I figured it out myself with PHP.
$API_KEY = 'your api3 key';
$ChannelID = 'the users channel id';

// search.list with eventType=live only returns items while the channel is live
$channelInfo = 'https://www.googleapis.com/youtube/v3/search?part=snippet&channelId='.$ChannelID.'&type=video&eventType=live&key='.$API_KEY;
$extractInfo = file_get_contents($channelInfo);
$extractInfo = str_replace('},]', '}]', $extractInfo); // guard against a stray trailing comma
$showInfo = json_decode($extractInfo, true);

if ($showInfo['pageInfo']['totalResults'] === 0) {
    echo 'Users channel is Offline';
} else {
    echo 'Users channel is LIVE!';
}
I found a better way to do this. Yes, it requires you to make GET requests to a YouTube page and parse the HTML, but it works with newer versions, works with the consent page, and most likely (90%) works with captcha.
All you need to do is make a request to https://youtube.com/channel/[CHANNELID]/live and check the href attribute of the <link rel="canonical" /> tag.
For example,
<link rel="canonical" href="https://www.youtube.com/channel/UC4cueEAH9Oq94E1ynBiVJhw">
means there is no livestream, while
<link rel="canonical" href="https://www.youtube.com/watch?v=SR9w_ofpqkU">
means there is a stream, and you can even fetch its data by video ID.
Since the canonical URL is very important for SEO, and the redirect no longer works in plain GET or HEAD requests, I recommend using this method.
Also here is the simple script I use:
import { parse } from 'node-html-parser'
import fetch from 'node-fetch'
const channelID = process.argv[2] // process.argv is array of arguments passed in console
const response = await fetch(`https://youtube.com/channel/${channelID}/live`)
const text = await response.text()
const html = parse(text)
const canonicalURLTag = html.querySelector('link[rel=canonical]')
const canonicalURL = canonicalURLTag.getAttribute('href')
const isStreaming = canonicalURL.includes('/watch?v=')
console.log(isStreaming)
Then run npm init -y && npm i node-html-parser node-fetch to create a project in the working directory and install the dependencies.
Then run node isStreaming.js UC4cueEAH9Oq94E1ynBiVJhw and it will print true/false (400-600 ms per execution).
It does make you depend on node-html-parser and node-fetch; you could make the requests with the built-in HTTP library (which is clunky) and rewrite this to use regex. (Do not parse HTML with regex.)
I was also struggling with API limits. The most reliable and cheapest way I've found is simply a HEAD request to https://www.youtube.com/channel/CHANNEL_ID/live. If the channel is live, that URL auto-loads the stream; if not, it loads the channel's videos feed. You can simply check the Content-Length header to determine which: when live, the size is almost 2x the size when NOT live.
Depending on your region, you might need to accept the cookie consent page. Just send your request with cookies={ "CONSENT": "YES+cb.20210420-15-p1.en-GB+FX+634" }.
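A rough sketch of that heuristic (mine, untested; the threshold is a made-up placeholder you would calibrate by comparing a known-live and a known-offline channel):

// Sketch of the HEAD-request heuristic. THRESHOLD is a placeholder;
// calibrate it against your own Content-Length measurements.
const THRESHOLD = 200000
const channelId = 'UC4cueEAH9Oq94E1ynBiVJhw'

const response = await fetch(`https://www.youtube.com/channel/${channelId}/live`, {
  method: 'HEAD',
  headers: { cookie: 'CONSENT=YES+cb.20210420-15-p1.en-GB+FX+634' }
})
const size = Number(response.headers.get('content-length'))
console.log(size > THRESHOLD ? 'probably live' : 'probably offline')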
If you point streamlink at a https://www.youtube.com/channel/CHANNEL_ID/live link, it will tell you whether it is live or not.
For example, lofi beats is usually live:
$ streamlink "https://www.youtube.com/channel/UCSJ4gkVC6NrvII8umztf0Ow/live"
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/channel/UCSJ4gkVC6NrvII8umztf0Ow/live
Available streams: 144p (worst), 240p, 360p, 480p, 720p, 1080p (best)
whereas MKBHD is not
$ streamlink "https://www.youtube.com/c/mkbhd/live"
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/c/mkbhd/live
error: Could not find a video on this page
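To use that check from a script rather than a terminal, one option is to shell out to the CLI and rely on its exit status (a sketch of mine; it assumes streamlink is on PATH and exits non-zero when it cannot find a video, as in the MKBHD example above):

// Sketch: wrap the streamlink CLI and map its exit status to live/not live.
import { execFile } from 'node:child_process'

const url = 'https://www.youtube.com/channel/UCSJ4gkVC6NrvII8umztf0Ow/live'
execFile('streamlink', [url], (error) => {
  // streamlink exits non-zero on "Could not find a video on this page"
  console.log(error ? 'not live' : 'live')
})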
The easiest way that I have found to do this has been scraping the site. This can be done by finding:
<link rel="canonical" href="linkToActualYTLiveVideoPage">
as in Vitya's answer.
This is my simple Python code using bs4:
import requests
from bs4 import BeautifulSoup

def is_liveYT():
    channel_url = "https://www.youtube.com/c/LofiGirl/live"
    page = requests.get(channel_url, cookies={'CONSENT': 'YES+42'})
    soup = BeautifulSoup(page.content, "html.parser")
    canonical = soup.find("link", {"rel": "canonical"})
    # The canonical tag is always present; it points at a /watch URL
    # only while the channel is actually live.
    if canonical and "/watch?v=" in canonical.get("href", ""):
        print("Streaming")
    else:
        print("Not Streaming")

if __name__ == "__main__":
    is_liveYT()
It is pretty weird, honestly, that YouTube doesn't have a simple way to do this through the API, although this is probably easier.
I found the answer by @VityaSchel to be quite useful, but it doesn't distinguish between channels that have a live broadcast scheduled and those that are broadcasting live now.
To distinguish between scheduled and live, I have extended his code to query the YouTube Data API for the live-streaming details:
import { parse } from 'node-html-parser'
import fetch from 'node-fetch'

const youtubeAPIkey = 'YOUR_YOUTUBE_API_KEY'
const youtubeURLbase = 'https://www.googleapis.com/youtube/v3/videos?key=' + youtubeAPIkey + '&part=liveStreamingDetails,snippet&id='

const c = {cid: process.argv[2]} // process.argv is the array of arguments passed in the console

const response = await fetch(`https://youtube.com/channel/${c.cid}/live`)
const text = await response.text()
const html = parse(text)
const canonicalURLTag = html.querySelector('link[rel=canonical]')
const canonicalURL = canonicalURLTag.getAttribute('href')

c.live = false
c.configured = canonicalURL.includes('/watch?v=')
if (!c.configured) process.exit()

c.vid = canonicalURL.match(/(?<==).*/)[0]
const data = await fetch(youtubeURLbase + c.vid).then(response => response.json())
if (data.error) {
  console.error(data)
  process.exit(1)
}

const i = data.items.pop() // pop() grabs the last item
c.title = i.snippet.title
c.thumbnail = i.snippet.thumbnails.standard.url
c.scheduledStartTime = i.liveStreamingDetails.scheduledStartTime
c.live = i.liveStreamingDetails.hasOwnProperty('actualStartTime')
if (c.live) {
  c.actualStartTime = i.liveStreamingDetails.actualStartTime
}

console.log(c)
Sample output from the above:
% node index.js UCNlfGuzOAKM1sycPuM_QTHg
{
  cid: 'UCNlfGuzOAKM1sycPuM_QTHg',
  live: true,
  configured: true,
  vid: '8yRgYiNH39E',
  title: '🔴 Deep Focus 24/7 - Ambient Music For Studying, Concentration, Work And Meditation',
  thumbnail: 'https://i.ytimg.com/vi/8yRgYiNH39E/sddefault_live.jpg',
  scheduledStartTime: '2022-05-23T01:25:00Z',
  actualStartTime: '2022-05-23T01:30:22Z'
}
Every YouTube channel has a permanent livestream, even if the channel is not currently livestreaming. In the liveStreams resource, you can find a boolean named isDefaultStream.
But where can we get this video (livestream) ID? Go to https://www.youtube.com/user/CHANNEL_ID/live, right-click on the stream and copy the video URL.
You can now make a GET request to
https://youtube.googleapis.com/youtube/v3/videos?part=liveStreamingDetails&id=[VIDEO_ID]&key=[API_KEY] (this request has a quota cost of 1 unit, see here)
This will be the result if the stream is currently active/online.
{
  "kind": "",
  "etag": "",
  "items": [
    {
      "kind": "",
      "etag": "",
      "id": "",
      "liveStreamingDetails": {
        "actualStartTime": "",
        "scheduledStartTime": "",
        "concurrentViewers": "",
        "activeLiveChatId": ""
      }
    }
  ],
  "pageInfo": {
    "totalResults": 1,
    "resultsPerPage": 1
  }
}
If the stream is currently offline, the property concurrentViewers will not exist. In other words, the only difference between an online and an offline livestream is whether concurrentViewers is present. With this information, you can check whether the channel is currently streaming (at least on its default stream).
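A minimal sketch of that check (mine; VIDEO_ID and API_KEY are placeholders, and it assumes Node 18+ for the built-in fetch):

// Sketch: the permanent livestream is live exactly when
// concurrentViewers appears in liveStreamingDetails.
const API_KEY = 'YOUR_API_KEY'   // placeholder
const VIDEO_ID = 'YOUR_VIDEO_ID' // placeholder: the permanent stream's id

const url = 'https://youtube.googleapis.com/youtube/v3/videos'
  + '?part=liveStreamingDetails&id=' + VIDEO_ID + '&key=' + API_KEY

const data = await fetch(url).then((r) => r.json())
const details = data.items?.[0]?.liveStreamingDetails
console.log(details && 'concurrentViewers' in details ? 'live' : 'offline')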
I found the YouTube API to be very restrictive given the cost of the search operation. Web scraping with aiohttp and BeautifulSoup was not an option, since the better indicators require JavaScript support, so I turned to Selenium. I looked for the CSS selector
#info-text
and then searched it for the string "Started streaming", or for "watching now".
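To keep the examples in one language, here is a sketch of that approach with the JavaScript Selenium bindings (selenium-webdriver); the #info-text selector and the strings come from the description above, the rest is my assumption:

// Sketch: load the channel's /live page in a real browser and look for
// "Started streaming" / "watching now" in the #info-text element.
// Assumes selenium-webdriver and a chromedriver installation.
import { Builder, By, until } from 'selenium-webdriver'

const driver = await new Builder().forBrowser('chrome').build()
try {
  await driver.get('https://www.youtube.com/c/LofiGirl/live')
  const info = await driver.wait(until.elementLocated(By.css('#info-text')), 10000)
  const text = await info.getText()
  console.log(/Started streaming|watching now/.test(text) ? 'live' : 'not live')
} finally {
  await driver.quit()
}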
You can also run this as a small API on Heroku with Flask.
I want to create an extension for Google Chrome, and it will be really simple.
I will have a database on my website's server; the extension will check whether a URL is in the "blacklist" table and warn the user if it is.
But I don't know where to start. I tried putting all the files on my web server and changing the manifest.json file as follows:
(changed the "default_popup" line)
{
  "manifest_version": 2,
  "name": "My Extension",
  "description": "This extension warns you if you are trying to open a blacklisted URL",
  "version": "1.0",
  "browser_action": {
    "default_icon": "icon.png",
    "default_popup": "http://www.mysite.com/my_extension/popup.html"
  },
  "permissions": [
    "https://secure.flickr.com/"
  ]
}
Note: this file resides on my computer; I load it from the "Extensions" menu of Chrome.
But when I tried to install the extension, I got the error:
This web page could not be found:chrome-extension://hgfdjnsakhkijfmdnadmlacgjggggkpf/http://www.mysite.com/my_extension/popup.html
Instead of trying to hard-code it in the manifest file, try putting something like this in a background page:
chrome.browserAction.setPopup({popup: "http://www.mysite.com/my_extension/popup.html"});
However, it may not be possible to specify an external popup page at all.
Nevertheless, it would be better to include the popup in the extension files and then get just the data from your server.
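For example, a local popup could fetch just the verdict from your server. A minimal sketch (the check.php endpoint and its JSON shape are my assumptions; adapt them to your backend):

// popup.js, referenced from a local popup.html inside the extension.
// Hypothetical endpoint: returns {"blacklisted": true|false}.
chrome.tabs.query({ active: true, currentWindow: true }, function (tabs) {
  var pageUrl = tabs[0].url;
  fetch('http://www.mysite.com/my_extension/check.php?url=' + encodeURIComponent(pageUrl))
    .then(function (response) { return response.json(); })
    .then(function (data) {
      document.body.textContent = data.blacklisted
        ? 'Warning: this URL is blacklisted!'
        : 'This URL looks OK.';
    });
});

The manifest would then point "default_popup" at the local popup.html and list "tabs" plus your server's origin under "permissions".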