I am trying to download YouTube videos through Wget. The first thing necessary is to capture the URL of the actual video resource. Suppose I want to download this video: video. Opening up the page in the Firebug console reveals something like this:
The link which I have circled looks like the link to the resource, since there we see only the video: http://www.youtube.com/v/r-KBncrOggI?version=3&autohide=1. However, when I try to download this resource with Wget, only a 4 KB file named r-KBncrOggI#version=3&autohide=1 gets stored on my hard drive, nothing else. What should I do to get the actual video?
And secondly, is there a way to capture different resources for videos of different resolutions, like 360p, 480p, etc.?
Here is one VERY simplified, yet functional version of the youtube-download utility I cited in my other answer:
#!/usr/bin/env perl
use strict;
use warnings;
# CPAN modules we depend on
use JSON::XS;
use LWP::UserAgent;
use URI::Escape;
# Initialize the User Agent
# YouTube servers are weird, so *don't* parse headers!
my $ua = LWP::UserAgent->new(parse_head => 0);
# fetch video page or abort
my $res = $ua->get($ARGV[0]);
die "bad HTTP response" unless $res->is_success;
# scrape video metadata
if ($res->content =~ /\byt\.playerConfig\s*=\s*({.+?});/sx) {
    # parse as JSON or abort
    my $json = eval { decode_json $1 };
    die "bad JSON: $1" if $@;
    # inside the JSON 'args' property, there's an encoded
    # url_encoded_fmt_stream_map property which points
    # to stream URLs and signatures
    while ($json->{args}{url_encoded_fmt_stream_map} =~ /\burl=(http.+?)&sig=([0-9A-F\.]+)/gx) {
        # decode URL and attach signature
        my $url = uri_unescape($1) . "&signature=$2";
        print $url, "\n";
    }
}
Usage example (it returns several URLs to streams with different encoding/quality):
$ perl youtube.pl http://www.youtube.com/watch?v=r-KBncrOggI | head -n 1
http://r19---sn-bg07sner.c.youtube.com/videoplayback?fexp=923014%2C916623%2C920704%2C912806%2C922403%2C922405%2C929901%2C913605%2C925710%2C929104%2C929110%2C908493%2C920201%2C913302%2C919009%2C911116%2C926403%2C910221%2C901451&ms=au&mv=m&mt=1357996514&cp=U0hUTVBNUF9FUUNONF9IR1RCOk01RjRyaG4wTHdQ&id=afe2819dcace8202&ratebypass=yes&key=yt1&newshard=yes&expire=1358022107&ip=201.52.68.216&ipbits=8&upn=m-kyX9-4Tgc&sparams=cp%2Cid%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&itag=44&sver=3&source=youtube,quality=large&signature=A1E7E91DD087067ED59101EF2AE421A3503C7FED.87CBE6AE7FB8D9E2B67FEFA9449D0FA769AEA739
I'm afraid it's not that easy to get the right link to the video resource.
The link you got, http://www.youtube.com/v/r-KBncrOggI?version=3&autohide=1, points to the player rather than the video itself. There is one Perl utility, youtube-download, which is well-maintained and does the trick. This is how to get the HQ version (magic fmt=18) of that video:
stas@Stanislaws-MacBook-Pro:~$ youtube-download -o "{title}.{suffix}" --fmt 18 r-KBncrOggI
--> Working on r-KBncrOggI
Downloading `Sourav Ganguly in Farhan Akhtar's Show - Oye! It's Friday!.mp4`
75161060/75161060 (100.00%)
Download successful!
stas@Stanislaws-MacBook-Pro:~$
There might be better command-line YouTube Downloaders around. But sorry, one doesn't simply download a video using Firebug and wget any more :(
The only way I know to capture that URL manually is by watching the active downloads of the browser:
The largest data chunks are the video data, so you can copy their URL:
http://s.youtube.com/s?lact=111116&uga=m30&volume=4.513679238953965&sd=BBE62AA4AHH1357937949850490&rendering=accelerated&fs=0&decoding=software&nsivbblmax=679542.000&hcbt=105.345&sendtmp=1&fmt=35&w=640&vtmp=1&referrer=None&hl=en_US&nsivbblmin=486355.000&nsivbblmean=603805.166&md=1&plid=AATTCZEEeM825vCx&ns=yt&ptk=youtube_none&csipt=watch7&rt=110.904&tsphab=1&nsiabblmax=129097.000&tspne=0&tpmt=110&nsiabblmin=123113.000&tspfdt=436&hbd=30900552&et=110.146&hbt=30.770&st=70.213&cfps=25&cr=BR&h=480&screenw=1440&nsiabblmean=125949.872&cpn=JlqV9j_oE1jzk7Zc&nsivbblc=343&nsiabblc=343&docid=r-KBncrOggI&len=1302.676&screenh=900&abd=1&pixel_ratio=1&bc=26131333&playerw=854&idpj=0&hcbd=25408143&playerh=510&ldpj=0&fexp=920704,919009,922403,916709,912806,929110,928008,920201,901451,909708,913605,925710,916623,929104,913302,910221,911116,914093,922405,929901&scoville=1&el=detailpage&bd=6676317&nsidf=1&vid=Yfg8gnutZoTD4G5SVKCxpsPvirbqG7pvR&bt=40.333&mos=0&vq=auto
However, for a large video this will only return part of the stream, unless you figure out which URL query parameter controls the range of the stream to be downloaded and adjust it.
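For illustration, here is a minimal Python sketch that saves such a copied stream URL to disk; the URL below is only a placeholder (these URLs expire quickly), and per the note above this may only yield part of a long video:

import requests

# Hypothetical example: paste the stream URL copied from the browser's
# active downloads here (these URLs expire after a while).
stream_url = "http://..."  # placeholder

with requests.get(stream_url, stream=True) as resp:
    resp.raise_for_status()
    with open("video.flv", "wb") as out:
        # download in chunks so large videos don't fill up memory
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)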
A bonus: everything changes periodically as YouTube is constantly evolving. So, don't do that manually unless you crave pain.
Related
I want to extract the video source, for example for this movie:
https://www.cda.pl/video/53956781/vfilm
The video sources are hidden, and there seem to be some security measures to prevent extracting them.
I opened it in Firefox with Video DownloadHelper and I got:
headers: [...]
id: network-probe:5f8a4b2c
isPrivate: false
length: 831760944
pageUrl: https://www.cda.pl/video/53956781/vfilm
referrer: https://www.cda.pl/video/53956781/vfilm
running: 0
status: active
tabId: 36
thumbnailUrl: https://icdn.2cda.pl/vid/premium/539567/frames/620x365/e86f6548543225cf0f1b61f1e9d730be.jpg
title: Istnienie (2013) Lektor PL 720p - CDA
topUrl: https://www.cda.pl/video/53956781/vfilm
type: video
url: https://vwaw607.cda.pl/539567/v_lq_lq5250559bd547af6bad2eb78dacaf5df4.mp4
urlFilename: v_lq_lq5250559bd547af6bad2eb78dacaf5df4
So there is a video source, but if I go to https://vwaw607.cda.pl/539567/v_lq_lq5250559bd547af6bad2eb78dacaf5df4.mp4
it says the video cannot be loaded: the file is damaged.
Could you give me an idea how to extract the video URL? Thanks.
This is somewhat of a duplicate of Does YouTube API forbid to download video captions if you are not it's owner?, Get YouTube captions and Does YouTube API forbid to download video captions if you are not it's owner?, which all basically say that it's not possible to download captions via the YouTube API unless you are the owner or third-party contributions are enabled; however, my question is: how do sites like http://downsub.com/ or http://www.lilsubs.com/ have access to all captions?
In other words, when I access the YouTube API myself (even with the youtubepartner and youtube.force-ssl scopes), I can only download the captions of some videos. But when I try the videos that failed for me with 403: The permissions associated with the request are not sufficient to download the caption track. The request might not be properly authorized, or the video owner might not have enabled third-party contributions for this caption. on these other sites, it works fine. I'm assuming they are using the YouTube API to access the captions, but what special sauce are they using? Some special partner key? A different API version? Are they just scraping from the videos themselves or something?
Send a GET request to:
http://video.google.com/timedtext?lang={LANG}&v={VIDEOID}
Example for your video in comment: http://video.google.com/timedtext?lang=ko&v=0db1_qWZjRA
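For instance, a minimal Python sketch of that request (using the language and video ID from the example above; the endpoint returns the captions as an XML document, and only answers for tracks that actually exist):

import requests

# Fetch the Korean caption track for the video from the question.
resp = requests.get(
    "http://video.google.com/timedtext",
    params={"lang": "ko", "v": "0db1_qWZjRA"},
)
resp.raise_for_status()
print(resp.text)  # XML with <text start="..." dur="..."> elements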
Let's look at another example of yours, i.e. https://www.youtube.com/watch?v=7068mw-6lmI (and I agree about the differentiation part in your comment).
There are multiple subtitle tracks available for this video:
English
Korean
Spanish
Korean (auto-generated), also called asr (automatic speech recognition)
These correspond to the subtitle name parameter (e.g., name=English).
lang stands for the language code.
In your example: https://www.youtube.com/api/timedtext?lang=es-MX&v=7068mw-6lmI&name=Spanish
If a subtitle track is available, it is possible to get a translation of it, namely by using the tlang parameter.
https://www.youtube.com/api/timedtext?lang=en&v=7068mw-6lmI&name=English&tlang=lv
https://www.youtube.com/api/timedtext?lang=ko&v=7068mw-6lmI&name=Korean&tlang=lv
This would be my bet for what these sites are using, i.e. translation of an available subtitle track (you can confirm this by feeding a video without any subtitle track to one of those sites).
As for asr tracks, a signature seems to always be required; but as long as one of the regular subtitle tracks is available, you can use that one for translation. E.g., for the example from your comment:
https://www.youtube.com/api/timedtext?lang=en&v=vx6NCUyg1NE&tlang=lv
It looks like the last example is special in that both of its subtitle tracks are asr (checked with Chrome -> Inspect -> Network), therefore you need to omit the subtitle name parameter. This difference is unfortunately not visible in the YouTube video's settings wheel.
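As a rough illustration, the same translation request in a short Python sketch (the parameters simply mirror the URLs above; whether the endpoint answers still depends on which tracks the video has):

import requests

# Request the English track of the video above, translated to Latvian (tlang=lv).
resp = requests.get(
    "https://www.youtube.com/api/timedtext",
    params={"lang": "en", "v": "7068mw-6lmI", "name": "English", "tlang": "lv"},
)
resp.raise_for_status()
print(resp.text or "empty response - no such track")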
A 2022 answer:
Option 1: Send a curl request to the webpage: curl -L "https://youtu.be/YbJOTdZBX1g", search for timedtext in the result, and you will find a URL. Replace \u0026 with & and you get the link to the subtitles.
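The same idea as a small Python sketch (a best-effort scrape of the watch page; the regex is an assumption about how the timedtext URL is embedded and may need adjusting if YouTube changes its markup):

import re
import requests

# Fetch the watch page and pull out the first embedded timedtext URL.
html = requests.get("https://youtu.be/YbJOTdZBX1g", allow_redirects=True).text
match = re.search(r'"(https://www\.youtube\.com/api/timedtext[^"]+)"', html)
if match:
    caption_url = match.group(1).replace("\\u0026", "&")
    print(caption_url)
else:
    print("no timedtext URL found on the page")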
Option 2: Use the yt-dlp package:
# For installing see: https://github.com/yt-dlp/yt-dlp#with-pip
from yt_dlp import YoutubeDL
ydl_opts = {
    "skip_download": True,
    "writesubtitles": True,
    "subtitleslangs": ["all", "-live_chat"],
    # Looks like formats available are vtt, ttml, srv3, srv2, srv1, json3
    "subtitlesformat": "json3",
    # You can skip the following option
    "sleep_interval_subtitles": 1,
}

with YoutubeDL(ydl_opts) as ydl:
    ydl.download(["YbJOTdZBX1g"])
There is this unofficial API used by YouTube:
https://www.youtube.com/api/timedtext?lang={LANG}&v={VIDEO_ID}
LANG here is the ISO 639-1 two-letter language code. For your example it would be:
https://www.youtube.com/api/timedtext?lang=ko&v=0db1_qWZjRA
You can check it in the Network tab while toggling the closed-captions button:
I have used youtube-transcript-api successfully to retrieve transcripts. Below is a demo that dumps the transcript into HTML with links back to the timestamps in the video:
import sys

from youtube_transcript_api import YouTubeTranscriptApi

video_id = sys.argv[1]

# Retrieve the available transcripts
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

# Just use the first transcript, let it raise an exception if none exist.
transcript = next(iter(transcript_list))

print("<html><body>")
for line_map in transcript.fetch():
    # 'start' is in seconds; split it into minutes and remaining seconds
    st_min = int(line_map['start'] / 60)
    st_sec = int(line_map['start'] - st_min * 60)
    tstmp = f"{st_min}:{st_sec}"
    link_to_tstmp = f"https://youtu.be/{video_id}?t={st_min * 60}"
    tstmp_str = ("%2d:%-2d" % (st_min, st_sec)).replace(" ", "&nbsp;")
    #print(f"{st_min}:{st_sec} {line_map['text']}")
    print("""<a href="%s">%s</a> %s<br/>""" % (link_to_tstmp, tstmp_str, line_map['text']))
print("</body></html>")
If there are multiple transcripts, the library provides API to search by language etc.
You can further tweak the logic to merge text so you only get one link every so many minutes. I got good results for a lecture by linking every 1 minute and formatting the lines into an HTML table.
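As a rough sketch of that tweak (reusing the line_map dicts from the demo above), you could bucket the lines by minute and emit one link per bucket:

from collections import OrderedDict

def merge_by_minute(lines):
    """Group transcript lines into one text blob per minute of video."""
    buckets = OrderedDict()
    for line_map in lines:
        minute = int(line_map['start'] // 60)
        buckets.setdefault(minute, []).append(line_map['text'])
    return buckets

# Usage with the transcript from the demo above:
# for minute, texts in merge_by_minute(transcript.fetch()).items():
#     link = f"https://youtu.be/{video_id}?t={minute * 60}"
#     print(f'<a href="{link}">{minute}:00</a> {" ".join(texts)}<br/>')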
I followed this page:
https://cloud.google.com/speech/docs/getting-started
and I could reach the end of it without problems.
In the example though, the file
'uri':'gs://cloud-samples-tests/speech/brooklyn.flac'
is processed.
What if I want to process a local file? In case this is not possible, how can I upload my .flac via command line?
Thanks
You're now able to process a local file by specifying a local path instead of the Google Storage one:
gcloud ml speech recognize '/Users/xxx/cloud-samples-tests/speech/brooklyn.flac' \
    --language-code='en-US'
You can send this command by using the gcloud tool (https://cloud.google.com/speech-to-text/docs/quickstart-gcloud).
Solution found:
I created my own bucket (my_bucket_test) and uploaded the file there via:
gsutil cp speech.flac gs://my_bucket_test
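Once the file is in the bucket, it can be referenced by its gs:// URI. A minimal Python sketch of that, following the quickstart's pattern (the bucket/object name is the hypothetical one above, and the FLAC encoding and language settings are assumptions about the file):

from google.cloud import speech

client = speech.SpeechClient()

# Point the API at the file uploaded with gsutil above (hypothetical bucket/object).
audio = speech.RecognitionAudio(uri="gs://my_bucket_test/speech.flac")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)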
If you don't want to create a bucket (it costs extra time and money), you can stream the local file. The following code is copied directly from the Google Cloud docs:
def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    import io
    from google.cloud import speech

    client = speech.SpeechClient()

    with io.open(stream_file, "rb") as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]

    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream
    )

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    streaming_config = speech.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(
        config=streaming_config,
        requests=requests,
    )

    for response in responses:
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print("Finished: {}".format(result.is_final))
            print("Stability: {}".format(result.stability))

            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print("Confidence: {}".format(alternative.confidence))
                print(u"Transcript: {}".format(alternative.transcript))
Here is the URL in case the package's function names change over time: https://cloud.google.com/speech-to-text/docs/streaming-recognize
I'm trying to record a webcam, save the recording, and stream it to a local network.
The problem is that I want to do this with two different compression settings:
The stream for the local network should have less than 400 kbit/s, but the other copy, which is stored to a local file, should be uncompressed or at up to 10 Mbit/s.
So I tried two methods to solve this:
First I played a little bit with the VLC GUI. It is really easy to record the webcam, then transcode the video and save it to a file and/or stream it to the internet. The command line looks like this:
vlc v4l2:///dev/video0 :v4l2-standard= :live-caching=300 :sout="#transcode{vcodec=WMV2,vb=380,fps=1,scale=Automatisch,acodec=none}:duplicate{dst=file{dst=stream.asf,no-overwrite},dst=http{dst=:8080/stream.wmv}}" :sout-keep
But I had the problem that both the internet stream and the file get compressed. So I changed the order of "duplicate" and "transcode" to:
vlc v4l2:///dev/video0 :v4l2-standard= :live-caching=300 :sout="#duplicate{dst=file{dst=stream.asf,no-overwrite}, dst="transcode{vcodec=WMV2,vb=380,fps=1,scale=Automatisch,acodec=none}:http{dst=:8080/stream.wmv}"}" :sout-keep
My thought: now I should get a compressed internet stream and the original file. But it doesn't stream anything to the internet.
So I tried another method: I wanted to stream the original stream to port 8080 and then use two other VLC instances to generate a compressed network stream on port 8008 and the original file. But I can't stream a stream...
So I would be really thankful if someone has another idea or a hint about where my problem is.
Sorry for my English.
Have a nice day.
You are double-quoting the :sout value. If you plan to use double quotes (") inside the value, then use apostrophes (') to enclose the whole argument, like:
:sout='#duplicate{dst=file{...}, dst="transcode{...}:http{dst=:8080/stream.wmv}"}'
If you add -v (verbose output) at the end of your command you'll see some other issues too, like no-overwrite not being recognized. Also, scale=Automatisch should be scale=auto.
Please note that I checked just the syntax and not your encoding parameters.
I used
user@ubuntu:~/Arbeitsfläche$ sudo youtube-dl -citw ytuser: raz\ malca
to download all videos from this channel. But it downloads only one:
user@ubuntu:~/Arbeitsfläche$ sudo youtube-dl -citw ytuser: raz\ malca
[generic] ytuser:: Requesting header
ERROR: Invalid URL protocol; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type youtube-dl -U to update.
WARNING: Falling back to youtube search for raz malca . Set --default-search to "auto" to suppress this warning.
[youtube:search] query "raz malca": Downloading page 1
[download] Downloading playlist: raz malca
[youtube:search] playlist raz malca: Collected 1 video ids (downloading 1 of them)
[download] Downloading video #1 of 1
[youtube] Setting language
[youtube] R72vpqqNSDw: Downloading webpage
[youtube] R72vpqqNSDw: Downloading video info webpage
[youtube] R72vpqqNSDw: Extracting video information
I use the newest version of this program.
My system is Ubuntu 13.10 :)
Does anyone have an idea?
First of all, don't run youtube-dl with sudo - normal user rights suffice. There is also absolutely no need to pass in -ctw. But the main problem is that your command line contains a superfluous space after ytuser:. In any case, even if the space weren't there, raz malca is not a valid YouTube user ID. There is, however, a channel with that name. Simply pass that channel's URL to youtube-dl:
youtube-dl -i https://www.youtube.com/channel/UCnmQSqOPhkawAdndZgjfanA
To find out the channel's URL, inspect the user that created the channel. E.g., for the video mix - o grande amor (Tom Jobim) - Solo Guitar by Chiba Kosei, right-click (in Firefox) on Chiba Kosei and select Inspect Element. Notice that the channel ID is in the href part. So copy that, and your final channel URL is https://youtube.com/channel/UC5q9SrhlCJ4YciPTLLNTR5w:
<a href="/channel/UC5q9SrhlCJ4YciPTLLNTR5w" class="yt-uix-sessionlink g-hovercard spf-link " data-ytid="UC5q9SrhlCJ4YciPTLLNTR5w" data-sessionlink="itct=CDIQ4TkiEwjBvtSO7v3KAhWPnr4KHUYMDbko-B0">Chiba Kosei</a>
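For illustration, here is a small Python sketch that automates that lookup by scraping the first /channel/UC... reference from a video's watch page (the regex is an assumption about the page markup and may need adjusting as YouTube changes; the watch URL below is just an example):

import re
import requests

def channel_url_from_video(watch_url):
    """Best-effort scrape of the uploader's channel URL from a watch page."""
    html = requests.get(watch_url).text
    match = re.search(r'/channel/(UC[0-9A-Za-z_-]{22})', html)
    if match:
        return "https://www.youtube.com/channel/" + match.group(1)
    return None

print(channel_url_from_video("https://www.youtube.com/watch?v=r-KBncrOggI"))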