Somehow the convert routine screws up my graphics. It used to work before. Does anyone have a clue what might have gone wrong? (I installed the latest versions, ImageMagick 7.0.5-10 and imagick 3.4.3; the image was converted through the command line.)
The input SVG
http://defreule.info/scale.svg
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 35.55 35.72"><defs><style>.a{fill:#706f6f;}</style></defs><title>youtube</title><path class="a" d="M17.61,15.42a0.44,0.44,0,0,0,.31-0.11A0.39,0.39,0,0,0,18,15V12.43a0.31,0.31,0,0,0-.12-0.25,0.48,0.48,0,0,0-.31-0.1,0.43,0.43,0,0,0-.29.1,0.32,0.32,0,0,0-.11.25V15a0.41,0.41,0,0,0,.1.3A0.38,0.38,0,0,0,17.61,15.42Z"/><path class="a" d="M19.75,20.2a0.85,0.85,0,0,0-.4.1,1.38,1.38,0,0,0-.37.3V18.72H18.11v5.84H19V24.22a1.1,1.1,0,0,0,.37.29,1,1,0,0,0,.45.09,0.71,0.71,0,0,0,.58-0.24,1.07,1.07,0,0,0,.2-0.7V21.28a1.29,1.29,0,0,0-.21-0.81A0.74,0.74,0,0,0,19.75,20.2Zm0,3.35a0.47,0.47,0,0,1-.07.3,0.28,0.28,0,0,1-.23.09,0.48,0.48,0,0,1-.21,0,0.73,0.73,0,0,1-.2-0.15V21.05a0.64,0.64,0,0,1,.18-0.13,0.45,0.45,0,0,1,.18,0,0.33,0.33,0,0,1,.27.11,0.51,0.51,0,0,1,.09.33v2.22Z"/><polygon class="a" points="11.65 19.57 12.65 19.57 12.65 24.55 13.62 24.55 13.62 19.57 14.63 19.57 14.63 18.72 11.65 18.72 11.65 19.57"/><path class="a" d="M16.6,23.52a1.22,1.22,0,0,1-.27.23,0.49,0.49,0,0,1-.24.09,0.21,0.21,0,0,1-.18-0.07,0.37,0.37,0,0,1-.05-0.22v-3.3H15v3.6a0.92,0.92,0,0,0,.15.58,0.53,0.53,0,0,0,.45.19,1,1,0,0,0,.5-0.14,1.85,1.85,0,0,0,.5-0.4v0.47h0.86V20.25H16.6v3.27Z"/><path class="a" d="M17.76,1a17,17,0,1,0,17,17A17,17,0,0,0,17.76,1Zm2.05,10.38h1V15a0.39,0.39,0,0,0,.06.24,0.23,0.23,0,0,0,.2.08,0.56,0.56,0,0,0,.27-0.1A1.35,1.35,0,0,0,21.6,15v-3.6h1v4.74h-1V15.6A2,2,0,0,1,21,16a1.23,1.23,0,0,1-.56.15A0.61,0.61,0,0,1,20,16a1,1,0,0,1-.17-0.63v-4Zm-3.6,1.08a1.08,1.08,0,0,1,.39-0.88,1.59,1.59,0,0,1,1.05-.33,1.43,1.43,0,0,1,1,.34,1.14,1.14,0,0,1,.38.89v2.45a1.24,1.24,0,0,1-.37,1,1.48,1.48,0,0,1-1,.35,1.43,1.43,0,0,1-1-.36,1.26,1.26,0,0,1-.38-1V12.47ZM13.57,9.7l0.71,2.57h0.07L15,9.7h1.11l-1.27,3.76v2.67H13.77V13.58L12.47,9.7h1.1ZM26.26,23.23a3.11,3.11,0,0,1-3.11,3.11H12.36a3.11,3.11,0,0,1-3.11-3.11v-2.5a3.11,3.11,0,0,1,3.11-3.11h10.8a3.11,3.11,0,0,1,3.11,3.11v2.5Z"/><path class="a" 
d="M22.34,20.14a1.3,1.3,0,0,0-.94.35,1.21,1.21,0,0,0-.37.91v1.93a1.37,1.37,0,0,0,.33,1,1.18,1.18,0,0,0,.91.36,1.28,1.28,0,0,0,1-.33,1.38,1.38,0,0,0,.32-1V23.12H22.68v0.2a0.94,0.94,0,0,1-.09.49,0.36,0.36,0,0,1-.3.11A0.32,0.32,0,0,1,22,23.79a1,1,0,0,1-.08-0.47V22.5h1.65v-1.1a1.29,1.29,0,0,0-.32-0.94A1.21,1.21,0,0,0,22.34,20.14Zm0.34,1.69H21.92V21.4A0.63,0.63,0,0,1,22,21a0.34,0.34,0,0,1,.29-0.12,0.33,0.33,0,0,1,.29.12,0.64,0.64,0,0,1,.09.39v0.43Z"/></svg>
Resulting Image
convert scale.svg test.png
Update
It seems that when I use File > Save As in Illustrator to save my .svg, the converter works fine. However, when I use File > Export, the SVG looks fine but converting it to PNG fails. I am still interested in why this is.
Is it possible to extract the closed caption transcript from YouTube videos?
We have over 200 webcasts on YouTube and each is at least one hour long. YouTube has closed captions for all the videos, but it seems users have no way to get them.
I tried the URL in this blog but it does not work with our videos.
http://googlesystem.blogspot.com/2010/10/download-youtube-captions.html
Here's how to get the transcript of a YouTube video (when available):
Go to YouTube and open the video of your choice.
Click on the "More actions" button (3 horizontal dots) located next to the Share button.
Click "Open transcript"
Although the syntax may be a little goofy this is a pretty good solution.
Source: http://ccm.net/faq/40644-youtube-how-to-get-the-transcript-of-a-video
Get timedtext file directly from YouTube
curl -s "$video_url"|grep -o '"baseUrl":"https://www.youtube.com/api/timedtext[^"]*lang=en'|cut -d \" -f4|sed 's/\\u0026/\&/g'|xargs curl -Ls|grep -o '<text[^<]*</text>'|sed -E 's/<text start="([^"]*)".*>(.*)<.*/\1 \2/'|sed 's/\xc2\xa0/ /g;s/&amp;/\&/g'|recode xml|awk '{$1=sprintf("%02d:%02d:%02d",$1/3600,$1%3600/60,$1%60)}1'|awk 'NR%n==1{printf"%s ",$1}{sub(/^[^ ]* /,"");printf"%s"(NR%n?FS:RS),$0}' n=2|awk 1
yt-dlp
yt-dlp supports saving the automatically generated closed captions in a JSON format:
cap()(printf %s\\n "${@-$(cat)}"|parallel -j10 -q yt-dlp -i --skip-download --write-auto-sub --sub-format json3 -o '%(upload_date)s.%(title)s.%(uploader)s.%(id)s.%(ext)s' --;for f in *.json3;do jq -r '.events[]|select(.segs and .segs[0].utf8!="\n")|(.tStartMs|tostring)+" "+([.segs[]?.utf8]|join(""))' "$f"|awk '{x=$1/1e3;$1=sprintf("%02d:%02d:%02d",x/3600,x%3600/60,x%60)}1'|awk 'NR%n==1{printf"%s ",$1}{sub(/^[^ ]* /,"");printf"%s"(NR%n?FS:RS),$0}' n=2|awk 1 >"${f%.json3}";rm "$f";done)
You can also use the function above to download the captions for all videos on a channel or playlist if you give the ID or URL of the channel or playlist as an argument. When there is an error downloading a single video, the -i (--ignore-errors) option skips the video instead of exiting with an error.
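If you would rather post-process a json3 file without jq and awk, the same idea can be sketched in Python. This is only a rough sketch: it assumes the structure implied by the jq filter above (an events array whose items carry tStartMs and a segs list of utf8 fragments), the names format_timestamp and json3_to_lines are made up for illustration, and the sample data is hypothetical. Unlike the shell function, it does not join every two caption lines into one.

```python
import json


def format_timestamp(ms):
    """Convert milliseconds to HH:MM:SS, truncating fractions like the awk step."""
    total = int(ms) // 1000
    return "%02d:%02d:%02d" % (total // 3600, total % 3600 // 60, total % 60)


def json3_to_lines(data):
    """Return 'HH:MM:SS text' lines from a parsed json3 caption document."""
    lines = []
    for event in data.get("events", []):
        segs = event.get("segs")
        # Skip events without segments and the newline-only filler events.
        if not segs or segs[0].get("utf8") == "\n":
            continue
        text = "".join(seg.get("utf8", "") for seg in segs)
        lines.append(format_timestamp(event["tStartMs"]) + " " + text)
    return lines


# Hypothetical sample data, shaped like the fields the jq filter reads.
sample = json.loads("""
{"events": [
  {"tStartMs": 1429, "segs": [{"utf8": "ladies"}, {"utf8": " and"}, {"utf8": " gentlemen"}]},
  {"tStartMs": 4249, "segs": [{"utf8": "\\n"}]},
  {"tStartMs": 3725000, "segs": [{"utf8": "thank you"}]},
  {"tStartMs": 0}
]}
""")

for line in json3_to_lines(sample):
    print(line)
```

To use it on a real download, load the file with json.load and pass the result to json3_to_lines.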
Or this just gets the text without the timestamps:
yt-dlp --skip-download --write-auto-sub --sub-format json3 $youtube_url_or_id;jq -r '.events[]|select(.segs and.segs[0].utf8!="\n")|[.segs[].utf8]|join("")' *json3|paste -sd\ -|fold -sw60
youtube-dl
As of 2022, the VTT and TTML files downloaded by youtube-dl --write-auto-sub are malformed: all subtitle texts are placed on a few long lines, so the timestamps of the individual subtitles are not visible. If you don't need the timestamps, this doesn't matter; otherwise you can fix it by substituting yt-dlp for youtube-dl in the following commands. With yt-dlp you can also use the more convenient JSON format, in which case you don't need the following approach for dealing with the VTT subtitle format.
This downloads the subtitles as VTT:
youtube-dl --skip-download --write-auto-sub $youtube_url
The other available formats are ttml, srv3, srv2, and srv1 (shown by --list-subs):
--write-sub          Write subtitle file
--write-auto-sub     Write automatically generated subtitle file (YouTube only)
--all-subs           Download all the available subtitles of the video
--list-subs          List all available subtitles for the video
--sub-format FORMAT  Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"
--sub-lang LANGS     Languages of the subtitles to download (optional) separated by commas, use --list-subs for available language tags
You can use ffmpeg to convert the subtitle file to another format:
ffmpeg -i input.vtt output.srt
In the VTT subtitles, each subtitle text is repeated three times, and there is typically a new subtitle text every eighth line (but under some mysterious circumstances it's every 12th line instead):
WEBVTT
Kind: captions
Language: en
00:00:01.429 --> 00:00:04.249 align:start position:0%
ladies<00:00:02.429><c> and</c><00:00:02.580><c> gentlemen</c><c.colorE5E5E5><00:00:02.879><c> I'd</c></c><c.colorCCCCCC><00:00:03.870><c> like</c></c><c.colorE5E5E5><00:00:04.020><c> to</c><00:00:04.110><c> thank</c></c>
00:00:04.249 --> 00:00:04.259 align:start position:0%
ladies and gentlemen<c.colorE5E5E5> I'd</c><c.colorCCCCCC> like</c><c.colorE5E5E5> to thank
</c>
00:00:04.259 --> 00:00:05.930 align:start position:0%
ladies and gentlemen<c.colorE5E5E5> I'd</c><c.colorCCCCCC> like</c><c.colorE5E5E5> to thank
you<00:00:04.440><c> for</c><00:00:04.620><c> coming</c><00:00:05.069><c> tonight</c><00:00:05.190><c> especially</c></c><c.colorCCCCCC><00:00:05.609><c> at</c></c>
00:00:05.930 --> 00:00:05.940 align:start position:0%
you<c.colorE5E5E5> for coming tonight especially</c><c.colorCCCCCC> at
</c>
00:00:05.940 --> 00:00:07.730 align:start position:0%
you<c.colorE5E5E5> for coming tonight especially</c><c.colorCCCCCC> at
such<00:00:06.180><c> short</c><00:00:06.690><c> notice</c></c>
00:00:07.730 --> 00:00:07.740 align:start position:0%
such short notice
00:00:07.740 --> 00:00:09.620 align:start position:0%
such short notice
I'm<00:00:08.370><c> sure</c><c.colorE5E5E5><00:00:08.580><c> mr.</c><00:00:08.820><c> Irving</c><00:00:09.000><c> will</c><00:00:09.120><c> fill</c><00:00:09.300><c> you</c><00:00:09.389><c> in</c><00:00:09.420><c> on</c></c>
00:00:09.620 --> 00:00:09.630 align:start position:0%
I'm sure<c.colorE5E5E5> mr. Irving will fill you in on
</c>
00:00:09.630 --> 00:00:11.030 align:start position:0%
I'm sure<c.colorE5E5E5> mr. Irving will fill you in on
the<00:00:09.750><c> circumstances</c><00:00:10.440><c> that's</c><00:00:10.620><c> brought</c><00:00:10.920><c> us</c></c>
00:00:11.030 --> 00:00:11.040 align:start position:0%
<c.colorE5E5E5>the circumstances that's brought us
</c>
This converts the VTT subtitles to a simpler format:
sed '1,/^$/d' *.vtt| # remove the lines at the top of the file
sed 's/<[^>]*>//g'| # remove tags
awk -F. 'NR%4==1{printf"%s ",$1}NR%4==3' | # print each new subtitle text and its start time without milliseconds
awk NF\>1 # remove lines with only one field
Output:
00:00:01 ladies and gentlemen I'd like to thank
00:00:04 you for coming tonight especially at
00:00:05 such short notice
00:00:07 I'm sure mr. Irving will fill you in on
00:00:09 the circumstances that's brought us
In roughly 10% of the videos I tested (for example p9M3shEU-QM and aE05_REXnBc), one or more subtitle texts came 12 lines rather than 8 lines after the previous subtitle text. A workaround is to print every fourth line and then remove the lines that end up with no subtitle text.
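For comparison, here is a minimal Python sketch of the same conversion. It assumes the layout the pipeline relies on: after the header, every cue takes four lines (timing, first text line, second text line, blank), and the new subtitle text sits on the second text line. The function name vtt_to_lines and the shortened sample are made up for illustration.

```python
import re


def vtt_to_lines(vtt_text):
    """Mirror the sed/awk pipeline: 'HH:MM:SS text' for each new subtitle text."""
    lines = vtt_text.splitlines()
    # Drop the header: everything up to and including the first empty line.
    body = lines[lines.index("") + 1:] if "" in lines else []
    # Remove inline tags such as <c> and <00:00:02.429>.
    body = [re.sub(r"<[^>]*>", "", line) for line in body]
    result = []
    # Assumed layout: timing, first text line, second text line, blank.
    for i in range(0, len(body) - 2, 4):
        start = body[i].split(".")[0]   # start time without milliseconds
        text = body[i + 2].strip()      # the second text line holds the new text
        if text:                        # skip cues that only repeat earlier text
            result.append(start + " " + text)
    return result


# Shortened, hypothetical sample in the assumed four-lines-per-cue layout.
sample = "\n".join([
    "WEBVTT",
    "Kind: captions",
    "Language: en",
    "",
    "00:00:01.429 --> 00:00:04.249 align:start position:0%",
    " ",
    "ladies<00:00:02.429><c> and</c><00:00:02.580><c> gentlemen</c>",
    "",
    "00:00:04.249 --> 00:00:04.259 align:start position:0%",
    "ladies and gentlemen",
    " ",
    "",
    "00:00:04.259 --> 00:00:05.930 align:start position:0%",
    "ladies and gentlemen",
    "you<00:00:04.440><c> for</c><00:00:04.620><c> coming</c>",
    "",
])

for line in vtt_to_lines(sample):
    print(line)
```

Because it keeps every fourth line and then drops the ones without text, it should tolerate the 12-line gaps in the same way as the shell version.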
Function form:
cap()(printf %s\\n "${@-$(cat)}"|parallel -j10 -q youtube-dl -i --skip-download --write-auto-sub -o '%(upload_date)s.%(title)s.%(uploader)s.%(id)s.%(ext)s' --;for f in *.vtt;do sed '1,/^$/d' -- "$f"|sed 's/<[^>]*>//g'|awk -F. 'NR%4==1{printf"%s ",$1}NR%4==3'|awk 'NF>1'|awk 'NR%n==1{printf"%s ",$1}{sub(/^[^ ]* /,"");printf"%s"(NR%n?FS:RS),$0}' n=2|awk 1 >"${f%.vtt}";rm "$f";done)
The following document says that only the owner of the channel can do this via the standard YouTube interface:
https://developers.google.com/youtube/2.0/developers_guide_protocol_captions?hl=en
Cheap fix:
You can click on the "interactive transcript" button and copy the content this way.
Of course you lose the milliseconds this way.
Extremely cheap fix:
A shared YouTube account, so that multiple people can edit and upload caption files.
Challenging solution:
The YouTube API allows downloading and uploading of caption files via HTTP.
You can write a YouTube API application that provides a browser user interface for uploading or downloading, either for ANY user or for particular users.
Here is an example project for this in Java:
http://apiblog.youtube.com/2011/01/youtube-captions-uploader-web-app.html
Here is a very simple example of a working upload for everybody:
http://yt-captions-uploader.appspot.com/
You can view, copy, or download a timecoded XML file of a YouTube video's closed captions by accessing
http://video.google.com/timedtext?lang=[LANGUAGE]&v=[YOUTUBE VIDEO IDENTIFIER]
For example http://video.google.com/timedtext?lang=pt&v=WSVKbw7LC2w
NOTE: this method does not download autogenerated closed captions, even if you get the language right (maybe there's a special code for autogenerated languages).
You can download the streaming subtitles from YouTube with KeepSubs, DownSub, and SaveSubs.
You can choose between the automatic transcript and the author-supplied closed captions. These tools also offer the possibility of automatically translating the English subtitles into other languages using Google Translate.
(Obligatory 'this is probably an internal youtube.com interface and might break at any time')
Instead of linking to another tool that does this, here's an answer to the question of "how to do this"
Use Fiddler or your browser devtools (e.g. Chrome) to inspect the youtube.com HTTP traffic, and there's a response from /api/timedtext that contains the closed caption info as XML.
It seems that a response like this:
<p t="0" d="5430" w="1">
<s p="2" ac="136">we've</s>
<s t="780" ac="252"> got</s>
</p>
<p t="2280" d="7170" w="1">
<s ac="243">we're</s>
<s t="810" ac="233"> going</s>
</p>
means that the word "we've" is at time 0, the word "got" is at time 0+780, the word "going" is at time 2280+810, and so on. These times are in milliseconds, so for time 3090 you'd want to append &t=3 to the URL.
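The arithmetic described above can be sketched in a few lines of Python. This is a minimal sketch, not a full parser for the format: the function names word_times and url_time_parameter are made up, and the quoted fragment is wrapped in a made-up body root element so that it parses as a document.

```python
import xml.etree.ElementTree as ET


def word_times(xml_text):
    """Return (milliseconds, word) pairs: paragraph start plus word offset."""
    root = ET.fromstring(xml_text)
    pairs = []
    for p in root.iter("p"):
        base = int(p.get("t", "0"))
        for s in p.iter("s"):
            offset = int(s.get("t", "0"))   # a missing t means offset 0
            pairs.append((base + offset, (s.text or "").strip()))
    return pairs


def url_time_parameter(ms):
    """Build the &t= suffix in whole seconds for a millisecond position."""
    return "&t=%d" % (ms // 1000)


# The fragment quoted above, wrapped in a made-up <body> root so it parses.
sample = """<body>
<p t="0" d="5430" w="1"><s p="2" ac="136">we've</s><s t="780" ac="252"> got</s></p>
<p t="2280" d="7170" w="1"><s ac="243">we're</s><s t="810" ac="233"> going</s></p>
</body>"""

for ms, word in word_times(sample):
    print(ms, word, url_time_parameter(ms))
```

Filtering the resulting pairs for a particular word gives the positions to jump to in the video.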
You can use any tool to stitch together the XML into something readable, but here's my Power BI Desktop script to find words like "privilege":
let
Source = Xml.Tables(File.Contents("C:\Download\body.xml")),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Attribute:format", Int64.Type}}),
body = #"Changed Type"{0}[body],
p = body{0}[p],
#"Changed Type1" = Table.TransformColumnTypes(p,{{"Attribute:t", Int64.Type}, {"Attribute:d", Int64.Type}, {"Attribute:w", Int64.Type}, {"Attribute:a", Int64.Type}, {"Attribute:p", Int64.Type}}),
#"Expanded s" = Table.ExpandTableColumn(#"Changed Type1", "s", {"Attribute:ac", "Attribute:p", "Attribute:t", "Element:Text"}, {"s.Attribute:ac", "s.Attribute:p", "s.Attribute:t", "s.Element:Text"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Expanded s",{{"s.Attribute:t", Int64.Type}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type2",{"s.Attribute:t", "s.Element:Text", "Attribute:t"}),
#"Replaced Value" = Table.ReplaceValue(#"Removed Other Columns",null,0,Replacer.ReplaceValue,{"s.Attribute:t"}),
#"Filtered Rows" = Table.SelectRows(#"Replaced Value", each [#"s.Element:Text"] <> null),
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Time", each [#"Attribute:t"] + [#"s.Attribute:t"]),
#"Filtered Rows1" = Table.SelectRows(#"Added Custom", each ([#"s.Element:Text"] = " privilege" or [#"s.Element:Text"] = " privileged" or [#"s.Element:Text"] = " privileges" or [#"s.Element:Text"] = "privilege" or [#"s.Element:Text"] = "privileges"))
in
#"Filtered Rows1"
There is a free Python tool called YouTube Transcript API.
You can use it in scripts or as a command line tool:
pip install youtube_transcript_api
With the YouTube video page as updated in June 2020, it's very straightforward:
Select the 3 dots next to the like/dislike buttons to open further menu options
Select "add translations"
Select the language
Click autogenerate if needed
Click Actions > Download
You will get an .sbv file
Choose Open Transcript from the ... dropdown to the right of the vote up/down and share links.
This will open a Transcript scrolling div on the right side.
You can then use Copy. Note that you cannot use Select All but need to click the top line, then scroll to the bottom using the scroll thumb, and then shift-click on the last line.
Note that you can also search within this text using the normal web page search.
I just got this easily done manually by opening the transcript at the beginning of the video and left-clicking and dragging at the time 00:00 marker with the shift key pressed over a few lines at the beginning.
I then advanced the video to near the end. When the video stopped, I clicked the end of the last sentence whilst holding down the shift key once more. With CTRL-C I copied the text to the clipboard and pasted it into an editor.
Done!
Caveat: Make sure that no RDP windows sharing the clipboard, and no software such as TeamViewer, are running at the same time, because this procedure will overflow their buffers when a large amount of text is copied.
I've seen several iPhone/iPad apps that show animated kanji. For those of you who are unfamiliar with kanji, stroke order is a very important part of studying kanji, so if you are writing an app, showing the animated stroke order is essential.
All the apps I've seen that do this credit the KanjiVG project as their source for the stroke order data. After some research I found that the KanjiVG project gives you the data in SVG format, encoded in XML.
Having never programmed graphics before (and being relatively new to iOS), I'm at a loss as to where to keep looking for info.
I think I need to:
Parse the XML into SVG.
Render the SVG.
...but I'm not sure. From what I could see in the iPhone/iPad apps I bought, the animations all look surprisingly similar, so there must be a common library that these developers are using and that I'm failing to find (probably because I don't know exactly what I'm looking for!).
Any pointers that anyone can give me will be greatly appreciated.
Cheers!
This ended up being SO MUCH easier than I originally thought. The XML provided by the KanjiVG project not only contains all the kanji "parts" but the SVG data as well!
So you get this:
<kanji midashi="会" id="4f1a">
<strokegr element="会">
<strokegr element="人" position="top" radical="general">
<stroke type="㇒" path="M52.25,14c0.25,2.28-0.52,3.59-1.8,5.62c-5.76,9.14-17.9,27-39.2,39.88"/>
<stroke type="㇏" path="M54.5,19.25c6.73,7.3,24.09,24.81,32.95,31.91c2.73,2.18,5.61,3.8,9.05,4.59"/>
</strokegr>
<strokegr element="云" position="bottom">
<strokegr element="二">
<stroke type="㇐" path="M37.36,50.16c1.64,0.34,4.04,0.36,4.98,0.25c6.79-0.79,14.29-1.91,19.66-2.4c1.56-0.14,3.25-0.39,4.66,0"/>
<stroke type="㇐" path="M23,65.98c2.12,0.52,4.25,0.64,7.01,0.3c13.77-1.71,30.99-3.66,46.35-3.74c3.04-0.02,4.87,0.14,6.4,0.29"/>
</strokegr>
<strokegr element="厶">
<stroke type="㇜" path="M47.16,66.38c0.62,1.65-0.03,2.93-0.92,4.28c-5.17,7.8-8.02,11.38-14.99,18.84c-2.11,2.25-1.5,4.18,2,3.75c7.35-0.91,28.19-5.83,40.16-7.95"/>
<stroke type="㇔" path="M66.62,77.39c4.52,3.23,11,12.73,13.06,18.82"/>
</strokegr>
</strokegr>
</strokegr>
</kanji>
And if you create your SVG file out of only the path attributes of the stroke nodes then you get a nice SVG drawing! Like this:
<svg width="100" height="100" viewBox="0 0 100 100" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve" version="1.1" baseProfile="full">
<path d="M52.25,14c0.25,2.28-0.52,3.59-1.8,5.62c-5.76,9.14-17.9,27-39.2,39.88" style="fill:none;stroke:black;stroke-width:2" />
<path d="M54.5,19.25c6.73,7.3,24.09,24.81,32.95,31.91c2.73,2.18,5.61,3.8,9.05,4.59" style="fill:none;stroke:black;stroke-width:2" />
<path d="M37.36,50.16c1.64,0.34,4.04,0.36,4.98,0.25c6.79-0.79,14.29-1.91,19.66-2.4c1.56-0.14,3.25-0.39,4.66,0" style="fill:none;stroke:black;stroke-width:2" />
<path d="M23,65.98c2.12,0.52,4.25,0.64,7.01,0.3c13.77-1.71,30.99-3.66,46.35-3.74c3.04-0.02,4.87,0.14,6.4,0.29" style="fill:none;stroke:black;stroke-width:2" />
<path d="M47.16,66.38c0.62,1.65-0.03,2.93-0.92,4.28c-5.17,7.8-8.02,11.38-14.99,18.84c-2.11,2.25-1.5,4.18,2,3.75c7.35-0.91,28.19-5.83,40.16-7.95" style="fill:none;stroke:black;stroke-width:2" />
<path d="M66.62,77.39c4.52,3.23,11,12.73,13.06,18.82" style="fill:none;stroke:black;stroke-width:2" />
</svg>
Copy the above SVG XML and paste it into a plain text file. Name this file something that ends in .svg and drag it into Firefox. There it is, a graphic representation of the Kanji!
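The manual step of collecting the path attributes can also be scripted. Here is a minimal Python sketch, assuming the kanji/strokegr/stroke layout shown above; the function name kanji_to_svg is made up, the 100x100 canvas and the stroke style are copied from the hand-written SVG, and the sample is a shortened copy of the data above.

```python
import xml.etree.ElementTree as ET

STYLE = "fill:none;stroke:black;stroke-width:2"


def kanji_to_svg(kanji_xml):
    """Build an SVG document from the path attributes of all <stroke> nodes."""
    kanji = ET.fromstring(kanji_xml)
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": "100", "height": "100", "viewBox": "0 0 100 100",
    })
    # iter() walks nested <strokegr> groups in document order, which is
    # assumed here to match the stroke order.
    for stroke in kanji.iter("stroke"):
        ET.SubElement(svg, "path", {"d": stroke.get("path"), "style": STYLE})
    return ET.tostring(svg, encoding="unicode")


# Shortened copy of the sample data above (first two strokes only).
sample = """<kanji midashi="会" id="4f1a">
<strokegr element="会">
<strokegr element="人" position="top" radical="general">
<stroke type="㇒" path="M52.25,14c0.25,2.28-0.52,3.59-1.8,5.62c-5.76,9.14-17.9,27-39.2,39.88"/>
<stroke type="㇏" path="M54.5,19.25c6.73,7.3,24.09,24.81,32.95,31.91c2.73,2.18,5.61,3.8,9.05,4.59"/>
</strokegr>
</strokegr>
</kanji>"""

print(kanji_to_svg(sample))
```

Writing the returned string to a file ending in .svg gives the same kind of drawing as the hand-built one.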
So now that I have all the "raw" SVG info it's just a matter of finding the appropriate SVG renderer.
I wrote a JavaScript renderer for KanjiVG data a few years back that animates strokes. It might serve as a working example for you or even a solution depending on what you want to do.
The approach I took was to break the KanjiVG stroke data into a set of JavaScript data files, write my own code for drawing cubic and quadratic curves, and then write an event queue function that takes kanji, looks them up, and enqueues the rendering of each stroke in an array.
The source is not obfuscated in any way and contains the odd comment. Best of luck!
I'm interested in this as well. Have you gotten any further?
I am able to display the kanji from an SVG file this way by creating a UIWebView.
In this example my file is k1.svg, and the image is drawn after hitting a button (the sender). I'm working on getting it animated now.
-(void) doSVG:(id)sender {
    // Locate k1.svg in the app bundle.
    NSString *filePath = [[NSBundle mainBundle]
        pathForResource:@"k1" ofType:@"svg"];
    NSURL *fileURL = [[NSURL alloc] initFileURLWithPath:filePath];
    NSURLRequest *req = [NSURLRequest requestWithURL:fileURL];
    //[kanjiView setScalesPageToFit:YES];
    // Load the SVG into the web view, which renders it.
    [yourWebView loadRequest:req];
}
My question to you is, have you found an easy way to get all the svg info out of the xml file from kanjiVG?
There is no need to translate the KanjiVG data into SVG data, because it already is SVG data. From their wiki:
Any KanjiVG file is 100% SVG-compliant and can be opened by one's favorite SVG viewer/editor to be seen as-is.
The reason the data looks so different is that they're using SVG Groups to store extra information. But the KanjiVG data itself should still render fine using any standards-compliant SVG library.
You can also consider "AnimCJK project" (https://github.com/parsimonhi/animCJK) as source for kanji svg data. The principle is slightly different from "KanjiVG project" (http://kanjivg.tagaini.net/) and the underlying font is different (kaisho style), but extracting the path data is simple. Using a regex such as #<path[^>]+id="z[0-9]+d[0-9]+"[^>]+d="([^"]+)"[^>]+># does the job.
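As a rough illustration of that regex in Python: the sketch below assumes the # characters around the pattern are only delimiters, so they are dropped. The function name extract_stroke_paths is made up, and the sample markup is invented to match the id scheme the regex expects; it is not copied from the AnimCJK repository.

```python
import re

PATH_PATTERN = re.compile(r'<path[^>]+id="z[0-9]+d[0-9]+"[^>]+d="([^"]+)"[^>]+>')


def extract_stroke_paths(svg_text):
    """Return the d attribute of every path whose id looks like z<digits>d<digits>."""
    return PATH_PATTERN.findall(svg_text)


# Made-up markup for illustration; it is not copied from the AnimCJK repository.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1024 1024">'
    '<path id="z20250d1" d="M10,20L30,40"/>'
    '<path id="z20250d2" d="M50,60L70,80"/>'
    '<path id="other" d="M0,0L1,1"/>'
    '</svg>'
)

print(extract_stroke_paths(sample))
```

Note that the pattern only matches when the id attribute comes before the d attribute, so it depends on how the files are written.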