Automatically decode URLs in Notepad++

I am working with a lot of URL links which I need to decode.
I want to write a macro (or use any other method, whatever is easiest) attached to a keyboard shortcut that automatically decodes the URLs into readable text.
For example, I want to press CTRL+A and have every %20 replaced with a space, every & replaced with a line break (\n, moving down one row), every %27 replaced with ', and so on.
Is there a way to accomplish this in Notepad++?
So far I have been manually changing at least 2-3 codes each time, but it's maddening as I have hundreds of such URLs to work with.
The URLs are sent to me automatically, one at a time, by a "report broken link" function and arrive as an OpenURL; an example is below.
Thank you!
URL examples:
ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&url_ver=Z39.88-2004&rfr_id=info%3Asid%2FElsevier%3ASD&svc_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Asch_svc&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.aulast=GILABERTE&rft.auinit=Y&rft.date=2014&rft.issn=00017310&rft.volume=105&rft.issue=3&rft.spage=253&rft.epage=262&rft.title=Actas%20Dermo-Sifiliogr%C3%A1ficas&rft.atitle=Realidades%20y%20retos%20de%20la%20fotoprotecci%C3%B3n%20en%20la%20infancia&rft_id=info%3Adoi%2F10.1016%2Fj.ad.2013.05.004
And this is what it looks like after replacing & and %20:
ctx_ver=Z39.88-2004
ctx_enc=info%3Aofi%2Fenc%3AUTF-8
url_ver=Z39.88-2004
rfr_id=info%3Asid%2FElsevier%3ASD
svc_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Asch_svc
rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal
rft.aulast=GILABERTE
rft.auinit=Y
rft.date=2014
rft.issn=00017310
rft.volume=105
rft.issue=3
rft.spage=253
rft.epage=262
rft.title=Actas Dermo-Sifiliogr%C3%A1ficas
rft.atitle=Realidades y retos de la fotoprotecci%C3%B3n en la infancia
rft_id=info%3Adoi%2F10.1016%2Fj.ad.2013.05.004
As you can see, this is much more readable for extracting details, but many more codes still need changing.

Let's say you have a single URL, or several, one per line in a text file. You'd go through the following steps:
Ctrl+A
Plugins > MIME Tools > URL Decode
Ctrl+H
Find what: &
Replace with: \n
Search mode: Regular expression
Click on Replace All
If you really need a shortcut such as Ctrl+Alt+A, you can record one:
Choose Macro > Start Recording, perform the steps above, and choose Macro > Save Currently Recorded Macro when finished.
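If the URLs keep arriving in bulk, the same two steps can also be scripted outside Notepad++. The following is a minimal sketch, assuming Python 3 is available; the function name decode_openurl is made up for illustration. Unlike the steps above, it splits on & before decoding, so that an encoded %26 inside a value is not mistaken for a separator.

```python
from urllib.parse import unquote


def decode_openurl(query: str) -> str:
    """Put each key=value pair on its own line and percent-decode it."""
    # Split first, decode second: a literal '&' that was encoded as %26
    # inside a value must not be treated as a pair separator.
    pairs = query.strip().split("&")
    # unquote() decodes %XX sequences as UTF-8 by default.
    return "\n".join(unquote(pair) for pair in pairs)


# A shortened version of the OpenURL from the question.
sample = (
    "rft.aulast=GILABERTE&rft.date=2014"
    "&rft.title=Actas%20Dermo-Sifiliogr%C3%A1ficas"
    "&rft_id=info%3Adoi%2F10.1016%2Fj.ad.2013.05.004"
)
print(decode_openurl(sample))
```

Each pair ends up on its own row, with %20, %3A, %2F and the UTF-8 sequences such as %C3%A1 all decoded in one pass.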

Related

Strange encoding of non-ASCII character in URL

My mother recently received the following scam message on Whatsapp (I have added multiple extra m's to the end of the link below in case of accidental clicks. Obviously the original link itself ended only with .com):
British Airways is giving free 5000 tickets to celebrate its birthday. Get your free ticket at : http://www.briṭishairways.commmmmmmm/
It seems to be a legitimate link to the British Airways URL, especially since Whatsapp doesn't allow link obfuscation (i.e. sending someone to SO by clicking http://www.google.com isn't possible).
However, a careful look will show that the t is in fact a ṭ (Latin T with dot below). Also, if one hovers the mouse over the link in Chrome, the URL that appears at the bottom-left corner of the screen is in fact http://www.xn--briishairways-rt1g.commmmmmmm/. This is also what right-click > Copy Link Address produces. (Try it yourself!)
Also, if I edit the body of the link, the rt1g part changes, as if it's a counter for where to put the dot. For example:
briṭishairways = xn--briishairways-rt1g
briṭishairway = xn--briishairway-vk5f
riṭishairway = xn--riishairway-yb9e
rṭishairay = xn--rishairay-5s6d
What's especially odd is that the Wikipedia link I used for Latin T with dot below also uses the same character (well, the capitalized version of it), and the URL shown on mouse-hover does not have this effect. (Try it yourself!)
What is going on here?
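The xn-- forms listed above have the shape of Punycode, the ASCII encoding used for internationalized domain names: the ASCII letters are copied first, and the suffix after the last hyphen encodes both which non-ASCII character was removed and where it belongs, rather than being a simple counter. A minimal sketch, assuming Python 3 and its built-in punycode codec, reproduces the first example:

```python
# The domain label from the question; U+1E6D is the "t with dot below".
label = "bri\u1e6dishairways"

# Punycode-encode the label and add the "xn--" prefix used in host names.
ascii_form = "xn--" + label.encode("punycode").decode("ascii")
print(ascii_form)  # xn--briishairways-rt1g

# Decoding the part after the prefix restores the original label.
restored = ascii_form[len("xn--"):].encode("ascii").decode("punycode")
print(restored == label)  # True
```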

Add alternative text to a phrase in a document file

I use LibreOffice Writer and want to attach alternative text to a specific phrase in the document. How can I do it?
For example, for an image in the document, we can double-click it and add the alternative text.
Is it possible to do the same for a selected phrase of text? If yes, how? If not, is there any other suggestion?
The alternative text in 'word'/odt documents is actually intended as the 'alt' attribute in HTML (web) pages:
The alt attribute provides alternative information for an image if a
user for some reason cannot view it (because of slow connection, an
error in the src attribute, or if the user uses a screen reader).
(http://www.w3schools.com/tags/att_img_alt.asp)
Its only purpose is thus to provide the user with information in case they cannot view the image. Since having alternative text in case some text cannot be displayed is, well, silly, this 'alt' attribute is not defined for pieces of text. Alternatively, you could use a hyperlink pointing to nothing ("#"), which does provide a tooltip attribute.
What is it that you're intending to achieve anyway? It's not going to show up on any prints, which is the intended purpose of Writer... Footnotes (for prints) or Comments (for communication with co-editors) might suit you better.

Get closed caption "cc" for Youtube video

Does anyone know how to get the CC for any YouTube video that has captions available? The API 2.0 documentation says captions are only available to the owner of the video, but I was able to get the captions for some videos even though I don't own any of them.
There are two APIs (or links to the API) that can be used. They both route to the timedtext API.
Before I list them, note the parameters the API needs:
lang: {en, fr, ...} required.
v: {video ID} required.
name: the track name. Required only if it is set (and this is where my problem lies).
tlang: the language to translate to. Optional (set it if you want the CC translated into another language).
The API links are:
http://video.google.com/timedtext?lang=fr&v=PILzP-bIeLo&name=french
Note that the above example returns nothing if you remove name=french or set it to something else.
http://www.youtube.com/api/timedtext?v=zzfCVBSsvqA&lang=en
Note that this example returns nothing if you set name=...
http://www.youtube.com/api/timedtext?v=ZdP0KM49IVk&lang=en
Example 3 does not return the CC data, yet the actual video has captions.
So I'm guessing that example 3 needs the name parameter set. My main problem is: how do I find out whether the name parameter is set, and if it is, what its value is?
[Update]: This was the preferred method until Google recently discontinued it (writing as of Dec 2021).
Your first example should work without the name= part.
This did the job for me:
video.google.com/timedtext?lang={languageID}&v={videoId}
To fetch the English CC for the video from the previous answer, it would look like this:
http://video.google.com/timedtext?lang=en&v=zzfCVBSsvqA
You can get the list of available captions with http://video.google.com/timedtext?type=list&v=zzfCVBSsvqA request.
Your 3rd video has only automatically generated captions, which you cannot fetch easily.
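The request URLs quoted in these answers differ only in their query parameters, so they can be assembled from one helper. This is a minimal sketch, assuming Python 3; the function name timedtext_url is made up for illustration, and the endpoint is the one quoted above, which the update note reports as discontinued. The sketch only builds the URL strings and does not fetch anything.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the answers above (reported discontinued).
BASE = "http://video.google.com/timedtext"


def timedtext_url(video_id, lang=None, name=None, list_tracks=False):
    """Build a timedtext request URL from the parameters described above."""
    params = {}
    if list_tracks:
        params["type"] = "list"   # ask for the list of available tracks
    if lang:
        params["lang"] = lang     # caption language, e.g. "en" or "fr"
    params["v"] = video_id        # the video ID is always required
    if name:
        params["name"] = name     # track name, only when the track has one
    return BASE + "?" + urlencode(params)


print(timedtext_url("zzfCVBSsvqA", lang="en"))
print(timedtext_url("zzfCVBSsvqA", list_tracks=True))
print(timedtext_url("PILzP-bIeLo", lang="fr", name="french"))
```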
Here are my suggestions after spending some time on this:
JS library: https://github.com/syzer/youtube-captions-scraper (supports auto-generated captions).
The two quick methods below do not support auto-generated captions:
Get a list of subtitles: http://video.google.com/timedtext?type=list&v=lT3vGaOLWqE
Get subtitle with track id: http://video.google.com/timedtext?type=track&v=lT3vGaOLWqE&id=0&lang=en
Quick download:
http://downsub.com/?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dag_EJRhMfOM
If video.google.com does not return your closed caption file, or you would rather have SRT than XML (see note below), try:
CC SUBS
NOTE: SRT can be transformed into virtually ANY format - either using free subtitling tools OR
by replacing \n\n with |, \n with ; and then | into \n, you get a CSV file that can be opened in a spreadsheet, for example.
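The three replacements in the note can be sketched in a few lines. This is a minimal sketch, assuming Python 3 and assuming the caption text itself contains no | or ; characters; the function name srt_to_csv is made up for illustration.

```python
def srt_to_csv(srt_text: str) -> str:
    """Turn SRT cues into one semicolon-separated row per cue."""
    # Normalise Windows line endings and drop leading/trailing blank lines.
    text = srt_text.replace("\r\n", "\n").strip("\n")
    text = text.replace("\n\n", "|")  # blank line between cues -> temporary marker
    text = text.replace("\n", ";")    # line breaks inside a cue -> field separator
    return text.replace("|", "\n")    # temporary marker -> one row per cue


sample = (
    "1\n00:00:01,000 --> 00:00:02,500\nHello\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nSecond cue\n"
)
print(srt_to_csv(sample))
```

Because SRT timestamps contain commas, the result is best opened in a spreadsheet with ; chosen as the delimiter.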

special character coming when i am using & and p

I don't know exactly what to put in the title for this; I tried my best.
Anyway, on to the topic.
I am making an account checker, and for that I am sending the user and pass from my bsskinedit1 and bsskinedit2.
Here is my code:
s:='http:\\site.com..premlogin='+bsskinedit.text+'&password='+bsskinedit2.text
But it gives an error, so I used ShowMessage to see what is wrong with it and got a strange result.
Observe that after the 4, the & and the p combine and appear as some new symbol :(
Can anyone tell me why it comes out like this?
Your code (where you build the URL) is most likely correct (I guess the above has some typos?!), but when you display the URL in a label for instance, the & character is treated as indicator for an accelerator key.
By Windows design, accelerator keys
enable the user to access a menu
command from the keyboard by pressing
Alt along the appropriate letter,
indicated in your code by the
preceding ampersand (&). The character
after the and sign (&) appears
underlined in the menu.
If you want to display the & character itself, you have to set your string variable to &&.
By placing two ampersands together, you state that the character following the first one is not used as the accelerator; rather, you actually want a single & displayed.
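The doubling rule is plain string replacement, so it is easy to apply only to the copy of the string that gets displayed. The sketch below is in Python rather than Delphi, purely to illustrate the idea; the helper name escape_accelerators and the sample URL are hypothetical.

```python
def escape_accelerators(caption: str) -> str:
    """Double every '&' so a caption shows it instead of treating it as an accelerator."""
    return caption.replace("&", "&&")


# Hypothetical URL, for illustration only.
url = "http://site.com/login?user=abc4&password=secret"

# Escape only the copy that is displayed; send the original string in the request.
display_text = escape_accelerators(url)
print(display_text)
```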
Just use your debugger if you want to see the real value of your string variables; don't output them to a message box or the like. It may have side effects, as you can see.
Regarding the URL you build: I can't know what the correct form is, but you should at least use the right slashes!
s := 'http://site.com...'
(All quotes from delphi.about.com)
In addition to what Mef said, you can use OutputDebugString to add your string to the event log in its raw form, so you don't need to modify it before displaying it. Delphi should capture those strings automatically if you're running from the debugger. If you aren't running it from Delphi you can use DebugView instead, which captures the messages from any running applications.

extract text from word or pdf based on format (font name and size)

I need to parse a large text (a Word or PDF document of about 1000 pages) and place some of the text from it into database fields.
The only thing that distinguishes the text I want to extract is its format: it is always "Helvetica-Condensed", size 12.
Can I do that? I know how to use the string functions, but what should I use to test the format?
As I said, the text is stored inside a Word document or PDF.
If a third-party component can do it, that's no problem; please point me to it.
Thanks
There is QuickPDF. The price is $249.00.
The other option is to code it yourself. The file specification is available online, and if you're only trying to rip the text out of the document, it should guide you most of the way.
The only thing to be careful of are documents which are built entirely from images. In that scenario (no matter what you use to read the file) you will also need an OCR type of application. To see if this is the case or not, open a sample of the type of file you are wanting to "extract" text from, select the text to copy then try to paste into notepad.
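Whichever reader is used, the filtering step itself is small once each run of text comes with its font name and size. The sketch below assumes Python 3 and a nested blocks > lines > spans layout with "font", "size" and "text" keys; that layout is an assumption modelled on what PyMuPDF's page.get_text("dict") returns, a library not mentioned in the thread. The function name matching_text and the sample data are made up for illustration.

```python
def matching_text(page_dict, font_name, font_size, tolerance=0.1):
    """Collect the text of every span whose font name and size match."""
    found = []
    for block in page_dict.get("blocks", []):
        # Image-only blocks may have no "lines" entry, hence .get().
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                # Embedded subset fonts are often named "ABCDEF+RealName".
                name = span["font"].split("+")[-1]
                if name == font_name and abs(span["size"] - font_size) <= tolerance:
                    found.append(span["text"])
    return found


# Hand-written sample standing in for one parsed page.
page = {"blocks": [{"lines": [{"spans": [
    {"font": "Helvetica-Condensed", "size": 12.0, "text": "Wanted heading"},
    {"font": "Times-Roman", "size": 10.0, "text": "Body text"},
    {"font": "ABCDEF+Helvetica-Condensed", "size": 12.0, "text": "Also wanted"},
]}]}]}

print(matching_text(page, "Helvetica-Condensed", 12))
```

The size comparison uses a small tolerance because sizes read from a PDF are floating-point values and may not be exactly 12.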
