How to obtain the raw URL path in Rust? - parsing

Let's say I have the URL:
http://host/path/to/../some/thing
I want to parse the URL and get the raw/untouched path string:
/path/to/../some/thing
Using url::Url, I can do:
let url = url::Url::parse("http://host/path/to/../some/thing").unwrap();
println!("{}", url.path());
But this gives a normalized version of the path:
/path/some/thing
url::Url is doing more than I want it to do. (It also percent-encodes things.)
Some other languages make this easy. For example, in PHP I can just do:
$ php -r 'echo parse_url( "http://host/path/to/../some/thing" )["path"] . "\n";'
/path/to/../some/thing
So, can anyone point me towards another URL parsing crate that can do this, and is known to be solid/tested?

Related

LibreOffice: embed script in script URL

In LibreOffice, it is possible to run Python scripts like this:
sURL = "vnd.sun.star.script:file.function?language=Python&location=document"
oScript = scriptProv.getScript(sURL)
x = oScript.Invoke(args, Array(), Array())
In that example 'file' is a filename, and 'function' is the name of a function in that file.
Is it possible to embed script in that URL? sURL="vnd.." & scriptblock & "?language.."
(It seems like the kind of thing that might be possible with the correct URL, or might not be possible if just not supported).
We can use Python's eval() function. Here is an example inspired by JohnSUN's explanation in the discussion. Note: xray() uses XrayTool to show output, but you could replace that line with any output method of your choosing, such as writing to a file.
def runArbitraryCode(*args):
    url = args[0]
    codeString = url.split("&codeToRun=")[1]
    x = eval(codeString)
    xray(x)
Now enter this formula in Calc and Ctrl+click on it.
=HYPERLINK("vnd.sun.star.script:misc_examples.py$runArbitraryCode?language=Python&location=user&codeToRun=5+1")
Result: 6
Obligatory caveat: Running eval() on an unknown string is about the worst idea imaginable in terms of security. So hopefully you're the one controlling the URL and not some black hat hacker!
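If the embedded code only ever needs to be simple arithmetic like the 5+1 example, one way to soften that caveat is to parse the string with Python's ast module and evaluate only a whitelist of node types instead of calling eval(). The following is a minimal sketch under that assumption; safe_arithmetic and runArithmeticOnly are hypothetical names, not part of LibreOffice or of the answer above.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arithmetic(expression):
    """Evaluate +, -, *, / over numeric literals; raise ValueError otherwise."""

    def evaluate(node):
        # Plain int/float literals only (bool and str are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")

    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as error:
        raise ValueError("not a valid expression") from error
    return evaluate(tree.body)


def runArithmeticOnly(*args):
    # Hypothetical counterpart of runArbitraryCode: same URL convention, no eval().
    url = args[0]
    codeString = url.split("&codeToRun=")[1]
    return safe_arithmetic(codeString)
```

Function calls, attribute access and names all raise ValueError here, so a string such as __import__('os') is rejected rather than executed.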

Katakana character ジ in URL being encoded incorrectly

I need to construct a URL with a string path received from my application server which contains the character: ジ
However, in Swift, the fileURLWithPath seems to encode it incorrectly.
let path = "ジ"
print(URL(fileURLWithPath: path))
print(URL(fileURLWithPath: path.precomposedStringWithCanonicalMapping))
Both print:
%E3%82%B7%E3%82%99
The expected URL path should be:
%E3%82%B8
What am I missing or doing wrong? Any help is appreciated.
There are two different characters, ジ and ジ. They may look the same, but they have different internal representations.
The former is “katakana letter zi”, comprised of a single Unicode scalar which percent-encodes as %E3%82%B8.
The latter is still a single Swift character, but is comprised of two Unicode scalars (the “katakana letter si” and “combining voiced sound mark”), and these two Unicode scalars percent-encode to %E3%82%B7%E3%82%99.
One can normalize characters in a string with precomposedStringWithCanonicalMapping, for example. That can convert a character with the two Unicode scalars into a character with a single Unicode scalar.
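The difference between the two forms is easy to reproduce outside Swift. Here is a small Python sketch, using escape sequences so the two variants cannot be confused on screen; it assumes only the standard library, and NFC is the normalization form that precomposedStringWithCanonicalMapping applies.

```python
import unicodedata
from urllib.parse import quote

precomposed = "\u30b8"       # KATAKANA LETTER ZI: one Unicode scalar
decomposed = "\u30b7\u3099"  # KATAKANA LETTER SI + combining voiced sound mark

print(quote(precomposed))  # %E3%82%B8
print(quote(decomposed))   # %E3%82%B7%E3%82%99

# NFC (canonical composition) folds the two scalars into one.
recomposed = unicodedata.normalize("NFC", decomposed)
print(quote(recomposed))   # %E3%82%B8
```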
But your local file system (or, init(fileURLWithPath:), at least) decomposes diacritics. It is logical that the local file system ensures that diacritics are encoded in some consistent manner. (See Diacritics in file names on macOS behave strangely.) The fact that they are decomposed rather than precomposed is, for the sake of this discussion, a bit academic. When you send it to the server, you want it precomposed, regardless of what is happening in your local file system.
Now, you tell us that the “url path is rejected by the server”. That does not make sense. One would generally not provide a local file system URL to a remote server. One would generally extract a file name from a local file system URL and send that to the server. This might be done in a variety of ways:
You can use precomposedStringWithCanonicalMapping when adding a filename to a server URL, and it honors that mapping, unlike a file URL:
let path = "ジ" // actually `%E3%82%B7%E3%82%99` variant
let url = URL(string: "https://example.com")!
    .appendingPathComponent(path.precomposedStringWithCanonicalMapping)
print(url) // https://example.com/%E3%82%B8
If sending it in the body of a request, use precomposedStringWithCanonicalMapping. E.g. if a filename in a multipart/form-data request:
body.append("--\(boundary)\r\n")
body.append("Content-Disposition: form-data; name=\"\(filePathKey)\"; filename=\"\(filename.precomposedStringWithCanonicalMapping)\"\r\n")
body.append("Content-Type: \(mimeType)\r\n\r\n")
body.append(data)
body.append("\r\n")
Now, those are two random examples of how a filename might be provided to the server. Yours may vary. But the idea is that when you provide the filename, you precompose the string in its canonical format, rather than relying upon what a file URL in your local file system uses.
But I would advise avoiding URL(fileURLWithPath:) for manipulating strings provided by the server. It is only to be used when actually referring to files within your local file system. If you just want to percent-encode strings, I would advise using the String method addingPercentEncoding(withAllowedCharacters: .urlPathAllowed). That will not override the precomposedStringWithCanonicalMapping output.
You could try this approach using dataRepresentation:
if let path = "ジ".data(using: .utf8),
   let url = URL(dataRepresentation: path, relativeTo: nil) {
    print("\n---> url: \(url) \n") //---> url: %E3%82%B8
}

Is there a script that can extract particular link from txt and write it in another txt file?

I'm looking for a script (or if there isn't, I guess I'll have to write my own).
I wanted to ask if anyone here knows a script that can take a txt file with n links (let's say 200). I need to extract only the links that contain particular characters; let's say I only need links that contain "/r/learnprogramming". I need the script to get those links and write them to another txt file.
Edit: Here is what helped me: grep -i "/r/learnprogramming" 1.txt >2.txt
You can use Ajax to read a .txt file using jQuery:
<script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/2.1.1/jquery.min.js></script>
<script>
jQuery(function($) {
    console.log("start")
    $.get("https://ayulayol.imfast.io/ajaxads/ajaxads.txt", function(wholeTextFile) {
        var lines = wholeTextFile.split(/\n/),
            randomIndex = Math.floor(Math.random() * lines.length),
            randomLine = lines[randomIndex];
        console.log(randomIndex, randomLine)
        $("#ajax").html(randomLine.replace(/#/g,"<br>"))
    })
})
</script>
<div id=ajax></div>
If you are using Linux or macOS, you could use cat and grep to output the links:
cat in.txt | grep /r/learnprogramming > out.txt
Solution provided by OP:
grep -i "/r/learnprogramming" 1.txt >2.txt
Since you did not provide the exact format of the document, I assume the links are separated by newline characters. In that case, the code is pretty straightforward in Python or awk. In Python, you can iterate over file.readlines() and print only the lines that match your pattern (either with a substring test such as pattern in line, or with a regex if the pattern is more complex). To store the links in a new file, simply redirect stdout to a new file like this:
python script.py > links.txt
The solution above also works if the links are separated by an arbitrary symbol s: first read the file into a single string, then split it on s. I hope this helps.
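A minimal sketch of that idea follows. The function name filter_links, its parameters, and the sample URLs are assumptions made for illustration; the match is case-insensitive, like the grep -i command quoted above.

```python
def filter_links(text, pattern, separator=None):
    """Return the links in `text` that contain `pattern`, ignoring case."""
    # With separator=None, str.split() splits on any run of whitespace,
    # which covers newline-separated links.
    links = text.split(separator)
    needle = pattern.lower()
    return [link.strip() for link in links if needle in link.lower()]


# Hypothetical sample input; in practice this would be the contents of 1.txt.
sample = (
    "https://www.reddit.com/r/learnprogramming/comments/abc\n"
    "https://www.reddit.com/r/python/comments/def\n"
    "https://www.reddit.com/r/LearnProgramming/comments/ghi\n"
)
for link in filter_links(sample, "/r/learnprogramming"):
    print(link)
```

To read from and write to real files, replace sample with the result of open("1.txt").read() and redirect the output as shown above. For links separated by some other symbol, pass it as separator, for example filter_links(text, pattern, separator=";").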

regex to extract URLs from text - Ruby

I am trying to detect the URLs in a text and wrap them in quotes, like below:
original text: Hey, it is a url here www.example.com
required text: Hey, it is a url here "www.example.com"
"Original text" shows my input value and "required text" shows the required output. I searched a lot on the web but could not find a solution. I have already tried URI.extract, but it doesn't seem to detect URLs without http or https. Below are some examples of the URLs I want to deal with. Kindly let me know if you know a solution.
ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
Find words that look like URLs:
str = "ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.\n\nhttps://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/\n\nwww.jstor.org/stable/24084454\n\nwww.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/\n\ninsu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so\n\nwww.cerege.fr/spip.php?page=pageperso&id_user=94"
str.split.select{|w| w[/(\b+\.\w+)/]}
This will give you an array of words that contain no spaces and include one or more . characters, which MIGHT work for your use case.
puts str.split.select{|w| w[/(\b+\.\w+)/]}
www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
Updated
Complete solution to modify your string:
str_with_quote = str.clone # make a clone for the `gsub!`
str.split.select{|w| w[/(\b+\.\w+)/]}
   .each{|url| str_with_quote.gsub!(url, '"' + url + '"')}
Now your cloned object wraps URLs inside double quotes:
puts str_with_quote
This gives the following output:
ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, "www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les" Belles lettres, 2001.
"https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/"
"www.jstor.org/stable/24084454"
"www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/"
"insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so"
"www.cerege.fr/spip.php?page=pageperso&id_user=94"

Extract pieces from URL

I need to extract pieces from a URL and I am trying to learn preg_match_all().
The output in a PHP variable:
$content = 'http://www.domain.com/folder1/firstname_lastname.jpg';
Here is my attempt:
preg_match_all('/http://(.*?).jpg/s', $content, $out, PREG_SET_ORDER);
echo $out[0][0] . "\n";
Matching the URL is not easy.
I need to pick out from:
hxxp://www.domain.com/folder1/firstname_lastname.jpg
the following: "www.domain.com" and "folder1" and "firstname_lastname"
Could I get one preg_match_all() for each example?
Thanks in advance.
I like to learn by example and trial and error.
He he... :)
You can use these to get the constituent parts of a URL and query string:
http://uk3.php.net/parse_url
http://uk1.php.net/manual/en/function.parse-str.php
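To illustrate why a URL parser is a better fit here than a regex, here is the same decomposition sketched in Python, where urlsplit plays the role that parse_url plays in PHP. The helper name url_pieces is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_pieces(url):
    """Split a URL into (host, list of folders, file name without extension)."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    # path.parent.parts starts with the root "/", which is not a folder name.
    folders = [part for part in path.parent.parts if part != "/"]
    return parts.hostname, folders, path.stem


host, folders, name = url_pieces("http://www.domain.com/folder1/firstname_lastname.jpg")
print(host)     # www.domain.com
print(folders)  # ['folder1']
print(name)     # firstname_lastname
```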

Resources