How do I URL-escape a string in Mathematica?

How do I URL-escape a string in Mathematica? - url

For example,
urlesc["foo.cgi?abc=123"]
should return
foo.cgi%3Fabc%3D123
This is also known as percent-encoding.
Also, for better readability, spaces should encode to pluses.
I believe that's always acceptable for URL escaping.

Another method, using J/Link and java.net.URLEncoder:
In[116]:= Needs["JLink`"]; InstallJava[];
LoadJavaClass["java.net.URLEncoder"];
In[118]:= URLEncoder`encode["foo.cgi?abc=123"]
Out[118]= "foo.cgi%3Fabc%3D123"
There's also java.net.URLDecoder for decoding.

Here's my solution:
cat = StringJoin##(ToString/#{##})&; (* Like sprintf/strout in C/C++. *)
re = RegularExpression;
hex = IntegerString[#,16]&; (* integer to hex, represented as a string *)
up = ToUpperCase;
asc = ToCharacterCode[#][[1]]&; (* character to ascii code *)
subst = StringReplace;
urlesc[s_String] := subst[s, {" "->"+", re#"[^\w\_\:\.]":>"%"<>up#hex#asc#"$0"}]
urlesc[x_] := urlesc#cat#x
unesc[s_String] := subst[s, re#"\\%(..)":>FromCharacterCode#FromDigits["$1",16]]
As a bonus, here's a function to encode a list of rules like {a->2, b->3} into GET parameters like a=2&b=3, with appropriate URL-encoding:
encode[c_] := cat ## Riffle[cat[#1, "=", urlesc[#2]]& ### c, "&"]

Related

Trying to add special character (%) sign to variable with following concatenation sign in ESQL, it gives me the the below error

Trying to add special character (%) sign to variable with following concatenation sign, but it gives me the the error: Invalid characters.
DECLARE Percent CHARACTER CAST ( ' %' AS CHARACTER CCSID 1208);
SET AlocatedAmount = 45
SET InPercent = AlocatedAmount||'%'
Result should be: InPercent = 45%
Error:Invalid characters::45 %
What's going wrong here?

AlocatedAmount seems to be an INTEGER, on what you cannot use the concatenation operator.
You need to cast that to CHARACTER first:
SET InPercent = CAST(AlocatedAmount AS CHARACTER) || '%';

So there is also the option of using FORMAT in your CAST
DECLARE Num INTEGER;
DECLARE FormattedStr CHAR;
SET Num = 45;
SET FormattedStr = CAST(Num AS CHAR FORMAT '#0%');
More information can be found at https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/ak05615_.htm

cl_http_utility not normalizing my url. Why?

Via an enterpreise service consumer I connect to a webservice, which returns me some data, and also url's.
However, I tried all methods of the mentioned class above and NO METHOD seems to convert the unicode-characters inside my url into the proper readable characters.... ( in this case '=' and ';' ) ...
The only method, which runs properly is "is_valid_url", which returns false, when I pass url's like this:
http://not_publish-workflow-dev.hq.not_publish.com/lc/content/forms/af/not_publish/request-datson-internal/v01/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled
What am I missing?

It seems that this format is for json values. Usually = and & don't need to be written with the \u prefix. To decode all \u characters, you may use this code:
DATA(json_value) = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled`.
FIND ALL OCCURRENCES OF REGEX '\\u....' IN json_value RESULTS DATA(matches).
SORT matches BY offset DESCENDING.
LOOP AT matches ASSIGNING FIELD-SYMBOL(<match>).
DATA hex2 TYPE x LENGTH 2.
hex2 = to_upper( substring( val = json_value+<match>-offset(<match>-length) off = 2 ) ).
DATA(uchar) = cl_abap_conv_in_ce=>uccp( hex2 ).
REPLACE SECTION OFFSET <match>-offset LENGTH <match>-length OF json_value WITH uchar.
ENDLOOP.
ASSERT json_value = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId=105862&wcmmode=disabled`.

I hate to answer my own questions, but anyway, I found an own solution, via manually replacing those unicodes. It is similar to Sandra's idea, but able to convert ANY unicode.
I share it here, just in case, any person might also need it.
DATA: lt_res_tab TYPE match_result_tab.
DATA(valid_url) = url.
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
WHILE lines( lt_res_tab ) > 0.
DATA(match) = substring( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length ).
DATA(hex_unicode) = to_upper( match+2 ).
DATA(char) = cl_abap_conv_in_ce=>uccp( uccp = hex_unicode ).
valid_url = replace( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length with = char ).
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
ENDWHILE.
WRITE / url.
WRITE / valid_url.

How do I parse a string at compile time in Nimrod?

Going through the second part of Nimrod's tutorial I've reached the part were macros are explained. The documentation says they run at compile time, so I thought I could do some parsing of strings to create myself a domain specific language. However, there are no examples of how to do this, the debug macro example doesn't display how one deals with a string parameter.
I want to convert code like:
instantiate("""
height,f,132.4
weight,f,75.0
age,i,25
""")
…into something which by hand I would write like:
var height: float = 132.4
var weight: float = 75.0
var age: int = 25
Obviously this example is not very useful, but I want to look at something simple (multiline/comma splitting, then transformation) which could help me implement something more complex.
My issue here is how does the macro obtain the input string, parse it (at compile time!), and what kind of code can run at compile time (is it just a subset of a languaje? can I use macros/code from other imported modules)?
EDIT: Based on the answer here's a possible code solution to the question:
import macros, strutils
# Helper proc, macro inline lambdas don't seem to compile.
proc cleaner(x: var string) = x = x.strip()
macro declare(s: string): stmt =
# First split all the input into separate lines.
var
rawLines = split(s.strVal, {char(0x0A), char(0x0D)})
buf = ""
for rawLine in rawLines:
# Split the input line into three columns, stripped, and parse.
var chunks = split(rawLine, ',')
map(chunks, cleaner)
if chunks.len != 3:
error("Declare macro syntax is 3 comma separated values:\n" &
"Got: '" & rawLine & "'")
# Add the statement, preppending a block if the buffer is empty.
if buf.len < 1: buf = "var\n"
buf &= " " & chunks[0] & ": "
# Parse the input type, which is an abbreviation.
case chunks[1]
of "i": buf &= "int = "
of "f": buf &= "float = "
else: error("Unexpected type '" & chunks[1] & "'")
buf &= chunks[2] & "\n"
# Finally, check if we did add any variable!
if buf.len > 0:
result = parseStmt(buf)
else:
error("Didn't find any input values!")
declare("""
x, i, 314
y, f, 3.14
""")
echo x
echo y

Macros can, by and large, utilize all pure Nimrod code that a procedure in the same place could see, too. E.g., you can import strutils or peg to parse your string, then construct output from that. Example:
import macros, strutils
macro declare(s: string): stmt =
var parts = split(s.strVal, {' ', ','})
if len(parts) != 3:
error("declare macro requires three parts")
result = parseStmt("var $1: $2 = $3" % parts)
declare("x, int, 314")
echo x
"Calling" a macro will basically evaluate it at compile time as though it were a procedure (with the caveat that the macro arguments will actually be ASTs, hence the need to use s.strVal above instead of s), then insert the AST that it returns at the position of the macro call.
The macro code is evaluated by the compiler's internal virtual machine.

Get an array from a string of numbers separated with comma

How can I convert a string like s = "6.1101,17.592,3.3245\n" to numbers in Lua.
In python, I usually do
a = s.strip().split(',')
a = [float(i) for i in a]
What is the proper way to do this with Lua?

This is fairly trivial; just do a repeated match:
for match in s:gmatch("([%d%.%+%-]+),?") do
output[#output + 1] = tonumber(match)
end
This of course assumes that there are no spaces in the numbers.

What standard produced hex-encoded characters with an extra "25" at the front?

I'm trying to integrate with ybp.com, a vendor of proprietary software for managing book ordering workflows in large libraries. It keeps feeding me URLs that contain characters encoded with an extra "25" in them. Like this book title:
VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS.
The encoded characters in this sample are as follows:
%253a = %3A = a colon
%252c = %2C = a comma
%2527 = %27 = an apostrophe (non-curly)
I need to convert these encodings to a format my internal apps can recognize, and the extra 25 is throwing things off kilter. The final two digits of the hex encoded characters appear to be identical to standard URL encodings, so a brute force method would be to replace "%25" with "%". But I'm leary of doing that because it would be sure to haunt me later when an actual %25 shows up for some reason.
So, what standard is this? Is there an official algorithm for converting values like this to other encodings?

%25 is actually a % character. My guess is that the external website is URLEncoding their output twice accidentally.
If that's the case, it is safe to replace %25 with % (or just URLDecode twice)

The ASCII code 37 (25 in hexadecimal) is %, so the URL encoding of % is %25.
It looks like your data got URL encoded twice: , -> %2C -> %252C
Substituting every %25 for % should not generate any problems, as an actual %25 would get encoded to %25252525.

Create a counter that increments one by one for next two characters, and if you found modulus, you go back, assign the previous counter the '%' char and proceed again. Something like this.
char *str, *newstr; // Fill up with some memory before proceeding below..
....
int k = 0, j = 0;
short modulus = 0;
char first = 0, second = 0;
short proceed = 0;
for(k=0,j=0; k<some_size; j++,k++) {
if(str[k] == '%') {
++k; first = str[k];
++k; second = str[k];
proceed = 1;
} else if(modulus == 1) {
modulus = 0;
--j; first = str[k];
++k; second = str[k];
newstr[j] = '%';
proceed = 1;
} else proceed = 0; // Do not do decoding..
if(proceed == 1) {
if(first == '2' && second == '5') {
newstr[j] = '%';
modulus = 1;
......

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How do I URL-escape a string in Mathematica? - url

For example, urlesc["foo.cgi?abc=123"] should return foo.cgi%3Fabc%3D123 This is also known as percent-encoding. Also, for better readability, spaces should encode to pluses. I believe that's always acceptable for URL escaping.

Another method, using J/Link and java.net.URLEncoder: In[116]:= Needs["JLink`"]; InstallJava[]; LoadJavaClass["java.net.URLEncoder"]; In[118]:= URLEncoder`encode["foo.cgi?abc=123"] Out[118]= "foo.cgi%3Fabc%3D123" There's also java.net.URLDecoder for decoding.

Related

Trying to add special character (%) sign to variable with following concatenation sign in ESQL, it gives me the the below error

cl_http_utility not normalizing my url. Why?

How do I parse a string at compile time in Nimrod?

Get an array from a string of numbers separated with comma

What standard produced hex-encoded characters with an extra "25" at the front?

Categories

Resources