cl_http_utility not normalizing my URL. Why?

Via an enterprise service consumer I connect to a web service, which returns some data and also URLs.
However, I tried all the methods of the class mentioned above, and NO METHOD seems to convert the Unicode escapes inside my URL into the proper readable characters (in this case '=' and '&').
The only method that works properly is "is_valid_url", which returns false when I pass URLs like this:
http://not_publish-workflow-dev.hq.not_publish.com/lc/content/forms/af/not_publish/request-datson-internal/v01/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled
What am I missing?

It seems that this format is for json values. Usually = and & don't need to be written with the \u prefix. To decode all \u characters, you may use this code:
DATA(json_value) = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled`.
FIND ALL OCCURRENCES OF REGEX '\\u....' IN json_value RESULTS DATA(matches).
SORT matches BY offset DESCENDING.
LOOP AT matches ASSIGNING FIELD-SYMBOL(<match>).
DATA hex2 TYPE x LENGTH 2.
hex2 = to_upper( substring( val = json_value+<match>-offset(<match>-length) off = 2 ) ).
DATA(uchar) = cl_abap_conv_in_ce=>uccp( hex2 ).
REPLACE SECTION OFFSET <match>-offset LENGTH <match>-length OF json_value WITH uchar.
ENDLOOP.
ASSERT json_value = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId=105862&wcmmode=disabled`.
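For comparison, the same decoding can be sketched outside ABAP. The Python snippet below is only an illustration and not part of the original answer: the function name decode_u_escapes is made up here, and it assumes every escape is exactly \u followed by four hex digits (surrogate pairs are not handled).

```python
import re

def decode_u_escapes(text: str) -> str:
    """Replace each backslash-u escape (four hex digits) with its character."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",              # literal backslash, 'u', four hex digits
        lambda m: chr(int(m.group(1), 16)),  # hex code point -> character
        text,
    )

# Raw string, so the backslashes stay in the data as they arrive from the service.
raw = r"request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled"
decoded = decode_u_escapes(raw)
print(decoded)
```

If the value still sits inside a complete JSON document, running it through a JSON parser should make this step unnecessary, since the parser decodes these escapes itself.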

I hate to answer my own question, but I found a solution of my own by manually replacing those Unicode escapes. It is similar to Sandra's idea, but able to convert ANY Unicode escape.
I am sharing it here in case anyone else needs it.
DATA: lt_res_tab TYPE match_result_tab.
DATA(valid_url) = url.
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
WHILE lines( lt_res_tab ) > 0.
DATA(match) = substring( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length ).
DATA(hex_unicode) = to_upper( match+2 ).
DATA(char) = cl_abap_conv_in_ce=>uccp( uccp = hex_unicode ).
valid_url = replace( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length with = char ).
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
ENDWHILE.
WRITE / url.
WRITE / valid_url.

Related

I need to fix malformed pattern error

I want to replace % signs with a $. I tried using an escape character (\) but that didn't work. I am using Lua 5.1 and I get a malformed pattern error (ends with '%'). This is bugging me because I don't know how to fix it.
io.write("Search: ") search = io.read()
local query = search:gsub("%", "%25") -- Where I put the % sign.
query = query:gsub("+", "%2B")
query = query:gsub(" ","+")
query = query:gsub("/", "%2F")
query = query:gsub("#", "%23")
query = query:gsub("$", "%24")
query = query:gsub("#", "%40")
query = query:gsub("?", "%3F")
query = query:gsub("{", "%7B")
query = query:gsub("}","%7D")
query = query:gsub("[","%5B")
query = query:gsub("]","%5D")
query = query:gsub(">", "%3E")
query = query:gsub("<", "%3C")
local url = "https://www.google.com/#q=" .. query
print(url)
Output reads:
malformed pattern (ends with '%')
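You need to escape % and write %%. The same goes for a literal % in the replacement string, and for other magic characters in the pattern such as [ and $.
The idiomatic way to do this in Lua is to give a table to gsub: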
local reserved="%+/#$@?{}[]><"
local escape={}
for c in reserved:gmatch(".") do
escape[c]=string.format("%%%02X",c:byte())
end
escape[" "]="+"
query = search:gsub(".", escape)
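The table-driven idea carries over to other languages. As a rough Python sketch (the sample search string is invented, and @ is assumed to be the character meant for the %40 entry), each reserved character maps to its percent escape and everything else passes through unchanged:

```python
RESERVED = "%+/#$@?{}[]><"

# Lookup table: reserved character -> "%XX" escape, plus space -> "+".
ESCAPE = {c: "%{:02X}".format(ord(c)) for c in RESERVED}
ESCAPE[" "] = "+"

def encode_query(search: str) -> str:
    # Single pass over the input, so an inserted "%" is never escaped again.
    return "".join(ESCAPE.get(c, c) for c in search)

query = encode_query("50% off c++ [sale]")
print("https://www.google.com/#q=" + query)
```

Doing everything in one pass is what avoids the ordering problem of chained replacements, where a later substitution could touch the % or + produced by an earlier one.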

How can I efficiently parse formatted text from a file in Qt?

I am new to Qt and would like an efficient way of working with strings.
What I am doing:
I load a text file and read it line by line.
Each line contains comma-separated text.
Line schema:
Fname{limit:list:option}, Lname{limit:list:option} ... etc.
Example:
John{0:0:0}, Lname{0:0:0}
Note: limit can be 1 or 0, and the same goes for list and option.
So I would like to get Fname and the limit, list and option values from inside the {}.
I am thinking of writing code that finds { and takes what is inside, reading symbol by symbol.
What is an efficient way to parse that?
Thanks.
The following snippet will give you Fname and limit,list,option from the first set of brackets. It could be easily updated if you are interested in the Lname set as well.
QFile file("input.txt");
if (!file.open(QIODevice::ReadOnly | QIODevice::Text)) {
    qDebug() << "Failed to open input file.";
    return; // nothing to parse; assumes this code sits inside a void function
}
// One named group per field; limit, list and option are each 0 or 1.
QRegularExpression re("(?<name>\\w+)\\{(?<limit>[0-1]):(?<list>[0-1]):(?<option>[0-1])\\}");
while (!file.atEnd())
{
    QString line = file.readLine();
    QRegularExpressionMatch match = re.match(line);
    if (!match.hasMatch())
        continue; // skip lines that do not follow the schema
    QString name = match.captured("name");
    int limit = match.captured("limit").toInt();
    int list = match.captured("list").toInt();
    int option = match.captured("option").toInt();
    // Do something with values ...
}
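To pick up every name{limit:list:option} group on a line rather than only the first, the same pattern can be applied repeatedly. The sketch below shows the idea in Python with an equivalent regular expression; parse_line and the sample line are made up for illustration. In Qt, QRegularExpression::globalMatch plays a comparable role.

```python
import re

# Same structure as the Qt pattern: a word, then three 0/1 flags in braces.
FIELD = re.compile(
    r"(?P<name>\w+)\{(?P<limit>[01]):(?P<list>[01]):(?P<option>[01])\}"
)

def parse_line(line: str):
    """Return (name, limit, list, option) for every group found on the line."""
    return [
        (m["name"], int(m["limit"]), int(m["list"]), int(m["option"]))
        for m in FIELD.finditer(line)
    ]

fields = parse_line("John{0:0:0}, Lname{0:1:0}")
print(fields)
```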

How to replace many symbols with a single word in Lua?

I must replace all of these characters, "①②③④⑤⑥⑦⑧⑨⑩", with "\item".
I have used this code:
stra = string.gsub(text, "①", "\\item")
strb = string.gsub(stra, "②", "\\item")
strc = string.gsub(strb, "③", "\\item")
strd = string.gsub(strc, "④", "\\item")
stre = string.gsub(strd, "⑤", "\\item")
However, this is very verbose. Is there a simpler way to replace all of those items?
local symbols_trans = {
["\226\145\160"]--[[①]] = "\\item1",
["\226\145\161"]--[[②]] = "\\bananas",
["\226\145\162"]--[[③]] = "\\cactus",
["\226\145\163"]--[[④]] = "\\etc",
["\226\145\164"]--[[⑤]] = "\\item5",
["\226\145\165"]--[[⑥]] = "\\item6",
["\226\145\166"]--[[⑦]] = "\\item7",
["\226\145\167"]--[[⑧]] = "\\item8",
["\226\145\168"]--[[⑨]] = "\\item9",
["\226\145\169"]--[[⑩]] = "\\item10",
}
text = string.gsub(text, "(\226\145.)", symbols_trans)
Or if you want to replace them all with "\\item":
text = string.gsub(text,
"\226\145[\160-\169]",
"\\item"
)
[\160-\169] is equivalent to [\160\161\162\163\164\165\166\167\168\169].
See the Lua manual for information on ranges and, in general, Lua patterns.
You could also be fancy:
text = string.gsub(text,
"\226\145([\160-\169])",
function(c)
return "\\item"..(string.byte(c)-160+1)
end
)
This will turn ① into \item1, ② into \item2, and so on.
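The decimal escapes used above are simply the UTF-8 bytes of the circled digits. A quick way to check them, sketched here in Python rather than Lua, is to encode each symbol and print its byte values:

```python
# Each circled digit occupies three bytes in UTF-8; only the last byte varies.
for ch in "①②③④⑤⑥⑦⑧⑨⑩":
    print(ch, list(ch.encode("utf-8")))

first = list("①".encode("utf-8"))
last = list("⑩".encode("utf-8"))
```

The first two bytes (226, 145) are shared by all ten symbols and the third runs from 160 to 169, which is why a byte pattern with a fixed prefix and a range on the last byte matches exactly these characters.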
Use a "set" as described in the tutorial: http://lua-users.org/wiki/PatternsTutorial
string.gsub(text, "[①②③④⑤⑥⑦⑧⑨⑩]", "\\item")
Is there a simpler way to replace all of those items?
Not without a Lua pattern matching library that knows what UTF-8 is. Lua is not Unicode aware; it has no idea how to search for Unicode symbols.
If you're using some non-multibyte encoding, then what John suggested might work. But not if it's UTF-8.
For your specific case, you could always do this:
local symbolsToChange = { "①", "②", ...}
for i, sym in ipairs(symbolsToChange) do
    text = string.gsub(text, sym, "\\item")
end

lua Hashtables, table index is nil?

What I'm currently trying to do is make a table of email addresses (as keys) that hold person_records (as values). Where the person_record holds 6 or so things in it. The problem I'm getting is that when I try to assign the email address as a key to a table it complains and says table index is nil... This is what I have so far:
random_record = split(line, ",")
person_record = {first_name = random_record[1], last_name = random_record[2], email_address = random_record[3], street_address = random_record[4], city = random_record[5], state = random_record[6]}
email_table[person_record.email_address] = person_record
I wrote my own split function that basically takes a line of input, pulls out the 6 comma-separated values and stores them in a table (random_record).
I get an error when I try to say email_table[person_record.email_address] = person_record.
But when I print out person_record.email_address it's NOT nil, it prints out the string I stored in it.. I'm so confused.
function split(str, pat)
local t = {} -- NOTE: use {n = 0} in Lua-5.0
local fpat = "(.-)" .. pat
local last_end = 1
local s, e, cap = str:find(fpat, 1)
while s do
if s ~= 1 or cap ~= "" then
table.insert(t,cap)
end
last_end = e+1
s, e, cap = str:find(fpat, last_end)
end
if last_end <= #str then
cap = str:sub(last_end)
table.insert(t, cap)
end
return t
end
The following code is copy and pasted from your example and runs just fine:
email_table = {}
random_record = {"first", "second", "third"}
person_record = {first_name = random_record[1], last_name = random_record[1], email_address = random_record[1]}
email_table[person_record.email_address] = person_record
So your problem is in your split function.
BTW, Lua doesn't have "hashtables". It simply has "tables" which store key/value pairs. Whether these happen to use hashes or not is an implementation detail.
It looks like you are iterating over some lines that contain comma-separated data.
Looking at your split function, it stops as soon as there are no more separator (,) symbols left to find in the line. So feeding it anything with fewer than 3 comma-separated fields (a very common example: an empty line at the end of the file) will produce a table that doesn't go up to [3]. Reading a missing table entry returns nil, so person_record.email_address will be set to nil on the 2nd line of your code. Then, when you attempt to use this nil as an index into email_table on the 3rd line, you get exactly the error you mentioned.
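One way to guard against this is to check the number of fields before building the record. The sketch below shows the idea in Python rather than Lua; parse_records, the expected_fields parameter and the sample data are all hypothetical.

```python
def parse_records(lines, expected_fields=6):
    """Build an email -> record table, skipping lines that are too short."""
    email_table = {}
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        # An empty or truncated line would leave the email slot missing.
        if len(fields) < expected_fields or not fields[2]:
            continue
        first, last, email, street, city, state = fields[:expected_fields]
        email_table[email] = {
            "first_name": first,
            "last_name": last,
            "email_address": email,
            "street_address": street,
            "city": city,
            "state": state,
        }
    return email_table

sample = [
    "Ann,Lee,ann@example.com,1 Main St,Springfield,IL",
    "",                # trailing blank line, as at the end of a file
    "broken,line",     # too few fields
]
table = parse_records(sample)
print(sorted(table))
```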

What standard produced hex-encoded characters with an extra "25" at the front?

I'm trying to integrate with ybp.com, a vendor of proprietary software for managing book ordering workflows in large libraries. It keeps feeding me URLs that contain characters encoded with an extra "25" in them. Like this book title:
VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS.
The encoded characters in this sample are as follows:
%253a = %3A = a colon
%252c = %2C = a comma
%2527 = %27 = an apostrophe (non-curly)
I need to convert these encodings to a format my internal apps can recognize, and the extra 25 is throwing things off kilter. The final two digits of the hex encoded characters appear to be identical to standard URL encodings, so a brute force method would be to replace "%25" with "%". But I'm leery of doing that because it would be sure to haunt me later when an actual %25 shows up for some reason.
So, what standard is this? Is there an official algorithm for converting values like this to other encodings?
%25 is actually a % character. My guess is that the external website is URLEncoding their output twice accidentally.
If that's the case, it is safe to replace %25 with % (or just URLDecode twice)
The ASCII code 37 (25 in hexadecimal) is %, so the URL encoding of % is %25.
It looks like your data got URL encoded twice: , -> %2C -> %252C
Substituting every %25 with % should not generate any problems, as a literal % in the original text would itself have been encoded twice, to %2525.
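Assuming the data really is URL-encoded twice, decoding it twice undoes both layers. A minimal Python sketch using the standard library's urllib.parse.unquote on the title from the question:

```python
from urllib.parse import unquote

title = (
    "VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED "
    "STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS."
)

once = unquote(title)   # %253a -> %3a, %252c -> %2c, %2527 -> %27
twice = unquote(once)   # %3a -> ":", %2c -> ",", %27 -> "'"
print(once)
print(twice)
```

A percent sign that was part of the original text arrives as %2525 and comes out as a single % after the second pass, so it does not collide with the escapes.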
Create a counter that increments one by one for next two characters, and if you found modulus, you go back, assign the previous counter the '%' char and proceed again. Something like this.
char *str, *newstr; // Fill up with some memory before proceeding below..
....
int k = 0, j = 0;
short modulus = 0;
char first = 0, second = 0;
short proceed = 0;
for(k=0,j=0; k<some_size; j++,k++) {
if(str[k] == '%') {
++k; first = str[k];
++k; second = str[k];
proceed = 1;
} else if(modulus == 1) {
modulus = 0;
--j; first = str[k];
++k; second = str[k];
newstr[j] = '%';
proceed = 1;
} else proceed = 0; // Do not do decoding..
if(proceed == 1) {
if(first == '2' && second == '5') {
newstr[j] = '%';
modulus = 1;
......

Resources