What standard produced hex-encoded characters with an extra "25" at the front?

I'm trying to integrate with ybp.com, a vendor of proprietary software for managing book ordering workflows in large libraries. It keeps feeding me URLs that contain characters encoded with an extra "25" in them. Like this book title:
VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS.
The encoded characters in this sample are as follows:
%253a = %3A = a colon
%252c = %2C = a comma
%2527 = %27 = an apostrophe (non-curly)
I need to convert these encodings to a format my internal apps can recognize, and the extra 25 is throwing things off kilter. The final two digits of the hex-encoded characters appear to be identical to standard URL encodings, so a brute-force method would be to replace "%25" with "%". But I'm leery of doing that because it would be sure to haunt me later when an actual %25 shows up for some reason.
So, what standard is this? Is there an official algorithm for converting values like this to other encodings?

%25 is actually a % character. My guess is that the external website is URLEncoding their output twice accidentally.
If that's the case, it is safe to replace %25 with % (or just URLDecode twice)

The ASCII code 37 (25 in hexadecimal) is %, so the URL encoding of % is %25.
It looks like your data got URL encoded twice: , -> %2C -> %252C
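Replacing every %25 with % in a single pass should not cause any problems: a literal %25 in the original text would have been encoded twice to %252525, and that single pass turns it back into %2525, which is the correct once-encoded form.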
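As a quick check of the double-encoding explanation, here is a minimal Python sketch. Python is used only for illustration, since the thread does not say what the internal apps are written in, and the variable names are made up for the example. It uses `quote` and `unquote` from the standard library's `urllib.parse`.

```python
from urllib.parse import quote, unquote

# Encoding a comma twice reproduces the pattern seen in the vendor's URLs.
once = quote(",")      # first pass percent-encodes the comma
twice = quote(once)    # second pass percent-encodes the '%' itself

# The sample title from the question.
title = ("VOLATILE KNOWING%253a PARENTS%252c TEACHERS%252c AND THE CENSORED "
         "STORY OF ACCOUNTABILITY IN AMERICA%2527S PUBLIC SCHOOLS.")

# Decoding once gives ordinary percent-encoding; decoding again gives text.
single_encoded = unquote(title)
plain = unquote(single_encoded)
print(plain)
```

Decoding exactly twice, rather than looping until nothing changes, keeps a title that genuinely contains a percent sign intact.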

Keep a counter that advances over the two characters following each '%'; if those characters are "25", write a '%' to the output and go back to examine what follows it as another escape. Something like this:
char *str, *newstr; // Fill up with some memory before proceeding below..
....
int k = 0, j = 0;
short modulus = 0;
char first = 0, second = 0;
short proceed = 0;
for(k=0,j=0; k<some_size; j++,k++) {
if(str[k] == '%') {
++k; first = str[k];
++k; second = str[k];
proceed = 1;
} else if(modulus == 1) {
modulus = 0;
--j; first = str[k];
++k; second = str[k];
newstr[j] = '%';
proceed = 1;
} else proceed = 0; // Do not do decoding..
if(proceed == 1) {
if(first == '2' && second == '5') {
newstr[j] = '%';
modulus = 1;
......


writing to flash memory dspic33e

I have some questions regarding the flash memory with a dspic33ep512mu810.
I'm aware of how it should be done:
set all the register for address, latches, etc. Then do the sequence to start the write procedure or call the builtins function.
But I find that there is some small difference between what I'm experiencing and what is in the DOC.
1) Writing the flash in WORD mode. In the DOC it is pretty straightforward. The following is the example code from the DOC:
int varWord1L = 0xXXXX;
int varWord1H = 0x00XX;
int varWord2L = 0xXXXX;
int varWord2H = 0x00XX;
int TargetWriteAddressL; // bits<15:0>
int TargetWriteAddressH; // bits<22:16>
NVMCON = 0x4001; // Set WREN and word program mode
TBLPAG = 0xFA; // write latch upper address
NVMADR = TargetWriteAddressL; // set target write address
NVMADRU = TargetWriteAddressH;
__builtin_tblwtl(0,varWord1L); // load write latches
__builtin_tblwth(0,varWord1H);
__builtin_tblwtl(0x2,varWord2L);
__builtin_tblwth(0x2,varWord2H);
__builtin_disi(5); // Disable interrupts for NVM unlock sequence
__builtin_write_NVM(); // initiate write
while(NVMCONbits.WR == 1);
But that code doesn't work for every address I want to write to. I found a fix that lets me write one word, but I can't write two words where I want. I store everything in the aux memory, so the upper address (NVMADRU) is always 0x7F for me; NVMADR is the address I can change. What I'm seeing is that if the target address modulo 4 is not 0, I have to put my value in the last two latches; otherwise I have to put the value in the first two latches.
If the address modulo 4 is not zero, it doesn't behave like the doc code above: the value that ends up at the address is whatever is in the second set of latches.
I fixed it for writing only one word at a time like this:
if(Address % 4)
{
__builtin_tblwtl(0, 0xFFFF);
__builtin_tblwth(0, 0x00FF);
__builtin_tblwtl(2, ValueL);
__builtin_tblwth(2, ValueH);
}
else
{
__builtin_tblwtl(0, ValueL);
__builtin_tblwth(0, ValueH);
__builtin_tblwtl(2, 0xFFFF);
__builtin_tblwth(2, 0x00FF);
}
I want to know why I'm seeing this behavior?
2) I also want to write a full row.
That also doesn't seem to work for me and I don't know why because I'm doing what is in the DOC.
I tried a simple write row code and at the end I just read back the first 3 or 4 element that I wrote to see if it works:
NVMCON = 0x4002; //set for row programming
TBLPAG = 0x00FA; //set address for the write latches
NVMADRU = 0x007F; //upper address of the aux memory
NVMADR = 0xE7FA;
int latchoffset;
latchoffset = 0;
__builtin_tblwtl(latchoffset, 0);
__builtin_tblwth(latchoffset, 0); //current = 0, available = 1
latchoffset+=2;
__builtin_tblwtl(latchoffset, 1);
__builtin_tblwth(latchoffset, 1); //current = 0, available = 1
latchoffset+=2;
.
. all the way to 127(I know I could have done it in a loop)
.
__builtin_tblwtl(latchoffset, 127);
__builtin_tblwth(latchoffset, 127);
INTCON2bits.GIE = 0; //stop interrupt
__builtin_write_NVM();
while(NVMCONbits.WR == 1);
INTCON2bits.GIE = 1; //start interrupt
int testaddress;
testaddress = 0xE7FA;
status = NVMemReadIntH(testaddress);
status = NVMemReadIntL(testaddress);
testaddress += 2;
status = NVMemReadIntH(testaddress);
status = NVMemReadIntL(testaddress);
testaddress += 2;
status = NVMemReadIntH(testaddress);
status = NVMemReadIntL(testaddress);
testaddress += 2;
status = NVMemReadIntH(testaddress);
status = NVMemReadIntL(testaddress);
What I see is that the value stored at address 0xE7FA is 125, at 0xE7FC it is 126, and at 0xE7FE it is 127. The rest are all 0xFFFF.
Why is it taking only the last 3 latches and writing them to the first 3 addresses?
Thanks in advance for your help people.
The dsPIC33 program memory space is treated as 24 bits wide, it is
more appropriate to think of each address of the program memory as a
lower and upper word, with the upper byte of the upper word being
unimplemented
(dsPIC33EPXXX datasheet)
There is a phantom byte every two program words.
Your code
if(Address % 4)
{
__builtin_tblwtl(0, 0xFFFF);
__builtin_tblwth(0, 0x00FF);
__builtin_tblwtl(2, ValueL);
__builtin_tblwth(2, ValueH);
}
else
{
__builtin_tblwtl(0, ValueL);
__builtin_tblwth(0, ValueH);
__builtin_tblwtl(2, 0xFFFF);
__builtin_tblwth(2, 0x00FF);
}
...will be fine for writing a bootloader if generating values from a valid Intel HEX file, but doesn't make it simple for storing data structures because the phantom byte is not taken into account.
If you create a uint32_t variable and look at the compiled HEX file, you'll notice that it in fact uses up the least significant words of two 24-bit program words. I.e. the 32-bit value is placed into a 64-bit range but only 48-bits out of the 64-bits are programmable, the others are phantom bytes (or zeros). Leaving three bytes per address modulo of 4 that are actually programmable.
What I tend to do if writing data is to keep everything 32-bit aligned and do the same as the compiler does.
Writing:
UINT32 value = ....;
:
__builtin_tblwtl(0, value.word.word_L); // least significant word of 32-bit value placed here
__builtin_tblwth(0, 0x00); // phantom byte + unused byte
__builtin_tblwtl(2, value.word.word_H); // most significant word of 32-bit value placed here
__builtin_tblwth(2, 0x00); // phantom byte + unused byte
Reading:
UINT32 *value
:
value->word.word_L = __builtin_tblrdl(offset);
value->word.word_H = __builtin_tblrdl(offset+2);
UINT32 structure:
typedef union _UINT32 {
uint32_t val32;
struct {
uint16_t word_L;
uint16_t word_H;
} word;
uint8_t bytes[4];
} UINT32;
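To make the word layout above concrete, here is a small Python sketch of the same arithmetic, meant to be run on a PC rather than on the dsPIC. The function names are hypothetical. It only models how a 32-bit value is split into the two 16-bit words passed to `__builtin_tblwtl` at offsets 0 and 2, with 0x00 in the upper word as in the "Writing" snippet.

```python
def split_u32(value):
    """Return (word_L, word_H), the low and high 16-bit halves of a 32-bit value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return value & 0xFFFF, (value >> 16) & 0xFFFF


def join_u32(word_l, word_h):
    """Rebuild the 32-bit value from its two halves, as the 'Reading' snippet does."""
    return ((word_h & 0xFFFF) << 16) | (word_l & 0xFFFF)


def latch_words(value):
    """List of (latch offset, low-word data, high-word data) for one 32-bit value.

    The high-word data is 0x00 because that slot holds the phantom byte
    plus one unused byte in the layout described above.
    """
    word_l, word_h = split_u32(value)
    return [(0, word_l, 0x00), (2, word_h, 0x00)]


print(latch_words(0x12345678))
```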

Getting a specific number from a bigger number?

a = (any random number)
Is there a way I could get the third digit (for example, from 12345) without converting it into a string?
If not, what would be a good way to get it by converting it into a string?
You can use the sub function
function getDigit(value,digitPlace)
return tonumber(tostring(value):sub(digitPlace,digitPlace))
end
This will get the third digit of a as a number:
a = 12345
print(getDigit(a,3))
You can get that using simple maths.
function getDigitInt(value, digit)
-- get rid of the sign
value = math.abs(value)
-- how many digits does the number have?
local numDigits = math.floor(math.log(value, 10)) + 1
-- does the requested digit exist?
if digit > numDigits or digit < 1 then
print("digit does not exist")
return
end
-- return the requested digit
return math.floor(value / 10^(numDigits - digit)) % 10
end
-- test
for i = 0, 8 do print(getDigitInt(1234567, i)) end
Add more error handling as needed. Also this can only handle integers of course. But I'm sure you will find out how to apply this idea to decimals as well.
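The arithmetic idea above can also be sketched without logarithms. The following Python version is an illustration, not a translation of the Lua code: it peels digits off with integer division and then indexes from the left. The name `get_digit` is made up for the example.

```python
def get_digit(value, place):
    """Return the digit at 1-based position `place`, counted from the left.

    Returns None when the number has no such digit. Works on integers only.
    """
    value = abs(value)          # get rid of the sign
    digits = []                 # least significant digit first
    while True:
        value, digit = divmod(value, 10)
        digits.append(digit)
        if value == 0:
            break
    if not 1 <= place <= len(digits):
        return None
    return digits[len(digits) - place]


print(get_digit(12345, 3))
```

Using only integer operations avoids relying on the rounding behavior of a floating-point logarithm for values that are exact powers of ten.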
You can convert the number into an array of digits and then look up any position easily, like below (digitPlace is zero-based here):
public int GetDigitsPlace(int number, int digitPlace) {
string t = number.ToString();
int[] nArr = new int[t.Length];
for(int i = 0; i < nArr.Length; i++) {
nArr[i] = t[i] - '0'; // t[i] is a char, so int.Parse(t[i]) would not compile
}
return nArr[digitPlace];
}

How to achieve capturing groups in flex lex?

I wanted to match for a string which starts with a '#', then matches everything until it matches the character that follows '#'. This can be achieved using capturing groups like this: #(.)[^(?1)]*(?1)(EDIT this regex is also erroneous). This matches #$foo$, does not match #%bar&, matches first 6 characters of #"foo"bar.
But since flex lex does not support capturing groups, what is the workaround here?
As you say, (f)lex does not support capturing groups, and it certainly doesn't support backreferences.
So there is no simple workaround, but there are workarounds. Here are a few possibilities:
You can read the input one character at a time using the input() function, until you find the matching character (but you have to create your own buffer to store the characters, because characters read by input() are not added to the current token). This is not the most efficient because reading one character at a time is a bit clunky, but it's the only interface that (f)lex offers. (The following snippet assumes you have some kind of expandable stringBuilder; if you are using C++, this would just be replaced with a std::string.)
#. { StringBuilder sb = string_builder_new();
int delim = yytext[1];
for (;;) {
int next = input();
if (next == delim) break;
if (next == EOF ) { /* Signal error */; break; }
string_builder_addchar(sb, next);
}
yylval = string_builder_release(sb);
return DELIMITED_STRING;
}
Even less efficiently, but perhaps more conveniently, you can get (f)lex to accumulate the characters in yytext using yymore(), matching one character at a time in a start condition:
%x DELIMITED
%%
int delim;
#. { delim = yytext[1]; BEGIN(DELIMITED); }
<DELIMITED>.|\n { if (yytext[0] == delim) {
yylval = strdup(yytext);
BEGIN(INITIAL);
return DELIMITED_STRING;
}
yymore();
}
<DELIMITED><<EOF>> { /* Signal unterminated string error */ }
The most efficient solution (in (f)lex) is to just write one rule for each possible delimiter. While that's a lot of rules, they could be easily generated with a small script in whatever scripting language you prefer. And, actually, there are not that many rules, particularly if you don't allow alphabetic and non-printing characters to be delimiters. This has the additional advantage that if you want Perl-like parenthetic delimiters (#(Hello) instead of #(Hello(), you can just modify the individual pattern to suit (as I've done below). [Note 1] Since all the actions are the same, it might be easier to use a macro for the action, making it easier to modify.
/* Ordinary punctuation */
#:[^:]*: { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#![^!]*! { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#\.[^.]*\. { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
/* Matched pairs */
#<[^>]*> { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
#\[[^]]*] { yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }
/* Trap errors */
# { /* Report unmatched or invalid delimiter error */ }
If I were writing a script to generate these rules, I would use hexadecimal escapes for all the delimiter characters rather than trying to figure out which ones needed escapes.
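A generator script along those lines might look like the following Python sketch. It is only one possible shape for such a script: the choice of `string.punctuation` as the delimiter set, the `PAIRS` table, and the decision not to let a closing bracket act as an opening delimiter are all assumptions made for the example. The action text is copied from the rules above, and the patterns use flex's `\xHH` hexadecimal escapes so that no delimiter needs special-case quoting.

```python
import string

# Action shared by every generated rule (same text as the hand-written rules).
ACTION = "{ yylval = strndup(yytext + 2, yyleng - 3); return DELIMITED_STRING; }"

# Opening delimiters that close with a different character (assumed set).
PAIRS = {"<": ">", "[": "]", "(": ")", "{": "}"}


def hex_escape(ch):
    """Return the flex hexadecimal escape for a single character."""
    return "\\x{:02x}".format(ord(ch))


def rule_for(open_ch):
    """Build one flex rule for strings introduced by '#' plus open_ch."""
    close_ch = PAIRS.get(open_ch, open_ch)
    opener, closer = hex_escape(open_ch), hex_escape(close_ch)
    return "#{0}[^{1}]*{1} {2}".format(opener, closer, ACTION)


def generate_rules(delimiters=string.punctuation):
    """One rule per delimiter; closing brackets are skipped as openers."""
    closers = set(PAIRS.values())
    return [rule_for(ch) for ch in delimiters if ch not in closers]


if __name__ == "__main__":
    print("\n".join(generate_rules()))
```

The output is meant to be pasted into the rules section ahead of the catch-all `#` error rule.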
Notes:
Perl requires nested balanced parentheses in constructs like that. But you can't do that with regular expressions; if you wanted to reproduce Perl behaviour, you'd need to use some variation on one of the other suggestions. I'll try to revisit this answer later to address that feature.

cl_http_utility not normalizing my url. Why?

Via an enterprise service consumer I connect to a web service, which returns some data and also URLs.
However, I tried all methods of the class mentioned above, and no method seems to convert the Unicode escapes inside my URL into the proper readable characters (in this case '=' and '&').
The only method that works properly is "is_valid_url", which returns false when I pass URLs like this:
http://not_publish-workflow-dev.hq.not_publish.com/lc/content/forms/af/not_publish/request-datson-internal/v01/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled
What am I missing?
It seems that this format is for json values. Usually = and & don't need to be written with the \u prefix. To decode all \u characters, you may use this code:
DATA(json_value) = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled`.
FIND ALL OCCURRENCES OF REGEX '\\u....' IN json_value RESULTS DATA(matches).
SORT matches BY offset DESCENDING.
LOOP AT matches ASSIGNING FIELD-SYMBOL(<match>).
DATA hex2 TYPE x LENGTH 2.
hex2 = to_upper( substring( val = json_value+<match>-offset(<match>-length) off = 2 ) ).
DATA(uchar) = cl_abap_conv_in_ce=>uccp( hex2 ).
REPLACE SECTION OFFSET <match>-offset LENGTH <match>-length OF json_value WITH uchar.
ENDLOOP.
ASSERT json_value = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId=105862&wcmmode=disabled`.
I hate to answer my own questions, but I found my own solution by manually replacing those Unicode escapes. It is similar to Sandra's idea, but able to convert any \uXXXX escape.
I share it here in case anyone else needs it.
DATA: lt_res_tab TYPE match_result_tab.
DATA(valid_url) = url.
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
WHILE lines( lt_res_tab ) > 0.
DATA(match) = substring( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length ).
DATA(hex_unicode) = to_upper( match+2 ).
DATA(char) = cl_abap_conv_in_ce=>uccp( uccp = hex_unicode ).
valid_url = replace( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length with = char ).
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
ENDWHILE.
WRITE / url.
WRITE / valid_url.
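For comparison outside ABAP, the same replacement can be sketched in a few lines of Python. This is an illustration of the technique only, not of anything `cl_http_utility` offers. It assumes four-hex-digit `\uXXXX` escapes as in the sample URL and does not combine surrogate pairs; the function name is made up.

```python
import re

# Matches a backslash, a 'u', and exactly four hexadecimal digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Replace every \\uXXXX escape in text with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# The sample URL from the question, written as a raw string so the
# backslashes stay literal.
url = (r"http://not_publish-workflow-dev.hq.not_publish.com/lc"
       r"/content/forms/af/not_publish/request-datson-internal/v01"
       r"/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled")

print(decode_u_escapes(url))
```

Because the substitution runs in a single left-to-right pass, the match offsets never need to be recomputed, which is the problem the descending sort and the repeated FIND in the ABAP versions work around.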

How to generate unique (short) URL folder name on the fly...like Bit.ly

I'm creating an application which will create a large number of folders on a web server, with files inside of them.
I need the folder name to be unique. I can easily do this with a GUID, but I want something more user friendly. It doesn't need to be speakable by users, but should be short and standard characters (alphas is best).
In short: I'm looking to do something like Bit.ly does with their unique names:
www.mydomain.com/ABCDEF
Is there a good reference on how to do this? My platform will be .NET/C#, but ok with any help, references, links, etc on the general concept, or any overall advice to solve this task.
Start at 1. Increment to 2, 3, 4, 5, 6, 7,
8, 9, a, b...
A, B, C...
X, Y, Z, 10, 11, 12, ... 1a, 1b,
You get the idea.
You have a synchronized global int/long "next id" and represent it in base 62 (numbers, lowercase, caps) or base 36 or something.
I'm assuming that you know how to use your web server's redirect capabilities. If you need help, just comment :).
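The counting scheme described above (digits, then lowercase, then capitals, then two-character names) can be sketched as follows. This is a minimal Python illustration with made-up names; how the "next id" is stored and synchronized is left out, and a real deployment would need that part to be safe under concurrent requests.

```python
import string

# Digit order matches the sequence above: 0-9, then a-z, then A-Z (base 62).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_id(n, alphabet=ALPHABET):
    """Represent a non-negative integer id in the base given by the alphabet."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    out = []
    while n:
        n, remainder = divmod(n, base)
        out.append(alphabet[remainder])
    return "".join(reversed(out))


for next_id in (1, 9, 10, 36, 61, 62, 72):
    print(next_id, encode_id(next_id))
```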
The way I would do it would be generating a random integer (between the integer values of 'a' and 'z'); converting it into a char; appending it to a string; and repeating until we reach the needed length. If it generates a value already in the database, repeat the process. If it was unique, store it in the database with the name of the actual location and the name of the alias.
This is a bit hack-like because it assumes that 'a' through 'z' are actually in sequence in their integer values.
Best I could think of :(.
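A sketch of this random approach in Python is shown below. Picking characters from an explicit alphabet string sidesteps the assumption that 'a' through 'z' have consecutive integer values. The set named `existing` is a stand-in for the database lookup, and all names are hypothetical.

```python
import random
import string


def random_code(length, rng=random):
    """Build a code of lowercase letters, one random character at a time."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def new_unique_code(existing, length=6, rng=random):
    """Generate codes until one is not already in `existing`, then record it.

    `existing` stands in for the database table of aliases already issued.
    """
    while True:
        code = random_code(length, rng)
        if code not in existing:
            existing.add(code)
            return code


issued = set()
print(new_unique_code(issued))
```

Retries become more frequent as the space of codes fills up, so the length should be chosen with the expected number of folders in mind.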
In Perl, without modules so you can translate more easily.
sub convert_to_base {
my ($n, $b) = @_;
my @digits;
while ($n) {
my $digit = $n % $b;
unshift @digits, $digit;
$n = ($n - $digit) / $b;
}
unshift @digits, 0 if !@digits;
return @digits;
}
# Whatever characters you want to use.
my @digit_set = ( '0'..'9', 'a'..'z', 'A'..'Z' );
# The id of the record in the database,
# or one more than the last id you generated.
my $id = 1;
my $converted =
join '',
map { $digit_set[$_] }
convert_to_base($id, 0+@digit_set);
I needed something similar to what you're trying to accomplish. I retooled my code to generate folders, so try this. It's set up for a console app, but you can use it in a website also.
private static void genRandomFolders()
{
string basepath = "C:\\Users\\{username here}\\Desktop\\";
int count = 5;
int length = 8;
List<string> codes = new List<string>();
int total = 0;
int i = count;
Random rnd = new Random();
while (i-- > 0)
{
string code = RandomString(rnd, length);
if (!codes.Exists(delegate(string c) { return c.ToLower() == code.ToLower(); }))
{
codes.Add(code); // remember the code so later duplicates are detected
//Create directory here
System.IO.Directory.CreateDirectory(basepath + code);
}
total++;
if (total % 100 == 0)
Console.WriteLine("Generated " + total.ToString() + " random folders...");
}
Console.WriteLine();
Console.WriteLine("Generated " + total.ToString() + " total random folders.");
}
public static string RandomString(Random r, int len)
{
//string str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"; //uppercase only
//string str = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"; //All
string str = "abcdefghjkmnpqrstuvwxyz123456789"; //Lowercase only
StringBuilder sb = new StringBuilder();
while ((len--) > 0)
sb.Append(str[(int)(r.NextDouble() * str.Length)]);
return sb.ToString();
}
