HTML Escapes - url-encoding

Given:
CR = %0d = \r
LF = %0a = \n
What do %3E and %3C mean?

They are URL encoded characters. %3C is <, %3E is >
More info on URL Encoding, and a chart of some of the lower ASCII values.

paste
javascript:alert(unescape("%3E"))
into a browser's address bar and hit Return to find out ;)

The two digits after the % are the character's ASCII code written in hexadecimal.
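If you would rather check this outside a browser, here is a minimal sketch in Go using the standard net/url package. The helper name decode is made up for this illustration; url.PathUnescape is the real library call doing the work.

```go
package main

import (
	"fmt"
	"net/url"
)

// decode (hypothetical helper) turns percent-escapes such as %3C back into
// the characters they stand for. It returns "" for malformed input.
func decode(s string) string {
	out, err := url.PathUnescape(s)
	if err != nil {
		return "" // e.g. a lone '%' or non-hex digits after it
	}
	return out
}

func main() {
	// %q prints control characters such as CR and LF in escaped form.
	for _, esc := range []string{"%3C", "%3E", "%0d", "%0a"} {
		fmt.Printf("%s -> %q\n", esc, decode(esc))
	}
}
```

The hex digits are case-insensitive, so %0d and %0D decode to the same carriage return.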

Related

I'm trying to remove a backslash from a string. If I print the result with print I get the correct string, but if I print it with "po" I still get the original string with the escape sequences

MyString = "CfegoAsZEM/sP\u{10}\u{10}}"
MyString.replacingOccurrences(of: "\"", with: "")
with print(MyString) I got this : "CfegoAsZEM/sP" (that's what I need)
with po MyString (on the debugger) : "CfegoAsZEM/sP\u{10}\u{10}}"
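\u{10} is U+0010 (hexadecimal 10, decimal 16), a non-printable control character; the line feed is \u{0A}.
Maybe a better way is to trim the string, which removes the given characters from the beginning and the end of the string. Trimming only .whitespacesAndNewlines would leave U+0010 in place, so control characters are added to the set here:
let myString = "CfegoAsZEM/sP\u{10}\u{10}"
let trimmedString = myString.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines.union(.controlCharacters))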
Your string doesn't contain literal backslash characters. Rather, the \u{} sequence is an escaped sequence that introduces a Unicode character. This is why you can't remove it using replacingOccurrences.
In this case the character is U+0010 (0x10, decimal 16), a non-printable control character, not the line feed (which is 0x0A). Since it is invisible you don't see it when you print the string, but you do see it when you use po. The debugger shows you escape sequences for non-printable characters. You will also see the sequence if you print(MyString.debugDescription)
Because U+0010 is a control character rather than whitespace, trimmingCharacters(in: .whitespacesAndNewlines) leaves it in place.
We can use the filter function to examine each character in the string. If the character is ASCII and has a value greater than 31 ( 32 is the space character, the first "printable" character in the ASCII sequence) we can include it. We also need to ensure that values that aren't ASCII are included so as not to strip printable Unicode characters (e.g. emoji or non-Latin characters).
let MyString = "CfegoAsZEM/sP\u{10}\u{13}$}🔅\u{1F600}".filter { $0.asciiValue ?? 32 > 31 }
print(MyString.debugDescription)
print(MyString)
Output
"CfegoAsZEM/sP$}🔅😀"
CfegoAsZEM/sP$}🔅😀
asciiValue returns an optional, which is nil if the character isn't plain ASCII. I have used a nil-coalescing operator to return 32 in this case so that the character isn't filtered.
I modified the initial string to include some printable Unicode to demonstrate that it isn't stripped by the filter.

How to convert an NSString containing "&" to "\u0026" in Objective-C

E.g.: "Transport & Logistics" to "Transport \u0026 Logistics"
What you are trying to do is convert text to its Unicode hex representation.
The code below:
NSString *amp = @"&";
unichar ch = [amp characterAtIndex:0];
NSLog(@"amp is %04x", ch);
prints to the console: amp is 0026
(the 04 in %04x pads the output with zeros to at least four hex digits)
Now the task is to represent 0026 as \u0026;
you can use the stringWithFormat: method of NSString to achieve the result, if you want it as a string.
It is also not clear whether you need to convert only the ampersand; if so, you should search for it in the string.
The & is \u0026 in Unicode, and the bootstrap will accept %26 as &
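The format-string idea from the answer above can be sketched as follows. This is written in Go rather than Objective-C, purely as an illustration; escapeRune is a hypothetical helper name, and the sketch assumes the target character is in the Basic Multilingual Plane so that four hex digits are enough.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeRune (hypothetical helper) replaces every occurrence of target in s
// with its \uXXXX form. Assumes target is at most U+FFFF.
func escapeRune(s string, target rune) string {
	// %04x zero-pads the code point to four hex digits, e.g. 0026 for '&'.
	escaped := fmt.Sprintf(`\u%04x`, target)
	return strings.ReplaceAll(s, string(target), escaped)
}

func main() {
	fmt.Println(escapeRune("Transport & Logistics", '&'))
}
```

Only the requested character is rewritten; the rest of the string is left as it is.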

Removing single backslash from string

I am getting a string for a place name back from an API: "Moe\'s Restaurant & Brewhouse". I want to just have it be "Moe's Restaurant & Brewhouse" but I can't get it to properly format without the \.
I've seen the other posts on this topic, I've tried placeName?.stringByReplacingOccurrencesOfString("\\", withString: "") and placeName?.stringByReplacingOccurrencesOfString("\'", withString: "'"). I just can't get anything to work. Any ideas so I can get the string how I want it without the \? Any help is greatly appreciated, thanks!!
You report that the API is returning "Moe\'s Restaurant & Brewhouse". More than likely you are looking at a Swift dictionary or something like that and it is showing you the string literal representation of that string. But depending upon how you're printing that, the string most likely does not contain any backslash.
Consider the following:
let string = "Moe's"
let dictionary = ["name": string]
print(dictionary)
That will print:
["name": "Moe\'s"]
It is just showing the "string literal" representation. As the documentation says:
String literals can include the following special characters:
The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quote) and \' (single quote)
An arbitrary Unicode scalar, written as \u{n}, where n is a 1–8 digit hexadecimal number with a value equal to a valid Unicode code point
But, note, that backslash before the ' in Moe\'s is not part of the string, but rather just an artifact of printing a string literal with an escapable character in it.
If you do:
let string2 = dictionary["name"]!
print(string2)
It will show you that there is actually no backslash there:
Moe's
Likewise, if you check the number of characters (in Swift 4 and later, count is available directly on the string):
print(dictionary["name"]!.count)
It will correctly report that there are only five characters, not six.
(For what it's worth, I think Apple has made this far more confusing than is necessary because it sometimes prints strings as if they were string literals with backslashes, and other times as the true underlying string. And to add to the confusion, the single quote character can be escaped in a string literal, but doesn't have to be.)
Note, if your string really did have a backslash in it, you are right that this is the way to remove it (in Swift 3 and later the method is spelled replacingOccurrences(of:with:)):
someString.stringByReplacingOccurrencesOfString("\\", withString: "")
But in this case, I suspect that the backslash that you are seeing is an artifact of how you're displaying it rather than an actual backslash in the underlying string.

Extra escape character in go URL

I have the following snippet of code :
u := *baseURL
u.User = nil
if q := strings.Index(path, "?"); q > 0 {
	u.Path = path[:q]
	u.RawQuery = path[q+1:]
} else {
	u.Path = path
}
log.Printf(" url %v", u.String())
I see that when the base URL is set to something like http://localhost:9000/buckets/test%?bucket_uuid=7864b0dcdf0a578bd0012c70aef58aca the url package seems to add an extra escape sequence near the % sign. For example, the output of the above print statement is the following:
2015/03/25 12:02:49 url http://localhost:9000/pools/default/buckets/test%2525?bucket_uuid=7864b0dcdf0a578bd0012c70aef58aca
This seems to only happen when the RawQuery field of the URL is set. Any idea why this is happening ? I'm using go version 1.3.3
Cheers,
Manik
URLs may only contain characters of the ASCII character set, but it is often intended to include/transfer characters outside of this ASCII set. In such cases the URL has to be converted into a valid ASCII format.
If the raw URL contains characters outside of the allowed set, they are escaped: they are replaced with a '%' followed by two hexadecimal digits. Therefore the character '%' is special and also has to be escaped (and its escaped form will start with '%' as well, and its hexadecimal code is 25).
Since your raw URL contains the character '%', it will be replaced by "%25".
Back to your example: in the printed form you see "%2525". You could ask why not just "%25"?
This is because the path you assign is already percent-encoded: it contains the escape sequence "%25" for '%'. u.Path is expected to hold the unescaped path, so when the URL is turned back into a string the '%' is escaped again as "%25", and it is followed by the "25" from the input, hence resulting in "%2525".
See: HTML URL Encoding Reference
Also: RFC 1738 - Uniform Resource Locators (URL)
And also: RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
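To make the double escape concrete, here is a minimal sketch that reproduces it and shows one possible way around it: unescape the path before storing it in u.Path. The helper names naiveURL and buildURL and the shortened bucket_uuid value are made up for this example; url.Parse, url.PathUnescape and URL.String are the real net/url calls.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// naiveURL stores the already-encoded path in u.Path, as in the question.
func naiveURL(base, encodedPath string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = encodedPath // '%' will be escaped again by String()
	return u.String(), nil
}

// buildURL splits off the query, then unescapes the path before storing it,
// because u.Path holds the decoded form and String() re-encodes it.
func buildURL(base, encodedPath string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.User = nil
	if q := strings.Index(encodedPath, "?"); q >= 0 {
		u.RawQuery = encodedPath[q+1:]
		encodedPath = encodedPath[:q]
	}
	decoded, err := url.PathUnescape(encodedPath)
	if err != nil {
		return "", err
	}
	u.Path = decoded
	u.RawPath = "" // let String() derive the encoded form from Path
	return u.String(), nil
}

func main() {
	bad, _ := naiveURL("http://localhost:9000", "/buckets/test%25")
	fmt.Println(bad) // path ends in test%2525

	good, err := buildURL("http://localhost:9000", "/buckets/test%25?bucket_uuid=abc")
	if err != nil {
		fmt.Println("bad path:", err)
		return
	}
	fmt.Println(good) // path ends in test%25
}
```

Whether unescaping is the right fix depends on where the path comes from; if it is raw user input rather than an encoded URL fragment, it should be stored in u.Path as it is.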

How to Escape Unicode word to be used in URL

I want to escape a Unicode word so it can be used in a URL for an HTTP request. For example, I want to convert "محمود" to "%D9%85%D8%AD%D9%85%D9%88%D8%AF". I noticed that each character has been converted to two hex-encoded bytes.
Thanks a lot
Convert to UTF-8, then url-encode the bytes that are not in [[:alnum:]].
Url-encoding is where a byte is converted into %<HIGHNIBBLE><LOWNIBBLE> form, where HIGHNIBBLE = (ch >> 4) & 0x0F and LOWNIBBLE = (ch & 0x0F), each written as one hexadecimal digit.
Look into RFC 1738 §2.2 for more details.
Because it looks like you're using java, you'll have to work with byte[] instead of String or char[].
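The nibble arithmetic above can be sketched like this. The example is in Go, where indexing a string already yields its UTF-8 bytes; in Java you would first get a byte[] as the answer says. percentEncode and isUnreserved are hypothetical names, and leaving '-', '_', '.' and '~' unescaped in addition to alphanumerics is an assumption taken from RFC 3986, not from the answer.

```go
package main

import (
	"fmt"
	"net/url"
)

const upperHex = "0123456789ABCDEF"

// isUnreserved reports whether b can stay as-is: ASCII letters, digits,
// and (as an assumption from RFC 3986) '-', '_', '.', '~'.
func isUnreserved(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	case b == '-', b == '_', b == '.', b == '~':
		return true
	}
	return false
}

// percentEncode walks the UTF-8 bytes of s and writes every byte that is not
// unreserved as '%' followed by its high and low nibble in hex.
func percentEncode(s string) string {
	out := make([]byte, 0, len(s)*3)
	for i := 0; i < len(s); i++ { // index loop: bytes, not runes
		b := s[i]
		if isUnreserved(b) {
			out = append(out, b)
			continue
		}
		out = append(out, '%', upperHex[(b>>4)&0x0F], upperHex[b&0x0F])
	}
	return string(out)
}

func main() {
	word := "محمود"
	fmt.Println(percentEncode(word))
	// The standard library agrees for this input.
	fmt.Println(percentEncode(word) == url.QueryEscape(word))
}
```

Each Arabic letter here takes two bytes in UTF-8, which is why every character turns into two %XX groups.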

Resources