Katakana character ジ in URL being encoded incorrectly - ios

I need to construct a URL with a string path received from my application server which contains the character: ジ
However, in Swift, the fileURLWithPath seems to encode it incorrectly.
let path = "ジ"
print(URL(fileURLWithPath: path))
print(URL(fileURLWithPath: path.precomposedStringWithCanonicalMapping))
Both print:
%E3%82%B7%E3%82%99
The expected URL path should be:
%E3%82%B8
What am I missing or doing wrong? Any help is appreciated.

There are two different characters, ジ and ジ. They may look the same, but they have different internal representations.
The former is “katakana letter zi”, comprised of a single Unicode scalar which percent-encodes as %E3%82%B8.
The latter is still a single Swift character, but is comprised of two Unicode scalars (the “katakana letter si” and “combining voiced sound mark”), and these two Unicode scalars percent-encode to %E3%82%B7%E3%82%99.
One can normalize characters in a string with precomposedStringWithCanonicalMapping, for example. That can convert a character with the two Unicode scalars into a character with a single Unicode scalar.
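You can reproduce the difference outside Swift. Here is a minimal sketch in Python, used only because its standard library has both Unicode normalization and percent-encoding. In it, `unicodedata.normalize` with `"NFC"` roughly corresponds to precomposedStringWithCanonicalMapping, and `urllib.parse.quote` percent-encodes the UTF-8 bytes:

```python
import unicodedata
from urllib.parse import quote

precomposed = "\u30b8"        # KATAKANA LETTER ZI: one Unicode scalar
decomposed = "\u30b7\u3099"   # KATAKANA LETTER SI + COMBINING VOICED SOUND MARK

# Same glyph on screen, different code point sequences.
print(precomposed == decomposed)   # False
print(quote(precomposed))          # %E3%82%B8
print(quote(decomposed))           # %E3%82%B7%E3%82%99

# NFC (canonical composition) produces the single scalar; NFD does the reverse.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```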
But your local file system (or, init(fileURLWithPath:), at least) decomposes diacritics. It is logical that the local file system ensures that diacritics are encoded in some consistent manner. (See Diacritics in file names on macOS behave strangely.) The fact that they are decomposed rather than precomposed is, for the sake of this discussion, a bit academic. When you send it to the server, you want it precomposed, regardless of what is happening in your local file system.
Now, you tell us that the “url path is rejected by the server”. That does not make sense. One would generally not provide a local file system URL to a remote server. One would generally extract a file name from a local file system URL and send that to the server. This might be done in a variety of ways:
You can use precomposedStringWithCanonicalMapping when adding a filename to a server URL, and it honors that mapping, unlike a file URL:
let path = "ジ" // actually `%E3%82%B7%E3%82%99` variant
let url = URL(string: "https://example.com")!
.appendingPathComponent(path.precomposedStringWithCanonicalMapping)
print(url) // https://example.com/%E3%82%B8
If sending it in the body of a request, use precomposedStringWithCanonicalMapping. E.g. if a filename in a multipart/form-data request:
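// `body` is assumed to be a Data with an append(_: String) helper extension (Data has no such method built in)
body.append("--\(boundary)\r\n")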
body.append("Content-Disposition: form-data; name=\"\(filePathKey)\"; filename=\"\(filename.precomposedStringWithCanonicalMapping)\"\r\n")
body.append("Content-Type: \(mimeType)\r\n\r\n")
body.append(data)
body.append("\r\n")
Now, those are two random examples of how a filename might be provided to the server. Yours may vary. But the idea is that when you provide the filename, that you precompose the string in its canonical format, rather than relying upon what a file URL in your local file system uses.
But I would advise avoiding URL(fileURLWithPath:) for manipulating strings provided by the server. It is only to be used when actually referring to files within your local file system. If you just want to percent-encode strings, I would advise using the String method addingPercentEncoding(withAllowedCharacters: .urlPathAllowed). That will not override the precomposedStringWithCanonicalMapping output.
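The order of operations is what matters: normalize first, then percent-encode. The Python sketch below shows that order. `server_url_for_filename` is a hypothetical helper and `https://example.com` is a placeholder. NFC normalization stands in for precomposedStringWithCanonicalMapping, and `quote` stands in for addingPercentEncoding:

```python
import unicodedata
from urllib.parse import quote

def server_url_for_filename(base: str, filename: str) -> str:
    """Hypothetical helper: append one percent-encoded file name to a base URL.

    The name is NFC-normalized *before* percent-encoding, so the precomposed
    and the decomposed spelling of the same name give the same URL.
    """
    composed = unicodedata.normalize("NFC", filename)
    # safe="" also encodes "/", so the name stays a single path segment.
    return base.rstrip("/") + "/" + quote(composed, safe="")

print(server_url_for_filename("https://example.com", "\u30b7\u3099"))  # https://example.com/%E3%82%B8
print(server_url_for_filename("https://example.com", "\u30b8"))        # https://example.com/%E3%82%B8
```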

You could try this approach using URL(dataRepresentation:relativeTo:):
if let path = "ジ".data(using: .utf8),
let url = URL(dataRepresentation: path, relativeTo: nil) {
print("\n---> url: \(url) \n") //---> url: %E3%82%B8
}

Related

ios sdk URL filename, remove the url addon special character "\"

In the iOS SDK (Swift), let's say I have three files: "mary'sCat.mp3", "mary\s.mp3" and "mary\\s.mp3"
(the special character \ is part of the real filename)
When I get the URLs with the code below
try FileManager.default.contentsOfDirectory(at: documentDir, includingPropertiesForKeys: nil)
and get each file name with
fileUrl.standardizedFileURL.lastPathComponent
I get "mary\'sCat.mp3", "mary\\s.mp3", "mary\\\\s.mp3"
So is there a correct way to remove the backslashes the system adds, so that I get back the original file names "mary'sCat.mp3", "mary\s.mp3" and "mary\\s.mp3"? I noticed that the Xcode output window does not show the added backslashes, but the debugger's watch view does. That is a nightmare when comparing strings, as below:
let mediaUrl_filename = "mary\\s.mp3" // <-- this value comes from the URL
let db_filename = "mary\s.mp3" // <-- this value comes from SQLite
if mediaUrl_filename == db_filename {
print("It is equal")
}
So is there any way to solve this problem?
Actually, '\' is the escape character in Swift string literals, so when you compare against a file name such as 'mary\s.mp3' in code, you have to write it like this:
if mediaUrl_filename == "mary\\s.mp3"{
print("It is equal")
}
I suggest that you replace the '\' in file names with another character, such as '_', to avoid this kind of confusion.
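The doubled backslash is only a display convention for escaped strings. Python makes this easy to see: `print` shows the actual characters, while `repr` shows the escaped form a debugger typically displays. This is roughly analogous to Swift's print versus debugPrint. The sketch is an illustration, not Swift:

```python
# This string contains exactly one backslash; "\\" is only how the literal
# is spelled in source code.
name = "mary\\s.mp3"

print(name)              # mary\s.mp3     (the actual content)
print(repr(name))        # 'mary\\s.mp3'  (escaped view, like a debugger)
print(len(name))         # 10
print(name.count("\\"))  # 1

# A raw string literal spells the same content without doubling the backslash.
print(name == r"mary\s.mp3")  # True
```

So a value read from the file system and a value read from SQLite compare equal as long as both contain the same single backslash. Only literals typed in source code need the escape.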

How to encode a STRING variable into a given code page

I've got a string variable containing a text that I need to encode and write to a file, in UTF-16LE code page.
Currently the following code generates a UTF-8 file and I don't see any option in the statement OPEN DATASET to generate the file in UTF-16LE.
REPORT zmyprogram.
DATA(filename) = `/tmp/myfile`.
OPEN DATASET filename IN TEXT MODE ENCODING DEFAULT FOR OUTPUT.
TRANSFER 'HELLO WORLD' TO filename.
CLOSE DATASET filename.
I guess one solution is to first encode the string in memory, then write the encoded bytes to the file.
Generally speaking, how to encode a string of characters into a given code page, in memory?
In the first part, I explain how to encode a string of characters into a given code page (all is done in memory), and in the second part, I explain specifically how to write files to the application server in a given code page.
1) General way (all in memory)
If a string of characters (type STRING) has to be encoded, the result has to be stored in a string of bytes, which corresponds to the built-in data type XSTRING.
There are several possibilities which depend on the ABAP version:
Since 7.53, use the class CL_ABAP_CONV_CODEPAGE:
DATA(xstring) = cl_abap_conv_codepage=>create_out( codepage = `UTF-16LE` )->convert( source = `ABCDE` ).
Since 7.02, use the class CL_ABAP_CODEPAGE:
DATA xstring TYPE xstring.
xstring = cl_abap_codepage=>convert_to( source = `ABCDE` codepage = `UTF-16LE` ).
Before 7.02, use the class CL_ABAP_CONV_OUT_CE (documentation provided with the class):
First, instantiate the conversion object, using a SAP code page number instead of the ISO name (list of values shown below):
DATA: conv TYPE REF TO CL_ABAP_CONV_OUT_CE, xstring TYPE xstring.
conv = CL_ABAP_CONV_OUT_CE=>CREATE( encoding = '4103' ). "4103 = utf-16le
Then encode the string and retrieve the bytes encoded:
conv->RESET( ).
conv->WRITE( data = `ABCDE` ).
xstring = conv->GET_BUFFER( ).
Alternatively, instead of using RESET, WRITE and GET_BUFFER, you can use the method CONVERT, which was added in 6.40 and back-ported:
conv->CONVERT( EXPORTING data = `ABCDE` IMPORTING buffer = xstring ).
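For comparison, here is the same in-memory idea in Python: a string of characters goes in and a string of bytes comes out. `str.encode` plays the role of the conversion classes, and `bytes` plays the role of XSTRING. This illustrates the concept only and is not ABAP:

```python
text = "ABCDE"

utf16le = text.encode("utf-16-le")   # little endian, no byte order mark
utf16be = text.encode("utf-16-be")   # big endian, no byte order mark

print(utf16le.hex().upper())  # 41004200430044004500
print(utf16be.hex().upper())  # 00410042004300440045

# Decoding with the same code page gives the original characters back.
print(utf16le.decode("utf-16-le") == text)  # True
```

Note that Python's plain `"utf-16"` codec would prepend a byte order mark. The explicit `-le`/`-be` names do not.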
With the class CL_ABAP_CONV_OUT_CE, you need to use the number of the SAP Code Page, not the ISO name. Here are the most common SAP code pages and their equivalent ISO names:
1100: ISO-8859-1
1101: US-ASCII
1160: Windows-1252 ("ANSI")
1401: ISO-8859-2
4102: UTF-16BE
4103: UTF-16LE
4104: UTF-32BE
4105: UTF-32LE
4110: UTF-8
Etc. (the possible values are defined in the table TCP00A, in lines with column CPATTRKIND = 'H').
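The SAP numbers above map directly onto ordinary codec names. The Python sketch below uses that mapping. `SAP_TO_PYTHON_CODEC` and `encode_for_sap_codepage` are hypothetical names, and only the nine code pages listed above are covered. A character with no representation in the target code page raises `UnicodeEncodeError`, roughly the situation that the ABAP examples catch as CX_SY_CONVERSION_CODEPAGE:

```python
# SAP code page numbers from the list above, mapped to Python codec names
# (illustration only; only these nine entries are covered).
SAP_TO_PYTHON_CODEC = {
    "1100": "iso-8859-1",
    "1101": "ascii",
    "1160": "cp1252",
    "1401": "iso-8859-2",
    "4102": "utf-16-be",
    "4103": "utf-16-le",
    "4104": "utf-32-be",
    "4105": "utf-32-le",
    "4110": "utf-8",
}

def encode_for_sap_codepage(text: str, sap_codepage: str) -> bytes:
    """Hypothetical helper: encode text in the code page with this SAP number.

    Raises UnicodeEncodeError if a character cannot be represented.
    """
    return text.encode(SAP_TO_PYTHON_CODEC[sap_codepage])

print(encode_for_sap_codepage("ś", "1401"))   # b'\xb6' (one byte in ISO-8859-2)
try:
    encode_for_sap_codepage("ś", "1100")      # ś does not exist in ISO-8859-1
except UnicodeEncodeError as exc:
    print("not representable:", exc.reason)
```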
 
2) Writing a file on the application server in a given code page
In ABAP, OPEN DATASET can directly specify the target code page. Most code pages are supported, including UTF-8, but not the other UTF code pages (41xx). Those can be written only with the solution explained in 2.3 below (encoding in memory first).
2.1) IN TEXT MODE ENCODING ...
Possible ENCODING values:
UTF-8: in this mode, it's possible to add the Byte Order Mark if needed, via the option WITH BYTE-ORDER MARK.
DEFAULT: will be UTF-8 in a SAP "Unicode" system (that you can check via the menu System > Status > Unicode System Yes/No), NON-UNICODE otherwise.
NON-UNICODE: will depend on the current ABAP linguistic environment; for language English, it's the character encoding iso-8859-1, for language Polish, it's the character encoding iso-8859-2, etc. (the equivalences are shown in table TCP0C.)
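For UTF-8, the byte order mark is just the three bytes EF BB BF at the start of the file. Python exposes the same choice through two codec names, which makes the difference easy to see. This is an illustration only, not ABAP:

```python
text = "Witaj świecie"

without_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")   # prepends the UTF-8 byte order mark

print(with_bom[:3].hex().upper())         # EFBBBF
print(with_bom[3:] == without_bom)        # True
print(len(with_bom) - len(without_bom))   # 3

# "utf-8-sig" also strips the mark again when decoding.
print(with_bom.decode("utf-8-sig") == text)  # True
```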
Example in ABAP version 7.52 to write to UTF-8 with the byte order mark:
REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_utf_8`.
OPEN DATASET filename IN TEXT MODE ENCODING UTF-8 WITH BYTE-ORDER MARK FOR OUTPUT.
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
Example in ABAP version 7.52 to write to iso-8859-2 (Polish language here):
REPORT zmyprogram.
SET LOCALE LANGUAGE 'L'. " Polish
DATA(filename) = `/tmp/dataset_nonunicode_pl`.
OPEN DATASET filename IN TEXT MODE ENCODING NON-UNICODE FOR OUTPUT.
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
2.2) IN LEGACY TEXT MODE CODE PAGE ...
Use any code page number except code pages 41xx (i.e. UTF-8 and other UTF; see workaround in 2.3 below).
Example in ABAP version 7.52 to write to iso-8859-2 (code page 1401):
REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_iso_8859_2`.
OPEN DATASET filename IN LEGACY TEXT MODE CODE PAGE '1401' FOR OUTPUT. " iso-8859-2
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
2.3) UTF = general way + IN BINARY MODE
Example in ABAP version 7.52:
REPORT zmyprogram.
TRY.
DATA(xstring) = cl_abap_codepage=>convert_to( source = `Witaj świecie` codepage = `UTF-16LE` ).
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
BREAK-POINT.
ENDTRY.
DATA(filename) = `/tmp/dataset_utf_16le`.
OPEN DATASET filename IN BINARY MODE FOR OUTPUT.
TRANSFER xstring TO filename.
CLOSE DATASET filename.
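The same two steps in Python: encode in memory, then write the bytes unchanged in binary mode. A temporary directory replaces `/tmp/dataset_utf_16le` from the ABAP example. This is only a sketch of the pattern:

```python
import os
import tempfile

text = "Witaj świecie"

# Step 1: encode in memory (would raise UnicodeEncodeError for unsupported characters).
data = text.encode("utf-16-le")

# Step 2: write the bytes as they are, in binary mode (no text-mode conversion).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset_utf_16le")
    with open(path, "wb") as f:
        f.write(data)
    with open(path, "rb") as f:
        written = f.read()

print(written == data)                       # True
print(written.decode("utf-16-le") == text)   # True
```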

Regex failing to match the punycode url

I have a URL which, when converted to Punycode, gets the prefix xn----, and all the regexes in the Ruby libraries fail to match it.
Currently I am using validates_url_format_of ruby library.
Example Url: "https://www.θεραπευτικη-κανναβη.com.gr"
Punycode url: "https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr"
So is there an issue in the library's regex, or does the issue lie in the conversion to Punycode?
As per the Punycode conversion rules, the prefix is always xn--. So what do the two extra -- mean here?
"https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr".match(/https?:\/\/w*\.xn----.*/)
=> #<MatchData "https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr">
Note that this URL matcher is not perfect.
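When a label contains a -, the Punycode algorithm (RFC 3492) copies all ASCII characters of the label, hyphens included, to the beginning of the encoded output. It then appends a single - as a delimiter before the encoded non-ASCII characters, and the xn-- prefix comes before all of that.
For example:
áéíóú.com -> xn--1caqmy9a.com
á-é-í-ó-ú.com -> xn-------4na3c3a3cwd.com
So the extra hyphens are not an error: after xn-- come the label's own hyphens, then the delimiter (4 + 1 = 5 hyphens in the second example, 1 + 1 = 2 in your URL).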
This one should work for you:
(xn--)(--)*[a-z0-9]+\.com\.gr
The beginning of the code: (xn--)
An even number (or 0) of -: (--)*
The domain letters/digits: ([a-z0-9]+)
The TLD of the domain: \.com\.gr
You can add http/https if you wish
Update:
After adding numbers to the URL, I found that the regex needs a fix:
(xn--)(-[-0-9])*[a-z0-9]+\.com\.gr
á-1é-2í-3ó-4ú.gr.com -> xn---1-2-3-4-7ya6f1b6dve.gr.com
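You can check the hyphen arithmetic with Python's built-in `punycode` codec, which implements the RFC 3492 algorithm. It skips the nameprep step, which Python's separate `idna` codec performs. In the sketch, `LABEL` is a hypothetical hostname-label check, not the regex used by validates_url_format_of, and `to_ace` is a hypothetical helper:

```python
import re

# Hypothetical check for one DNS label: letters, digits and hyphens, 1-63 chars,
# no hyphen at either end, any number of hyphens in between ("xn----..." passes).
LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$", re.IGNORECASE)

def to_ace(label: str) -> str:
    """ASCII-compatible form of one label: "xn--" + Punycode, or the label itself."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

for label in ["θεραπευτικη-κανναβη", "á-é-í-ó-ú", "á-1é-2í-3ó-4ú"]:
    ace = to_ace(label)
    # Everything before the last "-" of the Punycode part is the copied ASCII text.
    copied_ascii = ace[len("xn--"):].rsplit("-", 1)[0]
    print(ace, repr(copied_ascii), bool(LABEL.match(ace)))
```

Matching the structure of a label in this way avoids encoding assumptions about how many hyphens may follow xn--.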

Different AES encryptors give me different results... why?

I have tried using three different libraries to AES-encrypt a string.
When I use the tool found here I get the following results:
Input: "Test"
key: "MyEncryptionKey1MyEncryptionKey1" (256 Bit)
ECB mode.
this gives me the output Cidor8Ph7pZqPw0x2AwIKw==
But when I use the libraries in Swift, I get different results.
Using RNCryptor
With RNCryptor I'm using the following code:
class func encryptMessage(message: String) throws -> String {
guard let messageData = message.data(using: .utf8) else { return message }
let cipherData = RNCryptor.encrypt(data: messageData, withPassword: key)
return cipherData.base64EncodedString()
}
output:
AwF8a+HziYkO4iHdcI3jY8p9QAY461DVgkjkYUFMkuh4A2a8FCfa4RgS9Z37QhJGxIL0Q20RE3BL4nmLQVFOfZmBpj8l0wj9YZgqZmrkxRFYQQ==
Using AESCrypt
With AESCrypt I'm using the following code:
class func encryptMessageAES(message: String) -> String{
guard let encryptedData = AESCrypt.encrypt(message, password: key) else { return message }
return encryptedData
}
Output:
T5/mR8UT/EXeUobPTLhcFA==
Also, with CryptoSwift I get a third result. My co-worker who does Android always gets the same result, matching the web tool.
I'm totally new to encryption and I can see that I'm doing something wrong, but I can't figure out what. I should also mention that this encryption is only used so that chat messages are not stored as plain text in Firebase, visible to anyone who has access to the database.
The definition of AES is quite precise, and when things don't work between different implementations, it's often due to the various things built on top of AES. The AES algorithm itself always operates on binary data: the data you encrypt needs to be binary, the key you encrypt with needs to be binary, and if an IV is in play, it also needs to be binary.
In every implementation where you provide non-binary data, a choice has been made on how that data is transformed into a form that AES can use. Sometimes these transformations are simple data conversions like hex or Base64 decoding; other times whole new concepts are in play, like deriving encryption keys from passwords.
All three of your examples use text as input for the key, and each implementation has made its own choice on how to support that.
The first page you link to has chosen to just interpret an ASCII string as a binary key. This is a terrible choice: in addition to being incompatible with everything else, it effectively eliminates 1-2 bits per byte of the key, reducing its strength considerably.
In the RNCryptor example, you specify the key with withPassword: key. Here the RNCryptor team has chosen to use the PBKDF2 key derivation function to make an actual AES key. This solves a different use case, where you have a potentially weak password that needs stretching to be secure for encryption. If you have an actual key, this is not the way to go.
In the case of AESCrypt, you also seem to be providing a password as input. It's not clear how that is transformed into an actual key.
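The difference between "text used directly as key bytes" and "key derived from a password" can be shown without any AES library. The Python sketch below covers only key preparation, not encryption. The PBKDF2 hash, iteration count and salt size are illustrative assumptions, not RNCryptor's documented parameters:

```python
import hashlib
import math
import os

password = "MyEncryptionKey1MyEncryptionKey1"

# What the web tool does, per the answer: use the ASCII bytes directly as the key.
raw_key = password.encode("ascii")
print(len(raw_key) * 8)                 # 256
print(all(b < 0x80 for b in raw_key))   # True: the top bit of every byte is 0
# Printable ASCII has only 95 possible values per byte instead of 256.
print(round(math.log2(95), 2))          # 6.57 bits of entropy at most, per byte

# What a password-based API does instead: derive a key with PBKDF2.
# Hash, iterations and salt size here are illustrative assumptions.
salt = os.urandom(8)
derived_key = hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, 10_000, dklen=32)
print(len(derived_key) * 8)             # 256
print(derived_key != raw_key)           # True
```

Because the salt is random, a password-based API produces a different key (and ciphertext) on every run. That alone makes its output impossible to compare with a tool that uses the text bytes directly.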
There is one more value you may have to set in AES: the IV (initialization vector). Try to find how the IV is set in all three libraries, and set the same IV value in each. Then you may be able to get the same results from all libraries. (Note that ECB mode, which the web tool uses, does not take an IV.)

SSRS Goto URL decoding and encoding

I am facing a problem when passing a URL value through a data field.
I am passing the value in the Go to URL action like this:
="javascript:void(window.open('file:" &Fields!url.Value &"','_blank'))"
url value = /servername/foldername/FormuláriodeCalibração.xls
After the report is deployed and opened in Internet Explorer, clicking the link changes the URL like this:
/servername/foldername/FormuláriodeCalibração.xls
because of which I am unable to open the file.
Please help me in this.
Finally, we came up with a solution: replacing the non-ASCII Portuguese characters with HTML character codes (numeric character references).
For e.g. this is the actual file name of the attachment
TE-5180FormuláriodeCalibração(modelo)1271440308393(2)1338379011084.xls
We replaced the Portuguese characters with HTML ASCII Codes.
TE-5180Formul&#225;riodeCalibra&#231;&#227;o(modelo)1271440308393(2)1338379011084.xls
After these changes, the modified URL is passed in place of the actual URL, and when it reaches the server it is decoded properly and works as expected.
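The replacement described above is mechanical: every non-ASCII character becomes `&#` + its code point + `;`. The Python sketch below shows it, where `to_numeric_refs` is a hypothetical helper. It also shows one common way such names get garbled, which is UTF-8 bytes read back as Windows-1252. That cause is an assumption, since the question does not show the exact garbled bytes. Percent-encoding appears at the end as the usual URL alternative:

```python
import html
from urllib.parse import quote

name = "Formul\u00e1riodeCalibra\u00e7\u00e3o.xls"   # FormuláriodeCalibração.xls

# Possible garbling (assumption): UTF-8 bytes decoded as Windows-1252.
print(name.encode("utf-8").decode("cp1252"))   # FormulÃ¡riodeCalibraÃ§Ã£o.xls

def to_numeric_refs(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with &#<code point>;."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)

escaped = to_numeric_refs(name)
print(escaped)                          # Formul&#225;riodeCalibra&#231;&#227;o.xls
print(html.unescape(escaped) == name)   # True

# Percent-encoding the UTF-8 bytes is the usual alternative inside a URL.
print(quote(name))                      # Formul%C3%A1riodeCalibra%C3%A7%C3%A3o.xls
```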
