Indy is altering the binary data in my URL

I want to send some binary data via GET using the Indy components.
So I have a URL like www.awebsite.com/index.php?data=xxx, where xxx is the binary data encoded using the ParamsEncode function. After encoding, the binary data becomes something like bB7%18%11z\ so my URL is something like:
www.awebsite.com/bB7%18%11z\
I have seen that if my URL contains the backslash character (see the last character in the URL), it is replaced with a slash (/) in TIdURI.NormalizePath, so my binary data is corrupted. What am I doing wrong?

Backslashes aren't allowed in URLs, and to avoid confusion between Windows and *nix systems, all backslashes are replaced by slashes to attempt to keep things working. See http://www.faqs.org/rfcs/rfc1738.html section 5, HTTP, httpurl.
You could try replacing backslashes with %5C yourself.
That said, you should either try MIME encoding or get the hang of POST requests.
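To illustrate the two suggestions above outside of Delphi, here is a minimal Python sketch (not Indy code). It percent-encodes every byte that is not an unreserved URL character, so the trailing backslash becomes %5C, and it also shows a URL-safe Base64 form as one possible reading of "MIME encoding". The helper name encode_binary_param is hypothetical.

```python
import base64
from urllib.parse import quote, unquote_to_bytes


def encode_binary_param(data: bytes) -> str:
    """Percent-encode all bytes except unreserved characters (letters, digits, '_.-~')."""
    return quote(data, safe="")


# The sample bytes from the question, ending in a backslash.
payload = b"bB7\x18\x11z\\"

encoded = encode_binary_param(payload)
url = "www.awebsite.com/index.php?data=" + encoded
print(url)  # www.awebsite.com/index.php?data=bB7%18%11z%5C

# Alternative: URL-safe Base64, which never produces a backslash.
b64 = base64.urlsafe_b64encode(payload).decode("ascii")

# Both forms decode back to the original bytes.
assert unquote_to_bytes(encoded) == payload
assert base64.urlsafe_b64decode(b64) == payload
```

With the backslash already encoded as %5C, there is no literal backslash left for a path normalizer to rewrite.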

You're using an old version of Indy. Backslashes are included in the UnsafeChars list that Indy uses now. Remy changed it in July 2010 with revision 4272 in the Tiburon branch:
r4272 | Indy-RemyLebeau | 2010-07-07 03:12:23 -0500 (Wed, 07 Jul 2010) | 1 line
Internal logic changes for TIdURI, and moved some sharable logic into IdGlobalProtocols.pas for later use in TIdHTTP.
It was merged into the trunk with the rest of Indy 10.5.7 with revision 4394, in September 2010.

Delphi TIdHTTP POST does not encode plus sign

I have a TIdHTTP component on a form, and I am sending an http POST request to a cloud-based server. Everything works brilliantly, except for 1 field: a text string with a plus sign, e.g. 'hello world+dog', is getting saved as 'hello world dog'.
Researching this problem, I realise that a '+' in a URL is regarded as a space, so one has to encode it. This is where I'm stumped; it looks like the rest of the POST request is encoded by the TIdHTTP component, except for the '+'.
Looking at the request through Fiddler, it's coming through as 'hello%20world+dog'. If I manually encode the '+' (hello world%2Bdog), the result is 'hello%20world%252Bdog'.
I really don't know what I'm doing here, so if someone could point me in the right direction it would be most appreciated.
Other information:
I am using Delphi 2010. The component doesn't have any special settings, I presume I might need to set something? The header content-type that comes through in Fiddler is 'application/x-www-form-urlencoded'.
Then, the Delphi code:
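Request := 'hello world+dog';
URL := 'http://............./ExecuteWithErrors';
TSL := TStringList.Create;
try
  // One name=value pair per line; TIdHTTP builds the form body from these.
  TSL.Add('query=' + Request);
  IdHTTP1.ConnectTimeout := 5000;
  IdHTTP1.ReadTimeout := 5000;
  Reply := IdHTTP1.Post(URL, TSL);
finally
  TSL.Free; // release the list even if Post raises an exception
end;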
You are using an outdated version of Indy and need to upgrade.
TIdHTTP's webform data encoder was changed several times in late 2010. Your version appears to predate all of those changes.
In your version, TIdHTTP uses TIdURI.ParamsEncode() internally to encode the form data, where a space character is encoded as %20 and a + character is left un-encoded, thus:
hello%20world+dog
In October 2010, the encoder was updated to encode a space character as & before calling TIdURI.ParamsEncode(), thus:
hello&world+dog
In early December 2010, the encoder was updated to encode a space character as + instead, thus:
hello+world+dog
In late December 2010, the encoder was completely re-written to follow W3C's HTML specifications for application/x-www-form-urlencoded. A space character is encoded as + and a + character is encoded as %2B, thus:
hello+world%2Bdog
In all cases, the above logic is applied only if the hoForceEncodeParams flag is enabled in the TIdHTTP.HTTPOptions property (which it is by default). If upgrading is not an option, you will have to disable the hoForceEncodeParams flag and manually encode the TStringList content yourself:
Request:='hello+world%2Bdog';
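For comparison, the first and last encodings described above can be reproduced with Python's standard library. This is only an illustrative sketch of the application/x-www-form-urlencoded rules, not Indy's implementation; the variable names are made up for the example.

```python
from urllib.parse import parse_qs, quote, quote_plus, urlencode

value = "hello world+dog"

# Old behaviour described above: space -> %20, '+' left as-is.
legacy = quote(value, safe="+")
print(legacy)  # hello%20world+dog

# Form encoding per the HTML rules: space -> '+', '+' -> %2B.
form = quote_plus(value)
print(form)  # hello+world%2Bdog

# Building a whole form body applies the same rules to each field.
body = urlencode({"query": value})
print(body)  # query=hello+world%2Bdog

# A server that decodes form data turns an unencoded '+' into a space,
# which is the corruption seen in the question.
print(parse_qs("query=" + legacy))  # {'query': ['hello world dog']}
print(parse_qs(body))               # {'query': ['hello world+dog']}
```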

Consistent Encoding for iCal file import

I'm trying to use the iCalendar gem to import some iCal files on a Rails 4 site.
Sometimes the file is of type 'text/calendar;charset=utf-8' and sometimes it's 'text/calendar; charset=UTF-8;'
I am retrieving it like this:
uri = URI.parse(url)
calendar = Net::HTTP.get_response(uri)
new_calendar = Icalendar.parse(calendar.body)
When it's text/calendar;charset=utf-8 it works fine, but when it's text/calendar; charset=UTF-8 I get UTF codes in the string:
SUMMARY:Tech Job Fair – City(ST) – Jul 1, 2015
ends up being
["Tech Job Fair \xE2\x80\x93 City(ST) \xE2\x80\x93 Jul 1", " 2015"]
This is then saved to the database, which is undesirable.
Is the charset/content-type revealing the problem here, or could it actually just be encoded wrong at the source?
How do I change my retrieval commands to strip those codes out effectively, or tell it that it's a UTF-8 string so it doesn't include them in the first place?
Update: it looks like some are text/calendar;charset=utf-8 and some are text/calendar;charset=UTF-8 and some are text/calendar; charset=UTF-8. Note the last one has a space between the two segments. Could this be causing an issue?
Update2: Opening up my three example iCal files in Notepad++ shows them encoded as "UTF-8 without BOM" in the menu.
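As a point of reference, here is a small Python sketch (not Ruby, and not the iCalendar gem) of two facts that bear on the question. The three header spellings from the first update declare the same charset once parsed, and \xE2\x80\x93 is simply the UTF-8 byte sequence for the en dash in the SUMMARY line. That suggests the bytes are fine and are only being treated as undecoded binary. The helper name declared_charset and the "utf-8" fallback are assumptions made for the example.

```python
from email.message import Message


def declared_charset(content_type: str) -> str:
    """Return the lower-cased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Falling back to "utf-8" when no charset is given is an assumption.
    return msg.get_content_charset("utf-8")


variants = [
    "text/calendar;charset=utf-8",
    "text/calendar;charset=UTF-8",
    "text/calendar; charset=UTF-8",
]
charsets = {declared_charset(v) for v in variants}
print(charsets)  # {'utf-8'}

# The raw bytes of the SUMMARY value, as they appear in the question.
raw = b"Tech Job Fair \xe2\x80\x93 City(ST) \xe2\x80\x93 Jul 1, 2015"
summary = raw.decode("utf-8")
print(summary)  # Tech Job Fair – City(ST) – Jul 1, 2015
```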

NSString: dealing with UTF8-based API

Which character set is the default for NSString when I get typed content from a UITextField?
I developed an app that sends such NSStrings to a UTF-8-based REST API. At the backend there is a utf8-based MySQL database with utf8-based varchar fields.
My POST request sends string data from the iOS app to the server, and with a GET request I receive those strings from the REST API.
Within the app, everything is displayed fine. Special characters like ÄÖÜ are shown correctly after sending them to the server and receiving them back.
But when I open the MySQL console on the REST API server and run a SELECT on this data, broken characters are visible.
What could be the root cause? Which character set does Apple use for NSString?
It sounds like it is a server issue. Check that the version you are using supports UTF-8; older versions do not. See: How to support full Unicode in MySQL database
MySQL’s utf8 encoding is different from proper UTF-8 encoding. It doesn’t offer full Unicode support.
MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 which maps to proper UTF-8 and thus fully supports Unicode.
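The difference between the two MySQL encodings comes down to bytes per character: the older utf8 stores at most three bytes per character, while utf8mb4 allows four. A rough Python sketch of that check follows; the function name needs_utf8mb4 is hypothetical.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character takes four bytes in UTF-8 (outside the BMP)."""
    return any(len(ch.encode("utf-8")) > 3 for ch in text)


umlauts = "ÄÖÜ"
emoji = "\U0001F600"  # a character outside the Basic Multilingual Plane

print(umlauts.encode("utf-8"))   # b'\xc3\x84\xc3\x96\xc3\x9c'
print(needs_utf8mb4(umlauts))    # False
print(needs_utf8mb4(emoji))      # True
```

By this check, the umlauts from the question are two bytes each and fit in either encoding.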
NSString has an internal representation that is essentially opaque.
The UITextField method text returns an NSString.
When you want data from a string to send to a server, use - (NSData *)dataUsingEncoding:(NSStringEncoding)encoding and specify the encoding, such as NSUTF8StringEncoding.
NSData *textFieldUTF8Data = [textFieldInstance.text dataUsingEncoding: NSUTF8StringEncoding];
If, by "mysql console", you are referring to the DOS-like window in Windows, then you need the following:
The command "chcp" controls the "code page". chcp 65001 provides utf8, but it needs a special charset installed, too.
To set the font in the console window: Right-click on the title of the window → Properties → Font → pick Lucida Console
Also, tell the 'console' that your bytes are UTF8 by doing SET NAMES utf8mb4.

iconv C API: charset conversion from/to local encoding

I am using the iconv C API and I want iconv to detect the local encoding of the computer. Is that possible? Apparently it is because when I look in the source code, I find in the file iconv_open1.h that if the fromcode or tocode variables are empty strings ("") then the local encoding is used using the locale_charset() function call.
Someone also told me that in order to convert the locale encoding to unicode, all I needed was to use iconv_open ("UTF-8", "")
Unfortunately, I find no mention of this in the documentation.
And when I convert some iso-8859-1 text to the locale encoding (which is utf-8 on my machine), then during conversion I get errno=EILSEQ (illegal sequence). I checked and iconv_open returned no error.
If instead of the empty string in iconv_open I specify "utf-8", then I get no error. Obviously iconv failed to detect my current charset.
edit: I checked with a simple C program that calls puts(nl_langinfo(CODESET)) and I get ANSI_X3.4-1968 (which is ASCII). Apparently, I have a problem with charset detection.
edit: this should be related to Why is nl_langinfo(CODESET) different from locale charmap?
additional information: my program is written in Ada, and I bind at link-time to C functions. Apparently, the locale setting is not initialized the same way in the Ada runtime and C runtime.
I'll take the same answer as in Why is nl_langinfo(CODESET) different from locale charmap?
You need to first call
setlocale(LC_ALL, "");
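The effect of that call can be observed from Python, whose locale module wraps the same C functions. The sketch below is POSIX-only (nl_langinfo is not available on Windows). It resets to the "C" locale first, to mimic a C program that has not yet called setlocale, and the function name codeset_before_and_after is made up for the example. The values printed depend on the machine, so none are shown.

```python
import locale


def codeset_before_and_after():
    """Report nl_langinfo(CODESET) in the "C" locale and after setlocale(LC_ALL, "")."""
    # A C program starts in the "C" locale until it calls setlocale() itself.
    locale.setlocale(locale.LC_ALL, "C")
    before = locale.nl_langinfo(locale.CODESET)
    try:
        # Adopt the locale named by the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed; stay in "C".
        pass
    after = locale.nl_langinfo(locale.CODESET)
    return before, after


before, after = codeset_before_and_after()
print(before, "->", after)
```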

HTML decoding in C/C++

I'm using libcurl for getting HTML pages.
I have some problems with Hebrew characters.
for example this: סלקום
gets gibberish.
How do I get Hebrew characters and not gibberish?
Do I need some HTML decoder?
Does libcurl support such operation?
Does libiconv support such operation?
I appreciate any help.
Thanks
Edit: Ok, so what you’re seeing is UTF-8 data being decoded as Windows-1255 (so the numeric character references were a red herring). Here’s a demonstration in Python:
>>> u = ''.join(map(chr, [1505, 1500, 1511, 1493, 1501]))
>>> s = u.encode('utf-8')
>>> print(s.decode('cp1255', 'replace'))
׳¡׳�׳§׳•׳�
The solution to this problem depends on the environment in which the output is displayed. Merely outputting the bytes received and expecting them to be interpreted as characters leads to problems like this.
An HTML document typically contains a header tag like <meta charset=utf-8> to indicate to the browser what its encoding should be. A document served by a web server contains an HTTP header like Content-Type: text/html; charset=utf-8.
You should ask libcurl for the Content-Type HTTP header to know the encoding of the document, and then convert it to the system encoding using iconv. While in your case that would be codepage 1255, it depends on the user’s system and so you should look up the appropriate functions to detect that.
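The decode-then-re-encode step that iconv would perform can be sketched in Python as follows. This is an illustration of the idea, not libcurl or libiconv code. It assumes the source charset has already been read from the Content-Type header and that the target is codepage 1255; the function name transcode is hypothetical.

```python
def transcode(body: bytes, source_charset: str, target_charset: str) -> bytes:
    """Decode bytes using the declared charset, then encode for the target system."""
    text = body.decode(source_charset)
    # Characters the target charset cannot represent become '?'.
    return text.encode(target_charset, errors="replace")


# The same five Hebrew letters used in the demonstration above.
word = "".join(map(chr, [1505, 1500, 1511, 1493, 1501]))

utf8_body = word.encode("utf-8")          # what the server sent (10 bytes)
cp1255_body = transcode(utf8_body, "utf-8", "cp1255")  # 5 bytes, one per letter

# Interpreting the converted bytes as cp1255 yields the original word.
assert cp1255_body.decode("cp1255") == word
```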
(Read Unicode and Character Sets and the character-encoding tag on this site for a wealth of further information.)