CharSet gets reset - delphi

I initialize my socket using the following code:
Socket:=TmyidHTTP.Create(NIL);
IOHandler:=TIdIOHandlerStack.Create(Socket);
Socket.HandleRedirects:=true;
Socket.AllowCookies:=FALSE;
Socket.ProtocolVersion:=pv1_1;
Socket.HTTPOptions:=Socket.HTTPOptions+[hoKeepOrigProtocol]+[hoNoProtocolErrorException]+[hoWantProtocolErrorContent];
Socket.Request.CustomHeaders.FoldLines:=FALSE;
Socket.Request.CharSet:='utf-8';
Socket.Request.ContentType:='text/txt';
Socket.Request.Accept:='*/*';
// Socket.ReuseSocket:=rsTrue;
Socket.Request.Connection:='keep-alive';
(TmyidHTTP only publishes the protected DoRequest.)
But when I look at the protocol trace I see the following header: charset=ISO-8859-1.
It only works if I set Socket.Request.CharSet:='utf-8'; again before a POST.
Any ideas what is resetting the CharSet?

This is happening because all of the following are true:
you are using an older release of Indy 10 [1];
you are setting the ContentType property after setting the CharSet property;
you are not specifying a charset attribute value on the ContentType property.
So, in this case, the ContentType property setter is resetting the CharSet property to a default value instead of preserving the current value.
[1] This was already fixed back in July 2019:
Patch for Embarcadero RSP-13703. Updating various ContentType property setters to preserve an existing CharSet property value if it is already set and a new charset attribute is not being specified.
You should update your installed copy of Indy with the latest code from Indy's GitHub repo (or, at least, apply the same fix to your existing copy and then recompile Indy).
Otherwise, you can simply change your code to either:
swap the order of the two property assignments:
Socket.Request.ContentType:='text/txt';
Socket.Request.CharSet:='utf-8';
specify a charset attribute value on the ContentType assignment, which will update the CharSet property accordingly:
Socket.Request.ContentType:='text/txt;charset=utf-8';

Related

AnsiString header file

I am migrating from an old version of Borland C++ to the newest. In my code I had used String (AnsiString). In the new compiler it does not recognize String or AnsiString as a valid type, so I put in vcl.h in the file where I use String. Now I get 103 errors, all saying "reference to byte is ambiguous" (various system .h files). Is vcl.h not the header for AnsiString?
thanks
The actual header file that defines AnsiString is dstring.h, and always has been (the header file that defines UnicodeString is ustring.h). The System::String alias is defined in sysmac.h.
vcl.h includes these headers for you. If you are getting errors, either you did not create a VCL project properly to begin with, or your project is misconfigured.

JSoup.clean() is not preserving relative URLs

I have tried:
Whitelist.relaxed();
Whitelist.relaxed().preserveRelativeLinks(true);
Whitelist.relaxed().addProtocols("a","href","#","/","http","https","mailto","ftp");
Whitelist.relaxed().addProtocols("a","href","#","/","http","https","mailto","ftp").preserveRelativeLinks(true);
None of them work: when I try to clean a link with a relative URL (link text "test"), the href attribute is removed (<a>test</a>).
I am using JSoup 1.8.2.
Any ideas?
The problem most likely stems from the call of the clean method. If you give the base URI all should work as expected:
String html = ""
+ "test"
+ "<invalid>stuff</invalid>"
+ "<h2>header1</h2>";
String cleaned = Jsoup.clean(html, "http://base.uri", Whitelist.relaxed().preserveRelativeLinks(true));
System.out.println(cleaned);
The above works and keeps the relative links. With String cleaned = Jsoup.clean(html, Whitelist.relaxed().preserveRelativeLinks(true)) however the link is deleted.
Note the documentation of Whitelist.preserveRelativeLinks(true):
Note that when handling relative links, the input document must have
an appropriate base URI set when parsing, so that the link's protocol
can be confirmed. Regardless of the setting of the preserve relative
links option, the link must be resolvable against the base URI to an
allowed protocol; otherwise the attribute will be removed.

Delphi TIdHTTP POST does not encode plus sign

I have a TIdHTTP component on a form, and I am sending an http POST request to a cloud-based server. Everything works brilliantly, except for 1 field: a text string with a plus sign, e.g. 'hello world+dog', is getting saved as 'hello world dog'.
Researching this problem, I realise that a '+' in a URL is regarded as a space, so one has to encode it. This is where I'm stumped; it looks like the rest of the POST request is encoded by the TIdHTTP component, except for the '+'.
Looking at the request through Fiddler, it's coming through as 'hello%20world+dog'. If I manually encode the '+' (hello world%2Bdog), the result is 'hello%20world%252Bdog'.
I really don't know what I'm doing here, so if someone could point me in the right direction it would be most appreciated.
Other information:
I am using Delphi 2010. The component doesn't have any special settings, I presume I might need to set something? The header content-type that comes through in Fiddler is 'application/x-www-form-urlencoded'.
Then, the Delphi code:
Request:='hello world+dog';
URL :='http://............./ExecuteWithErrors';
TSL:=TStringList.Create;
TSL.Add('query='+Request);
Try
begin
IdHTTP1.ConnectTimeout:=5000;
IdHTTP1.ReadTimeout :=5000;
Reply:=IdHTTP1.Post(URL,TSL);
You are using an outdated version of Indy and need to upgrade.
TIdHTTP's webform data encoder was changed several times in late 2010. Your version appears to predate all of those changes.
In your version, TIdHTTP uses TIdURI.ParamsEncode() internally to encode the form data, where a space character is encoded as %20 and a + character is left un-encoded, thus:
hello%20world+dog
In October 2010, the encoder was updated to encode a space character as & before calling TIdURI.ParamsEncode(), thus:
hello&world+dog
In early December 2010, the encoder was updated to encode a space character as + instead, thus:
hello+world+dog
In late December 2010, the encoder was completely re-written to follow W3C's HTML specifications for application/x-www-form-urlencoded. A space character is encoded as + and a + character is encoded as %2B, thus:
hello+world%2Bdog
In all cases, the above logic is applied only if the hoForceEncodeParams flag is enabled in the TIdHTTP.HTTPOptions property (which it is by default). If upgrading is not an option, you will have to disable the hoForceEncodeParams flag and manually encode the TStringList content yourself:
Request:='hello+world%2Bdog';
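To make the final encoding rule concrete, here is a minimal sketch of it in C, since the rule itself is language-neutral. The function name form_urlencode and the exact set of characters left unescaped (ASCII letters, digits and *-._) are assumptions for illustration; this is not Indy's implementation, and it assumes the input is already a byte string in the charset you intend to send.

```c
#include <stddef.h>

/* Hypothetical helper: encodes src as application/x-www-form-urlencoded.
   A space becomes '+', a literal '+' becomes %2B, and every other byte
   outside the assumed safe set becomes %HH.
   Returns the encoded length, or -1 if dst is too small. */
int form_urlencode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    if (dst_size == 0)
        return -1;

    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; ++p) {
        unsigned char c = *p;
        int safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') ||
                   c == '*' || c == '-' || c == '.' || c == '_';

        if (safe || c == ' ') {
            if (out + 1 >= dst_size)      /* keep room for the terminator */
                return -1;
            dst[out++] = safe ? (char)c : '+';
        } else {
            if (out + 3 >= dst_size)
                return -1;
            dst[out++] = '%';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }

    dst[out] = '\0';
    return (int)out;
}
```

Calling form_urlencode("hello world+dog", buf, sizeof buf) with a large enough buffer yields hello+world%2Bdog, which is the value you would put in the TStringList yourself when hoForceEncodeParams is disabled.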

iconv C API: charset conversion from/to local encoding

I am using the iconv C API and I want iconv to detect the local encoding of the computer. Is that possible? Apparently it is because when I look in the source code, I find in the file iconv_open1.h that if the fromcode or tocode variables are empty strings ("") then the local encoding is used using the locale_charset() function call.
Someone also told me that in order to convert the locale encoding to unicode, all I needed was to use iconv_open ("UTF-8", "")
Unfortunately, I find no mention of this in the documentation.
And when I convert some iso-8859-1 text to the locale encoding (which is utf-8 on my machine), then during conversion I get errno=EILSEQ (illegal sequence). I checked and iconv_open returned no error.
If instead of the empty string in iconv_open I specify "utf-8", then I get no error. Obviously iconv failed to detect my current charset.
edit: I checked with a simple C program that puts(nl_langinfo(CODESET)) and I get ANSI_X3.4-1968 (which is ASCII). Apparently, I got a problem with charset detection.
edit: this should be related to Why is nl_langinfo(CODESET) different from locale charmap?
additional information: my program is written in Ada, and I bind at link-time to C functions. Apparently, the locale setting is not initialized the same way in the Ada runtime and C runtime.
I'll take the same answer as in Why is nl_langinfo(CODESET) different from locale charmap?
You need to first call
setlocale(LC_ALL, "");
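As a rough sketch of why that call matters: a process starts in the "C" locale, and nl_langinfo(CODESET) only reflects the environment after setlocale has been called. The helper name environment_codeset below is made up for illustration; since the main program here is Ada, something like it would have to be invoked explicitly through the C binding before iconv_open is called.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the locale from the environment
   (LANG, LC_ALL, ...) and report its character encoding name. */
const char *environment_codeset(void)
{
    /* setlocale returns NULL if the environment names a locale that is
       not installed; in that case the current locale stays in effect. */
    (void)setlocale(LC_ALL, "");

    /* Without the call above this reports the "C" locale's codeset,
       which is how the ANSI_X3.4-1968 result in the question arises. */
    return nl_langinfo(CODESET);
}
```

The returned name can be passed to iconv_open explicitly, which avoids depending on the undocumented empty-string behaviour.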

How can I read the DOCTYPE SYSTEM identifier with Delphi?

For a document which has a DOCTYPE declaration like
<!DOCTYPE RootElement SYSTEM "file.dtd">
Delphi 2009, using MSXML, reports that the systemId is empty (""):
Assert(Doc.DOMDocument.doctype.systemId <> ''); // fails!
while
Assert(Doc.DOMDocument.doctype.name = 'RootElement'); // ok
correctly verifies that the DOCTYPE name is "RootElement".
Is this a bug in Delphi (or my code) or am I using a version of MSXML which does not support this property?
MSXML's DocumentType implementation is completely missing the DocumentType properties publicId, systemId and internalSubset. See the MSDN API reference; the missing properties are specifically called out in MS-DOM2CX.
If you need this information you might have to try a different DOM implementation. Here's one. If you can use .NET classes, System.Xml supports it too.
If the ProhibitDTD property is True, try setting it to False.
Here's an article with more details.
