Delphi & Indy & utf8

I have a problem accessing websites that use the UTF-8 charset. For example, when I try to access this URL, all UTF-8 characters come back incorrectly decoded.
This is my access routine:
var
Web : TIdHTTP;
Sito : String;
hIOHand : TIdSSLIOHandlerSocketOpenSSL;
begin
Url := TIdURI.URLEncode(Url);
try
Web := TIdHTTP.Create(nil);
hIOHand := TIdSSLIOHandlerSocketOpenSSL.Create(nil);
hIOHand.DefStringEncoding := IndyTextEncoding_UTF8;
hIOHand.SSLOptions.SSLVersions := [sslvTLSv1,sslvTLSv1_1,sslvTLSv1_2,sslvSSLv2,sslvSSLv3,sslvSSLv23];
Web.IOHandler := hIOHand;
Web.Request.CharSet := 'utf-8';
Web.Request.UserAgent := INET_USERAGENT; //Custom user agent string
Web.RedirectMaximum := INET_REDIRECT_MAX; //Maximum redirects
Web.HandleRedirects := INET_REDIRECT_MAX <> 0; //Handle redirects
Web.ReadTimeOut := INET_TIMEOUT_SECS * 1000; //Read timeout msec
try
Sito := Web.Get(Url);
Web.Disconnect;
except
on e : exception do
Sito := 'ERR: ' +Url+#32+e.Message;
end;
finally
Web.Free;
hIOHand.Free;
end;
I have tried every solution I could find, but the Sito variable always contains wrong characters. For example, the correct value of "name" is
"name": "Aire d'adhésion du Parc national du Mercantour",
but after the Get call I have
"name": "Aire d'adhÃ©sion du Parc national du Mercantour",
Do you have any idea where my error is?
Thank you all!

In Delphi 2009+, which includes XE6, string is a UTF-16 encoded UnicodeString.
You are using the overloaded version of TIdHTTP.Get() that returns a string. It decodes the sent text to UTF-16 using whatever charset is reported by the response. If the text is not decoding properly, it likely means the response is not reporting a correct charset. If the wrong charset is used, the text will not decode properly.
The URL in question is, in fact, sending a response Content-Type header that is set to application/json without specifying a charset at all. The default charset for application/json is UTF-8, but Indy does not know that, so it ends up using its own internal default instead, which is not UTF-8. That is why the text is not decoding properly when non-ASCII characters are present.
In which case, if you KNOW the charset will always be UTF-8, you have a few workarounds to choose from:
you can set Indy's default charset to UTF-8 by setting the global GIdDefaultTextEncoding variable in the IdGlobal unit:
GIdDefaultTextEncoding := encUTF8;
you can use the TIdHTTP.OnHeadersAvailable event to change the TIdHTTP.Response.Charset property to 'utf-8' if it is blank or incorrect.
Web.OnHeadersAvailable := CheckResponseCharset;
...
procedure TMyClass.CheckResponseCharset(Sender: TObject; AHeaders: TIdHeaderList; var VContinue: Boolean);
var
Response: TIdHTTPResponse;
begin
Response := TIdHTTP(Sender).Response;
if IsHeaderMediaType(Response.ContentType, 'application/json') and (Response.Charset = '') then
Response.Charset := 'utf-8';
VContinue := True;
end;
you can use the other overloaded version of TIdHTTP.Get() that fills an output TStream instead of returning a string. Using a TMemoryStream or TStringStream, you can decode the raw bytes yourself using UTF-8:
MStrm := TMemoryStream.Create;
try
Web.Get(Url, MStrm);
MStrm.Position := 0;
Sito := ReadStringFromStream(MStrm, -1, IndyTextEncoding_UTF8);
finally
MStrm.Free;
end;
or, with a TStringStream:
SStrm := TStringStream.Create('', TEncoding.UTF8);
try
Web.Get(Url, SStrm);
Sito := SStrm.DataString;
finally
SStrm.Free;
end;
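The effect of decoding with the wrong charset can be reproduced without any network traffic. The following is only a minimal sketch using the RTL's TEncoding class rather than Indy; the program name, the DecodeBytes helper and the choice of ISO-8859-1 (code page 28591) as the "wrong" charset are assumptions made for illustration, and the code page lookup is assumed to run on Windows.

```pascal
program MojibakeDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: decode raw bytes with the given Windows code page.
function DecodeBytes(const Bytes: TBytes; CodePage: Integer): string;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(CodePage); // caller owns this instance
  try
    Result := Enc.GetString(Bytes);
  finally
    Enc.Free;
  end;
end;

var
  Original, Wrong, Right: string;
  Raw: TBytes;
begin
  // #$00E9 is "e with acute"; written as a char code so the source stays ASCII.
  Original := 'adh'#$00E9'sion';
  // This is what a UTF-8 server actually puts on the wire.
  Raw := TEncoding.UTF8.GetBytes(Original);
  Wrong := DecodeBytes(Raw, 28591); // ISO-8859-1: every byte becomes one char
  Right := DecodeBytes(Raw, 65001); // UTF-8
  Writeln('Chars in original: ', Length(Original), ', bytes on the wire: ', Length(Raw));
  Writeln('ISO-8859-1 decode equals original: ', Wrong = Original);
  Writeln('UTF-8 decode equals original: ', Right = Original);
end.
```

The single accented character travels as two bytes, so a one-byte-per-character decode turns it into two unrelated characters, which is the kind of damage described in the question.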


Encoding problem while processing a multipart request on Indy HTTP server

I have a web server based on TIdHTTPServer. It is built in Delphi Sydney. From a webpage I'm receiving following multipart/form-data post stream:
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="d"

83AAAFUaVVs4Q07z
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="dir"

Upload
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="file_name"; filename="česká tečka.png"
Content-Type: image/png

PNG_DATA
-----------------------------16857441221270830881532229640--
The problem is that the text parts are not received correctly. I read "Indy MIME decoding of Multipart/Form-Data Requests returns trailing CR/LF" and changed the transfer encoding to 8bit, which helps to receive the file correctly, but the received values are still wrong (dir should be Upload and the filename should be česká tečka.png):
d=83AAAFUaVVs4Q07z
dir=UploadW
??esk?? te??ka.png 75
To demonstrate the issue I simplified my code to a console app (please note that the MIME.txt file contains the same as is in post stream above):
program MIMEMultiPartTest;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.Classes, System.SysUtils,
IdGlobal, IdCoder, IdMessage, IdMessageCoder, IdGlobalProtocols, IdCoderMIME, IdMessageCoderMIME,
IdCoderQuotedPrintable, IdCoderBinHex4;
procedure ProcessAttachmentPart(var Decoder: TIdMessageDecoder; var MsgEnd: Boolean);
var
MS: TMemoryStream;
Name: string;
Value: string;
NewDecoder: TIdMessageDecoder;
begin
MS := TMemoryStream.Create;
try
// http://stackoverflow.com/questions/27257577/indy-mime-decoding-of-multipart-form-data-requests-returns-trailing-cr-lf
TIdMessageDecoderMIME(Decoder).Headers.Values['Content-Transfer-Encoding'] := '8bit';
TIdMessageDecoderMIME(Decoder).BodyEncoded := False;
NewDecoder := Decoder.ReadBody(MS, MsgEnd);
MS.Position := 0; // necessary?
if Decoder.Filename <> EmptyStr then // it is an attachment
begin
try
Writeln(Decoder.Filename + ' ' + IntToStr(MS.Size));
except
FreeAndNil(NewDecoder);
Writeln('Error processing MIME');
end;
end
else // it is a parameter
begin
Name := ExtractHeaderSubItem(Decoder.Headers.Text, 'name', QuoteHTTP);
if Name <> EmptyStr then
begin
Value := string(PAnsiChar(MS.Memory));
try
Writeln(Name + '=' + Value);
except
FreeAndNil(NewDecoder);
Writeln('Error processing MIME');
end;
end;
end;
Decoder.Free;
Decoder := NewDecoder;
finally
MS.Free;
end;
end;
function ProcessMultiPart(const ContentType: string; Stream: TStream): Boolean;
var
Boundary: string;
BoundaryStart: string;
BoundaryEnd: string;
Decoder: TIdMessageDecoder;
Line: string;
BoundaryFound: Boolean;
IsStartBoundary: Boolean;
MsgEnd: Boolean;
begin
Result := False;
Boundary := ExtractHeaderSubItem('multipart/form-data; boundary=---------------------------16857441221270830881532229640', 'boundary', QuoteHTTP);
if Boundary <> EmptyStr then
begin
BoundaryStart := '--' + Boundary;
BoundaryEnd := BoundaryStart + '--';
Decoder := TIdMessageDecoderMIME.Create(nil);
try
TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
Decoder.SourceStream := Stream;
Decoder.FreeSourceStream := False;
BoundaryFound := False;
IsStartBoundary := False;
repeat
Line := ReadLnFromStream(Stream, -1, True);
if Line = BoundaryStart then
begin
BoundaryFound := True;
IsStartBoundary := True;
end
else
begin
if Line = BoundaryEnd then
BoundaryFound := True;
end;
until BoundaryFound;
if BoundaryFound and IsStartBoundary then
begin
MsgEnd := False;
repeat
TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
Decoder.SourceStream := Stream;
Decoder.FreeSourceStream := False;
Decoder.ReadHeader;
case Decoder.PartType of
mcptText,
mcptAttachment:
begin
ProcessAttachmentPart(Decoder, MsgEnd);
end;
mcptIgnore:
begin
Decoder.Free;
Decoder := TIdMessageDecoderMIME.Create(nil);
end;
mcptEOF:
begin
Decoder.Free;
MsgEnd := True;
end;
end;
until (Decoder = nil) or MsgEnd;
Result := True;
end
finally
Decoder.Free;
end;
end;
end;
var
Stream: TMemoryStream;
begin
Stream := TMemoryStream.Create;
try
Stream.LoadFromFile('MIME.txt');
ProcessMultiPart('multipart/form-data; boundary=---------------------------16857441221270830881532229640', Stream);
finally
Stream.Free;
end;
Readln;
end.
Could someone tell me what is wrong with my code? Thank you.
Your call to ExtractHeaderSubItem() in ProcessMultiPart() is wrong: it needs to pass in the ContentType string parameter, not a hard-coded string literal.
Your call to ExtractHeaderSubItem() in ProcessAttachmentPart() is also wrong: it needs to pass in only the content of the Content-Disposition header, not the entire Headers.Text. ExtractHeaderSubItem() is designed to operate on only 1 header at a time.
Regarding the dir MIME part, the reason the body data ends up as 'UploadW' instead of 'Upload' is because you are not taking MS.Size into account when assigning MS.Memory to your Value string. The TMemoryStream data is NOT null-terminated! So, you will need to use SetString() instead of the := operator, eg:
var
Value: AnsiString;
...
SetString(Value, PAnsiChar(MS.Memory), MS.Size);
Regarding the Decoder.FileName, that value is not affected by the Content-Transfer-Encoding header at all. MIME headers simply do not allow unencoded Unicode characters. Currently, Indy's MIME decoder supports RFC2047-style encodings for Unicode characters in headers, per RFC 7578 Section 5.1.3, but your stream data is not using that format. It looks like your data is using raw UTF-8 octets [1] (which 5.1.3 also mentions as a possible encoding, but which the decoder does not currently look for). So, you may have to manually extract and decode the original filename yourself as needed.
If you know the filename will always be encoded as UTF-8, you could try setting Indy's global IdGlobal.GIdDefaultTextEncoding variable to encUTF8 (it defaults to encASCII), and then the Decoder.FileName should be accurate. But that is a global setting, so it may have unwanted side effects elsewhere in Indy, depending on context and data.
So, I would suggest setting GIdDefaultTextEncoding to enc8Bit instead, so that unwanted side effects are minimized, and the Decoder.FileName will contain the original raw bytes as-is (just extended to 16-bit chars). That way, you can recover the original filename bytes by simply passing the Decoder.FileName as-is to IndyTextEncoding_8Bit.GetBytes(), and then decode them as needed (such as with IndyTextEncoding_UTF8.GetString(), after validating that the bytes are valid UTF-8).
[1] However, ÄŤeská teÄŤka.png is not the correct UTF-8 form of česká tečka.png. It looks like that data may have been double-encoded, i.e. česká tečka.png was UTF-8 encoded, and then the resulting bytes were UTF-8 encoded again.
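The enc8Bit round trip suggested above can be sketched as follows. This is an illustration only, assuming an Indy 10.6-era IdGlobal unit that provides IndyTextEncoding_8Bit and IndyTextEncoding_UTF8; the helper name Recover8BitAsUtf8 and the sample string are made up, and the UTF-8 validation step recommended above is left out for brevity.

```pascal
program RecoverFileName;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, IdGlobal;

// Hypothetical helper: undo the "one byte per 16-bit char" widening that an
// 8-bit decode performs, then decode the recovered bytes as UTF-8.
// A real implementation should validate the bytes as UTF-8 first.
function Recover8BitAsUtf8(const Widened: string): string;
var
  Raw: TIdBytes;
begin
  Raw := IndyTextEncoding_8Bit.GetBytes(Widened);
  Result := IndyTextEncoding_UTF8.GetString(Raw);
end;

var
  Widened, Fixed: string;
begin
  // UTF-8 for U+010D (c with caron) is C4 8D and for U+00E1 (a with acute)
  // is C3 A1; after an 8-bit decode each byte shows up as its own char.
  Widened := #$00C4#$008D'esk'#$00C3#$00A1'.png';
  Fixed := Recover8BitAsUtf8(Widened);
  Writeln('Chars before: ', Length(Widened), ', after: ', Length(Fixed));
  Writeln('Matches expected name: ', Fixed = #$010D'esk'#$00E1'.png');
end.
```

In the decoder code from the question, the same helper would be applied to Decoder.FileName once GIdDefaultTextEncoding has been set to enc8Bit.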
Nowadays the filename parameter should only be added for fallback reasons, while filename* should be added to clearly tell which text encoding the filename has. Otherwise each client only guesses and supposes. Which may go wrong.
RFC 5987 §3.2 defines the format of that filename* parameter:
charset ' [ language ] ' value-chars
...whereas:
charset can be UTF-8 or ISO-8859-1 or any MIME-charset
...and the language is optional.
RFC 6266 §4.3 defines that filename* should be used and comes up with examples in §5:
Content-Disposition: attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
Do you spot the asterisk *? Do you spot the text encoding utf-8? Do you spot the two apostrophes '', designating no further specified language (see RFC 5646 § 2.1)? And then come the octets according to the specified text encoding: either percent-encoded, or (if allowed) in plain ASCII.
Other examples:
Content-Disposition: attachment; filename="green.jpg"; filename*=UTF-8''%e3%82%b0%e3%83%aa%e3%83%bc%e3%83%b3.jpg
will present "green.jpg" on older web browsers and "グリーン.jpg" on compliant web browsers.
Content-Disposition: attachment; filename="Gruesse.txt"; filename*=ISO-8859-1''Gr%fc%dfe.txt
will present "Gruesse.txt" on older web browsers and "Grüße.txt" on compliant web browsers.
Content-Disposition: attachment; filename="Hello.png"; filename*=Shift_JIS'en-US'Howdy.png; filename*=EUC-KR'de'Hallo.png
will present "Hello.png" on older web browsers, and "Howdy.png" on compliant web browsers where the preferred language is set to American English, and "Hallo.png" on compliant ones with a preferred language of German (Deutsch). Note that the different text encodings are unbound to percent encoding as long as the octets are within the allowed range (and latin letters are, along with the dot).
In my experience nobody cares about this nice feature: everybody just shoves UTF-8 into filename, which still violates the standard, no matter how many clients silently support it. See also "How to encode the filename parameter of Content-Disposition header in HTTP?" and "PHP: RFC-2231 How to encode UTF-8 String as Content-Disposition filename".
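For a sender written in Delphi, the filename* value described above can be produced with a few lines of RTL code. The sketch below is not taken from any library: the function name EncodeExtValue is hypothetical, it always uses UTF-8 with an empty language tag, and it percent-encodes every byte outside the attr-char set of RFC 5987.

```pascal
program ExtValueDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: build the value of a filename* parameter as
// charset'language'value-chars, with UTF-8 and no language tag.
function EncodeExtValue(const FileName: string): string;
const
  // attr-char from RFC 5987: these may appear unescaped.
  AttrChars: set of AnsiChar = ['A'..'Z', 'a'..'z', '0'..'9',
    '!', '#', '$', '&', '+', '-', '.', '^', '_', '`', '|', '~'];
var
  B: Byte;
begin
  Result := 'UTF-8'''''; // the literal text UTF-8 followed by two apostrophes
  for B in TEncoding.UTF8.GetBytes(FileName) do
    if AnsiChar(B) in AttrChars then
      Result := Result + Char(B)
    else
      Result := Result + '%' + IntToHex(Integer(B), 2);
end;

begin
  // #$20AC is the euro sign, as in the RFC 6266 example.
  Writeln('Content-Disposition: attachment; filename="EURO rates"; filename*=' +
    EncodeExtValue(#$20AC' rates'));
end.
```

Hex digits come out in upper case here, whereas the RFC example uses lower case; percent-encoding is case-insensitive, so both forms denote the same octets.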

Creating an amazon MWS signature with Delphi XE7 and Indy classes

I need to generate a signature for amazon MWS and decided to find a solution with only the components and classes which come with Delphi. Because I am using Indy for the HTTP post itself, it seemed to be a good idea to use Indy classes for the calculation of the RFC 2104-compliant HMAC.
For others, who work on amazon integration, the creation of the "Canonicalized Query String" is explained in the amazon tutorial very well: http://docs.developer.amazonservices.com/en_DE/dev_guide/DG_ClientLibraries.html
Be careful, just use #10 for line breaking, as #13#10 or #13 will fail with a wrong signature. It may also be important to add ":443" to the amazon Endpoint (Host), depending on the TIdHttp version, as explained in question #23573799.
To create a valid signature, we have to calculate a HMAC with SHA256 with the query string and the SecretKey we got from amazon after registration and then, the result has to be encoded in BASE64.
The query string is properly generated and identical to the string the amazon Scratchpad creates. But the call failed because the signature is not correct.
After some tests I realized that the signature I get from my query string is not the same as the one I get when I generate it with PHP. I consider the PHP result correct, because my PHP solution has been working with amazon for a long time; the Delphi result differs, so it must be wrong.
To make testing easier, I use '1234567890' as value for the query string and 'ABCDEFG' as replacement for the SecretKey. When the result I get with Delphi is the same as the result I get with PHP, the problem should be solved, I believe.
Here is how I get the correct result with PHP:
echo base64_encode(hash_hmac('sha256', '1234567890', 'ABCDEFG', TRUE));
This shows a result of
aRGlc3RY1pKmKX0hvorkVKNcPigiJX2rksqXzlAeCLg=
The following Delphi XE7 code returns the wrong result, while using the indy version that comes with Delphi XE7:
uses
  IdHash, IdHashSHA, IdHMACSHA1, IdSSLOpenSSL, IdGlobal, IdCoderMIME;
function GenerateSignature(const AData, AKey: string): string;
var
   AHMAC: TIdBytes;
begin
     IdSSLOpenSSL.LoadOpenSSLLibrary;
     With TIdHMACSHA256.Create do
      try
         Key:= ToBytes(AKey, IndyTextEncoding_UTF16LE);
         AHMAC:= HashValue(ToBytes(AData, IndyTextEncoding_UTF16LE));
         Result:= TIdEncoderMIME.EncodeBytes(AHMAC);
      finally
         Free;
      end;
end;
Here the result, which is shown in a Memo with
Memo.Lines.Text:= GenerateSignature('1234567890', 'ABCDEFG');
is:
jg6Oddxvv57fFdcCPXrqGWB9YD5rSvtmGnZWL0X+y0Y=
I believe the problem has something to do with the encodings, so I have done some research around that. As the amazon tutorial (link see above) tells, amazon expects a utf8 encoding.
As the Indy function "ToBytes" expect a string, which is a UnicodeString in my Delphi version, I quit testing with other string types as UTF8String for parameters or variables, but I just do not know where utf8 should come into place. Also I do not know if the encodings I use in the code above are the correct ones.
I choose UTF16LE because UnicodeString is utf16 encoded (see http://docwiki.embarcadero.com/RADStudio/Seattle/en/String_Types_(Delphi) for details) and LE (Little-Endian) is most commonly used on modern machines. Also the TEncodings of Delphi itself there is "Unicode" and "BigEndianUnicode", so "Unicode" seems to be LE and some kind of "standard" Unicode.
Of course I tested to use IndyTextEncoding_UTF8 instead of IndyTextEncoding_UTF16LE in the code above, but it does not work anyway.
Because
TIdEncoderMIME.EncodeBytes(AHMAC);
is writing the TidBytes to a Stream first and then reading it all with 8bit encoding, this could be a source of problem also, so I also tested with
Result:= BytesToString(AHMAC, IndyTextEncoding_UTF16LE);
Result:= TIdEncoderMIME.EncodeString(Result, IndyTextEncoding_UTF16LE);
but the result is the same.
If you like to see the main code for creating the request, here it is:
function TgboAmazon.MwsRequest(const AFolder, AVersion: string;
const AParams: TStringList; const AEndPoint: string): string;
var
i: Integer;
SL: TStringList;
AMethod, AHost, AURI, ARequest, AStrToSign, APath, ASignature: string;
AKey, AValue, AQuery: string;
AHTTP: TIdHTTP;
AStream, AResultStream: TStringStream;
begin
AMethod:= 'POST';
AHost:= AEndPoint;
AURI:= '/' + AFolder + '/' + AVersion;
AQuery:= '';
SL:= TStringList.Create;
try
SL.Assign(AParams);
SL.Values['AWSAccessKeyId']:= FAWSAccessKeyId;
SL.Values['SellerId']:= FSellerId;
FOR i:=0 TO FMarketplaceIds.Count-1 DO
begin
SL.Values['MarketplaceId.Id.' + IntToStr(i+1)]:= FMarketplaceIds[i];
end;
SL.Values['Timestamp']:= GenerateTimeStamp(Now);
SL.Values['SignatureMethod']:= 'HmacSHA256';
SL.Values['SignatureVersion']:= '2';
SL.Values['Version']:= AVersion;
FOR i:=0 TO SL.Count-1 DO
begin
AKey:= UrlEncode(SL.Names[i]);
AValue:= UrlEncode(SL.ValueFromIndex[i]);
SL[i]:= AKey + '=' + AValue;
end;
SortList(SL);
SL.Delimiter:= '&';
AQuery:= SL.DelimitedText;
AStrToSign:= AMethod + #10 + AHost + #10 + AURI + #10 + AQuery;
TgboUtil.ShowMessage(AStrToSign);
ASignature:= GenerateSignature(AStrToSign, FAWSSecretKey);
TgboUtil.ShowMessage(ASignature);
APath:= 'https://' + AHost + AURI + '?' + AQuery + '&Signature=' + Urlencode(ASignature);
TgboUtil.ShowMessage(APath);
finally
SL.Free;
end;
AHTTP:= TIdHTTP.Create(nil);
try
AHTTP.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(AHTTP);
AHTTP.Request.ContentType:= 'text/xml';
AHTTP.Request.Connection:= 'Close';
AHTTP.Request.CustomHeaders.Add('x-amazon-user-agent: MyApp/1.0 (Language=Delphi/XE7)');
AHTTP.HTTPOptions:= AHTTP.HTTPOptions + [hoKeepOrigProtocol];
AHTTP.ProtocolVersion:= pv1_0;
AStream:= TStringStream.Create;
AResultStream:= TStringStream.Create;
try
AHTTP.Post(APath, AStream, AResultStream);
Result:= AResultStream.DataString;
ShowMessage(Result);
finally
AStream.Free;
AResultStream.Free;
end;
finally
AHTTP.Free;
end;
end;
Urlencode and GenerateTimestamp are my own functions and they do what the name promises, SortList is my own procedure which sorts the stringlist in a byte order as requested by amazon, TgboUtil.ShowMessage is my own ShowMessage alternative which shows the complete message with all characters and is used for debugging only. The http protocol is 1.0 for testing only, because I got a 403 (permission denied) as HTTP return earlier. I just wanted to exclude this as problem as the indy documentation said, that protocol version 1.1 is considered incomplete because of problematic server answers.
There are several posts regarding the amazon mws topic here, but that specific problem seems to be new.
This question here may help someone who just not have come so far, but also I hope that someone can provide a solution to just get the same signature value in Delphi as I got with PHP.
Thank you in advance.
Using the latest SVN snapshot of Indy 10, I am not able to reproduce your signature problem. When using UTF-8, your example key+value data produces the same result in Delphi as the PHP output. So, your GenerateSignature() function is fine, provided that:
you use IndyTextEncoding_UTF8 instead of IndyTextEncoding_UTF16LE.
you make sure that AData and AKey contain valid input data.
Also, you should make sure that TIdHashSHA256.IsAvailable returns true, otherwise TIdHMACSHA256.HashValue() will fail. This could happen, for instance, if OpenSSL fails to load.
Try this instead:
function GenerateSignature(const AData, AKey: string): string;
var
AHMAC: TIdBytes;
begin
IdSSLOpenSSL.LoadOpenSSLLibrary;
if not TIdHashSHA256.IsAvailable then
raise Exception.Create('SHA-256 hashing is not available!');
with TIdHMACSHA256.Create do
try
Key := IndyTextEncoding_UTF8.GetBytes(AKey);
AHMAC := HashValue(IndyTextEncoding_UTF8.GetBytes(AData));
finally
Free;
end;
Result := TIdEncoderMIME.EncodeBytes(AHMAC);
end;
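The reason the choice of text encoding changes the signature is that an HMAC is computed over bytes, not characters, and the two encodings yield different byte sequences for the same string. PHP's hash_hmac() hashes the bytes of its string arguments as they are, which for ASCII-only input are the same bytes UTF-8 produces. A small sketch using only the RTL (the BytesToHex helper is made up for this illustration) makes the difference visible for the test key:

```pascal
program EncodingBytes;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: render a byte array as upper-case hex.
function BytesToHex(const Bytes: TBytes): string;
var
  B: Byte;
begin
  Result := '';
  for B in Bytes do
    Result := Result + IntToHex(Integer(B), 2);
end;

begin
  // Same seven characters, two different byte sequences fed to the HMAC.
  Writeln('UTF-8    : ', BytesToHex(TEncoding.UTF8.GetBytes('ABCDEFG')));
  Writeln('UTF-16LE : ', BytesToHex(TEncoding.Unicode.GetBytes('ABCDEFG')));
end.
```

With UTF-16LE every ASCII character is followed by a zero byte, so both the key and the data differ from what PHP hashes, and the resulting HMAC cannot match.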
That being said, there are quite a few problems with your MwsRequest() function:
you are leaking the TIdSSLIOHandlerSocketOpenSSL object. You are not assigning an Owner to it, and TIdHTTP does not take ownership when assigned to its IOHandler property. In fact, assigning the IOHandler is actually optional in your example, see New HTTPS functionality for TIdHTTP for why.
you are setting AHTTP.Request.ContentType to the wrong media type. You are not sending XML data, so don't set the media type to 'text/xml'. In this situation, you need to set it to 'application/x-www-form-urlencoded' instead.
when calling AHTTP.Post(), your AStream stream is empty, so you are not actually posting any data to the server. You are putting your AQuery data in the query string of the URL itself, but it actually belongs in AStream instead. If you want to send the data in the URL query string, you have to use TIdHTTP.Get() instead of TIdHTTP.Post(), and change your AMethod value to 'GET' instead of 'POST'.
you are using the version of TIdHTTP.Post() that fills an output TStream. You are using a TStringStream to convert the response to a String without any regard to the actual charset used by the response data. Since you are not specifying any TEncoding object in the TStringStream constructor, it will use TEncoding.Default for decoding, which may not (and likely will not) match the response's actual charset. You should instead use the other version of Post() that returns a String so TIdHTTP can decode the response data based on the actual charset reported by the HTTPS server.
Try something more like this instead:
function TgboAmazon.MwsRequest(const AFolder, AVersion: string;
const AParams: TStringList; const AEndPoint: string): string;
var
i: Integer;
SL: TStringList;
AMethod, AHost, AURI, AQuery, AStrToSign, APath, ASignature: string;
AHTTP: TIdHTTP;
AStream: TStringStream;
begin
AMethod := 'POST';
AHost := AEndPoint;
AURI := '/' + AFolder + '/' + AVersion;
AQuery := '';
SL := TStringList.Create;
try
SL.Assign(AParams);
SL.Values['AWSAccessKeyId'] := FAWSAccessKeyId;
SL.Values['SellerId'] := FSellerId;
for i := 0 to FMarketplaceIds.Count-1 do
begin
SL.Values['MarketplaceId.Id.' + IntToStr(i+1)] := FMarketplaceIds[i];
end;
SL.Values['Timestamp'] := GenerateTimeStamp(Now);
SL.Values['SignatureMethod'] := 'HmacSHA256';
SL.Values['SignatureVersion'] := '2';
SL.Values['Version'] := AVersion;
SL.Values['Signature'] := '';
SortList(SL);
for i := 0 to SL.Count-1 do
SL[i] := UrlEncode(SL.Names[i]) + '=' + UrlEncode(SL.ValueFromIndex[i]);
SL.Delimiter := '&';
SL.QuoteChar := #0;
SL.StrictDelimiter := True;
AQuery := SL.DelimitedText;
finally
SL.Free;
end;
AStrToSign := AMethod + #10 + Lowercase(AHost) + #10 + AURI + #10 + AQuery;
TgboUtil.ShowMessage(AStrToSign);
ASignature := GenerateSignature(AStrToSign, FAWSSecretKey);
TgboUtil.ShowMessage(ASignature);
APath := 'https://' + AHost + AURI;
TgboUtil.ShowMessage(APath);
AHTTP := TIdHTTP.Create(nil);
try
// this is actually optional in this example...
AHTTP.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(AHTTP);
AHTTP.Request.ContentType := 'application/x-www-form-urlencoded';
AHTTP.Request.Connection := 'close';
AHTTP.Request.UserAgent := 'MyApp/1.0 (Language=Delphi/XE7)';
AHTTP.Request.CustomHeaders.Values['x-amazon-user-agent'] := 'MyApp/1.0 (Language=Delphi/XE7)';
AHTTP.HTTPOptions := AHTTP.HTTPOptions + [hoKeepOrigProtocol];
AHTTP.ProtocolVersion := pv1_0;
AStream := TStringStream.Create(AQuery + '&Signature=' + Urlencode(ASignature));
try
Result := AHTTP.Post(APath, AStream);
ShowMessage(Result);
finally
AStream.Free;
end;
finally
AHTTP.Free;
end;
end;
However, since the response is documented as being XML, it would be better to return the response to the caller as a TStream (not using TStringStream, though) or TBytes instead of as a String. That way, instead of Indy decoding the bytes, let your XML parser decode the raw bytes on its own. XML has its own charset rules that are separate from HTTP, so let the XML parser do its job for you:
procedure TgboAmazon.MwsRequest(...; Response: TStream);
var
...
begin
...
AHTTP.Post(APath, AStream, Response);
...
end;

Authorization failure TIdHTTP over HTTPS when password is russian

I try to test my webservice with the TIdHTTP (Indy 10.6.0 and Delphi XE5) by this code:
GIdDefaultTextEncoding := encUTF8;
HTTP.IOHandler.DefStringEncoding := IndyTextEncoding_UTF8;
Http.Request.UserName := AUser;
Http.Request.Password := APass;
Http.Request.Accept := 'text/javascript';
Http.Request.ContentType := 'application/json';
Http.Request.ContentEncoding := 'utf-8';
Http.Request.URL := 'https://sameService';
Http.MaxAuthRetries := 1;
Http.Request.BasicAuthentication := True;
TIdSSLIOHandlerSocketOpenSSL(HTTP.IOHandler).SSLOptions.Method := sslvSSLv3;
HTTP.HandleRedirects := True;
"AUser" and "APass" are in UTF-8. When "APass" contains some Russian characters I can't log in.
In HTTP Analyzer I see:
...
Authorization: Basic cDh1c2VyOj8/Pz8/PzEyMw==
Decoding it from Base64 (base64decode.org) gives:
p8user:??????123
Why does DefStringEncoding not work?
TIdHTTP's authentication system has no concept of TIdIOHandler or its DefStringEncoding property.
Internally, TIdBasicAuthentication uses TIdEncoderMIME.Encode(), but without specifying any encoding. TIdEncoder.Encode() defaults to 8bit encoding, and thus is not affected by GIdDefaultTextEncoding.
If you need to send a UTF-8 encoded password with BASIC authentication, you will have to encode the UTF-8 data manually and store the resulting octets into a string, then the 8bit encoder can process the octets as-is, eg:
Http.Request.Password := BytesToStringRaw(IndyTextEncoding_UTF8.GetBytes(APass));
On the other hand, Indy's DIGEST authentication, for instance, uses TIdHashMessageDigest5.HashStringAsHex(), and TIdHash.HashString() does not default to any specific encoding, it depends on GIdDefaultTextEncoding.
So, you have to be careful about how you encode passwords, based on which authentications you use. To account for the discrepancy, you could leave TIdHTTP.Request.Password unencoded and encode the password inside the TIdHTTP.OnAuthorization event when BASIC authentication is being used, eg:
Http.Request.Password := APass;
...
procedure TMyForm.HttpAuthorization(Sender: TObject;
Authentication: TIdAuthentication; var Handled: Boolean);
begin
if Authentication is TIdBasicAuthentication then
begin
Authentication.Password := BytesToStringRaw(IndyTextEncoding_UTF8.GetBytes(TheDesiredPasswordHere));
Handled := True;
end;
end;
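To see what the choice of byte encoding does to the credentials, the Base64 step can be run by hand. This is only a sketch: BasicCredentials is a hypothetical helper, the password is an invented stand-in with six Cyrillic letters followed by 123, and the IdGlobal and IdCoderMIME units from Indy 10.6 are assumed. If the 8bit encoding replaces characters above U+00FF with '?', as the captured header suggests, the first line should reproduce the value seen in HTTP Analyzer.

```pascal
program BasicAuthBytes;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, IdGlobal, IdCoderMIME;

// Hypothetical helper: the Base64 part of an "Authorization: Basic" header,
// with the user:password pair converted to bytes using the given encoding.
function BasicCredentials(const User, Pass: string; Enc: IIdTextEncoding): string;
begin
  Result := TIdEncoderMIME.EncodeBytes(Enc.GetBytes(User + ':' + Pass));
end;

const
  // Invented password: six Cyrillic letters plus '123', written as char
  // codes so the source file stays ASCII.
  HypotheticalPass = #$043F#$0430#$0440#$043E#$043B#$044C'123';
begin
  Writeln('8bit : ', BasicCredentials('p8user', HypotheticalPass, IndyTextEncoding_8Bit));
  Writeln('UTF-8: ', BasicCredentials('p8user', HypotheticalPass, IndyTextEncoding_UTF8));
end.
```

The UTF-8 line is longer because each Cyrillic letter occupies two bytes before Base64 is applied, and it is the form a server expecting UTF-8 credentials would be able to decode.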
UPDATE:
Internally, TIdBasicAuthentication uses TIdEncoderMIME.Encode(), but without specifying any encoding.
That last part is no longer true. TIdBasicAuthentication was updated in 2016 to now pass an encoding to TIdEncoderMIME.Encode(). When an HTTP server asks for BASIC authentication, TIdBasicAuthentication now checks if the server's WWW-Authenticate header includes one of the following attributes: charset, accept-charset, encoding, or enc (in that order). If one is found, the specified charset is passed to Encode(), otherwise ISO-8859-1 is used (there is a TODO in the code to use UTF-8 if the username or password contain any characters that do not exist in ISO-8859-1).
If you want to ensure that UTF-8 is used in BASIC authentication, you are better off setting Request.BasicAuthentication to False and using the Request.CustomHeaders to supply your own Authorization header, eg:
Http.Request.BasicAuthentication := False;
Http.Request.CustomHeaders.Values['Authorization'] := 'Basic ' + TIdEncoderMIME.EncodeString(AUser + ':' + APass, IndyTextEncoding_UTF8);
Alternatively, you might be able to just get away with updating the protected TIdBasicAuthentication.FCharset member inside of the TIdHTTP.OnAuthorization event (which is fired after the server's WWW-Authenticate header has been parsed), eg:
Http.Request.Password := APass;
...
type
TIdBasicAuthenticationAccess = class(TIdBasicAuthentication)
end;
procedure TMyForm.HttpAuthorization(Sender: TObject;
Authentication: TIdAuthentication; var Handled: Boolean);
begin
if Authentication is TIdBasicAuthentication then
begin
TIdBasicAuthenticationAccess(Authentication).FCharset := 'utf-8';
Authentication.Password := TheDesiredPasswordHere;
Handled := True;
end;
end;

How to turn an Indy UTF-8 response into a native Delphi (Unicode)String?

Using Indy's TIdHTTP I obtain a response that has Content-Type: text/html; charset=UTF-8 and store it in a TStringStream. If I then use ResponseStream.ReadString(ResponseStream.Size), the resulting String is not shown correctly. I bet this is due to the fact that Windows uses UTF-16.
I tried a few things with TEncoding.UTF8 and TEncoding.Convert that only messed up the result even more (started to look Chinese).
Here's the current code:
var
LHTTP: TIdHTTP;
LResponseStream: TStringStream;
LResponse: String;
begin
LResponseStream := TStringStream.Create();
try
LHTTP := TIdHTTP.Create(nil);
try
LHTTP.Get('url', LResponseStream); // Returns 'hęllo'
finally
LHTTP.Free;
end;
LResponseStream.Position := 0;
LResponse := LResponseStream.ReadString(LResponseStream.Size);
ShowMessage(LResponse); // Make me pretty
finally
LResponseStream.Free;
end;
end;
What should I change to get a regular Delphi String...?
TIdHTTP has an overloaded version of Get() that returns a String. It will decode the UTF-8 into UTF-16 for you:
LResponse := LHTTP.Get('url');
If the content you are downloading is encoded as UTF-8, you can simply tell TStringStream to decode its data as UTF-8 by passing the encoding to the constructor:
LResponseStream := TStringStream.Create('', TEncoding.UTF8);
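The effect of the encoding argument can be checked without a server by writing known UTF-8 bytes into the stream, the way TIdHTTP.Get() would. The following is a sketch with a hypothetical helper, DecodeViaStringStream; the second call uses TEncoding.Default, whose result depends on the system code page, so only the UTF-8 case is expected to match.

```pascal
program StringStreamDecode;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

// Hypothetical helper: put raw bytes into a TStringStream and let it decode
// them with the given encoding when DataString is read.
function DecodeViaStringStream(const Raw: TBytes; Encoding: TEncoding): string;
var
  SS: TStringStream;
begin
  // False: the stream must not free the encoding passed in by the caller.
  SS := TStringStream.Create('', Encoding, False);
  try
    if Length(Raw) > 0 then
      SS.WriteBuffer(Raw[0], Length(Raw));
    Result := SS.DataString;
  finally
    SS.Free;
  end;
end;

var
  Raw: TBytes;
begin
  // UTF-8 bytes of 'h', U+0119 (e with ogonek), 'l', 'l', 'o'.
  Raw := TBytes.Create($68, $C4, $99, $6C, $6C, $6F);
  Writeln('UTF-8 decode matches: ',
    DecodeViaStringStream(Raw, TEncoding.UTF8) = 'h'#$0119'llo');
  Writeln('Default-encoding decode matches: ',
    DecodeViaStringStream(Raw, TEncoding.Default) = 'h'#$0119'llo');
end.
```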

Delphi. Indy & cyrillic letters

I've been writing some function that downloads source code of specified web page by URL:
function GetWebPage(const url: string): tStringList;
var
idHttp: TidHttp;
begin
Result := tStringList.Create;
idHttp := TidHttp.Create(nil);
// set params
idHttp.Request.UserAgent := 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';
idHttp.Request.AcceptLanguage := 'ru en';
idHttp.Response.KeepAlive := True;
idHttp.HandleRedirects := True;
idHttp.ConnectTimeout := 5000;
idHttp.ReadTimeout := 5000;
try
try
Result.values['responce'] := idHttp.Get(url);
except
Result.values['responce'] := '';
end;
finally
Result.values['code'] := IntToStr(idHttp.ResponseCode);
FreeAndNil(idHttp);
end;
end;
It works perfectly with English URL addresses, but when I specify a URL like президент.рф, inside Indy that URL is transformed to ?????????.?? (as seen in HTTP Analyzer).
I've found this solution for my problem:
idHttp.IOHandler.DefStringEncoding := TEncoding.Ansi;
// also tried - TEncoding.Unicode, TEncoding.UTF8
But it does not work: when I call my function, I get an error.
So, how can I make this function work with Cyrillic addresses?
Thank you.
URLs can only contain ASCII characters in them. You need to pre-format the URL to encode non-ASCII characters before then passing it to TIdHTTP. You can use the TIdURI.URLEncode() method for that purpose, eg:
Result.values['responce'] := idHttp.Get(TIdURI.URLEncode(url));
GetWebPage('http://президент.рф');
UTF-8 is commonly used for URL encodings, so it is the default encoding used by TIdURI, but not all servers use UTF-8, so if you need to use a different encoding then TIdURI.URLEncode() has an optional AByteEncoding parameter for that purpose.
With that said, international resources are better serviced using IRIs instead of URLs, but Indy does not natively support IRIs yet (that will be implemented in Indy 11).
