I am trying to use Indy to serve Javascript (deploying a Swagger UI to render API documentation).
procedure TfmMain.SendJavaScriptFileResponse(AResponseInfo: TIdHTTPResponseInfo; AFileName: String);
begin
AResponseInfo.ContentType := 'application/javascript';
AResponseInfo.CharSet := 'utf-8';
var LFileContents := TStringList.Create;
try
LFileContents.LoadFromFile(AFileName);
AResponseInfo.ContentText := LFileContents.Text;
finally
LFileContents.Free;
end;
end;
When the browser receives the Javascript and attempts to run it, I get a syntax error:
Uncaught SyntaxError: illegal character U+20AC
The response headers received from the Indy IdHTTPServer look like this:
HTTP/1.1 200 OK
Connection: close
Content-Encoding: utf-8
Content-Type: application/javascript; charset=utf-8
Content-Length: 1063786
Date: Sun, 05 Feb 2023 20:45:56 GMT
However, when I serve the exact same Javascript files via my hosted website, the Javascript runs fine in the browser with no errors.
Is there a setting or character set I need to use when sending Javascript files using the Indy HTTP server?
You are loading the Javascript from a file into a string, and then you are sending that string to the client. That requires 2 data conversions at runtime - from the file's encoding to UTF-16 in memory, and from UTF-16 to the specified AResponseInfo.Charset on the data transmission to the client. Either one of those conversions can fail if you are not careful.
In memory, a string in Delphi 2009+ is always UTF-16 encoded, but you are not specifying the file's encoding when loading the file into the TStringList. So, if the file uses an encoding other than ASCII (say, UTF-8), does not have a BOM, and contains any non-ASCII characters (say, the Euro sign €), then TStringList WILL NOT decode the file into UTF-16 correctly. In which case, you MUST specify the file's actual encoding, eg:
procedure TfmMain.SendJavaScriptFileResponse(
AResponseInfo: TIdHTTPResponseInfo;
const AFileName: String);
begin
AResponseInfo.ContentType := 'application/javascript';
AResponseInfo.CharSet := 'utf-8';
var LFileContents := TStringList.Create;
try
LFileContents.LoadFromFile(AFileName, TEncoding.UTF8); // <-- HERE
AResponseInfo.ContentText := LFileContents.Text;
finally
LFileContents.Free;
end;
end;
Another option is to send the actual file itself, without having to load and decode it into memory first, eg:
procedure TfmMain.SendJavaScriptFileResponse(
AContext: TIdContext;
AResponseInfo: TIdHTTPResponseInfo;
const AFileName: String);
begin
AResponseInfo.ContentType := 'application/javascript';
AResponseInfo.CharSet := 'utf-8';
AResponseInfo.ServeFile(AContext, AFileName);
end;
Either way, utf-8 is not a valid value for the HTTP Content-Encoding header. Indy does not assign any value to that header by default, so you must be assigning it manually. Don't do that in this case.
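To see where an unexpected U+20AC can come from, here is a small Python sketch of the two conversions described above (Python is used only to show the bytes; it is not Delphi code). It assumes the system ANSI code page is Windows-1252 and uses an en dash (U+2013) as a sample non-ASCII character; both are assumptions for illustration, not something taken from the question's actual file.

```python
# U+2013 (en dash) stands in for any non-ASCII character in the script file.
original = "a \u2013 b"
utf8_bytes = original.encode("utf-8")      # bytes stored in a BOM-less UTF-8 file

# Assumption: the file is decoded with the ANSI code page Windows-1252.
misdecoded = utf8_bytes.decode("cp1252")   # first conversion goes wrong here
resent = misdecoded.encode("utf-8")        # second conversion faithfully sends the garbage

print("\u20ac" in misdecoded)              # True: byte 0x80 became the Euro sign
print(resent == utf8_bytes)                # False: the client no longer gets the file's bytes
print(utf8_bytes.decode("utf-8") == original)  # True: decoding as UTF-8 is lossless
```

The UTF-8 form of U+2013 is E2 80 93, and byte 0x80 is the Euro sign in Windows-1252, which is why the browser can report U+20AC even though the source file contains no Euro sign.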
I have a web server based on TIdHTTPServer. It is built in Delphi Sydney. From a webpage I'm receiving following multipart/form-data post stream:
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="d"
83AAAFUaVVs4Q07z
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="dir"
Upload
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="file_name"; filename="česká tečka.png"
Content-Type: image/png
PNG_DATA
-----------------------------16857441221270830881532229640--
The problem is that the text parts are not received correctly. I read "Indy MIME decoding of Multipart/Form-Data Requests returns trailing CR/LF" and changed the transfer encoding to 8bit, which helps to receive the file correctly, but the received dir value and file name are still wrong (dir should be Upload and the filename should be česká tečka.png):
d=83AAAFUaVVs4Q07z
dir=UploadW
??esk?? te??ka.png 75
To demonstrate the issue I simplified my code to a console app (please note that the MIME.txt file contains the same as is in post stream above):
program MIMEMultiPartTest;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.Classes, System.SysUtils,
IdGlobal, IdCoder, IdMessage, IdMessageCoder, IdGlobalProtocols, IdCoderMIME, IdMessageCoderMIME,
IdCoderQuotedPrintable, IdCoderBinHex4;
procedure ProcessAttachmentPart(var Decoder: TIdMessageDecoder; var MsgEnd: Boolean);
var
MS: TMemoryStream;
Name: string;
Value: string;
NewDecoder: TIdMessageDecoder;
begin
MS := TMemoryStream.Create;
try
// http://stackoverflow.com/questions/27257577/indy-mime-decoding-of-multipart-form-data-requests-returns-trailing-cr-lf
TIdMessageDecoderMIME(Decoder).Headers.Values['Content-Transfer-Encoding'] := '8bit';
TIdMessageDecoderMIME(Decoder).BodyEncoded := False;
NewDecoder := Decoder.ReadBody(MS, MsgEnd);
MS.Position := 0; // necessary?
if Decoder.Filename <> EmptyStr then // it is an attachment
begin
try
Writeln(Decoder.Filename + ' ' + IntToStr(MS.Size));
except
FreeAndNil(NewDecoder);
Writeln('Error processing MIME');
end;
end
else // it is a parameter
begin
Name := ExtractHeaderSubItem(Decoder.Headers.Text, 'name', QuoteHTTP);
if Name <> EmptyStr then
begin
Value := string(PAnsiChar(MS.Memory));
try
Writeln(Name + '=' + Value);
except
FreeAndNil(NewDecoder);
Writeln('Error processing MIME');
end;
end;
end;
Decoder.Free;
Decoder := NewDecoder;
finally
MS.Free;
end;
end;
function ProcessMultiPart(const ContentType: string; Stream: TStream): Boolean;
var
Boundary: string;
BoundaryStart: string;
BoundaryEnd: string;
Decoder: TIdMessageDecoder;
Line: string;
BoundaryFound: Boolean;
IsStartBoundary: Boolean;
MsgEnd: Boolean;
begin
Result := False;
Boundary := ExtractHeaderSubItem('multipart/form-data; boundary=---------------------------16857441221270830881532229640', 'boundary', QuoteHTTP);
if Boundary <> EmptyStr then
begin
BoundaryStart := '--' + Boundary;
BoundaryEnd := BoundaryStart + '--';
Decoder := TIdMessageDecoderMIME.Create(nil);
try
TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
Decoder.SourceStream := Stream;
Decoder.FreeSourceStream := False;
BoundaryFound := False;
IsStartBoundary := False;
repeat
Line := ReadLnFromStream(Stream, -1, True);
if Line = BoundaryStart then
begin
BoundaryFound := True;
IsStartBoundary := True;
end
else
begin
if Line = BoundaryEnd then
BoundaryFound := True;
end;
until BoundaryFound;
if BoundaryFound and IsStartBoundary then
begin
MsgEnd := False;
repeat
TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
Decoder.SourceStream := Stream;
Decoder.FreeSourceStream := False;
Decoder.ReadHeader;
case Decoder.PartType of
mcptText,
mcptAttachment:
begin
ProcessAttachmentPart(Decoder, MsgEnd);
end;
mcptIgnore:
begin
Decoder.Free;
Decoder := TIdMessageDecoderMIME.Create(nil);
end;
mcptEOF:
begin
Decoder.Free;
MsgEnd := True;
end;
end;
until (Decoder = nil) or MsgEnd;
Result := True;
end
finally
Decoder.Free;
end;
end;
end;
var
Stream: TMemoryStream;
begin
Stream := TMemoryStream.Create;
try
Stream.LoadFromFile('MIME.txt');
ProcessMultiPart('multipart/form-data; boundary=---------------------------16857441221270830881532229640', Stream);
finally
Stream.Free;
end;
Readln;
end.
Could someone help me find what is wrong with my code? Thank you.
Your call to ExtractHeaderSubItem() in ProcessMultiPart() is wrong: it needs to pass in the ContentType string parameter, not a hard-coded string literal.
Your call to ExtractHeaderSubItem() in ProcessAttachmentPart() is also wrong: it needs to pass in only the content of the Content-Disposition header, not the entire Headers.Text. ExtractHeaderSubItem() is designed to operate on only 1 header at a time.
Regarding the dir MIME part, the reason the body data ends up as 'UploadW' instead of 'Upload' is because you are not taking MS.Size into account when assigning MS.Memory to your Value string. The TMemoryStream data is NOT null-terminated! So, you will need to use SetString() instead of the := operator, eg:
var
Value: AnsiString;
...
SetString(Value, PAnsiChar(MS.Memory), MS.Size);
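The difference between reading up to a terminator and reading an explicit length can be sketched in Python. The buffer contents below are hypothetical: they model a memory block that is larger than the data written to it, with stale bytes after the real content.

```python
def read_c_string(buffer: bytes) -> bytes:
    """Read up to the first NUL byte, like treating the memory as a C string."""
    end = buffer.find(b"\x00")
    return buffer if end < 0 else buffer[:end]

# Hypothetical allocated block: 6 bytes of real data, then leftover bytes.
allocated = b"Upload" + b"W\x00\x13\x37"
size = 6  # plays the role of MS.Size

wrong = read_c_string(allocated)  # runs on until a NUL happens to appear
right = allocated[:size]          # length-limited copy, the SetString() approach

print(wrong)  # b'UploadW'
print(right)  # b'Upload'
```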
Regarding the Decoder.FileName, that value is not affected by the Content-Transfer-Encoding header at all. MIME headers simply do not allow unencoded Unicode characters. Currently, Indy's MIME decoder supports RFC 2047-style encodings for Unicode characters in headers, per RFC 7578 Section 5.1.3, but your stream data is not using that format. It looks like your data is using raw UTF-8 octets [1] (which Section 5.1.3 also mentions as a possible encoding, but which the decoder does not currently look for). So, you may have to manually extract and decode the original filename yourself as needed.
If you know the filename will always be encoded as UTF-8, you could try setting Indy's global IdGlobal.GIdDefaultTextEncoding variable to encUTF8 (it defaults to encASCII), and then the Decoder.FileName should be accurate. But that is a global setting, so it may have unwanted side effects elsewhere in Indy, depending on context and data.
So, I would suggest setting GIdDefaultTextEncoding to enc8Bit instead, so that unwanted side effects are minimized, and the Decoder.FileName will contain the original raw bytes as-is (just extended to 16-bit chars). That way, you can recover the original filename bytes by simply passing the Decoder.FileName as-is to IndyTextEncoding_8Bit.GetBytes(), and then decode them as needed (such as with IndyTextEncoding_UTF8.GetString(), after validating that the bytes are valid UTF-8).
[1]: However, ÄŤeská teÄŤka.png is not the correct UTF-8 form of česká tečka.png. It looks like that data may have been double-encoded, i.e. česká tečka.png was UTF-8 encoded, and then the resulting bytes were UTF-8 encoded again.
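The 8-bit round trip suggested above can be modelled in Python, where Latin-1 maps each byte N to code point N, which is the same "bytes extended to 16-bit chars" idea. This is only a sketch of the byte handling, not Indy code, and the fallback branch is an illustrative choice.

```python
filename = "česká tečka.png"
wire_bytes = filename.encode("utf-8")            # raw UTF-8 octets in the header

# Model of the 8-bit decode: every byte becomes the character with the same value.
as_8bit_string = wire_bytes.decode("latin-1")

# Recover the original bytes, then validate and decode them as UTF-8.
recovered_bytes = as_8bit_string.encode("latin-1")
try:
    recovered = recovered_bytes.decode("utf-8")  # strict: raises on invalid UTF-8
except UnicodeDecodeError:
    recovered = as_8bit_string                   # illustrative fallback: keep the raw form

print(recovered_bytes == wire_bytes)  # True: the 8-bit step loses nothing
print(recovered == filename)          # True
```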
Nowadays the filename parameter should only be added as a fallback, while filename* should be added to state clearly which text encoding the filename uses. Otherwise each client can only guess, which may go wrong.
RFC 5987 §3.2 defines the format of that filename* parameter:
charset ' [ language ] ' value-chars
...whereas:
charset can be UTF-8 or ISO-8859-1 or any MIME-charset
...and the language is optional.
RFC 6266 §4.3 defines that filename* should be used and comes up with examples in §5:
Content-Disposition: attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
Do you spot the asterisk *? Do you spot the text encoding utf-8? Do you spot the two apostrophes '', designating no further specified language (see RFC 5646 § 2.1)? And then come the octets according to the specified text encoding: either percent-encoded, or (if allowed) in plain ASCII.
Other examples:
Content-Disposition: attachment; filename="green.jpg"; filename*=UTF-8''%e3%82%b0%e3%83%aa%e3%83%bc%e3%83%b3.jpg
will present "green.jpg" on older web browsers and "グリーン.jpg" on compliant web browsers.
Content-Disposition: attachment; filename="Gruesse.txt"; filename*=ISO-8859-1''Gr%fc%dfe.txt
will present "Gruesse.txt" on older web browsers and "Grüße.txt" on compliant web browsers.
Content-Disposition: attachment; filename="Hello.png"; filename*=Shift_JIS'en-US'Howdy.png; filename*=EUC-KR'de'Hallo.png
will present "Hello.png" on older web browsers, and "Howdy.png" on compliant web browsers where the preferred language is set to American English, and "Hallo.png" on compliant ones with a preferred language of German (Deutsch). Note that the different text encodings are unbound to percent encoding as long as the octets are within the allowed range (and latin letters are, along with the dot).
In my experience nobody cares about this nice feature: everybody just shoves UTF-8 into filename, which still violates the standard, no matter how many clients silently support it. See also "How to encode the filename parameter of Content-Disposition header in HTTP?" and "PHP: RFC-2231 How to encode UTF-8 String as Content-Disposition filename".
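As a rough illustration of the filename / filename* pairing described in this answer, the following Python sketch builds such a header value. The function name content_disposition is hypothetical, and the sketch assumes the fallback name is plain ASCII without quotes or backslashes. It percent-encodes every character outside letters, digits and "_.-~", which is more than RFC 5987 strictly requires but still valid.

```python
from urllib.parse import quote

def content_disposition(fallback_name: str, real_name: str) -> str:
    """Build an attachment header with an ASCII fallback and a UTF-8 filename*."""
    # safe="" makes quote() percent-encode everything except unreserved characters.
    encoded = quote(real_name, safe="", encoding="utf-8")
    return f"attachment; filename=\"{fallback_name}\"; filename*=UTF-8''{encoded}"

header = content_disposition("EURO rates", "\u20ac rates")
print(header)
# attachment; filename="EURO rates"; filename*=UTF-8''%E2%82%AC%20rates
```

The result matches the RFC 6266 example apart from the case of the hex digits and of the charset name, neither of which is significant.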
I have a problem accessing websites that use the UTF-8 charset. For example, when I try to access this site:
Click for example
all UTF-8 characters are decoded incorrectly.
This is my access routine:
var
Web : TIdHTTP;
Sito : String;
hIOHand : TIdSSLIOHandlerSocketOpenSSL;
begin
Url := TIdURI.URLEncode(Url);
try
Web := TIdHTTP.Create(nil);
hIOHand := TIdSSLIOHandlerSocketOpenSSL.Create(nil);
hIOHand.DefStringEncoding := IndyTextEncoding_UTF8;
hIOHand.SSLOptions.SSLVersions := [sslvTLSv1,sslvTLSv1_1,sslvTLSv1_2,sslvSSLv2,sslvSSLv3,sslvSSLv23];
Web.IOHandler := hIOHand;
Web.Request.CharSet := 'utf-8';
Web.Request.UserAgent := INET_USERAGENT; //Custom user agent string
Web.RedirectMaximum := INET_REDIRECT_MAX; //Maximum redirects
Web.HandleRedirects := INET_REDIRECT_MAX <> 0; //Handle redirects
Web.ReadTimeOut := INET_TIMEOUT_SECS * 1000; //Read timeout msec
try
Sito := Web.Get(Url);
Web.Disconnect;
except
on e : exception do
Sito := 'ERR: ' +Url+#32+e.Message;
end;
finally
Web.Free;
hIOHand.Free;
end;
I have tried every solution I could find, but the Sito variable always contains wrong characters. For example, the correct value of "name" is
"name": "Aire d'adhésion du Parc national du Mercantour",
but after the Get call I have
"name": "Aire d'adhésion du Parc national du Mercantour",
Do you have any idea where my error is?
Thank you all!
In Delphi 2009+, which includes XE6, string is a UTF-16 encoded UnicodeString.
You are using the overloaded version of TIdHTTP.Get() that returns a string. It decodes the sent text to UTF-16 using whatever charset is reported by the response. If the text is not decoding properly, it likely means the response is not reporting a correct charset. If the wrong charset is used, the text will not decode properly.
The URL in question is, in fact, sending a response Content-Type header that is set to application/json without specifying a charset at all. The default charset for application/json is UTF-8, but Indy does not know that, so it ends up using its own internal default instead, which is not UTF-8. That is why the text is not decoding properly when non-ASCII characters are present.
In which case, if you KNOW the charset will always be UTF-8, you have a few workarounds to choose from:
you can set Indy's default charset to UTF-8 by setting the global GIdDefaultTextEncoding variable in the IdGlobal unit:
GIdDefaultTextEncoding := encUTF8;
you can use the TIdHTTP.OnHeadersAvailable event to change the TIdHTTP.Response.Charset property to 'utf-8' if it is blank or incorrect.
Web.OnHeadersAvailable := CheckResponseCharset;
...
procedure TMyClass.CheckResponseCharset(Sender: TObject; AHeaders: TIdHeaderList; var VContinue: Boolean);
var
Response: TIdHTTPResponse;
begin
Response := TIdHTTP(Sender).Response;
if IsHeaderMediaType(Response.ContentType, 'application/json') and (Response.Charset = '') then
Response.Charset := 'utf-8';
VContinue := True;
end;
you can use the other overloaded version of TIdHTTP.Get() that fills an output TStream instead of returning a string. Using a TMemoryStream or TStringStream, you can decode the raw bytes yourself using UTF-8:
MStrm := TMemoryStream.Create;
try
Web.Get(Url, MStrm);
MStrm.Position := 0;
Sito := ReadStringFromStream(MStrm, IndyTextEncoding_UTF8);
finally
MStrm.Free;
end;
Or, with a TStringStream:
SStrm := TStringStream.Create('', TEncoding.UTF8);
try
Web.Get(Url, SStrm);
Sito := SStrm.DataString;
finally
SStrm.Free;
end;
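The charset fallback that these workarounds implement can be sketched independently of Indy. The Python helper below is hypothetical (pick_charset is not an Indy or standard-library function), and its "iso-8859-1" default is only a placeholder for whatever non-UTF-8 default a client might use.

```python
from email.message import Message

def pick_charset(content_type: str, default: str = "iso-8859-1") -> str:
    """Choose a decoding charset from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")       # None when no charset parameter is present
    if charset:
        return charset
    if msg.get_content_type() == "application/json":
        return "utf-8"                       # JSON without a charset: treat as UTF-8
    return default                           # placeholder default, an assumption

body = "adh\u00e9sion".encode("utf-8")
print(body.decode(pick_charset("application/json")) == "adh\u00e9sion")  # True
print(body.decode(pick_charset("text/plain")) == "adh\u00e9sion")        # False
```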
I am testing a localhost server using TIdHTTPServer and TIdHTTP. I am having problems with encoding UTF8 data.
client side:
procedure TForm1.SpeedButton1Click(Sender: TObject);
var
res: string;
begin
res:=IdHTTP1.Get('http://localhost/?msg=đi chơi thôi');
Memo1.Lines.Add(res);
end;
Server side:
procedure TForm1.OnCommandGet(AContext: TIdContext;
ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
begin
Memo1.Lines.Add(ARequestInfo.Params.Values['msg']); // ?i ch?i th?i
AResponseInfo.CharSet := 'utf-8';
AResponseInfo.ContentText := 'chào các bạn'; // chào các b?n
end;
I want to send đi chơi thôi and receive chào các bạn. But the server receives ?i ch?i th?i and the client receives chào các b?n.
Can anyone help me?
TIdHTTP transmits the URL exactly as you give it, but http://localhost/?msg=đi chơi thôi is not a valid URL that can be transmitted as-is, as URLs can only contain ASCII characters. Unreserved ASCII characters can be used as-is, but reserved and non-ASCII characters MUST be charset-encoded into bytes and then those bytes must be url-encoded in %HH format, eg:
IdHTTP1.Get('http://localhost/?msg=%C4%91i%20ch%C6%A1i%20th%C3%B4i');
You must ensure you pass only valid url-encoded URLs to TIdHTTP.
In this example, the URL is hard-coded, but if you need something more dynamic then use the TIdURI class, eg:
IdHTTP1.Get('http://localhost/?msg=' + TIdURI.ParamsEncode('đi chơi thôi'));
TIdHTTPServer will then decode the parameter data as you are expecting. Both TIdURI and TIdHTTPServer use UTF-8 by default.
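The two-step encoding described above (charset-encode to bytes, then write each reserved or non-ASCII byte as %HH) can be checked with a short Python sketch. It uses Python's standard library rather than TIdURI, so it only illustrates the expected wire format.

```python
from urllib.parse import quote, unquote

message = "đi chơi thôi"

# quote() encodes the text as UTF-8 and percent-encodes every byte that is
# not an unreserved ASCII character; safe="" also encodes "/".
encoded = quote(message, safe="")
url = "http://localhost/?msg=" + encoded

print(url)                          # http://localhost/?msg=%C4%91i%20ch%C6%A1i%20th%C3%B4i
print(unquote(encoded) == message)  # True: the receiver can reverse the encoding
```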
When sending a response, you are only setting a CharSet, but you are not setting a ContentType. So TIdHTTPServer will set the ContentType to 'text/html; charset=ISO-8859-1', overwriting your CharSet. You need to explicitly set the ContentType yourself so you can specify a custom CharSet, eg:
AResponseInfo.ContentType := 'text/plain';
AResponseInfo.CharSet := 'utf-8';
AResponseInfo.ContentText := 'chào các bạn';
Or:
AResponseInfo.ContentType := 'text/plain; charset=utf-8';
AResponseInfo.ContentText := 'chào các bạn';
On a side note, TIdHTTPServer is a multi-threaded component. The OnCommand... events are fired in the context of a worker thread, not the main UI thread. So accessing Memo1 directly like you are is not thread-safe. You MUST synchronize with the main UI thread in order to access UI controls safely, eg:
procedure TForm1.OnCommandGet(AContext: TIdContext; ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
msg: string;
begin
msg := ARequestInfo.Params.Values['msg'];
TThread.Synchronize(nil,
procedure
begin
Memo1.Lines.Add(msg);
end
);
...
end;
I have been using the Synapse library to download files from the internet, but I have recently converted my application to use INDY instead and I am missing one of the nicer features in the Synapse library which is the ability to easily get the Mime-Type of a file that I was downloading from a server before saving it to my local machine. Does INDY have this feature and if so how do I go about accessing it?
You can issue an HTTP HEAD request and check the Content-Type header before you actually GET (download) the file:
procedure TForm1.Button1Click(Sender: TObject);
var
Url: string;
Http: TIdHTTP;
begin
Url := 'http://yoursite.com/yourfile.png';
Http := TIdHTTP.Create(nil);
try
Http.Head(Url);
ShowMessage(Http.Response.ContentType); // "image/png"
finally
Http.Free;
end;
end;
The ContentType you receive back depends on the web server implementation and is not guaranteed to be the same on each and every server.
The other option is to actually GET the file and save its content to a memory stream such as TMemoryStream (not to a local file). Indy provides an overload:
Http.Get(Url, AStream);
Then you check Http.Response.ContentType and save the stream to a file with AStream.SaveToFile.
Not sure about the relevance here, but note also that Indy can guess the MIME type of a local file as well (given its file extension) with GetMIMETypeFromFile (uses IdGlobalProtocols).
Or you can build your own function:
function GetMIMEType(sFile: TFileName): string;
var aMIMEMap: TIdMIMETable;
begin
aMIMEMap:= TIdMIMETable.Create(true);
try
result:= aMIMEMap.GetFileMIMEType(sFile);
finally
aMIMEMap.Free;
end;
end;
And then call it:
procedure HTTPServerGet(aThr: TIdPeerThread; reqInf: TIdHTTPRequestInfo;
respInf: TIdHTTPResponseInfo);
var localDoc: string;
ByteSent: Cardinal;
begin
//RespInfo.ContentType:= 'text/HTML';
Writeln(Format('Command %s %s at %-10s received from %s:%d',[ReqInf.Command, ReqInf.Document,
DateTimeToStr(Now),aThr.Connection.socket.binding.PeerIP,
aThr.Connection.socket.binding.PeerPort]));
localDoc:= ExpandFilename(Exepath+'/web'+ReqInf.Document);
RespInf.ContentType:= GetMIMEType(LocalDoc);
if FileExists(localDoc) then begin
ByteSent:= HTTPServer.ServeFile(AThr, RespInf, LocalDoc);
Writeln(Format('Serving file %s (%d bytes/ %d bytes sent) to %s:%d at %s',
[LocalDoc,ByteSent,FileSizeByName(LocalDoc), aThr.Connection.Socket.Binding.PeerIP,
aThr.Connection.Socket.Binding.PeerPort, dateTimeToStr(now)]));
end else begin
RespInf.ResponseNo:= 404; //Not found RFC
RespInf.ContentText:=
'<html><head><title>Sorry WebBox Error</title></head><body><h1>' +
RespInf.ResponseText + '</h1></body></html>';
end;
end;
Using Indy's TIdHTTP, I obtain a response that has Content-Type: text/html; charset=UTF-8 and store it in a TStringStream. If I then use LResponseStream.ReadString(LResponseStream.Size), the resulting String is not shown correctly. I bet this is due to the fact that Windows uses UTF-16.
I tried a few things with TEncoding.UTF8 and TEncoding.Convert that only messed up the result even more (started to look Chinese).
Here's the current code:
var
LHTTP: TIdHTTP;
LResponseStream: TStringStream;
LResponse: String;
begin
LResponseStream := TStringStream.Create();
try
LHTTP := TIdHTTP.Create(nil);
try
LHTTP.Get('url', LResponseStream); // Returns 'hęllo'
finally
LHTTP.Free;
end;
LResponseStream.Position := 0;
LResponse := LResponseStream.ReadString(LResponseStream.Size);
ShowMessage(LResponse); // Make me pretty
finally
LResponseStream.Free;
end;
end;
What should I change to get a regular Delphi String...?
TIdHTTP has an overloaded version of Get() that returns a String. It will decode the UTF-8 into UTF-16 for you:
LResponse := LHTTP.Get('url');
If the content you are downloading is encoded as UTF-8, you can simply tell TStringStream to decode its data as UTF-8 by passing that encoding to the constructor:
LResponseStream := TStringStream.Create('', TEncoding.UTF8);