MSXML: XMLHTTP doesn't support charset from header - delphi

I'm using MSXML2_TLB.pas generated from the Microsoft XML type library to call a pretty simple web-service, so I don't need the full XML wrapping and just do this:
var
r:XMLHTTP;
begin
r:=CoXMLHTTP.Create;
r.open('POST',WebServiceURL,false,'','');
r.setRequestHeader('Content-Type','application/xml');
r.send('<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" '+
'xmlns:ns="http://somebigcorporation.com/api/ws1/">'+
'<soapenv:Body>'+
//snip: irrelevant to the question
'</soapenv:Body></soapenv:Envelope>');
if r.status<>200 then raise Exception.Create(r.statusText);
Result:=(r.responseXML as DOMDocument).documentElement.firstChild.firstChild.selectSingleNode('importantData');
end;
The webservice responds nicely with status 200 and Content-Type: text/xml; charset=iso-8859-1.
In some cases, documentElement is nil (and the above code throws an access violation). Apparently responseXML exists, but only to offer a parseError object (it says so in the docs), and parseError.reason in these cases is "An invalid character was found in text content."
Why is the parser not taking the 'charset' value from the response header? Is there a way to tell the XMLHTTP instance to use the charset value from the response header?
Update: I don't know if it's important, but the response also doesn't start with <?xml and even if it did I have no way to request/modify it so it would have an encoding="iso-8859-1" attribute.

I think you need to set the charset in the request so the result is correct.
r.setRequestHeader('Content-Type', 'application/xml; charset=ISO-8859-1');
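Separately from the suggestion above, a common workaround when the body has no XML declaration is to take the raw response bytes (XMLHTTP exposes them as responseBody), decode them yourself using the charset named in the Content-Type header, and hand the decoded text to the parser. The sketch below shows that idea in Python rather than Delphi; the function names and the sample payload are made up for illustration, and it is not MSXML code.

```python
from email.message import Message
import xml.etree.ElementTree as ET


def charset_from_content_type(header_value, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def parse_xml_with_header_charset(body, content_type):
    """Decode the raw body with the header's charset, then parse the text."""
    text = body.decode(charset_from_content_type(content_type))
    return ET.fromstring(text)


# Hypothetical response: ISO-8859-1 bytes and no <?xml ...?> declaration.
raw = "<r><importantData>Grüße</importantData></r>".encode("iso-8859-1")
root = parse_xml_with_header_charset(raw, "text/xml; charset=iso-8859-1")
print(root.findtext("importantData"))
```

In Delphi the same steps would be: read responseBody, convert the bytes to a string with the right code page, and load that string into a DOMDocument with loadXML.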

Related

Stripping out the "content-type" from TIdMultiPartFormDataStream

This is kind of related to this post. I am trying to post some form data using TIdHTTP and TIdMultiPartFormDataStream, but when monitoring the communication using Wireshark, each form field gets a Content-Type: text/plain attached to it, and for some reason the server that I am sending this data to does not like it. Is there a way that I can make sure only the name and value are sent?
The Content-Transfer was also being added and I was able to remove that using:
aFieldItem := PostStream.AddFormField(fName, fValue);
aFieldItem.ContentTransfer := '';
but I cannot find any way to get rid of content type.
At the moment, the data that is being sent looks like this (in Wireshark):
Boundary: \r\n----------051715151353026\r\n
Encapsulated multipart part: (text/plain)
Content-Disposition: form-data; name="description"\r\n
Content-Type: text/plain\r\n
Line-based text data: text/plain
\r\n
Testing new AW Mobile
and I want it to look like:
Boundary: \r\n------WebKitFormBoundary32hCBG8zkGMBpxqL\r\n
Encapsulated multipart part:
Content-Disposition: form-data; name="description"\r\n
Data (21 bytes)
Data: 0d0a5465737420616e6420747261636520636f6d6d
Length: 21
Thank you
Sam
HTML5 Section 4.10.22.7 alters how RFC 2388 applies to webform submissions:
The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388).
This is different from RFC 2388:
As with all multipart MIME types, each part has an optional "Content-Type", which defaults to text/plain.
Your server is clearly expecting the HTML5 behavior.
The Content-Type header on each MIME part added to TIdMultipartFormDataStream can be omitted by setting the TIdFormDataField.ContentType property to a space character (not a blank string, like the ContentTransfer property allows):
aFieldItem := PostStream.AddFormField(fName, fValue);
aFieldItem.ContentTransfer := '';
aFieldItem.ContentType := ' '; // <-- here
If you set the ContentType property to a blank string, it will set the Content-Type header to application/octet-stream, but assigning a space character instead has a side effect of omitting the header when the property setter parses the new value.
That being said, I have already made some changes to TIdMultipartFormDataStream to account for this change in webform submission in HTML5, but I have not finalized and released it yet.
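To make the HTML5 rule quoted above concrete, here is a small Python sketch (not Indy code) that builds a multipart/form-data body in which non-file fields carry only a Content-Disposition header. The function name and the fixed boundary in the usage note are assumptions for illustration.

```python
import uuid


def encode_form_fields(fields, boundary=None):
    """Build a multipart/form-data body for plain (non-file) fields.

    Each part has only a Content-Disposition header, with no per-part
    Content-Type or Content-Transfer-Encoding, as HTML5 requires.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(
            b'Content-Disposition: form-data; name="'
            + name.encode("utf-8") + b'"'
        )
        lines.append(b"")                  # blank line ends the part headers
        lines.append(value.encode("utf-8"))
    lines.append(delimiter + b"--")        # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    content_type = "multipart/form-data; boundary=" + boundary
    return content_type, body
```

Calling encode_form_fields({"description": "Testing new AW Mobile"}) returns the request's Content-Type value and a body whose single part matches the "want it to look like" capture in the question.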

SignatureDoesNotMatch error when Content-type is 'text/*' using TAmazonStorageService.UploadObject

Using the following Delphi XE2 (update 4) code:
var
ConInfo: TAmazonConnectionInfo;
RespInfo: TCloudResponseInfo;
Service: TAmazonStorageService;
Content: TBytes;
Headers: TStringList;
begin
ConInfo:=TAmazonConnectionInfo.Create(self);
ConInfo.AccountName:='YOUR ACCOUNT NAME';
ConInfo.AccountKey:='YOUR ACCOUNT KEY';
ConInfo.Protocol:='http';
Service:=TAmazonStorageService.Create(ConInfo);
RespInfo:=TCloudResponseInfo.Create;
SetLength(Content, 128);
FillMemory(@Content[0], 128, Byte('x'));
Headers:=TStringList.Create;
Headers.Values['Content-type']:='text/plain';
if not Service.UploadObject('YOUR BUCKET', 'test.txt', Content, TRUE, nil, Headers, amzbaPrivate, RespInfo) then
ShowMessage('Failed:' + RespInfo.StatusMessage);
I always get an error on the call to UploadObject:
Failed:HTTP/1.1 403 Forbidden - The request signature we calculated
does not match the signature you provided. Check your key and signing
method. (SignatureDoesNotMatch)
This only happens when the Content-type is set to 'text/plain', 'text/html', or text anything. Using exactly the same code, if you just change the content-type to any other content-type, e.g. 'video/3gpp', then it works as expected and without error. The actual content of the object being uploaded isn't relevant and has no bearing on getting the error or not.
I've traced through the Indy code in Delphi, but I'm stumped as to why the text content type always gives this error.
Any ideas?
If you append "; charset=ISO-8859-1" to the Content-Type string, then it works:
Headers.Values['Content-type']:='text/plain; charset=ISO-8859-1';
Stepping through the code I see the Content-Type is being changed in TIdEntityHeaderInfo.SetHeaders (IdHTTPHeaderInfo.pas) which is called from TIdHTTPProtocol.BuildAndSendRequest (IdHTTP.pas).
Ultimately, it looks like the problem is that TIdEntityHeaderInfo.SetContentType (IdHTTPHeaderInfo.pas) is appending a character set to the content type if it is 'text' and it doesn't already have one. It shouldn't be changing the content type in these situations because the content type is part of the string to be signed, so changing it after the signing makes the signature invalid.
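The effect is easy to reproduce outside Delphi. The Python sketch below assumes a simplified Signature Version 2 style string to sign (verb, Content-MD5, Content-Type, date, resource) and leaves out the x-amz-* headers a real request would include; the key, bucket, and date values are placeholders. It only illustrates that the signature depends on the exact Content-Type string.

```python
import base64
import hashlib
import hmac


def sign_put(secret_key, bucket, key, content_type, date):
    """Return a base64 HMAC-SHA1 signature over a simplified string to sign."""
    string_to_sign = "\n".join([
        "PUT",
        "",                          # Content-MD5, left empty in this sketch
        content_type,                # part of the signed data
        date,
        "/%s/%s" % (bucket, key),    # canonicalized resource
    ])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values, not real credentials.
date = "Tue, 27 Mar 2012 19:36:42 +0000"
signed = sign_put("SECRET", "mybucket", "test.txt", "text/plain", date)
on_wire = sign_put("SECRET", "mybucket", "test.txt",
                   "text/plain; charset=ISO-8859-1", date)
print(signed == on_wire)
```

If the client signs with 'text/plain' but the HTTP layer then sends 'text/plain; charset=ISO-8859-1', the server recomputes the signature from the header it received and the two values differ, which is consistent with the SignatureDoesNotMatch error above.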
I had the very same problem. I also used application/octet-stream as the content type, but still had some trouble. Later on, I discovered that bucket names have to be in lowercase. (In the US Standard region, Amazon allows you to define buckets with uppercase or mixed-case names; however, those buckets are not accessible through the HTTP API, including TAmazonStorageService.) Instead of a "not found" message, I still got the 403 error (unauthenticated user).
However, once I changed the name to all lowercase, it worked fine.
Hope it helps

Microsoft Translator API answers 500 internal server error

I'm trying to use Microsoft's Translator API in my Rails app. Unfortunately, and rather unexpectedly, the server always answers with an internal server error. I also tried it manually with Poster[1] and I get the same results.
In more detail, what am I doing? I'm creating an XML string which goes into the body of the request. I used the C# example from the API documentation. Then I'm just invoking the REST service.
My code looks like this:
xmlns1 = "http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2"
xmlns2 = "http://schemas.microsoft.com/2003/10/Serialization/Arrays"
xml_builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
xml.TranslateArrayRequest("xmlns:ms" => xmlns1, "xmlns:arr" => xmlns2) {
xml.AppId token #using temporary token instead of appId
xml.From source
xml.To target
xml.Options {
xml["ms"].ContentType {
xml.text "text/html"
}
}
xml.Texts {
translate.each do |key,val|
xml["arr"].string {
xml.text CGI::unescape(val)
}
end
}
}
end
headers = {
'Content-Type' => 'text/xml'
}
uri = URI.parse(@@msTranslatorBase + "/TranslateArray" + "?appId=" + token)
req = Net::HTTP::Post.new(uri.path, headers)
req.body = xml_builder.to_xml
response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
# [...]
The xml_builder produces something like the following XML. Unlike the example from the API page, I'm declaring the two namespaces once on the root element instead of repeating them on the individual tags (mainly because I wanted to reduce the overhead). This doesn't seem to be the problem, though: when I do it like the documentation example, I also get an internal server error.
<?xml version="1.0" encoding="UTF-8"?>
<TranslateArrayRequest xmlns:ms="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" xmlns:arr="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<AppId>TX83NVx0MmIxxCzHjPwo2_HgYN7lmWIBqyjruYm7YzCpwnkZL5wtS5oucxqlEFKw9</AppId>
<From>de</From>
<To>en</To>
<Options>
<ms:ContentType>text/html</ms:ContentType>
</Options>
<Texts>
<arr:string>Bitte übersetze diesen Text.</arr:string>
<arr:string>Das hier muss auch noch übersetzt werden.</arr:string>
</Texts>
</TranslateArrayRequest>
Every time I request the service it answers with
#<Net::HTTPInternalServerError 500 The server encountered an error processing the request. Please see the server logs for more details.>
... except I do some unspecified things, like using GET instead of POST, then it answers with something like "method not allowed".
I thought it might be something wrong with the XML stuff, because I can request an AppIdToken and invoke the Translate method without problems. But to me, the XML looks just fine. The documentation states that there is a schema for the expected XML:
The request body is a xml string generated according to the schema specified at http://api.microsofttranslator.com/v2/Http.svc/help
Unfortunately, I cannot find anything on that.
So now my question(s): Am I doing something wrong? Maybe someone experienced similar situations and can report on solutions or work-arounds?
[1] Poster FF plugin > addons.mozilla.org/en-US/firefox/addon/poster/
Well, after lots of trial and error I think I made it. So in case someone has similar problems, here is how I fixed this:
Apparently, the API is kind of fussy about the incoming XML. But since there is no schema (or at least I couldn't find the one specified in the documentation), it's kind of hard to do it the right way: the ordering of the tags is crucial!
<TranslateArrayRequest>
<AppId/>
<From/>
<Options />
<Texts/>
<To/>
</TranslateArrayRequest>
When the XML has this ordering it works. Otherwise you'll only see the useless internal server error response. Furthermore, I read a couple of times that the API also breaks if the XML contains improper UTF-8. One can clean up untrusted UTF-8 (e.g. coming from a user form) this way:
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]
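For readers outside Ruby, both points (fixed element order, dropping invalid UTF-8 bytes) can be sketched in Python as follows. The helper names are made up, the Options content is left empty, and only the array namespace from the question is applied, so treat it as an outline of the idea rather than a complete request builder.

```python
import xml.etree.ElementTree as ET

ARR_NS = "http://schemas.microsoft.com/2003/10/Serialization/Arrays"
# Element order reported to work in the answer above.
REQUIRED_ORDER = ["AppId", "From", "Options", "Texts", "To"]


def clean_utf8(raw):
    """Decode bytes as UTF-8, silently dropping invalid sequences."""
    return raw.decode("utf-8", errors="ignore")


def build_request(app_id, source, target, texts):
    """Build a TranslateArrayRequest element with children in fixed order."""
    values = {"AppId": app_id, "From": source, "To": target}
    root = ET.Element("TranslateArrayRequest")
    for tag in REQUIRED_ORDER:
        child = ET.SubElement(root, tag)
        if tag == "Texts":
            for raw in texts:
                item = ET.SubElement(child, "{%s}string" % ARR_NS)
                item.text = clean_utf8(raw)
        elif tag in values:
            child.text = values[tag]
    return root
```

ET.tostring(build_request(token, "de", "en", texts), encoding="unicode") then yields the request body, where token and texts are whatever your application supplies (texts as a list of byte strings).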

Programmatically get the size of a remote file using Delphi, before downloading it

How can I determine the size in bytes of a remote file hosted on the web, before downloading it, using Delphi?
Thanks in advance.
You could use Indy.
First include IdHTTP.
You can retrieve the size this way:
procedure TFormMain.Button1Click(Sender: TObject);
var
Http: TIdHTTP;
begin
Http := TIdHTTP.Create(nil);
try
Http.Head('http://live.sysinternals.com/ADExplorer.exe');
ShowMessage(IntToStr(Http.Response.ContentLength));
finally
Http.Free;
end;
end;
Short answer: use the HTTP HEAD command, available in the TIdHttp component of Indy Delphi.
Details:
HTTP protocol defines a HEAD method.
9.4 HEAD
The HEAD method is identical to GET
except that the server MUST NOT return
a message-body in the response. The
metainformation contained in the HTTP
headers in response to a HEAD request
SHOULD be identical to the information
sent in response to a GET request.
This method can be used for obtaining
metainformation about the entity
implied by the request without
transferring the entity-body itself.
This method is often used for testing
hypertext links for validity,
accessibility, and recent
modification.
The response to a HEAD request MAY be
cacheable in the sense that the
information contained in the response
MAY be used to update a previously
cached entity from that resource. If
the new field values indicate that the
cached entity differs from the current
entity (as would be indicated by a
change in Content-Length, Content-MD5,
ETag or Last-Modified), then the cache
MUST treat the cache entry as stale.
HEAD asks for a response identical to the one a GET request would produce, but without the response body: you receive the complete response headers without transferring the content itself.
The HTTP response headers retrieved are documented in List of HTTP headers on Wikipedia.
http://en.wikipedia.org/wiki/List_of_HTTP_headers
HTTP Headers form the core of an HTTP request,
and are very important in an HTTP response.
They define various characteristics of the data
that is requested or the data that has been provided.
The headers are separated from the request or
response body by a blank line. HTTP headers
can be near-arbitrary strings, but only some
are commonly understood.
One of the headers that is usually (though not always) present when a URL returns content is the Content-Length header.
14.13 Content-Length
The Content-Length entity-header field indicates the
size of the entity-body, in decimal number of OCTETs,
sent to the recipient or, in the case of the HEAD method,
the size of the entity-body that would have been sent
had the request been a GET.
Content-Length = "Content-Length" ":" 1*DIGIT
An example is
Content-Length: 3495
Applications SHOULD use this field to indicate the
transfer-length of the message-body, unless this is
prohibited by the rules in section 4.4.
Any Content-Length greater than or equal to zero is a
valid value. Section 4.4 describes how to determine
the length of a message-body if a Content-Length is not given.
Note that the meaning of this field is significantly
different from the corresponding definition in MIME,
where it is an optional field used within the
"message/external-body" content-type. In HTTP,
it SHOULD be sent whenever the message's length
can be determined prior to being transferred,
unless this is prohibited by the rules in section 4.4.
In Delphi, drop a TIdHTTP component onto your form and paste the following code into one of your event handlers.
var
url: string; // must contain a fully qualified url
contentLength: integer;
begin
....
contentLength:=0;
try
Idhttp1.Head(url);
contentLength:=idhttp1.response.ContentLength;
except end;
....
Be aware that not ALL servers will return a valid content size for a head request. If the content length = 0, then you will ONLY know if you issue a GET request. For example the HEAD request against the Google logo returns a 0 content-length, however a GET returns the proper length, but also retrieves the image. Some servers will return content-length as the length of the packet following the header.
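The same caveat can be made explicit in code by treating a missing Content-Length as "unknown" instead of 0. The following is a Python sketch of that idea using only the standard library; the function name is made up, and it does not follow redirects.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit


def remote_file_size(url, timeout=10):
    """Return the Content-Length reported for a HEAD request, or None.

    None means the server did not answer 200 or sent no Content-Length,
    so the size is unknown without downloading the body.
    """
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        length = resp.getheader("Content-Length")
        if resp.status != 200 or length is None:
            return None
        return int(length)
    finally:
        conn.close()
```

A caller would check for None and fall back to a GET only when the size really matters.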
You can also use Synapse to get this information. Note that the data is transferred, but the buffer is thrown away. This is a much more reliable method, but at the cost of additional bandwidth.
var
HTTP : tHTTPSend;
begin
HTTP := THTTPSend.Create;
try
HTTP.HTTPMethod('GET',url);
DownloadSize := HTTP.DownloadSize;
finally
HTTP.Free;
end;
end;
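For comparison, here is the "download and discard" approach as a Python sketch. It reads the body in chunks and keeps only a running byte count, so memory use stays small even though the full transfer still happens. The function name and chunk size are arbitrary choices for illustration.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit


def download_size_by_get(url, chunk_size=65536, timeout=10):
    """Issue a GET, count the body bytes, and discard them as they arrive."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        resp = conn.getresponse()
        total = 0
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:          # empty read means the body is finished
                break
            total += len(chunk)
        return total
    finally:
        conn.close()
```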

Supporting the "Expect: 100-continue" header with ASP.NET MVC

I'm implementing a REST API using ASP.NET MVC, and a little stumbling block has come up in the form of the Expect: 100-continue request header for requests with a post body.
RFC 2616 states that:
Upon receiving a request which
includes an Expect request-header
field with the "100-continue" expectation, an origin server MUST
either respond with 100 (Continue) status and continue to read
from the input stream, or respond with a final status code. The
origin server MUST NOT wait for the request body before sending
the 100 (Continue) response. If it responds with a final status
code, it MAY close the transport connection or it MAY continue
to read and discard the rest of the request. It MUST NOT
perform the requested method if it returns a final status code.
This sounds to me like I need to make two responses to the request, i.e. it needs to immediately send an HTTP 100 Continue response, then continue reading from the original request stream (i.e. HttpContext.Request.InputStream) without ending the request, and then finally send the resultant status code (for the sake of argument, let's say it's a 204 No Content result).
So, questions are:
1. Am I reading the specification right, that I need to make two responses to a request?
2. How can this be done in ASP.NET MVC?
w.r.t. (2) I have tried using the following code before proceeding to read the input stream...
HttpContext.Response.StatusCode = 100;
HttpContext.Response.Flush();
HttpContext.Response.Clear();
...but when I try to set the final 204 status code I get the error:
System.Web.HttpException: Server cannot set status after HTTP headers have been sent.
By default, the .NET Framework sends the Expect: 100-continue header with every HTTP 1.1 POST. This behavior can be programmatically controlled per request via the System.Net.ServicePoint.Expect100Continue property like so:
HttpWebRequest httpReq = GetHttpWebRequestForPost();
httpReq.ServicePoint.Expect100Continue = false;
It can also be globally controlled programmatically:
System.Net.ServicePointManager.Expect100Continue = false;
...or globally through configuration:
<system.net>
<settings>
<servicePointManager expect100Continue="false"/>
</settings>
</system.net>
Thank you Lance Olson and Phil Haack for this info.
100-continue should be handled by IIS. Is there a reason why you want to do this explicitly?
IIS handles the 100.
That said, no it's not two responses. In HTTP, when the Expect: 100-continue comes in as part of the message headers, the client should be waiting until it receives the response before sending the content.
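One way to picture what the RFC text describes: on the wire, the 100 (Continue) is an interim status line that precedes the single final status on the same connection, and per the answers here it is emitted by IIS rather than by the application. The Python sketch below only parses a hand-written byte string to show that sequence; the function name and the sample bytes are made up, and it is not ASP.NET code.

```python
def split_response_heads(raw):
    """Return (status_code, status_line) for each response head in raw.

    1xx heads are interim; parsing stops at the first final (>= 200) head,
    because anything after it would be that response's body.
    """
    heads = []
    rest = raw
    while rest:
        head, sep, rest = rest.partition(b"\r\n\r\n")
        if not sep:                # incomplete head, nothing more to parse
            break
        status_line = head.split(b"\r\n", 1)[0].decode("ascii")
        code = int(status_line.split(" ", 2)[1])
        heads.append((code, status_line))
        if code >= 200:
            break
    return heads


# What a client might read from the socket for one request.
wire = (b"HTTP/1.1 100 Continue\r\n\r\n"
        b"HTTP/1.1 204 No Content\r\nServer: example\r\n\r\n")
print(split_response_heads(wire))
```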
Because of the way ASP.NET is architected, you have little control over the output stream. Any data that gets written to the stream is automatically put in a 200 response with chunked encoding whenever you flush, whether you're in buffered mode or not.
Sadly, all this stuff is hidden away in internal methods all over the place, and the result is that if you rely on ASP.NET, as MVC does, you're pretty much unable to bypass it.
Wait till you try and access the input stream in a non-buffered way. A whole load of pain.
Seb
