Using non-Latin characters in Suave (F#)

I want to use non-Latin characters in Suave, for example Cyrillic, but I'm getting a weird result.
MCVE
open Suave
open Suave.Filters
open Suave.Operators
open Suave.Successful

let app =
    choose [
        GET >=> OK "Привет, Мир!"
    ]

startWebServer defaultConfig app
Result: the browser shows garbled characters instead of the Cyrillic text.
So the question is: how do I fix it?

For text-only responses you need to set the MIME type, including its encoding: >=> setMimeType "text/plain; charset=utf-8"
Set the Content-Type header to the mime type given. Remember that it should include the encoding of your content. So for example, specifying a mimeType value of 'application/json; charset=utf-8' is strongly recommended (but replace 'json' with your own MIME type, of course ;))
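For comparison outside Suave, here is a minimal sketch of the same idea using the JDK's built-in com.sun.net.httpserver package (an assumption; the answer itself only concerns Suave): the body is encoded as UTF-8, and the Content-Type header names that charset, which is the part that corresponds to setMimeType in the Suave pipeline. The class name Utf8Server is hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class Utf8Server {
    // The greeting from the question, written as Unicode escapes so the file
    // compiles regardless of the platform's default source encoding
    static final String GREETING = "\u041F\u0440\u0438\u0432\u0435\u0442, \u041C\u0438\u0440!";

    // Starts a server on the given port (0 = any free port) that answers every
    // request with the greeting as UTF-8 bytes
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            byte[] body = GREETING.getBytes(StandardCharsets.UTF_8);
            // Declaring the charset stops the browser from guessing (e.g. Latin-1)
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Listening on http://127.0.0.1:8080/");
    }
}
```

Removing the charset parameter from the header value is an easy way to reproduce the garbled output in a browser.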

That looks like UTF-8 being interpreted as Latin-1. Try adding >=> setMimeType "text/html; charset=utf-8" to your app and see if that makes the browser treat your UTF-8 as actual UTF-8 instead of defaulting to the incorrect Latin-1.
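The "UTF-8 read as Latin-1" diagnosis can be reproduced without a server. This is a minimal sketch (class and method names are hypothetical): it encodes the greeting as UTF-8 and then decodes the same bytes as ISO-8859-1, as a browser would when no charset is declared.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // The greeting from the question, as Unicode escapes for portability
    static final String GREETING = "\u041F\u0440\u0438\u0432\u0435\u0442, \u041C\u0438\u0440!";

    // Simulates a client that receives UTF-8 bytes but decodes them as Latin-1:
    // each two-byte Cyrillic letter turns into two unrelated Latin-1 characters
    static String decodedAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The same bytes decoded with the charset they were actually written in
    static String decodedAsUtf8(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodedAsLatin1(GREETING)); // garbled, 21 characters
        System.out.println(decodedAsUtf8(GREETING));   // the original 12-character greeting
    }
}
```

The length change (12 characters in, 21 out) is a quick tell: nine Cyrillic letters take two bytes each in UTF-8, and Latin-1 maps every byte to its own character.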

Related

dojo/request/xhr text charset

I'm having trouble getting the correct encoding for a text file with xhr.
xhr(content.getContentUrl(), {
    handleAs: "text",
    headers: { 'Content-Type': 'text/plain; charset=iso-8859-1' }
}).then(function (data) {
    console.log("DATA");
    console.log(data);
    ...
});
The data is a text file that should contain ISO-8859-1 characters, but I get a replacement character instead of each special character, as if the response were being decoded as UTF-8.
Example: "PER-RW-C-MC-013,B,ABB, P�rtico 5B. Fundaciones. Memoria de C�lculo,17/06/2011,23/06/2011,17/06/2011,01/07/2011,24/06/2011,20/07/2011,24/06/2011,19/07/2011,0,PER-RW-C-MC-013-C,PER-RW-C-MC-013-A"
Note: content.getContentUrl() is a method from the IBM FileNet API that returns the URL of the text file in a FileNet repository.
Thanks in advance.
In response to your xhr request, code on your server reads the file into a string and sends that string back as part of the response. This may very well be where the problem arises. See, for example, here (a PHP case) for a situation where this happened and a solution is suggested.
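The symptom in the question (a replacement character where each accented letter should be) is what happens when ISO-8859-1 bytes are decoded as UTF-8. A minimal sketch, with hypothetical names, using the word "Pórtico" from the sample data:

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // "Portico" with an o-acute, taken from the question's sample data
    static final String WORD = "P\u00F3rtico";

    // Assume the file is stored as ISO-8859-1, where o-acute is the single byte 0xF3
    static byte[] latin1Bytes() {
        return WORD.getBytes(StandardCharsets.ISO_8859_1);
    }

    // 0xF3 starts a multi-byte UTF-8 sequence that never completes, so the
    // decoder substitutes U+FFFD (the replacement character) for it
    static String decodedAsUtf8() {
        return new String(latin1Bytes(), StandardCharsets.UTF_8);
    }

    // Decoding with the charset the file was actually written in recovers the text
    static String decodedAsLatin1() {
        return new String(latin1Bytes(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodedAsUtf8());
        System.out.println(decodedAsLatin1());
    }
}
```

Whichever side does the decoding (the server code that reads the file into a string, or the client), it has to use the charset the bytes were written in.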

Is there a default media type for GET and other methods

Is there a default media type in RESTCONF when the request does not specify any supported media type?
No. There is no standard default. This is server implementation dependent, so do not rely on it.
From draft-ietf-netconf-restconf-17, Section 5.3, Message Encoding:
The server MUST support the "Accept" header field and "406 Not
Acceptable" status-line, as defined in [RFC7231]. The response
output content encoding formats that the client will accept are
identified with the Accept header field in the request. If it is not
specified, the request input encoding format SHOULD be used, or the
server MAY choose any supported content encoding format.
If there was no request input, then the default output encoding is
XML or JSON, depending on server preference. File extensions encoded
in the request are not used to identify format encoding.
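The selection order in the quoted text (Accept header first, then the request input encoding, then server preference between XML and JSON) can be sketched as a small function. This is a hypothetical helper, not part of any RESTCONF server; the Accept parsing is deliberately simplified (q-values are ignored), and treating */* as "any supported format" is an assumption based on general HTTP semantics.

```java
import java.util.List;

public class RestconfEncoding {
    static final String XML = "application/yang-data+xml";
    static final String JSON = "application/yang-data+json";
    static final List<String> SUPPORTED = List.of(XML, JSON);

    // Returns the response media type, or null when the Accept header lists nothing
    // supported (the server should then answer 406 Not Acceptable).
    // accept / requestContentType are null when the request lacks those headers;
    // preferJson models "server preference".
    static String chooseOutput(String accept, String requestContentType, boolean preferJson) {
        String serverChoice = preferJson ? JSON : XML;
        if (accept != null) {
            // Simplified: first supported entry wins, q-values ignored
            for (String entry : accept.split(",")) {
                String type = entry.split(";")[0].trim();
                if (SUPPORTED.contains(type)) {
                    return type;
                }
                if (type.equals("*/*")) {
                    return serverChoice;
                }
            }
            return null;
        }
        if (requestContentType != null) {
            // No Accept header: the request input encoding SHOULD be used
            String type = requestContentType.split(";")[0].trim();
            if (SUPPORTED.contains(type)) {
                return type;
            }
        }
        // No usable hint: XML or JSON, depending on server preference
        return serverChoice;
    }
}
```

The last branch is the answer to the question: when nothing constrains the choice, the result depends on the server's preference, so a client should always send Accept.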
And from draft-ietf-netconf-restconf-17, Section 7.1, Error Response Message:
The client SHOULD specify the desired encoding(s) for response
messages by specifying the appropriate media-type(s) in the Accept
header. If the client did not specify an Accept header, then the
same structured syntax name suffix used in the request message SHOULD
be used, or the server MAY choose any supported message encoding
format. If there is no request message the server MUST select
"application/yang-data+xml" or "application/yang-data+json",
depending on server preference.
The final RFC (RFC 8040) kept the draft's wording, just as @predi said:
On Message Encoding, Section 5.2:
If there was no request input, then the default output encoding is
XML or JSON, depending on server preference. File extensions encoded
in the request are not used to identify format encoding.
And on Error Response Message, Section 7.1:
If the client did not specify an "Accept" header, then the same
structured syntax name suffix used in the request message SHOULD be
used, or the server MAY choose any supported message-encoding
format. If there is no request message, the server MUST select
"application/yang-data+xml" or "application/yang-data+json",
depending on server preference.
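For the error-response rule, the interesting part is the "structured syntax name suffix": the +json or +xml at the end of the request's media type. A minimal sketch for the case where the client sent no Accept header (the helper is hypothetical; application/yang-patch+json in the usage below is just an example of a request type carrying a +json suffix):

```java
public class RestconfErrorEncoding {
    static final String XML = "application/yang-data+xml";
    static final String JSON = "application/yang-data+json";

    // Picks the media type for an error response when the request had no Accept header.
    // requestContentType is null when there was no request message body.
    static String errorMediaType(String requestContentType, boolean preferJson) {
        String serverChoice = preferJson ? JSON : XML;
        if (requestContentType == null) {
            return serverChoice; // no request message: server MUST pick one of the two
        }
        // Strip parameters such as "; charset=utf-8", then read the suffix after '+'
        String type = requestContentType.split(";")[0].trim();
        int plus = type.lastIndexOf('+');
        String suffix = plus >= 0 ? type.substring(plus + 1) : "";
        switch (suffix) {
            case "json":
                return JSON;
            case "xml":
                return XML;
            default:
                return serverChoice; // no usable suffix: server MAY choose
        }
    }

    public static void main(String[] args) {
        System.out.println(errorMediaType("application/yang-patch+json", false));
        System.out.println(errorMediaType(null, true));
    }
}
```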

character encoding in WildFly

What is the default character encoding for incoming requests in WildFly?
Does setting the encoding in the Content-Type header of a request ensure that it will be used?
Thanks,
Tiberiu
You are talking about two different encodings:
Request encoding: this is the encoding of, for example, the parameters in the URL. By default it is UTF-8, but if you want to change it to ISO-8859-1 (for example), you can do so with <http-listener url-charset="ISO-8859-1" .../> under the undertow subsystem in your configuration file.
Content-Type encoding: this is the encoding you declare for your content, and it is controlled by the Content-Type HTTP header and its charset parameter, e.g. Content-Type: text/html; charset=ISO-8859-1
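The two encodings can be illustrated separately with the JDK. This sketch assumes Java 10+ (for URLDecoder.decode(String, Charset)); charsetOf is a hypothetical, simplified header parser, not a WildFly or Undertow API.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TwoEncodings {
    // Request (URL) encoding: the same percent-encoded query value decodes differently
    // depending on the charset the server applies (the role of url-charset)
    static String decodeQueryValue(String raw, Charset urlCharset) {
        return URLDecoder.decode(raw, urlCharset);
    }

    // Content-Type encoding: extract the charset parameter from a header value such as
    // "text/html; charset=ISO-8859-1", falling back when none is present.
    // Charset.forName throws for names the JVM does not support.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String param : contentType.split(";")) {
            String[] kv = param.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                String value = kv[1].trim();
                // Parameter values may be quoted
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Charset.forName(value);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(decodeQueryValue("caf%C3%A9", StandardCharsets.UTF_8));
        System.out.println(decodeQueryValue("caf%C3%A9", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```

The percent-encoded bytes C3 A9 are one character under UTF-8 but two under ISO-8859-1, which is why client and server must agree on the URL charset.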

stripping out the "content-type" from TIdMultiPartFormDataStream

This is related to this post. I am trying to post some form data using TIdHTTP and TIdMultiPartFormDataStream, but when I monitor the communication with Wireshark, each form field has a Content-Type: text/plain header attached to it, and for some reason the server I am sending the data to does not accept it. Is there a way to make sure only the name and value are sent?
The Content-Transfer-Encoding header (the ContentTransfer property) was also being added, and I was able to remove it using:
aFieldItem := PostStream.AddFormField(fName, fValue);
aFieldItem.ContentTransfer := '';
but I cannot find any way to get rid of the content type.
At the moment, the data being sent looks like this (in Wireshark):
Boundary: \r\n----------051715151353026\r\n
Encapsulated multipart part: (text/plain)
Content-Disposition: form-data; name="description"\r\n
Content-Type: text/plain\r\n
Line-based text data: text/plain
\r\n
Testing new AW Mobile
and I want it to look like:
Boundary: \r\n------WebKitFormBoundary32hCBG8zkGMBpxqL\r\n
Encapsulated multipart part:
Content-Disposition: form-data; name="description"\r\n
Data (21 bytes)
Data: 0d0a5465737420616e6420747261636520636f6d6d
Length: 21
Thank you
Sam
HTML5 Section 4.10.22.7 alters how RFC 2388 applies to webform submissions:
The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388).
This is different from RFC 2388:
As with all multipart MIME types, each part has an optional "Content-Type", which defaults to text/plain.
Your server is clearly expecting the HTML5 behavior.
The Content-Type header on each MIME part added to TIdMultipartFormDataStream can be omitted by setting the TIdFormDataField.ContentType property to a space character (not a blank string, as the ContentTransfer property allows):
aFieldItem := PostStream.AddFormField(fName, fValue);
aFieldItem.ContentTransfer := '';
aFieldItem.ContentType := ' '; // <-- here
If you set the ContentType property to a blank string, it will set the Content-Type header to application/octet-stream, but assigning a space character instead has a side effect of omitting the header when the property setter parses the new value.
That being said, I have already made some changes to TIdMultipartFormDataStream to account for this change in webform submission in HTML5, but I have not finalized and released it yet.
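For reference, the HTML5 behavior the server expects can be sketched by building the body by hand. This is a minimal, hypothetical helper (not Indy and not any HTTP library's API) that handles plain text fields only; it does not escape quotes in field names, and the resulting string would still have to be sent as UTF-8 bytes with a "multipart/form-data; boundary=..." Content-Type on the request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormDataBuilder {
    private static final String CRLF = "\r\n";

    // Builds a multipart/form-data body containing only non-file fields.
    // Field names are assumed to contain no quotes or line breaks.
    static String build(String boundary, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            sb.append("--").append(boundary).append(CRLF);
            // Per HTML5, non-file parts carry Content-Disposition but no Content-Type
            sb.append("Content-Disposition: form-data; name=\"").append(field.getKey()).append('"').append(CRLF);
            sb.append(CRLF); // blank line separates the part headers from the value
            sb.append(field.getValue()).append(CRLF);
        }
        sb.append("--").append(boundary).append("--").append(CRLF); // closing delimiter
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("description", "Testing new AW Mobile");
        System.out.print(build("ExampleBoundary", fields));
    }
}
```

A file part would still include its own Content-Type; the HTML5 rule only drops it for non-file fields.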

Passing special characters via apache HTTPClient

I have a servlet that accepts HTML content as a request parameter. The HTML is localized and may be French, Spanish, etc.
For testing purposes, I'm also using Apache HttpClient to send requests to this servlet, with the following header definitions:
HttpClient client = new HttpClient();
PostMethod method = new PostMethod("<URL>");
String html = FileUtils.readFileToString(inputHTMLFile, "UTF-8");
method.addParameter("html", html);
method.addRequestHeader("Accept", "*/*");
method.setRequestHeader("accept-charset", "UTF-8");
Whatever HTML is read is UTF-8 encoded; sample text:
Télécharger un fichier
However, when I get the HTML from the request parameter, that text becomes T?l?charger un fichier
I went through a few links, such as http://www.oracle.com/technetwork/articles/javase/httpcharset-142283.html, which talk about charsets and how a browser would normally encode special characters. If I URL-encode the HTML with UTF-8 and then decode it with the same charset in the servlet, I get the HTML as expected.
Is this the only thing I can do to preserve the encoding? Am I missing something?
Thanks.
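The workaround described in the question (URL-encoding with UTF-8 on the sending side and decoding with the same charset in the servlet) looks like this with the JDK's URLEncoder and URLDecoder. The Charset overloads assume Java 10 or later; the class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodeRoundTrip {
    // Sender: each non-ASCII character becomes its UTF-8 bytes as %XX escapes,
    // and spaces become '+', so only ASCII goes over the wire
    static String encode(String html) {
        return URLEncoder.encode(html, StandardCharsets.UTF_8);
    }

    // Receiver: must use the same charset, or the %XX bytes are misread
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "T\u00E9l\u00E9charger un fichier";
        String wire = encode(text);
        System.out.println(wire);          // T%C3%A9l%C3%A9charger+un+fichier
        System.out.println(decode(wire));  // the original text
    }
}
```

This works because the charset is fixed by convention on both ends; the answers below instead declare it in the request so the container can decode the parameter itself.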
Now that the issue with the file itself is fixed, try modifying your code as follows:
HttpClient client = new HttpClient();
PostMethod postMethod = new PostMethod("<URL>");
postMethod.getParams().setContentCharset("utf-8"); //The line I added
...
Note that the client now needs to decode the request as UTF-8. French and Spanish worked correctly because their characters are included in the default ISO-8859-1 charset; Chinese characters are not. If the French and Spanish text was decoded correctly on the client, the client is decoding the request as ISO-8859-1, and sending UTF-8 could fail.
So you could also try adding this:
postMethod.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
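The point that French and Spanish fit in ISO-8859-1 while Chinese does not can be checked directly with CharsetEncoder.canEncode from the JDK. A minimal sketch; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Coverage {
    // True when every character of the text has an ISO-8859-1 representation.
    // A fresh encoder is created per call because CharsetEncoder is not thread-safe.
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("T\u00E9l\u00E9charger")); // French: true
        System.out.println(fitsLatin1("Espa\u00F1ol"));          // Spanish: true
        System.out.println(fitsLatin1("\u4E0B\u8F7D"));          // Chinese: false
    }
}
```

Text that fails this check cannot survive an ISO-8859-1 hop at all, which is why declaring UTF-8 explicitly matters once non-Western content is involved.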
Just try this for the POST request:
HttpPost request = new HttpPost(webServiceUrl);
// Encode the body as UTF-8 and declare that charset in the Content-Type header
StringEntity entity = new StringEntity(YourData, ContentType.create("application/json", "UTF-8"));
request.setEntity(entity);
Alternatively, you could Base64-encode the string before sending it.
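A minimal sketch of the Base64 approach with java.util.Base64 (class and method names are hypothetical). Base64 only carries bytes, so both sides still have to agree on the charset used to turn the text into bytes, here UTF-8. Also note that standard Base64 output can contain '+', '/' and '=', which still need URL-encoding inside a form body; Base64.getUrlEncoder() avoids '+' and '/'.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Transport {
    // Sender: text -> UTF-8 bytes -> ASCII-only Base64
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Receiver: Base64 -> bytes -> text, using the same charset as the sender
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "T\u00E9l\u00E9charger un fichier";
        String wire = encode(text);
        System.out.println(wire);
        System.out.println(decode(wire).equals(text)); // true
    }
}
```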
I think I've found the cause by examining the decompiled EntityBuilder code: EntityBuilder ignores the contentEncoding field when encoding the parameters and uses the charset from the contentType field instead. And looking at org.apache.http.entity.ContentType, the only predefined value with UTF-8 is org.apache.http.entity.ContentType.APPLICATION_JSON.
So in my case
HttpPost method = new HttpPost("<URL>");
EntityBuilder builder = EntityBuilder.create();
builder.setContentType(ContentType.APPLICATION_JSON);
builder.setContentEncoding(StandardCharsets.UTF_8.name());
...
method.setEntity(builder.build());
did the job (although I think setting contentType is redundant here).
I'm using httpclient-osgi version 4.5.4.
PostMethod method = new PostMethod("URL");
method.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
