How can I simulate HTTP requests in different encodings? - ruby-on-rails

I run a Ruby on Rails application, and since the site is becoming increasingly popular internationally, I have started getting encoding-related errors, e.g.:
Encoding::UndefinedConversionError: "\xE8" from ASCII-8BIT to UTF-8
I'm trying to find an HTTP request simulator that supports various encodings to reproduce the errors, but I'm not having much luck.
Does anyone know how to simulate or test HTTP requests with non-UTF-8 parameters / path info?

An encoding is just how you represent your text as bytes. As long as you encode and decode the text using the same encoding, you should be fine. If you encode and decode it using different encodings, the byte sequences will be interpreted differently and you end up with errors.
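To make that concrete in Ruby (this is exactly the error from the question):
latin1_bytes = "\xE8".b                        # "è" as one ISO-8859-1 byte, tagged ASCII-8BIT
# latin1_bytes.encode("UTF-8")                 # => Encoding::UndefinedConversionError
# Telling Ruby what the bytes really are makes the conversion well defined:
latin1_bytes.force_encoding("ISO-8859-1").encode("UTF-8")   # => "è"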
Normally, you are in control of the encodings used and the webserver can handle basic conversions.
Browser <--encoding--> server <--encoding--> files
There is normally no need to "find an HTTP request simulator that supports various encodings", since you usually define which one is used on the server side, or the web server handles the conversion.
If some strange client uses some strange encoding that can't be recognized, I'd say it's either a serious issue in your web server, your configuration, or something similar... or the files themselves are not encoded in the same format you use to read them.
...lastly, I believe almost any HTTP client supports many encodings for the body.
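For instance (a hedged Ruby sketch; the URL is just a placeholder), a plain Net::HTTP client can send a body in any encoding as long as the Content-Type declares the charset:
require "net/http"
uri  = URI("http://localhost:3000/comments")   # placeholder endpoint
body = "caffè".encode("ISO-8859-1")            # body bytes in Latin-1
# Declaring the charset tells the server how to decode the bytes.
Net::HTTP.new(uri.host, uri.port).post(uri.path, body,
  "Content-Type" => "text/plain; charset=ISO-8859-1")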
EDIT:
Since you mention URLs:
URLs must be encoded using plain old ASCII. Even if you type fancy UTF-8 characters, the browser will percent-encode them underneath.
http://en.wikipedia.org/wiki/Percent-encoding
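You can see that translation in Ruby; which bytes end up in the URL depends on the client string's encoding, which is where stray "\xE8" bytes come from:
require "uri"
URI.encode_www_form_component("è")                        # => "%C3%A8" (two UTF-8 bytes)
URI.encode_www_form_component("è".encode("ISO-8859-1"))   # => "%E8"    (one Latin-1 byte)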
Using strange encodings for URLs, or invalid characters, is a client error and should be fixed client-side, IMHO.
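That said, if you want to reproduce the original error in an automated way, you don't need a dedicated simulator. Here is a minimal sketch of a Rails integration test (the /search route and the q parameter are made-up examples) that sends the lone Latin-1 byte as a percent-encoded query parameter, the same way a misbehaving client would:
require "test_helper"

class Latin1ParamsTest < ActionDispatch::IntegrationTest
  test "survives a Latin-1 encoded query parameter" do
    # %E8 decodes to the single byte \xE8 ("è" in ISO-8859-1), which is not valid UTF-8.
    get "/search?q=%E8"
    assert_response :success
  end
end
Depending on the Rails version, such a request either surfaces the encoding error from the question or is rejected as a bad request, so the test exposes the behaviour either way.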

Related

Is Indy FTP client caching?

Looking at corrupted files on the FTP server, I am thinking about verifying files uploaded with TIdFtp.Put by downloading them right after upload and comparing them byte by byte.
I think that TIdFtp might theoretically be caching data and returning it from a cache instead of actually downloading.
Please allay or confirm my concerns.
No, there is no caching, as there is no such thing in the FTP protocol in general. TIdFTP deals only with live data.
Are you, perhaps, uploading binary files in ASCII mode? If so, that would alter line break characters (CR and LF) during transmission. That is a common mistake to make, since ASCII is FTP's default mode. Make sure you are setting the TIdFTP.TransferType property as needed before transferring a file. ASCII mode should only be used for text files, if used at all.
And FWIW, you may not need to download a file to verify its bytes. If the server supports any X<Hash> commands (where Hash can be SHA512, SHA256, SHA1, MD5, or CRC), TIdFTP has VerifyFile() methods to use them. That calculates a hash of the local file and then compares it to a hash calculated by the server for the remote file. No transfer of file data is needed.

Camel file component charset with more than one endpoint

When I have a route that sends data to two different file component endpoints, and for one endpoint I don't really care about the encoding but for the other endpoint I need to ensure a certain encoding, should I still set the charsetname on both endpoints?
I'm asking because a client of ours had a problem in that area. The route receives UTF-8 and we need to write ISO-8859-1 to the file.
And now, after the whole hardware was restarted (after a power outage), we found things like "??" instead of the expected "ä".
Now, by specifying the charsetname on all file producer endpoints, we were able to solve the issue.
My actual question now is:
Do you think I can now expect that the problem is solved for good?
Or is there no real relation, and would I be well advised not to lean back until I understand the issue 100%?
Notes that might be helpful:
In addition, before writing to either of those two file endpoints, we also do .convertBodyTo(byte[].class, "iso-8859-1")
We use Camel 2.16.1
In the end, the problem was not about having two file endpoints in one pipeline.
It was about the JVM's default encoding as written here:
http://camel.465427.n5.nabble.com/Q-about-how-to-help-the-file2-component-to-do-a-better-job-td5783381.html

Receiving WM_COPYDATA messages in Delphi XE2 - Unicode related

I have an application, written in Delphi, which I want to use to open files using the Windows "Open With" option. I could do this perfectly happily in pre-Unicode Delphi versions; Windows puts the filename into a WM_COPYDATA message, so I could fish it out using the CopyDataStruct record. But in the Unicode world, this doesn't work; I only get half the filename in the lpdata buffer (followed by garbage). When I examine the cbdata entry in the CopyDataStruct record, I find it contains the length of the filename in number of characters (plus 1 for the terminator), not (as I would have thought it should) the number of bytes, which is of course now twice the number of characters.
Note that the problem is not that my Delphi code fails to read the rest of the characters in the filename out of lpdata^ - I have looked in lpdata^, and they simply are not there.
There are many examples on the web (including in StackOverflow) of how to avoid this issue if you are generating the WM_copydata message yourself; my problem is that I am not generating it, I am receiving it from Windows (64-bit Win7 or Win8). Is there something that Delphi could be putting into the application, which I am not seeing, that is converting ANSI strings in lpdata to Unicode before I get at the WM_CopyData message? And if so, how could I disable it (or make it correct the cbdata value)?
Any help would be greatly appreciated.
The system isn't sending the WM_COPYDATA message. One of the apps is doing that. Very likely your own app!
You've probably got code that enforces a single instance. The second instance starts in response to the shell action. It detects an existing app and sends the WM_COPYDATA message. Then the second instance closes. The first instance receives the message and processes it.
The fact that the receiver is a Unicode aware app does not influence the content of the message. The sender determines its content. The system won't magically convert from 8 bit to 16 bit text. How could it? The content is opaque.
So, your next move is to find the code that sends the message and convert it to sending Unicode text.

Emoji icons interrupting REST call

In testing out our API, one of our testers found that when they insert an emoji icon on their iOS device, it will successfully save to our MongoDB; however, when retrieving it they do not get a response. I confirmed this, and our server (Node.js) will get the request and start to send the data, but (I think) somewhere along the line the emoji characters "terminate" the request, or cause it never to finish in the eyes of the iOS client.
Has anyone experienced this? If so, what is the best way you've found to deal with emoji icons? I know one way is to unescape() every string that goes out from Node.js, but that seems like a not-so-clean approach, and I'd also need to decode the text on the client side.
MongoDB supports UTF-8; unfortunately, the emoji characters are utf8mb4, which many applications and languages don't yet support (including MongoDB). Unescape seems like the best thing to do currently.
Alternatively you could store it as binary, but then you would need to query it differently and wouldn't be able to query with regular expressions (but would retain the native characters).
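For context, the reason emoji need the "4" in utf8mb4 is that they sit outside the Basic Multilingual Plane and therefore take four bytes in UTF-8, which three-bytes-per-character storage schemes cannot hold (shown here in Ruby purely as an illustration):
"😀".bytes.length   # => 4 (U+1F600, outside the BMP)
"é".bytes.length    # => 2 (U+00E9, inside the BMP)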

PHP fails to parse large POST variable

I'm trying to pass a rather large POST request to PHP, and when I var_dump the $_POST array, one variable, the largest one, is missing. (That variable is actually a base64-encoded binary upload sent as part of the POST request.)
The funny thing is that on my development PC the exact same request is parsed correctly, without any missing variables.
I checked the contents of php://input on the server and on the development PC, and they are exactly the same; the MD5 sums match. Yet the development PC recognizes all variables, and the server misses one.
I tried changing many different options in php.ini, with zero effect.
Maybe someone can point me to the right one.
Here is my php://input (~5 megabytes) http://www.mediafire.com/?lp0uox53vhr35df
It's possible the server is blocking it because of the Suhosin extension.
http://www.hardened-php.net/suhosin/configuration.html#suhosin.post.max_value_length
suhosin.post.max_value_length
Type: Integer. Default: 65000. Defines the maximum length of a variable that is registered through a POST request.
This will have to be changed in the php.ini.
Keep in mind that this is different from the Suhosin patch, which is common on a lot of shared hosts. I don't know whether the patch would cause this problem.
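If Suhosin turns out to be the culprit, the fix would look something like this in php.ini (the value is only an illustrative guess; size it to your payload):
; raise the per-variable limit enforced by the Suhosin extension
suhosin.post.max_value_length = 10000000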
