Using MSXML2.ServerXMLHTTP to access data from a web page returns truncated data in Lua - lua

I am trying to download a source code file from a web site which works fine for small files, but a couple of larger ones get truncated.
The example below should be returning a file 146,135 bytes in size, but returns one of 141,194 bytes with a status of 200.
I have tried winhttp.winhttprequest.5.1 as well, but both seem to truncate at the same point.
I have also found quite a few people with similar problems, but have not been able to find a solution.
require('luacom')
http = luacom.CreateObject('MSXML2.ServerXMLHTTP')
http:Open("GET","http://www.family-historian.co.uk/wp-content/plugins/forced-download2/download.php?path=/wp-content/uploads/formidable/tatewise/&file=Map-Life-Facts3.fh_lua&id=190",true)
http:Send()
http:WaitForResponse(30)
print('Status: '..http.Status)
print('----------------------------------------------------------------')
headers = http:GetAllResponseHeaders()
data = http.Responsetext
print('Data Size = '..#data)
print('----------------------------------------------------------------')
print(headers)

I finally worked out what was going on so will post it here for others.
To avoid the truncation I needed to use ResponseBody instead of ResponseText. What appears to be happening is that the file is sent in binary format. The ResponseText data has the same number of bytes as the ResponseBody data but is in UTF-8 format, so one byte per special character in the file (these are double-byte in UTF-8) is dropped from the end of ResponseText. I am not sure at what level the "mistake" in the length is made, but the way to avoid it is to use ResponseBody.
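The size mismatch described above can be illustrated outside of Lua. The following is a minimal Python sketch, not the original code: it assumes the payload is UTF-8 text, and the sample string and the helper name `utf8_length_gap` are made up for illustration. It shows that the byte count and the character count differ by one for every two-byte character, so cutting the bytes at the character count loses the tail.

```python
def utf8_length_gap(text):
    """Return how many more bytes than characters the UTF-8 form of text has."""
    return len(text.encode("utf-8")) - len(text)


# Hypothetical sample: "naive cafe" with two accented letters (2 bytes each in UTF-8).
sample = "na\u00efve caf\u00e9"
raw = sample.encode("utf-8")  # the bytes as they would travel over the wire

# Cutting the byte string at the *character* count drops bytes from the end.
truncated = raw[:len(sample)].decode("utf-8", errors="ignore")

print(len(sample), len(raw), utf8_length_gap(sample))
print(truncated)
```

If the real file behaves the same way, the gap between the expected and received sizes would roughly equal the number of multi-byte characters it contains.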

Related

Django silently discarding uploaded files with long paths

I am having an issue where Django Rest Framework appears to be silently discarding uploaded files with long paths.
Here is my view class and post method:
class UploadMediaViewSet(viewsets.ViewSet):
    parser_classes = [parser.MultiPartParser]

    # POST /api/upload/media/
    def create(self, request):
        LOG.info(f"************** request.FILES = {request.FILES}")
The form data that is sent is as follows:
------WebKitFormBoundaryBEDAIwXzG6Ik2xVY
Content-Disposition: form-data; name="transactionId"
804d4146-0947-4d96-90b5-8ffbbc0b2135
------WebKitFormBoundaryBEDAIwXzG6Ik2xVY
Content-Disposition: form-data; name="oOJGp433ODZvBOZTCXNz1oO7ogG0j3BRRBo98jpx1iIlvMPeNoc8nBKvpoTjx9PsOl5ulGGWniur3TdbDSd9TpgsnWhhqurcQO3TnssSQNHWti7xm7nZGW6tFRtrjrvwoJm9Bds5AsMcNKxT7oBkzA35fA1fgo5jkiUAfHHiduMdGIYf3NJGk8LP54JAORfYEK05mdHdQ4zfpMKfDUNJLnc5tk3H/AndroidLandscape.mp4"; filename="oOJGp433ODZvBOZTCXNz1oO7ogG0j3BRRBo98jpx1iIlvMPeNoc8nBKvpoTjx9PsOl5ulGGWniur3TdbDSd9TpgsnWhhqurcQO3TnssSQNHWti7xm7nZGW6tFRtrjrvwoJm9Bds5AsMcNKxT7oBkzA35fA1fgo5jkiUAfHHiduMdGIYf3NJGk8LP54JAORfYEK05mdHdQ4zfpMKfDUNJLnc5tk3H/AndroidLandscape.mp4"
Content-Type: video/mp4
------WebKitFormBoundaryBEDAIwXzG6Ik2xVY
Content-Disposition: form-data; name="oOJGp433ODZvBOZTCXNz1oO7ogG0j3BRRBo98jpx1iIlvMPeNoc8nBKvpoTjx9PsOl5ulGGWniur3TdbDSd9TpgsnWhhqurcQO3TnssSQNHWti7xm7nZGW6tFRtrjrvwoJm9Bds5AsMcNKxT7oBkzA35fA1fgo5jkiUAfHHiduMdGIYf3NJGk8LP54JAORfYEK05mdHdQ4zfpMKfDUNJLnc5tk3H/Yym32tTMGQAfAMVGFTUJA1z9zQB3YremlDV1Hluotwj21UZWP9Aop6QTPvUMVIZVS8Hk6gADadVu4TihPloTy5N7JX99SgPqf3JZILRSMtEMCXLeT4gw34aq5e0HfxetOlKHTx6m2uS1SLFHi8OvcujtWEIAlTfXQW5pvsFGMJYOwNwWjncOoZETXaTs1LspDUHchPEHypp4CHEM5Y3e5HhsKBkA9cFJs6oA26XQW7y/AndroidPortrait.mp4"; filename="oOJGp433ODZvBOZTCXNz1oO7ogG0j3BRRBo98jpx1iIlvMPeNoc8nBKvpoTjx9PsOl5ulGGWniur3TdbDSd9TpgsnWhhqurcQO3TnssSQNHWti7xm7nZGW6tFRtrjrvwoJm9Bds5AsMcNKxT7oBkzA35fA1fgo5jkiUAfHHiduMdGIYf3NJGk8LP54JAORfYEK05mdHdQ4zfpMKfDUNJLnc5tk3H/Yym32tTMGQAfAMVGFTUJA1z9zQB3YremlDV1Hluotwj21UZWP9Aop6QTPvUMVIZVS8Hk6gADadVu4TihPloTy5N7JX99SgPqf3JZILRSMtEMCXLeT4gw34aq5e0HfxetOlKHTx6m2uS1SLFHi8OvcujtWEIAlTfXQW5pvsFGMJYOwNwWjncOoZETXaTs1LspDUHchPEHypp4CHEM5Y3e5HhsKBkA9cFJs6oA26XQW7y/AndroidPortrait.mp4"
Content-Type: video/mp4
------WebKitFormBoundaryBEDAIwXzG6Ik2xVY--
When my create() method receives the request, I find that request.FILES contains only the first file (AndroidLandscape.mp4). The second file (AndroidPortrait.mp4) seems to be silently discarded.
I suspect that this is being done by parser.MultiPartParser, but I'm not sure.
Is it being discarded because the path is too long?
(Update: I did some testing, and 470 characters seems to be the magic path length limit. If the path is 471 characters or longer, the file is NOT included in request.FILES)
If upload paths cannot be that long, I can accept that, but I need to detect that this has happened so that I can return an appropriate error response to the client, instead of silently discarding files. If so, how can I detect that in my method?
I finally found out why this is happening. The multipart parser that Django Rest Framework relies on (Django's own django/http/multipartparser.py) has the maximum part-header length hard-coded at 1024 bytes. With a long file path the part header exceeds that limit, and the parser stops reading the header after 1024 bytes. The truncated header is invalid, so the file is discarded. Unfortunately, this "overflow" is silently swallowed, which leaves the developer's code no way to know that it happened, or that an upload was even attempted.
I was able to implement a working solution by subclassing/overriding the affected classes, copy/pasting the library code for the affected methods, and finally changing the hard-coded 1024-byte limit to a higher number.
It's not a great solution, because patching library code is brittle and could conflict with future versions of Django or DRF, but it's the only solution that I see at this point.
If anyone wants to implement this solution, the code with the hard coded limit is this:
in .../site-packages/django/http/multipartparser.py:
class Parser:
    def __init__(self, stream, boundary):
        self._stream = stream
        self._separator = b'--' + boundary

    def __iter__(self):
        boundarystream = InterBoundaryIter(self._stream, self._separator)
        for sub_stream in boundarystream:
            # Iterate over each part
            yield parse_boundary_stream(sub_stream, 1024)
The 1024 in the last line must be increased to the desired max header length.
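Since the parser gives no signal, one way to catch the problem is to estimate the part-header size before uploading, or in a test. The sketch below is an illustration under assumptions, not Django or DRF code: `part_header_size`, `may_exceed_limit` and the `margin` parameter are hypothetical names, and the header layout is taken from the form data shown in the question.

```python
MAX_HEADER_SIZE = 1024  # the hard-coded value quoted above


def part_header_size(name, filename, content_type):
    """Size in bytes of one multipart part header, including the blank line."""
    header = (
        'Content-Disposition: form-data; name="{0}"; filename="{1}"\r\n'
        "Content-Type: {2}\r\n"
        "\r\n"
    ).format(name, filename, content_type)
    return len(header.encode("utf-8"))


def may_exceed_limit(name, filename, content_type, margin=2):
    # margin is an assumption: a small allowance for bytes (such as a CRLF
    # after the boundary line) that the parser may read ahead of the header.
    return part_header_size(name, filename, content_type) + margin > MAX_HEADER_SIZE


path = "x" * 471  # hypothetical 471-character upload path
print(part_header_size(path, path, "video/mp4"),
      may_exceed_limit(path, path, "video/mp4"))
```

With the assumed two-byte allowance, the estimate agrees with the 470-character threshold reported in the question, but that allowance is a guess and should be checked against the parser version in use.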

How to replace these extended ascii codes?

I am opening up .txt files but when they are loaded on Xojo weird characters like these (’ , â€ک) show up.
I've tried DefineEncoding and ConvertEncoding but it still doesn't seem to work.
output.text = output.text.DefineEncoding(Encodings.WindowsANSI)
output.text = output.text.ConvertEncoding(Encodings.UTF8)
You may have to define the encoding at load time rather than afterwards. Otherwise the load will already have treated the data as UTF-8, and the code you posted will then mangle those characters. So pass the encoding to the Read function, or load the data as a binary file instead of a text file.
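The effect of choosing the encoding at load time can be shown with a short Python sketch (Python rather than Xojo, purely for illustration). It assumes the file is really UTF-8 and that it was being read as Windows-1252; the sample text is made up.

```python
# Hypothetical file content: "It's" with a typographic apostrophe (U+2019).
raw = "It\u2019s".encode("utf-8")  # the bytes as stored on disk

wrong = raw.decode("cp1252")  # decoded with the wrong encoding: mojibake
right = raw.decode("utf-8")   # decoded with the file's real encoding

# If the wrong decoding was lossless, it can be reversed after the fact.
repaired = wrong.encode("cp1252").decode("utf-8")

print(wrong)
print(right)
print(repaired == right)
```

The three bytes of the apostrophe become three separate characters when decoded as Windows-1252, which is the kind of debris shown in the question.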

Unicode representation in iOS

I'm saving a file with Unicode (Korean) characters in its name, and I'm storing the name of the file in memory for bookkeeping in my app. The file is saved fine; what's bothering me is the way the name is given back to me by the OS.
If I make a fcntl(fd, F_GETPATH, charArray) call on the file's fd, the filename returned differs from the file name returned by listing the directory contents. I did some research on the filenames returned in the two cases and found that the format in the first case is Hangul Symbols (length of filename in bytes: 18) and in the latter case Hangul Jamo (length in bytes: 36). iOS works seamlessly with the two formats; if I do a localizedcompare on the two names, it returns 1.
When I'm doing the bookkeeping I store the path to the file (including its name) and the length of this path. When a request comes in I do a quick compare on these two attributes and return a handle to the file only if they match. The problem is that when the file is stored, the fcntl call gives me the path in Hangul Symbols, and when the user requests it back, the runtime gives me the file path in Hangul Jamo. As I've stored the path in Hangul Symbols, the app thinks it's a different file that was somehow not created by the user and shows an 'invalid file' popup.
Visually the Korean text looks the same in both encoding schemes; the only difference is in the byte representation.
다른 문서.docx - Hangul Symbols - returned during file creation by FD
다른 문서.docx - Hangul Jamo - returned by the OS in runtime and also if I list the directory contents.
char *fileName1="다른 문서.docx"; //Hangul Symbols
NSLog(@"fileName1:%s length:%lu", fileName1, strlen(fileName1));
char *fileName2="다른 문서.docx"; //Hangul Jamo
NSLog(@"fileName2:%s length:%lu", fileName2, strlen(fileName2));
If you run the above code you can see that the names differ in their memory footprint. Any idea how or why iOS changes the filename at run time from one scheme to the other? It would also be great if someone could explain how localizedcompare returns 1 in both cases.
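The two byte lengths reported above (18 and 36) match what Unicode calls the composed (NFC) and decomposed (NFD) normalization forms of the same text. The following Python sketch (Python is used only for illustration; `same_name` is a hypothetical helper) reproduces those lengths and shows one possible approach for the bookkeeping comparison: normalize both paths to a single form before comparing them.

```python
import unicodedata

# The same file name in both normalization forms.
composed = unicodedata.normalize("NFC", "다른 문서.docx")    # precomposed syllables
decomposed = unicodedata.normalize("NFD", "다른 문서.docx")  # conjoining jamo


def same_name(a, b):
    """Compare two names after bringing both to one normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
print(composed == decomposed, same_name(composed, decomposed))
```

A byte-for-byte comparison treats the two names as different, while the normalized comparison treats them as equal. Storing the normalized form (and its length) in the bookkeeping would make the quick compare independent of which form the OS returns.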

Mobile Safari makes multiple video requests

I am designing a web application for iPad which makes use of HTML5 in mobile safari. I am transmitting the file manually through an ASP.NET .ashx file hosted on IIS 7 running .NET Framework v2.0.
The essential code looks partly like this:
// If we receive range header only transmit partial file
if (context.Request.Headers["Range"] != null)
{
    var fi = new FileInfo(filePath);
    long fileSize = fi.Length;

    // Read start/end index
    string headerRange = context.Request.Headers["Range"].Replace("bytes=", "");
    string[] range = headerRange.Split('-');
    int startIndex = Convert.ToInt32(range[0]);
    int endIndex = Convert.ToInt32(range[1]);

    // Add header Content-Range,Last-Modified
    context.Response.StatusCode = (int)HttpStatusCode.PartialContent;
    context.Response.AddHeader(HttpWorkerRequest.GetKnownResponseHeaderName(HttpWorkerRequest.HeaderContentRange), String.Format("bytes {0}-{1}/{2}", startIndex, endIndex, fileSize));
    context.Response.AddHeader(HttpWorkerRequest.GetKnownResponseHeaderName(HttpWorkerRequest.HeaderLastModified), String.Format("{0:r}", fi.CreationTime));

    long length = (endIndex - startIndex) + 1;
    context.Response.TransmitFile(filePath, startIndex, length);
}
else
    context.Response.TransmitFile(filePath);
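The range arithmetic in the handler above can be sketched independently of ASP.NET. This Python version is an illustration under assumptions, not the poster's code: `parse_byte_range` and `content_range` are hypothetical helpers, and only single ranges are handled. Unlike the handler, it also accepts an open-ended request such as `bytes=0-` and a suffix request such as `bytes=-500`, both of which are valid HTTP range forms.

```python
def parse_byte_range(header, file_size):
    """Return (start, end) inclusive byte offsets for a single-range header."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is supported in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        start = max(file_size - int(last), 0)
        end = file_size - 1
    else:
        start = int(first)
        # Open-ended form "bytes=N-": everything from N to the end of the file.
        end = min(int(last), file_size - 1) if last else file_size - 1
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end


def content_range(start, end, file_size):
    """Format the value of the Content-Range response header."""
    return "bytes {0}-{1}/{2}".format(start, end, file_size)


start, end = parse_byte_range("bytes=0-1", 2000000)
print(content_range(start, end, 2000000), "length:", end - start + 1)
```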
Now what confuses me to no end is the request pattern that Safari seems to use. From proxying the requests through Fiddler I get the following for an approx. 2 MB file.
NOTE: When requesting an mp4 file served directly through IIS 7, the pattern and number of requests are the same.
1. First it requests 2 bytes, which allows it to read the 'Content-Range' header.
2. Now it requests the entire content (?)
3. It proceeds to do steps 1 and 2 again (??)
4. It now requests only parts of the file (???)
If the file is larger, the last step consists of many more requests. I have tested up to 99 requests, where each request contains an equal part of the file. This makes sense and is what I would expect. What I cannot make sense of is why it makes 2 initial requests for the first 2 bytes, as well as 2 requests for the entire file, before it finally requests the file in parts.
As far as I can tell, this results in the file being downloaded 2 to 3 times, depending on the length of the file and whether the user watches it long enough.
Can anybody make sense of this behavior and maybe explain what I can do to prevent multiple downloads. Thanks.
Per my comment to your question, I've had a similar issue in the past. One thing you could try if you have control of the server (I did not) is to disable either gzip or identity encoding of the file. I believe that in the first request for the entire content (#2 in your list) it asks for the content with gzip encoding (compressed). Perhaps you can configure IIS not to serve the file in response to a gzip-encoding request.
Here is my original (unanswered) question on the subject:
https://stackoverflow.com/questions/4855485/mpmovieplayercontroller-not-playing-full-length-mp3

What's the difference between the four File Results in ASP.NET MVC

ASP.NET has four different types of file results:
FileContentResult: Sends the contents of a binary file to the response.
FilePathResult: Sends the contents of a file to the response
FileResult: Returns binary output to write to the response
FileStreamResult: Sends binary content to the response by using a Stream instance
Those descriptions are taken from MSDN, and with the exception of FileStreamResult, the first three sound identical. So what is the difference between them?
FileResult is an abstract base class for all the others.
FileContentResult - you use it when you have a byte array you would like to return as a file
FilePathResult - when you have a file on disk and would like to return its content (you give a path)
FileStreamResult - you have a stream open, you want to return its content as a file
However, you'll rarely have to use these classes - you can just use one of Controller.File overloads and let ASP.NET MVC do the magic for you.
Great question...and deserves more details. I find myself here as a result of an interesting situation. We were delivering some pdf attachments via the MVC3/C# environment. Our code got released and we started getting some responses from our clients that the downloads were behaving strangely when they were using Chrome and the file type was being converted over to 'pdf-, attachment.pdf-, attachment'. Yup...you got it...the whole thing. So, one could rewrite it to just be 'pdf' and the file would still save intact, but what a mess!
So, to describe the initial situation, we were setting the 'Content-Disposition' header then returning a FileContentResult...
var cd = new System.Net.Mime.ContentDisposition
{
    FileName = result.Attachment.FileName,
    Inline = false
};
Response.AppendHeader("Content-Disposition", cd.ToString());
return File(result.Attachment.Data, MimeExtensionHelper.GetMimeType(result.Attachment.FileName), result.Attachment.FileName);
Seemed good. Worked fine in IE. So I did some research and tried implementing FileStreamResult instead (keeping the Content-Disposition setter):
MemoryStream dataStream = new MemoryStream();
dataStream.Write(result.Attachment.Data, 0, result.Attachment.Data.Length);
dataStream.Position = 0;
return new FileStreamResult(dataStream, MimeExtensionHelper.GetMimeType(result.Attachment.FileName));
It fixed the issue in Chrome! Hmmm...but why in the heck should I have to take my perfectly good byte array and stream it and then return it via this to get the file name to work right?
Then came the Fiddler.
With FileContentResult, I got 2 Content-Dispositions in the header.
With FileStreamResult, I got 1.
FileContentResult appends a Content-Disposition header when providing the File Name and Chrome considers multiples of this header as an error.
Odd reaction...but definitely one that's good to know.
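The duplicate-header effect is not specific to ASP.NET: it occurs whenever a header is appended twice rather than set once. As a rough illustration only, here is a Python sketch using the standard library's `wsgiref.headers.Headers`; the file name is made up, and the two `add_header` calls merely stand in for the manual `AppendHeader` call plus the header that the framework adds on its own.

```python
from wsgiref.headers import Headers

appended = Headers([])
# Appending twice leaves two Content-Disposition headers in the response,
# which is the situation Fiddler revealed above.
appended.add_header("Content-Disposition", "attachment", filename="report.pdf")
appended.add_header("Content-Disposition", "attachment", filename="report.pdf")

replaced = Headers([])
replaced.add_header("Content-Disposition", "attachment", filename="report.pdf")
# Item assignment replaces any existing header of that name, leaving one.
replaced["Content-Disposition"] = 'attachment; filename="report.pdf"'

print(len(appended.get_all("Content-Disposition")))
print(len(replaced.get_all("Content-Disposition")))
```

In the MVC case the equivalent fix is to let only one party write the header: either set it manually, or pass the file name to File(...), but not both.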

Resources