Django silently discarding uploaded files with long paths - parsing

I am having an issue where Django Rest Framework appears to be silently discarding uploaded files with long paths.
Here is my view class and post method:
class UploadMediaViewSet(viewsets.ViewSet):
    parser_classes = [parser.MultiPartParser]
    # POST /api/upload/media/
    def create(self, request):
        LOG.info(f"************** request.FILES = {request.FILES}")
The form data that is sent is as follows:
------WebKitFormBoundaryBEDAIwXzG6Ik2xVY
Content-Disposition: form-data; name="transactionId"
804d4146-0947-4d96-90b5-8ffbbc0b2135
------WebKitFormBoundaryBEDAIwXzG6Ik2xVY
Content-Disposition: form-data; name="oOJGp433ODZvBOZTCXNz1oO7ogG0j3BRRBo98jpx1iIlvMPeNoc8nBKvpoTjx9PsOl5ulGGWniur3TdbDSd9TpgsnWhhqurcQO3TnssSQNHWti7xm7nZGW6tFRtrjrvwoJm9Bds5AsMcNKxT7oBkzA35fA1fgo5jkiUAfHHiduMdGIYf3NJGk8LP54JAORfYEK05mdHdQ4zfpMKfDUNJLnc5tk3H/AndroidLandscape.mp4"; filename="oOJGp433ODZvBOZTCXNz1oO7ogG0j3BRRBo98jpx1iIlvMPeNoc8nBKvpoTjx9PsOl5ulGGWniur3TdbDSd9TpgsnWhhqurcQO3TnssSQNHWti7xm7nZGW6tFRtrjrvwoJm9Bds5AsMcNKxT7oBkzA35fA1fgo5jkiUAfHHiduMdGIYf3NJGk8LP54JAORfYEK05mdHdQ4zfpMKfDUNJLnc5tk3H/AndroidLandscape.mp4"
Content-Type: video/mp4
------WebKitFormBoundaryBEDAIwXzG6Ik2xVY
Content-Disposition: form-data; name="oOJGp433ODZvBOZTCXNz1oO7ogG0j3BRRBo98jpx1iIlvMPeNoc8nBKvpoTjx9PsOl5ulGGWniur3TdbDSd9TpgsnWhhqurcQO3TnssSQNHWti7xm7nZGW6tFRtrjrvwoJm9Bds5AsMcNKxT7oBkzA35fA1fgo5jkiUAfHHiduMdGIYf3NJGk8LP54JAORfYEK05mdHdQ4zfpMKfDUNJLnc5tk3H/Yym32tTMGQAfAMVGFTUJA1z9zQB3YremlDV1Hluotwj21UZWP9Aop6QTPvUMVIZVS8Hk6gADadVu4TihPloTy5N7JX99SgPqf3JZILRSMtEMCXLeT4gw34aq5e0HfxetOlKHTx6m2uS1SLFHi8OvcujtWEIAlTfXQW5pvsFGMJYOwNwWjncOoZETXaTs1LspDUHchPEHypp4CHEM5Y3e5HhsKBkA9cFJs6oA26XQW7y/AndroidPortrait.mp4"; filename="oOJGp433ODZvBOZTCXNz1oO7ogG0j3BRRBo98jpx1iIlvMPeNoc8nBKvpoTjx9PsOl5ulGGWniur3TdbDSd9TpgsnWhhqurcQO3TnssSQNHWti7xm7nZGW6tFRtrjrvwoJm9Bds5AsMcNKxT7oBkzA35fA1fgo5jkiUAfHHiduMdGIYf3NJGk8LP54JAORfYEK05mdHdQ4zfpMKfDUNJLnc5tk3H/Yym32tTMGQAfAMVGFTUJA1z9zQB3YremlDV1Hluotwj21UZWP9Aop6QTPvUMVIZVS8Hk6gADadVu4TihPloTy5N7JX99SgPqf3JZILRSMtEMCXLeT4gw34aq5e0HfxetOlKHTx6m2uS1SLFHi8OvcujtWEIAlTfXQW5pvsFGMJYOwNwWjncOoZETXaTs1LspDUHchPEHypp4CHEM5Y3e5HhsKBkA9cFJs6oA26XQW7y/AndroidPortrait.mp4"
Content-Type: video/mp4
------WebKitFormBoundaryBEDAIwXzG6Ik2xVY--
When my create() method receives the request, I find that request.FILES contains only the first file (AndroidLandscape.mp4). The second file (AndroidPortrait.mp4) seems to be silently discarded.
I suspect that this is being done by parser.MultiPartParser, but I'm not sure.
Is it being discarded because the path is too long?
(Update: I did some testing, and 470 characters seems to be the magic path length limit. If the path is 471 characters or longer, the file is NOT included in request.FILES)
If upload paths cannot be that long, I can accept that, but I need to detect that this has happened so that I can return an appropriate error response to the client, instead of silently discarding files. If so, how can I detect that in my method?

I finally found out why this is happening. The multipart parser that Django Rest Framework relies on (Django's own, in django/http/multipartparser.py) has the maximum header length hard-coded at 1024 bytes. With a long file path, the header of that form part exceeds the limit, and the parser stops reading the header from the stream after 1024 bytes. This leaves the header invalid, and the file is discarded. Unfortunately, this "overflow" is silently swallowed, so the developer's code has no way to know that it happened, or that an upload of the file was even attempted.
I was able to implement a working solution by subclassing/overriding the affected classes, copy/pasting the library code for the affected methods, and finally changing the hard coded 1024 byte limit to a higher number.
It's not a great solution, because patching library code is brittle, and the solution could cause conflicts in future versions of DRF, but that's the only solution that I see at this point.
If anyone wants to implement this solution, the code with the hard coded limit is this:
in .../site-packages/django/http/multipartparser.py:
class Parser:
    def __init__(self, stream, boundary):
        self._stream = stream
        self._separator = b'--' + boundary
    def __iter__(self):
        boundarystream = InterBoundaryIter(self._stream, self._separator)
        for sub_stream in boundarystream:
            # Iterate over each part
            yield parse_boundary_stream(sub_stream, 1024)
The 1024 in the last line must be increased to the desired max header length.
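If patching the library is not an option, the same limit can at least be checked for up front so that an error can be returned to the client. The following is only a sketch in plain Python: find_oversized_parts is a hypothetical helper (not part of Django or DRF), it assumes the raw body and boundary are available before anything has parsed the request, and whether the line break after the boundary counts toward the 1024 bytes is an assumption.

```python
from typing import List, Tuple

HEADER_LIMIT = 1024  # the hard-coded value quoted above


def find_oversized_parts(body: bytes, boundary: bytes,
                         limit: int = HEADER_LIMIT) -> List[Tuple[int, int]]:
    """Return (part_index, header_size) for parts whose headers exceed limit.

    Hypothetical helper: it only measures header blocks, it parses nothing.
    """
    delimiter = b"--" + boundary
    oversized = []
    # The first element is whatever precedes the first delimiter; skip it.
    for index, part in enumerate(body.split(delimiter)[1:]):
        if part.startswith(b"--"):
            break  # closing delimiter "--boundary--"
        end = part.find(b"\r\n\r\n")
        # Bytes up to and including the blank line that ends the headers.
        # Assumption: the CRLF right after the boundary line is counted too.
        size = len(part) if end == -1 else end + 4
        if size > limit:
            oversized.append((index, size))
    return oversized


if __name__ == "__main__":
    long_path = b"p" * 600
    body = (
        b'--XyZ\r\nContent-Disposition: form-data; name="transactionId"\r\n\r\n1\r\n'
        b'--XyZ\r\nContent-Disposition: form-data; name="' + long_path +
        b'"; filename="' + long_path + b'"\r\nContent-Type: video/mp4\r\n\r\ndata\r\n'
        b"--XyZ--\r\n"
    )
    print(find_oversized_parts(body, b"XyZ"))
```

Reading the whole body into memory is costly for video uploads, so in practice a check like this would more likely run on the client, or on a bounded prefix of each part.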

Related

Handling Binary (excel) file in Multi-data Post data in Suave.IO

I am trying to build a simple Suave.IO application to centralize the sending of emails. Currently the application has one endpoint that takes subject, body, recipients, attachments, and sender as form data and turns them into an EWS email message from a logging email account.
Everything works as intended in most cases, but when one of the attachments is an Excel file, that file ends up corrupted.
Currently, I am filtering the request.multipartFields down to only the ones that are marked as attachment files, and then doing this:
for (fileField: (string*string)) in fileFields do
    let fname = (fst fileField)
    let fpath = "uploadedFiles\\" + fname
    File.WriteAllBytes(fpath, Encoding.ASCII.GetBytes (snd fileField)) |> ignore
The file path and the attachment names are then fed into the EWS message before sending.
Again, this seems to work with all attachments except binary ones. It seems that Suave.IO automatically exposes all multipartFields as (string*string), which may require special handling when the content is binary data.
How should I handle upload of binary files?
Thanks all in advance.
It looks like the issue was one of encoding. I was testing with Python's requests library, and by default the files are encoded as multipart/form-data. By specifying a content type for each file, I was able to help the server identify the incoming data as a file.
instead of
requests.post(url, data=data, files={filename: open(filepath, 'rb')})
I needed to make it
requests.post(url, data=data, files={filename: (filename, open(filepath, 'rb'), mimetypes.guess_type(filepath)[0])})
With the second Python script, files do end up in the files section of the request, and I was able to save the Excel file without corruption.
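The corruption seen with the original approach is consistent with what happens when binary data is pushed through an ASCII encoder, which replaces every value above 127 with '?'. A small Python sketch of that effect; ascii_round_trip is a hypothetical helper, and it assumes the field text held one character per original byte, which may not be exactly how Suave.IO decodes the field.

```python
def ascii_round_trip(raw: bytes) -> bytes:
    """Mimic storing binary data in a string and re-encoding it as ASCII."""
    # Assumption: the string holds one character per original byte.
    text = raw.decode("latin-1")
    # Like .NET's Encoding.ASCII.GetBytes, anything outside 0-127 becomes '?'.
    return text.encode("ascii", errors="replace")


if __name__ == "__main__":
    # An .xlsx file is a zip archive, so it starts with "PK" followed by binary data.
    sample = bytes([0x50, 0x4B, 0x03, 0x04, 0xE2, 0x9C])
    print(ascii_round_trip(sample))
```

Text attachments survive this because all of their bytes are below 128, while any file with high bytes comes out the same length but with different content.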

GZip stream compression with Delphi (optionally with tar)

I have been searching for hours for a way to create a valid .tar.gz file using streams in Delphi 10.
I was able to solve the tarball part using LibTar, which works well.
After some searching I also found examples to decompress gzip data using just System.ZLib. The secret lies in the WindowBits parameter:
// WindowBits 15 + 16 = 31 selects gzip-only mode
DecompStream:= TZDecompressionStream.Create(SourceStream, 15 + 16);
TarStream:= TTarArchive.Create(DecompStream);
TarStream.Reset;
while TarStream.FindNext(DirRec) do {...} TarStream.ReadFile(TargetStream);
Great! But is it really possible that System.ZLib is able to decompress gzip (I guess by just ignoring the gzip header by that +16?), but is not able to create such header by itself? Whatever I try, I only get a file that cannot be opened by 7zip or WinRar, because the header is missing.
Maybe it just can't work, because the gzip header contains a checksum, so it's not possible to write the header without knowing the following data. How to solve this? Edit: this is wrong, see comments: crc32 is in the trailer.
It seems many others also have this problem. I found and tried multiple solutions for adding this header, but nothing really worked, and everything requires adding long units (not nice, but acceptable) or even DLLs (not acceptable for me).
The secret lies in the WindowBits parameter - sounds familiar? :)
Believe it or not, compressing to gzip just works the same way! I couldn't find this anywhere using Google, or in the Embarcadero documentation/help. But have a look at this comment in the System.ZLib source of Delphi Tokyo:
Add 16 to windowBits to write a simple gzip header and
trailer around the compressed data instead of a zlib wrapper. The
gzip header will have no file name, no extra data, no comment, no
modification time (set to zero), no header crc, and the operating
system will be set to 255 (unknown).
It works:
TargetStream:= TFileStream.Create(TargetFilename, fmCreate);
CompressStream:= TZCompressionStream.Create(TargetStream, zcDefault, 15 + 16);
TarStream:= TTarWriter.Create(CompressStream);
TarStream.AddStream(SourceStream1, SourceFilename1, Now);
TarStream.AddString(SourceString2, SourceFilename2, Now);
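The same windowBits convention can be checked outside Delphi, since it comes from zlib itself. A minimal Python sketch using the standard zlib module, where wbits plays the role of WindowBits; the payload is just placeholder bytes standing in for the tar stream, and gzip_with_wbits is a name chosen for illustration.

```python
import gzip
import zlib


def gzip_with_wbits(data: bytes) -> bytes:
    """Compress data with a gzip header and trailer instead of a zlib wrapper."""
    # wbits = 15 (maximum window size) + 16 (ask zlib for gzip framing)
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 15 + 16)
    return compressor.compress(data) + compressor.flush()


if __name__ == "__main__":
    payload = b"tar bytes would go here " * 100
    packed = gzip_with_wbits(payload)
    print(packed[:2].hex())                   # gzip magic number
    print(gzip.decompress(packed) == payload)
```

The output opens with any gzip reader, which matches the observation above that no extra unit or DLL is needed for the header.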

Upload File with unknown size OneDrive

I am uploading a file to OneDrive using a resumable upload session. However, I cannot know the size of the file before uploading. I know that Google allows uploading files with content ranges like
Content-Range: bytes 0-128/*
I would assume OneDrive would also allow it, as it is even specified in RFC 2616:
Content-Range = "Content-Range" ":" content-range-spec
content-range-spec      = byte-content-range-spec
byte-content-range-spec = bytes-unit SP
                          byte-range-resp-spec "/"
                          ( instance-length | "*" )
byte-range-resp-spec    = (first-byte-pos "-" last-byte-pos)
                        | "*"
instance-length         = 1*DIGIT
The header SHOULD indicate the total length of the full
entity-body, unless this length is unknown or difficult to
determine. The asterisk "*" character means that the
instance-length is unknown at the time when the response was
generated.
But after reading the OneDrive documentation I found it says
Example: In this example, the app is uploading the first 26 bytes of a 128 byte file.
The Content-Length header specifies the size of the current request.
The Content-Range header indicates the range of bytes in the overall file that this request represents.
The total length of the file is known before you can upload the first fragment of the file.
PUT https://sn3302.up.1drv.com/up/fe6987415ace7X4e1eF866337
Content-Length: 26
Content-Range: bytes 0-25/128
<bytes 0-25 of the file>
Important: Your app must ensure the total file size specified in the Content-Range header is the same for all requests. If a byte range declares a different file size, the request will fail.
Maybe I'm just reading the documentation wrong, or it only applies to this example, but before I go and build an entire uploader I would like clarification on whether this is allowed for OneDrive.
My question is:
Does OneDrive allow uploading of files with unknown sizes?
Thanks in advance for your help
For anyone else wondering: no, you can't. File uploads must specify a size.
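Given that the total size has to be known, the headers for each fragment follow directly from it. A minimal Python sketch that generates the Content-Length and Content-Range values in the format of the documentation excerpt quoted in the question; content_ranges is a hypothetical helper and the fragment size is arbitrary.

```python
from typing import Iterator, Tuple


def content_ranges(total_size: int, fragment_size: int) -> Iterator[Tuple[int, str]]:
    """Yield (content_length, content_range) for each fragment of the upload."""
    for start in range(0, total_size, fragment_size):
        # Ranges are inclusive, so the last byte is one before the next start.
        end = min(start + fragment_size, total_size) - 1
        yield end - start + 1, f"bytes {start}-{end}/{total_size}"


if __name__ == "__main__":
    for length, header in content_ranges(128, 26):
        print(f"Content-Length: {length}  Content-Range: {header}")
```

When the size really is unknown in advance, one option is to spool the data to a temporary file first so the total can be measured before the first fragment is sent.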

Using MSXML2.ServerXMLHTTP to access data from a web page returns truncated data in Lua

I am trying to download a source code file from a web site which works fine for small files, but a couple of larger ones get truncated.
The example below should be returning a file 146,135 bytes in size, but returns one of 141,194 bytes with a status of 200.
I have tried winhttp.winhttprequest.5.1 as well, but both seem to truncate at the same point.
I have also found quite a few people with similar problems, but have not been able to find a solution.
require('luacom')
http = luacom.CreateObject('MSXML2.ServerXMLHTTP')
http:Open("GET","http://www.family-historian.co.uk/wp-content/plugins/forced-download2/download.php?path=/wp-content/uploads/formidable/tatewise/&file=Map-Life-Facts3.fh_lua&id=190",true)
http:Send()
http:WaitForResponse(30)
print('Status: '..http.Status)
print('----------------------------------------------------------------')
headers = http:GetAllResponseHeaders()
data = http.Responsetext
print('Data Size = '..#data)
print('----------------------------------------------------------------')
print(headers)
I finally worked out what was going on so will post it here for others.
To avoid the truncation I needed to use ResponseBody instead of ResponseText. What appears to be happening is that the file is sent in binary format. The ResponseText data is meant to hold the same content as ResponseBody but is treated as UTF-8, which means that as many bytes as there are special characters in the file (these are double-byte in UTF-8) are dropped from the end of ResponseText. I am not sure at what level the "mistake" in the length is made, but the way to avoid it is to use ResponseBody.
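One way to read that explanation is that the length was taken in characters while the data was still counted in bytes. The Python sketch below illustrates only that reading, which is an assumption about the cause rather than a confirmed description of MSXML or LuaCOM; cut_to_char_count is a hypothetical helper.

```python
def cut_to_char_count(text: str) -> bytes:
    """Encode text as UTF-8, then keep only as many bytes as it has characters."""
    encoded = text.encode("utf-8")
    # Every two-byte character makes the byte count one larger than the
    # character count, so this slice drops that many bytes from the end.
    return encoded[: len(text)]


if __name__ == "__main__":
    sample = "naïve café " * 2 + "END"
    full = sample.encode("utf-8")
    cut = cut_to_char_count(sample)
    print(len(full), len(cut), len(full) - len(cut))
```

Under this reading, the shortfall in the question (146,135 minus 141,194, so 4,941 bytes) would correspond to 4,941 two-byte characters in the file.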

Mobile Safari makes multiple video requests

I am designing a web application for iPad which makes use of HTML5 in mobile safari. I am transmitting the file manually through an ASP.NET .ashx file hosted on IIS 7 running .NET Framework v2.0.
The essential code looks partly like this:
// If we receive range header only transmit partial file
if (context.Request.Headers["Range"] != null)
{
    var fi = new FileInfo(filePath);
    long fileSize = fi.Length;
    // Read start/end index
    string headerRange = context.Request.Headers["Range"].Replace("bytes=", "");
    string[] range = headerRange.Split('-');
    int startIndex = Convert.ToInt32(range[0]);
    int endIndex = Convert.ToInt32(range[1]);
    // Add header Content-Range,Last-Modified
    context.Response.StatusCode = (int)HttpStatusCode.PartialContent;
    context.Response.AddHeader(HttpWorkerRequest.GetKnownResponseHeaderName(HttpWorkerRequest.HeaderContentRange), String.Format("bytes {0}-{1}/{2}", startIndex, endIndex, fileSize));
    context.Response.AddHeader(HttpWorkerRequest.GetKnownResponseHeaderName(HttpWorkerRequest.HeaderLastModified), String.Format("{0:r}", fi.CreationTime));
    long length = (endIndex - startIndex) + 1;
    context.Response.TransmitFile(filePath, startIndex, length);
}
else
    context.Response.TransmitFile(filePath);
Now what confuses me to no end is the request pattern that Safari seems to use. Proxying the requests through Fiddler, I get the following for an approx. 2 MB file.
NOTE: When requesting an mp4 file served directly through IIS 7, the pattern and number of requests are the same.
1. First it requests 2 bytes, which allows it to read the 'Content-Range' header.
2. Now it requests the entire content (?)
3. It proceeds to do steps 1 & 2 again (??)
4. It now requests only parts of the file (???)
If the file is larger, the last step is repeated many more times. I have tested up to 99 requests, where each request contains an equal part of the file. This makes sense and is what I would expect. What I cannot make sense of is why it makes 2 initial requests for the first 2 bytes, as well as 2 requests for the entire file, before it finally requests the file in parts.
As far as I can tell, this results in the file being downloaded 2 to 3 times, depending on the length of the file and whether the user watches it long enough.
Can anybody make sense of this behavior and maybe explain what I can do to prevent multiple downloads. Thanks.
Per my comment on your question, I've had a similar issue in the past. One thing you could try if you have control of the server (I did not) is to disable either gzip or identity encoding of the file. I believe that the first request for the entire content (#2 in your list) asks for the content with gzip encoding (compressed). Perhaps you can configure IIS not to serve the file for a gzip-encoding request.
Here is my original (unanswered) question on the subject:
https://stackoverflow.com/questions/4855485/mpmovieplayercontroller-not-playing-full-length-mp3
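Separately from the duplicate requests, the handler in the question assumes every Range header carries both a start and an end. HTTP also allows open-ended ("bytes=0-") and suffix ("bytes=-500") forms. The Python sketch below shows one way to normalise a single range before building the Content-Range value; parse_range and content_range are hypothetical helpers, and multi-range requests are deliberately rejected.

```python
from typing import Tuple


def parse_range(header: str, file_size: int) -> Tuple[int, int]:
    """Return the inclusive (start, end) byte positions for a single bytes range."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(last)
        if length <= 0:
            raise ValueError("unsatisfiable range")
        start, end = max(file_size - length, 0), file_size - 1
    else:
        start = int(first)
        # Open-ended form "bytes=N-" runs to the end of the file.
        end = min(int(last), file_size - 1) if last else file_size - 1
    if start > end:
        raise ValueError("unsatisfiable range")
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Format the Content-Range value for a 206 response."""
    return f"bytes {start}-{end}/{file_size}"


if __name__ == "__main__":
    size = 2_000_000
    for header in ("bytes=0-1", "bytes=0-", "bytes=-500"):
        start, end = parse_range(header, size)
        print(header, "->", content_range(start, end, size))
```

Python integers do not overflow, so the sketch sidesteps a limit the C# version would hit with 32-bit indexes on files larger than 2 GB.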

Resources