Efficiently parsing a CSV file from a web service (using CHCSVParser) - ios

My app is currently parsing CSV files from a web service by using a combination of componentsSeparatedByCharactersInSet and componentsSeparatedByString methods.
As the files are quite large (> 1 MB on average), parsing takes a couple of seconds on an iPad, which is too slow. The memory footprint of my solution is an issue too (I am holding the full text file in memory).
This is why I am looking for a faster and more memory-efficient solution. I came across CHCSVParser which can parse NSInputStreams directly, e.g.
NSInputStream *stream = [NSInputStream inputStreamWithFileAtPath:file];
CHCSVParser * p = [[CHCSVParser alloc] initWithInputStream:stream
usedEncoding:&encoding delimiter:';'];
(Source from the sample project on CHCSVParser)
My question:
How can I get an NSInputStream as the result of an NSURLRequest? (Currently I am getting the whole CSV file as an NSData object and converting it to an NSString in order to parse it.)
Could I use the NSInputStream from an NSURLRequest directly with CHCSVParser?
Would you generally recommend using CHCSVParser's initWithInputStream method with an NSURLRequest, or rather downloading the document to memory and parsing it after the full download?

Download the file to disk using NSURLConnectionDelegate and NSOutputStream (that way you use as little memory as possible while downloading) and then open an NSInputStream to the same file and pass it into CHCSVParser.

Related

Can I create an NSURL that refers to in-memory NSData?

The docs for NSURL state that:
An NSURL object represents a URL that can potentially contain the
location of a resource on a remote server, the path of a local file on
disk, or even an arbitrary piece of encoded data.
I have a blob of in-memory data that I'd like to hand to a library that wants to load a resource via an NSURL. Sure, I can first write this NSData to a temp file and then create a file:// NSURL from that, but I'd prefer to have the URL point directly to the buffer that I already have present in memory.
The docs quoted above seem to suggest this is possible, but I can't find any hint of how to accomplish it. Am I missing something?
NSURL supports the data: URL scheme (RFC 2397).
This scheme allows you to build URLs of the form
data:MIME-Type;base64,<data>
A working Cocoa example would be:
NSImage* img = [NSImage imageNamed:@"img"];
NSData* imgData = [img TIFFRepresentation];
// The MIME type has to match the encoded bytes, which are TIFF data here.
NSString* dataFormatString = @"data:image/tiff;base64,%@";
NSString* dataString = [NSString stringWithFormat:dataFormatString, [imgData base64EncodedStringWithOptions:0]];
NSURL* dataURL = [NSURL URLWithString:dataString];
Passing around large binary blobs with data URLs might be a bit inefficient due to the nature of base64 encoding.
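To put a number on that overhead: padded base64 emits 4 output characters for every group of up to 3 input bytes, so the encoded payload is roughly a third larger than the raw data. A small sketch in C (the function name is made up for illustration):

```c
#include <stddef.h>

/* Length of the padded base64 encoding of n raw bytes:
   every group of up to 3 input bytes becomes 4 output characters. */
size_t base64_encoded_length(size_t n)
{
    return ((n + 2) / 3) * 4;
}
```

For example, a 3,000,000-byte blob turns into a 4,000,000-character data URL payload, all of which has to sit in memory as a string.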
You could also implement a custom NSURLProtocol that specifically deals with your data.
Apple has some sample code that uses a custom protocol to pass around image objects: https://developer.apple.com/library/mac/samplecode/SpecialPictureProtocol/Introduction/Intro.html#//apple_ref/doc/uid/DTS10003816
What you are missing is the NSURLProtocol class. Takes about three dozen lines of code, and any code that handles URLs properly can access your in-memory data. Read the documentation, it's not difficult and there is sample code available.
Unfortunately there are some APIs that take an NSURL as a parameter, but can only handle file URLs.

Dropbox sync api large video file upload

I am using the Dropbox Sync API to download text files from, and upload video files to, Dropbox from my iOS application.
I am struggling with uploading large video files. A video of 15 to 20 minutes uploads correctly, but if the duration is more than 25 minutes, the app gets a memory warning and crashes.
I am using this code on upload button action
DBPath *paths=[[DBPath root] childPath:[self.allVideoArray objectAtIndex:Selectedvideo]];
DBFile *createfile=[filesystem createFile:paths error:nil];
NSData *data=[[NSData alloc]initWithContentsOfFile:self.path];
[createfile writeData:data error:nil];
[data release];
Could somebody please help me out with this problem? Any help would be appreciated. Thanks in advance.
The problem is that you create an NSData instance containing the entire file. If the file is too big to fit into memory your app will crash. There are better ways to write large files to a DBFile.
Since you have a path to the local file you could do:
DBPath *paths=[[DBPath root] childPath:[self.allVideoArray objectAtIndex:Selectedvideo]];
DBFile *createfile=[filesystem createFile:paths error:nil];
[createfile writeContentsOfFile:self.path shouldSteal:NO error:nil];
Another option would be to read the file at self.path in smaller chunks and use DBFile appendData:error:.
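A rough sketch of the chunked approach, written in plain C (which compiles unchanged inside an Objective-C file). The names for_each_chunk, chunk_fn and count_chunk are made up for illustration; in the app, the callback would presumably wrap each buffer in an NSData and hand it to appendData:error:, checking the result each time.

```c
#include <stdio.h>
#include <stdlib.h>

/* Called once per chunk; return 0 to continue, non-zero to abort. */
typedef int (*chunk_fn)(const unsigned char *bytes, size_t length, void *context);

/* Reads the file at path in pieces of at most chunk_size bytes, so memory use
   is bounded by chunk_size no matter how large the file is.
   Returns the number of bytes delivered, or -1 on error or abort. */
long for_each_chunk(const char *path, size_t chunk_size, chunk_fn fn, void *context)
{
    if (chunk_size == 0 || fn == NULL) return -1;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;

    unsigned char *buffer = malloc(chunk_size);
    if (buffer == NULL) { fclose(fp); return -1; }

    long total = 0;
    int failed = 0;
    size_t got;
    while ((got = fread(buffer, 1, chunk_size, fp)) > 0) {
        if (fn(buffer, got, context) != 0) { failed = 1; break; }
        total += (long)got;
    }
    if (ferror(fp)) failed = 1;

    free(buffer);
    fclose(fp);
    return failed ? -1 : total;
}

/* Example callback: only counts how many chunks were delivered. */
int count_chunk(const unsigned char *bytes, size_t length, void *context)
{
    (void)bytes;
    (void)length;
    *(size_t *)context += 1;
    return 0;
}
```

With a chunk size of a few hundred KB, the peak memory use stays at that size instead of the size of the whole video.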
Side note - you really need to check return values to make sure these calls are working or not and make use of the error parameter to log the cause of the problem (if any).

Enqueueing into NSInputStream?

I would like to add three "parts" to an NSInputStream: an NSString, an output from another stream and then another NSString. The idea is the following:
The first and last NSStrings represent the beginning and end of a SOAP request while the output from the stream is a result of loading a very large file and encoding it as Base64 string. So, in the end I would have the final NSInputStream hold the whole SOAP request like this:
< soap beginning > < Base64 encoded data > < soap ending >
The reason I want the whole request to be held in NSInputStream is two-fold:
I don't want to load the very large data file into memory
I think that this is the only way to enforce sending the final request in HTTP 1.1 chunks (which I need because otherwise, if the request becomes too big, the server won't accept it). So, I know that doing this:
NSInputStream *dataStream = ....;
[request setHTTPBodyStream:dataStream];
ensures that the request will be sent as HTTP 1.1 chunks and not as one huge raw SOAP request.
So, I wonder how this can be achieved - namely, how do I "enqueue" things into an NSInputStream? Can it be even done? Is there an alternative way?
Just for reference, in Java this can be done as follows
Vector<InputStream> streamVec = new Vector<InputStream>();
BufferedInputStream fStream = new BufferedInputStream(fileData.getInputStream());
Base64InputStream b64stream = new Base64InputStream(fStream, true);
String[] SOAPBody = GenerateSOAPBody(fileInfo).split("CUT_HERE");
streamVec.add(new ByteArrayInputStream(SOAPBody[0].getBytes()));
streamVec.add(b64stream);
streamVec.add(new ByteArrayInputStream(SOAPBody[1].getBytes()));
SequenceInputStream seqStream = new SequenceInputStream(streamVec.elements());
because Java has these objects available, but NSStreams in Objective-C look like very low-level objects and are very hard to work with.
Note: I completely rewrote the original question that I asked 2 days ago, since I think the new edit explains the problem more clearly. I hope that makes it easier to understand and maybe to answer.
UPDATE 2
Here is what I've been able to achieve so far: Instead of trying to enqueue into a stream, I am using a temp file to first write the < soap beginning >, then I set up an input stream to read from the file in chunks, encode each chunk as a Base64 string and write this to the same temp file, finally, when my stream closes, I write the < soap ending > to the temp file. Then I set up another input stream with the contents of this file which I pass to the NSMutableURLRequest:
NSMutableURLRequest* request = [NSMutableURLRequest requestWithURL:url];
...
NSInputStream *dataStream = [NSInputStream inputStreamWithFileAtPath:_tempFilePath];
[request setHTTPBodyStream:dataStream];
This ensures HTTP 1.1 chunked transfer of the contents of the file. After the connection finishes, delete the temp file.
This seems to work fine, but of course it is an annoying workaround. I don't want to be writing to a temp file when it all could have been handled by streams (ideally). If anybody still has better suggestions, let me know :)
UPDATE 3
OK, another update is in order. While my writing to file seems to work, I am now hitting an unexpected issue with some of my requests failing to upload to the server. Specifically, everything is going according to the plan, I am reading the contents of the temp file into a stream and set HTTP body of my request to be this stream and it starts transmitting the HTTP 1.1 chunks as I want it to - but for some reason some packets get dropped and the final request - this is my guess - gets malformed and thus fails. I think the issue of dropped packets is random, because I observe it on larger requests - that is, the issue just has more chance to show up - while my smaller requests usually go thru just fine. This is of course a separate issue from the original in this question. If anybody has a good idea what might be causing this, I asked about the problem here: Packets dropped during chunked HTTP 1.1 request sent by NSURLConnection
Your solution is an ok option, but you can do it with a stream. It means subclassing NSInputStream, and that isn't trivial because there are a bunch of methods you need to implement.
Basically your subclass would initially return the header bytes, then it would return bytes from the 'internal' stream to the file content, then when that's used up it returns the footer bytes. It means maintaining a record of how big the header and footer are and how much has been processed so far, but that isn't a big issue.
There's an example of creating a subclass here which shows the tricky hidden methods you need to implement to get the stream subclass to work properly without throwing exceptions.
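The header/body/footer bookkeeping described above can be sketched independently of NSInputStream. The following plain C version is only an illustration under a few assumptions: concat_source and its functions are hypothetical names, the header and footer are NUL-terminated strings, and the body is passed through unchanged (the Base64 encoding step is left out). In an NSInputStream subclass, the same logic would live in read:maxLength:.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical three-part source: header bytes, then a file, then footer bytes. */
typedef struct {
    const char *header; size_t header_len;
    FILE *body;
    const char *footer; size_t footer_len;
    size_t offset;  /* position inside the current header or footer */
    int phase;      /* 0 = header, 1 = body, 2 = footer, 3 = finished */
} concat_source;

void concat_source_init(concat_source *s, const char *header, FILE *body, const char *footer)
{
    s->header = header; s->header_len = strlen(header);
    s->body = body;
    s->footer = footer; s->footer_len = strlen(footer);
    s->offset = 0;
    s->phase = 0;
}

/* Copies from an in-memory part; returns 0 once the part is used up. */
static size_t copy_part(const char *part, size_t part_len, size_t *offset,
                        unsigned char *out, size_t max_len)
{
    size_t n = part_len - *offset;
    if (n > max_len) n = max_len;
    memcpy(out, part + *offset, n);
    *offset += n;
    return n;
}

/* Fills out with up to max_len bytes; a return value of 0 means end of stream. */
size_t concat_source_read(concat_source *s, unsigned char *out, size_t max_len)
{
    size_t total = 0;
    while (total < max_len && s->phase < 3) {
        size_t n;
        if (s->phase == 0)
            n = copy_part(s->header, s->header_len, &s->offset, out + total, max_len - total);
        else if (s->phase == 1)
            n = s->body ? fread(out + total, 1, max_len - total, s->body) : 0;
        else
            n = copy_part(s->footer, s->footer_len, &s->offset, out + total, max_len - total);

        if (n == 0) { s->phase++; s->offset = 0; }  /* current part exhausted, move on */
        total += n;
    }
    return total;
}
```

The caller never sees where one part ends and the next begins, which is what lets the consumer treat the whole SOAP request as a single stream.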

decoding a HUGE NSString, running out of memory

I'm looking for ideas on how to improve a process of decoding a 40+MB NSString with base64 encoding and saving it to a file while being able to fit the process into iPad 1's 256 MB of RAM
I get the NSString from NSXMLParser:
id pointerToString;
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if ([currentElement isEqualToString:@"myElement"])
    {
        pointerToString = [string retain];
    }
}
Then I use the pointerToString in a callback:
[handler performSelector: action withObject: pointerToString];
In the callback (where id value is the pointerToString), I initialize an NSData from pointerToString, Base64-decoding it in the process.
^(id value)
{
    if ( [[value class] isSubclassOfClass:[NSString class]] )
    {
        NSData *data = [NSData dataFromBase64String:value];
        [data writeToFile:file.path atomically:YES];
    }
}
The iPad 1 runs out of memory and the app gets killed by iOS when the memory allocation reaches around 130 MB, during or after the NSData call.
I have determined that in order to process the 40+MB NSString this way, I'd need about 180+MB of RAM (this is what the maximum memory allocation is on iPad 2 & 3, where the process works because of more RAM)
Any ideas/tips ?
Thank you
Edit:
When dealing with a file of this size, you probably do not want to load the entire multi-megabyte file in memory at one time, neither the huge input file nor the almost-as-huge output file. You should be parsing this in a streaming fashion, decoding the data in your foundCharacters as you go along, not holding any significant portions in memory.
The traditional techniques, though, may hold your entire XML file in memory in three phases of the process:
As you download the XML file from the server;
As the XML parser parses that file; and
As you do the Base64-decode of the file.
The trick is to employ a streaming technique that does these three processes at once, for small chunks of the single, large XML file. Bottom line, as you're downloading the entire 50 MB file, grab a few KB, parse the XML, and if you're parsing the Base64-encoded field, perform the Base64-decode for those few KB, and then proceed to the next chunk of data.
For an example of this (at least the streaming XML downloading-and-parsing, not including the Base64-decoding), please see Apple's XMLPerformance sample project. You'll see that it demonstrates two XML parsers: the NSXMLParser that we're all familiar with, as well as the less familiar LibXML parser. The issue with NSXMLParser is that, left to its own devices, it will load the entire XML file in memory before it starts parsing, even if you use initWithContentsOfURL.
In my previous answer, I mistakenly claimed that by using initWithContentsOfURL, the NSXMLParser would parse the URL's contents in nice little packets as they were being downloaded. The foundCharacters method of NSXMLParserDelegate protocol seems so analogous to the NSURLConnectionDelegate method, didReceiveData, that I was sure that NSXMLParser was going to handle the stream just like NSURLConnection does, namely returning information as the download was in progress. Sadly, it doesn't.
By using LibXML, though, like the Apple XMLPerformance sample project, you can actually use the NSURLConnection ability of streaming, and thus parse the XML on the fly.
I have created a little test project, but I might suggest that you go through Apple's XMLPerformance sample project in some detail. But in my experiment, a 56mb XML file consumed well over 100mb when parsing and converting via NSXMLParser but only consumed 2mb when using LibXML2.
In your comments, you describe the desire to download the Base64-encoded data to a file and then decode that. That approach seems a lot less efficient, but certainly could work. By the way, on that initial download, you have the same memory problem (that I solve above). I urge you to make sure that your initial download of the Base64-encoded data does not blithely load it into RAM like most routines do. You want to, assuming you're using NSURLConnection, write the data to the NSOutputStream as you receive the data in didReceiveData, not hold it in RAM.
See the didReceiveResponse in AdvancedGetController.m of Apple's AdvancedURLConnections example for an example of how to write a file as it's being received, rather than typical patterns of adding it to a NSMutableData (because most of these routines just assume you're dealing with a reasonably sized file). (Ignore all the stuff in that AdvancedURLConnections sample about authentication and the like, but focus on understanding how it's writing to the NSOutputStream as it goes.) This technique will address the first of the three problems listed at the top of this answer, but not the latter two. For that, you'll have to contemplate using LibXML2 as illustrated in Apple's XMLPerformance sample project, or other similar techniques.
The method
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
is probably not receiving all the data at once.
The documentation says:
"Sent by a parser object to provide its delegate with a string representing all or part of the characters of the current element."
So it is called multiple times.
It looks like you are trying to write the whole string at once (sorry if I am wrong).
So you could append the received data to the file by doing the following:
You can use a combination of
-writeData:
and
-seekToEndOfFile
methods from NSFileHandle class for writing NSData to the end of a file.
But be careful with Base64 decoding of partially received data!
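The catch with partial data is that Base64 works in groups of 4 characters, and a chunk handed to foundCharacters can end in the middle of a group. One way to cope, sketched here in plain C, is to carry the unfinished group over to the next chunk. The names b64_state, b64_decode_chunk and b64_decode_finish are invented for this example, and the sketch simply skips padding and any other non-alphabet characters instead of validating them.

```c
#include <stddef.h>

/* Decoder state that survives between chunks. */
typedef struct {
    unsigned int bits;  /* 6-bit groups collected so far */
    int count;          /* number of groups held in bits (0..3) */
} b64_state;

static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;  /* '=', whitespace, anything else */
}

/* Decodes one chunk of Base64 text. out needs room for ((len + 3) / 4) * 3 bytes.
   Returns the number of bytes written. */
size_t b64_decode_chunk(b64_state *st, const char *in, size_t len, unsigned char *out)
{
    size_t written = 0;
    for (size_t i = 0; i < len; i++) {
        int v = b64_value((unsigned char)in[i]);
        if (v < 0) continue;  /* skip padding and line breaks */
        st->bits = (st->bits << 6) | (unsigned int)v;
        if (++st->count == 4) {  /* a full group yields 3 bytes */
            out[written++] = (unsigned char)((st->bits >> 16) & 0xFF);
            out[written++] = (unsigned char)((st->bits >> 8) & 0xFF);
            out[written++] = (unsigned char)(st->bits & 0xFF);
            st->bits = 0;
            st->count = 0;
        }
    }
    return written;
}

/* Call once after the last chunk to emit the final 1 or 2 bytes, if any.
   out needs room for 2 bytes. */
size_t b64_decode_finish(b64_state *st, unsigned char *out)
{
    size_t written = 0;
    if (st->count == 2) {
        out[written++] = (unsigned char)((st->bits >> 4) & 0xFF);
    } else if (st->count == 3) {
        out[written++] = (unsigned char)((st->bits >> 10) & 0xFF);
        out[written++] = (unsigned char)((st->bits >> 2) & 0xFF);
    }
    st->bits = 0;
    st->count = 0;
    return written;
}
```

Each decoded chunk can then be appended to the file right away, so only a few KB of text and bytes are alive at any time.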

Partial file reading and writing with iOS SDK

I want to get the first 8 bytes or so of a file without reading the whole file. I'm using NSData to operate on the data and such, but I don't want to slow down my application with excessive file reads and writes because in some cases I'm having to read a 200 kilobyte file just to extract the first 2 bytes of data from the file. Is there any way to only read or write a part of the file without reading or overwriting the whole thing in Xcode with the iOS SDK?
The file system that I'm using is just the default one that's accessible through the NSFileManager class (I don't know of any other iOS file system).
You may take advantage of the higher level NSFileHandle class. The NSFileHandle class is an object-oriented wrapper for a file descriptor. You use file handle objects to access data associated with files, sockets, pipes, and devices. For files, you can read, write, and seek within the file. For sockets, pipes, and devices, you can use a file handle object to monitor the device and process data asynchronously.
- (NSData *)readDataOfLength:(NSUInteger)length
You can get more info in official documentation NSFileHandle Class Reference
Use the standard C file API (either FILE* or int file descriptors). The caveat is that you have to properly convert the string path to a correct char* file path. Also, don't forget to close the file when done. Consider a category on NSData, something kinda like this...
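#include <fcntl.h>
#include <unistd.h>

+ (id)dataWithContentsOfFile:(NSString *)filePath numBytes:(NSUInteger)numBytes
{
    // Convert the NSString path to a C string the file system understands.
    char const *path = [[NSFileManager defaultManager] fileSystemRepresentationWithPath:filePath];
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return nil;
    }
    void *bytes = malloc(numBytes);
    // Read only the requested prefix; a short read or an error counts as failure.
    ssize_t bytesRead = (bytes != NULL) ? read(fd, bytes, numBytes) : -1;
    close(fd);
    if (bytesRead < 0 || (NSUInteger)bytesRead != numBytes) {
        free(bytes);
        return nil;
    }
    // The returned NSData takes ownership of the buffer and frees it when deallocated.
    return [NSData dataWithBytesNoCopy:bytes length:numBytes];
}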
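The question also asks about writing only part of a file. With the same C file API, that could look roughly like the sketch below; overwrite_bytes is an illustrative name, not an existing API, and the path is assumed to be a C string obtained as in the reading example.

```c
#include <stdio.h>

/* Overwrites length bytes starting at byte offset within an existing file,
   leaving the rest of the file untouched. Returns 0 on success, -1 on failure. */
int overwrite_bytes(const char *path, long offset, const void *bytes, size_t length)
{
    FILE *fp = fopen(path, "r+b");  /* "r+" opens for update without truncating */
    if (fp == NULL) return -1;

    int ok = fseek(fp, offset, SEEK_SET) == 0
          && fwrite(bytes, 1, length, fp) == length;

    if (fclose(fp) != 0) ok = 0;  /* buffered data is flushed on close */
    return ok ? 0 : -1;
}
```

Only the touched bytes are written; the file is neither read into memory nor rewritten as a whole.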

Resources