decoding a HUGE NSString, running out of memory

I'm looking for ideas on how to improve the process of decoding a 40+ MB Base64-encoded NSString and saving it to a file, while fitting the whole process into the iPad 1's 256 MB of RAM.
I get the NSString from NSXMLParser:
id pointerToString;
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if ([currentElement isEqualToString:@"myElement"])
    {
        pointerToString = [string retain];
    }
}
Then I use the pointerToString in a callback:
[handler performSelector: action withObject: pointerToString];
In the callback (where id value is the pointerToString), I create an NSData from pointerToString by Base64-decoding it:
^(id value)
{
    if ( [[value class] isSubclassOfClass:[NSString class]] )
    {
        NSData *data = [NSData dataFromBase64String:value];
        [data writeToFile:file.path atomically:YES];
    }
}
The iPad 1 runs out of memory and the app gets killed by iOS when the memory allocation reaches around 130 MB, during or just after the NSData call.
I have determined that in order to process the 40+MB NSString this way, I'd need about 180+MB of RAM (this is what the maximum memory allocation is on iPad 2 & 3, where the process works because of more RAM)
Any ideas/tips ?
Thank you

Edit:
When dealing with a file of this size, you probably do not want to load the entire multi-megabyte file in memory at one time, neither the huge input file nor the almost-as-huge output file. You should be parsing this in a streaming fashion, decoding the data in your foundCharacters as you go along, not holding any significant portions in memory.
The traditional techniques, though, may hold your entire XML file in memory during three phases of the process:
As you download the XML file from the server;
As the XML parser parses that file; and
As you do the Base64-decode of the file.
The trick is to employ a streaming technique that does these three processes at once, for small chunks of the single, large XML file. Bottom line: as you're downloading the entire 50 MB file, grab a few KB, parse the XML, and if you're parsing the Base64-encoded field, perform the Base64-decode for those few KB, and then proceed to the next chunk of data.
For an example of this (at least the streaming XML downloading-and-parsing, not including the Base64-decoding), please see Apple's XMLPerformance sample project. You'll see that it demonstrates two XML parsers: the NSXMLParser that we're all familiar with, as well as the less familiar LibXML parser. The issue with NSXMLParser is that, left to its own devices, it will load the entire XML file in memory before it starts parsing, even if you use initWithContentsOfURL.
In my previous answer, I mistakenly claimed that by using initWithContentsOfURL, the NSXMLParser would parse the URL's contents in nice little packets as they were being downloaded. The foundCharacters method of NSXMLParserDelegate protocol seems so analogous to the NSURLConnectionDelegate method, didReceiveData, that I was sure that NSXMLParser was going to handle the stream just like NSURLConnection does, namely returning information as the download was in progress. Sadly, it doesn't.
By using LibXML, though, like the Apple XMLPerformance sample project, you can actually use the NSURLConnection ability of streaming, and thus parse the XML on the fly.
I have created a little test project, but I might suggest that you go through Apple's XMLPerformance sample project in some detail. In my experiment, a 56 MB XML file consumed well over 100 MB when parsing and converting via NSXMLParser but only consumed 2 MB when using LibXML2.
In your comments, you describe the desire to download the Base64-encoded data to a file and then decode that. That approach seems a lot less efficient, but certainly could work. By the way, on that initial download, you have the same memory problem (that I solve above). I urge you to make sure that your initial download of the Base64-encoded data does not blithely load it into RAM like most routines do. You want to, assuming you're using NSURLConnection, write the data to the NSOutputStream as you receive the data in didReceiveData, not hold it in RAM.
See the didReceiveResponse in AdvancedGetController.m of Apple's AdvancedURLConnections example for an example of how to write a file as it's being received, rather than typical patterns of adding it to a NSMutableData (because most of these routines just assume you're dealing with a reasonably sized file). (Ignore all the stuff in that AdvancedURLConnections sample about authentication and the like, but focus on understanding how it's writing to the NSOutputStream as it goes.) This technique will address the first of the three problems listed at the top of this answer, but not the latter two. For that, you'll have to contemplate using LibXML2 as illustrated in Apple's XMLPerformance sample project, or other similar techniques.

The method
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
is probably not receiving all the data at once.
The documentation says:
"Sent by a parser object to provide its delegate with a string representing all or part of the characters of the current element."
So it is called multiple times.
It looks like you are trying to write the whole string at once (sorry if I am wrong).
So you could append the received data to the file by doing the following:
You can use a combination of
-writeData:
and
-seekToEndOfFile
methods from NSFileHandle class for writing NSData to the end of a file.
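But be careful when Base64-decoding partially received data! Base64 maps each group of 4 characters to 3 bytes, and a chunk delivered to foundCharacters can end in the middle of a group, so any leftover characters have to be carried over and decoded together with the start of the next chunk.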
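To make the warning about partial data concrete, here is a minimal sketch of a Base64 decoder that keeps its state between chunks. It is written in plain C, which compiles unchanged inside an Objective-C file. The names b64_stream, b64_stream_feed and b64_stream_finish are made up for this illustration; they are not part of any Apple API, nor of the dataFromBase64String: category used in the question. The sketch silently skips padding, whitespace and any other non-alphabet character, which is an assumption that suits XML character data but is not strict validation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decoder state carried between chunks: up to 3 pending 6-bit values. */
typedef struct {
    unsigned char pending[4];
    size_t count;
} b64_stream;

void b64_stream_init(b64_stream *s) { s->count = 0; }

/* Map one Base64 character to its 6-bit value; -1 for '=' or anything else. */
int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode one chunk. `out` needs room for 3 * ((len + 3) / 4) bytes.
   Characters that do not complete a group of 4 stay in `s` for the next call.
   Returns the number of bytes written. */
size_t b64_stream_feed(b64_stream *s, const char *in, size_t len, unsigned char *out) {
    size_t written = 0;
    for (size_t i = 0; i < len; i++) {
        int v = b64_value(in[i]);
        if (v < 0) continue; /* skip padding, whitespace, line breaks */
        s->pending[s->count++] = (unsigned char)v;
        if (s->count == 4) {
            out[written++] = (unsigned char)((s->pending[0] << 2) | (s->pending[1] >> 4));
            out[written++] = (unsigned char)(((s->pending[1] & 0x0F) << 4) | (s->pending[2] >> 2));
            out[written++] = (unsigned char)(((s->pending[2] & 0x03) << 6) | s->pending[3]);
            s->count = 0;
        }
    }
    return written;
}

/* Call once after the last chunk: 2 leftover characters yield 1 byte,
   3 leftover characters yield 2 bytes. `out` needs room for 2 bytes. */
size_t b64_stream_finish(b64_stream *s, unsigned char *out) {
    size_t written = 0;
    if (s->count >= 2)
        out[written++] = (unsigned char)((s->pending[0] << 2) | (s->pending[1] >> 4));
    if (s->count == 3)
        out[written++] = (unsigned char)(((s->pending[1] & 0x0F) << 4) | (s->pending[2] >> 2));
    s->count = 0;
    return written;
}
```

In foundCharacters you would feed the UTF-8 bytes of each string to b64_stream_feed and append the decoded bytes to the file (for example with NSFileHandle's writeData:), then call b64_stream_finish when the element ends. Only one small chunk is in memory at any time.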

Related

AES128 encryption on iOS without loading entire NSData

Using samples from Apple Dev library (https://developer.apple.com/library/prerelease/ios/samplecode/CryptoExercise/Listings/Classes_SecKeyWrapper_m.html),
in particular this method
- (NSData *)doCipher:(NSData *)plainText key:(NSData *)symmetricKey context:(CCOperation)encryptOrDecrypt padding:(CCOptions *)pkcs7 { return nil; }
I can decrypt and encrypt files by reading them to NSData and then passing to this method.
However, this has the unfortunate consequence that I have to use a lot of RAM for the process.
Is there any way to decrypt the file right on disk without loading the file to NSData?

You can use the individual Common Crypto routines: CCCryptorCreate(), CCCryptorUpdate(), CCCryptorFinal(), and CCCryptorRelease(), repeating CCCryptorUpdate() for each chunk.
See Mike Ash's Friday Q&A on A Tour of CommonCrypto.

iOS AES encryption wrong result

Here is how I encrypt the string:
+ (NSString *)encrypt:(NSString *)message password:(NSString *)password {
NSData *encryptedData = [[message dataUsingEncoding:NSUTF8StringEncoding] AES256EncryptedDataUsingKey:[password dataUsingEncoding:NSUTF8StringEncoding] error:nil];
NSString *base64EncodedString = [NSString base64StringFromData:encryptedData length:[encryptedData length]];
return base64EncodedString;
}
The plain text is:
{"roomID":"{\"array\":[\"949156\",\"949157\"]}","duration":15,"link":"","type":"text","thumbnailBlobID":"","posy":103.6809424405021,"text":"Aa","className":"Message","originalBlobID":"","datetime":"20140319214528457","selfDestructive":0,"userID":"949157","posx":1.347154229880634,"status":"normal","entityID":"20140319214528457and949157and{\"array\":[\"949156\",\"949157\"]}"}
This is what I get
gXqxfDhImRD7S20lUdYuCPAlXfqRnG6xk4w4K5Op/WnYMh6VgJUUqMifK2vHkUpAbnZ+wKdSWjfzU1PuOwvJ4dJ9EiHwjeyyorezFNG6eylYcOvMWNeU6+5Z9XxfcFngqhmxM6k1lf7bkttTu0FnEHad/czFgiMVTy60DJpFMLSODkKEVezqQB9s/f3Qy/B6+sF5Hs5E0FDn7kU6Jtm6mLkFjGzDCXTdFXNjdussbkTL8C1gcOnn4hrNkqQKb82MgqqYf8sVgs4FVIjsmoJd0ALY8y/5QbBkgc6ZyB4aOQPPx/u4HS3F7HXHkIkkAjZS/hiHQBRyfwCvi2uwFedno5twYogNW56pSMQqBeJBxBAhPMpXzb51853GLP4bCotGtOyEfU96x5kWHDOR5QA2WhYZkB3AALDJ2kfqzWR8iOKHo3zE6DCQ7aH0RwEFlNPi8vsNwvUqtQ7nUODA5lUMYah6W2rfDh/em8BD8dGF5J6IUTIlSerx8wWPA9bn/SxO
From website http://aesencryption.net (256 bit)
(which I assume to be the correct answer):
5MdV0TelF++/8Cy9bnkeah0nQ5JbC04CEdCcHfdlavQtZaxg3ZSXklp9yXbeP05hcIeQDgFcMr9NlD6aKhjBL3Xh70ksYqc6Xv5BZvCbXrvO4ufAf4gjmDRQr9DYSbjFct6N82fFGDtrcuFm36Zv+QAQtR/scT86A++Vn/EBlPwFb7ZmxlMPkJWjQ98ObreXHeKkZ8f2npMKfJ0i36nIZ8CZeL0EYeg/njo9ykPTfm9wfKieqlIICn1qNZAXE//P9hTleW/GNs5+ET2gxNSCmdO+ByUB9Q3sZ/+57qXbsfCxHr8dsuBrsbRI+cVIXyquQL1IC/zuz3G3fcyoiLrD/PnFtV5z5XR0hpUiU8JjovjYwyXaBfyTBnO71zxmdoZdsyPwA1LQO0pedn8UsICT2KbfBKwuumW2CJPexbnMmVzpIJ/VPISikdg18V3rdJqiPMIb4Zq2PGKO0Wtq1dCTMusTv/ZnqxgVQFQlUivgBqtnOLCDaMAGL636NXda95V2

There is no single standard way to apply AES, or standard data format for the output. AES requires a number of helpers when used on data that is not exactly 16 bytes long, and they can be configured in different ways. I have no idea how the aesencryption.net tool is applying these helpers; it doesn't say. If AES256EncryptedDataUsingKey: is the particular piece of code I assume it is, it applies them very poorly (it's very similar to the code I discuss in Properly Encrypting With AES With CommonCrypto). I would not be surprised if aesencryption.net does something different.
If you have a piece of plaintext and a key, and you pass it to an encryptor twice and get the same answer back, then your encryptor is broken. A correct AES encryptor (for almost any common use of AES) should always return different results for the same plaintext+key (otherwise an attacker can determine that two plaintexts are equal, which breaks the security proof of AES). In the most common case, this is achieved by having a unique initialization vector (IV). For password-based AES, you also include a random salt. So even if these were good implementations of AES, you wouldn't expect your results to match.

Is it possible that the escape characters (the backslashes) are being interpreted differently in code versus on the website? The bottom line is that I would, in code, decrypt what you just encrypted; you should come out with the same text you put in. This is probably the test you want to conduct. Hope this helps. Also see the comment below from @RobNapier.

Efficiently parsing a CSV file from a web service (using CHCSVParser)

My app is currently parsing CSV files from a web service by using a combination of componentsSeparatedByCharactersInSet and componentsSeparatedByString methods.
As the files are quite large (> 1 MB on average), parsing takes a couple of seconds on an iPad, which is too slow. The memory footprint of my solution is an issue too (I am holding the full text file in memory).
This is why I am looking for a faster and more memory-efficient solution. I came across CHCSVParser which can parse NSInputStreams directly, e.g.
NSInputStream *stream = [NSInputStream inputStreamWithFileAtPath:file];
CHCSVParser * p = [[CHCSVParser alloc] initWithInputStream:stream
usedEncoding:&encoding delimiter:';'];
(Source from the sample project on CHCSVParser)
My question:
How can I get an NSInputStream as the result of an NSURLRequest? (Currently I am getting the whole CSV file as a NSData object and converting it to NSString in order to parse it).
Could I use the NSInputStream from an NSURLRequest directly with CHCSVParser?
Would you generally recommend using CHCSVParser's initWithInputStream method with an NSURLRequest, or rather downloading the document to memory and parsing it after the full download?

Download the file to disk using NSURLConnectionDelegate and NSOutputStream (that way you use as little memory as possible while downloading) and then open an NSInputStream to the same file and pass it into CHCSVParser.

Enqueueing into NSInputStream?

I would like to add three "parts" to an NSInputStream: an NSString, an output from another stream and then another NSString. The idea is the following:
The first and last NSStrings represent the beginning and end of a SOAP request while the output from the stream is a result of loading a very large file and encoding it as Base64 string. So, in the end I would have the final NSInputStream hold the whole SOAP request like this:
< soap beginning > < Base64 encoded data > < soap ending >
The reason I want the whole request to be held in NSInputStream is two-fold:
I don't want to load the very large data file into memory
I think that this is the only way to enforce sending the final request in HTTP 1.1 chunks (which I need because otherwise, if the request becomes too big, the server won't accept it). So, I know that doing this:
NSInputStream *dataStream = ....;
[request setHTTPBodyStream:dataStream];
ensures that the request will be sent as HTTP 1.1 chunks and not as one huge raw SOAP request.
So, I wonder how this can be achieved - namely, how do I "enqueue" things into an NSInputStream? Can it be even done? Is there an alternative way?
Just for reference, in Java this can be done as follows
Vector<InputStream> streamVec = new Vector<InputStream>();
BufferedInputStream fStream = new BufferedInputStream(fileData.getInputStream());
Base64InputStream b64stream = new Base64InputStream(fStream, true);
String[] SOAPBody = GenerateSOAPBody(fileInfo).split("CUT_HERE");
streamVec.add(new ByteArrayInputStream(SOAPBody[0].getBytes()));
streamVec.add(b64stream);
streamVec.add(new ByteArrayInputStream(SOAPBody[1].getBytes()));
SequenceInputStream seqStream = new SequenceInputStream(streamVec.elements());
because Java has these objects available, but NSStreams in Objective-C look like very low-level objects and are very hard to work with.
Note: I completely rewrote the original question as I asked it 2 days ago, since I think the new edit explains more clearly what the problem is. I hope this makes it easier to understand and maybe to answer.
UPDATE 2
Here is what I've been able to achieve so far: Instead of trying to enqueue into a stream, I am using a temp file to first write the < soap beginning >, then I set up an input stream to read from the file in chunks, encode each chunk as a Base64 string and write this to the same temp file, finally, when my stream closes, I write the < soap ending > to the temp file. Then I set up another input stream with the contents of this file which I pass to the NSMutableURLRequest:
NSMutableURLRequest* request = [NSMutableURLRequest requestWithURL:url];
...
NSInputStream *dataStream = [NSInputStream inputStreamWithFileAtPath:_tempFilePath];
[request setHTTPBodyStream:dataStream];
This ensures HTTP 1.1 chunked transfer of the contents of the file. After the connection finishes, delete the temp file.
This seems to work fine, but of course it is an annoying workaround. I don't want to be writing to a temp file when it all could have been handled by streams (ideally). If anybody still has better suggestions, let me know :)
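One detail of the chunked encoding step is easy to get wrong: each chunk read from the source file must be a multiple of 3 bytes long, otherwise '=' padding ends up in the middle of the output. The following is a minimal sketch of that step in plain C (which compiles unchanged inside an Objective-C file), assuming standard stdio files in place of the NSInputStream and temp file described above. The function names b64_encode_group and b64_encode_stream are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n input bytes (1 to 3) into exactly 4 output characters,
   padding with '=' when fewer than 3 bytes are given. */
void b64_encode_group(const unsigned char *in, size_t n, char out[4]) {
    unsigned b0 = in[0];
    unsigned b1 = n > 1 ? in[1] : 0;
    unsigned b2 = n > 2 ? in[2] : 0;
    out[0] = B64[b0 >> 2];
    out[1] = B64[((b0 & 0x03) << 4) | (b1 >> 4)];
    out[2] = n > 1 ? B64[((b1 & 0x0F) << 2) | (b2 >> 6)] : '=';
    out[3] = n > 2 ? B64[b2 & 0x3F] : '=';
}

/* Copy src to dst as Base64 while holding only one small buffer in memory.
   The buffer size is a multiple of 3, and fread only returns a short count
   at end of file or on error, so padding can only appear at the very end.
   Returns 0 on success, -1 on a read or write error. */
int b64_encode_stream(FILE *src, FILE *dst) {
    unsigned char buf[3 * 1024];
    char group[4];
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, src)) > 0) {
        for (size_t i = 0; i < got; i += 3) {
            size_t n = got - i < 3 ? got - i : 3;
            b64_encode_group(buf + i, n, group);
            if (fwrite(group, 1, 4, dst) != 4) return -1;
        }
    }
    return ferror(src) ? -1 : 0;
}
```

With this shape, writing the SOAP beginning, calling b64_encode_stream, and then writing the SOAP ending to the same destination file reproduces the temp-file approach without ever holding more than a few KB of the payload in RAM.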
UPDATE 3
OK, another update is in order. While my writing to file seems to work, I am now hitting an unexpected issue with some of my requests failing to upload to the server. Specifically, everything is going according to the plan, I am reading the contents of the temp file into a stream and set HTTP body of my request to be this stream and it starts transmitting the HTTP 1.1 chunks as I want it to - but for some reason some packets get dropped and the final request - this is my guess - gets malformed and thus fails. I think the issue of dropped packets is random, because I observe it on larger requests - that is, the issue just has more chance to show up - while my smaller requests usually go thru just fine. This is of course a separate issue from the original in this question. If anybody has a good idea what might be causing this, I asked about the problem here: Packets dropped during chunked HTTP 1.1 request sent by NSURLConnection

Your solution is an ok option, but you can do it with a stream. It means subclassing NSInputStream, and that isn't trivial because there are a bunch of methods you need to implement.
Basically your subclass would initially return the header bytes, then it would return bytes from the 'internal' stream to the file content, then when that's used up it returns the footer bytes. It means maintaining a record of how big the header and footer are and how much has been processed so far, but that isn't a big issue.
There's an example of creating a subclass here which shows the tricky hidden methods you need to implement to get the stream subclass to work properly without throwing exceptions.
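The bookkeeping described above (header bytes first, then the inner stream, then footer bytes) can be sketched independently of NSInputStream. The following plain C version, which also compiles inside an Objective-C file, uses a stdio FILE as the inner stream; concat_stream, concat_init and concat_read are hypothetical names for this illustration, and the sketch deliberately leaves out the NSInputStream-specific methods (run loop scheduling, delegate callbacks) that a real subclass would need. A real read:maxLength: override could follow the same logic as concat_read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Three sources read back to back: header bytes, a file, footer bytes. */
typedef struct {
    const char *header;
    size_t header_len;
    FILE *body;
    const char *footer;
    size_t footer_len;
    size_t offset; /* progress within the current in-memory part */
    int stage;     /* 0 = header, 1 = body, 2 = footer, 3 = done */
} concat_stream;

void concat_init(concat_stream *s, const char *header, FILE *body, const char *footer) {
    s->header = header;
    s->header_len = strlen(header);
    s->body = body;
    s->footer = footer;
    s->footer_len = strlen(footer);
    s->offset = 0;
    s->stage = 0;
}

/* Read up to cap bytes (cap > 0) into buf. Returns the number of bytes
   read; 0 means all three parts are exhausted. */
size_t concat_read(concat_stream *s, char *buf, size_t cap) {
    size_t total = 0;
    while (total < cap && s->stage < 3) {
        size_t n;
        if (s->stage == 1) {
            /* Middle part: pull bytes from the inner stream. */
            n = fread(buf + total, 1, cap - total, s->body);
        } else {
            /* Header or footer: copy from memory, tracking the offset. */
            const char *src = s->stage == 0 ? s->header : s->footer;
            size_t len = s->stage == 0 ? s->header_len : s->footer_len;
            size_t left = len - s->offset;
            n = left < cap - total ? left : cap - total;
            memcpy(buf + total, src + s->offset, n);
            s->offset += n;
        }
        total += n;
        if (n == 0) { /* current part is used up: move on to the next */
            s->stage++;
            s->offset = 0;
        }
    }
    return total;
}
```

The caller sees one continuous byte sequence and never needs to know where the header ends or the footer begins, which is the property the HTTP body stream relies on.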

How to pack multiple pieces of information into NSData and send/receive/process the data?

I'm making a small card game on iOS. I'm using GameKit/GKsession to handle my network data transfer.
My question is how to "pack" multiple pieces of information into an NSData and send it, and, when the server receives the NSData, how to unpack it and process the information correctly.
For example, I can send and receive an NSString with no problem. But my game needs to send and receive different data types, such as UIImage, NSString, NSArray, and so on.
I found the sample project GKTank in the SDK, but it's really hard for me to understand. My guess is that it defines several data types.
Can someone tell me how to let the server know what kind of data the client is sending (NSString? UIImage?) in this method:
- (void)receiveData:(NSData *)data fromPeer:(NSString *)peer inSession:(GKSession *)session context:(void *)context

You are actually looking for two things.
1) a protocol that both the sender and receiver understand;
2) a way to "serialize" your objects into the data format the protocol uses, and to "deserialize" that data back into objects.
For 1, you have various choices, such as string-based formats like JSON and XML, and byte-based protocols such as Protocol Buffers.
For 2, you have various parsers such as SBJson, TBXML and protobuf that help you encode and decode the protocol you chose in (1).
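As an illustration of the "protocol" half of this answer, here is a minimal hand-rolled alternative to JSON or Protocol Buffers: every message starts with a one-byte type tag and a four-byte big-endian length, followed by the payload. This is not what GKTank or any of the libraries above do; the message types and the functions msg_pack and msg_unpack are hypothetical and exist only for this sketch. It is plain C, which compiles unchanged in an Objective-C file, so the packed buffer can be wrapped in an NSData for sending and read from [data bytes] on receipt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message types for a card game. */
enum { MSG_TEXT = 1, MSG_IMAGE = 2, MSG_CARD_LIST = 3 };
enum { MSG_HEADER_SIZE = 5 }; /* 1 type byte + 4 length bytes */

/* Write [type][length, big-endian][payload] into out.
   Returns the total message size, or 0 if out is too small. */
size_t msg_pack(uint8_t type, const void *payload, uint32_t len,
                uint8_t *out, size_t cap) {
    if (cap < MSG_HEADER_SIZE || cap - MSG_HEADER_SIZE < len) return 0;
    out[0] = type;
    out[1] = (uint8_t)(len >> 24);
    out[2] = (uint8_t)(len >> 16);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)len;
    if (len > 0) memcpy(out + MSG_HEADER_SIZE, payload, len);
    return MSG_HEADER_SIZE + (size_t)len;
}

/* Parse a received buffer. On success returns 0 and points *payload at the
   payload bytes inside `in`; returns -1 if the buffer is truncated. */
int msg_unpack(const uint8_t *in, size_t in_len,
               uint8_t *type, const uint8_t **payload, uint32_t *len) {
    if (in_len < MSG_HEADER_SIZE) return -1;
    uint32_t n = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
                 ((uint32_t)in[3] << 8) | (uint32_t)in[4];
    if (in_len - MSG_HEADER_SIZE < n) return -1;
    *type = in[0];
    *payload = in + MSG_HEADER_SIZE;
    *len = n;
    return 0;
}
```

In the receiveData:fromPeer:inSession:context: callback, the receiver would call msg_unpack first and then switch on the type tag to decide whether the payload is UTF-8 text, image data, or something else.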
