Partially loading a PDF into memory - iOS

Is there a way to load (large) PDF files only partially? So, let's say: don't load the complete PDF file, but only the first 5 pages.
I'm handling large PDF files (30-50 MB), and when I call CGPDFRetain on the whole document, the complete 30-50 MB is retained in memory.
Is it possible to fetch single pages out of a PDF without first loading the complete PDF into memory? Can somebody help me with this?
Update:
Due to the fact that my app needs to support offline access, the PDFs should be loaded from local storage.
Update 2: I have tried different strategies by now, but the app is still right at the edge of its memory limit, because I'm loading my PDF completely into memory in one single step. But somehow it should be possible to support big PDF files, shouldn't it?

I don't know what CGPDFRetain is, so I might be totally off. PDF is designed in such a way that you only need parts of it to render it correctly. There is something called a "web optimized" (linearized) PDF, which has its objects arranged in a special way. Every web server is able to send a byte range of a document, and these two mechanisms together allow partial loading of a PDF.
You should elaborate on where you load the PDF from.
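For the remote case this answer describes, requesting a byte range is just an HTTP header. Here is a minimal sketch in Swift, purely for illustration (the URL and the 512 KB range are placeholders, and this only helps when the server honours Range requests and the PDF is linearized; it does not apply to the local files mentioned in the update):

import Foundation

// Hypothetical URL; only useful when the PDF is served remotely.
let url = URL(string: "https://example.com/large.pdf")!
var request = URLRequest(url: url)
// Ask for the first 512 KB only. A linearized ("web optimized") PDF keeps
// the document catalog and the first pages near the start of the file.
request.setValue("bytes=0-524287", forHTTPHeaderField: "Range")

let task = URLSession.shared.dataTask(with: request) { data, response, error in
    guard let data = data, error == nil else { return }
    // The server answers with 206 Partial Content and sends only this range.
    print("received \(data.count) bytes")
}
task.resume()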

It doesn't work like that. CGPDFDocument points to a file on disk and keeps parts of it cached in memory, but never the whole document.
There are some cases where CGPDFDocument gets too greedy with memory, but then you can just destroy and re-create the CGPDFDocument and you're fine. Otherwise, your app might simply crash after CGPDFDocument has allocated too much memory.
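A minimal sketch of that pattern in Swift (the class name and file handling are my own, not from the question): create the CGPDFDocument from a local file URL, fetch pages on demand, and simply drop and re-create the document when memory pressure hits.

import UIKit
import CoreGraphics

final class PDFPageProvider {
    private let fileURL: URL
    private var document: CGPDFDocument?

    init(fileURL: URL) {
        self.fileURL = fileURL
    }

    // Lazily opens the document. CGPDFDocument maps the file on disk and only
    // caches the objects it needs to render the pages that are actually requested.
    func page(at index: Int) -> CGPDFPage? {
        if document == nil {
            document = CGPDFDocument(fileURL as CFURL)
        }
        return document?.page(at: index) // page numbers are 1-based
    }

    // Call this when the document's internal caches have grown too large,
    // e.g. from didReceiveMemoryWarning; re-creating the document is cheap.
    func releaseDocument() {
        document = nil
    }
}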

Related

Massive text file, NSString and Core Data

I've scratched my head about this issue for a long, long time now, but I still haven't figured out a way to do this efficiently and without using too much memory at once (on iOS, where memory is very limited).
I essentially have very large plain text files (15MB on average), which I then need to parse and import to Core Data.
My current implementation is to have an Article entity on Core Data, that has a "many" relationship with a Page entity.
I am also using a slightly modified version of this line reader library: https://github.com/johnjohndoe/LineReader
Naturally, the more Page entities I create, the more memory overhead I create (on top of the actual NSString lines).
No matter how much I tweak the number of lines per page, or the number of characters per line, the memory usage goes absolutely crazy (~300MB+), while just importing the whole text file as a single string quickly peaks at ~180MB and finishes in a matter of seconds; the paged import takes a couple of minutes.
The line reader itself might be at fault here, since I am refreshing the pages in the managed context after they're done, which to my knowledge should release them from memory.
At any rate, does anyone have any notes, techniques or ideas on how I should go about implementing this? Ideally I'd like to have support for pages, since I need to be able to navigate the text anyway, and loading the entire text into memory later doesn't sound like much fun.
Note: The Article & Page entity method works fine after the import, but the importing itself is going way overboard with memory usage.
EDIT: For some reason, Core Data is also consuming ~300MB of memory when removing an Article entity from the context. Any ideas on why that might be happening, or how that could be remedied?
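The usual advice for this kind of import is to batch it: wrap each page in an autoreleasepool, save every N pages, and then refresh the context so the saved objects turn back into faults. A minimal sketch of that pattern, assuming an Article/Page model like the one described (the "text" attribute, the "article" inverse relationship name and the batch size of 50 are placeholders):

import CoreData

func importArticle(lines: AnySequence<String>, linesPerPage: Int,
                   into context: NSManagedObjectContext) throws {
    let article = NSEntityDescription.insertNewObject(forEntityName: "Article", into: context)
    var buffer: [String] = []
    var pagesSinceSave = 0

    for line in lines {
        buffer.append(line)
        guard buffer.count == linesPerPage else { continue }

        autoreleasepool {
            // One Page per batch of lines; the joined string is the only
            // sizeable allocation and it dies with the pool.
            let page = NSEntityDescription.insertNewObject(forEntityName: "Page", into: context)
            page.setValue(buffer.joined(separator: "\n"), forKey: "text")
            page.setValue(article, forKey: "article")
        }
        buffer.removeAll(keepingCapacity: true)

        pagesSinceSave += 1
        if pagesSinceSave == 50 {
            try context.save()
            context.refreshAllObjects() // turn the saved pages back into faults
            pagesSinceSave = 0
        }
    }
    // A final partial page is omitted here for brevity; flush the buffer and save.
    try context.save()
}

For the deletion spike mentioned in the edit, one commonly suggested option is a batch delete (NSBatchDeleteRequest) that works directly on the store instead of faulting every Page into memory first.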

ImageResizer .NET for multiple product images - performance issues?

I'm building an ASP.NET MVC4 application with product pages. I have come across the ImageResizer library for handling and serving the images. My page has JPG thumbnails, 160x160px in dimensions and 3~5KB in size each.
To my understanding, using the ImageResizer library I could just upload the original large product image (600x600px, 10~20KB) and resize it on the fly to the thumbnail size when a visitor requests the page. Something like:
<img src="@Url.Content("~/images/imagename?width=160&height=160")" alt="">
Which I understand is fine for a couple of images, but my product page consists of 20 to 100 unique product JPG thumbnails (depending on page size).
Should performance suffer from processing 20-100 pictures on the fly each time? Has anyone faced a similar scenario? I could always go back and generate two different images (thumbnail and large) during the upload process, but I'm very curious whether I could get away with just one image per product and dynamic resizing.
When I say performance, I mean that anything above 0.5-1s of extra response time is a no-no for me.
The documentation mentions a caching plugin which improves performance by 100-10,000X:
Every public-facing website needs disk caching for their dynamically resized images (no, ASP.NET's output cache won't work). This module is extremely fast, but decoding the original image requires a large amount of contiguous RAM (usually 50-100MB) to be available. Since it requires contiguous, non-paged, non-fragmented RAM, it can't be used as a (D)DOS attack vector, but it does mean that there is a RAM-based limit on how many concurrent image processing requests can be handled. The DiskCache plugin improves the throughput 100-10,000X by delegating the serving of the cached files back to IIS and by utilizing a hash-tree disk structure. It easily scales to 100,000 variants and can be used with as many as a million images. It is part of the Performance edition, which costs $249. The DiskCache plugin requires you to use the URL API (read why).
http://imageresizing.net/plugins/diskcache
http://imageresizing.net/docs/basics
When it comes to websites, every operation that can be cached should be. This allows the server to deal with more visitors rather than more processing.
You could either use the caching plugin for ImageResizer, or manually write the resized image to a file using a predictable filename, e.g. product_154_180x180.jpg, where 154 is the product id and 180 is the width and height, and then check whether that file exists when you want to display it.
If you do the latter, you may be able to let the server manage this for you by linking to the expected filename in the page source; if the file doesn't exist, the server then calls a script that resizes the image and writes it to disk using ImageResizer.
This last method also avoids the call to ImageResizer on subsequent requests, saving you some processing power.

Memory issue: iPad 4.2 crashes

I am developing an application which receives 600-700 KB of XML data from the server. I have to do some manipulation (pre-parsing) on that data, and the views already occupy 4 MB of memory in the application.
While processing the XML data, memory usage increases from 600 KB to 2 MB and finally drops back to 600 KB. Because of this increase, the application gets a memory warning. When the memory warning arrives I release all the views in the navigation controller, but that only frees about 1 MB of memory. Even though I release all the views, the application still crashes.
Please help me out in this issue. It happens in iPad 4.2.
Thanks in advance
There's no magical answer here. You're using too much memory and you need to figure out how to use less. Without knowing more about your application it's difficult to be specific, though clearly loading in nearly 1 MB of data and manipulating it all in memory isn't helping.
Maybe you can stream the data rather than loading it all into memory? There's an open source library that helps: StreamingXMLParser.
Also, your view sounds huge (over a megabyte!). I'm sure there's some optimisation that can be performed there. Use Instruments to see where your memory is being used.
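The library linked above is Objective-C; to illustrate the same streaming idea, here is a minimal sketch using Foundation's event-driven XMLParser (written in Swift for brevity, even though the original question predates it; the element name and file path are placeholders):

import Foundation

// SAX-style parsing: the 600-700 KB response is never turned into a full
// in-memory tree; only the small pieces you care about are kept.
final class ItemCollector: NSObject, XMLParserDelegate {
    private(set) var titles: [String] = []
    private var currentText = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String]) {
        currentText = ""
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        currentText += string
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title" { // hypothetical element of interest
            titles.append(currentText.trimmingCharacters(in: .whitespacesAndNewlines))
        }
    }
}

// Usage: parse straight from the downloaded file instead of an in-memory copy.
let collector = ItemCollector()
if let parser = XMLParser(contentsOf: URL(fileURLWithPath: "/tmp/response.xml")) {
    parser.delegate = collector
    parser.parse()
    print("found \(collector.titles.count) titles")
}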
Maybe only 1 MB is released because of a parameter value that can be altered, or you may need to manually trigger a garbage collection pass, if that is relevant to the language in use. You could also split the XML input into sections if possible, or compact or compress the XML when it is stored, if you have access to the script or code in a way that allows that.

What is the fastest way for reading huge files in Delphi?

My program needs to read chunks from a huge binary file with random access. I have got a list of offsets and lengths which may have several thousand entries. The user selects an entry and the program seeks to the offset and reads length bytes.
The program internally uses a TMemoryStream to store and process the chunks read from the file. Reading the data is done via a TFileStream like this:
FileStream.Position := Offset;
MemoryStream.CopyFrom(FileStream, Size);
This works fine but unfortunately it becomes increasingly slower as the files get larger. The file size starts at a few megabytes but frequently reaches several tens of gigabytes. The chunks read are around 100 kbytes in size.
The file's content is only read by my program. It is the only program accessing the file at the time. Also the files are stored locally so this is not a network issue.
I am using Delphi 2007 on a Windows XP box.
What can I do to speed up this file access?
Edit:
The file access is slow for large files, regardless of which part of the file is being read.
The program usually does not read the file sequentially. The order of the chunks is user driven and cannot be predicted.
It is always slower to read a chunk from a large file than to read an equally large chunk from a small file.
I am talking about the performance for reading a chunk from the file, not about the overall time it takes to process a whole file. The latter would obviously take longer for larger files, but that's not the issue here.
I need to apologize to everybody: After I implemented file access using a memory mapped file as suggested it turned out that it did not make much of a difference. But it also turned out after I added some more timing code that it is not the file access that slows down the program. The file access takes actually nearly constant time regardless of the file size. Some part of the user interface (which I have yet to identify) seems to have a performance problem with large amounts of data and somehow I failed to see the difference when I first timed the processes.
I am sorry for being sloppy in identifying the bottleneck.
If you open the help topic for the CreateFile() WinAPI function, you will find interesting flags there such as FILE_FLAG_NO_BUFFERING and FILE_FLAG_RANDOM_ACCESS. You can experiment with them to gain some performance.
Next, copying the file data, even 100 KB in size, is an extra step which slows down operations. It is a good idea to use the CreateFileMapping and MapViewOfFile functions to get a ready-to-use pointer to the data. This way you avoid copying and may also get certain performance benefits (but you need to measure the speed carefully).
Maybe you can take this approach:
Sort the entries by file position and then do the following:
Take the entries that only need the first X MB of the file (up to a certain file position).
Read X MB from the file into a buffer (a TMemoryStream).
Now read the entries from the buffer (maybe multithreaded).
Repeat this for all the entries.
In short: cache a part of the file and read all entries that fit into it (multithreaded), then cache the next part, etc.
Maybe you can gain some speed if you just keep your original approach but sort the entries by position.
The stock TMemoryStream in Delphi is slow due to the way it allocates memory. The NexusDB company has TnxMemoryStream which is much more efficient. There might be some free ones out there that work better.
The stock Delphi TFileStream is also not the most efficient component. Way back in history, Julian Bucknall published a component named BufferedFileStream in a magazine or somewhere that worked with file streams very efficiently.
Good luck.

Generate thumbnail images at run-time when requested, or pre-generate thumbnail in harddisk?

I was wondering which way of managing thumbnail images has less impact on web server performance.
This is the scenario:
1) Each order can have a maximum of 10 images.
2) Images do not need to be stored after the order has completed (the maximum period is 2 weeks).
3) Potentially, there may be a few thousand active orders at any time.
4) Orders with images will be visited frequently by customers.
IMO, pre-generating thumbnails on the hard disk is the better solution, as disk space is cheap even with RAID.
But what about disk I/O speed, and the resources needed to load the images? Will it take more resources than generating thumbnails in real time?
It would be much appreciated if you could share your opinion.
I suggest a combination of both - dynamic generation with disk caching. This prevents wasted space from unused images, yet adds absolutely no overhead for repeatedly requested images. SQL and memory caching are not good choices; both require too much RAM. IIS can serve large images from disk while using only 100 KB of RAM.
While creating http://imageresizing.net, I discovered 29 image resizing pitfalls, and few of them are obvious. I strongly suggest reading the list, even if it's a bit boring. You'll need an HttpModule to be able to pass cached requests off to IIS.
Although - why re-invent the wheel? The ImageResizer library is widely used and well tested.
If the orders are visited frequently by customers, it is better to create the thumbnails once and store them on disk. That way the web server doesn't spend as long processing the page, which will speed up the loading time of your web pages.
It depends on your load. If the resource is being requested multiple times then it makes sense to cache it.
Will there always have to be an image? If not, you can create it on the first request and then cache it either in memory or, more likely, in a database for subsequent requests.
However, if you always need the n images to exist per order, and/or you have multiple orders being created regularly, you will be better off passing the thumbnail creation off to a worker thread or some kind of asynchronous page. That way, multiple requests can be stacked up, reducing load on the server.

Resources