I'm currently developing a corporate intranet that serves large PDF files. Users get frustrated when they have to wait for an entire PDF file to download before they can view it. I have used the embedded Google document viewer ( http://googlesystem.blogspot.com/2009/09/embeddable-google-document-viewer.html ) on other public-facing websites for lazy loading and ease of document navigation, but this is not feasible here because the solution is required for an intranet. Is it possible to achieve lazy loading of a PDF natively within a browser, and if so, what are the requirements for this to happen? I am using ASP.NET MVC 3.
Thanks
You should make sure that the PDF documents that you serve are 'linearized' (optimized for web). Linearization allows the browser to download the PDF document partially, typically so that the first page displays quickly. When the user navigates to another page, again only a part of the PDF document is downloaded. Here is a good article on the topic:
http://www.jpedal.org/PDFblog/2010/02/linearized-pdf-files/
In this scenario you would not write directly to the Response stream.
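A quick way to check whether a given file is linearized is to look at its first bytes: the PDF specification places the linearization parameter dictionary, which contains a /Linearized key, within the first 1024 bytes of the file. The following is a minimal Python sketch of that heuristic, not a full PDF parser; the function names are made up for illustration.

```python
import re

# A linearized PDF carries its linearization parameter dictionary
# (with a /Linearized key) within the first 1024 bytes of the file.
HEAD_BYTES = 1024


def looks_linearized(head: bytes) -> bool:
    """Heuristic: True if the leading bytes of a PDF contain a /Linearized entry."""
    return re.search(rb"/Linearized\s+\d", head[:HEAD_BYTES]) is not None


def file_looks_linearized(path: str) -> bool:
    """Read only the start of the file at `path` and apply the heuristic."""
    with open(path, "rb") as f:
        return looks_linearized(f.read(HEAD_BYTES))
```

A file that passes this check can still be served badly (for example by a server that ignores byte range requests), so treat a positive result as necessary rather than sufficient.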
First - this question has nothing to do with ASP.NET MVC.
Second - this question has nothing to do with lazy loading. Lazy loading is a pattern in object-relational mapping; it is not a synonym for streaming.
Finally - it depends on the PDF viewer you use. The browser does not display PDF files; a plugin in the browser does, typically Adobe Reader. So your question is in fact:
Can I stream a PDF file so that it can be opened and read before the whole file has reached the client?
As far as I know, yes, you can. But you must use .NET streams - for example, "plug" the HttpContext Response stream in as the output of your PDF generator.
It seems pdf.js itself issues byte range requests for the whole PDF file. Instead, is it possible to request only 5 pages when the PDF loads, and then load another set of 5 pages on scroll, and so on? Is there a way to achieve this using pdf.js?
Long story short - No.
PDF is not a contiguous storage format. If the PDF file is formatted for fast web view then you can get it to show page 1 whilst other pages are still streaming in, but you can't ask to start at a specific page or page range. Internally pdf uses a bunch of sections, links/pointers between them and digests. Think of them as wooden blocks with bits of string between them. You can't render anything until you have 'enough' of the file to provide the parts you need, but the organisation of the internal sections is pretty much random as far as your question is concerned.
The only way to get specific pages would be to have a server-side component split them out of the PDF file for you and make a new PDF file containing just those parts, but paging on to page 6 would mean opening a new document, etc.
Edit: There are startup params for Acrobat viewer that could allow you to set the first page to be displayed, and other viewers may offer this feature, but unless you have some very smart client-server interaction this would still require the entire PDF document to be present in the client first.
Edit 2: As per comment from #async5, PDF.js 'may' be able to do page-range loading. See this section of the PDF.js docs. But note that there are requirements on the web server that is serving the PDF file.
As described in an issue here, old versions of PDF.js did not handle linearized PDF files properly (as described by Peter in a comment, when you tried to load page 1000 it loaded pages 1-1000).
It seems the problem was resolved at some point (I don't know the specific version number). Now, when you set those params correctly (namely disableAutoFetch and disableStream both to true) and load page 1000, only page 1000 is loaded.
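The web server requirement mentioned above comes down to HTTP byte range support: the client sends a Range header and the server answers 206 Partial Content with a matching Content-Range header. As a rough illustration of the server side, here is a minimal Python sketch that resolves a single-range header against a file size. The function names are hypothetical and multi-range requests are deliberately left out.

```python
from typing import Optional, Tuple


def parse_byte_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Resolve a single-range header such as 'bytes=0-1023' to inclusive (start, end).

    Returns None when the header is malformed or cannot be satisfied;
    multi-range requests are not handled in this sketch.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec or "-" not in spec:
        return None
    first, _, last = spec.strip().partition("-")
    if not first:
        # Suffix form 'bytes=-N': the last N bytes of the file.
        if not last.isdecimal() or int(last) == 0:
            return None
        start, end = max(file_size - int(last), 0), file_size - 1
    elif first.isdecimal() and (last == "" or last.isdecimal()):
        start = int(first)
        # An open-ended or oversized range is clamped to the last byte.
        end = min(int(last), file_size - 1) if last else file_size - 1
    else:
        return None
    if file_size == 0 or start > end:
        return None
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Value of the Content-Range header for a 206 Partial Content response."""
    return f"bytes {start}-{end}/{file_size}"
```

A server that returns None here would normally reply with 416 (Range Not Satisfiable) or fall back to sending the whole file with 200.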
I want to download hundreds of PDF documents from a site. I have tried tools such as SiteSucker and similar, but they do not work, because there appears to be some "separation" between the files and the page that links to them. I don't know how to describe this any better, since I don't know much about website programming or scraping. Any advice on what this could be and how one can work around it?
More specifically, I am trying to download pdfs of UN resolutions, stored on pages like this one: http://www.un.org/depts/dhl/resguide/r53_en.shtml
It appears there is a built-in "search function" on the UN site, which keeps simple scrapers like SiteSucker from working as intended.
Are there other tools that I can use?
Clicking a link on the page you mentioned redirects to a page composed of two HTML frames. The first one is the "header" and the second one loads a page that generates the PDF file and embeds it. The URL of the PDF file is hard to guess. I don't know of a free tool that could scrape this type of page.
Here is an example of the URL in the second frame that leads to the PDF file:
http://daccess-dds-ny.un.org/doc/UNDOC/GEN/N99/774/43/PDF/N9977443.pdf?OpenElement
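If you end up scripting this yourself, the first step would be to pull the frame URLs out of the intermediate page so that each frame can be fetched in turn. Below is a minimal Python sketch of that single step using only the standard library. The class and function names are made up, and the sketch assumes the frames are plain <frame> or <iframe> tags with a src attribute; the UN site's actual markup has not been checked here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcCollector(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative frame URLs against the page URL.
                self.sources.append(urljoin(self.base_url, src))


def frame_sources(html: str, base_url: str) -> list:
    """Return the absolute URLs of all frames found in `html`."""
    collector = FrameSrcCollector(base_url)
    collector.feed(html)
    return collector.sources
```

Each returned URL would then have to be downloaded and inspected again, since the PDF link itself sits one level further down.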
I'm hitting my head against the wall here.
My client provided a PDF with buttons (ordinary buttons: when the user taps one, it loads the next page, the previous page, etc.).
These buttons work only when the PDF is opened in Adobe Reader.
I tried QLpreviewview and quickview, but they do not work; all I can do is load the PDF in the webview.
Can anyone please help me with how to load an interactive PDF in iOS?
Thanks in advance.
Have a look at PSPDFKit, it is the most advanced framework I've found for PDFs in iOS. They have an impressive list of customers as well.
It is a bit pricey, though you have the option to get the source code too if you need to modify anything. It could be worth it if your client needs that kind of performance and other features as well.
(I am not in any way affiliated with PSPDFKit)
The limitations are due to the capabilities (or non-capabilities) of the PDF viewer used.
Currently the leading PDF viewer on iDevices is PDFExpert by Readdle. Adobe Reader for iDevices is weaker, but can deal to some extent with form elements.
For page navigation etc. you might use links instead of button fields (as far as you can live with the capabilities of links, and not use JavaScript). Links are said to be handled properly with many PDF viewers.
You may have to require certain PDF viewers in your instructions, because you don't have control over the viewer used by the actual user. And, as you noticed, many PDF viewers are simply too dumb to deal with active elements.
Another approach would be looking at PDF-to-HTML5 converters, and serve HTML5 from a server.
Let's say I have a 400MB PDF on a web server, and I want an iPad user to be able to open it and start looking at it as soon as possible, without downloading the entire file first. What are the options?
Is Safari able to stream a large PDF like this? Can it start showing the first pages while the file is still downloading?
Is there a way to build a native app to achieve this? If so, should the PDF be split first on the server? How?
Any tips on how to open a large PDF in a friendly way on an iPad would be appreciated. Bonus points if the index of the PDF is accessible!
To clear things up:
A linearized PDF has its objects reordered so that all the data required to display a page comes before the rest. You can read more details in Annex F of the PDF specification (page 683).
If you haven't found an application that supports linearized PDF, the best solution from my perspective would be to render pages on the server side and create a custom protocol that transfers only a single page along with the table of contents.
Unless you want to write a custom PDF viewer with linearized PDF support, you just create a small portable "page viewer", which can quickly show the table of contents and then ask the server for an exact page number.
Example Server Interface:
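class Page
{
    string title;   // table-of-contents entry
    int number;     // page number within the document
}
// Returns the rendered data for a single page.
Stream GetDocumentPage(Page n);
// Returns the table of contents.
List<Page> GetDocumentPages();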
For this, one can use open-source solutions (such as Ghostscript) to parse the PDF file and render the required page. Then send the binary data over your custom protocol and display the page in your small portable "page viewer".
This solution avoids downloading the whole document, which can cause significant network traffic.
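To make the server-side rendering step more concrete, here is a minimal Python sketch that builds a Ghostscript command line for rendering one page to a PNG image. It assumes Ghostscript is installed and reachable as gs (on Windows the executable is typically named differently), and the function names, the PNG output format and the 150 dpi default are illustrative choices rather than part of the approach above.

```python
import subprocess


def ghostscript_page_command(pdf_path: str, page: int, out_path: str, dpi: int = 150) -> list:
    """Build the Ghostscript arguments that render a single page to a PNG file."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        "gs",                          # assumed executable name
        "-dBATCH", "-dNOPAUSE",        # run non-interactively and exit when done
        "-dSAFER", "-q",               # restrict file access, suppress banner
        "-sDEVICE=png16m",             # 24-bit colour PNG output
        f"-r{dpi}",                    # rendering resolution
        f"-dFirstPage={page}",         # render only the requested page
        f"-dLastPage={page}",
        f"-sOutputFile={out_path}",
        pdf_path,
    ]


def render_page(pdf_path: str, page: int, out_path: str, dpi: int = 150) -> None:
    """Run Ghostscript; raises CalledProcessError if rendering fails."""
    subprocess.run(ghostscript_page_command(pdf_path, page, out_path, dpi), check=True)
```

The server would call render_page when the viewer asks for a page and stream the resulting image back, ideally caching rendered pages so that repeated requests do not re-run Ghostscript.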
Hope this helps.
I am using ASP.NET MVC 4 for a web site. The site manages online events for our group and gives registered users access to online materials, archives of web events and instructional videos. I have built a system for uploading and managing the videos, now I need to build the Controller Actions to send the video files to the web page. We are using VideoJS as the viewer and I am pretty happy with that right now. We need to maintain security on the files so just having the files sit at a location on the web server doesn't seem to work for us.
My main question is: what is a good method for returning the files to the viewer? I am used to using the ActionResult and JsonResult classes, but they don't quite seem right for video files. The files can be VERY large, sometimes up to a GB or more. I see the MVC FileResult class, the FileStreamResult class and the FileContentResult class. Which one should I use, and what other considerations should I be thinking about when I build this?
I appreciate your help.
Doug
You most certainly should not send the entire video as a response to the viewer, as they would be waiting around for a good while for it to download. You need to stream it to them. I imagine you'd need some kind of byte stream being returned from the controller.
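The idea of returning a byte stream rather than one large response can be sketched independently of the framework. The following minimal Python example reads a slice of a file-like object in fixed-size chunks, so that memory use stays bounded however large the video is. The function name, the 64 KB chunk size and the parameters are assumptions for illustration; this is not how the MVC result classes are implemented.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(source: BinaryIO, start: int = 0, length: int = -1,
                chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield `length` bytes from `source` starting at `start`, one chunk at a time.

    A negative `length` means "read until end of file".
    """
    source.seek(start)
    remaining = length
    while remaining != 0:
        to_read = chunk_size if remaining < 0 else min(chunk_size, remaining)
        chunk = source.read(to_read)
        if not chunk:
            break  # reached end of file early
        if remaining > 0:
            remaining -= len(chunk)
        yield chunk


# Example: stream 5 bytes starting at offset 2 of an in-memory "file",
# using a tiny 2-byte chunk size to make the chunking visible.
demo = io.BytesIO(b"0123456789")
chunks = list(iter_chunks(demo, start=2, length=5, chunk_size=2))
```

Combined with byte range support, a loop like this lets the player seek within the video without the server ever holding the whole file in memory.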
There's a reason that places like YouTube offer their videos via flash - because the quality and rate can be controlled easily, and it offers a certain amount of copy protection (though it is not foolproof). I just did a quick Google search, and found this:
http://www.longtailvideo.com/jw-player/download/
Might be useful, but I can't vouch for it personally!
Apparently, Razor offers its own handling of video files, which you might find useful:
http://www.asp.net/web-pages/tutorials/files,-images,-and-media/10-working-with-video
Also, HTML5 supports video streaming (which I'm sure you knew as VideoJS uses it):
http://www.w3schools.com/html/html5_video.asp