I cannot find a random walker that starts at a given URL and follows a random link on the page every X seconds. Just for fun.
It can be a Firefox extension,
an HTML source file,
or a webpage with this functionality somewhere on the web.
I suppose this should be just a few lines in your favorite scripting language?
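For illustration, a minimal sketch of what those few lines could look like as a Node script (the start URL and interval are placeholders, and link extraction is a deliberately naive regex):

```js
// Minimal random-walker sketch (Node >= 18, for global fetch).
// Starts at a given URL, picks a random absolute link from the page,
// waits X seconds, and repeats.
const START_URL = 'https://en.wikipedia.org/wiki/Randomness'; // placeholder
const INTERVAL_MS = 5000; // "X seconds"

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function walk(url) {
  for (;;) {
    console.log('Visiting:', url);
    const html = await (await fetch(url)).text();
    // Collect absolute http(s) links; a fuller version would resolve relative ones too.
    const links = [...html.matchAll(/href="(https?:\/\/[^"#]+)"/g)].map((m) => m[1]);
    if (links.length === 0) break; // dead end, stop walking
    url = links[Math.floor(Math.random() * links.length)];
    await sleep(INTERVAL_MS);
  }
}

walk(START_URL);
```

The same loop in a Firefox extension's background script could drive a visible tab instead of logging to the console.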
Hi, we have a problem. We have a link (URL) to a PDF on Google Drive,
a URL like this: "https://drive.google.com/file/d/1xYjCqV4LBljBMvn6OsFyK8U6-RK3aGaB/view".
How can we convert this into PDF format?
There are at least three ways to include your file in a PDF:

1. Just embed a link to the online location; not ideal from a security standpoint.
2. Embed the media file to be extracted and viewed by another user, but this requires them to bypass their security.
3. Include the graphics as simple images. (Audio would need to be attached if needed.)
You asked how to print, thus option 3 is the one needed to output a selection of frames, as seen here (note the moving pencils on the left).
There are many tools to do that, but the simplest I found was https://document.online-convert.com/convert-to-pdf, which produced 2 images per page; 300 tweens = 150 pages. The first 10 pages are a dark fade-in intro, so they can be discarded using a PDF arranger.
It seems pdf.js itself requests the whole PDF file via byte-range requests. Instead, is it possible to request only 5 pages when the PDF loads, then load another set of 5 pages on scroll, and so on? Is there a way to achieve this using pdf.js?
Long story short - No.
PDF is not a contiguous storage format. If the PDF file is formatted for fast web view, then you can get it to show page 1 whilst other pages are still streaming in, but you can't ask to start at a specific page or page range. Internally, PDF uses a bunch of sections, links/pointers between them, and digests. Think of them as wooden blocks with bits of string between them. You can't render anything until you have 'enough' of the file to provide the parts you need, but the organisation of the internal sections is pretty much random as far as your question is concerned.
The only way to get specific pages would be to have a server-side component split them out of the PDF file for you and make a new PDF file containing just those parts, but paging on to page 6 would mean opening a new document, etc.
Edit: There are startup params for Acrobat viewer that could allow you to set the first page to be displayed, and other viewers may offer this feature, but unless you have some very smart client-server interaction this would still require the entire PDF document to be present in the client first.
Edit 2: As per the comment from @async5, PDF.js 'may' be able to do page-range loading. See this section of the PDF.js docs. But note that there are requirements on the web server that serves the PDF file.
As described in an issue here, old versions of PDF.js did not handle linearized PDF files properly (as described by Peter in a comment: when you try to load page 1000, it loads pages 1-1000).
It seems the problem was resolved at some point (I don't know the specific version #), and now, when you set those params correctly (namely disableAutoFetch and disableStream both to true) and load page 1000, it only loads page 1000.
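For reference, a minimal sketch of that setup, assuming a server that honours HTTP range requests (the URL and page number are placeholders):

```js
// Minimal sketch: load a single page with PDF.js without fetching the
// whole file up front. Assumes the server supports HTTP range requests;
// worker setup (pdfjsLib.GlobalWorkerOptions.workerSrc) is omitted here.
const loadingTask = pdfjsLib.getDocument({
  url: '/docs/big-document.pdf', // placeholder
  disableAutoFetch: true, // don't prefetch the rest of the file
  disableStream: true,    // required alongside disableAutoFetch for range-based loading
});

loadingTask.promise
  .then((pdf) => pdf.getPage(1000)) // fetches only the chunks this page needs
  .then((page) => {
    // render `page` into a canvas here
    console.log('Loaded page', page.pageNumber);
  });
```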
I want to download hundreds of PDF documents from a site. I have tried tools such as SiteSucker and similar, but they do not work, because there appears to be some "separation" between the files and the page that links to them. I don't know how to describe this in a better way, since I don't know that much about website programming or scraping. Any advice on what this could be and how one can circumvent it?
More specifically, I am trying to download pdfs of UN resolutions, stored on pages like this one: http://www.un.org/depts/dhl/resguide/r53_en.shtml
It appears there is a built-in "search function" on the UN site, which makes naive scraping tools like SiteSucker not work as intended.
Are there other tools that I can use?
Clicking a link on the page you mentioned redirects to a page composed of two frames (HTML). The first one is the "header" and the second one loads a page that generates the PDF file and embeds it inside. The URL of the PDF file is hard to guess. I don't know of a free tool that could scrape this type of page.
Here is an example of the URL in the second frame that leads to the PDF file:
http://daccess-dds-ny.un.org/doc/UNDOC/GEN/N99/774/43/PDF/N9977443.pdf?OpenElement
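If you're comfortable with a little code, here is a rough sketch (Node >= 18) of following the frameset one level down to the PDF URL. The regexes are guesses at the markup and the frameset URL is a placeholder, so inspect the real pages first; the site's structure may have changed:

```js
// Rough sketch: given the URL you land on after clicking a resolution link
// (a two-frame page), fetch the second frame and look for the embedded PDF URL.
async function findPdfUrl(framesetUrl) {
  const framesetHtml = await (await fetch(framesetUrl)).text();
  // Grab the src attributes of the <frame> tags; the second frame embeds the PDF.
  const frameSrcs = [...framesetHtml.matchAll(/<frame[^>]+src="([^"]+)"/gi)].map((m) => m[1]);
  if (frameSrcs.length < 2) return null;

  const innerHtml = await (await fetch(new URL(frameSrcs[1], framesetUrl))).text();
  // Look for a direct .pdf link inside the inner page (query string included).
  const match = innerHtml.match(/https?:\/\/[^"']+\.pdf[^"']*/i);
  return match ? match[0] : null;
}

// Placeholder URL: use one of the frameset pages linked from r53_en.shtml.
findPdfUrl('http://www.un.org/depts/dhl/resguide/SOME_RESOLUTION_PAGE')
  .then((url) => console.log('PDF URL:', url));
```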
I found this link that describes how to add content to the PDF. If I understood correctly, the pages are generated based on an input NSString, so the page count depends on the length of the string.
I need to implement the same but with images. For example, I have 10 images, and they should be placed one per page: the first image on the first page, the second image on the second page, and so on.
I also found this library that can generate PDF files, but it seems I have no way to set the page count. Is there a good solution for how to implement this?
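Outside iOS, the same one-image-per-page idea is only a few lines with any PDF library. As a sketch of the logic, here it is in JavaScript with the pdf-lib package (not the library linked above; file names are placeholders):

```js
// Sketch: build a PDF with exactly one image per page, so the page count
// equals the number of images. Uses the pdf-lib npm package.
import { PDFDocument } from 'pdf-lib';
import { readFile, writeFile } from 'node:fs/promises';

async function imagesToPdf(imagePaths, outPath) {
  const doc = await PDFDocument.create();
  for (const path of imagePaths) {
    const bytes = await readFile(path);
    const image = await doc.embedJpg(bytes); // use embedPng for PNG files
    const page = doc.addPage([image.width, image.height]); // page sized to the image
    page.drawImage(image, { x: 0, y: 0, width: image.width, height: image.height });
  }
  await writeFile(outPath, await doc.save());
}

imagesToPdf(['1.jpg', '2.jpg', '3.jpg'], 'out.pdf'); // 3 images -> 3 pages
```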
I can upload a large document as a PDF file into a web page, no problem, but I want to use arrows to navigate the book's pages rather than loading the whole book at once, as this may take long.
Can anyone help with how to do this in an MVC app, with or without a database? If a database is necessary, would MongoDB be a better choice? I do not want people to download the book; they should just read it online.
First, you cannot prevent people from downloading your content if you display it visually, BUT you can discourage them by making it difficult to do so.
That being said, you wouldn't need a database to do what you want to do. You can use one, but it's not necessary. You can simply find a library that handles PDFs, such as iTextSharp, and use it to cut the book into one PDF per page when it gets uploaded, so you have a bunch of small files.
Then the trick is simple: you use the PDF library to load the file Page1.PDF (arbitrary name), extract the text and formatting, and output it nicely as HTML. When the person clicks the link for page 2, you reload the page with the new PDF to display.
Doing so prevents the user from seeing or having access to the PDF file itself, and if they want to download it all, they will have to copy-paste every single page manually, or by code if they're a dev. Most common users won't go around manually copy-pasting 300 pages, out of laziness.
What I would personally do is, for each file uploaded, create a folder with the name of the book and name the per-page files 1.pdf, 2.pdf, and so on (a sketch of this splitting step follows below). That way, querying the directory listing gives the list of all books, and counting the files in a folder gives the total page number. That would allow me to run all of that without a database.
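The answer names iTextSharp (.NET); as a language-neutral illustration, here is the same splitting step sketched in JavaScript with the pdf-lib package (file and folder names are placeholders):

```js
// Sketch of the upload-time splitting step: one folder per book,
// one single-page PDF per page, named 1.pdf, 2.pdf, ...
import { PDFDocument } from 'pdf-lib';
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

async function splitBook(pdfPath, bookFolder) {
  const source = await PDFDocument.load(await readFile(pdfPath));
  await mkdir(bookFolder, { recursive: true });
  for (let i = 0; i < source.getPageCount(); i++) {
    const single = await PDFDocument.create();
    const [page] = await single.copyPages(source, [i]); // copy page i into a fresh document
    single.addPage(page);
    await writeFile(join(bookFolder, `${i + 1}.pdf`), await single.save());
  }
}

splitBook('uploaded-book.pdf', 'MyBook'); // then serve MyBook/1.pdf, MyBook/2.pdf, ...
```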