Extracting URLs from multiple pages of archive.org?

Usually I search for books at archive.org, find them one by one, and open each one in a new tab; when there are about 30 tabs I open a new Chrome window. Each tab has a URL like
https://archive.org/details/in.gov.ignca.12237. So at the end I have several Chrome windows, each with about 30 tabs. I want a list of the URLs of the PDF files available in each tab (there is one PDF per tab). How could I do this as easily as possible, and for free?
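One free, low-effort approach (a sketch only, not tested against archive.org's current page markup, and assuming each details page links its PDF directly in the HTML): paste the details URLs into a small PHP script that fetches each page with cURL and prints the first link ending in .pdf.

<?php
// Sketch only: fetch each archive.org details page and grep its HTML for
// a link ending in .pdf. The first URL below is the one from the question;
// add the rest of your tabs to the list.
$detailUrls = [
    'https://archive.org/details/in.gov.ignca.12237',
    // ...more details URLs here
];

foreach ($detailUrls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);

    if ($html === false) {
        continue; // skip pages that could not be fetched
    }

    // Take the first href that ends in .pdf; prefix relative links with the host.
    if (preg_match('~href="([^"]+\.pdf)"~i', $html, $m)) {
        $pdf = $m[1];
        if (strpos($pdf, 'http') !== 0) {
            $pdf = 'https://archive.org' . $pdf;
        }
        echo $pdf . PHP_EOL;
    }
}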

Related

How Do We Create One PDF Page Per Slide (Not Per Fragment)?

It used to be that reveal.js would generate one PDF page per slide. Now, as of this issue, it generates one PDF page per fragment. I would like to generate a PDF in the old one-page-per-slide format, though.
This comment on that issue suggests that there is a configuration option for this, but it's not obvious what it is from the project documentation.
Which configuration option controls this? Or, is there another way to get reveal.js PDF export to do one page per slide?
From reveal.js' README.md, under PDF Export > Separate pages for fragments:
Fragments are printed on separate slides by default. Meaning if you have a slide with three fragment steps, it will generate three separate slides where the fragments appear incrementally.
If you prefer printing all fragments in their visible states on the same slide you can set the pdfSeparateFragments config option to false.
So, it seems you can export multiple fragments per page via:
Reveal.configure({ pdfSeparateFragments: false });

Change structure of automatically generated URLs in Prestashop

I have a client's website, built by someone else in Prestashop, which has a search input. After searching for an item it displays a list of matching products, each linking to its page with a URL like this:
www.website.com/category/full-product-name.html?search_query=search_phrase&results=2
Where a regular url of the product page looks like this:
www.website.com/category/full-product-name.html
The problem is that Google now indexes these duplicated URLs as separate pages.
I've never worked with Prestashop before, but I've looked into the template files and found what I assume is the file responsible for generating this content, with the line that produces the link looking like this:
<a class="product_img_link" href="{$product.link|escape:'html':'UTF-8'}" title="{$product.name|escape:'html':'UTF-8'}" itemprop="url">
Now, as I don't know much about Prestashop, I don't want to change things blindly. How could I change it so that the links in the search results have the same structure as the normal product page URLs?
Well, I don't know what the point of letting search engines index search pages is, but the problem is here: for whatever reason the developers decided to include the query string in the search result links.
You can create an override of the search controller (or, even better, a custom search module), throw that line out, and you should get normal product links.
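For orientation, this is roughly what such an override looks like (a sketch only; which method you need to copy from the core controller, and which line appends the query string, depends on your Prestashop version):

<?php
// File: override/controllers/front/SearchController.php (sketch only).
// Copy the method that builds the search result product list from the core
// controllers/front/SearchController.php into this class, then remove the
// part that appends ?search_query=...&results=... to each product link.
class SearchController extends SearchControllerCore
{
    // overridden method(s) go here
}

On Prestashop 1.5/1.6, remember to delete cache/class_index.php afterwards so the override is picked up.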

How to rewrite Prestashop friendly URLs? (not the .htaccess file, but the URL for each product)

For some reason, some of the friendly URLs in my shop are mixed up. For example, I have a product named "White wine glass 280 ml" and its friendly URL is "250-red-wine-glass-300-ml.html". If I go to the product edit page, select SEO and click "Generate URL", then the URL is correct and all is fine, but I don't want to do that for every product in the shop.
How can I do that for all the products at once?
I tried to find where the URLs are stored in the database so I could delete them and hope Presta would regenerate them, but I couldn't find where they are saved.
You have to do an override of the Dispatcher.php class, or use hookModuleRoutes in a module.
Just create override/classes/Dispatcher.php and modify the original methods/variables to get what you want to achieve here. You will have to change how pages are resolved by PHP/Prestashop, altering the code that fetches (for example) the product ID so that it looks the requested slug up against your products' rewrite URLs and finds the right one.
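A bare skeleton of that override, as a sketch only (the routing logic itself is the hard part and is not shown here):

<?php
// File: override/classes/Dispatcher.php (sketch only).
// Re-implement the core method(s) that map a friendly URL to a product ID,
// looking the incoming slug up against your own rewrite data.
class Dispatcher extends DispatcherCore
{
    // overridden routing method(s) go here
}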
Honestly, this is a tough job, and there are lots of modules available that do it very well; you should have a look, as some are very cheap (less than $20).

Creating a sample preview document

Is it possible to easily create a sample preview document from the final document by replacing or hiding some of the pages with blank pages?
What I want to do is create a preview document (very similar to the way Google Books or Amazon show a few pages of a book and hide the rest). Is it possible to generate such a document with some tricks or commands in LaTeX?
The pagesel package allows you to select only certain pages to be output:
\documentclass{article}
\usepackage[-4,3,even,7-8]{pagesel}% Keep only pages 2, 4, 8
\usepackage{eso-pic,graphicx,multido}
\AddToShipoutPictureFG{%
\AtPageCenter{%
\makebox[0pt]{%
\raisebox{-.5\height}[0pt][0pt]{%
\resizebox{.8\paperwidth}{.8\paperheight}{\thepage}}}}}
\pagestyle{empty}% No headers/footers
\begin{document}
\multido{\i=1+1}{10}{\mbox{}\clearpage}% Create 10 pages
\end{document}
Using blank pages might be awkward from the end-user's perspective.

Find YouTube links on other pages

I want to create a PHP page that finds things on other sites.
Let me give an example to make it clear. Let's say there is a website called "anwebsite.com" and it has a page called "anwebsite.com/page.php".
I want to create a PHP script that checks whether that page's source contains any YouTube links (from an embed, for example) and retrieves one.
Example:
$thesitelink = "http://anwebsite.com/page.php"; (let's say I put the link into the PHP page manually)
Then there should be some script that checks whether there are any YouTube links on that page. If there is at least one (it doesn't matter if there are many, one is enough), it should put the link into a variable like $theyoutubelink. Like this:
$theyoutubelink = "http://www.youtube.com/watch?v=xxxxxxxx";
So the input is $thesitelink and the output should be a YouTube link (if there is one on that page).
What you need is a spider/crawler plus a parser.
First things first:
- Use cURL to get the full HTML source of the page you are crawling.
- Use regular expressions to pull out the YouTube links (don't forget the short-URL youtu.be ones).
Do you have any code made already?
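For illustration, here is a rough sketch of that cURL-plus-regex approach (the site URL is the hypothetical one from the question, and the pattern only covers the common watch?v=, embed/ and youtu.be forms):

<?php
// Sketch of the approach above: fetch the page, then regex out the first
// YouTube video ID found in the HTML.
$thesitelink = 'http://anwebsite.com/page.php';

$ch = curl_init($thesitelink);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

$theyoutubelink = null;
// Matches youtube.com/watch?v=ID, youtube.com/embed/ID and youtu.be/ID.
if ($html !== false &&
    preg_match('~(?:https?:)?//(?:www\.)?(?:youtube\.com/(?:watch\?v=|embed/)|youtu\.be/)([\w-]{11})~i', $html, $m)) {
    $theyoutubelink = 'http://www.youtube.com/watch?v=' . $m[1];
}

echo $theyoutubelink !== null ? $theyoutubelink : 'No YouTube link found';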
