There is a web page www.somepage.com/images/
I know some of the images there (e.g. www.somepage.com/images/cat_523.jpg, www.somepage.com/images/dog_179.jpg)
I know there are more, but I don't know the names of those photos. How can I scan the whole /images/ folder?
You can use wget to download all the files:
--no-parent keeps wget from climbing above the starting directory in the hierarchy
--recursive makes it follow links into subfolders
wget --recursive --no-parent -A jpeg,jpg,bmp,gif,png http://example.com/
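Applied to the folder from the question, that would look something like this (just a sketch; it only finds files that the server actually links from an index or directory listing under /images/):
wget --recursive --no-parent -A jpeg,jpg,bmp,gif,png http://www.somepage.com/images/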
If the images appear on the page as img tags, you could simply search the page source for them. From a terminal, you could also use a tool such as wget to download the web page and then grep the file for img tags.
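A minimal sketch of that approach, assuming the question's URL serves an HTML page and the images appear in ordinary src attributes:
wget -qO- http://www.somepage.com/images/ | grep -oE '<img[^>]*src="[^"]*"'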
Problem outline
I'm trying to get all the files from a URL: https://archive-gw-1.kat.ac.za/public/repository/10.48479/7epd-w356/data/basic_products/bucket_contents.html
which appears to be a list of contents of an S3 bucket with associated download links.
When I attempt to download all the files with the extension *.jpeg, I'm simply returned the directory structure leading up to a subdirectory, with no files actually downloaded.
Things I've tried
To do this I've tried all the variations of leading parameters for:
$ wget -r -np -A '*.jpeg' https://archive-gw-1.kat.ac.za/public/repository/10.48479/7epd-w356/data/basic_products/
...that I can think of, but none have actually downloaded the jpeg files.
If you provide the path to a specific file, e.g.
$ wget https://archive-gw-1.kat.ac.za/public/repository/10.48479/7epd-w356/data/basic_products/Abell_133_hi.jpeg
...the file downloads fine, which suggests that I must be mishandling the wildcard aspect of the download somehow?
Thoughts (which could be wrong, owing to my limited knowledge of wget and web protocols)
Could the fact that the contents are listed in a bucket_contents.html rather than an index.html be causing problems?
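The only workaround I can think of (untested) would be to pull the .jpeg links out of the listing page myself and feed them to wget -i; the grep pattern below is a guess at how the links appear in bucket_contents.html:
$ wget -qO- https://archive-gw-1.kat.ac.za/public/repository/10.48479/7epd-w356/data/basic_products/bucket_contents.html | grep -oE 'https?://[^"]+\.jpeg' > jpeg_urls.txt
$ wget -i jpeg_urls.txt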
I want to allow users to download some files via the LuCI web interface on my OpenWrt Linux device.
I have uploaded my files to the /etc and /tmp folders of OpenWrt.
But I don't know how to give the user a URL for these uploaded files.
Can anyone help me? Thanks in advance.
The easiest way is to create a symlink to the file in the /www directory. For example, to download the /etc/passwd file:
ln -s /etc/passwd /www/test
Then, in your web browser, go to 192.168.1.1/test to download the file.
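For the files mentioned in the question, the same idea as a sketch (the file name is a placeholder; uhttpd, the default OpenWrt web server, serves whatever is under /www):
ln -s /tmp/myfile.tar.gz /www/myfile.tar.gz
The user can then fetch it from http://192.168.1.1/myfile.tar.gz.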
On server A, I created a tar file (backup.tar.gz) of the entire website /www. The tar file includes the top-level directory www.
On server B, I want to put those files into /public_html, but without the top-level directory www.
Of course, tar -xzf backup.tar.gz places everything into /public_html/www.
How do I do this?
Thanks!
You can use the --transform option to change the beginning of the archived file names to something else. As an example, in my case I had installed ownCloud in a directory named sscloud instead of owncloud. This caused problems when upgrading from the *.tar file, so I used the transform option like so:
tar xvf owncloud-10.3.2.tar.bz2 --transform='s/owncloud/sscloud/' --overwrite
The transform option takes sed-like expressions. The one above replaces the first occurrence of owncloud in each archived file name with sscloud.
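Applied to the archive from the question above, a --transform sketch (untested; it rewrites the leading www to . so the contents land directly in the target directory, which must already exist):
tar -xzvf backup.tar.gz --transform='s,^www,.,' -C /public_html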
The answer is:
tar --strip-components 1 -xvf backup.tar.gz
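To land the files straight in the target directory from the question, add -C (assuming /public_html already exists):
tar --strip-components=1 -xzvf backup.tar.gz -C /public_html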
I want to download all accessible HTML files under www.site.com/en/. However, there are a lot of linked URLs with query-string parameters on the site (e.g. pages 1, 2, 3, ... for each product category). I want wget NOT to download these links. I'm using
-R "*\?*"
But it's not perfect, because wget still downloads each such file and only deletes it afterwards.
Is there some way for example to filter the links followed by wget with a regex?
It is possible to avoid those files with a regex: you would have to use --reject-regex '(.*)\?(.*)', but it only works with wget version 1.15 or newer, so I would recommend checking your wget version first.
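Put together with the starting URL from the question, that would look something like this (adding --no-parent to stay under /en/, which matches the question's goal):
wget --recursive --no-parent --reject-regex '(.*)\?(.*)' http://www.site.com/en/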
How to write a script to download all videos from the links in a webpage
Hey Guys,
I want to write a script to download all the Rails screencasts from this location: http://railscasts.com/episodes/archive
Any ideas on how this can be automated?
I'd personally go with:
wget -l inf -r -np http://railscasts.com/episodes
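If only the video files themselves are wanted, an accept-list variant might look like this (the extensions are guesses, and -H with --domains would be needed if the videos are hosted on a separate media domain):
wget -r -l inf -np -A mp4,mov,m4v http://railscasts.com/episodes/archive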