Get similar links from one site using wget - grep

I have a site (http://a-site.com) with many links whose URLs contain /follow_user (the visible link text is "Follow"). How can I use wget to crawl the site and grep all such links into a file?
I tried the command below, but it only gets me the matching links on one page; it does not recursively follow other links to find more.
$ wget -erobots=off --no-verbose -r --quiet -O - http://a-site.com 2>&1 | \
grep -o '['"'"'"][^"'"'"']*/follow_user['"'"'"]'

You may want to use the --accept-regex option of wget rather than piping through grep:
wget -r --accept-regex '['"'"'"][^"'"'"']*/follow_user['"'"'"]' http://a-site.com
(Not tested; the regex may need adjustment or a --regex-type specification (see man wget), and of course add any other options you find useful.)
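If the goal is to collect the matching URLs into a file rather than download the pages, a rough sketch (not tested; assumes GNU wget and the placeholder site URL from the question) is to crawl in spider mode and filter wget's log output:
wget -r --spider -erobots=off --no-verbose http://a-site.com 2>&1 | \
  grep -o 'http[^ ]*/follow_user[^ "]*' | sort -u > follow_links.txt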

Related

Anydesk installation by bash script using wget

I'm trying to write a bash script to automate the installation of AnyDesk via wget, using the following code:
echo -e "[ - ] Installing AnyDesk..."
wget --max-redirect 1 --trust-server-names 'https://anydesk.com/en/downloads/thank-you?dv=deb_64' -O anydesk.deb
sudo apt install ./anydesk.deb
echo -e "[ ✔ ] AnyDesk ➜ INSTALLED\n"
The problem is that https://anydesk.com/en/downloads/thank-you?dv=deb_64 returns a HTML page, not a Debian package.
How can I parse the HTML page to find the download link to the Debian package?
I examined the page source of https://anydesk.com/en/downloads/thank-you?dv=deb_64: the download is triggered by JavaScript, depending on the browser's User-Agent. wget does not execute JavaScript, so you are getting the HTML page source rather than the actual .deb file. Use a tool that supports JavaScript execution to get the actual file.
You can run the following command:
wget -O anydesk.deb https://download.anydesk.com/linux/anydesk_6.2.1-1_amd64.deb
This will let you download AnyDesk via wget (note that the URL pins version 6.2.1-1).
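Putting the two answers together, a minimal sketch of the install step (assuming the pinned package URL above is still valid; newer releases use a different version number in the path):
#!/bin/bash
set -e  # abort if the download or the install fails
echo "[ - ] Installing AnyDesk..."
# direct package URL (version pinned); update the version as needed
wget -O /tmp/anydesk.deb https://download.anydesk.com/linux/anydesk_6.2.1-1_amd64.deb
sudo apt install -y /tmp/anydesk.deb
echo "[ ✔ ] AnyDesk installed"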

tmux: variable indicating whether text is selected?

I use vi keybindings in Tmux's copy-mode, and I'd like to make Esc clear the current selection if there is one, or exit copy-mode if nothing was selected.
bind -T copy-mode-vi Escape if-shell -F '#{selection_active_flag}' \
'send-keys -X clear-selection' \
'send-keys -X cancel'
I was hoping Tmux might expose a variable that indicates the selection state (I made up selection_active_flag to express my intent, it doesn't actually exist), similar to window_zoomed_flag (which does exist).
Is there a way to achieve this?
Tmux 2.6 introduced selection_present. As stated in the changelog,
Add selection_present format when in copy mode (allows key bindings that do something different if there is a selection).
This is exactly what I was looking for, and though I'm running Tmux 2.6, it seems I have an outdated man page, as it made no mention of selection_present.
The final working solution is:
bind -T copy-mode-vi Escape if-shell -F '#{selection_present}' \
'send-keys -X clear-selection' \
'send-keys -X cancel'
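If your man page is out of date, you can check whether your tmux build actually exposes the variable; a quick sketch (run it while a pane is in copy mode):
# prints 1 if a selection is active in the current pane's copy mode, 0 otherwise
tmux display-message -p '#{selection_present}'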

How can I test a Rails app using WGET

I want to test my Rails app using wget, specifically the part that returns JSON data. I don't really understand the syntax I should use. I have tried this:
wget --user=username#example.com --password=somepass localhost:3000/folders/1.json
and variations of it, without any success. What is the exact syntax? Would it be better to use curl instead?
-- edit --
I found the following suggestion on this blog: http://blogs.operationaldynamics.com/andrew/software/research/testing-rest-the-hard-way
$ wget -S -q --header "Accept: application/json" -O - http://localhost:3000/folders/1
but even when I add
--user=username#example.com --password=somepass
...I get 401 Unauthorized. The username is correct; I can log in via the browser.
curl -u username:password http://localhost:3000/folders/1.json
Read more here http://curl.haxx.se/docs/manpage.html#-u
An alternative to curl is the less well known but equally capable httpie - https://github.com/jkbrzt/httpie - I find it to be a bit more straightforward and friendly to use, and includes syntax colouring for output.
http -a user:pass :3000/folders/1.json would do the trick here, I think.
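For completeness, wget can send the same HTTP Basic credentials; a sketch along the lines of the blog's command (placeholder credentials; --auth-no-challenge may be needed if the app returns 401 without a WWW-Authenticate challenge):
wget -S -q --http-user=username --http-password=somepass \
     --header "Accept: application/json" -O - http://localhost:3000/folders/1.json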

Get URLs from a remote page and then download to txt file

I have tried lots of suggestions but I can't find a solution (I don't know if it's even possible). I'm using the terminal on Ubuntu 15.04.
I need to download into a text file all the internal and external links from mywebsite.com whose names start with links_ (for example http://www.mywebsite.com/links_sony.aspx). I don't need any other links, e.g. mywebsite.com/index.aspx or conditions.asp. I use
wget --spider --recursive --no-verbose --output-file="links.csv" http://www.mywebsite.com
Can you help me please? Thanks in advance
If you don't mind using a couple of other tools to coax wget, then you can try this bash script that employs awk, grep, wget and lynx:
#!/bin/bash
# $1 = page URL, $2 = pattern the wanted links must contain
lynx --dump "$1" | awk '/http/{print $2}' | grep "$2" > /tmp/urls.txt
for i in $(cat /tmp/urls.txt); do wget "$i"; done
Save the above script as getlinks and then run it as
./getlinks 'http://www.mywebsite.com' 'links_' > mycollection.txt
This approach does not need many extra tools; it reuses commonly available ones.
You may have to play with quoting depending what shell you are using. The above works in standard bash and is not dependent on specific versions of these tools.
You could customize the part
do wget "$i"
with appropriate switches to meet your specific needs, such as recursive, spider, verbosity, etc. Insert those switches between wget and "$i".
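If you would rather stay with wget alone, a rough sketch (not tested) is to keep your original spider command, log the crawl to a file, and then grep the logged URLs for links_:
wget --spider --recursive --no-verbose --output-file=crawl.log http://www.mywebsite.com
grep -o 'http[^ ]*links_[^ ]*' crawl.log | sort -u > links.csv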

How to get fully working grep in git bash (msysgit) on windows?

I would like to use grep -o, but in git bash there is no -o option. Is there a way to get full working grep in git bash, just like it's in linux bash shell?
There is no -o flag in POSIX grep:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html
You can use sed instead.
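For example, a sketch of emulating grep -o with sed, assuming a hypothetical pattern foo[0-9]* (note it prints only one match per matching line, whereas grep -o prints every match):
# keep only the part of each matching line that matches the pattern
sed -n 's/.*\(foo[0-9]*\).*/\1/p' input.txt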
There is an open issue for that on Github (even though it's under "nvm"). User UltCombo posted a workaround. Quoting:
Open <Git install directory>/bin and overwrite grep.exe with a more up to date version. I found two alternatives that provide -o support:
GnuWin32's grep 2.5.4 (link).
ezwinports' grep 2.10 (link). Note: You also have to extract libpcre-0.dll into the same folder as grep.
Though ezwinports' grep port is much more up to date, I can't say whether any of these will cause stability/compatibility issues. I haven't found any issues yet, but use it at your own risk.
Marking this Community Wiki because it's really somebody else's work.
Alternatively, get the pretty awesome MSYS2 and enjoy full grep and co.
