How to detect all the urls of a website? - url

I wonder how to detect all the URLs of a given website. For example, I know the site https://stackoverflow.com/; how can I find out that it has URLs such as https://stackoverflow.com/questions and https://stackoverflow.com/tags? Is there a method or a tool for this?

There are some possibilities:
If you don't want to write code, you can use a tool such as Xenu or Webspider to scan or save a website.
If you want to use it as part of your own tool, you can write it in PHP:
If directory listing is on for the foreign server:
// Note: scandir() reads directories through the filesystem or a stream wrapper
// that supports listing (e.g. ftp://); it cannot list a directory over plain http://.
$dir = "stackoverflow.com/";
foreach (scandir($dir) as $file) {
    print $file . '<br>';
}
If directory listing is off:
You need to fetch each page with file_get_contents and filter for links with preg_match_all.
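The "fetch the page and filter for links" step can be sketched roughly as follows. This is a minimal Python sketch rather than PHP, and it uses an HTML parser instead of a regular expression; the names LinkCollector and same_site_links and the sample markup are made up for illustration. It only handles one page that has already been downloaded; a real crawler would repeat this for every link it finds.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/tags" into absolute URLs.
                self.links.add(urljoin(self.base_url, value))


def same_site_links(html, base_url):
    """Return the sorted links in `html` that point at the same host as `base_url`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return sorted(link for link in collector.links if urlparse(link).netloc == host)


# Hypothetical page content; in practice this would be the downloaded HTML.
sample = '<a href="/questions">Q</a> <a href="/tags">T</a> <a href="https://example.org/">X</a>'
print(same_site_links(sample, "https://stackoverflow.com/"))
```

Links to other hosts are dropped so that the scan stays on the site in question.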

Related

Dilemma of using "createHashHistory" vs "createBrowserHistory" in the context of deployment

I am using "gulp-connect" as a development server and I am trying to implement React Router 1.0.0-rc1.
Currently I am using "createHashHistory", which adds a junk key such as ?_k=ckuvup to the URL; this is deliberate, as described in the documentation. I am OK with it until I send query strings along with the URL: my link then looks like this, with the junk appended just after the domain name rather than at the end:
http://localhost:8080/#/?_k=y754gg/jobs?latitude=27.686784000000003&longitude=85.2690875&query_location=Liverpool, United Kingdom&query=fjdkf
Expected URL (something like this) :
http://localhost:8080/#/jobs?latitude=27.686784000000003&longitude=85.2690875&query_location=Liverpool, United Kingdom&query=fjdkf/?_k=y754gg
I could have used "createBrowserHistory", which gives a much cleaner URL, but the problems are:
1) Server configuration. The example provided only shows how to do it in Express. I am planning to use nginx in production and am using gulp-connect in development. As I could not find any reference on how to do it on these servers, I had to choose "createHashHistory".
2) My backend is on Rails, and if I put my front end in the "public" folder, URLs with # should separate client and server routes. But I keep thinking there must be a way to use createBrowserHistory with some configuration in nginx.
My priority in this question is the first part, appending the key at the end. Any reference on how the configuration is done on different servers will be appreciated.
You should be able to remove the ?_k=... key from the URL by setting queryKey: false when creating your history:
var history = History.createHashHistory({
queryKey: false
});

How can I use wildcards in a web address to download a file where the filename periodically changes

I am downloading files from a web server into my iOS iPad application.
My problem is that the hardcoded URLs are now subject to change.
How can I use wildcards in my URL to compensate for the changed address?
E.g. this is the current URL:
http://www.testserver/modules/public/sheets/HZ_TECAPET__black_gb_DE_201301.pdf
The 201301 changes, so how can I code the URL using a wildcard?
E.g. http://www.testserver/modules/public/sheets/HZ_TECAPET__black_gb_DE_??????.pdf
The first part of the address remains static; it's just the numbers at the end that are subject to change.
Thanks
That's a bit harder then, but you can do it on the server side. You can write a simple Bash script that runs on the server, lists all files in the directory and saves the result to a text file, which you can then fetch from http://example.com/files.txt
Something like:
sheets="/path/to/sheets"   # assumption: set this to the real directory on the server
for file in "$sheets"/*
do
    echo "$file"
done > files.txt   # overwrite on each run so old entries do not pile up
EDIT:
Aha, so there actually is a pattern. Then you can request each of the possible file names and check whether the HTTP status code is 200 (OK) or 404 (Not Found).
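A rough sketch of the "try each candidate and check the status code" idea, written in Python rather than for iOS. It assumes the six digits are a year plus a two-digit month (201301 looks like one, but that is a guess), and the helper names candidate_urls and exists are made up for illustration.

```python
import urllib.error
import urllib.request


def candidate_urls(prefix, year, suffix=".pdf"):
    """Build one URL per month of `year`, assuming a YYYYMM stamp."""
    return [f"{prefix}{year}{month:02d}{suffix}" for month in range(1, 13)]


def exists(url, timeout=10):
    """Return True if a HEAD request for `url` answers 200, False on 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # any other status is unexpected, so let the caller see it


def first_existing(urls):
    """Return the first URL that exists, or None if none of them do."""
    for url in urls:
        if exists(url):
            return url
    return None
```

A HEAD request is used so that the PDF body is only downloaded once the right name is known; some servers do not answer HEAD, in which case a normal GET would be needed.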

Dynamically creating web page (using portion of URL as a variable)

In a site I'm developing I have a page that presents a post based on the variable in the url:
http://www.mywebsite.com?id=18
So this would load the post whose ID is 18 in the MySQL database.
I would like to create the same effect, but with the URL being something like:
http://www.mywebsite.com/articles/title-of-article-18/
Would there be a way to create these pages on the fly with dynamic post content, where the url would originally be created by:
"http://www.mywebsite.com/articles/" + postTitle
You are looking for mod_rewrite, i.e. rewriting URLs via .htaccess.
It matches patterns in the requested URL and internally rewrites the request to http://www.mywebsite.com?id=18. Users still see the nice URL.
The directory /articles/title-of-article-18/ will not actually exist, and the user never really reaches that location because the .htaccess rule silently changes the URL that the server processes.
See
http://en.wikipedia.org/wiki/Rewrite_engine
or a random tutorial I found:
http://www.blogstorm.co.uk/htaccess-mod_rewrite-ultimate-guide/
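To make the pattern matching concrete, here is a small Python sketch of the mapping a rewrite rule would perform. It is not Apache configuration; the regular expression and the function name rewrite are assumptions chosen to fit the example URL, where the numeric ID is the last dash-separated part of the slug.

```python
import re

# Matches /articles/<slug>-<id>/ and captures the trailing numeric id.
ARTICLE_PATH = re.compile(r"^/articles/[a-z0-9-]+-(\d+)/?$")


def rewrite(path):
    """Map a pretty article path to the internal query-string form, or None."""
    match = ARTICLE_PATH.match(path)
    if match is None:
        return None
    return f"/?id={match.group(1)}"


print(rewrite("/articles/title-of-article-18/"))
```

Only the ID is used for the lookup; the title part of the URL is there for readers and search engines.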
Try URL rewriting:
http://www.simple-talk.com/dotnet/asp.net/a-complete-url-rewriting-solution-for-asp.net-2.0/

Creating a shortened URL for all objects in the database

I would like to display a shortened URL besides the content items on my site for ease of sharing.
What would be the most efficient way of doing so, and are there any suitable gems / libraries?
I am using rails on a mongodb/mongoid stack
Should be simple enough (regardless of whether you are on Mongo, MySQL or anything else). What you need is a small collection (in Mongo, if I may) that holds an MD5 hash of the real URL together with the real URL itself, for example:
ShortLink.create(:hash_link => Digest::MD5.hexdigest(resource_url(@resource)), :real_link => resource_url(@resource))
I suggest adding another route that catches those, like this:
get "l/:key" => "short_links#show"
Should be easy.
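The hash-to-URL lookup can be sketched like this in Python, with a plain dictionary standing in for the collection. Truncating the MD5 digest to 8 characters is an assumption made here to keep the link short (the answer stores the whole digest), and the class name ShortLinks is made up for illustration.

```python
import hashlib


class ShortLinks:
    """In-memory stand-in for the short-link collection."""

    def __init__(self, key_length=8):
        self.key_length = key_length
        self._by_key = {}

    def create(self, real_url):
        """Store `real_url` under a key derived from its MD5 digest."""
        key = hashlib.md5(real_url.encode("utf-8")).hexdigest()[: self.key_length]
        existing = self._by_key.setdefault(key, real_url)
        if existing != real_url:
            # Truncated digests can collide; a real store must handle this.
            raise ValueError(f"key collision for {real_url!r}")
        return key

    def resolve(self, key):
        """Return the real URL for `key`, or None if unknown."""
        return self._by_key.get(key)


links = ShortLinks()
key = links.create("http://example.com/items/1")
print(key, "->", links.resolve(key))
```

Because the key is derived from the URL, creating a link for the same URL twice gives the same key.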
I think you can use the bitly gem to shorten your URL.
The following link helps you to configure bitly:
http://www.marketingformavens.com/blog/url-shortening-bitly-ruby-on-rails

How to change the filename prompt text in the browser's Save As dialog?

In my web page (rendered by Rails), I'd like to let the user right-click on a photo to bring up the browser's Save As dialog, to let the user save the photo to their hard drive.
However, the photos on my server have unusual filenames (long hex names) with no file extension. The filename prompt in the Save As dialog has this ugly filename. If the user hits save, they'll end up with a poorly-named file, with no file extension.
The web page is aware of the photo's real file name (the name that came off the camera, for example). Is there a way for me to programmatically override the Save As dialog's filename prompt with a filename of my choosing?
I'm aware of the Content-Disposition header, and that a filename can be specified via this header. However, I think that in order to make use of it, I need to load/render the entire file to the browser. If the asset to be made available for download is a movie, loading the file could time out the browser, e.g. if it's a 100 MB video.
Thoughts?
-A
I think I understand the problem here because I encountered (and resolved) at least part of it myself not too long ago.
I have some large MP3s and I link to them on my website.
A few problems:
I needed to set my content-disposition header to attachment in order to prevent files from automatically streaming whenever a user clicked the download button
my files are on a remote server
my files are large (100MB)
large files can tie up rails controllers if not handled properly
Now, Michael Koziarski advises in this article that the best way to keep your Rails processes free when serving large files is to create a download action in your controller, and then do something like this (note the use of :x_sendfile => true):
def download
send_file '/path/to/podcast.mp3', :type => 'application/octet-stream', :disposition => 'attachment', :filename=>'something.mp3', :x_sendfile=>true
end
:x_sendfile tells apache to let the file through without tying up a rails controller process. The rest of the code sets the filename and the content-disposition header.
Great, but I'm on heroku, like everyone else nowadays. So I can't use x_sendfile.
I found that I couldn't modify the nginx configuration file either as it's locked down by heroku so it was not possible to get x-accel-redirect (nginx equivalent of x-sendfile) working
So, I decided to add a perl script (see below) to the cgi-bin on our asset-host and this script sets the content-disposition to attachment and gives our file a name too.
Instead of doing a restful download like this:
link_to "download", download_podcast_path(@podcast.mp3)
we just link to the mp3 making sure that we go in through the cgi-bin so that the perl script gets called on every mp3 that leaves the server
# I'm using haml
%a{:href=>"http://afmpodcast.com/cgi-bin/download.cgi?ID=#{@podcast.mp3}"}
  download
The result is that my rails controller is no longer called into action when someone downloads a file
I found the perl script here and chopped it up a bit to work for me:
#!/usr/local/bin/perl -wT
use CGI ':standard';
use CGI::Carp qw(fatalsToBrowser);
my $files_location;
my $ID;
my @fileholder;
$files_location = "../";
# Caution: $ID comes straight from the query string; validate it before use.
$ID = param('ID');
# Error() is not defined in this excerpt; it must be supplied elsewhere.
open(DLFILE, "<$files_location/$ID") || Error('open', 'file');
@fileholder = <DLFILE>;
close (DLFILE) || Error ('close', 'file');
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=$ID\n\n";
print @fileholder
My code is on GitHub, but you'll likely have all sorts of problems using it on your machine, as I make heavy use of ENV variables that I store in bashrc and I have no documentation or tests ^hides^
You could do some smart server-side URL rewriting, for example rewriting foo.mpeg to yourveryuglyfilenamewithoutextension.
Setting Content-Disposition to "attachment; filename=..." is fine. "attachment" explicitly means the content is not to be rendered in the browser; file renaming works nonetheless (or possibly particularly in that case).
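As an illustration of what the header value could look like for a given camera filename, here is a small Python sketch. The function name content_disposition is made up; sending both a plain filename and a percent-encoded filename* parameter is one common way to cope with non-ASCII names, not something the answers above prescribe.

```python
from urllib.parse import quote


def content_disposition(filename):
    """Build an attachment Content-Disposition value for `filename`."""
    # ASCII fallback for older clients: drop non-ASCII, escape backslash and quote.
    fallback = filename.encode("ascii", "ignore").decode("ascii")
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')
    # The filename* form carries the full UTF-8 name, percent-encoded.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


print(content_disposition("IMG_0042.jpg"))
```

The server sets this header on the response for the hex-named file, so the stored name never has to change.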
Based on your comments, you have a few problems.
You want to set the filename using your Rails app.
The file is on a remote host and your Rails app is acting as a middleman.
The file might be big, so you want the file to be sent out to the browser as you receive it instead of queuing the whole thing.
Streaming only with Rails is tricky for a few reasons.
You would need an HTTP client that lets you access the message body as you receive data instead of blocking until you have everything. Net::HTTP is not that client. I'm not sure what library would be better suited.
Once you have a more event-driven way to get your file in pieces, you can pass a proc to the render:
render :text => proc { |response, output| ... }
output can be used like an IO object. Some servers may buffer before sending anyway, though, so that's something to look out for.
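The "pass the data on as you receive it" idea looks roughly like this, sketched in Python with in-memory streams instead of a Rails response. The names stream_chunks and relay and the 64 KB chunk size are assumptions for illustration; in a real setup the source would be the remote HTTP response body and the sink the client connection.

```python
import io


def stream_chunks(source, chunk_size=64 * 1024):
    """Yield successive chunks from a file-like `source` until it is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def relay(source, sink, chunk_size=64 * 1024):
    """Copy `source` to `sink` chunk by chunk; return the number of bytes written."""
    total = 0
    for chunk in stream_chunks(source, chunk_size):
        sink.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for the remote file and the browser connection.
source = io.BytesIO(b"x" * 150_000)
sink = io.BytesIO()
print(relay(source, sink))
```

Only one chunk is held in memory at a time, which is the point of not queuing the whole file first.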
It would be easier not to handle the byte-shuffling in Rails.
If your webserver or the proxy in front of your webserver supports the X-REPROXY-URL HTTP header, your application can set that header and your webserver or proxy will stream the file.
Perlbal is the only proxy server I know of that supports that header out of the box.
An Apache2 module is also available.
