I've looked around but wasn't able to find what I was looking for. I'm looking for a way to automatically create short URLs displayed in the browser, not using a URL shortener. Basically I would like to re-create something like this:
idzr.org/1ptb
I upload screenshots to my server with "GrabUp" on a regular basis, but it creates rather long URLs, for example:
/2523e3c90d60f08e952215424e7c5d99.png
It's a bit annoying having to shorten them each time.
I have seen this method a lot lately with pretty much any file, including HTML files. If this has been discussed already, I'm sorry for posting it again. I just seem to be stuck.
Thanks in advance for any help & advice!
I don't know what web server you use, but the idea is the same everywhere:
Write a rewrite rule
-- .htaccess for Apache, or the equivalent for IIS
Push the content to the user through your own code, because the browser doesn't know what kind of content it is getting from the web server
-- set the MIME type in the HTTP Content-Type header
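For example, with Apache the two pieces could look roughly like this. This is a minimal sketch: the short.php script, the code-to-file mapping, and the 4-character code format are all made up for illustration.

RewriteEngine On
# /1ptb  ->  short.php?code=1ptb
RewriteRule ^([0-9a-z]{4})$ short.php?code=$1 [L]

<?php
// short.php -- hypothetical lookup script.
// In practice the mapping would live in a database, filled in
// whenever a new screenshot is uploaded.
$map = array('1ptb' => '2523e3c90d60f08e952215424e7c5d99.png');

$code = isset($_GET['code']) ? $_GET['code'] : '';
if (!isset($map[$code])) {
    header('HTTP/1.1 404 Not Found');
    exit('not found');
}

// Step 2: tell the browser what it is getting via the MIME type
header('Content-Type: image/png');
readfile(__DIR__ . '/' . $map[$code]);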
I'd like to know all subpages of a certain URL. E.g. I have the URL example.com. There might exist the subpages example.com/home, example.com/help and so on. Is it possible to get all such subpages without knowing their exact names?
I thought I could handle this problem with a web crawler, but it just crawls for pages mentioned on the page itself.
I hope you understand my problem and can help me with it.
Thank you!
To answer your question: yes. Scrapy "crawl" spiders work by setting rules, and those rules can be set to do exactly what you're trying to do. When in doubt, always go to the docs!
Couple things to note:
You can create a crawl spider the same way you create the generic spider:
scrapy genspider -t crawl nameOfSpider website.com
With a crawl spider, you then have to set rules to basically tell Scrapy where to go and where not to go; how's your regex?!
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']  # PART 1: Domain Restriction
    start_urls = ['http://www.example.com']
    rules = (
        # PART 2: Call Back -- runs for every link the extractor allows
        Rule(LinkExtractor(allow=('.*',)), callback='parse_item'),
    )

    def parse_item(self, response):
        yield {'url': response.url}  # record each URL the crawler visits
Now, I copied and pasted this from the official docs and changed it up to show what it should look like for you, but I haven't checked the code, so yeah... the logic is there though.
This works by getting ALL the links that it can see depending on the rules you set, then doing something with each link.
You want to restrict all other domains but the one you're scraping.
In the example I set the wildcard to literally accept every page in the domain... once you figure out the structure of a website, you can use logic to build out what you need.
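For completeness: once the spider above sits inside a Scrapy project, the built-in feed export will collect every URL that parse_item yields, e.g.:

scrapy crawl example.com -o urls.json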
You should take a look at the docs more often though. I have been using scrapy for about 6-7 years and I still find myself going back to the man pages!
No, you can’t.
The way you describe the situation, the website intends those desired URLs to be secret.
Any way to find such URLs would be a security exploit that should be reported to the website owners right away so they can fix it.
I use a website that has a URL like....
https://wwws.something.com/overview.event
I have never seen a period used in a URL like this before.
I cannot find anything on Google or Stack Overflow where anyone describes this.
What does it mean? How is it used?
To clarify it is the "overview.event" that I am confused about
The times when a URL was a path to a file on the server are gone. HTTP servers now use rewriting (like mod_rewrite in Apache) to map URLs to files with the proper parameters.
Old PHP sites had URLs like www.myblog.com/page.php?page=1, where page.php was an actual file and ?page=1 was a GET argument used by the PHP interpreter.
Some people decided that pages look nicer and are more readable if we do something like www.myblog.com/page/1, but there is no problem doing www.myblog.com/page.1 as well.
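For instance, an .htaccess sketch (assuming Apache with mod_rewrite enabled; page.php is the example script from above) that serves both URL styles from the same file:

RewriteEngine On
# www.myblog.com/page/1  ->  page.php?page=1
RewriteRule ^page/([0-9]+)$ page.php?page=$1 [L]
# www.myblog.com/page.1  ->  page.php?page=1
RewriteRule ^page\.([0-9]+)$ page.php?page=$1 [L]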
The URL just means whatever we want it to mean!
See the information on Wikipedia: http://en.wikipedia.org/wiki/Uniform_resource_locator
You can have a URL like http://www.example.com/image.jpg and serve a GIF image, a simple page, or a video...
I have seen on various websites that when a forum post or something like that is created, each one gets a different URL that makes it look like it is in its own directory, but I am sure they cannot make a different directory for each post.
If you look at this website: https://oc.tc/forums/topics/5181a374ba6087261f000c59
The number at the end (5181a374ba6087261f000c59) changes for each post, and it looks like this is a different directory, but I am sure it is not!
Could you please explain how they do this?
Thanks in advance!
Rob
Use an Apache .htaccess file to route those requests to your PHP script.
What they're doing there is providing the 518.... as a query string parameter. Their site interprets the request as http://oc.tc/forums/topics/{post}. It would be the same as doing something like http://oc.tc/forums/topics?post=5181a374ba6087261f000c59 (this is an example to show the idea).
Those sites use a technique called URL rewriting in Apache.
What they do is that they convert URL requests like this:
http://site.com/products/categoryA/myawesomeproduct
To something internal like:
http://site.com/?query=products/categoryA/myawesomeproduct
Then they process the rest in PHP. You can learn how to do it with examples at the following link: http://roshanbh.com.np/2008/03/url-rewriting-examples-htaccess.html
Edit:
A full guide to redirects here: http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
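As a concrete sketch (the index.php front controller and the post parameter name are assumptions, not what oc.tc actually uses):

RewriteEngine On
# /forums/topics/5181a374ba6087261f000c59  ->  index.php?post=5181a374ba6087261f000c59
RewriteRule ^forums/topics/([0-9a-f]+)$ index.php?post=$1 [L,QSA]

index.php then reads $_GET['post'] and looks the topic up in a database; no directory for the post ever exists on disk.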
I have a website to exchange links, files... to say it quickly, it's my 'version' of Twitter + Megaupload.
Users add links all the time and so on, but I would like users to be able to sync their browser bookmarks with the ones they have on their profile on my website.
Where should I look?
Basically I need to be able to:
- Access the bookmarks file (1)
- Send the URLs to my service (2)
- Maybe add a login feature (in the future)
I was Googling about this for ages a few weeks ago and I kind of gave up, because I'm OK with PHP and JS, but with these plugin languages I'm very lost. So I decided to post here, which always brings positive answers.
(1) -> I don't even know where to start.
(2) -> I was thinking of having a website.com/auto_import_no_confirm.php?url=[URL] and putting it in a foreach.
How many different languages and extension file formats do I have to work with, and what do they look like? I really need any kind of tip with point (1).
-edit-
Just found this -> https://developer.mozilla.org/En/Code_snippets/Bookmarks
which really looks like what I need, but where do I place this code?
thanks!
Might not be a bad question, but there are too many subtopics raised to answer that. (And there is too much tagspam as well. Break up your question into PHP- and Javascript-specific tasks, when you have devised the general application scheme.)
But to get started, download similar Firefox extensions (.xpi) and unzip them to inspect the general structure. You'll find exemplary code for bookmark handling and invoking remote APIs pretty quickly. And basically you only need JavaScript for the extension itself. (It sounds like your extension does not need much UI.)
And there are many tutorials on designing Firefox addons: http://roachfiend.com/archives/2004/12/08/how-to-create-firefox-extensions/ or http://www.google.com/search?q=firefox+develop+an+xpi
The good news first: you won't need much more than JavaScript if you just want to access bookmarks and send them to a server, neither on Firefox nor on Chrome.
But you'll still have to make yourself familiar with the browsers' APIs and learn how to develop extensions.
However, both Mozilla and Google provide all the necessary information on their developer sites.
For Chrome, this is a good place to start; you'll find the API for bookmark access here.
The corresponding site for Firefox can be found here, with information on bookmark access here.
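For point (2), the server side can stay plain PHP. A minimal sketch of a receiving endpoint, assuming the extension POSTs a JSON array of bookmark URLs (the file name, the JSON format, and the logging are all made up for illustration):

<?php
// auto_import.php -- hypothetical endpoint the extension POSTs to.
// Expects a JSON array of bookmark URLs in the request body.
$urls = json_decode(file_get_contents('php://input'), true);
if (!is_array($urls)) {
    header('HTTP/1.1 400 Bad Request');
    exit('expected a JSON array of URLs');
}
foreach ($urls as $url) {
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        // TODO: tie the URL to the logged-in user and save it
        error_log('imported bookmark: ' . $url);
    }
}
echo 'ok';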
I am unlucky to be in charge of maintaining some old Yahoo! Store built using their RTML-based platform.
Recently I've noticed that HTML code generated by some RTML functions is sprinkled all over with "padding images" (or whatever is the conventional name for those 1x1 pixel images used to enforce layout). I have nothing against using such images, but... all those images are supplied with an ALT attribute like this:
<img src="http://.../image1x1.gif" alt="pad">
With all due respect to the original authors of RTML, but they must have been smoking something when they came up with this "accessibility enhancement"... :-(
Anyway, here are my questions:
Does anybody know a list of all RTML functions that generate HTML with all these "pad" images?
Is there any way to get rid of all those alt="pad" attributes without rewriting a lot of RTML code?
NB: This may sound a little cynical, but improved accessibility is not the main goal here. The main goal is to stop exposing those moronic alt="pad" attributes to Google and other smart search engines. So client-side scripting is not going to help, as far as I know.
Thank you!
P.S. Probably most of you are really lucky and have never heard of RTML. Because if somebody established a prize for software products based on their
commercial success
------------------
usability
ratio, this RTML-based "platform" would probably win first place.
P.P.S. Apparently someone from Yahoo! finally listened, because I can no longer find those silly "pad" tags in the RTML generated for our store. Nevertheless, one of the ideas offered in response to my original question does provide a very practical solution - not just to the original problem but to any similar problem with RTML platform. See the winning answer - it's really good.
The only way I see is to have your own website front-end that filters whatever you want from the RTML site...
For example, if your RTML site is at http://rtmlusglysite.yahoo.com/store/XYZ01134, you could host a simple PHP front-end at http://www.example.com acting as a "filtering" HTTP web proxy, so that http://rtmlusglysite.yahoo.com/store/XYZ01134/item1234.rtml would be accessed as http://www.example.com/item1234.html.
It's not an ideal solution, but it should work, and you could do some more fancy stuff.
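A minimal PHP sketch of that filtering idea (the host and store path are taken from the example above; the item.php name and page parameter are hypothetical, and it assumes allow_url_fopen is enabled):

<?php
// item.php -- hypothetical filtering front-end.
// Fetches the RTML-generated page and strips alt="pad" before
// sending the HTML on to the visitor (and to search engines).
$page = basename(isset($_GET['page']) ? $_GET['page'] : 'index.rtml');
$html = file_get_contents('http://rtmlusglysite.yahoo.com/store/XYZ01134/' . $page);
if ($html === false) {
    header('HTTP/1.1 502 Bad Gateway');
    exit('upstream fetch failed');
}
echo str_replace(' alt="pad"', '', $html);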
Nice try from the other posters, but there is a very simple RTML command that will do it:
TEXT PAT-SUBST s GRAB
MULTI
HEAD
BODY
TEXT #var-with-alt-tag-equals-pad-in-it
frompat "alt=\"pad\""
topat ""
The above RTML will find all instances of alt="pad" and replace them with nothing.
Well you're right on RTML being relatively untraveled :)
Do you have a way to add your own attributes to these image tags? If so, would it be possible to override the alt attribute? If you specify alt="", I would think that would override Yahoo's... Otherwise, consider putting a useful alt tag in there for the blind and dial-up types.
It's the first time I'm hearing about this platform, but here is an idea: if you can add JavaScript to the pages, you could write a function that runs after the page has loaded and removes all the alt="pad" attributes from the page.
Unfortunately this solution only works in browsers that support scripting, so Lynx or some other text-based browsers might not support it.
I have shared a link to the official RTML guide from Yahoo. Hope it helps. Thanks!
List of available RTML books and resources