I would like to redirect www.abc.com/folder1/index.php to www.example.com/index.php. What happens is that when I am in a subfolder such as www.website.com/subdirectory1/ and click the home button, I am taken to www.website.com/subdirectory1/index.php instead of www.website.com/index.php, so I need rewrite rules to fix the issue.
I have already started redirecting the links one by one, like this:
redirect /content/index.php http://www.example.com/index.php
But that takes a lot of time whenever new links are created, and I can't keep changing each one of them every time. Thank you.
Edit: this only happens when friendly URLs are turned on.
I recommend reading the official documentation; it could save you a lot of time.
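One possible cause, assuming the home button is written as a relative link such as index.php (the question does not show the markup), is that relative links are resolved against the current folder. The sketch below uses Python's standard urljoin only to illustrate that resolution; it is not a rewrite rule, and the href values are assumptions.

```python
from urllib.parse import urljoin

# Page the visitor is currently on (URL taken from the question).
current_page = "http://www.website.com/subdirectory1/"

# Assumption: the home button uses a relative href, so it resolves
# inside the current subfolder.
relative_home = urljoin(current_page, "index.php")

# A root-relative href starts with "/" and ignores the current folder.
root_home = urljoin(current_page, "/index.php")

print(relative_home)
print(root_home)
```

If that assumption holds, pointing the home link at /index.php would avoid the need for one redirect per folder.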
I'd like to find all subpages of a certain URL. E.g. I have the URL example.com. Subpages such as example.com/home, example.com/help and so on might exist. Is it possible to get all such subpages without knowing their exact names?
I thought I could handle this problem with a web crawler, but it only crawls pages that are linked from the page itself.
I hope you understand my problem and can help me with it.
Thank you!
To answer your question, yes. Scrapy "crawl" spiders work by following rules, and those rules can be set to do exactly what you're trying to do. When in doubt, always go to the docs!
A couple of things to note:
You can create a crawl spider the same way you create a generic spider:
scrapy genspider -t crawl nameOfSpider website.com
With a crawl spider, you then have to set rules that tell Scrapy where to go and where not to go. How's your regex?
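from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']  # PART 1: Domain Restriction
    start_urls = ['http://www.example.com']
    rules = (
        # PART 2: Call Back; follow=True keeps crawling from each matched page
        Rule(LinkExtractor(allow=r'.*'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {'url': response.url}  # record every page that was reached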
I copied and pasted this from the official docs and changed it to what it should look like for you, but I haven't checked the code, so... the logic is there, though.
This works by collecting all the links it can see, depending on the rules you set, and doing something with each link.
You want to restrict all domains other than the one you're scraping.
In the example I set the wildcard to accept literally every page in the domain. Once you figure out the structure of a website, you can use logic to build out what you need.
You should take a look at the docs more often, though. I have been using Scrapy for about 6-7 years and I still find myself going back to the man pages!
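To make the two parts above concrete without running Scrapy, here is a minimal standard-library sketch of the same idea: pull every link out of a page, then keep only the ones on the allowed domain. The names LinkCollector and same_domain_links and the sample HTML are made up for illustration; this is not how Scrapy implements it internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_url, html, allowed_domain):
    """Return absolute links found in html that stay on allowed_domain."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        host = urlparse(absolute).hostname or ""
        # Domain restriction: accept the domain itself and its subdomains.
        if host == allowed_domain or host.endswith("." + allowed_domain):
            links.append(absolute)
    return links


# Hypothetical page content, only for demonstration.
sample_html = (
    '<a href="/help">Help</a> '
    '<a href="http://other.org/x">Other</a> '
    '<a href="home">Home</a>'
)
found = same_domain_links("http://www.example.com/", sample_html, "example.com")
print(found)
```

Like the spider, this only finds pages that some other page links to; a page nobody links to stays invisible to it.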
No, you can’t.
The way you describe the situation, the website intends those desired URLs to be secret.
Any way to find such URLs would be a security exploit that should be reported to the website owners right away so they can fix it.
I'm trying to make my site more SEO friendly, and I'm noticing that whenever I go to a product through either a tag or a different page (2, 3, 4, etc.), that gets added to the URL.
For example:
www.wisdomsurvival.com/Guardian-Survival-kit/culinary-can-of-preparedness-seeds.html?page=2
I would like to remove ?page=2 from the path
Opencart 1.5.4
Any help would be greatly appreciated.
EDIT: My main goal is to have one URL for each page instead of multiple paths. For example:
http://www.wisdomsurvival.com/person-guardian-preparedness-package-camping-bug-out
http://www.wisdomsurvival.com/camping-and-bug-out/person-guardian-preparedness-package-camping-bug-out
The first URL is a direct link, the second comes from clicking from a category, the third (not shown because I can only post 2 links) comes from clicking from a subcategory and the fourth (also not shown) from a manufacturer list.
I need them all to either redirect to the first URL or go directly to the first URL without redirecting, along with any other URLs such as the ones that have the page ID path or tag path.
I recognise that theme :-)
Where is the ?page=2 coming from, given that the link works perfectly without it? You need to trace the source of the link. First try the template views and see whether a simple link edit in the layout will accomplish what you need.
If not, you may find the information is coded in the controller, if the link is being dynamically generated. Again, you should be able to edit the code that generates the link there.
If not, you may find that it is in a model that is being called. Again, just find the model and edit the link structure you find there.
The url on your page will only be a reflection of the url you generated somewhere else in order for the link to be followed in the first place.
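Wherever the link turns out to be generated, the change itself amounts to dropping the unwanted query parameter before the URL is written out. OpenCart is PHP, so the following Python is only a language-neutral sketch of that step; strip_params is a hypothetical helper, not an OpenCart function, and the http:// scheme on the example URL is added for the demonstration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, unwanted=("page",)):
    """Return url with the named query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in unwanted
    ]
    # An empty query string makes urlunsplit drop the "?" entirely.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


product_url = (
    "http://www.wisdomsurvival.com/Guardian-Survival-kit/"
    "culinary-can-of-preparedness-seeds.html?page=2"
)
clean_url = strip_params(product_url)
print(clean_url)
```

This is only appropriate for links to a single product; a category listing usually needs its page parameter to paginate.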
Usually when I am building with OpenCart I find the theme modules are not coded very well in terms of SEO. Fortunately, with OpenCart these things are usually very easy to remedy.
Top trick -> I often stick additional bits into my URLs that have no impact on the page generated but that Google picks up on as keywords anyway.
If you have problems reformatting the links, post your code and I will have a look for you.
Hope that helps,
Paul.
I'm trying to create a Pinterest link with JavaScript. It opens Pinterest and shows the correct images and description, but when I click Pin It, the page just refreshes and nothing is pinned.
I'm creating a custom link, and here's a generated URL that I think should be working:
http://pinterest.com/pin/create/button/?url=http%3A%2F%2Fsandbox.modernactivity.co.uk%2Findependent_02%2F%3Fattachment_id%3D743&media=http%3A%2F%2Fsandbox.modernactivity.co.uk%2Findependent_02%2Fwp-content%2Fuploads%2F2012%2F06%2FBBC%20-%20MEAT-NEW%20WEBSITE%20TEST%204%3A3%20to%2016%3A9%20cropping-743-still-150x84.jpg&description=Independent%20Films%2C%20%E2%80%98Meat%E2%80%99&ref=http%3A%2F%2Fsandbox.modernactivity.co.uk%2Findependent_02%2Fdirectors%2Fdaniel-levi%2Fshowreels%2Flive-action%2Fvideo%2F743%2F%23
Anyone know what might be wrong?
best, Dan.
Okay, let me start with a disclaimer: this answer might not even be right, but it did work for me. I had the same problem, and my URL had lots of '+' characters in it, which are the URL-encoded equivalent of a space. Pinterest seems to have a problem pinning such URLs, although it has no problem rendering them.
Your URI seems to have a lot of spaces too.
So, if the URI is in your control, you may:
Create the URI only after URL-encoding its parts.
Make sure that spaces and the like don't appear in the URI.
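As a rough sketch of the first suggestion, the link can be assembled from separately encoded parts so that no raw space or '+' ends up in it. The endpoint and the url, media and description parameter names come from the question's link; build_pin_url and the example.com addresses are made up for illustration, and whether this cures the pinning failure is not confirmed here.

```python
from urllib.parse import quote, urlencode

# Endpoint as it appears in the question's link.
PIN_ENDPOINT = "http://pinterest.com/pin/create/button/"


def build_pin_url(page_url, media_url, description):
    """Assemble a pin link with every parameter percent-encoded."""
    params = {"url": page_url, "media": media_url, "description": description}
    # quote_via=quote encodes spaces as %20 instead of "+".
    return PIN_ENDPOINT + "?" + urlencode(params, quote_via=quote)


# Hypothetical values, only for demonstration.
pin_url = build_pin_url(
    "http://example.com/?attachment_id=743",
    "http://example.com/uploads/my image.jpg",
    "Independent Films",
)
print(pin_url)
```

Renaming the image file so that it contains no spaces at all would cover the second suggestion.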
Looking through my IIS logs, I noticed that Pinterest was redirecting users to my website without a leading http://, even when it was specified in the address. This seems to be causing the error for me. I'm unsure how to fix this in IIS, but thought I'd pass on the clue I found.
It would be great if you could shed some light on this; it has baffled me:
I was asked by a client if I could try and make the search term for his comedy night "sketchercise" put his website top of the Google ranking. I simply changed the title tag of the header for the whole site from "Allnutt and Simpson" to "Allnutt and Simpson - Sketchercise # Ginglik - Sketch Duo". It did the trick and now the site comes up top of the Google listing when typing in "sketchercise". However, it gives off this very strange link:
http://www.allnuttandsimpson.com/index.php/videos/
This is the link to the google search result too:
http://www.google.co.uk/search?sourceid=chrome&ie=UTF-8&q=sketchercise
This link is invalid; it doesn't make any sense. I guess it has something to do with the use of hash tags and the AJAX-driven site, but before I changed the title tag, it linked to the site fine using the # tags. What is the deal with this slash?
The strangest part is that the valid URL for the videos page on that site is /index.php#vidspics. I have never used the word "videos" in a URL!
If anyone can explain the cause of this or just help me stop it from happening, I'd be very grateful. I realise that this is an SEO question and I hate that stuff generally, but I hope you can see this is a bit of a strange case!
Just to compare, if you google "allnutt and simpson" it works just fine: it links to the site and all of its pages as .php pages (and then my JS converts them to hash tags to keep things clean).
There must be a folder called 'videos' among your hosted files; use an FTP client to check this.
Google crawls every folder and file unless you tell it not to. Look into robots.txt files to learn how to prevent indexing.
Also ask Google to remove that result once you have solved this.
Finally, that behaviour is not related to hash tags; those are just references used by JavaScript to display the appropriate content on your webpage.
Not sure why it's listed like this, but the only way to stop that page from appearing is to use a Google Webmaster account for this website and make sure the crawlers can't find this link anymore. The alternative is to have the site admin put this tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">, in the header when isset($_REQUEST['videos']) is true.
The slash in the address is the parsed form of www.allnuttandsimpson.com/index.php?=videos. You can have the web server change all the PHP parameters into slashes to make the links look pretty.
Best option for correct results is to create a sitemap and submit it to https://www.google.com/webmasters/tools/ for that site. You will need access.
Oh, I forgot: the sitemap will make Google see all the pages you want it to list; use it for the major pages like those in the main menu. Removing links you don't want requires a robots.txt in the main directory of the site.
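For a feel of what such a robots.txt does, a draft rule can be checked with Python's standard urllib.robotparser before uploading it. The Disallow line below is an assumption about what this site would need, not a rule taken from it.

```python
from urllib.robotparser import RobotFileParser

# Draft robots.txt content; the Disallow path is an assumed example.
robots_txt = """\
User-agent: *
Disallow: /index.php/videos/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch returns True when the named crawler may request the URL.
videos_allowed = parser.can_fetch(
    "Googlebot", "http://www.allnuttandsimpson.com/index.php/videos/"
)
home_allowed = parser.can_fetch(
    "Googlebot", "http://www.allnuttandsimpson.com/index.php"
)
print(videos_allowed, home_allowed)
```

The parser only reports how a crawler that honours robots.txt would read the rule; the unwanted result still has to be removed through the webmaster tools.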
I have a pager on a table using Ajax, and I would like each such request to also change the browser's URL, so that when I hit the refresh button I won't skip back to the first page. I have been fighting with the Url parameter of AjaxOptions, but it keeps beating me. Please help.
Trim
You can safely change the part of the URL after the hash mark without reloading the page. However, the user can (in most browsers) navigate through these changes with the Back and Forward buttons. This technique is usually called "history."
Because the technique is difficult to get working in all browsers, you'll want to use a framework. Take a look at http://www.mikage.to/jquery/jquery_history.html.
I can also recommend ExtJS's history support. Take a look at this example:
http://www.extjs.com/deploy/dev/examples/history/history.html#main-tabs:tab2
Again, notice that not only does the URL change when the user does stuff, but changing the URL (via Back and Forward) also affects the page. This is good, awesome even, but means it must be done very carefully.
There is not really a quick and easy way to do this; here is an article on the topic. The problem is that not only does the Ajax code have to generate the URLs, the page also has to take those URLs into account when it loads, so that it shows the appropriate content.