Why doesn't pmwiki have an ID system like MediaWiki? - pmwiki

Every MediaWiki site I have ever visited lets you replace the article title in the URL with ?&curid=[any number]. For example: http://rationalwiki.org/wiki/?&curid=1999, https://en.m.wikipedia.org/w/index.php?&curid=2001
So, since PmWiki is wiki software like MediaWiki and has a similar URL structure, why don't PmWiki site URLs have any kind of ID system?

In PmWiki, page URLs are derived from the page names themselves.
For example, the page MyGroup.MyPage may be reached via either:
http://mywiki/?n=MyGroup.MyPage
http://mywiki/MyGroup/MyPage
(depending on the wiki's Clean URL configuration)
The SQLite PageStore cookbook recipe (i.e. a PmWiki add-on) would provide shortened URLs.
It should also be noted that page names can stay short, with (:title ...:) markup in the page itself providing a more detailed title.
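As a rough illustration of how both URL forms follow from the page name alone, here is a minimal Python sketch. The function name page_urls is made up for this example and this is not PmWiki's own code; it only mirrors the two URL shapes shown above.

```python
def page_urls(base, pagename):
    """Return the query-string and clean URL forms for a Group.Page name.

    Hypothetical helper for illustration; not part of PmWiki.
    """
    # A PmWiki page name has the form "Group.Page".
    group, page = pagename.split(".", 1)
    query_form = f"{base}/?n={group}.{page}"
    clean_form = f"{base}/{group}/{page}"
    return query_form, clean_form
```

Calling page_urls("http://mywiki", "MyGroup.MyPage") gives the two URLs listed above. No numeric ID is involved at any point, which is why there is nothing comparable to curid.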

Related

include the page id in a url on my umbraco site

I have a site running on Umbraco 7, that uses the default Url scheme, a url might look like this:
http://domain.com/page-name/subpage-name
As the content creators like to "optimize" the pages on the website, they often change the page titles, and in the process change the URL of the page, breaking all links to the page (Google links, AdWords campaigns, partner sites etc.).
I would like to keep the page title in the URL for SEO purposes, without being stuck with the URL a page had when it was first created. For this I was thinking of adding the id to the URL of the page. I have seen many sites with URLs that look like this:
http://domain.com/page/id/subpage-name or http://domain.com/page/id-subpage-name
and then look up the page based on the id instead of the name.
Is it possible to achieve this with Umbraco?
I'd agree with Pekka on this one, it's super easy to create separate field(s) for page header/navigation title/browser title. In my opinion it's a better solution than adding ids to the URL, but that's just my two cents.
You should be able to make a custom URL handler like this: http://24days.in/umbraco/2014/urlprovider-and-contentfinder/ - the article is from 2014, so some stuff may have changed. But the concept should still be relevant.
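To make the id-based lookup idea from the question concrete, here is a minimal sketch in Python rather than Umbraco's own API. The PAGES dictionary, the resolve function and the /page/ prefix are all assumptions for illustration; a real Umbraco solution would live in a custom URL provider and content finder as described in the linked article.

```python
import re

# Hypothetical content store: page id -> current slug.
# Editors may change the slug; the id never changes.
PAGES = {1234: "subpage-name"}

def resolve(path):
    """Resolve '/page/<id>-<slug>' by id, ignoring whatever slug was sent.

    Returns (200, page_id) for the current URL, (301, canonical_url) for a
    stale slug, and (404, None) for unknown paths or ids.
    """
    match = re.fullmatch(r"/page/(\d+)(?:-([^/]*))?", path)
    if not match:
        return 404, None
    page_id = int(match.group(1))
    slug = PAGES.get(page_id)
    if slug is None:
        return 404, None
    canonical = f"/page/{page_id}-{slug}"
    if path != canonical:
        # Old title in the URL: send visitors and crawlers to the current one.
        return 301, canonical
    return 200, page_id
```

With this shape, a link that still carries an old title, such as /page/1234-old-title, keeps working and is redirected to the current URL instead of breaking.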

How to delete old Google Urls with parameters

A month ago I relaunched a website on the TYPO3 CMS. Before that, the site ran on the Joomla CMS.
In the Joomla config, SEO links were disabled, so Google indexed the page URLs like this:
www.domain.de/index.php?com_component&itemid=123....
for example.
Now, a month later (after the TYPO3 relaunch), these links are still visible in Google because the URLs don't return a 404 error. That's because "index.php" also exists in TYPO3, and TYPO3 doesn't care about the additional query string/variables: it returns a 200 status code and shows the front page.
In Google Webmaster Tools it's possible to delete single URLs from the Google index, but that way I would have to delete about 10000 URLs manually...
My question is: is there a way to remove these old URLs from the Google index?
Greetings
With this number of URLs there is only one sensible solution: implement proper 404 handling in your TYPO3, or even better, redirects to the same content placed in TYPO3.
You can use TYPO3's handler (search for it in Install Tool > All configuration). It's called pageNotFound_handling, and you can use options like REDIRECT for redirecting to some page, or even USER_FUNCTION, which allows you to use your own PHP script. Check the description in the Install Tool.
You can also write a simple condition in TypoScript that checks whether typical Joomla params exist in the URL, which is an easy way to return a custom 404 page. If you need a more sophisticated condition (for example, you want to redirect links which previously pointed to some gallery in Joomla to the new gallery in TYPO3), you can use a userFunc condition, and that would probably be the best option for SEO.
If these URLs share a manageable number of common indicators, you could catch them with a rule in your virtual host or .htaccess so that Google runs into the correct error message.
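The check for Joomla-style parameters described above can be sketched as follows. This is Python for illustration only, not TypoScript or an .htaccess rule, and the parameter names in JOOMLA_PARAMS are an assumption about what the old URLs contained; adjust them to whatever the old site actually used.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed markers of the old Joomla URLs (compared case-insensitively).
JOOMLA_PARAMS = {"option", "itemid"}

def status_for(url):
    """Return 404 for URLs that look like old Joomla links, else 200."""
    # keep_blank_values=True keeps bare keys such as "com_component".
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    names = {name.lower() for name in query}
    looks_like_joomla = bool(names & JOOMLA_PARAMS) or any(
        name.startswith("com_") for name in names
    )
    return 404 if looks_like_joomla else 200
```

Returning 404 is the simplest variant; the same test could instead pick a redirect target when the old content has an equivalent page on the new site.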
I wrote a Google Chrome extension to remove URLs in bulk in Google Webmaster Tools. Check it out here: https://github.com/noitcudni/google-webmaster-tools-bulk-url-removal.
Basically, it's a glorified for loop. You put all the URLs in a text file. For example,
http://your-domain/link-1
http://your-domain/link-2
Once you have installed the extension as described in the README, you'll find a new "choose a file" button.
Select the file you just created. The extension reads it in, loops through all the URLs and submits them for removal.

Hide website filenames in URL

I would like to hide the webpage name in the url and only display either the domain name or parts of it.
For example:
I have a website called "MyWebSite". The url is: localhost:8080/mywebsite/welcome.xhtml. I would like to display only the "localhost:8080/mywebsite/".
However if the page is at, for example, localhost:8080/mywebsite/restricted/restricted.xhtml then I would like to display localhost:8080/mywebsite/restricted/.
I believe this can be done in the web.xml file.
I believe that you want URL rewriting. Check out this link: http://en.wikipedia.org/wiki/Rewrite_engine - there are many approaches to URL rewriting, you need to decide what is appropriate for you. Some of the approaches do make use of the web.config file.
You can do this in several ways. The one I see most is to have a "front door" called a rewrite engine that parses the URL dynamically to internally redirect the request, without exposing details about how that might happen as you would see if you used simple query strings, etc. This allows the URL you specify to be digested into a request for a master page with specific content, instead of just looking up a physical page at that location to serve.
The StackExchange sites do this so that you can link to a question in a semi-permanent fashion (and thus can use search engines with crawlers that log these URLs) without them having to have a real page in the file system for every question that's ever been asked (we're up to 9,387,788 questions as of this one).
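A minimal sketch of the "front door" idea, assuming a hand-written routing table. The ROUTES entries, the rewrite function and the question.xhtml target are hypothetical; real rewrite engines (servlet filters, web server rewrite modules and so on) have their own configuration syntax.

```python
import re

# Hypothetical routing table: public URL pattern -> internal resource.
ROUTES = [
    (re.compile(r"^/mywebsite/$"), "welcome.xhtml"),
    (re.compile(r"^/mywebsite/restricted/$"), "restricted/restricted.xhtml"),
    # One internal page serves every question; the id comes from the URL.
    (re.compile(r"^/questions/(?P<id>\d+)(?:/[^/]*)?$"), "question.xhtml"),
]

def rewrite(path):
    """Map a public path to (internal_target, params) without a redirect.

    Returns (None, {}) when no rule matches.
    """
    for pattern, target in ROUTES:
        match = pattern.match(path)
        if match:
            return target, match.groupdict()
    return None, {}
```

The browser keeps showing the public path; only the server knows which file or handler produced the response.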

Is there a way to find all the pages' link by a URL?

If I have a link, say http://yahoo.com/, can I get the links inside yahoo? For example, I have a website http://umair.com/ and I know there are just 5 pages: Home, About, Portfolio, FAQ, Contact. Can I get the links as follows programmatically?
http://umair.com/index.html
http://umair.com/about.html
http://umair.com/portfolio.html
http://umair.com/faq.html
http://umair.com/contact.html
Define what you mean by "links inside yahoo".
Do you mean all pages that are linked to from the page returned by "http://www.yahoo.com"? If so, you could read the HTML returned by an HTTP GET request and parse through it looking for <a> elements. You could use the "HTML Agility Pack" for help.
If you mean "all pages on the server at that domain", probably not. Most websites define a default page which you get when you don't explicitly request one (for example, requesting http://umair.com almost certainly returns http://umair.com/index.html). A few websites don't define a default; those return a list of files instead.
If you mean "all pages on the server at that domain, even if they define a default page", no, that cannot be done. It would be an extreme breach of security.
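For the first case, parsing the returned HTML for <a> elements, here is a small sketch using only Python's standard library instead of the HTML Agility Pack. The names LinkCollector and extract_links are made up for this example, and the HTML is passed in as a string so that the fetching step stays out of the picture.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    """Return the absolute URLs linked from the given HTML, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

This only finds pages that the given page actually links to. Pages on the server that nothing links to stay invisible, which is the point made in the last two cases above.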
A web crawler could do this. For some basic information, see:
http://en.wikipedia.org/wiki/Web_crawler
That page includes open source crawlers; see if any of them is what you are looking for.

Hidden file names in URLs

I usually like defining my pages to know exactly what page does what. However, on a number of sites, I see where the filename is hidden from view and I was just a little curious.
Is there any specific benefit of having URLs appear like this:
http://mydomain/my_directory/my_subdirectory/
As opposed to this:
http://mydomain/my_directory/my_subdirectory/index.php
Thanks.
Done correctly it can be better for:
the end user, it is easier to say and remember.
SEO, the page name may just detract from the URL in terms of search parsing.
Note: In your example (at least with IIS) all that may have happened is you've made index.php the default document of that sub directory. You could use both URLs to access the page which could again affect SEO page rank. A search engine would see both URLs as different, but the page content would be the same, resulting in duplicate content being flagged. The solution to this would be to:
301 redirect from one of the URLs to the other
Add a canonical tag to the page saying which URL you want page rank to be given to.
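The 301 option can be sketched like this. It is a Python illustration under the assumption that index.php is the default document; the function name canonical_redirect is hypothetical, and on a real server this would normally be a rewrite rule rather than application code.

```python
def canonical_redirect(path, default_doc="index.php"):
    """Redirect '<dir>/index.php' to '<dir>/' so only one URL gets indexed.

    Returns (301, new_path) when the path ends with the default document,
    otherwise (200, path).
    """
    if path.endswith("/" + default_doc):
        # Strip the file name but keep the trailing slash.
        return 301, path[: -len(default_doc)]
    return 200, path
```

For example, /my_directory/my_subdirectory/index.php is redirected to /my_directory/my_subdirectory/, while the directory URL itself is served directly.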
Some technologies simply don't map URLs to files. Java Servlets, for example.
These are not file names. These are URLs. Their goal is to describe the resource. Nobody cares whether you did it in PHP or ASP or typed your HTML in Emacs. Nobody cares that you named your file index.php. We like to see clean URLs with clear structure and semantics.
