What's the difference between beginAt and gotoPage in JWebUnit? - jwebunit

JWebUnit.beginAt:
Begin conversation at a URL absolute or relative to base URL. Use getTestContext().setBaseUrl(String) to define base URL. Absolute URL should start with "http://", "https://" or "www.".
JWebUnit.gotoPage:
Go to the given page like if user has typed the URL manually in the browser. Use getTestContext().setBaseUrl(String) to define base URL. Absolute URL should start with "http://", "https://" or "www.".
So, one says "Begin conversation at URL absolute or relative to base URL", while the other says "Go to the given page like if user has typed the URL manually in the browser". This doesn't help me in the slightest in understanding them (well, specifically the former; the latter makes sense). What's the actual difference between them? Which should I be using, and when?

I finally did manage to find the answer in the source code.
beginAt does two things: start the browser, then call gotoPage with its argument. Thus, you need to use beginAt the first time, and gotoPage subsequent times. (Perhaps if managing multiple windows it has more use; I haven't dug that deeply.)

Related

How to make href's work as expected

I have two questions here, that I thought there were already asked, but I could not find anything related.
Let's suppose I have the following URL:
http://www.domain.com/folder/page
And I have an anchor like this:
<a href="page2">Page2</a>
First:
Of course when it is clicked, it will navigate to
http://www.domain.com/folder/page2
But if the user has this URL:
http://www.domain.com/folder/page/ <-- Note the last slash
Then the anchor will navigate to:
http://www.domain.com/folder/page/page2
The first question is:
How can I avoid this?
And the second question would be:
How to always do this?
I mean that even if the url ends with a slash or not, navigate to:
http://www.domain.com/folder/page/page2
I know I can do this with javascript, but the idea is to keep using the href without using javascript in every case this happens. I also know I can use relative urls starting with / to referrer the root, but I can't in this case because the url has some IDs in the middle that may change.
Your basic problem is that you have two URLs that resolve to the same resource.
Pick one of them to be canonical and redirect from the other one to it using HTTP.
Failing that, use root relative URIs:
href="/folder/page2"
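To see why the trailing slash changes the result: a path-relative href replaces everything after the last slash of the base path, while a root-relative href replaces the whole path. Here is a minimal sketch using Java's standard java.net.URI (my choice for illustration, nothing the question mentions); the class and method names are made up.

```java
import java.net.URI;

class HrefResolution {
    // Resolve an href the way a browser would, against the URL of the page containing it.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // No trailing slash: "page" is the last segment and gets replaced.
        System.out.println(resolve("http://www.domain.com/folder/page", "page2"));
        // prints http://www.domain.com/folder/page2

        // Trailing slash: "page/" is treated as a directory, so "page2" is appended.
        System.out.println(resolve("http://www.domain.com/folder/page/", "page2"));
        // prints http://www.domain.com/folder/page/page2

        // Root-relative href: same result regardless of the trailing slash.
        System.out.println(resolve("http://www.domain.com/folder/page", "/folder/page2"));
        System.out.println(resolve("http://www.domain.com/folder/page/", "/folder/page2"));
        // both print http://www.domain.com/folder/page2
    }
}
```

This is also why redirecting one form to the other fixes the problem: once only one of the two base URLs can ever be in the address bar, the relative href has only one possible result.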

Are protocol-relative URLs relative URLs?

So consider a protocol-relative URL like so;
//www.example.com/file.jpg
The idea I've had in my head for as long as I can remember is that protocol-relative URLs are in fact absolute URLs. They behave exactly like absolute URLs, and never do they work like relative URLs. I wouldn't expect this to make the browser go find something at
http://www.example.com///www.example.com/file.jpg
The URL defines the host and the path (like an absolute URL does), and the scheme is inherited from whatever the page used, and therefore it makes a complete unambiguous URL, i.e. an absolute URL.
Right?
Now, upon further research into this, I came upon this answer, which states;
A URL is called an absolute URL if it begins with the scheme and scheme specific part (here // after http:). Anything else is a relative URL.
Neither the question nor the answer specifically discuss protocol-relative URLs, so I'm mindful that it can just be an oversight in wording.
However, I'm now also running into an issue in my development, where a system that only accepts absolute URLs doesn't function with protocol-relative URLs, and I don't know if that's by design or due to a bug.
The RFC3986 section which is often linked to in relation to protocol-relative URLs also splashes the word "relative" around a lot. 4.3 then goes on to say that absolute URIs define a scheme.
All this evidence against my initial assumption led me to the question;
Are protocol-relative URLs relative or absolute?
Every relative URL is an unambiguous URL given the URL it is relative to. So if your page is http://mypage.com/some/folder/ then you know the relative URL this/that corresponds to http://mypage.com/some/folder/this/that and you know the relative URL //otherpage.com/ resolves to http://otherpage.com/. Importantly, it cannot be resolved without knowing the page URL it is relative to.
A relative URL is any URL that is relative to something and cannot be resolved by itself. An absolute URL does not require any context whatsoever to resolve.
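A quick way to convince yourself of this is to resolve the same scheme-relative reference against two different base URLs. The sketch below uses java.net.URI from the Java standard library (an assumption on my part, since the question names no language); note that its isAbsolute() simply means "has a scheme". Class and method names are hypothetical.

```java
import java.net.URI;

class SchemeRelativeDemo {
    // Resolve a reference against the URL of the page it appears on.
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    // True only if the reference carries its own scheme.
    static boolean hasScheme(String ref) {
        return URI.create(ref).isAbsolute();
    }

    public static void main(String[] args) {
        String ref = "//otherpage.com/";

        // The reference has no scheme, so it cannot be resolved on its own.
        System.out.println(hasScheme(ref)); // prints false

        // The result depends on the base it is resolved against.
        System.out.println(resolve("http://mypage.com/some/folder/", ref));
        // prints http://otherpage.com/
        System.out.println(resolve("https://mypage.com/some/folder/", ref));
        // prints https://otherpage.com/

        // An ordinary path-relative reference behaves the same way in principle.
        System.out.println(resolve("http://mypage.com/some/folder/", "this/that"));
        // prints http://mypage.com/some/folder/this/that
    }
}
```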
What you are calling a “protocol-relative URL” WHATWG calls a “scheme-relative URL” in the URL Standard document, and it is not an absolute URL, but a relative URL.
Granted, most sites available on HTTPS show the same content on the corresponding HTTP URLs, but that is not necessarily the case, so it makes sense that a URL that does not include the scheme cannot be considered absolute.
From the document:
An absolute URL must be a scheme, followed by ":", followed by either a scheme-relative URL, if scheme is a relative scheme, or scheme data otherwise, optionally followed by "?" and a query.
Specifically answering your question, we have:
A relative URL must be either a scheme-relative URL, an absolute-path-relative URL, or a path-relative URL that does not start with a scheme and ":", optionally followed by a "?" and a query.
At the point where a relative URL is parsed, a base URL must be in scope.
Examples (brackets indicate optional)
path-relative URL: [path segment][/[path segment]]…
about
about/staff.html
about/staff.html?
about/staff.html?parameters
absolute-path-relative URL: /[path-relative URL]
/
/about
/about/staff.html
/about/staff.html?
/about/staff.html?parameters
scheme-relative URL: //[userinfo@]host[:port][absolute-path-relative URL]
//username:password@example.com:8888
//username@example.com
//example.com
//example.com/
//example.com/about
//example.com/about/staff.html
//example.com/about/staff.html?
//example.com/about/staff.html?parameters
absolute URL: scheme:[scheme-relative URL][?parameters]
https://username:password@example.com:8888
https://username@example.com
https://example.com
https://example.com/
https://example.com/about
https://example.com/about/staff.html
https://example.com/about/staff.html?
https://example.com/about/staff.html?parameters
relative URL:
Anything from scheme-relative URL list
Anything from absolute-path-relative URL list
Anything from path-relative URL list
Note: This answer does not disagree with the first answer, but it only became clear to me that the first answer addressed the question after reading it several times and doing further research. Hopefully this answer spells it out better for others stumbling on this.

Jsoup parse link <a href="www.abc.com">

I want to extract links from html, using jsoup
Expected output: absolute link.
I use "abs:href" for that.
This works:
Jsoup.parse("<a \n\r\t href=\"http://www.ibm.com/123/?id=abc\">\nhaha</a>", "http://www.ibm.com");
delivers: http://www.ibm.com/123/?id=abc
This doesn't work:
Jsoup.parse("<a \n\r\t href=\"www.ibm.com/123/?id=abc\">\nhaha</a>", "http://www.ibm.com");
delivers: http://www.ibm.com/www.ibm.com/123/?id=abc
I know it's kinda difficult to know whether "www.ibm.com" is an absolute or relative link. It might be a top level domain, but also a folder name. Any proven solutions? Just this hack comes to mind:
String domain = url.replace("http://", "");
url.replace(domain + domain, domain);
Your second example is unambiguously a relative URL. An absolute URL, by definition, starts with a protocol (e.g. http or https). All browsers will give the same output for your example.
Can you provide an example URL that you're working with? Why does it have these pseudo-absolute URLs?
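The claim that a scheme-less href is relative can be checked without jsoup. Below is a small sketch with java.net.URI from the standard library (swapped in for illustration; it is not what jsoup uses internally as far as I know). The base is written with a trailing slash, which is an assumption on my part: java.net.URI is known to handle a base with an empty path awkwardly. Class and method names are made up.

```java
import java.net.URI;

class SchemelessHref {
    // An href counts as absolute only if it starts with a scheme such as "http:".
    static boolean isAbsolute(String href) {
        return URI.create(href).isAbsolute();
    }

    // Resolve the href against the document's base URL.
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.ibm.com/"; // trailing slash added for java.net.URI

        System.out.println(isAbsolute("http://www.ibm.com/123/?id=abc")); // prints true
        System.out.println(isAbsolute("www.ibm.com/123/?id=abc"));        // prints false

        // "www.ibm.com" is just the first path segment of a relative reference.
        System.out.println(resolve(base, "www.ibm.com/123/?id=abc"));
        // prints http://www.ibm.com/www.ibm.com/123/?id=abc
    }
}
```

So the output reported in the question is the standard resolution result, not a jsoup quirk.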

How do SO URLs self correct themselves if they are mistyped?

If an extra character (like a period, comma, bracket or even a letter) gets accidentally added to a URL on the stackoverflow.com domain, a 404 error page is not thrown. Instead, URLs self correct themselves & the user is led to the relevant webpage.
For instance, the extra 4 letters I added to the end of a valid SO URL to demonstrate this would be automatically removed when you access the below URL -
https://stackoverflow.com/questions/194812/list-of-freely-available-programming-booksasdf
I guess this has something to do with ASP.NET MVC Routing. How is this feature implemented?
Well, this is quite simple to explain I guess, even without knowing the code behind it:
The text is just candy for search engines and people reading the URL:
This URL will work as well, with the complete text removed!
The only part really important is the question ID that's also embedded in the "path".
This is because EVERYTHING after http://stackoverflow.com/questions/194812 is ignored. It is just there to make the link, if posted somewhere, more descriptive.
Internally the URL is mapped to a handler, e.g., by a rewrite, that transforms into something like: http://stackoverflow.com/questions.php?id=194812 (just an example, don't know the correct internal URL)
This also makes the URL search engine friendly, besides being more readable to humans.
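To make the idea concrete, here is a rough sketch of a route that only looks at the numeric ID and ignores whatever follows it. It is purely illustrative and not how Stack Overflow actually implements it; the class name, method name and pattern are my own assumptions.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionRoute {
    // "/questions/<numeric id>", optionally followed by "/<any slug text>".
    private static final Pattern ROUTE =
            Pattern.compile("^/questions/(\\d{1,18})(?:/.*)?$");

    // Returns the question ID if the path matches the route, otherwise empty.
    static OptionalLong questionId(String path) {
        Matcher m = ROUTE.matcher(path);
        return m.matches()
                ? OptionalLong.of(Long.parseLong(m.group(1)))
                : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // The mistyped slug does not matter; only the ID is extracted.
        System.out.println(questionId(
                "/questions/194812/list-of-freely-available-programming-booksasdf"));
        // The slug can be missing entirely.
        System.out.println(questionId("/questions/194812"));
        // No numeric ID, no match.
        System.out.println(questionId("/questions/abc").isPresent()); // prints false
    }
}
```

A handler built like this could then look up the canonical title for the ID and redirect to the "correct" URL, which would explain the self-correcting behaviour described in the question.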

Can an URL shortener pass parameters?

I use bit.ly to shorten my urls.
My problem - parameters are not passed.
Let me explain: I use http://bit.ly/MYiPhoneApps, which redirects (let's say) to http://iphone.pp-p.net/default.aspx
Now when I try http://bit.ly/MYiPhoneApps?param=xx this param is not added to the resulting url.
I know I could create an extra "short url" including a parameter - so http://bit.ly/WithParam would result in http://www.mysite.com/somepath/apage.aspx?Par1=yy and so forth.
But what I want is to have a short URL directing to a page - and then I want to add a parameter to this shortened url - which should (of course) land at my page.
Is this a shortcoming of bit.ly (and others are maybe able to do it) - or does "parameter forwarding" not work with 301 redirections?
Manfred
There's no technical reason why it couldn't be done. The service would simply have to look at what parameters it is being sent, and then rewrite the target URL accordingly.
The problem is that it's not necessarily well defined how to do that.
Suppose you have the url http://example.com/default.aspx?foo=bar, and it has the short url http://foo.com/ABCD. What should happen if you try to access http://foo.com/ABCD?foo=baz? Should it replace the value, so you get foo=baz? Should it append it to make foo=bar&foo=baz? If we include both, which order should they be in?
The system cannot know which parameters are safe to override and which are not, because sometimes, you DO want both of them in the URL, and it may matter what order things are added in.
You could argue "Well, just don't allow this for URLs where parameters are already present", but there's also the issue that it's going to complicate the process a lot more. Without this, you just lookup a key in a database and send a redirect header. Now, you need to also analyze the URL to check for parameters, and append part of the URL you were called by. That requires more system resources per redirect, which may become a big problem if your service is used very frequently - you'll need more server power to handle the same amount of redirects. I don't think that tradeoff is considered to be "worth it".
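For what it's worth, the simplest policy (blindly append the incoming query string to the stored target) could look roughly like the sketch below. This is only one of the possible policies discussed above, not what bit.ly or any other service actually does; the class and method names are hypothetical, and it assumes the stored target URL has no #fragment.

```java
class QueryForwarding {
    // Append the query string the short URL was called with to the stored target URL.
    // Existing parameters are kept, so duplicates such as foo=bar&foo=baz can occur.
    static String appendQuery(String targetUrl, String incomingQuery) {
        if (incomingQuery == null || incomingQuery.isEmpty()) {
            return targetUrl; // nothing to forward
        }
        String separator = targetUrl.contains("?") ? "&" : "?";
        return targetUrl + separator + incomingQuery;
    }

    public static void main(String[] args) {
        // Target without parameters: the incoming query becomes the query string.
        System.out.println(appendQuery("http://iphone.pp-p.net/default.aspx", "param=xx"));
        // prints http://iphone.pp-p.net/default.aspx?param=xx

        // Target with parameters: the incoming query is appended after the existing ones.
        System.out.println(appendQuery("http://example.com/default.aspx?foo=bar", "foo=baz"));
        // prints http://example.com/default.aspx?foo=bar&foo=baz
    }
}
```

Even this trivial version shows the ambiguity: the second call yields foo twice, and whether the target page reads the first or the last value is up to that page.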
As mentioned in comments by rinogo and Jurgen
In Clickmeter
Destination URL : www.yoursite.com?myparam1={id1}&myparam2={id2}
Tracking link : www.go.clickmeter.com/38w2?id1=123&id2=abc
After click : www.yoursite.com?myparam1=123&myparam2=abc
In TinyUrl
Destination URL : http://x.com?a=1
Shorten URL : https://tiny url.com/y6gh7ovk
Shorten URL + param : https://tiny url.com/y6gh7ovk?a=2
Resultant URL : http://x.com/?a=1&a=2
Added space to post tinyurl
URL shortening associates a unique key based on a full URL (parameters and all), so it is not possible to pass parameters to a shortening service.
Typically
http://iphone.pp-p.net/default.aspx?param=10
must produce a different key to
http://iphone.pp-p.net/default.aspx?param=22
'Parameter forwarding' is simply not possible in these kinds of redirects, as parameters are not valid parts of a shortened URL in most (if not all) services.
